ccextractor/docs/OCR.txt


Overview
========
OCR (Optical Character Recognisation ) is an technique used to 
extract text from images. In the World of Subtile, subtitle stored 
in bitmap format are common and even neccassary. for converting subtile 
in bitmap format to subtilte in text format ocr is used.

Dependecy
=========
Tesseract (OCR library by google)
Leptonica (image processing library)

How to compile ccextractor  on linux with OCR
=============================================

Download and Install Leptonnica.
-------------------------------
This package is available, you need liblept-devel library.

If Leptonica isn't available for your distribution, or you want to use a newer version
 than they offer, you can compile your own.

you can download lib leptonica from  http://www.leptonica.com/download.html

Download and Install Tesseract.
-------------------------------
Tesseract is available directly from many Linux distributions. The package is generally
 called 'tesseract' or 'tesseract-ocr' - search your distribution's repositories to
 find it. Packages are also generally available for language training data (search the
 repositories,) but if not you will need to download the appropriate training data,
 unpack it, and copy the .traineddata file into the 'tessdata' directory, probably
 /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.

If Tesseract isn't available for your distribution, or you want to use a newer version
 than they offer, you can compile your own.

If you compile Tesseract then following command in its source code are enough
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

 Note: 
1) CCExtractor is tested with Tesseract 3.02.02 version. 

you can download tesseract from https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing


Compile CCextractor passing flags like following
-------------------------------------------------
make ENABLE_OCR=yes


How to compile ccextractor  on Windows with OCR
=============================================

Download prebuild library of leptonica from following link
http://www.leptonica.com/source/leptonica-1.68-win32-lib-include-dirs.zip

Download prebuild library of tesseract from following tesseract official link
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.02-win32-lib-include-dirs.zip

put the path of libs/include of leptonica and tesseract in library paths.
step 1) In visual studio 2013 right click <Project> and select property.
step 2) Select Configuration properties in left panel(column) of property.
step 3) Select VC++ Directory.
step 4) In the right pane, in the right-hand column of the VC++ Directory property,
	open the drop-down menu and choose Edit.
Step 5) Add path of Directory where you have kept uncompressed library of leptonica
	and tesseract.


Set preprocessor flag ENABLE_OCR=1
Step 1)In visual studio 2013 right click <Project> and select property.
Step 2)In the left panel, select Configuration Properties, C/C++, Preprocessor.
Step 3)In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.
Step 4)In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes.

Add library in linker
step 1)Open property of project
Step 2)Select Configuration properties
Step 3)Select Linker in left panel(column)
Step 4)Select Input
Step 5)Select Additional dependencies in right panel
Step 6)Add libtesseract302.lib in new line
Step 7)Add liblept168.lib in new line

Download language data from following link
https://code.google.com/p/tesseract-ocr/downloads/list
after downloading the tesseract-ocr-3.02.eng.tar extract the tar file and put
tessdata folder where you have kept ccextractor executable

Copy the tesseract and leptonica dll in the folder of executable or in system32.
optical character recognisation 2014-06-07 19:45:32 +00:00
			`Overview`
			`========`
			`OCR (Optical Character Recognisation ) is an technique used to`
			`extract text from images. In the World of Subtile, subtitle stored`
			`in bitmap format are common and even neccassary. for converting subtile`
			`in bitmap format to subtilte in text format ocr is used.`

			`Dependecy`
			`=========`
			`Tesseract (OCR library by google)`
			`Leptonica (image processing library)`

			`How to compile ccextractor on linux with OCR`
			`=============================================`

			`Download and Install Leptonnica.`
			`-------------------------------`
			`This package is available, you need liblept-devel library.`

			`If Leptonica isn't available for your distribution, or you want to use a newer version`
			`than they offer, you can compile your own.`

			`you can download lib leptonica from http://www.leptonica.com/download.html`

			`Download and Install Tesseract.`
			`-------------------------------`
			`Tesseract is available directly from many Linux distributions. The package is generally`
			`called 'tesseract' or 'tesseract-ocr' - search your distribution's repositories to`
			`find it. Packages are also generally available for language training data (search the`
			`repositories,) but if not you will need to download the appropriate training data,`
			`unpack it, and copy the .traineddata file into the 'tessdata' directory, probably`
			`/usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.`

			`If Tesseract isn't available for your distribution, or you want to use a newer version`
			`than they offer, you can compile your own.`

			`If you compile Tesseract then following command in its source code are enough`
			`./autogen.sh`
			`./configure`
			`make`
			`sudo make install`
			`sudo ldconfig`

			`Note:`
			`1) CCExtractor is tested with Tesseract 3.02.02 version.`

			`you can download tesseract from https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing`



			`Compile CCextractor passing flags like following`
			`-------------------------------------------------`
			`make ENABLE_OCR=yes`

changes to run ocr on windows 2014-06-11 12:02:40 +00:00
			`How to compile ccextractor on Windows with OCR`
			`=============================================`

			`Download prebuild library of leptonica from following link`
			`http://www.leptonica.com/source/leptonica-1.68-win32-lib-include-dirs.zip`

			`Download prebuild library of tesseract from following tesseract official link`
			`https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.02-win32-lib-include-dirs.zip`

			`put the path of libs/include of leptonica and tesseract in library paths.`
			`step 1) In visual studio 2013 right click <Project> and select property.`
			`step 2) Select Configuration properties in left panel(column) of property.`
			`step 3) Select VC++ Directory.`
			`step 4) In the right pane, in the right-hand column of the VC++ Directory property,`
			`open the drop-down menu and choose Edit.`
			`Step 5) Add path of Directory where you have kept uncompressed library of leptonica`
			`and tesseract.`


			`Set preprocessor flag ENABLE_OCR=1`
			`Step 1)In visual studio 2013 right click <Project> and select property.`
			`Step 2)In the left panel, select Configuration Properties, C/C++, Preprocessor.`
			`Step 3)In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.`
			`Step 4)In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes.`

			`Add library in linker`
			`step 1)Open property of project`
			`Step 2)Select Configuration properties`
			`Step 3)Select Linker in left panel(column)`
			`Step 4)Select Input`
			`Step 5)Select Additional dependencies in right panel`
			`Step 6)Add libtesseract302.lib in new line`
			`Step 7)Add liblept168.lib in new line`

			`Download language data from following link`
			`https://code.google.com/p/tesseract-ocr/downloads/list`
			`after downloading the tesseract-ocr-3.02.eng.tar extract the tar file and put`
			`tessdata folder where you have kept ccextractor executable`

			`Copy the tesseract and leptonica dll in the folder of executable or in system32.`