mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2024-12-26 12:52:15 +00:00
651dc67a5d
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
95 lines
3.8 KiB
Plaintext
Overview
========
OCR (Optical Character Recognition) is a technique used to
extract text from images. In the world of subtitles, subtitles stored
in bitmap format are common and sometimes necessary. OCR is used to
convert subtitles in bitmap format to subtitles in text format.

Dependencies
============
Tesseract (OCR library by Google)
Leptonica (image processing library)

How to compile CCExtractor on Linux with OCR
=============================================

Download and Install Leptonica.
-------------------------------
Leptonica is usually available as a distribution package; you need the development
library (liblept-devel; the exact package name varies by distribution).

If Leptonica isn't available for your distribution, or you want to use a newer version
than they offer, you can compile your own.

You can download Leptonica from http://www.leptonica.com/download.html

Download and Install Tesseract.
-------------------------------
Tesseract is available directly from many Linux distributions. The package is generally
called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to
find it. Packages are also generally available for language training data (search the
repositories), but if not, you will need to download the appropriate training data,
unpack it, and copy the .traineddata file into the 'tessdata' directory, probably
/usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.

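As a minimal sketch of the manual copy step described above: the helper
find_tessdata_dir is hypothetical (it is not part of Tesseract or CCExtractor),
and eng.traineddata is only an example file name. It prints the first of the
candidate directories that exists on your system.

```shell
# Hypothetical helper: print the first existing directory among the arguments.
find_tessdata_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example use, assuming eng.traineddata is in the current directory:
# TESSDATA=$(find_tessdata_dir /usr/share/tesseract-ocr/tessdata /usr/share/tessdata) &&
#     sudo cp eng.traineddata "$TESSDATA/"
```

If neither directory exists, the helper prints nothing and returns a non-zero
status, so the copy is skipped and you can look for the directory by hand.
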
If Tesseract isn't available for your distribution, or you want to use a newer version
than they offer, you can compile your own.

If you compile Tesseract yourself, the following commands, run in its source directory, are enough:
    ./autogen.sh
    ./configure
    make
    sudo make install
    sudo ldconfig

Note:
1) CCExtractor is tested with Tesseract 3.04, but it also works with older versions.

You can download Tesseract from https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
You can download Tesseract training data from https://github.com/tesseract-ocr/tessdata/archive/3.04.00.tar.gz

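The download and build steps above could be combined roughly as follows. This is
a sketch under assumptions, not a tested installer: the function names are
hypothetical, curl is assumed to be installed, and the archive is assumed to
unpack into a directory named tesseract-3.04.00. Nothing is downloaded or built
until you call build_tesseract yourself.

```shell
# Version named in the note above; change it to build a different release.
TESS_VERSION=3.04.00

# Build the archive URL for a repository name (tesseract or tessdata).
tess_archive_url() {
    printf 'https://github.com/tesseract-ocr/%s/archive/%s.tar.gz\n' "$1" "$TESS_VERSION"
}

# Hypothetical wrapper around the commands listed above. It runs in a
# subshell, so the caller's working directory is left unchanged.
build_tesseract() (
    curl -L -o tesseract.tar.gz "$(tess_archive_url tesseract)" &&
    tar -xzf tesseract.tar.gz &&
    cd "tesseract-$TESS_VERSION" &&
    ./autogen.sh && ./configure && make &&
    sudo make install && sudo ldconfig
)
```

The same tess_archive_url helper gives the training data URL when called with
tessdata instead of tesseract.
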
Compile CCExtractor with the following flag
-------------------------------------------
    make ENABLE_OCR=yes


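Before running make, it can help to confirm that both libraries are visible to
the build tools. The sketch below assumes pkg-config is installed and that the
libraries registered themselves under the module names tesseract and lept; both
the names and the helper check_module are assumptions, so a "missing" result
does not necessarily mean the library is absent.

```shell
# Report whether a pkg-config module is installed.
# Module names passed below (tesseract, lept) are assumptions.
check_module() {
    if pkg-config --exists "$1" 2>/dev/null; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

check_module tesseract
check_module lept
```
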
How to compile CCExtractor on Windows with OCR
===============================================

Download the prebuilt Leptonica and Tesseract libraries from the following link:
https://drive.google.com/file/d/0B2ou7ZfB-2nZOTRtc3hJMHBtUFk/view?usp=sharing

Add the lib and include paths of Leptonica and Tesseract to the project's library paths:
Step 1) In Visual Studio 2013, right-click <Project> and select Properties.
Step 2) Select Configuration Properties in the left panel of the property page.
Step 3) Select VC++ Directories.
Step 4) In the right panel, in the right-hand column of the directory property you want
to change (for example Include Directories or Library Directories), open the drop-down menu and choose Edit.
Step 5) Add the path of the directory where you unpacked the Leptonica
and Tesseract libraries.


Set the preprocessor flag ENABLE_OCR=1:
Step 1) In Visual Studio 2013, right-click <Project> and select Properties.
Step 2) In the left panel, select Configuration Properties, C/C++, Preprocessor.
Step 3) In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.
Step 4) In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes.

Add the libraries to the linker:
Step 1) Open the project's Properties.
Step 2) Select Configuration Properties.
Step 3) Select Linker in the left panel.
Step 4) Select Input.
Step 5) Select Additional Dependencies in the right panel.
Step 6) Add libtesseract304d.lib on a new line.
Step 7) Add liblept172.lib on a new line.

Download the language data from the following link:
https://code.google.com/p/tesseract-ocr/downloads/list
After downloading tesseract-ocr-3.02.eng.tar.gz, extract the archive and put the
tessdata folder in the directory that contains the ccextractor executable.

Copy the Tesseract and Leptonica DLLs from the lib folder of the prebuilt library download above into the executable's folder or into system32.