mirror of
https://github.com/CCExtractor/ccextractor.git
synced 2024-12-24 11:53:25 +00:00
Overview
========
OCR (Optical Character Recognition) is a technique used to
extract text from images. In the world of subtitles, subtitles stored
in bitmap format are common and even necessary. OCR is used to convert
subtitles in bitmap format to subtitles in text format.

Dependencies
============
Tesseract (OCR library by Google)
Leptonica (image processing library)

How to compile CCExtractor on Linux with OCR
============================================

Download and Install Leptonica.
-------------------------------
Leptonica is packaged by most distributions. You need the development
package (for example liblept-devel; the exact name varies by distribution).

If Leptonica isn't available for your distribution, or you want to use a newer version
than they offer, you can compile your own.

You can download Leptonica from http://www.leptonica.com/download.html

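To confirm that the Leptonica development files are visible to the build, a
quick check with pkg-config can help. This is only a sketch: it assumes
pkg-config is installed and that Leptonica registers itself under the module
name lept (the name of its lept.pc file). check_lib is a hypothetical helper,
not part of CCExtractor.

```shell
# check_lib is a hypothetical helper: it reports whether pkg-config
# can see the development files for the given module name.
check_lib() {
    if pkg-config --exists "$1" 2>/dev/null; then
        echo "$1: found (version $(pkg-config --modversion "$1"))"
    else
        echo "$1: not found"
    fi
}

# Assumption: Leptonica's pkg-config module is called "lept".
check_lib lept
```

If the library is reported as not found even though it is installed, it may
have been installed without a .pc file or outside pkg-config's search path.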
Download and Install Tesseract.
-------------------------------
Tesseract is available directly from many Linux distributions. The package is generally
called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to
find it. Packages are also generally available for language training data (search the
repositories), but if not, you will need to download the appropriate training data,
unpack it, and copy the .traineddata file into the 'tessdata' directory, probably
/usr/share/tesseract-ocr/tessdata or /usr/share/tessdata.

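To see which of the two likely tessdata directories actually holds the
training data, something like the following can be used. It is a sketch under
stated assumptions: find_traineddata is a hypothetical helper, and English
(eng) is used only as an example language.

```shell
# find_traineddata is a hypothetical helper: it prints the first directory
# from its argument list that contains <lang>.traineddata, and returns
# non-zero if none of them does.
find_traineddata() {
    lang=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$lang.traineddata" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# The two candidate directories are the ones named in the text above.
if dir=$(find_traineddata eng /usr/share/tesseract-ocr/tessdata /usr/share/tessdata); then
    echo "eng.traineddata found in $dir"
else
    echo "eng.traineddata not found; copy it into your tessdata directory"
fi
```

Other locations can be passed as extra arguments if your distribution keeps
tessdata somewhere else.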
If Tesseract isn't available for your distribution, or you want to use a newer version
than they offer, you can compile your own.

If you compile Tesseract yourself, the following commands, run from its source
directory, are enough:
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

Note:
1) CCExtractor has been tested with Tesseract version 3.02.02.

You can download Tesseract from https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing

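To check which Tesseract version ended up installed, the output of
tesseract --version can be inspected. The snippet below is a sketch:
parse_tesseract_version is a hypothetical helper, and it assumes the first
line of the output has the form "tesseract <version>". Tesseract 3.x prints
this to standard error, so both streams are captured.

```shell
# parse_tesseract_version is a hypothetical helper: it reads the output of
# "tesseract --version" on stdin and prints the version number found on
# the first line (an optional leading "v" is skipped).
parse_tesseract_version() {
    sed -n '1s/^tesseract[[:space:]]*v\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # 2>&1 because older releases write the version to stderr.
    version=$(tesseract --version 2>&1 | parse_tesseract_version)
    echo "Found Tesseract ${version:-(unknown version)}"
else
    echo "tesseract not found in PATH"
fi
```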
Compile CCExtractor with OCR enabled
------------------------------------
make ENABLE_OCR=yes

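After the build, one way to see whether OCR support was linked in is to look
for the two libraries in the output of ldd. This is a sketch with assumptions:
the binary is taken to be ./ccextractor in the current directory, the
libraries are assumed to be linked dynamically, and ocr_libs_linked is a
hypothetical helper.

```shell
# ocr_libs_linked is a hypothetical helper: it filters ldd output on stdin
# down to the Tesseract and Leptonica entries.
ocr_libs_linked() {
    grep -E 'lib(tesseract|lept)'
}

# Assumption: the freshly built binary is ./ccextractor.
if [ -x ./ccextractor ]; then
    ldd ./ccextractor | ocr_libs_linked ||
        echo "no OCR libraries linked; was ENABLE_OCR=yes passed to make?"
fi
```

A statically linked build would show nothing here even with OCR enabled, so
an empty result is a hint rather than proof.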
How to compile CCExtractor on Windows with OCR
==============================================

Download the prebuilt Leptonica library from the following link
http://www.leptonica.com/source/leptonica-1.68-win32-lib-include-dirs.zip

Download the prebuilt Tesseract library from the following official Tesseract link
https://code.google.com/p/tesseract-ocr/downloads/detail?name=tesseract-3.02.02-win32-lib-include-dirs.zip

Add the lib and include paths of Leptonica and Tesseract to the library paths.
Step 1) In Visual Studio 2013, right-click <Project> and select Properties.
Step 2) Select Configuration Properties in the left panel of the Properties dialog.
Step 3) Select VC++ Directories.
Step 4) In the right pane, in the right-hand column of the relevant property
(Include Directories for headers, Library Directories for .lib files),
open the drop-down menu and choose Edit.
Step 5) Add the path of the directory where you have kept the uncompressed Leptonica
and Tesseract libraries.

Set the preprocessor flag ENABLE_OCR=1
Step 1) In Visual Studio 2013, right-click <Project> and select Properties.
Step 2) In the left panel, select Configuration Properties, C/C++, Preprocessor.
Step 3) In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit.
Step 4) In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes.

Add the libraries to the linker
Step 1) Open the project's Properties.
Step 2) Select Configuration Properties.
Step 3) Select Linker in the left panel.
Step 4) Select Input.
Step 5) Select Additional Dependencies in the right panel.
Step 6) Add libtesseract302.lib on a new line.
Step 7) Add liblept168.lib on a new line.

Download the language data from the following link
https://code.google.com/p/tesseract-ocr/downloads/list
After downloading tesseract-ocr-3.02.eng.tar.gz, extract the archive and put the
tessdata folder in the directory where you keep the CCExtractor executable.

Copy the Tesseract and Leptonica DLLs into the folder of the executable or into system32.