ccextractor/docs/HARDSUBX.txt


Overview
========
Subtitles that are burned into the video (hard subs) can be extracted using the -hardsubx flag.
The system processes the video frames, isolates the subtitle text in each frame, and then
runs OCR on that text using Tesseract.

Dependencies
============
Tesseract (OCR library by Google)
Leptonica (C Image processing library)
FFmpeg (Video processing library)

Compilation
===========

Linux
-----

Make sure Tesseract, Leptonica and FFmpeg are installed, and that their libraries can be found using pkg-config.
Refer to OCR.txt for installation details.

FFmpeg from packages (on Debian), along with a couple of other dependencies you will need:
sudo apt-get install libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libxcb-shm0-dev liblzma-dev

FFmpeg from source:
To build and install FFmpeg (libav) from source, follow the steps at:
https://trac.ffmpeg.org/wiki/CompilationGuide/Ubuntu - For Ubuntu, Debian and Linux Mint
https://trac.ffmpeg.org/wiki/CompilationGuide/Generic - For generic Linux compilation

To validate your FFmpeg installation, make sure you can run the following commands in your terminal:
pkg-config --cflags libavcodec
pkg-config --cflags libavformat
pkg-config --cflags libavutil
pkg-config --cflags libswscale
pkg-config --libs libavcodec
pkg-config --libs libavformat
pkg-config --libs libavutil
pkg-config --libs libswscale

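On success, each command exits without error. The --cflags commands print the include directory
flags (this output may be empty when the headers are in a default include directory), and the
--libs commands print the linker flags.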
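
The eight checks above can also be run in one pass. The following is a minimal sketch, not part
of the CCExtractor build system: the function name check_ffmpeg_libs is hypothetical, and the
PKG_CONFIG variable is an assumed way to point at a different pkg-config binary. It relies only
on the standard pkg-config options --exists and --modversion.

```shell
#!/bin/sh
# Sketch: report whether pkg-config can locate each FFmpeg library
# listed above. check_ffmpeg_libs is a hypothetical helper name.
check_ffmpeg_libs() {
    # Assumption: PKG_CONFIG may name an alternative pkg-config binary.
    pc="${PKG_CONFIG:-pkg-config}"
    missing=0
    for lib in libavcodec libavformat libavutil libswscale; do
        if "$pc" --exists "$lib"; then
            # --modversion prints the installed version of the module.
            echo "found:   $lib $("$pc" --modversion "$lib")"
        else
            echo "missing: $lib"
            missing=1
        fi
    done
    # Return non-zero if at least one library could not be found.
    return "$missing"
}

check_ffmpeg_libs || echo "Install the missing FFmpeg development packages before building."
```

A non-zero return value means at least one library was not found, so the function can also be
used as a guard in front of the build commands.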

To build the program with hardsubx support, use one of the following methods:

== from the Linux directory, run:
    ./configure --enable-hardsubx
    make ENABLE_HARDSUBX=yes

== using cmake, from the root directory, run:
    mkdir build
    cd build
    cmake -DWITH_OCR=on -DWITH_HARDSUBX=on ../src/
    make

NOTE: The build has been tested with FFmpeg version 3.1.0 and Tesseract 3.04.

Windows
-------

Build instructions for Windows are coming soon.