Commit Graph

78 Commits

Author SHA1 Message Date
Diptanshu Jamgade
21eaa3de04 Python bindings with extraction of CE608 grid and writing to a SRT output. (#768)
* added python_extract to encoders_srt and the captions are being
extracted in needed format. Search for an alternative to asprintf

* Checking if the alternative to asprintf generates proper srts

* CC captions accessible via python script

* Removing python caption code from __wrap_write function

* removing old cc_to_python functions

* Removing python_subs structure and all the changes done for that struct

* Removing filename functions from ccextractor.*

* Renaming make_message to time_wrapper

* Applying to python_extract codebase: SSA format

* Added python_extract_time_based and done validation for ssa

* Applying python_extract_time_based: Done validation for srt and webvtt

* Failed attempt for SAMI support of python_extract. Code is commented

* Applying python_extract_time_based: validate support for SMPTETT

* Added python_extract_transcript and made changes for time printing.

* added show_extracted_captions_wtih_timings function

* Added show_extracted_captions_with_timings to python script for testing
purpose.

* refactored extractors to api directory. commented out show captions in main()

* build and build library working for the extractors.

* made caption generator work with a 0.1 time sleep. Start refactoring

* added asprintf for windows.

* file being written in the running directory

* Auto-deletion of python temporary file

* Python captions printing status set to proper.

* termination of tail successful

* Writing successful for the sample

* Generating unalternating output

* adding api_support.py

* Adding bld_flags in build_api

* Added  to build_library

* Auto deletion of temporary file on SIGINT

* Discussing Seg fault with Izaron

* working for python and linux with samples. testing -out=pythonapi with stream

* Done adding bitmap support

* added -out=pythonapi support for bitmap

* Setting the messages_target to 0 for output = pythonapi

* Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python.

* adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future.

* added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608

* Removed overlap of -out=pythonapi by adding -pythonapi and
signal_python_api global variable.

* added support for separate c608 grid catching. Need to test the output
via python.

* added support for separate printing of text font and color in CE608.
Need to make sure that the function is inbuilt.

* ADDED ce608 GRID SUPPORT FROM PYTHON
need to discuss whether to keep the print_cc_grid function specific to
the module or make it user accessible.
Mostly it would be better to make it user accessible.

* made changes in the call_from_python_api function such that only
api_options is needed to be passed.
An if statement before the call to g608_extractor has also been added.
Waiting for Carlos to comment on the output generated till this stage.

* added a signal_python_api check before calling every write function.
Thus basic writing output can be avoided.

* Commented all calls to python_extract_time_based.
making changes to python_extract_g608 to be called only from the point
when a g608 caption is detected.

* Added pass_cc_buffer_to_python in encoders_common.c temporarily
redefined get_*_encoded from static to normal
included the above functions in encoders_common.h

* Added if-else statement for switch in encode_sub function.
This is done mainly for making sure no output is generated in the api
call.

* Added ccx_encoders_python.c
Defined pass_cc_buffer_to_python in ccx_encoders_python.c
added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call

* Removed __wrap_write from the entire code base.
Its declaration and definition are only present in CCExtractor.*

* Commented out the /dev/null part in ccx_encoders_common.c.
Proceeding further on checking for file generation.

* Added output_filename in array global variable and is generated in
init_write function.
included ccextractor.h in output.c to access global variable
signal_python_api for avoiding output generation in init_write and
invalid free in dinit_write.

* Modified the definition of init_write function for accessing
signal_python_api.

* Deleted the commented part of /dev/null in ccx_encoders_common.c.

* Added target_message=0 in -pythonapi param parsing in param.c to avoid
the API from printing to STDOUT.
Deleted the commented part of -out=pythonapi.
Thinking of adding a different param for silencing the output when the
call is made from python api.

* Removed __wrap_write from ccextractor.c and ccextractor.h.

* Added ccx_to_python_g608 and modified api_support.py file.
added documentation in ccextractor.c.

* added the generate srt script. However, some random characters are
coming in first line. Need to talk about this.

* Added SRT generator for python.
Using string to remove the garbage value.
Add code for srt counter and also the start_time and end_time
conversion.

* removed the trash characters and added code to print the timings.
However, the last blank frame also results in a print. Need to take care
of this.

* rectified the mistake of writing only timings and not captions.
now next step is to just make the timings print properly

* some minor changes before diving into extracting srt_counter from the made codebase

* Added extraction of srt_counter in python_extract via fflush
srt_counter-value.
Need to modify the processing in python.

* Added the entire method to extract captions and generate srt files. The next step would be to define a concise function for writing the srt

* Processing into a srt working properly.
Next step is to add the information of font into the caption text.

* the data is getting generated for proper SRT counters.

* A turning point in the approach.
Added END OF FRAME line for printing the data for every particular
srt_counter.
Proceeding further with the generation of srt by data manipulation.

* some minor bugs but the output srt is being generated correctly. However, the font and colour encoding needs to be done.

* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.

* Added fflush and cleaned up the python code of srt generation

* Added <i> tag for italics.
Proceeding further with other types.

* Added the code to check for underline.
However, need to check how CCExtractor generates srt when both italics
and underline are present. For now a new line is added if both are
present.

* Shifting for making changes in the I/O work.

* Stable output for samples with italics is being generated.

* Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function.

* build script for linux is working correctly.
Build_library is showing error of invalid def of set_pythonapi.
Moreover, extractor has some memory seg fault.

* Added mod to set a MACRO as my_python_api to set the callback function.
Till now all calls to the reporter are commented.
Working on getting the reporter to print the lines.

* Changes have been implemented to bring reporter in working state.
For now a constant string is passed from extractor. Need to make the
proper parsing possible.

* Changed the code in extractor such that entire grid is returned to the
callback function.
Need to provide this grid to the write function and also cleanup the
codebase.

* Writing the outputted srt in a file called "temp.srt".
Need to modify init_write to push filename that is to be created in
python using callback.

* Added code to get start and end time simultaneously.
entire SRT is getting generated.

* removed ccx_python_encoders.c

* Compiling and executing on Windows

* Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c.
Also, deleted the static definition of get_font_encoded from
ccx_encoders_webvtt.c

* added a write statement in write_cc_bitmap_as_srt

* Rectified transfer of get_line_encoded, get_color_encoded and
get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 08:54:35 -07:00
Carlos Fernandez
710a205f99 Add support for file split on keyframe (-segmentonkeyonly)
Segmenting now doesn't destroy the whole encoding context, just closes and reopens the output file
Correct a wrong function prototype for process_hex()
OCR: Attempt to correctly deal with TessBaseAPIRecognize returning an error
Changed output for parse PMT to CCX_DMT_PMT instead of CCX_DMT_VERBOSE
2017-07-06 11:57:17 -07:00
Evgeny
76cb7b91ee Added matroska.c to filters and fixed _MSC_VER 2017-03-02 16:18:03 +03:00
Evgeny
071386d552 Fixed DLL requiring in non-full version 2017-01-26 12:27:39 +03:00
Evgeny Shulgin
2229a51b66 Added FFMPEG 3.0 that is compatible with XP (#610) 2017-01-07 11:52:13 -08:00
cfsmp3
57ef958250 Corrected header directories in non-full versions. 2017-01-01 22:12:35 +01:00
Evgeny
2942e84a6f Solved the Windows dependency hell 2017-01-01 21:34:43 +03:00
Evgeny
fe9cd61d1d Fixed bad tesseract library 2016-12-24 22:19:23 +03:00
Evgeny
08b2bcb88b Added dependencies .dll-s and copy command 2016-12-24 18:26:06 +03:00
Evgeny
0befb3c5b1 Renamed tess version from 3.05 to 4.00 2016-12-24 10:59:52 +03:00
Evgeny
331a64e387 Added working tesseract 4.00 2016-12-23 18:01:12 +03:00
Evgeny
4c78e47404 Fixed mess in the filters 2016-12-22 18:37:02 +03:00
Evgeny
4b80441164 Renamed OCR to Full and copy ffmpeg DLLs to folder 2016-12-22 18:19:42 +03:00
cfsmp3
e2cc2f9fd7 ImageHasSafeExceptionHandlers>false 2016-12-22 09:21:24 +01:00
cfsmp3
b669733bd8 Added pre-build.bat to Release-OCR 2016-12-22 08:54:59 +01:00
Evgeny
802360b008 Ported HARDSUBX to Windows 2016-12-18 19:42:23 +03:00
Carlos Fernandez
7acb3c3874 Version bump (to 0.84). Rename target name of the Windows OCR binaries. 2016-12-16 10:41:02 -08:00
Carlos Fernandez
05da03a259 Changed dependency for OCR in release version - use non-debug version of tesseract 2016-12-15 10:24:34 -08:00
Carlos Fernandez
7aaa1e3edb Corrected timing in iTunes
Added list of changes to CHANGES.TXT
2016-12-13 17:39:05 -08:00
canihavesomecoffee
7b55f61396 Remove hardcoded references in project file, add relative ones instead 2016-12-10 08:38:58 +01:00
Carlos Fernandez
6dc941d4e6 Changed platform target to v120_xp, fixed some missing dirs. 2016-12-09 14:02:10 -08:00
AlexBratosin2001
ce15155956 Updated GPAC library to v0.6.2 (#500)
Replaced GPAC.
2016-12-09 13:47:54 -08:00
Carlos Fernandez
d453d9327e Minor changes IN README.md 2016-12-05 12:44:57 -08:00
canihavesomecoffee
814eaab300 Add utf8proc folder to the include directories
Regular debug & release have a missing folder
2016-11-29 22:38:04 +01:00
Carlos Fernandez
6f2becc42e Fixed OCR libraries dependencies for the release version in Windows. 2016-10-13 11:50:35 -07:00
Carlos Fernandez
17dd6696df Initial libcurl integration work, linux only. Just groundwork, lots of dummy things yet. 2016-09-26 13:36:04 -07:00
Carlos Fernandez
4101fe3880 Fixes #425 - the 708 decoder needs access the encoder. Reference was missing for .bin. 2016-09-20 16:04:33 -07:00
Carlos Fernandez
b00f8e75f6 Added dvb_subtitle_decoder.c to the project 2016-08-22 16:17:17 -07:00
Carlos Fernandez
358b8ef579 Initial backport of Oleg Kisselef's WITH_SHARING options. Most likely it breaks stuff. 2016-08-17 17:40:11 -07:00
Carlos Fernandez
c4073d1813 leptonica/tesseract version upgrade in release build (VS) 2016-08-16 10:33:17 -07:00
Carlos Fernandez
676539cf8c Updated Tesseract and leptonica versions, included the files in the repo because they're a royal pain to find and/or build. 2016-08-15 16:15:50 -07:00
canihavesomecoffee
9f4bff884f Update build script for windows
-
2016-07-05 18:01:47 +02:00
canihavesomecoffee
b8eec82f2a Update file to copy necessary DLL to output folder
Updates the project file to copy the two DLL's after compiling, so that
we can run from that directory.
2016-06-08 02:50:23 +02:00
canihavesomecoffee
04be7be06b Add OCR build support
Adds OCR build support by creating two new build configs (one debug, one
release) and some instructions about what VS expects on those configs.
2016-06-08 02:25:13 +02:00
canihavesomecoffee
05e451d41e Rename ccextractor to ccextractorwin for compilation 2016-06-04 20:40:34 +02:00
Carlos Fernandez
0b2e12ce0c Changed target to XP 2016-05-27 10:45:27 -07:00
canihavesomecoffee
a0787e740e Add windows pre-build event
Updates the .h file that contains the build date & git commit hash (if
available)
2016-05-22 08:13:53 +02:00
canihavesomecoffee
f694c95510 Add hashing library and update makefiles
- Adds an open-source hashing lib for SHA2 (SHA-256, SHA-384, SHA-512)
from http://www.aarongifford.com/computers/sha.html, with some small
modifications to make it work under Windows too
- Updates build commands to reflect this change
2016-05-22 03:49:31 +02:00
canihavesomecoffee
046fd4c435 Fix windows build
Fixes windows build by adding zvbi folder to includes
2016-03-21 23:08:21 +01:00
Willem
d3862ba88b Fix path
-
2016-03-05 13:19:20 +01:00
Anshul Maheshwari
0023c6545b make code windows compatible
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
2016-02-17 21:13:09 +05:30
Anshul Maheshwari
d2d7a17f3b strtok_r for windows
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
2015-10-05 12:58:09 +05:30
Anshul Maheshwari
cc0ee507dd remove some vs warning
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
2015-09-30 13:37:27 +05:30
wforums
6e9a30b354 fix solution
-
2015-09-24 00:08:19 +02:00
wforums
37091708b7 Merge remote-tracking branch 'CCExtractor/master'
Conflicts:
	windows/ccextractor.vcxproj.filters
2015-09-24 00:05:46 +02:00
wforums
0885aae79c Updating project files
-
2015-09-24 00:03:54 +02:00
Anshul Maheshwari
ad5b917f3b Compile code in vs2013
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
2015-09-15 15:04:59 +05:30
Anshul Maheshwari
57eb42c7bb Compile code using Visual studio
Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>
2015-08-18 12:19:33 +05:30
wforums
74ad11b44f Release project build fix
Fixed release project for VS. New subfolders weren't included.
2015-06-04 00:34:08 +02:00
wforums
051a6f1f67 git ignore update
Updated gitignore with some more VS project files.
2015-06-04 00:31:29 +02:00