1852 Commits

Author SHA1 Message Date
Saurabh Shrivastava
2eb5fd26de [FIX] Move files into appropriate directories & fix build scripts. (#781)
* Move wrappers and extractors inside src/ and update CMakeLists.

* Reflect change in path across build scripts.

* Remove redundant source file inclusion.

* Always use supplied libpng.
2017-10-02 12:16:04 -07:00
Hugh Mackworth
01852ef055 Compilation on the Mac (#777)
* Update README.md

* Delete README.MAC.TXT

No longer accurate given work done to integrate Mac into build processes.

* Change to use project's PNG/ZLIB libraries

* Fix Mac build command
Makes OCR an optional parameter
Adds python API file to build

* Update README.md
2017-10-02 11:59:00 -07:00
AlexBratosin2001
59c0de46e2 Fix Windows project files (#782) 2017-10-02 11:58:15 -07:00
Vinícius Lugão
f8d9e042bb Fix to output CC data when -out=raw is used (#775)
When the -out=raw option was used, ccextractor jumped to the spupng output
format, generating broken spupng files without CC data.
With this fix, it now generates CC data in McPoodle's Broadcast format.
2017-09-08 10:06:00 -07:00
Diptanshu Jamgade
0596d375b7 Python Extension Module (#773)
* Added self as contributor

* Added extension module documentation to docs/
2017-09-03 15:37:24 -07:00
Diptanshu Jamgade
47c5a6e73b Cleaning up the codebase and additional changes in Python SRT generator. (#771)
* Removed all extractors except the grid extractor.
Removed the call to transcript extractor in ccx_encoders_transcript.c

* Removed unnecessary array appending statements in python_grid_extractor.
WIP: switch in extractor.

* Added switch in g608 grid extractor.

* Deleted comments from wrappers.

* Refactored code in ccextractor.c and .h files.
Removed all the commented part.
Made proper changes according to the coding conventions.

* Removed calls to extractor from all the encoders.
The only call made to extractor is from ccx_encoders_python.c.

* Removed a comment from wrapper.c.
In init_write function of output.c added a call to free the output string returned by asprintf in case of
sending filename to callback function.

* Added calls to free the char* which is malloced by asprintf in
extractor.c
WIP: Free the global variable elements.

* Sample testing correctly for italics tag.
Also added a hack to print only 32 characters when unicode fails.
WIP: Font tag.

* Added support for handling font and italics in Python SRT generator.

* modified the font generator.
Also, added count method for checking blank strings in
python_srt_generator.

* Added free statements for avoiding memory leaks.

* added return code for failure of asprintf calls.

* Removing unnecessary code from api_testing.py

* Made modifications to Makefile and build script.

* Added recursive_tester.py
Autoconf builds successfully.

* BUG: Made change to get_line_encoded to encode the last \0 character in a
line. Otherwise the EOL character is absent, causing garbage values to be
present in the SRT.

* Exporting the encoding of the captions from CCExtractor to Python so
that the python SRT generator can generate proper SRT files.

* Modified the include statement in extractor.h
2017-08-25 11:03:00 -07:00
Carlos Fernandez
022463b9a2 Moved ccx_encoders_python to right filter in project. 2017-08-21 14:25:35 -07:00
Saurabh Shrivastava
d19f471352 Correctly handle return codes. (#763)
Return codes after parameter parsing were incorrectly handled, leading to errors such as `Error: Invalid option to CCextractor Library`.
2017-08-21 14:11:19 -07:00
Saurabh Shrivastava
8f2f38bf07 Fix builddebug to include Python API changes. (#770) 2017-08-21 13:21:48 -07:00
Saurabh Shrivastava
4fe82abbfc Get commit hash and compilation date when built using cmake. (#764)
Who knew I would have to read so much documentation for such trivial task 😒
2017-08-20 08:55:09 -07:00
Mayank Gupta
32710eff1d Fix failing build with autoconf due to ccextractor.h (#765) 2017-08-20 08:54:51 -07:00
Diptanshu Jamgade
21eaa3de04 Python bindings with extraction of CE608 grid and writing to a SRT output. (#768)
* added python_extract to encoders_srt and the captions are being
extracted in needed format. Search for an alternative to asprintf

* Checking if the alternative to asprintf generate proper srts

* CC captions accessible via python script

* Removing python caption code from __wrap_write function

* removing old cc_to_python functions

* Removing python_subs structure and all the changes done for that struct

* Removing filename functions from ccextractor.*

* Renaming make_message to time_wrapper

* Applying to python_extract codebase: SSA format

* Added python_extract_time_based and done validation for ssa

* Applying python_extract_time_based: Done validation for srt and webvtt

* led attempt for SAMI support of python_extract. Code is commented

* Applying python_extract_time_based: validate support for SMPTETT

* Added python_extract_transcript and made changes for time printing.

* added show_extracted_captions_with_timings function

* Added show_extracted_captions_with_timings to python script for testing
purpose.

* refactored extractors to api directory. commented out show captions in main()

* build and build library working for the extractors.

* made caption generator work with a 0.1 time sleep. Start refactoring

* added asprintf for windows.

* file being written in the running directory

* Auto-deletion of python temporary file

* Python captions printing status set to proper.

* termination of tail successful

* Writing successful for the sample

* Generating unalternating output

* adding api_support.py

* Adding bld_flags in build_api

* Added  to build_library

* Auto deletion of temporary file on SIGINT

* Discussing Seg fault with Izaron

* working for python and linux with samples. testing -out=pythonapi with stream

* Done adding bitmap support

* added -out=pythonapi support for bitmap

* Setting the messages_target to 0 for output = pythonapi

* Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python.

* adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future.

* added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608

* Removed overlap of -out=pythonapi by adding -pythonapi and
signal_python_api global variable.

* added support for separate c608 grid catching. Need to test the output
via python.

* added support for separate printing of text font and color in CE608.
Need to make sure that the function is inbuilt.

* ADDED ce608 GRID SUPPORT FROM PYTHON
need to discuss whether to keep the print_cc_grid function specific to
the module or make it user accessible.
Mostly it would be better to make it user accessible.

* made changes in the call_from_python_api function such that only
api_options is needed to be passed.
An if statement before the call to g608_extractor has also been added.
Waiting for Carlos to comment on the output generated till this stage.

* added a signal_python_api check before calling every write function.
Thus basic writing output can be avoided.

* Commented all calls to python_extract_time_based.
making changes to python_extract_g608 to be called only from the point
when a g608 caption is detected.

* Added pass_cc_buffer_to_python in encoders_common.c temporarily
redefined get_*_encoded from static to normal
included the above functions in encoders_common.h

* Added if-else statement for switch in encode_sub function.
This is done mainly for making sure no output is generated in the api
call.

* Added ccx_encoders_python.c
Defined pass_cc_buffer_to_python in ccx_encoders_python.c
added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call

* Removed __wrap_write from the entire code base.
Its declaration and definition are only present in CCExtractor.*

* Commented out the /dev/null part in ccx_encoders_common.c.
Proceeding further on checking for file generation.

* Added output_filename in array global variable and is generated in
init_write function.
included ccextractor.h in output.c to access global variable
signal_python_api for avoiding output generation in init_write and
invalid free in dinit_write.

* Modified the definition of init_write function for accessing
signal_python_api.

* Deleted the commented part of /dev/null in ccx_encoders_common.c.

* Added target_message=0 in -pythonapi param parsing in param.c to avoid
the API from printing to STDOUT.
Deleted the commented part of -out=pythonapi.
Thinking of adding a different param for silencing the output when the
call is made from python api.

* Removed __wrap_write from ccextractor.c and ccextractor.h.

* Added ccx_to_python_g608 and modified api_support.py file.
added documentation in ccextractor.c.

* added the generate srt script. However, some random characters are
coming in first line. Need to talk about this.

* Added SRT generator for python.
Using string to remove the garbage value.
Add code for srt counter and also the start_time and end_time
conversion.

* removed the trash characters and added code to print the timings.
However, the last blank frame also results in a print. Need to take care
of this.

* rectified the mistake of writing only timings and not captions.
now next step is to just make the timings print properly

* some minor changes before diving into extracting srt_counter from the made codebase

* Added extraction of srt_counter in python_extract via fflush
srt_counter-value.
Need to modify the processing in python.

* Added the entire method to extract captions and generate srt files. The next step would be to define a concise function for writing the srt

* Processing into a srt working properly.
Next step is to add the information of font into the caption text.

* the data is getting generated for proper SRT counters.

* A turning point in the approach.
Added END OF FRAME line for printing the data for every particular
srt_counter.
Proceeding further with the generation of srt by data manipulation.

* some minor bugs but the output srt is being generated correctly. However, the font and colour encoding needs to be done.

* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.

* Added fflush and cleaned up the python code of srt generation

* Added <i> tag for italics.
Proceeding further with other types.

* Added the code to check for underline.
However, need to check how CCExtractor generates srt when both italics
and underline are present. For now a new line is added if both are
present.

* Shifting for making changes in the I/O work.

* Stable output for samples with italics is being generated.

* Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function.

* build script for linux is working correctly.
Build_library is showing error of invalid def of set_pythonapi.
Moreover, extractor has some memory seg fault.

* Added mod to set a MACRO as my_python_api to set the callback function.
Till now all calls to the reporter are commented.
Working on getting the reporter to print the lines.

* Changes have been implemented to bring reporter in working state.
For now a constant string is passed from extractor. Need to make the
proper parsing possible.

* Changed the code in extractor such that entire grid is returned to the
callback function.
Need to provide this grid to the write function and also cleanup the
codebase.

* Writing the outputted srt in a file called "temp.srt".
Need to modify init_write to push filename that is to be created in
python using callback.

* Added code to get start and end time simultaneously.
entire SRT is getting generated.

* removed ccx_python_encoders.c

* Compiling and executing on Windows

* Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c.
Also, deleted the static definition of get_font_encoded from
ccx_encoders_webvtt.c

* added a write statement in write_cc_bitmap_as_srt

* Rectified transfer of get_line_encoded, get_color_encoded and
get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 08:54:35 -07:00
Carlos Fernandez Sanz
773ddf8bc2 Merge pull request #769 from Izaron/patch-1
**[IMPROVEMENT]** Added gui mode reports for Matroska decoder
2017-08-20 08:54:05 -07:00
Evgeny Shulgin
14e0d86df8 Added gui mode reports for Matroska decoder 2017-08-20 15:08:20 +03:00
Saurabh Shrivastava
333fb6eb6e Cleanly format the compiling documentation and cmake instructions. 2017-07-26 04:24:19 +05:30
Saurabh Shrivastava
da0893fdb3 Fix CMakeLists for MacOS and Linux.
With #742 and this, CCExtractor can be built across all three platforms using CMake.
2017-07-26 04:23:48 +05:30
Carlos Fernandez
ce2b680a43 Merge branch 'pr/n759_Abhinav95' 2017-07-21 11:25:24 -07:00
Abhinav95
b1cc95d972 Adding grayscale conversion for better OCR 2017-07-21 12:12:50 +05:30
Diptanshu8
10eb52e651 pushing 4 wrapper codes 2017-07-20 02:50:53 +00:00
Diptanshu8
13b3dadb45 Wrapper for debugdvbsub and pesheader 2017-07-20 02:50:53 +00:00
Diptanshu8
cff69bef5e added wrapper code for setstdout and setautoprogram 2017-07-20 02:50:53 +00:00
Carlos Fernandez
536082ae6e Merge branch 'pr/n751_Diptanshu8' 2017-07-19 10:59:56 -07:00
Diptanshu8
3f069b84c9 fixed -out=dvdraw sample error. 2017-07-18 04:48:08 +00:00
Carlos Fernandez
ddca8001cc Merge branch 'pr/n755_Abhinav95' 2017-07-17 11:44:11 -07:00
Diptanshu8
02b4427260 making changes to write wrapper 2017-07-17 08:59:00 +00:00
Abhinav95
ec5618dd1f Fixing end timestamp in DVB transcripts + spelling/readme improvements 2017-07-17 04:23:34 +05:30
Carlos Fernandez
e8f742a627 Corrected function prototype 2017-07-14 13:01:39 -07:00
Saurabh Shrivastava
45946e3ac9 Initialise timing for MP4 webvtt.
Fixes #753 .
2017-07-14 18:59:02 +05:30
Diptanshu8
e3e5f8b36e Apply write wrapper across entire database. 2017-07-13 07:26:49 +00:00
Diptanshu8
1435411861 Commenting out the file name related functions. 2017-07-13 05:48:14 +00:00
Diptanshu8
86b7e7348e Added extension to python_subs 2017-07-11 21:34:05 +00:00
Diptanshu8
d2bd2d1397 added basefilename to python_subs 2017-07-11 21:21:18 +00:00
Diptanshu8
8895b27552 CC being shown in python script. 2017-07-11 21:21:18 +00:00
Diptanshu8
57424857b0 Working on PR 2017-07-11 21:21:18 +00:00
Diptanshu8
2ced408994 build and build_library working correctly 2017-07-11 21:21:18 +00:00
Diptanshu8
976f01cee1 CCs to python_subs extracted properly 2017-07-11 21:17:46 +00:00
Diptanshu8
4d5f80a01d Found wrapper for write. Check file_handle and start processing. 2017-07-11 21:17:46 +00:00
Carlos Fernandez
0327e676dd Merge branch 'pr/n747_Diptanshu8' 2017-07-11 11:40:10 -07:00
Diptanshu8
91ea65d2a3 Removed ccextractor.pyc 2017-07-11 11:18:20 +00:00
Diptanshu8
de5fcf27f3 adding .pyc to gitignore 2017-07-06 23:34:08 +00:00
Diptanshu8
fe6813736c segregating the code and changing myarguments and argument_count. Also, gsoc directory has been created. 2017-07-06 22:58:23 +00:00
Diptanshu8
dc35af0bc0 Modifications to the code. 2017-07-06 22:22:59 +00:00
Carlos Fernandez
0c0bf1aafd -Added -nospupngocr (don't OCR bitmaps when generating spupng, faster) 2017-07-06 13:37:20 -07:00
Carlos Fernandez
62dab0dde9 Merge branch 'pr/n746_Abhinav95' 2017-07-06 12:59:26 -07:00
=
31a2d46996 Forcing -noru to cause deduplication in ISDB 2017-07-07 01:22:11 +05:30
Carlos Fernandez
710a205f99 Add support for file split on keyframe (-segmentonkeyonly)
Segmenting now doesn't destroy the whole encoding context, just closes and reopens the output file
Correct a wrong function prototype for process_hex()
OCR: Attempt to correctly deal with TessBaseAPIRecognize returning an error
Changed output for parse PMT to CCX_DMT_PMT instead of CCX_DMT_VERBOSE
2017-07-06 11:57:17 -07:00
Diptanshu8
69a956f3c2 removing api.so 2017-07-04 09:48:44 +00:00
Diptanshu8
7839403266 adding .so to .gitignore 2017-07-04 09:07:14 +00:00
Diptanshu8
6e50104da4 Cyclic rotation and python script argv passing solved 2017-06-28 21:35:32 +00:00
Diptanshu8
edb2431cf9 Cyclic rotation patch 2017-06-28 19:07:44 +00:00