ccextractor

mirror of https://github.com/CCExtractor/ccextractor.git synced 2024-12-28 22:04:45 +00:00

Author	SHA1	Message	Date
Saurabh Shrivastava	2eb5fd26de	[FIX] Move files into appropriate directories & fix build scripts. (#781) * Move wrappers and extractors inside src/ and update CMakeLists. * Reflect change in path across build scripts. * Remove redundant source file inclusion. * Always use supplied libpng.	2017-10-02 12:16:04 -07:00
Hugh Mackworth	01852ef055	Compilation on the Mac (#777) * Update README.md * Delete README.MAC.TXT. No longer accurate given work done to integrate Mac into build processes. * Change to use project's PNG/ZLIB libraries * Fix Mac build command. Makes OCR an optional parameter. Adds Python API file to build. * Update README.md	2017-10-02 11:59:00 -07:00
AlexBratosin2001	59c0de46e2	Fix Windows project files (#782)	2017-10-02 11:58:15 -07:00
Vinícius Lugão	f8d9e042bb	Fix to output CC data when -out=raw is used (#775) When the -out=raw option was used, ccextractor jumped to the spupng output format, generating broken spupng files without CC data. With this fix, it now generates CC data in McPoodle's Broadcast format.	2017-09-08 10:06:00 -07:00
Diptanshu Jamgade	0596d375b7	Python Extension Module (#773) * Added self as contributor * Added extension module documentation to docs/	2017-09-03 15:37:24 -07:00
Diptanshu Jamgade	47c5a6e73b	Cleaning up the codebase and additional changes in Python SRT generator. (#771) * Removed all extractors except the grid extractor. Removed the call to transcript extractor in ccx_encoders_transcript.c * Removed unnecessary array appending statements in python_grid_extractor. WIP: switch in extractor. * Added switch in g608 grid extractor. * Deleted comments from wrappers. * Refactored code in ccextractor.c and .h files. Removed all the commented part. Made proper changes according to the coding conventions. * Removed calls to extractor from all the encoders. The only call made to extractor is from ccx_encoders_python.c. * Removed a comment from wrapper.c. In init_write function of output.c added a call to free the output string returned by asprintf in case of sending filename to callback function. * Added calls to free the char* which is malloced by asprintf in extractor.c WIP: Free the global variable elements. * Sample testing correctly for italics tag. Also added a hack to print only 32 characters when unicode fails. WIP: Font tag. * Added support for handling font and italics in Python SRT generator. * modified the font generator. Also, added count method for checking blank strings in python_srt_generator. * Added free statements for avoiding memory leaks. * added return code for failure of asprintf calls. * Removing unnecessary code from api_testing.py * Made modifications to Makefile and build script. * Added recursive_tester.py Autoconf builds successfully. * BUG: Made change to get_line_encoded to encode the last \0 character in a line. Otherwise the EOL character is absent causing garbage value to be present in SRT. * Exporting the encoding of the captions from CCExtractor to Python so that the python SRT generator can generate proper SRT files. * Modified the include statement in extractor.h	2017-08-25 11:03:00 -07:00
Carlos Fernandez	022463b9a2	Moved ccx_encoders_python to right filter in project.	2017-08-21 14:25:35 -07:00
Saurabh Shrivastava	d19f471352	Correctly handle return codes. (#763) Return codes after parameter parsing were incorrectly handled, leading to errors such as `Error: Invalid option to CCextractor Library`.	2017-08-21 14:11:19 -07:00
Saurabh Shrivastava	8f2f38bf07	Fix builddebug to include Python API changes. (#770)	2017-08-21 13:21:48 -07:00
Saurabh Shrivastava	4fe82abbfc	Get commit hash and compilation date when built using cmake. (#764) Who knew I would have to read so much documentation for such a trivial task 😒	2017-08-20 08:55:09 -07:00
Mayank Gupta	32710eff1d	Fix failing build with autoconf due to ccextractor.h (#765)	2017-08-20 08:54:51 -07:00
Diptanshu Jamgade	21eaa3de04	Python bindings with extraction of CE608 grid and writing to a SRT output. (#768 ) * added python_extract to encoders_srt and the captions are being extracted in needed format. Search for an alternative to asprintf * Checking if the alternative to asprintf generate proper srts * CC captions accessible via python script * Removing python caption code from __wrap_write function * removing old cc_to_python functions * Removing python_subs structure and all the changes done for that struct * Removing filename functions from ccextractor.* * Renaming make_message to time_wrapper * Applying to python_extract codebase: SSA format * Added python_extract_time_based and done validation for ssa * pplying python_extract_time_based: Done validation for srt and webvtt * led attempt for SAMI support of python_extract. Code is commented * Appluing python_extract_time_based: validate support for SMPTETT * Added python_extract_transcript and made changes for time printing. * added show_extracted_captions_wtih_timings function * Added show_extracted_captions_with_timings to python script for testing purpose. * refactored extractors to api directory. commented out show captions in main() * build and build library working for the extractors. * made caption generator work with a 0.1 time sleep. Start refactoring * added asprintf for windows. * file being written in the running directory * Auto -deletion of python temporary file * Python captions printing status set to proper. * termination of tail successful * Writing successful for the sample * Generating unalternating output * adding api_support.py * Adding bld_flags in build_api * Added to build_library * Auto deletion of temporary file on SIGINT * Discussing Seg fault with Izaron * working for python and linux with samples. 
testing -out=pythonapi with stream * Done adding bitmap support * added -out=pythonapi support for bitmap * Setting the messages_target to 0 for output = pythonapi * Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python. * adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future. * added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608 * Removed overlap of -out=pythonapi by adding -pythonapi and signal_python_api global variable. * added support for seperate c608 grid catching. Need to test the output via python. * added support for seperate printing of text font and color in CE608. Need to make sure that the function is inbuilt. * ADDED ce608 GRID SUPPORT FROM PYTHON need to discuss whether to keep the print_cc_grid function specific to the module or make it user accessible. Mostly it would be better to make it user accessible. * made changes in the call_from_python_api function such that only api_options is needed to be passed. An if statement before the call to g608_extractor has also been added. Waiting for Carlos to comment on the output generated till this stage. * added a signal_python_api check before calling every write function. Thus basic writing output can be avoided. * Commented all calls to python_extract_time_based. making changes to python_extract_g608 to be called only from the point when a g608 caption is detected. * Added pass_cc_buffer_to_python in encoders_common.c temporarily redefined get__encoded from static to normal included the above functions in encoders_common.h Added if-else statement for switch in encode_sub function. This is done mainly for making sure no output is generated in the api call. 
* Added ccx_encoders_python.c Defined pass_cc_buffer_to_python in ccx_encoders_python.c added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call * Removed __wrap_write from the entire code base. It's declaration and definition are only present in CCExtractor.* * Commented out the /dev/null part in ccx_encoders_common.c. Proceeding further on checking for file generation. * Added output_filename in array global variable and is generated in init_write function. included ccextractor.h in output.c to access global variable signal_python_api for avoiding output generation in init_write and invalid free in dinit_write. * Modified the definition of init_write function for accessing signal_python_api. * Deleted the commented part of /dev/null in ccx_encoders_common.c. * Added target_message=0 in -pythonapi param parsing in param.c to avoid the API from printing to STDOUT. Deleted the commented part of -out=pythonapi. Thinking of adding a different param for silencing the output when the call is made from python api. * Removed __wrap_write from ccextractor.c and ccextractor.h. * Added ccx_to_python_g608 and modified api_support.py file. added documentation in ccextractor.c. * added the generate srt script. However, some random characters are coming in first line. Need to talk about this. * Added SRT generator for python. Using string to remove the garbage value. Add code for srt counter and also the start_time and end_time conversion. * removed the trash characters and added code to print the timings. However, the last blank frame also results in a print. Need to take care of this. * rectified the mistake of writing only timings and not captions. now next step is to just make the timings print properly * some minor changes before diving into extracting srt_counter from the made codebase * Added extraction of srt_counter in python_extract via fflush srt_counter-value. Need to modify the processing in python. 
* Added the entire method to extract captions and generate srt files. Next, step would be a to define a concise function for writing the srt * Processing into a srt working properly. Next step is to add the information of font into the caption text. * the data is getting generated for proper SRT counters. * A turning point to the appraoch. Added END OF FRAME line for printing the data for every particular srt_counter. Proceeding further with the generation of srt by data manipulation. * some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Added fflush and cleaned up the python code of srt generation * Added <i> tag for italics. Proceeding further with other types. * Added the code to check for underline. However, need to check how CCExtractor generates srt when both italics and underline are present. For now a new line is added if both are present. * Shifting for making changes in th i/O work. * Stable ouput for samples with italics is being generated. * Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function. * build script for linux is working correctly. Build_library is showing error of invalid def of set_pythonapi. Moreover, extractor has some memory seg fault. * Added mod to set a MACRO as my_python_api to set the callback function. Till now all calls to the reporter are commented. Working on getting the reporter to print the lines. * Changes have been implemented to bring reporter in working state. For now a constant string is passed from extractor. Need to make the proper parsing possible. * Changed the code in extractor such that entire grid is returned to the callback function. 
Need to provide this grid to the write function and also cleanup the codebase. * Writing the outputted srt in a file called "temp.srt". Need to modify init_write to push filename that is to be created in python using callback. * Added code to get start and end time simultaneously. entire SRT is getting generated. * removed ccx_python_encoders.c * Compiling and executing on Windows * Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c. Also, deleted the static definition of get_font_encoded from ccx_encoders_webvtt.c * added a write statement in write_cc_bitmap_as_srt * Rectified transfer of get_line_encoded, get_color_encoded and get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.	2017-08-20 08:54:35 -07:00
Carlos Fernandez Sanz	773ddf8bc2	Merge pull request #769 from Izaron/patch-1 [IMPROVEMENT] Added gui mode reports for Matroska decoder	2017-08-20 08:54:05 -07:00
Evgeny Shulgin	14e0d86df8	Added gui mode reports for Matroska decoder	2017-08-20 15:08:20 +03:00
Saurabh Shrivastava	333fb6eb6e	Cleanly format the compiling documentation and cmake instructions.	2017-07-26 04:24:19 +05:30
Saurabh Shrivastava	da0893fdb3	Fix CMakeLists for MacOS and Linux. With #742 and this, CCExtractor can be built across all three platforms using CMake.	2017-07-26 04:23:48 +05:30
Carlos Fernandez	ce2b680a43	Merge branch 'pr/n759_Abhinav95'	2017-07-21 11:25:24 -07:00
Abhinav95	b1cc95d972	Adding grayscale conversion for better OCR	2017-07-21 12:12:50 +05:30
Diptanshu8	10eb52e651	pushing 4 wrapper codes	2017-07-20 02:50:53 +00:00
Diptanshu8	13b3dadb45	Wrapper for debugdvbsub and pesheader	2017-07-20 02:50:53 +00:00
Diptanshu8	cff69bef5e	added wrapper code for setstdout and setautoprogram	2017-07-20 02:50:53 +00:00
Carlos Fernandez	536082ae6e	Merge branch 'pr/n751_Diptanshu8'	2017-07-19 10:59:56 -07:00
Diptanshu8	3f069b84c9	fixed -out=dvdraw sample error.	2017-07-18 04:48:08 +00:00
Carlos Fernandez	ddca8001cc	Merge branch 'pr/n755_Abhinav95'	2017-07-17 11:44:11 -07:00
Diptanshu8	02b4427260	making changes to write wrapper	2017-07-17 08:59:00 +00:00
Abhinav95	ec5618dd1f	Fixing end timestamp in DVB transcripts + spelling/readme improvements	2017-07-17 04:23:34 +05:30
Carlos Fernandez	e8f742a627	Corrected function prototype	2017-07-14 13:01:39 -07:00
Saurabh Shrivastava	45946e3ac9	Initialise timing for MP4 webvtt. Fixes #753.	2017-07-14 18:59:02 +05:30
Diptanshu8	e3e5f8b36e	Apply write wrapper across entire database.	2017-07-13 07:26:49 +00:00
Diptanshu8	1435411861	Commenting out the file name related functions.	2017-07-13 05:48:14 +00:00
Diptanshu8	86b7e7348e	Added extension to python_subs	2017-07-11 21:34:05 +00:00
Diptanshu8	d2bd2d1397	added basefilename to python_subs	2017-07-11 21:21:18 +00:00
Diptanshu8	8895b27552	CC being shown in python script.	2017-07-11 21:21:18 +00:00
Diptanshu8	57424857b0	Working on PR	2017-07-11 21:21:18 +00:00
Diptanshu8	2ced408994	build and build_library working correctly	2017-07-11 21:21:18 +00:00
Diptanshu8	976f01cee1	CCs to python_subs extracted properly	2017-07-11 21:17:46 +00:00
Diptanshu8	4d5f80a01d	Found wrapper for write. Check file_handle and start processing.	2017-07-11 21:17:46 +00:00
Carlos Fernandez	0327e676dd	Merge branch 'pr/n747_Diptanshu8'	2017-07-11 11:40:10 -07:00
Diptanshu8	91ea65d2a3	Removed ccextractor.pyc	2017-07-11 11:18:20 +00:00
Diptanshu8	de5fcf27f3	adding .pyc to gitignore	2017-07-06 23:34:08 +00:00
Diptanshu8	fe6813736c	segregating the code and changing myarguments and argument_count. Also, gsoc directory has been created.	2017-07-06 22:58:23 +00:00
Diptanshu8	dc35af0bc0	Modifications to the code.	2017-07-06 22:22:59 +00:00
Carlos Fernandez	0c0bf1aafd	-Added -nospupngocr (don't OCR bitmaps when generating spupng, faster)	2017-07-06 13:37:20 -07:00
Carlos Fernandez	62dab0dde9	Merge branch 'pr/n746_Abhinav95'	2017-07-06 12:59:26 -07:00
=	31a2d46996	Forcing -noru to cause deduplication in ISDB	2017-07-07 01:22:11 +05:30
Carlos Fernandez	710a205f99	Add support for file split on keyframe (-segmentonkeyonly). Segmenting now doesn't destroy the whole encoding context, just closes and reopens the output file. Correct a wrong function prototype for process_hex(). OCR: Attempt to correctly deal with TessBaseAPIRecognize returning an error. Changed output for parse PMT to CCX_DMT_PMT instead of CCX_DMT_VERBOSE.	2017-07-06 11:57:17 -07:00
Diptanshu8	69a956f3c2	removing api.so	2017-07-04 09:48:44 +00:00
Diptanshu8	7839403266	adding .so to .gitignore	2017-07-04 09:07:14 +00:00
Diptanshu8	6e50104da4	Cyclic rotation and python script argv passing solved	2017-06-28 21:35:32 +00:00
Diptanshu8	edb2431cf9	Cyclic rotation patch	2017-06-28 19:07:44 +00:00

1852 Commits