ccextractor

mirror of https://github.com/CCExtractor/ccextractor.git synced 2025-01-04 09:13:46 +00:00

Author	SHA1	Message	Date
Diptanshu Jamgade	21eaa3de04	Python bindings with extraction of CE608 grid and writing to a SRT output. (#768) * added python_extract to encoders_srt and the captions are being extracted in needed format. Search for an alternative to asprintf * Checking if the alternative to asprintf generate proper srts * CC captions accessible via python script * Removing python caption code from __wrap_write function * removing old cc_to_python functions * Removing python_subs structure and all the changes done for that struct * Removing filename functions from ccextractor.* * Renaming make_message to time_wrapper * Applying to python_extract codebase: SSA format * Added python_extract_time_based and done validation for ssa * Applying python_extract_time_based: Done validation for srt and webvtt * led attempt for SAMI support of python_extract. Code is commented * Applying python_extract_time_based: validate support for SMPTETT * Added python_extract_transcript and made changes for time printing. * added show_extracted_captions_wtih_timings function * Added show_extracted_captions_with_timings to python script for testing purpose. * refactored extractors to api directory. commented out show captions in main() * build and build library working for the extractors. * made caption generator work with a 0.1 time sleep. Start refactoring * added asprintf for windows. * file being written in the running directory * Auto-deletion of python temporary file * Python captions printing status set to proper. * termination of tail successful * Writing successful for the sample * Generating unalternating output * adding api_support.py * Adding bld_flags in build_api * Added to build_library * Auto deletion of temporary file on SIGINT * Discussing Seg fault with Izaron * working for python and linux with samples. 
testing -out=pythonapi with stream * Done adding bitmap support * added -out=pythonapi support for bitmap * Setting the messages_target to 0 for output = pythonapi * Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python. * adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future. * added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608 * Removed overlap of -out=pythonapi by adding -pythonapi and signal_python_api global variable. * added support for separate c608 grid catching. Need to test the output via python. * added support for separate printing of text font and color in CE608. Need to make sure that the function is inbuilt. * ADDED ce608 GRID SUPPORT FROM PYTHON need to discuss whether to keep the print_cc_grid function specific to the module or make it user accessible. Mostly it would be better to make it user accessible. * made changes in the call_from_python_api function such that only api_options is needed to be passed. An if statement before the call to g608_extractor has also been added. Waiting for Carlos to comment on the output generated till this stage. * added a signal_python_api check before calling every write function. Thus basic writing output can be avoided. * Commented all calls to python_extract_time_based. making changes to python_extract_g608 to be called only from the point when a g608 caption is detected. * Added pass_cc_buffer_to_python in encoders_common.c temporarily redefined get__encoded from static to normal included the above functions in encoders_common.h Added if-else statement for switch in encode_sub function. This is done mainly for making sure no output is generated in the api call. 
* Added ccx_encoders_python.c Defined pass_cc_buffer_to_python in ccx_encoders_python.c added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call * Removed __wrap_write from the entire code base. It's declaration and definition are only present in CCExtractor.* * Commented out the /dev/null part in ccx_encoders_common.c. Proceeding further on checking for file generation. * Added output_filename in array global variable and is generated in init_write function. included ccextractor.h in output.c to access global variable signal_python_api for avoiding output generation in init_write and invalid free in dinit_write. * Modified the definition of init_write function for accessing signal_python_api. * Deleted the commented part of /dev/null in ccx_encoders_common.c. * Added target_message=0 in -pythonapi param parsing in param.c to avoid the API from printing to STDOUT. Deleted the commented part of -out=pythonapi. Thinking of adding a different param for silencing the output when the call is made from python api. * Removed __wrap_write from ccextractor.c and ccextractor.h. * Added ccx_to_python_g608 and modified api_support.py file. added documentation in ccextractor.c. * added the generate srt script. However, some random characters are coming in first line. Need to talk about this. * Added SRT generator for python. Using string to remove the garbage value. Add code for srt counter and also the start_time and end_time conversion. * removed the trash characters and added code to print the timings. However, the last blank frame also results in a print. Need to take care of this. * rectified the mistake of writing only timings and not captions. now next step is to just make the timings print properly * some minor changes before diving into extracting srt_counter from the made codebase * Added extraction of srt_counter in python_extract via fflush srt_counter-value. Need to modify the processing in python. 
* Added the entire method to extract captions and generate srt files. Next, step would be to define a concise function for writing the srt * Processing into a srt working properly. Next step is to add the information of font into the caption text. * the data is getting generated for proper SRT counters. * A turning point to the approach. Added END OF FRAME line for printing the data for every particular srt_counter. Proceeding further with the generation of srt by data manipulation. * some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Added fflush and cleaned up the python code of srt generation * Added <i> tag for italics. Proceeding further with other types. * Added the code to check for underline. However, need to check how CCExtractor generates srt when both italics and underline are present. For now a new line is added if both are present. * Shifting for making changes in the I/O work. * Stable output for samples with italics is being generated. * Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function. * build script for linux is working correctly. Build_library is showing error of invalid def of set_pythonapi. Moreover, extractor has some memory seg fault. * Added mod to set a MACRO as my_python_api to set the callback function. Till now all calls to the reporter are commented. Working on getting the reporter to print the lines. * Changes have been implemented to bring reporter in working state. For now a constant string is passed from extractor. Need to make the proper parsing possible. * Changed the code in extractor such that entire grid is returned to the callback function. 
Need to provide this grid to the write function and also cleanup the codebase. * Writing the outputted srt in a file called "temp.srt". Need to modify init_write to push filename that is to be created in python using callback. * Added code to get start and end time simultaneously. entire SRT is getting generated. * removed ccx_python_encoders.c * Compiling and executing on Windows * Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c. Also, deleted the static definition of get_font_encoded from ccx_encoders_webvtt.c * added a write statement in write_cc_bitmap_as_srt * Rectified transfer of get_line_encoded, get_color_encoded and get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.	2017-08-20 08:54:35 -07:00
Carlos Fernandez	710a205f99	Add support for file split on keyframe (-segmentonkeyonly). Segmenting now doesn't destroy the whole encoding context, just closes and reopens the output file. Correct a wrong function prototype for process_hex(). OCR: Attempt to correctly deal with TessBaseAPIRecognize returning an error. Changed output for parse PMT to CCX_DMT_PMT instead of CCX_DMT_VERBOSE.	2017-07-06 11:57:17 -07:00
Evgeny	76cb7b91ee	Added matroska.c to filters and fixed _MSC_VER	2017-03-02 16:18:03 +03:00
Evgeny	071386d552	Fixed DLL requiring in non-full version	2017-01-26 12:27:39 +03:00
Evgeny Shulgin	2229a51b66	Added FFMPEG 3.0, which is compatible with XP (#610)	2017-01-07 11:52:13 -08:00
cfsmp3	57ef958250	Corrected header directories in non-full versions.	2017-01-01 22:12:35 +01:00
Evgeny	2942e84a6f	Solved the Windows dependency hell	2017-01-01 21:34:43 +03:00
Evgeny	fe9cd61d1d	Fixed bad tesseract library	2016-12-24 22:19:23 +03:00
Evgeny	08b2bcb88b	Added dependencies .dll-s and copy command	2016-12-24 18:26:06 +03:00
Evgeny	0befb3c5b1	Renamed tess version from 3.05 to 4.00	2016-12-24 10:59:52 +03:00
Evgeny	331a64e387	Added working tesseract 4.00	2016-12-23 18:01:12 +03:00
Evgeny	4c78e47404	Fixed mess in the filters	2016-12-22 18:37:02 +03:00
Evgeny	4b80441164	Renamed OCR to Full and copy ffmpeg DLLs to folder	2016-12-22 18:19:42 +03:00
cfsmp3	e2cc2f9fd7	ImageHasSafeExceptionHandlers>false	2016-12-22 09:21:24 +01:00
cfsmp3	b669733bd8	Added pre-build.bat to Release-OCR	2016-12-22 08:54:59 +01:00
Evgeny	802360b008	Ported HARDSUBX to Windows	2016-12-18 19:42:23 +03:00
Carlos Fernandez	7acb3c3874	Version bump (to 0.84). Rename target name of the Windows OCR binaries.	2016-12-16 10:41:02 -08:00
Carlos Fernandez	05da03a259	Changed dependency for OCR in release version - use non-debug version of tesseract	2016-12-15 10:24:34 -08:00
Carlos Fernandez	7aaa1e3edb	Corrected timing in iTunes. Added list of changes to CHANGES.TXT.	2016-12-13 17:39:05 -08:00
canihavesomecoffee	7b55f61396	Remove hardcoded references in project file, add relative ones instead	2016-12-10 08:38:58 +01:00
Carlos Fernandez	6dc941d4e6	Changed platform target to v120_xp, fixed some missing dirs.	2016-12-09 14:02:10 -08:00
AlexBratosin2001	ce15155956	Updated GPAC library to v0.6.2 (#500). Replaced GPAC.	2016-12-09 13:47:54 -08:00
Carlos Fernandez	d453d9327e	Minor changes IN README.md	2016-12-05 12:44:57 -08:00
canihavesomecoffee	814eaab300	Add utf8proc folder to the include directories. Regular debug & release have a missing folder.	2016-11-29 22:38:04 +01:00
Carlos Fernandez	6f2becc42e	Fixed OCR libraries dependencies for the release version in Windows.	2016-10-13 11:50:35 -07:00
Carlos Fernandez	17dd6696df	Initial libcurl integration work, linux only. Just groundwork, lots of dummy things yet.	2016-09-26 13:36:04 -07:00
Carlos Fernandez	4101fe3880	Fixes #425 - the 708 decoder needs access the encoder. Reference was missing for .bin.	2016-09-20 16:04:33 -07:00
Carlos Fernandez	b00f8e75f6	Added dvb_subtitle_decoder.c to the project	2016-08-22 16:17:17 -07:00
Carlos Fernandez	358b8ef579	Initial backport of Oleg Kisselef's WITH_SHARING options. Most likely it breaks stuff.	2016-08-17 17:40:11 -07:00
Carlos Fernandez	c4073d1813	leptonica/tesseract version upgrade in release build (VS)	2016-08-16 10:33:17 -07:00
Carlos Fernandez	676539cf8c	Updated Tesseract and leptonica versions, included the files in the repo because they're a royal pain to find and/or build.	2016-08-15 16:15:50 -07:00
canihavesomecoffee	9f4bff884f	Update build script for windows -	2016-07-05 18:01:47 +02:00
canihavesomecoffee	b8eec82f2a	Update file to copy necessary DLL to output folder. Updates the project file to copy the two DLLs after compiling, so that we can run from that directory.	2016-06-08 02:50:23 +02:00
canihavesomecoffee	04be7be06b	Add OCR build support. Adds OCR build support by creating two new build configs (one debug, one release) and some instructions about what VS expects on those configs.	2016-06-08 02:25:13 +02:00
canihavesomecoffee	05e451d41e	Rename ccextractor to ccextractorwin for compilation	2016-06-04 20:40:34 +02:00
Carlos Fernandez	0b2e12ce0c	Changed target to XP	2016-05-27 10:45:27 -07:00
canihavesomecoffee	a0787e740e	Add windows pre-build event. Updates the .h file that contains the build date & git commit hash (if available).	2016-05-22 08:13:53 +02:00
canihavesomecoffee	f694c95510	Add hashing library and update makefiles - Adds an open-source hashing lib for SHA2 (SHA-256, SHA-384, SHA-512) from http://www.aarongifford.com/computers/sha.html, with some small modifications to make it work under Windows too - Updates build commands to reflect this change	2016-05-22 03:49:31 +02:00
canihavesomecoffee	046fd4c435	Fix windows build. Fixes windows build by adding zvbi folder to includes.	2016-03-21 23:08:21 +01:00
Willem	d3862ba88b	Fix path -	2016-03-05 13:19:20 +01:00
Anshul Maheshwari	0023c6545b	make code windows compatible Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>	2016-02-17 21:13:09 +05:30
Anshul Maheshwari	d2d7a17f3b	strtok_r for windows Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>	2015-10-05 12:58:09 +05:30
Anshul Maheshwari	cc0ee507dd	remove some vs warning Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>	2015-09-30 13:37:27 +05:30
wforums	6e9a30b354	fix solution -	2015-09-24 00:08:19 +02:00
wforums	37091708b7	Merge remote-tracking branch 'CCExtractor/master' Conflicts: windows/ccextractor.vcxproj.filters	2015-09-24 00:05:46 +02:00
wforums	0885aae79c	Updating project files -	2015-09-24 00:03:54 +02:00
Anshul Maheshwari	ad5b917f3b	Compile code in vs2013 Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>	2015-09-15 15:04:59 +05:30
Anshul Maheshwari	57eb42c7bb	Compile code using Visual studio Signed-off-by: Anshul Maheshwari <er.anshul.maheshwari@gmail.com>	2015-08-18 12:19:33 +05:30
wforums	74ad11b44f	Release project build fix. Fixed release project for VS. New subfolders weren't included.	2015-06-04 00:34:08 +02:00
wforums	051a6f1f67	git ignore update. Updated gitignore with some more VS project files.	2015-06-04 00:31:29 +02:00

78 Commits