ccextractor/api/python_srt_generator.py

153 lines
4.8 KiB
Python

Python bindings with extraction of CE608 grid and writing to a SRT output. (#768) * added python_extract to encoders_srt and the captions are being extracted in needed format. Search for an alternative to asprintf * Checking if the alternative to asprintf generate proper srts * CC captions accessible via python script * Removing python caption code from __wrap_write function * removing old cc_to_python functions * Removing python_subs structure and all the changes done for that struct * Removing filename functions from ccextractor.* * Renaming make_message to time_wrapper * Applying to python_extract codebase: SSA format * Added python_extract_time_based and done validation for ssa * pplying python_extract_time_based: Done validation for srt and webvtt * led attempt for SAMI support of python_extract. Code is commented * Appluing python_extract_time_based: validate support for SMPTETT * Added python_extract_transcript and made changes for time printing. * added show_extracted_captions_wtih_timings function * Added show_extracted_captions_with_timings to python script for testing purpose. * refactored extractors to api directory. commented out show captions in main() * build and build library working for the extractors. * made caption generator work with a 0.1 time sleep. Start refactoring * added asprintf for windows. * file being written in the running directory * Auto -deletion of python temporary file * Python captions printing status set to proper. * termination of tail successful * Writing successful for the sample * Generating unalternating output * adding api_support.py * Adding bld_flags in build_api * Added to build_library * Auto deletion of temporary file on SIGINT * Discussing Seg fault with Izaron * working for python and linux with samples. testing -out=pythonapi with stream * Done adding bitmap support * added -out=pythonapi support for bitmap * Setting the messages_target to 0 for output = pythonapi * Added wrapper for setting -out=pythonapi. 
Checking if -stdout value can be used in python. * adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future. * added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608 * Removed overlap of -out=pythonapi by adding -pythonapi and signal_python_api global variable. * added support for seperate c608 grid catching. Need to test the output via python. * added support for seperate printing of text font and color in CE608. Need to make sure that the function is inbuilt. * ADDED ce608 GRID SUPPORT FROM PYTHON need to discuss whether to keep the print_cc_grid function specific to the module or make it user accessible. Mostly it would be better to make it user accessible. * made changes in the call_from_python_api function such that only api_options is needed to be passed. An if statement before the call to g608_extractor has also been added. Waiting for Carlos to comment on the output generated till this stage. * added a signal_python_api check before calling every write function. Thus basic writing output can be avoided. * Commented all calls to python_extract_time_based. making changes to python_extract_g608 to be called only from the point when a g608 caption is detected. * Added pass_cc_buffer_to_python in encoders_common.c temporarily redefined get_*_encoded from static to normal included the above functions in encoders_common.h * Added if-else statement for switch in encode_sub function. This is done mainly for making sure no output is generated in the api call. * Added ccx_encoders_python.c Defined pass_cc_buffer_to_python in ccx_encoders_python.c added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call * Removed __wrap_write from the entire code base. 
It's declaration and definition are only present in CCExtractor.* * Commented out the /dev/null part in ccx_encoders_common.c. Proceeding further on checking for file generation. * Added output_filename in array global variable and is generated in init_write function. included ccextractor.h in output.c to access global variable signal_python_api for avoiding output generation in init_write and invalid free in dinit_write. * Modified the definition of init_write function for accessing signal_python_api. * Deleted the commented part of /dev/null in ccx_encoders_common.c. * Added target_message=0 in -pythonapi param parsing in param.c to avoid the API from printing to STDOUT. Deleted the commented part of -out=pythonapi. Thinking of adding a different param for silencing the output when the call is made from python api. * Removed __wrap_write from ccextractor.c and ccextractor.h. * Added ccx_to_python_g608 and modified api_support.py file. added documentation in ccextractor.c. * added the generate srt script. However, some random characters are coming in first line. Need to talk about this. * Added SRT generator for python. Using string to remove the garbage value. Add code for srt counter and also the start_time and end_time conversion. * removed the trash characters and added code to print the timings. However, the last blank frame also results in a print. Need to take care of this. * rectified the mistake of writing only timings and not captions. now next step is to just make the timings print properly * some minor changes before diving into extracting srt_counter from the made codebase * Added extraction of srt_counter in python_extract via fflush srt_counter-value. Need to modify the processing in python. * Added the entire method to extract captions and generate srt files. Next, step would be a to define a concise function for writing the srt * Processing into a srt working properly. Next step is to add the information of font into the caption text. 
* the data is getting generated for proper SRT counters. * A turning point to the appraoch. Added END OF FRAME line for printing the data for every particular srt_counter. Proceeding further with the generation of srt by data manipulation. * some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Added fflush and cleaned up the python code of srt generation * Added <i> tag for italics. Proceeding further with other types. * Added the code to check for underline. However, need to check how CCExtractor generates srt when both italics and underline are present. For now a new line is added if both are present. * Shifting for making changes in th i/O work. * Stable ouput for samples with italics is being generated. * Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function. * build script for linux is working correctly. Build_library is showing error of invalid def of set_pythonapi. Moreover, extractor has some memory seg fault. * Added mod to set a MACRO as my_python_api to set the callback function. Till now all calls to the reporter are commented. Working on getting the reporter to print the lines. * Changes have been implemented to bring reporter in working state. For now a constant string is passed from extractor. Need to make the proper parsing possible. * Changed the code in extractor such that entire grid is returned to the callback function. Need to provide this grid to the write function and also cleanup the codebase. * Writing the outputted srt in a file called "temp.srt". Need to modify init_write to push filename that is to be created in python using callback. * Added code to get start and end time simultaneously. 
entire SRT is getting generated. * removed ccx_python_encoders.c * Compiling and executing on Windows * Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c. Also, deleted the static definition of get_font_encoded from ccx_encoders_webvtt.c * added a write statement in write_cc_bitmap_as_srt * Rectified transfer of get_line_encoded, get_color_encoded and get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 15:54:35 +00:00
import ccextractor as cc
import re
"""
# Handling underline
buff = ""
underline_flag = 0
for i, font_type in enumerate(font_line):
    if font_type == 'U' and not underline_flag:
        buff = buff + '<u> '
        underline_flag = 1
        underline = 1
    elif font_type == "R" and underline_flag:
        buff = buff + '</u>'
        underline_flag = 0
        continue
    buff += letter[i]
# Add a newline after buff has seen underline.
# TODO: cross-check with CCExtractor's own output to see how it handles this.
if underline:
    buff += "\n"
else:
    buff = ""
"""
Cleaning up the codebase and additional changes in Python SRT generator. (#771) * Removed all extractors except the grid extractor. Removed the call to transcript extractor in ccx_encoders_transcript.c * Removed unnecessary array appening statements in python_grid_extractor. WIP: switch in extractor. * Added switch in g608 grid extractor. * Deleted comments from wrappers. * Refactored code in ccextractor.c and .h files. Removed all the commented part. Made proper changes according to the coding conventions. * Removed calls to extractor from all the encoders. The only call made to extractor is from ccx_encoders_python.c. * Removed a comment from wrapper.c. In init_write function of output.c added a call to free the output string returned by asprintf in case of sending filename to callback function. * Added calls to free the char* which is malloced by asprintf in extractor.c WIP: Free the global variable elements. * Sample testing correctly for italics tag. Also added a hack to print only 32 characters when unicode fails. WIP: Font tag. * Added support for handling font and italics in Python SRT generator. * modified the font generator. Also, added count method for checking blank strings in python_srt_generator. * Added free statements for avoiding memory leaks. * added return code for failure of asprintf calls. * Removing unnecessary code from api_testing.py * Made modifications to Makefile and build script. * Added recursive_tester.py Autoconf builds successfully. * BUG: Made change to get_line_encoded to encode the last \0 character in a line. Otherwise the EOL characted is absent causing garbage value to be present in SRT. * Exporting the encoding of the captions from CCExtractor to Python so that the python SRT generator can generate proper SRT files. * Modified the include statement in extractor.h
2017-08-25 18:03:00 +00:00
encodings_map = {
    '0': 'unicode',
    '1': 'latin1',
    '2': 'utf-8',
    '3': 'ascii',
}
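`encodings_map` translates the encoding code that CCExtractor exports to Python into an encoding name. A sketch of decoding one raw caption row with it, assuming rows arrive as `bytes`; `decode_row` and the 'unicode' to 'utf-16-le' override are assumptions for illustration only, since 'unicode' is not a codec name Python's `codecs` module recognises.

```python
# Copied from encodings_map above so this sketch runs on its own.
encodings_map = {'0': 'unicode', '1': 'latin1', '2': 'utf-8', '3': 'ascii'}

# Assumption: 'unicode' is not a Python codec name, so it is mapped to
# UTF-16 little-endian here purely for illustration.
CODEC_OVERRIDES = {'unicode': 'utf-16-le'}


def decode_row(raw: bytes, enc_code: str) -> str:
    """Decode one caption row using the encoding code sent by CCExtractor.

    Hypothetical helper, not part of the ccextractor module.
    """
    name = encodings_map[enc_code]
    codec = CODEC_OVERRIDES.get(name, name)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    text = raw.decode(codec, errors="replace")
    # Drop trailing NUL terminators so they never reach the SRT file
    return text.rstrip("\x00")


print(decode_row(b"caf\xe9", "1"))  # café
```

Stripping trailing NULs guards against the same class of problem the history above describes, where a missing end-of-line character let garbage into the SRT.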
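# Opening and closing tags per colour code. The codes follow CCExtractor's
# CEA-608 colour order: 0 white, 1 green, 2 blue, 3 cyan, 4 red, 5 yellow,
# 6 magenta, 7 user-defined, 8 black, 9 transparent.
color_text_start = {
    "0": "",
    "1": "<font color=\"#00ff00\">",
    "2": "<font color=\"#0000ff\">",
    "3": "<font color=\"#00ffff\">",
    "4": "<font color=\"#ff0000\">",
    "5": "<font color=\"#ffff00\">",
    "6": "<font color=\"#ff00ff\">",
    "7": "<font color=\"",  # user-defined: colour value and closing quote are not filled in here
    "8": "",
    "9": "",
}
color_text_end = {
    "0": "",
    "1": "</font>",
    "2": "</font>",
    "3": "</font>",
    "4": "</font>",
    "5": "</font>",
    "6": "</font>",
    "7": "</font>",
    "8": "",
    "9": "",
}
# Codes that get no <font> tag.
no_color_tag = ['0', '8', '9']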
def comparing_text_font_grids(text, font, color):
    original_text = text
    temp_color = []
    # Pass 1: wrap each run of same-coloured cells in <font> tags.
    for letter, color_line in zip(original_text, color):
        has_color = 0  # set once any opening tag is written for this row
        # Skip rows that are entirely blank (a CEA-608 row is 32 cells wide).
        if letter.count(" ") < 32:
            prev = color_line[0]
            buff = color_text_start[str(prev)]
            # A row that starts in a coloured run already has an open tag.
            color_flag = 0 if str(prev) in no_color_tag else 1
            has_color = color_flag
            for i, color_type in enumerate(color_line):
                if prev != color_type:
                    # Close the tag of the run that just ended ...
                    if color_flag:
                        buff += color_text_end[str(prev)]
                        color_flag = 0
                    # ... and open one if the new run is coloured.
                    if str(color_type) not in no_color_tag:
                        buff += color_text_start[str(color_type)]
                        color_flag = 1
                        has_color = 1
                buff += letter[i]
                prev = color_type
            if color_flag:
                buff += color_text_end[str(prev)]
        if has_color:
            temp_color.append((buff, 1))
        else:
            temp_color.append((letter, 0))
    # Pass 2: font attributes (italics, underline).
    temp_font_italics = []
    for letter, font_line in zip(original_text, font):
        if letter.count(" ") < 32:
buff=""
underline,italics = 0,0
#Handling italics
italics_flag = 0
for i,font_type in enumerate(font_line):
if font_type == 'I' and not italics_flag:
italics=1
Python bindings with extraction of CE608 grid and writing to a SRT output. (#768) * added python_extract to encoders_srt and the captions are being extracted in needed format. Search for an alternative to asprintf * Checking if the alternative to asprintf generate proper srts * CC captions accessible via python script * Removing python caption code from __wrap_write function * removing old cc_to_python functions * Removing python_subs structure and all the changes done for that struct * Removing filename functions from ccextractor.* * Renaming make_message to time_wrapper * Applying to python_extract codebase: SSA format * Added python_extract_time_based and done validation for ssa * pplying python_extract_time_based: Done validation for srt and webvtt * led attempt for SAMI support of python_extract. Code is commented * Appluing python_extract_time_based: validate support for SMPTETT * Added python_extract_transcript and made changes for time printing. * added show_extracted_captions_wtih_timings function * Added show_extracted_captions_with_timings to python script for testing purpose. * refactored extractors to api directory. commented out show captions in main() * build and build library working for the extractors. * made caption generator work with a 0.1 time sleep. Start refactoring * added asprintf for windows. * file being written in the running directory * Auto -deletion of python temporary file * Python captions printing status set to proper. * termination of tail successful * Writing successful for the sample * Generating unalternating output * adding api_support.py * Adding bld_flags in build_api * Added to build_library * Auto deletion of temporary file on SIGINT * Discussing Seg fault with Izaron * working for python and linux with samples. testing -out=pythonapi with stream * Done adding bitmap support * added -out=pythonapi support for bitmap * Setting the messages_target to 0 for output = pythonapi * Added wrapper for setting -out=pythonapi. 
Checking if -stdout value can be used in python. * adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future. * added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608 * Removed overlap of -out=pythonapi by adding -pythonapi and signal_python_api global variable. * added support for seperate c608 grid catching. Need to test the output via python. * added support for seperate printing of text font and color in CE608. Need to make sure that the function is inbuilt. * ADDED ce608 GRID SUPPORT FROM PYTHON need to discuss whether to keep the print_cc_grid function specific to the module or make it user accessible. Mostly it would be better to make it user accessible. * made changes in the call_from_python_api function such that only api_options is needed to be passed. An if statement before the call to g608_extractor has also been added. Waiting for Carlos to comment on the output generated till this stage. * added a signal_python_api check before calling every write function. Thus basic writing output can be avoided. * Commented all calls to python_extract_time_based. making changes to python_extract_g608 to be called only from the point when a g608 caption is detected. * Added pass_cc_buffer_to_python in encoders_common.c temporarily redefined get_*_encoded from static to normal included the above functions in encoders_common.h * Added if-else statement for switch in encode_sub function. This is done mainly for making sure no output is generated in the api call. * Added ccx_encoders_python.c Defined pass_cc_buffer_to_python in ccx_encoders_python.c added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call * Removed __wrap_write from the entire code base. 
It's declaration and definition are only present in CCExtractor.* * Commented out the /dev/null part in ccx_encoders_common.c. Proceeding further on checking for file generation. * Added output_filename in array global variable and is generated in init_write function. included ccextractor.h in output.c to access global variable signal_python_api for avoiding output generation in init_write and invalid free in dinit_write. * Modified the definition of init_write function for accessing signal_python_api. * Deleted the commented part of /dev/null in ccx_encoders_common.c. * Added target_message=0 in -pythonapi param parsing in param.c to avoid the API from printing to STDOUT. Deleted the commented part of -out=pythonapi. Thinking of adding a different param for silencing the output when the call is made from python api. * Removed __wrap_write from ccextractor.c and ccextractor.h. * Added ccx_to_python_g608 and modified api_support.py file. added documentation in ccextractor.c. * added the generate srt script. However, some random characters are coming in first line. Need to talk about this. * Added SRT generator for python. Using string to remove the garbage value. Add code for srt counter and also the start_time and end_time conversion. * removed the trash characters and added code to print the timings. However, the last blank frame also results in a print. Need to take care of this. * rectified the mistake of writing only timings and not captions. now next step is to just make the timings print properly * some minor changes before diving into extracting srt_counter from the made codebase * Added extraction of srt_counter in python_extract via fflush srt_counter-value. Need to modify the processing in python. * Added the entire method to extract captions and generate srt files. Next, step would be a to define a concise function for writing the srt * Processing into a srt working properly. Next step is to add the information of font into the caption text. 
* the data is getting generated for proper SRT counters. * A turning point to the appraoch. Added END OF FRAME line for printing the data for every particular srt_counter. Proceeding further with the generation of srt by data manipulation. * some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Added fflush and cleaned up the python code of srt generation * Added <i> tag for italics. Proceeding further with other types. * Added the code to check for underline. However, need to check how CCExtractor generates srt when both italics and underline are present. For now a new line is added if both are present. * Shifting for making changes in th i/O work. * Stable ouput for samples with italics is being generated. * Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function. * build script for linux is working correctly. Build_library is showing error of invalid def of set_pythonapi. Moreover, extractor has some memory seg fault. * Added mod to set a MACRO as my_python_api to set the callback function. Till now all calls to the reporter are commented. Working on getting the reporter to print the lines. * Changes have been implemented to bring reporter in working state. For now a constant string is passed from extractor. Need to make the proper parsing possible. * Changed the code in extractor such that entire grid is returned to the callback function. Need to provide this grid to the write function and also cleanup the codebase. * Writing the outputted srt in a file called "temp.srt". Need to modify init_write to push filename that is to be created in python using callback. * Added code to get start and end time simultaneously. 
entire SRT is getting generated. * removed ccx_python_encoders.c * Compiling and executing on Windows * Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c. Also, deleted the static definition of get_font_encoded from ccx_encoders_webvtt.c * added a write statement in write_cc_bitmap_as_srt * Rectified transfer of get_line_encoded, get_color_encoded and get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 15:54:35 +00:00
buff = buff + '<i>'
italics_flag = 1
elif font_type =="R" and italics_flag:
Cleaning up the codebase and additional changes in Python SRT generator. (#771) * Removed all extractors except the grid extractor. Removed the call to transcript extractor in ccx_encoders_transcript.c * Removed unnecessary array appening statements in python_grid_extractor. WIP: switch in extractor. * Added switch in g608 grid extractor. * Deleted comments from wrappers. * Refactored code in ccextractor.c and .h files. Removed all the commented part. Made proper changes according to the coding conventions. * Removed calls to extractor from all the encoders. The only call made to extractor is from ccx_encoders_python.c. * Removed a comment from wrapper.c. In init_write function of output.c added a call to free the output string returned by asprintf in case of sending filename to callback function. * Added calls to free the char* which is malloced by asprintf in extractor.c WIP: Free the global variable elements. * Sample testing correctly for italics tag. Also added a hack to print only 32 characters when unicode fails. WIP: Font tag. * Added support for handling font and italics in Python SRT generator. * modified the font generator. Also, added count method for checking blank strings in python_srt_generator. * Added free statements for avoiding memory leaks. * added return code for failure of asprintf calls. * Removing unnecessary code from api_testing.py * Made modifications to Makefile and build script. * Added recursive_tester.py Autoconf builds successfully. * BUG: Made change to get_line_encoded to encode the last \0 character in a line. Otherwise the EOL characted is absent causing garbage value to be present in SRT. * Exporting the encoding of the captions from CCExtractor to Python so that the python SRT generator can generate proper SRT files. * Modified the include statement in extractor.h
2017-08-25 18:03:00 +00:00
italics=1
Python bindings with extraction of CE608 grid and writing to a SRT output. (#768) * added python_extract to encoders_srt and the captions are being extracted in needed format. Search for an alternative to asprintf * Checking if the alternative to asprintf generate proper srts * CC captions accessible via python script * Removing python caption code from __wrap_write function * removing old cc_to_python functions * Removing python_subs structure and all the changes done for that struct * Removing filename functions from ccextractor.* * Renaming make_message to time_wrapper * Applying to python_extract codebase: SSA format * Added python_extract_time_based and done validation for ssa * pplying python_extract_time_based: Done validation for srt and webvtt * led attempt for SAMI support of python_extract. Code is commented * Appluing python_extract_time_based: validate support for SMPTETT * Added python_extract_transcript and made changes for time printing. * added show_extracted_captions_wtih_timings function * Added show_extracted_captions_with_timings to python script for testing purpose. * refactored extractors to api directory. commented out show captions in main() * build and build library working for the extractors. * made caption generator work with a 0.1 time sleep. Start refactoring * added asprintf for windows. * file being written in the running directory * Auto -deletion of python temporary file * Python captions printing status set to proper. * termination of tail successful * Writing successful for the sample * Generating unalternating output * adding api_support.py * Adding bld_flags in build_api * Added to build_library * Auto deletion of temporary file on SIGINT * Discussing Seg fault with Izaron * working for python and linux with samples. testing -out=pythonapi with stream * Done adding bitmap support * added -out=pythonapi support for bitmap * Setting the messages_target to 0 for output = pythonapi * Added wrapper for setting -out=pythonapi. 
Checking if -stdout value can be used in python. * adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future. * added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608 * Removed overlap of -out=pythonapi by adding -pythonapi and signal_python_api global variable. * added support for seperate c608 grid catching. Need to test the output via python. * added support for seperate printing of text font and color in CE608. Need to make sure that the function is inbuilt. * ADDED ce608 GRID SUPPORT FROM PYTHON need to discuss whether to keep the print_cc_grid function specific to the module or make it user accessible. Mostly it would be better to make it user accessible. * made changes in the call_from_python_api function such that only api_options is needed to be passed. An if statement before the call to g608_extractor has also been added. Waiting for Carlos to comment on the output generated till this stage. * added a signal_python_api check before calling every write function. Thus basic writing output can be avoided. * Commented all calls to python_extract_time_based. making changes to python_extract_g608 to be called only from the point when a g608 caption is detected. * Added pass_cc_buffer_to_python in encoders_common.c temporarily redefined get_*_encoded from static to normal included the above functions in encoders_common.h * Added if-else statement for switch in encode_sub function. This is done mainly for making sure no output is generated in the api call. * Added ccx_encoders_python.c Defined pass_cc_buffer_to_python in ccx_encoders_python.c added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call * Removed __wrap_write from the entire code base. 
It's declaration and definition are only present in CCExtractor.* * Commented out the /dev/null part in ccx_encoders_common.c. Proceeding further on checking for file generation. * Added output_filename in array global variable and is generated in init_write function. included ccextractor.h in output.c to access global variable signal_python_api for avoiding output generation in init_write and invalid free in dinit_write. * Modified the definition of init_write function for accessing signal_python_api. * Deleted the commented part of /dev/null in ccx_encoders_common.c. * Added target_message=0 in -pythonapi param parsing in param.c to avoid the API from printing to STDOUT. Deleted the commented part of -out=pythonapi. Thinking of adding a different param for silencing the output when the call is made from python api. * Removed __wrap_write from ccextractor.c and ccextractor.h. * Added ccx_to_python_g608 and modified api_support.py file. added documentation in ccextractor.c. * added the generate srt script. However, some random characters are coming in first line. Need to talk about this. * Added SRT generator for python. Using string to remove the garbage value. Add code for srt counter and also the start_time and end_time conversion. * removed the trash characters and added code to print the timings. However, the last blank frame also results in a print. Need to take care of this. * rectified the mistake of writing only timings and not captions. now next step is to just make the timings print properly * some minor changes before diving into extracting srt_counter from the made codebase * Added extraction of srt_counter in python_extract via fflush srt_counter-value. Need to modify the processing in python. * Added the entire method to extract captions and generate srt files. Next, step would be a to define a concise function for writing the srt * Processing into a srt working properly. Next step is to add the information of font into the caption text. 
* the data is getting generated for proper SRT counters. * A turning point to the appraoch. Added END OF FRAME line for printing the data for every particular srt_counter. Proceeding further with the generation of srt by data manipulation. * some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing. * Added fflush and cleaned up the python code of srt generation * Added <i> tag for italics. Proceeding further with other types. * Added the code to check for underline. However, need to check how CCExtractor generates srt when both italics and underline are present. For now a new line is added if both are present. * Shifting for making changes in th i/O work. * Stable ouput for samples with italics is being generated. * Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function. * build script for linux is working correctly. Build_library is showing error of invalid def of set_pythonapi. Moreover, extractor has some memory seg fault. * Added mod to set a MACRO as my_python_api to set the callback function. Till now all calls to the reporter are commented. Working on getting the reporter to print the lines. * Changes have been implemented to bring reporter in working state. For now a constant string is passed from extractor. Need to make the proper parsing possible. * Changed the code in extractor such that entire grid is returned to the callback function. Need to provide this grid to the write function and also cleanup the codebase. * Writing the outputted srt in a file called "temp.srt". Need to modify init_write to push filename that is to be created in python using callback. * Added code to get start and end time simultaneously. 
entire SRT is getting generated. * removed ccx_python_encoders.c * Compiling and executing on Windows * Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c. Also, deleted the static definition of get_font_encoded from ccx_encoders_webvtt.c * added a write statement in write_cc_bitmap_as_srt * Rectified transfer of get_line_encoded, get_color_encoded and get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 15:54:35 +00:00
buff = buff + '</i>'
italics_flag = 0
buff += letter[i]
Cleaning up the codebase and additional changes in Python SRT generator. (#771) * Removed all extractors except the grid extractor. Removed the call to transcript extractor in ccx_encoders_transcript.c * Removed unnecessary array appening statements in python_grid_extractor. WIP: switch in extractor. * Added switch in g608 grid extractor. * Deleted comments from wrappers. * Refactored code in ccextractor.c and .h files. Removed all the commented part. Made proper changes according to the coding conventions. * Removed calls to extractor from all the encoders. The only call made to extractor is from ccx_encoders_python.c. * Removed a comment from wrapper.c. In init_write function of output.c added a call to free the output string returned by asprintf in case of sending filename to callback function. * Added calls to free the char* which is malloced by asprintf in extractor.c WIP: Free the global variable elements. * Sample testing correctly for italics tag. Also added a hack to print only 32 characters when unicode fails. WIP: Font tag. * Added support for handling font and italics in Python SRT generator. * modified the font generator. Also, added count method for checking blank strings in python_srt_generator. * Added free statements for avoiding memory leaks. * added return code for failure of asprintf calls. * Removing unnecessary code from api_testing.py * Made modifications to Makefile and build script. * Added recursive_tester.py Autoconf builds successfully. * BUG: Made change to get_line_encoded to encode the last \0 character in a line. Otherwise the EOL characted is absent causing garbage value to be present in SRT. * Exporting the encoding of the captions from CCExtractor to Python so that the python SRT generator can generate proper SRT files. * Modified the include statement in extractor.h
2017-08-25 18:03:00 +00:00
if italics_flag:
buff+='</i>'
if italics:
temp_font_italics.append((buff,1))
else:
temp_font_italics.append((letter,0))
else:
temp_font_italics.append((letter,0))
final = []
for i,j in zip(temp_color,temp_font_italics):
if i[1] and not j[1]:
final.append(i[0])
elif j[1] and not i[1]:
final.append(j[0])
else:
if not i[1]:
final.append(i[0])
else:
print "error"
return (final,font,color)
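The italics pass and the row merge in `comparing_text_font_grids` can be sketched as two standalone functions. This is a minimal Python 3 sketch, not CCExtractor code: the names `wrap_italic_runs` and `merge_tagged_rows` are hypothetical, and it assumes each text row and its font row have the same length, with `'I'` marking italic cells and `'R'` regular ones, as in the loop above. Like the original, a row tagged by both the colour and italics passes is reported and dropped rather than combined.

```python
# Hypothetical helpers (not part of CCExtractor) restating the italics pass
# and the row merge of comparing_text_font_grids in Python 3.


def wrap_italic_runs(row, font_row):
    """Wrap each run of italic ('I') cells in <i>...</i>.

    Returns (tagged_text, 1) if the row had any italic cell, otherwise
    (row, 0), mirroring the (text, flag) tuples of the original.
    Assumes row and font_row have the same length.
    """
    out = []
    in_italics = False
    saw_italics = False
    for ch, font_type in zip(row, font_row):
        if font_type == 'I' and not in_italics:
            out.append('<i>')  # an italic run starts here
            in_italics = saw_italics = True
        elif font_type == 'R' and in_italics:
            out.append('</i>')  # regular font again: close the run
            in_italics = False
        out.append(ch)
    if in_italics:
        out.append('</i>')  # close a run that reaches the end of the row
    if saw_italics:
        return ''.join(out), 1
    return row, 0


def merge_tagged_rows(color_rows, italic_rows):
    """Keep, for each row, whichever pass added markup.

    A row tagged by both passes is reported and dropped, as in the original.
    """
    final = []
    for (c_text, c_flag), (i_text, i_flag) in zip(color_rows, italic_rows):
        if c_flag and not i_flag:
            final.append(c_text)
        elif i_flag and not c_flag:
            final.append(i_text)
        elif not c_flag:
            final.append(c_text)  # neither pass changed the row
        else:
            print("error: row has both colour and italics markup")
    return final


if __name__ == "__main__":
    print(wrap_italic_runs("Hello world", "IIIIIRRRRRR"))
    # ('<i>Hello</i> world', 1)
```

Building the row as a list and joining once avoids repeated string concatenation; the behaviour otherwise follows the `buff`-based loop.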
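`generate_output_srt`, defined next, first maps the encoding code reported by CCExtractor to a codec and decodes every text row before merging the grids. The sketch below restates that step in Python 3, where rows arrive as `bytes` and `bytes.decode` replaces Python 2's `unicode(item, codec)`. `EXAMPLE_ENCODINGS` and `decode_rows` are hypothetical; the codes and codec names in the table are placeholders, not the contents of the module's real `encodings_map`. For code `'0'` it assumes ASCII, which is Python 2's usual default for a bare `unicode(item)`.

```python
# Hypothetical restatement of the decoding step in generate_output_srt.
# EXAMPLE_ENCODINGS is a placeholder table, NOT CCExtractor's encodings_map.
EXAMPLE_ENCODINGS = {'0': None, '1': 'utf-8', '2': 'latin-1'}


def decode_rows(rows, encoding, encodings_map=EXAMPLE_ENCODINGS):
    """Decode byte rows with the codec registered for `encoding`.

    Returns None for an unknown code (the original prints an error and
    returns). Code '0' uses ASCII, assumed here to match Python 2's default
    for unicode(item).
    """
    if encoding not in encodings_map:
        print("encoding error in python")
        return None
    codec = encodings_map[encoding] if encoding != '0' else 'ascii'
    return [row.decode(codec) for row in rows]


if __name__ == "__main__":
    print(decode_rows([b'caf\xc3\xa9'], '1'))  # ['café']
```

Passing the table as a parameter keeps the sketch testable without depending on a module-level global.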
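def generate_output_srt(filename, d, encoding):
    # Resolve CCExtractor's encoding code to a codec name; code '0' means
    # "no explicit codec" and falls back to unicode()'s default
    if encoding in encodings_map:
        if encoding != '0':
            encoding_format = encodings_map[encoding]
        else:
            encoding_format = ""
    else:
        print "encoding error in python"
        return
    # Decode each row of the text grid into a unicode string
    if encoding_format:
        d['text'] = [unicode(item, encoding_format) for item in d['text']]
    else:
        d['text'] = [unicode(item) for item in d['text']]
    # Merge the font and colour grids into the text rows as inline tags
    d['text'], d['font'], d['color'] = comparing_text_font_grids(d['text'], d['font'], d['color'])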
    for item in d['text']:
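        # A CEA-608 row is 32 columns wide; a row made of 32 spaces is blank and is skipped.
        if item.count(" ") < 32:
            with open(filename, 'ab+') as fh:
                if encoding_format:
                    fh.write(item.encode(encoding_format))
                else:
                    # Python 2: str is already a byte string, so it can be written as-is.
                    fh.write(str(item))
                fh.write(b"\n")
                fh.flush()
    # A blank line terminates the SRT subtitle block.
    with open(filename, 'ab+') as fh:
        fh.write(b"\n")
        fh.flush()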