Python bindings with extraction of CE608 grid and writing to a SRT output. (#768)
* added python_extract to encoders_srt and the captions are being
extracted in needed format. Search for an alternative to asprintf
* Checking if the alternative to asprintf generate proper srts
* CC captions accessible via python script
* Removing python caption code from __wrap_write function
* removing old cc_to_python functions
* Removing python_subs structure and all the changes done for that struct
* Removing filename functions from ccextractor.*
* Renaming make_message to time_wrapper
* Applying to python_extract codebase: SSA format
* Added python_extract_time_based and done validation for ssa
* Applying python_extract_time_based: Done validation for srt and webvtt
* Failed attempt for SAMI support of python_extract. Code is commented
* Applying python_extract_time_based: validate support for SMPTETT
* Added python_extract_transcript and made changes for time printing.
* added show_extracted_captions_wtih_timings function
* Added show_extracted_captions_with_timings to python script for testing
purpose.
* refactored extractors to api directory. commented out show captions in main()
* build and build library working for the extractors.
* made caption generator work with a 0.1 time sleep. Start refactoring
* added asprintf for windows.
* file being written in the running directory
* Auto-deletion of python temporary file
* Python captions printing status set to proper.
* termination of tail successful
* Writing successful for the sample
* Generating unalternating output
* adding api_support.py
* Adding bld_flags in build_api
* Added to build_library
* Auto deletion of temporary file on SIGINT
* Discussing Seg fault with Izaron
* working for python and linux with samples. testing -out=pythonapi with stream
* Done adding bitmap support
* added -out=pythonapi support for bitmap
* Setting the messages_target to 0 for output = pythonapi
* Added wrapper for setting -out=pythonapi. Checking if -stdout value can be used in python.
* adding the cc_to_stdout=1 value for -out=pythonapi. Thus generation of output file has been avoided. May be needed to change in future.
* added extractor for g608 grid. removed sami extractor. need to work on overlap of -out=pythonapi and -out=g608
* Removed overlap of -out=pythonapi by adding -pythonapi and
signal_python_api global variable.
* added support for separate c608 grid catching. Need to test the output
via python.
* added support for separate printing of text font and color in CE608.
Need to make sure that the function is inbuilt.
* ADDED ce608 GRID SUPPORT FROM PYTHON
need to discuss whether to keep the print_cc_grid function specific to
the module or make it user accessible.
Mostly it would be better to make it user accessible.
* made changes in the call_from_python_api function such that only
api_options is needed to be passed.
An if statement before the call to g608_extractor has also been added.
Waiting for Carlos to comment on the output generated till this stage.
* added a signal_python_api check before calling every write function.
Thus basic writing output can be avoided.
* Commented all calls to python_extract_time_based.
making changes to python_extract_g608 to be called only from the point
when a g608 caption is detected.
* Added pass_cc_buffer_to_python in encoders_common.c temporarily
redefined get_*_encoded from static to normal
included the above functions in encoders_common.h
* Added if-else statement for switch in encode_sub function.
This is done mainly for making sure no output is generated in the api
call.
* Added ccx_encoders_python.c
Defined pass_cc_buffer_to_python in ccx_encoders_python.c
added if else statement in encode_sub's switch to make sure that the output is not generated in case of -pythonapi call
* Removed __wrap_write from the entire code base.
Its declaration and definition are only present in CCExtractor.*
* Commented out the /dev/null part in ccx_encoders_common.c.
Proceeding further on checking for file generation.
* Added output_filename in array global variable and is generated in
init_write function.
included ccextractor.h in output.c to access global variable
signal_python_api for avoiding output generation in init_write and
invalid free in dinit_write.
* Modified the definition of init_write function for accessing
signal_python_api.
* Deleted the commented part of /dev/null in ccx_encoders_common.c.
* Added target_message=0 in -pythonapi param parsing in param.c to avoid
the API from printing to STDOUT.
Deleted the commented part of -out=pythonapi.
Thinking of adding a different param for silencing the output when the
call is made from python api.
* Removed __wrap_write from ccextractor.c and ccextractor.h.
* Added ccx_to_python_g608 and modified api_support.py file.
added documentation in ccextractor.c.
* added the generate srt script. However, some random characters are
coming in first line. Need to talk about this.
* Added SRT generator for python.
Using string to remove the garbage value.
Add code for srt counter and also the start_time and end_time
conversion.
* removed the trash characters and added code to print the timings.
However, the last blank frame also results in a print. Need to take care
of this.
* rectified the mistake of writing only timings and not captions.
now next step is to just make the timings print properly
* some minor changes before diving into extracting srt_counter from the made codebase
* Added extraction of srt_counter in python_extract via fflush
srt_counter-value.
Need to modify the processing in python.
* Added the entire method to extract captions and generate srt files. The next step would be to define a concise function for writing the srt
* Processing into a srt working properly.
Next step is to add the information of font into the caption text.
* the data is getting generated for proper SRT counters.
* A turning point to the approach.
Added END OF FRAME line for printing the data for every particular
srt_counter.
Proceeding further with the generation of srt by data manipulation.
* some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done.
* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.
* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.
* Added fflush and cleaned up the python code of srt generation
* Added <i> tag for italics.
Proceeding further with other types.
* Added the code to check for underline.
However, need to check how CCExtractor generates srt when both italics
and underline are present. For now a new line is added if both are
present.
* Shifting for making changes in the I/O work.
* Stable output for samples with italics is being generated.
* Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function.
* build script for linux is working correctly.
Build_library is showing error of invalid def of set_pythonapi.
Moreover, extractor has some memory seg fault.
* Added mod to set a MACRO as my_python_api to set the callback function.
Till now all calls to the reporter are commented.
Working on getting the reporter to print the lines.
* Changes have been implemented to bring reporter in working state.
For now a constant string is passed from extractor. Need to make the
proper parsing possible.
* Changed the code in extractor such that entire grid is returned to the
callback function.
Need to provide this grid to the write function and also cleanup the
codebase.
* Writing the outputted srt in a file called "temp.srt".
Need to modify init_write to push filename that is to be created in
python using callback.
* Added code to get start and end time simultaneously.
entire SRT is getting generated.
* removed ccx_python_encoders.c
* Compiling and executing on Windows
* Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c.
Also, deleted the static definition of get_font_encoded from
ccx_encoders_webvtt.c
* added a write statement in write_cc_bitmap_as_srt
* Rectified transfer of get_line_encoded, get_color_encoded and
get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 15:54:35 +00:00
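The SRT-related steps in the commit message above (a running srt_counter plus start_time and end_time conversion) can be illustrated with a minimal sketch. It is not the code from this commit: the helper names `ms_to_srt_time` and `format_srt_cue` are hypothetical, and it assumes the timings arrive as millisecond counts, which the commit message does not state.

```python
def ms_to_srt_time(ms):
    """Convert a millisecond count to the SRT 'HH:MM:SS,mmm' form."""
    hours, rest = divmod(int(ms), 3600000)
    minutes, rest = divmod(rest, 60000)
    seconds, millis = divmod(rest, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def format_srt_cue(counter, start_ms, end_ms, lines):
    """Build one SRT cue: counter, timing line, caption text, blank line."""
    timing = "%s --> %s" % (ms_to_srt_time(start_ms), ms_to_srt_time(end_ms))
    return "%d\n%s\n%s\n\n" % (counter, timing, "\n".join(lines))


if __name__ == "__main__":
    # Hypothetical cue: counter 1, shown from 0 ms to 1500 ms.
    print(format_srt_cue(1, 0, 1500, ["HELLO"]))
```

Keeping the time conversion separate from the cue assembly makes it easy to skip a trailing blank frame before a cue is written, which is the case the commit message flags as still open.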
import ccextractor as cc
import re
"""
    #Handling underline
    buff = ""
    underline_flag = 0
    for i,font_type in enumerate(font_line):
        if font_type == 'U' and not underline_flag:
            buff = buff + '<u> '
            underline_flag = 1
            underline=1
        elif font_type =="R" and underline_flag:
            buff = buff + '</u>'
            underline_flag = 0
            continue;
        buff += letter[i]
    #adding a new line after buff has seen underline
    #need to cross check with CCExtractor output as to how they are doing
    if underline:
        buff+= "\n"
    else:
        buff=""
"""
2017-08-25 18:03:00 +00:00
encodings_map = {
    '0':'unicode',
    '1':'latin1',
    '2':'utf-8',
    '3':'ascii',
}
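How encodings_map is consumed is not visible in this part of the file. As a hedged sketch only, a lookup might guard against names that Python does not recognise as codecs. The helper `resolve_codec` and its `default` parameter are hypothetical, and the table is repeated here so the example runs on its own.

```python
import codecs

# Copied from the table above so the sketch is self-contained.
encodings_map = {
    '0': 'unicode',
    '1': 'latin1',
    '2': 'utf-8',
    '3': 'ascii',
}


def resolve_codec(key, default='utf-8'):
    """Return a codec name Python can use, falling back to `default`."""
    name = encodings_map.get(str(key), default)
    try:
        codecs.lookup(name)  # raises LookupError for unknown codec names
    except LookupError:
        return default
    return name


if __name__ == "__main__":
    print(resolve_codec('1'))
    print(resolve_codec('0'))
```

The fallback is a design choice for the sketch: it keeps a write path working when a table entry is a label rather than a registered codec name.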
color_text_start={
    "0":"",
    "1":"<font color=\"#00ff00\">",
    "2":"<font color=\"#0000ff\">",
    "3":"<font color=\"#00ffff\">",
    "4":"<font color=\"#ff0000\">",
    "5":"<font color=\"#ffff00\">",
    "6":"<font color=\"#ff00ff\">",
    "7":"<font color=\"",
    "8":"",
    "9":""
}
color_text_end={
    "0":"",
    "1":"</font>",
    "2":"</font>",
    "3":"</font>",
    "4":"</font>",
    "5":"</font>",
    "6":"</font>",
    "7":"</font>",
    "8":"",
    "9":""
}
no_color_tag = ['0','8','9']
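The two tag tables are meant to wrap runs of same-coloured characters in opening and closing font tags. A compact way to picture that, offered as an alternative sketch rather than the function defined in this file, uses `itertools.groupby`. The helper `colorize_row` is hypothetical, only a subset of the table entries is copied, and it assumes a colour row is a string with one digit code per character.

```python
from itertools import groupby

# Subset of the tables above, copied so the sketch is self-contained.
color_text_start = {
    "0": "",
    "1": "<font color=\"#00ff00\">",
    "4": "<font color=\"#ff0000\">",
}
color_text_end = {"0": "", "1": "</font>", "4": "</font>"}


def colorize_row(text, colors):
    """Wrap each run of equally coloured characters in its font tags."""
    out = []
    pos = 0
    for code, run in groupby(colors):
        length = len(list(run))  # number of characters sharing this code
        out.append(color_text_start[code] + text[pos:pos + length] + color_text_end[code])
        pos += length
    return "".join(out)


if __name__ == "__main__":
    # "HI" uses code 1, " Y" uses code 0 (no tag), "OU" uses code 4.
    print(colorize_row("HI YOU", "110044"))
```

Codes whose start and end strings are empty pass through untagged, which matches the role of no_color_tag in the listing.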
def comparing_text_font_grids(text, font, color):
    original_text = text
    original_color = color
    temp_color = []
    for letter,color_line in zip(original_text,color):
        color = 0
        prev = color_line[0]
        buff = color_text_start[str(prev)]
        if prev not in no_color_tag:
            color_flag = 1
        else:
            color_flag = 0
        if letter.count(" ")<32:
            for i,color_type in enumerate(color_line):
                if color_type not in no_color_tag and prev!=color_type and not color_flag:
                    color = 1
                    buff = buff + color_text_start[str(color_type)]
                    color_flag = 1
                elif prev!=color_type and color_flag:
                    color = 1
                    buff = buff + color_text_end[str(prev)]
                    color_flag = 0
                buff += letter[i]
                prev=color_type
            if color_flag:
                color_flag=0
                buff+=color_text_end[str(prev)]
            if color:
                temp_color.append((buff,1))
            else:
                temp_color.append((letter,0))
    temp_font_italics=[]
    for letter,font_line in zip(original_text,font):
        if letter.count(" ")<32:
            buff=""
            underline,italics = 0,0
            #Handling italics
            italics_flag = 0
            for i,font_type in enumerate(font_line):
                if font_type == 'I' and not italics_flag:
                    italics=1
* some minor bugs but the output srt is being generated correctly. However, The font and colour encoding needs to be done.
* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.
* Taken care of random characters. Need to discuss this with Carlos. Moving further to font/color processing.
* Added fflush and cleaned up the python code of srt generation
* Added <i> tag for italics.
Proceeding further with other types.
* Added the code to check for underline.
However, need to check how CCExtractor generates srt when both italics
and underline are present. For now a new line is added if both are
present.
* Shifting for making changes in th i/O work.
* Stable ouput for samples with italics is being generated.
* Added the PYTHONAPI macro definition and testing for its existence in the set_python_api function.
* build script for linux is working correctly.
Build_library is showing error of invalid def of set_pythonapi.
Moreover, extractor has some memory seg fault.
* Added mod to set a MACRO as my_python_api to set the callback function.
Till now all calls to the reporter are commented.
Working on getting the reporter to print the lines.
* Changes have been implemented to bring reporter in working state.
For now a constant string is passed from extractor. Need to make the
proper parsing possible.
* Changed the code in extractor such that entire grid is returned to the
callback function.
Need to provide this grid to the write function and also cleanup the
codebase.
* Writing the outputted srt in a file called "temp.srt".
Need to modify init_write to push filename that is to be created in
python using callback.
* Added code to get start and end time simultaneously.
entire SRT is getting generated.
* removed ccx_python_encoders.c
* Compiling and executing on Windows
* Moved definitions get_line_encoded, get_color_encoded, get_font_encoded from ccx_encoders_g608.c to ccx_encoders_common.c.
Also, deleted the static definition of get_font_encoded from
ccx_encoders_webvtt.c
* added a write statement in write_cc_bitmap_as_srt
* Rectified transfer of get_line_encoded, get_color_encoded and
get_font_encoded from ccx_decoders_common.c to ccx_encoders_common.c.
2017-08-20 15:54:35 +00:00
|
|
|
                    buff = buff + '<i>'
                    italics_flag = 1
                elif font_type =="R" and italics_flag:
                    italics=1
                    buff = buff + '</i>'
                    italics_flag = 0
                buff += letter[i]
            if italics_flag:
                buff+='</i>'
            if italics:
                temp_font_italics.append((buff,1))
            else:
                temp_font_italics.append((letter,0))
        else:
            temp_font_italics.append((letter,0))
    final = []
    for i,j in zip(temp_color,temp_font_italics):
        if i[1] and not j[1]:
            final.append(i[0])
        elif j[1] and not i[1]:
            final.append(j[0])
        else:
            if not i[1]:
                final.append(i[0])
            else:
                print "error"
    return (final,font,color)
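The italics handling above can be read as a small state machine over a per-character font grid. The following is a minimal Python 3 sketch of that idea, not the file's own code: the function name `tag_italics` is hypothetical, and it assumes each font row holds one code per character, with "I" for italics and "R" for regular text.

```python
def tag_italics(text_line, font_line):
    """Wrap runs of italic characters in <i>...</i>.

    Assumption: font_line has one code per character of text_line,
    where "I" marks italics and "R" marks regular text.
    """
    out = []
    in_italics = False
    for char, font_type in zip(text_line, font_line):
        if font_type == "I" and not in_italics:
            # Entering an italic run: open the tag before the character.
            out.append("<i>")
            in_italics = True
        elif font_type == "R" and in_italics:
            # Leaving an italic run: close the tag before the character.
            out.append("</i>")
            in_italics = False
        out.append(char)
    if in_italics:
        # The run reached the end of the row, so close the open tag.
        out.append("</i>")
    return "".join(out)
```

Returning the tagged string directly avoids carrying a separate `italics` marker alongside `italics_flag`; a caller can compare the result with the input row to tell whether any tag was added.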
def generate_output_srt(filename,d, encoding):
    if encoding in encodings_map.keys():
        if encoding!='0':
            encoding_format = encodings_map[encoding]
        else:
            encoding_format = ""
    else:
        print "encoding error in python"
        return
    if encoding_format:
        d['text'] = [unicode(item,encoding_format) for item in d['text']]
    else:
        d['text'] = [unicode(item) for item in d['text']]
    d['text'],d['font'],d['color']= comparing_text_font_grids(d['text'],d['font'],d['color'])
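For readers on Python 3, where the `unicode` built-in no longer exists, the decoding step above might look roughly like the sketch below. The contents of `ENCODINGS_MAP` are purely illustrative (the real `encodings_map` is defined elsewhere in the file and is not shown here), and `decode_rows` is a hypothetical helper name.

```python
# Hypothetical mapping for illustration only; the real encodings_map
# lives elsewhere in the file and may use different keys and codecs.
ENCODINGS_MAP = {"0": "", "2": "latin-1", "3": "utf-8"}


def decode_rows(rows, encoding):
    """Decode a list of byte strings using the codec selected by `encoding`.

    Raises ValueError for an unknown key instead of printing and
    returning None, so the caller cannot silently skip a caption.
    """
    if encoding not in ENCODINGS_MAP:
        raise ValueError("unknown encoding key: %r" % (encoding,))
    # An empty codec name mirrors the '0' case above; Python 2's
    # unicode(item) falls back to ASCII, so the sketch does the same.
    codec = ENCODINGS_MAP[encoding] or "ascii"
    return [row.decode(codec) for row in rows]
```

Raising an exception is a design choice of this sketch; the original function prints a message and returns early.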
    for item in d['text']:
        if item.count(" ")<32:
            o=item
            with open(filename,'ab+') as fh:
                if encoding_format:
                    fh.write(o.encode(encoding_format))
                else:
                    fh.write(str(o))
                fh.write("\n")
                fh.flush()
    with open(filename,'ab+') as fh:
        fh.write("\n")
        fh.flush()
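The write loop above skips rows that consist only of spaces (a 608 grid row is 32 columns wide) and ends each caption with a blank line, which is what separates cues in an SRT file. A minimal Python 3 sketch of the same filtering, assuming already-decoded text rows and a hypothetical helper name `write_caption_rows`, could take an open file object so the file is opened once per caption rather than once per row:

```python
import io


def write_caption_rows(fh, rows, width=32):
    """Write the non-blank rows of one caption, then a blank separator line.

    Assumption: each row is a decoded string, and a row made up of
    `width` spaces is an empty grid row that should not be written.
    """
    for row in rows:
        # Same test as the loop above: fewer than `width` spaces means
        # the row contains at least one visible character.
        if row.count(" ") < width:
            fh.write(row + "\n")
    # The blank line terminates the caption block.
    fh.write("\n")


# Usage with an in-memory buffer standing in for the output file.
buf = io.StringIO()
write_caption_rows(buf, [" " * 32, "HELLO" + " " * 27, " " * 32])
```

Like the original loop, the sketch does not strip the trailing padding from the rows it writes.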