diff --git a/Dictionary/dict_adventure_time.txt b/Dictionary/dict_adventure_time.txt new file mode 100644 index 00000000..614b15cd --- /dev/null +++ b/Dictionary/dict_adventure_time.txt @@ -0,0 +1,54 @@ +Ancient Psychic Tandem War Elephant +Banana Guard +Candy Kingdom +Candy People +Choose Goose +Cinnamon Bun +City of Thieves +Colonel Candycorn +Cosmic Owl +Crab Princess +Dr. Donut +Dr. Ice Cream +Duchess of Nuts +Earl of Lemongrab +Everything Burrito +Finn the Human +Fire Kingdom +Flame Princess +Flying Lettuce Bros. +Ghost Princess +Hotdog Knight +Ice King +Ice Kingdom +Jake the Dog +Lady Rainicorn +Lake Butterscotch +Land of Ooo +Lumpy Space Princess +Marauder Village +Marshmallow Kid +Mr. Cream Puff +Muscle Princess +Nice King +Nice Knights +Nightosphere +Nurse Poundcake +Old Lady Princess +Party Pat +Peppermint Butler +Pillow World +Princess Bubblegum +Raggedy Princess +Root Beer Guy +Sir Slicer +Skeleton Princess +Slime Princess +Snow Golem +The Enchiridion +The Lich +Toast Princess +Tree Fort +Tree Trunks +Wildberry Princess +Wizard Battle diff --git a/Dictionary/dict_greys.anatomy.txt b/Dictionary/dict_greys.anatomy.txt index e515aba9..6637e2f1 100644 --- a/Dictionary/dict_greys.anatomy.txt +++ b/Dictionary/dict_greys.anatomy.txt @@ -6,12 +6,37 @@ Thatcher Grey Derek Shepherd Amelia Shepherd Owen Hunt -Maggie Pierce -Teddy -Dr. Altman +Dr. Margaret Pierce +Dr. Teddy Altman +Alex Karev +Callie Torres +Izzie Stevens +Cristina Yang +Mark Sloan +Jackson Avery +Leah Murphy +April Kepner +Arizona Robbins +George O'Malley +Preston Burke +Miranda Bailey +Denny Duquette +Dr. Addison Montgomery +Richard Webber +Adele Webber +Jo Wilson +Andrew Deluca +Nathan Riggs +Erica Hahn +Sadie Harris +Stephanie Edwards +Jason Myers +Dr.
Nicole Herman +Hannah Davies +Shane Ross Seattle Grace Hospital Mercy West Medical Center -Seatle Grace Mercy West Hospital +Seattle Grace Mercy West Hospital Denny Duquette Memorial Clinic Grey Sloan Memorial Hospital Mayo Clinic diff --git a/Dictionary/dict_master_of_none.txt b/Dictionary/dict_master_of_none.txt new file mode 100644 index 00000000..5d0f501d --- /dev/null +++ b/Dictionary/dict_master_of_none.txt @@ -0,0 +1,11 @@ +Dev +Rachel +Go-Gurt +Arnold +Brian +Denise +The Sickening +Nina +Nashville +Paro +Benjamin \ No newline at end of file diff --git a/Dictionary/dict_mr_robot.txt b/Dictionary/dict_mr_robot.txt index 44c009fa..26aac091 100644 --- a/Dictionary/dict_mr_robot.txt +++ b/Dictionary/dict_mr_robot.txt @@ -1,5 +1,9 @@ Mr. Robot +Elliot Alderson +Darlene Angela Moss +Tyrell Wellick +Joanna Wellick Phillip Price Federal Bureau of Investigation Fun Society @@ -8,4 +12,4 @@ New York Evil Corp Headquarters Allsafe Cybersecurity Ron’s Coffee -Python \ No newline at end of file +Python diff --git a/Dictionary/dict_new_girl.txt b/Dictionary/dict_new_girl.txt new file mode 100644 index 00000000..23d86393 --- /dev/null +++ b/Dictionary/dict_new_girl.txt @@ -0,0 +1,11 @@ +Jess +Jessica Day +Nick Miller +Winston Bishop +Schmidt +Cece Parekh +Coach +Latvian Basketball League +Ferguson +True American +Los Angeles middle school diff --git a/Dictionary/dict_steven_universe.txt b/Dictionary/dict_steven_universe.txt new file mode 100644 index 00000000..abf2ca43 --- /dev/null +++ b/Dictionary/dict_steven_universe.txt @@ -0,0 +1,16 @@ +Amethyst +Beach City +Cookie Cat +Crying Breakfast Friends +Crystal Gems +Crystal Temple +Earthlings +Fryman +Garnet +Lion +Pearl +Peridot +Rose Quartz +Ruby +Sapphire +Steven Universe diff --git a/Dictionary/dict_the.big.bang.theory.txt b/Dictionary/dict_the.big.bang.theory.txt index 2f28a5ef..73ff74e6 100644 --- a/Dictionary/dict_the.big.bang.theory.txt +++ b/Dictionary/dict_the.big.bang.theory.txt @@ -1,15 +1,30 @@ The Big Bang 
Theory Penny +Leonard Hofstadter +Sheldon Cooper +Raj Koothrappali +Bernadette Rostenkowski +Howard Wolowitz +Amy Farrah Fowler +Leslie Winkle +Stuart Bloom +Arthur Jeffries +Mrs. Wolowitz +Barry Kripke +Priya Koothrappali +Mrs. Koothrappali +Mr. Koothrappali +Lucy Sheldon’s Spot The Apartment Building Apartment 4A/B The Laundry Room The Roof -Wolowitz House +Wolowitzs' House Capitol Comics The Cheesecake Factory The Comic Center of Pasadena California Institute of Technology Massachusetts Institute of Technology Jet Propulsion Laboratory -Pasadena \ No newline at end of file +Pasadena diff --git a/Dictionary/dict_the_it_crowd.txt b/Dictionary/dict_the_it_crowd.txt new file mode 100644 index 00000000..d6854ce9 --- /dev/null +++ b/Dictionary/dict_the_it_crowd.txt @@ -0,0 +1,23 @@ +Arsenal Football Club +Aunt Irma +Big Ben +Countdown +Dragon's Den +Emergency Services +Employee of the Month +Friendface +Gay: A Gay Musical +Information Technology +Jen Barber +Lonely Hearts +Maurice Moss +Random Access Memory +Sea Parks +Spaceology +The Banner +The Evening Informer +The Internet +The London Echo +Tnetennba +Windows Vista +Word diff --git a/docs/CHANGES.TXT b/docs/CHANGES.TXT index b60374de..45ebf361 100644 --- a/docs/CHANGES.TXT +++ b/docs/CHANGES.TXT @@ -156,7 +156,7 @@ version of CCExtractor. - Display end time - Display caption mode - Display caption channel - - Use a relative timestamp ( relative to the sample) + - Use a relative timestamp (relative to the sample) - Display XDS info - Use colors Examples: @@ -209,7 +209,7 @@ version of CCExtractor. .raw, which depends on padding. Fixed. - MythTV's branch had a fixed size buffer that could not be enough some times. Made dynamic. -- Better support for PAT changing mid stream. +- Better support for PAT changing mid-stream. - Removed quotes in Start in .smi (format fix). - Added multicast support (Chris Small) - Added ability to select IP address to bind in UDP (Chris Small) @@ -239,10 +239,10 @@ version of CCExtractor. 
their PMT entry. - Added -datastreamtype to manually selecting a stream based on its type instead of its PID. Useful if your recording program - always hides the caption under the stream stream type. + always hides the caption under the stream type. - Added -streamtype so if an elementary stream is selected manually - for processing the streamtype can be selected too. This can be - needed if you process for example a stream that is declared as + for processing, the streamtype can be selected too. This can be + needed if you process, for example a stream that is declared as "private MPEG" in the PMT, so CCExtractor can't tell what it is. Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6 (MPEG private data). @@ -251,10 +251,10 @@ version of CCExtractor. - Fixes in roll-up, cursor was being moved to column 1 if a RU2, RU3 or RU4 was received even if already in roll-up mode. - Added -autoprogram. If a multiprogram TS is processed and - -autoprogram is used CCExtractor will analyze all PMTs and use + -autoprogram is used, CCExtractor will analyze all PMTs and use the first program that has a suitable data stream. - Timed transcript (ttxt) now also exports the caption mode - (roll-up, paint-on, etc) next to each line, as it's useful to + (roll-up, paint-on, etc.) next to each line, as it's useful to detect things like commercials. - Content Advisory information from XDS is now decoded if it's transmitted in "US TV parental guidelines" or "MPA". @@ -522,12 +522,12 @@ version of CCExtractor. - Removed -autopad and -goppad, no longer needed. - In preparation to a new binary format we have renamed the current .bin to .raw. Raw files - have only CC data (with no header, timing, etc). + have only CC data (with no header, timing, etc.). - The input file format (when forced) is now specified with -in=format such as -in=ts, -in=raw, -in=ps ... - The old switches (-ts, -ps, etc) still work. + The old switches (-ts, -ps, etc.) still work. 
The only exception is -bin which has been removed (reserved for the new binary format). Use -in=raw to process a raw file. @@ -569,7 +569,7 @@ version of CCExtractor. 0.46 (2008-11-24) ----------------- -- Added support for live streaming, ccextractor +- Added support for live streaming, CCExtractor can now process files that are being recorded at the same time. @@ -619,7 +619,7 @@ version of CCExtractor. - Fixed a bug in the read loop (no less) that caused some files to fail when reading without buffering (which is - the default in the linux build). + the default in the Linux build). - Several improvements in the GUI, such as saving current options as default. @@ -642,7 +642,7 @@ version of CCExtractor. deaf people know if the person talking is at the left or the right of the screen, i.e. there aren't useless. But if they annoy - you go ahead... + you, go ahead... 0.40 (2008-05-20) ----------------- @@ -661,7 +661,7 @@ version of CCExtractor. - Fixed a bug in the CC decoder that could cause the first line not to be cleared in roll-up mode. -- ccextractor can now follow number sequences in +- CCExtractor can now follow number sequences in file names, by suffixing the name with +. For example, @@ -698,7 +698,7 @@ version of CCExtractor. that have been added because old behaviour was annoying to most people: _1 and _2 at the end of the output file names is now added ONLY if - -12 is used (ie when there are two output + -12 is used (i.e. when there are two output files to produce). So ccextractor -srt sopranos.mpg @@ -800,7 +800,7 @@ version of CCExtractor. 0.32 (unreleased) ----------------- -- Added -delay ms, which adds (or substracts) +- Added -delay ms, which adds (or subtracts) a number of milliseconds to all times in .srt/.sami files. For example, @@ -811,7 +811,7 @@ version of CCExtractor. -delay -400 - causes all substitles to appear 400 ms before + causes all subtitles to appear 400 ms before they would normally do. 
- Added -startat at -endat which lets you select just a portion of data to be processed, @@ -837,7 +837,7 @@ version of CCExtractor. 0.29 (unreleased) ----------------- -- Minor bugfix. +- Minor bug fix. 0.28 (unreleased) ----------------- @@ -851,7 +851,7 @@ version of CCExtractor. 0.27 (unreleased) ----------------- -- Modified sanitizing code, it's less aggresive +- Modified sanitizing code, it's less aggressive now. Ideally it should mean that characters won't be missed anymore. We'll see. @@ -906,7 +906,7 @@ version of CCExtractor. many others (bttv) with the same closed caption recording format. This is the result of hacking MythTV's MPEG parser into - ccextractor. Integration is not very good (to put it + CCExtractor. Integration is not very good (to put it midly) but it seems to work. Depending on the feedback I may continue working on this or just leave it 'as it' (good enough). @@ -923,7 +923,7 @@ version of CCExtractor. It's fixed now at least for the samples I have, if it's not completely fixed let me know. Credit for this goes to Jack Ha who sent me a couple of samples and a first - implementation of a semiworking fix. + implementation of a semi-working fix. - Added support for several input files (see help screen for details). - Added Unicode and Latin-1 encoding. @@ -955,7 +955,7 @@ version of CCExtractor. 0.07 (2007-04-19) ----------------- - Added MPEG reference clock parsing. -- Added autopadding in TS. Does miracles with timing. +- Added auto padding in TS. Does miracles with timing. - Added video information (as extracted from sequence header). - Some code clean-up. - FF sanity check enabled by default. diff --git a/docs/FFMPEG.TXT b/docs/FFMPEG.TXT index 248aa223..4262e786 100644 --- a/docs/FFMPEG.TXT +++ b/docs/FFMPEG.TXT @@ -1,8 +1,8 @@ Overview ======== -FFmpeg Intigration was done to support multiple encapsulations. +FFmpeg Integration was done to support multiple encapsulations.
-Dependecy +Dependency ========= FFmpeg library's @@ -35,24 +35,24 @@ make ENABLE_FFMPEG=yes On Windows ---------- put the path of libs/include of ffmpeg library in library paths. -step 1) In visual studio 2013 right click and select property. -step 2) Select Configuration properties in left panel(column) of property. -step 3) Select VC++ Directory. -step 4) In the right pane, in the right-hand column of the VC++ Directory property, +Step 1) In visual studio 2013 right click and select property. +Step 2) Select Configuration properties in left panel(column) of property. +Step 3) Select VC++ Directory. +Step 4) In the right pane, in the right-hand column of the VC++ Directory property, open the drop-down menu and choose Edit. Step 5) Add path of Directory where you have kept uncompressed library of FFmpeg. Set preprocessor flag ENABLE_FFMPEG=1 -Step 1)In visual studio 2013 right click and select property. -Step 2)In the left panel, select Configuration Properties, C/C++, Preprocessor. -Step 3)In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit. -Step 4)In the Preprocessor Definitions dialog box, add ENABLE_FFMPEG=1. Choose OK to save your changes. +Step 1) In visual studio 2013 right click and select property. +Step 2) In the left panel, select Configuration Properties, C/C++, Preprocessor. +Step 3) In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit. +Step 4) In the Preprocessor Definitions dialog box, add ENABLE_FFMPEG=1. Choose OK to save your changes. 
Add library in linker -step 1)Open property of project -Step 2)Select Configuration properties -Step 3)Select Linker in left panel(column) -Step 4)Select Input -Step 5)Select Additional dependencies in right panel -Step 6)Add all FFmpeg's lib in new line +Step 1) Open property of project +Step 2) Select Configuration properties +Step 3) Select Linker in left panel(column) +Step 4) Select Input +Step 5) Select Additional dependencies in right panel +Step 6) Add all FFmpeg's lib in new line diff --git a/docs/FRONTEND_COMMUNICATIONS.TXT b/docs/FRONTEND_COMMUNICATIONS.TXT index 1f8aa686..6d8b07e9 100644 --- a/docs/FRONTEND_COMMUNICATIONS.TXT +++ b/docs/FRONTEND_COMMUNICATIONS.TXT @@ -1,4 +1,4 @@ -Starting with version 0.51, ccextractor has a mode +Starting with version 0.51, CCExtractor has a mode that allows frontends and other programs know what the current progress is as well as get information on interesting events, such as a file being open diff --git a/docs/G608.TXT b/docs/G608.TXT index d5168481..936567e6 100644 --- a/docs/G608.TXT +++ b/docs/G608.TXT @@ -46,13 +46,13 @@ The possible color values are: And the possible font values are: R => Regular - I => Italic + I => Italics U => Underlined - B => Underlined + italic + B => Underlined + Italics -If a 'E' is found in ether color or font that means a bug in CCExtractor. Should you ever get +If a 'E' is found in either color or font that means a bug in CCExtractor. Should you ever get an E please send us a .bin file that causes it. This format is intended for post processing tools that need to represent the output of a 608 decoder accurately but that don't want to deal with the madness of other more generic subtitle -formats. \ No newline at end of file +formats. 
diff --git a/docs/HARDSUBX.txt b/docs/HARDSUBX.txt index 9a5e3542..969ba2bf 100644 --- a/docs/HARDSUBX.txt +++ b/docs/HARDSUBX.txt @@ -36,7 +36,7 @@ pkg-config --libs libswscale On success, you should see the correct include directory path and the linker flags. -To build the program with hardsubx support, from the linux directory run:- +To build the program with hardsubx support, from the Linux directory run:- make ENABLE_HARDSUBX=yes NOTE: The build has been tested with FFMpeg version 3.1.0, and Tesseract 3.04. @@ -44,4 +44,4 @@ NOTE: The build has been tested with FFMpeg version 3.1.0, and Tesseract 3.04. Windows ------- -Coming Soon \ No newline at end of file +Coming Soon diff --git a/docs/MAILINGLIST.TXT b/docs/MAILINGLIST.TXT index 46a60902..1a42e1f7 100644 --- a/docs/MAILINGLIST.TXT +++ b/docs/MAILINGLIST.TXT @@ -3,7 +3,7 @@ A mailing list is now available from sourceforge: https://lists.sourceforge.net/lists/listinfo/ccextractor-users I expect it to be very low traffic (right now there's around 10 -people actively helping with ccextractor in one way or +people actively helping with CCExtractor in one way or another), so almost everything goes here: - Bug reports diff --git a/docs/OCR.txt b/docs/OCR.txt index ed75abe2..e0bf2d0f 100644 --- a/docs/OCR.txt +++ b/docs/OCR.txt @@ -4,14 +4,14 @@ Overview OCR (Optical Character Recognition) is a technique used to extract text from images. In the World of Subtitle, subtitle stored in bitmap format are common and even necessary for converting subtitle -in bitmap format to subtitle in text format ocr is used. +in bitmap format to subtitle in text format OCR is used. Dependency ========== Tesseract (OCR library by Google) -Leptonica (image processing library) +Leptonica (Image processing library) -How to compile ccextractor on linux with OCR +How to compile CCExtractor on Linux with OCR ============================================= Download and Install Leptonnica. 
@@ -50,12 +50,12 @@ you can download tesseract training data from https://github.com/tesseract-ocr/t -Compile CCextractor passing flags like following +Compile CCExtractor passing flags like following ------------------------------------------------- make ENABLE_OCR=yes -How to compile ccextractor on Windows with OCR +How to compile CCExtractor on Windows with OCR =============================================== Download prebuild library of leptonica and tesseract from following link @@ -72,23 +72,23 @@ Step 5) Add path of Directory where you have kept uncompressed library of lepton Set preprocessor flag ENABLE_OCR=1 -Step 1)In visual studio 2013 right click and select property. -Step 2)In the left panel, select Configuration Properties, C/C++, Preprocessor. -Step 3)In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit. -Step 4)In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes. +Step 1) In visual studio 2013 right click and select property. +Step 2) In the left panel, select Configuration Properties, C/C++, Preprocessor. +Step 3) In the right panel, in the right-hand column of the Preprocessor Definitions property, open the drop-down menu and choose Edit. +Step 4) In the Preprocessor Definitions dialog box, add ENABLE_OCR=1. Choose OK to save your changes. 
Add library in linker -step 1)Open property of project -Step 2)Select Configuration properties -Step 3)Select Linker in left panel(column) -Step 4)Select Input -Step 5)Select Additional dependencies in right panel -Step 6)Add libtesseract304d.lib in new line -Step 7)Add liblept172.lib in new line +Step 1) Open property of project +Step 2) Select Configuration properties +Step 3) Select Linker in left panel(column) +Step 4) Select Input +Step 5) Select Additional dependencies in right panel +Step 6) Add libtesseract304d.lib in new line +Step 7) Add liblept172.lib in new line Download language data from following link https://code.google.com/p/tesseract-ocr/downloads/list after downloading the tesseract-ocr-3.02.eng.tar.gz extract the tar file and put -tessdata folder where you have kept ccextractor executable +tessdata folder where you have kept CCExtractor executable Copy the tesseract and leptonica dll from lib folder downloaded from above link to folder of executable or in system32. diff --git a/docs/using_cmake_build.txt b/docs/using_cmake_build.txt index 9326de99..8707cac7 100644 --- a/docs/using_cmake_build.txt +++ b/docs/using_cmake_build.txt @@ -1,4 +1,4 @@ -For building ccextractor using cmake follow steps below.. +For building CCExtractor using cmake follow steps below.. Step 1) Check you have right version of cmake installed. ( version >= 3.0.2 ) We are using CMP0037 policy of cmake which was introduced in 3.0.0 diff --git a/src/ccextractor.c b/src/ccextractor.c index 3275565f..cb90c896 100644 --- a/src/ccextractor.c +++ b/src/ccextractor.c @@ -1,4 +1,4 @@ -/* CCExtractor, carlos at ccextractor org +/* CCExtractor, originally by carlos at ccextractor.org, now a lot of people. Credits: See CHANGES.TXT License: GPL 2.0 */