0.86
-----------------
- Fix: Prevent the OCR from being initialized more than once (happened on multiprogram and
  PAT changes).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (Mac).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (Linux).
- New: Added .rpm package generation script.
- New: Added build/installation script for .pkg.tar.xz (Arch Linux).
- New: Added tarball generation script.
- New: Added --analyzevideo. If present, the video stream will be processed even if the
  subtitles are in a different stream. This is useful when we want video information
  (resolution, frame type, etc). -vides now implies this option too.
  [Note: Tentative - some possibly breaking changes were made for this, so if you
  use it, validate the results]

0.85b (2017-01-26)
-----------------
- Fix: Base Windows binary (without OCR) compiled without DLL dependencies.

0.85 (2017-01-23)
-----------------
- New: Added FFMPEG 3.0 to Windows build - last one that is XP compatible.
- New: Major improvements in CEA-608 to WebVTT (styles, etc).
- New: Return a non-zero exit code if no subtitles are found.
- New: Windows build files updated to Visual Studio 2015, new target platform is 140_xp.
- New: Added basic support for Tesseract 4.0.0.
- New: Added build script for .deb.
- New: Updated -debugdvbsub parameter to get the most relevant DVB traces for debugging.
- New: SMPTE-TT files are now compatible with Adobe Premiere.
- New: Updated libpng.
- New: Added third-party (Tracy from archive.org) static Linux build script.
- New: Added chapter extraction for MP4 files.
- New: Return code 10 if no captions are found at all.
- Fix: Teletext duplicate lines in certain cases.
- Fix: Improved teletext timing.
- Fix: DVB timing is finally good.
- Fix: A few minor memory leaks.
- Fix: Tesseract library file included in Mac build command.
- Fix: Bad WTV timings in some cases.
- Fix: Mac build script.
- Fix: Memory optimization in HARDSUBX edit_distance.
- Fix: SubStation Alpha subtitles in bitmap.
- Fix: lept msg severity in Linux.
- Fix: SSA, SPUPNG and VTT timing and skipping of subtitles for SAMI and TTML.
- Fix: SMPTE-TT: Added support for font color.
- Fix: SAMI unnecessary empty subtitle when extracting DVB subs.
- Fix: Skip the packet if the adaptation field length is broken.
- Fix: 708 - lots of work done in the decoder. Implementation of more commands. Better timing.

0.84 (2016-12-16)
-----------------
- New: In Windows, both the OCR and the non-OCR binaries are bundled, since the OCR one causes problems due to
  dependencies on some systems. So unless you need the OCR, just use the non-OCR version.
- New: Added -sbs (sentence by sentence) for DVB output. Each frame in the output file contains a complete
  sentence (experimental).
- New: Added -curlposturl. If used, each output frame will be sent with libcurl by doing a POST to that URL.
- Fix: More code consistency checking in function names.
- Fix: Linux build script now tries to verify dependencies.
- Fix: Mac build script was missing a directory.

0.83 (2016-12-13)
-----------------
- Fix: Duplicate lines in MP4 (specifically affects iTunes).
- Fix: Timing in .mp4, timing now calculated for each CC pair instead of per atom.
- Fix: Typos everywhere in the documentation and source code.
- Fix: CMakeLists for building with CMake.
- Fix: -unixts option.
- Fix: FPS switching messages.
- Fix: Removed ugly debug statement with local path in HardsubX.
- Fix: Changed platform target to v120_xp in Visual Studio (so XP is supported again).
- Fix: Added detail to many error messages.
- Fix: Memory leaks in videos with XDS.
- Fix: Makefile compatibility issues with Raspberry Pi.
- Fix: Missing separation between WebVTT header and body.
- Fix: Stupid bug in M2TS that prevented it from working.
- Fix: OCR library dependencies for the release version in Windows.
- Fix: Non-buffered reading from pipes.
- Fix: --stream option with stdin.
- New: Added terminate_asap to buffered_read_opt.
- New: Added some TV-show specific spelling dictionaries.
- New: Updated GPAC library.
- New: ASS/SSA support.
- New: Capture SIGTERM to do some clean up before terminating.
- New: Work on 708: Changed DefineWindow behavior, it now only clears the text of an existing window if its style has changed.

0.82 (2016-08-15)
-----------------
- New: HardsubX - Burned-in subtitle extraction subsystem.
- New: Color detection in DVB subtitles
- Fix: Corrected sentence capitalization
- Fix: Skipping redundant bytes at the end of tx3g atom in MP4
- Fix: Illegal SRT files being created from DVB subtitles
- Fix: Incorrect progress display

0.81 (2016-06-13)
-----------------
- New: --version parameter for extensive version information (version number, compile date, executable hash, git commit (if appropriate))
- New: Added -sem (semaphore) to create a .sem file when an output file is opened and delete it when it's closed.
- New: Added --append parameter. This will prevent overwriting of existing files.
- New: File rotation support added. The user has to send a USR1 signal to rotate.
- Fix: Issues with files < 1 MB
- Fix: Preview of generated transcript.
- Fix: Statistics were no longer being generated.
- Fix: Corrected display of sub mode and info in transcripts.
- Fix: Teletext page number displayed in -UCLA.
- Fix: Removal of excessive XDS notices about aspect ratio info.
- Fix: Force flushing of file buffers now works for all files.
- Fix: MP4 void atoms that were causing some .mp4 files to fail.
- Fix: Memory usage caused by EPG processing was high due to many non-dynamic buffers.
- Fix: Project files for Visual Studio now include OCR support in Windows.
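
The -sem behaviour described for 0.81 can be sketched as follows. This is a minimal Python illustration, not CCExtractor's own (C) code. The helper name open_with_semaphore is hypothetical. Naming the marker "<output>.sem" is also an assumption, because the entry does not say exactly what the .sem file is called.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_with_semaphore(path, mode="w"):
    """Hypothetical helper: keep <path>.sem on disk only while <path> is open.

    The ".sem" suffix appended to the output name is an assumption for this sketch.
    """
    sem_path = path + ".sem"
    open(sem_path, "w").close()  # create the marker before the output is opened
    try:
        with open(path, mode) as out:
            yield out
    finally:
        # The inner "with" has already closed the output here, so removing
        # the marker tells other processes that the file is complete.
        if os.path.exists(sem_path):
            os.remove(sem_path)
```

With this pattern, another process can poll for the .sem file and read an output only after its marker has disappeared.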

0.80 (2016-04-24)
-----------------
- Fix: "Premature end of file" (one of the scenarios)
- Fix: XDS data is always parsed again (needed to extract information such as program name)
- Fix: Teletext parsing: @ was incorrectly exported as *. X/26 packet specifications in ETS 300 706 v1.2.1 are now followed more closely
- Fix: Teletext parsing: Latin G2 subsets and accented characters not displaying properly
- Fix: Timing in -ucla
- Fix: Timing in ISDB (some instances)
- Fix: "mfra" mp4 box weight changed to 1 (this helps with correct file format detection)
- Fix: Fix for TARGET File is null.
- Fix: Fixed segfaults while parsing parameters (if a mandatory parameter is not present in -outinterval, -codec or -nocodec)
- Fix: Crash when input is too small
- Fix: Updated some URLs in code (references to docs)
- Fix: -delay now updates final timestamp in ISDB, too
- Fix: Removed minor compiler warnings
- Fix: Visual Studio solution files working again
- Fix: ffmpeg integration working again
- New: Added --forceflush (-ff). If used, output file descriptors will be flushed immediately after being written to
- New: Hexdump XDS packets that we cannot parse (shouldn't be many of those anyway)
- New: If input file cannot be opened, provide a decent human-readable explanation
- New: GXF support

0.79 (2016-01-09)
-----------------
- Support for Grid Format (g608)
- Show correct number of teletext packets processed
- Removed segfault on incorrect mp4 detection
- Removed XML header from transcript format
- Help message updated for Teletext
- Added --help and -h for help message
- Added --nohtmlescape option
- Added --noscte20 option

0.78 (2015-12-12)
-----------------
- Support for extracting closed captions from multiple programs (multiprogram streams) at once.
- CEA-708: exporting to SAMI (.smi), Transcript (.txt), Timed Transcript (ttxt) and SubRip (.srt).
- CEA-708: 16-bit charset support (tested on Korean).
- CEA-708: Roll-up caption handling.
- Changed TCP connection protocol (BIN data is now wrapped in packets, added EPG support and keep-alive packets).
- The TCP connection password prompt was removed. To set the connection password, use the -tcppassword argument instead.
- Support for ISDB closed captions.
- Added a new output format, simplexml (used internally by a CCExtractor user, may or may not be useful for
  anyone else).

0.77 (2015-06-20)
-----------------
- Fixed bug in capitalization code ('I' was not being capitalized).
- GUI should now run in Windows 8 (using the included .NET runtime, since
  3.5 cannot be installed in Windows 8 apparently).
- Fixed Mac build script, binary is now compiled with support for
  files over 2 GB.
- Fixed bug in PMT code, damaged PMT sections could make CCExtractor
  crash.

0.76 (2015-03-28)
-----------------
- Added basic M2TS support
- Added EPG support - you can now export the Program Guide to XML
- Some bug fixes

0.75 (2015-01-15)
-----------------
- Fixed issue with teletext to formats other than srt.
- CCExtractor can be used as a library if compiled using cmake
- By default the Windows version adds a BOM to generated UTF files (this is
  because it's needed to open the files correctly) while all other
  builds don't add it (because it messes with text processing tools).
  You can use -bom and -nobom to change the behaviour.
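
A UTF-8 byte order mark is the three bytes EF BB BF (U+FEFF) placed before the text. The Python sketch below shows what -bom and -nobom toggle and how a consumer can accept either form. The function names are hypothetical and are not CCExtractor options.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def encode_output(text: str, add_bom: bool) -> bytes:
    """Encode caption text as UTF-8, optionally prefixed with a BOM (-bom / -nobom)."""
    data = text.encode("utf-8")
    return UTF8_BOM + data if add_bom else data


def strip_bom(data: bytes) -> bytes:
    """Drop a leading BOM so text tools see only the caption bytes."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
```

In Python, the built-in "utf-8-sig" codec decodes input with or without the BOM. Downstream tools can use it so they do not depend on which setting produced the file.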

0.74 (2014-09-24)
-----------------
- Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
- Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
- Switched order around for TLT and TT page number in custom transcript to match UCLA settings
- Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might not be readable under Windows.
- Segfault fix when no input files were given
- No more bin output when sending to server; TT can now also be sent to the server for processing
- Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.

0.73 - GSOC (2014-08-19)
------------------------
- Added support for BIN format for Teletext
- Started librarization. In the future this will allow other programs to use encoder/decoder functions and more.

0.72 - GSOC (2014-08-12)
------------------------
- Fix for WTV files with incorrect timing
- Added support for fps change using data from AVC video track in an H264 TS file.
- Added FFMpeg support to enable all demuxers (encapsulators) and decoders provided by ffmpeg

0.71 - GSOC (2014-07-31)
------------------------
- Added feature to receive captions in BIN format according to CCExtractor's own
  protocol over TCP (-tcp port [-tcppassword password])
- Added ability to send captions to the server described above or to the
  online repository (-sendto host[:port])
- Added -stdin parameter for reading input stream from standard input
- Compilation in Cygwin using linux/Makefile
- Fix for .bin files when not using latin1 charset
- Correction of mp4 timing when one timestamp covers the timing of two atoms

0.70 - GSOC (2014-07-06)
------------------------
This is the first release that is part of Google's Summer of Code.
Anshul, Ruslan and Willem joined CCExtractor to work on a number of things
over the summer, and their work is already reaching the mainstream
version of CCExtractor.

- Added a huge dictionary submitted by Matt Stockard.
- Added DVB subtitle decoder, with spupng output.
- Added support for cdt2 media atoms in QT video files. Now multiple atoms in
  a single sample sequence are supported.
- Changed Makefile.
- Fixed some bugs.
- Added feature to print info about file's subtitles and streams (-out=report).
- Support for long PMTs.
- Support for a configuration file.
  - There is a sample configuration file in the doc/ folder named
    ccextractor.cnf.sample
  - For now, only a file named ccextractor.cnf placed next to the ccextractor
    executable is supported.
  - For details on which options can be set using the configuration file,
    please look at the sample file.

- Added options for custom transcript output:
  new parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
  Each digit indicates whether the corresponding item is displayed in the (timed) transcript:
    - Display start time
    - Display end time
    - Display caption mode
    - Display caption channel
    - Use a relative timestamp (relative to the sample)
    - Display XDS info
    - Use colors
  Examples:
    0000101 is the default setting for transcripts
    1110101 is the default for timed transcripts
    1111001 is the default setting for -ucla
  Make sure you use this parameter after others that might affect these
  settings (-out, -ucla, -xds, -txt, -ttxt, ...)
- Fixed negative timing bug
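
Each position of the 7-digit -customtxt value maps to one of the seven items listed in the 0.70 entry. The Python sketch below shows that mapping. CustomTxt and parse_customtxt are hypothetical names. Rejecting digits other than 0 and 1 is an assumption, because the entry only shows 0/1 values.

```python
from typing import NamedTuple


class CustomTxt(NamedTuple):
    """One flag per digit of -customtxt, in the order listed in the entry."""
    start_time: bool
    end_time: bool
    caption_mode: bool
    caption_channel: bool
    relative_timestamp: bool
    xds_info: bool
    use_colors: bool


def parse_customtxt(fmt: str) -> CustomTxt:
    """Hypothetical parser: digit i set to '1' enables item i."""
    if len(fmt) != 7 or any(c not in "01" for c in fmt):
        # Assumption: only 0/1 digits are meaningful.
        raise ValueError("-customtxt expects exactly 7 digits, each 0 or 1")
    return CustomTxt(*(c == "1" for c in fmt))
```

For example, the timed-transcript default 1110101 enables start time, end time, caption mode, relative timestamp and colors.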

0.69 (2014-04-05)
-----------------
- A few patches from Christopher Small, including proper support
  for multiple multicast clients listening on the same port.
- GUI: Fixed teletext preview.
- GUI: Added a small indicator of data being received when reading from
  UDP.
- GUI: Added UTF-8 support to preview window (used for teletext).
- Fixes in Makefile and build script, compilation in Linux and OSX failed
  if another libpng was found in the system.
- WTV support directly in CCExtractor (no need for wtvccdump any more).
- Started refactoring and clean-up.
- Fix: MPEG clock rollover (happens every 26 hours) caused a time
  discontinuity.
- Windows GUI: Started work on HDHomeRun support. For now it just looks
  for HDHomeRun devices. Lots of other things will arrive in the next
  versions.
- Windows GUI: Some code refactoring, since the HDHomeRun support makes
  the code large enough to require more than one source file :-)
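
The 26-hour figure follows from the MPEG-2 system clock. Presentation timestamps are 33-bit counters running at 90 kHz, so they wrap after 2^33 / 90000 ≈ 95,444 seconds, about 26.5 hours. A common way to keep timing continuous across the wrap is to compute differences modulo 2^33. The Python sketch below illustrates that technique. It is not necessarily how CCExtractor handles the rollover, and pts_delta is a hypothetical name.

```python
PTS_MODULO = 1 << 33  # PTS is a 33-bit counter
PTS_HZ = 90_000       # PTS ticks per second (90 kHz)


def pts_delta(prev: int, cur: int) -> int:
    """Signed number of ticks from prev to cur, correct across a single wrap.

    Assumes consecutive timestamps are less than half the counter range
    (about 13 hours) apart, so a huge apparent jump is read as a wrap.
    """
    delta = (cur - prev) % PTS_MODULO
    if delta >= PTS_MODULO // 2:
        delta -= PTS_MODULO  # small backwards step rather than ~26 h forward
    return delta
```

Take a timestamp one second before the wrap and another one second after it. They come out 180,000 ticks (two seconds) apart, not minus 26 hours.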

0.68 (2013-12-24)
-----------------
- A couple of shared variables between 608 decoders were causing
  problems when both fields were processed at the same time with
  -12, fixed.
- Added BOM for UTF-8 files.
- Corrected a few extended characters in the UTF-8 encoding,
  probably never used in real-world captioning but since we got
  a good test sample file...
- Color and fonts in PAC commands were ignored, fixed (Helen Buus).
- Added a new output format, spupng. It consists of one .png file
  for each subtitle frame and one .xml with all the timing
  (Heleen Buus).
- Some fixes (Chris Small).

0.67 (2013-10-09)
-----------------
- Padding bytes were being discarded early in the process in 0.66,
  which is convenient for debugging, but it messes with timing in
  .raw, which depends on padding. Fixed.
- MythTV's branch had a fixed-size buffer that might not be large enough
  at times. Made dynamic.
- Better support for PAT changing mid-stream.
- Removed quotes in Start in .smi (format fix).
- Added multicast support (Chris Small)
- Added ability to select IP address to bind in UDP (Chris Small)
- Fixes in -unixts and -delay for teletext.
- Added -autodash: When two people are talking, add a dash as
  needed (this is based on subtitle position). Only in .srt and
  with -trim. Quite experimental, feedback appreciated.
- Added -latin1 to select Latin 1 as encoding. Default is now
  UTF-8 (-utf8 still exists but it's not needed).
- Added -ru1, which emulates a (non-existing in real life) 1-line
  roll-up mode.

0.66 (2013-07-01)
-----------------
- Fixed bug in auto detection code that triggered a message
  about file being out of sync.
- Added -investigate_packets
  The PMT is used to select the most promising elementary stream
  to get captions from. Sometimes captions are where you least
  expect them, so -datapid allows you to select an elementary stream
  manually, in case the CC location is not obvious from the PMT
  contents. To assist looking for the right stream, the parameter
  "-investigate_packets" will have CCExtractor look inside each
  stream, looking for CC markers, and report the streams that
  are likely to contain CC data even if it can't be determined from
  their PMT entry.
- Added -datastreamtype to manually select a stream based on
  its type instead of its PID. Useful if your recording program
  always hides the captions under the same stream type.
- Added -streamtype so if an elementary stream is selected manually
  for processing, the streamtype can be selected too. This can be
  needed if you process, for example, a stream that is declared as
  "private MPEG" in the PMT, so CCExtractor can't tell what it is.
  Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6
  (MPEG private data).
- PMT content listing improved, it now shows the stream type for
  more types.
- Fixes in roll-up, cursor was being moved to column 1 if an
  RU2, RU3 or RU4 was received even if already in roll-up mode.
- Added -autoprogram. If a multiprogram TS is processed and
  -autoprogram is used, CCExtractor will analyze all PMTs and use
  the first program that has a suitable data stream.
- Timed transcript (ttxt) now also exports the caption mode
  (roll-up, paint-on, etc.) next to each line, as it's useful to
  detect things like commercials.
- Content Advisory information from XDS is now decoded if it's
  transmitted in "US TV parental guidelines" or "MPA".
  Other encodings such as Canada's are not supported yet due
  to lack of samples.
- Copy Management information from XDS is now decoded.
- Added -xds. If present and export format is timed transcript
  (only), XDS information will be saved to file (same file as the
  transcript, with XDS being clearly marked). Note that for now
  all XDS data is exported even if it doesn't change, so the
  transcript file will be significantly larger.
- Added some PaintOn support, at least enough to prevent it
  from breaking things when the other modes are used.
- Removed afd_data() warning. AFD doesn't carry any caption-related
  data. AFD is still detected in code in case we want to do something
  with it later anyway.
- Ported the latest changes from Petr Kutalek's telxcc. Current version
  is 2.4.4.
- In teletext mode when exporting to transcript (not .srt), an effort
  is made to detect and merge line duplicates. This is done by using
  the Levenshtein distance, which is the number of changes required
  to convert one string to another. To simplify things, strings are
  compared up to the length of the shortest one.
  There are 3 parameters that can be used to tweak the thresholds:
    -deblev: Enable debug so the calculated distance for each two
      strings is displayed. The output includes both strings, the
      calculated distance, the maximum allowed distance, and whether
      the strings are ultimately considered equivalent or not, i.e.
      whether the calculated distance is less than or equal to the max allowed.
    -levdistmincnt value: Minimum distance we always allow
      regardless of the length of the strings. Default 2. This means
      that if the calculated distance is 0, 1 or 2, we consider the
      strings to be equivalent.
    -levdistmaxpct value: Maximum distance we allow, as a
      percentage of the shorter string's length. Default 10%. For
      example, consider a comparison of one string of 30 characters
      and one of 60 characters. We want to determine whether the
      first 30 characters of the longer string are more or less the
      same as the shorter string, i.e. whether the longer string
      is the shorter one plus new characters and maybe some
      corrections. Since the shorter string is 30 characters and
      the default percentage is 10%, we would allow a distance of
      up to 3 between the first 30 characters.
- Added -lf: Use UNIX line terminator (LF) instead of Windows (CRLF).
- Added -noautotimeref: Prevent UTC reference from being auto set from
  the stream data.
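
The teletext duplicate check described for 0.66 can be sketched in Python as follows. The entry does not say how -levdistmincnt and -levdistmaxpct combine. This sketch assumes the allowed distance is the larger of the two, which reproduces the 30-character example: max(2, 10% of 30) = 3. It also assumes the percentage is rounded down. Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def lines_equivalent(s1: str, s2: str, mincnt: int = 2, maxpct: int = 10) -> bool:
    """Hypothetical duplicate test modelled on -levdistmincnt / -levdistmaxpct."""
    n = min(len(s1), len(s2))  # compare only up to the shorter length
    dist = levenshtein(s1[:n], s2[:n])
    # Assumption: the larger of the two thresholds applies, rounded down.
    max_allowed = max(mincnt, n * maxpct // 100)
    return dist <= max_allowed
```

With the defaults, two 30-character prefixes that differ in 3 positions count as duplicates and are merged. A difference of 4 keeps both lines.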

0.65 (2013-03-14)
-----------------
- Minor GUI changes for teletext
- Added end timestamps in timed transcripts
- Added support for SMPTE (patch by John Kemp)
- Initial support for MPEG2 video tracks inside MP4 files (thanks a
  lot to GPAC's Jean who assisted in analyzing the sample and
  making the required changes in GPAC).
- Improved MP4 auto detection
- Support for PCR if PTS is not available (needed for some teletext
  samples, and probably useful for everything else).
- Support for UDP streaming - finally. Use "-udp $port" to have
  CCExtractor listen for a stream. I've only been able to test it
  with a European HDHomeRun, but it should work fine with any other
  tuner.
- Refactored PMT / PAT processing in transport streams, which now allows
  displaying their contents (-parsePAT and -parsePMT) and makes
  troubleshooting easier.

0.64 (2012-10-29)
-----------------
- Changed Windows GUI size (larger).
- Added Teletext options to GUI.
- Added -teletext to force teletext mode even if not detected
- Added -noteletext to disable teletext detection. This can be needed
  for streams that have both 608 data and teletext packets if you
  need to process the 608 data (if teletext is detected it will
  take precedence otherwise).
- Added -datapid to force a specific elementary stream to be used for
  data (bypassing detections).
- Added -ru2 and -ru3 to limit the number of visible lines in roll-up
  captions (bypassing whatever the broadcast says).
- Added support for a .hex (hexadecimal) dump of data.
- Added support for wtv in Windows. This is done by using a new program
  (wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that
  process the .wtv using DirectShow's filters and export the line 21 data
  to a .hex file. The GUI calls wtvccdump.exe as needed.
- Added --nogoptime to force PTS timing even when CCExtractor would
  use GOP timing otherwise.
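
Limiting roll-up captions to N visible rows, as -ru2 and -ru3 do, comes down to keeping only the newest N rows each time a new row starts. The following is a minimal Python model using a hypothetical RollUpWindow class. It is not the decoder's actual data structure.

```python
from collections import deque


class RollUpWindow:
    """Hypothetical model of a roll-up caption area limited to max_rows rows."""

    def __init__(self, max_rows: int):
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        # A deque with maxlen discards the oldest row automatically.
        self.rows = deque(maxlen=max_rows)

    def new_row(self, text: str) -> None:
        """Roll the display up by one row and show text on the bottom row."""
        self.rows.append(text)

    def visible(self) -> list:
        """Rows currently on screen, top to bottom."""
        return list(self.rows)
```

With max_rows=2, the third row pushes the first one off screen, regardless of the roll-up depth the broadcast signalled.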

0.63 (2012-08-17)
-----------------
- Teletext support added, by integrating Petr Kutalek's telxcc. Integration is
  still quite basic (there's equivalent code from both CCExtractor and
  telxcc) and some clean up is needed, but it works. Petr has announced that
  he's abandoning telxcc so further development will happen directly in
  CCExtractor.
- Some bug fixes, as usual.

0.62 (2012-05-23)
-----------------
- Corrected Mac build "script" (needed to add GPAC includes). Thanks to the
  Mac users that sent this.
- Hauppauge mode now uses PES timing, needed for files that don't have
  caption data during all the video (such as in commercial breaks).
- Added -mp4 and -in:mp4 to force the input to be processed as MP4.
- CC608 data embedded in a separate stream (as opposed to in the video
  stream itself) in MP4 files is now supported (not heavily tested).
  This should be rather useful since closed captioned files from iTunes
  use this format.
- More CEA-708 work. The debugger is now able to dump the "TV" contents for
  the first time. Also, a .srt can be generated, however timing is not quite
  good yet (still need to figure out why).
- Added -svc (or --service) to select the CEA-708 services to be processed.
  For example, -svc 1,2 will process the primary and secondary language
  services. Valid values are 1-63, where 1 is the primary language, 2 is
  the secondary language (this is part of the specification) and 3-63 are
  provider defined.
- Rajesh Hingorani sent a fix for the MPEG decoder that fixes garbled output
  on certain samples (we had none like this in our test collection). Thanks,
  Rajesh.
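
The -svc argument is a comma-separated list of CEA-708 service numbers. The Python sketch below validates such a list against the 1-63 range described for 0.62. parse_services and service_kind are hypothetical names. Silently ignoring repeated numbers is an assumption.

```python
def parse_services(arg: str) -> list:
    """Hypothetical parser for a -svc style list such as "1,2"."""
    services = []
    for part in arg.split(","):
        number = int(part)  # raises ValueError on non-numeric input
        if not 1 <= number <= 63:
            raise ValueError(f"service {number} is outside 1-63")
        if number not in services:  # assumption: repeats are ignored
            services.append(number)
    return services


def service_kind(number: int) -> str:
    """Classify a valid service number following the 0.62 entry."""
    if number == 1:
        return "primary language"
    if number == 2:
        return "secondary language"
    return "provider defined"
```

For example, "1,2" selects the primary and secondary language services, while "0" or "64" is rejected.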

0.61 (2012-03-08)
-----------------
- Fix: GCC 3.4.4 can now build CCExtractor.
- Fix: Damaged TS packets (those that come with 'error in transport' bit
  on) are now skipped.
- Fix: Part of the changes for MP4 support (CC packets buffering in
  particular) broke some stuff for other files, causing at least very
  annoying character duplication. We hope we've fixed it without breaking
  anything, but please report any problems.
- Some non-interesting cleanup.

0.60 (unreleased)
-----------------
- Add: MP4 support, using GPAC (a media library). Integration is currently
  "enough so it works", but needs some more work. There's some duplicate
  code, the stream must be a file (no streaming), etc.
- Fix: The Windows version was writing text files with double \r.
- Fix: Closed caption blocks with no data could cause a crash.
- Fix: -noru (to generate files without duplicate lines in
  roll-up) was broken, with complete lines being missing.
- Fix: bin format not working as input.

0.59 (2011-10-07)
-----------------
- More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now.
- Fix: Roll-up captions with interruptions for Text (with ResumeTextDisplay
  in the middle of the caption data) were missing complete lines.
- Added a timed text transcript output format, probably only useful for
  roll-up captions. Use --timedtranscript or -ttxt. Output is like this:

    00:01:25,485 | HOST: LAST NIGHT THE REPUBLICAN
    00:01:29,522 | HOPEFULS INTRODUCE THEMSELVES TO
    00:01:30,623 | PRIMARY VOTERS.

- XDS parser. Not complete (no point in dealing with V-Chip stuff, for
  example), but enough to extract program and station information.
- Input streams can now come from standard input using - (just a hyphen)
  as parameter.
- Added a new output format called 'null' (use -null or -out=null). This
  format means "Don't produce any file", and is useful to have CCExtractor
  process the stream (for XDS messages, debugging, etc) without actually
  generating anything.
- Updated Windows GUI.
- Added -quiet => If used, CCExtractor will not write any message.
- Added -stdout => If used, the captions will be sent to stdout (console)
  instead of to a file. Combined with -, CCExtractor can work as a filter in
  a larger process, receiving the stream from stdin and sending the
  captions to stdout.
- Some code clean up, minor refactoring.
- Teletext detection (not yet processing).
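
Each line of the 0.59 timed transcript sample has three parts: a start time in HH:MM:SS,mmm form, a " | " separator, and the caption text. The Python sketch below produces that layout from a millisecond timestamp. format_timestamp and ttxt_line are hypothetical names. Later versions add end times and caption modes, which this sketch leaves out.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as HH:MM:SS,mmm (the style used in the sample)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def ttxt_line(start_ms: int, text: str) -> str:
    """One timed-transcript line: start time, " | ", caption text."""
    return f"{format_timestamp(start_ms)} | {text}"
```

For instance, a caption starting at 85,485 ms is rendered as "00:01:25,485 | ...", matching the first sample line.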

0.58 (2011-08-21)
-----------------
- Implemented new PTS-based mode to order the caption information
  of AVC/H.264 data streams. The old pic_order_cnt_lsb based method
  is still available via the -poc or --usepicorder command switches.
- Removed a couple of those annoying "Impossible!" error messages
  that appeared when processing some (possibly broken, unsure) files.
- Added -nots (--notypesettings) to prevent italics and underline
  codes from being displayed.
- Note to those not liking the paragraph symbol being used for the
  music note: Submit a VALID replacement in latin-1.
- Added preliminary support for multiple program TS files. The
  parameter --program-number (or -pn) will let you choose which
  program number to process. If no number is passed and the TS
  file contains more than one, CCExtractor will display a list of
  found programs and terminate.
- Added support (basic, because I only received one sample) for some
  Hauppauge cards that save CC data in their own format. Use the
  parameter -haup to enable it (CCExtractor will display a notice
  if it thinks that it's processing a Hauppauge capture anyway).
- Fixed bug in roll-up.
- More AVC work, now TS files from EchoStar that produced garbled
  output are processed OK.
- Updated Windows GUI.

0.57 (2010-12-16)
-----------------
- Bug fixes in the Windows version. Some debug code was unintentionally
  left in the released version.

0.56 (2010-12-09)
-----------------
- H264 support
- Other, much less important, minor changes

0.55 (2009-08-09)
-----------------
- Replace pattern matching code with improved parser for MPEG-2 elementary
  streams.
- Fix parsing of ReplayTV 5000 captions.
- Add ability to decode SCTE 20 encoded captions.
- Make decoding of TS files more error tolerant.
- Start implementation of EIA-708 decoding (not active yet).
- Add -gt / --goptime switch to use GOP timing instead of PTS timing.
- Start implementation of AVC/H.264 decoding (not active yet).
- Fixed: The basic problem is that when 24fps movie film gets converted to 30fps NTSC,
  every 4th frame is repeated. Some pics have 3 fields of CC data, with field 3 CC data
  belonging to the same channel as field 1. The following pics have the fields reversed
  because of the odd number of fields. I used top_field_first to tell when the channels
  are reversed. See Table 6-1 of SCTE 20. [Paul Fernquist]

0.54 (2009-04-16)
-----------------
- Add -nosync and -fullbin switches for debugging purposes.
- Remove -lg (--largegops) switch.
- Improve synchronization of captions for source files with
  jumps in their time information or gaps in the caption
  information.
- [R. Abarca] Changed Mac script, it now compiles/links
  everything from the /src directory.
- It's now possible to have CCExtractor add credits
  automatically.
- Added a feature to add start and end messages (for credits).
  See help screen for details.

0.53 (2009-02-24)
-----------------
- Force generated RCWT files to have the same length as source file.
- Fix documentation for -startat / -endat switches.
- Make -startat / -endat work with all output formats.
- Fix sync check for raw/rcwt files.
- Improve timing of dvr-ms NTSC captions.
- Add -in=bin switch to read CCExtractor's own binary format.
- Fix problem with short input files (smaller than 1 MB).
- Clean up regular and debug output.
- Add -out=bin switch to write RCWT data.
- Remove -bo/--bufferoutput switch and functionality.
- [Volker] Added new generic binary format (RCWT
  for Raw Captions With Time). This new format
  allows one file to contain all the available
  closed caption data instead of just one stream.
- Added --no_progress_bar to disable status
  information (mostly used when debugging, as the
  progress information is annoying in the middle
  of debug logs).
- The Windows GUI was reported to freeze in some
  conditions. Fixed.
- The Windows GUI is now targeted for .NET 2.0
  instead of 3.5. This allows Windows 2000 to run
  it (there's no .NET 3.5 for Windows 2000), as
  requested by a couple of key users.

0.51 (unreleased)
-----------------
- Removed -autopad and -goppad, no longer needed.
- In preparation for a new binary format we have
  renamed the current .bin to .raw. Raw files
  have only CC data (with no header, timing, etc.).
- The input file format (when forced) is now
  specified with
    -in=format
  such as -in=ts, -in=raw, -in=ps ...
  The old switches (-ts, -ps, etc.) still work.
  The only exception is -bin which has been removed
  (reserved for the new binary format). Use
  -in=raw to process a raw file.
- Removed -d, which, when producing a raw file, used
  a DVD format. This has been merged into a new
  output type "dvdraw". So now instead of using
  -raw -d as before, use -out=dvdraw if you need
  this.
- Removed --noff
- Added gui_mode_reports for frontend communications,
  see related file.
- Windows GUI rewritten. Source code now included,
  too.
- [Volker] Dish Network clean-up

0.50 (2008-12-12)
-----------------
- [Volker] Fix in DVR-MS NTSC timing
- [Volker] More clean-up
- Minor fixes

0.49 (2008-12-10)
-----------------
- [Volker] Major MPEG parser rework. Code much
  cleaner now.
- Some stations transmit broken roll-up captions,
  and for some reason don't send CRs but RUs...
  Added work-around code to make captions readable.
- Started work on EIA-708 (DTV). Right now you can
  add -debug-708 to get a dump of the 708 data.
  An actually useful decoder will come soon.
- Some of the changes MIGHT HAVE BROKEN MythTV's
  code. I don't use MythTV myself so I rely on
  other people's samples and reports. If MythTV
  is broken please let me know.
- Added new debug options.
- [Volker] Added support for DVR-MS NTSC files.
- Other minor bug fixes and changes.

0.46 (2008-11-24)
-----------------
- Added support for live streaming: CCExtractor
  can now process files that are being recorded
  at the same time.

- [Volker] Added a new DVR-MS loop - this is
  completely new, DVR-MS specific code, so we no
  longer use the generic MPEG code for DVR-MS.
  DVR-MS should now be (or at least will eventually
  be) as reliable as TS.
  Note: For now, it's only ATSC recordings, not
  NTSC (analog) recordings.

0.45 (2008-11-14)
-----------------
- Added auto-detection of DVR-MS files.
- Added -asf to force DVR-MS mode.
- Added some specific support for DVR-MS
  files. This format used to work
  correctly in 0.34 (pure luck) but the
  MPEG code rework broke it. It should
  work as it used to.
- Updated Windows GUI to support the
  new options.
- Added -lg --largegops
  From the help screen:
    Each Group of Pictures comes with timing
    information. When this info is too far apart
    (for example because there are a lot of
    frames in a GOP) ccextractor may prefer not
    to use GOP timing. Use this option if you
    need ccextractor to use GOP timing in large
    GOPs.

0.44 (2008-09-10)
-----------------
- Added an option to the GUI to process
  individual files in batch, i.e. call
  ccextractor once per file. Use it if you
  want to process several unrelated files
  in one go.
- Added an option to prevent duplicate
  lines in roll-up captions.
- Several minor bug fixes.
- Updated the GUI to add the new options.

0.43 (2008-06-20)
-----------------
- Fixed a bug in the read loop (no less)
  that caused some files to fail when
  reading without buffering (which is
  the default in the Linux build).
- Several improvements in the GUI, such as
  saving current options as default.

0.42 (2008-06-17)
-----------------
- The option switch "-transcript" has been
  changed to "--transcript". Also, "-txt"
  has been added as the short alias.
- Windows GUI
- Updated help screen

0.41 (2008-06-15)
-----------------
- Default output is now .srt instead of .bin;
  use -raw if you need the data dump instead of
  .srt.
- Added -trim, which removes blank spaces at
  the left and right of each line in .srt.
  Note that those spaces are there to help
  deaf people know if the person talking is
  at the left or the right of the screen, i.e.
  they aren't useless. But if they annoy
  you, go ahead...
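The -trim behaviour described above amounts to stripping whitespace from both ends of each caption line. A minimal sketch, assuming each line is a NUL-terminated string; trim_line is a hypothetical name, not CCExtractor's actual function:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes leading and trailing whitespace from one
   caption line, in place. */
void trim_line(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* Drop trailing whitespace. */
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* Find the first non-whitespace character. */
    while (start < len && isspace((unsigned char)s[start]))
        start++;

    /* Shift the remaining text and its terminator to the front. */
    memmove(s, s + start, len - start + 1);
}
```

Working in place avoids an extra buffer per line; a line made only of spaces ends up empty.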

0.40 (2008-05-20)
-----------------
- Fixed a bug in the sanity check function
  that caused the Myth branch to abort.
- Fixed the OSX build script; it needed a
  new #define to work.

0.39 (2008-05-11)
-----------------
- Added -transcript. If used, the output will
  have no time information. Also, if in roll-up
  mode there will be no repeated lines.
- Lots of changes in the MPEG parser, most of
  them submitted by Volker Quetschke.
- Fixed a bug in the CC decoder that could cause
  the first line not to be cleared in roll-up
  mode.
- CCExtractor can now follow number sequences in
  file names, by suffixing the name with +.
  For example,

    DVD0001.VOB+

  means DVD0001.VOB, DVD0002.VOB, etc. This works
  for all files, so part001.ts+ does what you
  would expect.
- Added -90090 which changes the clock frequency
  from the MPEG standard 90000 Hz to 90090 Hz. It
  *could* (remains to be seen) help if there are
  timing issues.
- Better support for Tivo files.
- By default ccextractor now considers the whole
  input file list as one large file, instead of
  several independent video files. This has
  been changed because most programs (for example
  DVDDecrypt) just cut the files by size.
  If you need the old behaviour (because you
  actually edited the video files and want to
  join the subs), use -ve.
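A minimal sketch of how a name+ argument such as DVD0001.VOB+ could be expanded into the next file name. The rule used here (increment the last run of digits before the extension and keep its width, so 009 becomes 010) and the name next_in_sequence are assumptions for illustration, not CCExtractor's code:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the file name that follows `name` in a
   numbered sequence, e.g. "DVD0001.VOB" -> "DVD0002.VOB".
   Returns 1 on success, 0 if there is nothing to increment, the counter
   would need an extra digit, or `out` is too small. */
int next_in_sequence(const char *name, char *out, size_t outsz)
{
    size_t len = strlen(name);
    if (len + 1 > outsz)
        return 0;

    /* Only look at the last path component, before its extension,
       so "part001.mp4" does not bump the "4" in the extension. */
    const char *slash = strrchr(name, '/');
    size_t base = slash ? (size_t)(slash - name) + 1 : 0;
    const char *dot = strrchr(name + base, '.');
    size_t end = dot ? (size_t)(dot - name) : len;

    /* Move back to just after the last digit in that range. */
    while (end > base && !isdigit((unsigned char)name[end - 1]))
        end--;
    if (end == base)
        return 0;

    memcpy(out, name, len + 1);

    /* Add one with carry, right to left, staying inside the digit run. */
    for (size_t i = end; i > base && isdigit((unsigned char)out[i - 1]); i--) {
        if (out[i - 1] != '9') {
            out[i - 1]++;
            return 1;
        }
        out[i - 1] = '0';
    }
    return 0; /* all nines, e.g. "99": would need one more digit */
}
```

A caller would keep asking for the next name until the file does not exist. When a counter such as 99 would need an extra digit, this sketch simply reports failure instead of guessing a new width.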

0.36 (unreleased)
-----------------
- Fixed bug in SMI: the &nbsp entity was missing its trailing ;.
- Footer for SAMI files was incorrect (<body> and
  <sami> tags were being opened again instead of
  being closed).
- Displayed memory is now written to disk at end
  of stream even if there is no command requesting
  it (may prevent losing the last screen-full).
- Important change that could break scripts, but
  that has been made because the old behaviour was
  annoying to most people: _1 and _2 at the end
  of the output file names are now added ONLY if
  -12 is used (i.e. when there are two output
  files to produce). So

    ccextractor -srt sopranos.mpg

  now produces sopranos.srt instead of sopranos_1.srt.
  If you use -12, i.e.

    ccextractor -srt -12 sopranos.mpg

  you get

    sopranos_1.srt and
    sopranos_2.srt

  as usual.
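The naming rule above can be sketched as follows, assuming the output name is simply the input name with its extension replaced. make_output_name and its parameters are hypothetical and only illustrate the rule, not CCExtractor's implementation:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: "_1"/"_2" are appended only when both fields
   are written (-12). Returns 1 if the name fit into `out`. */
int make_output_name(const char *input, const char *ext, int field,
                     int both_fields, char *out, size_t outsz)
{
    /* Strip the input extension: "sopranos.mpg" -> "sopranos". */
    const char *dot = strrchr(input, '.');
    int base_len = dot ? (int)(dot - input) : (int)strlen(input);
    int n;

    if (both_fields)
        n = snprintf(out, outsz, "%.*s_%d.%s", base_len, input, field, ext);
    else
        n = snprintf(out, outsz, "%.*s.%s", base_len, input, ext);

    /* snprintf reports the length it wanted; detect truncation. */
    return n >= 0 && (size_t)n < outsz;
}
```

For example, make_output_name("sopranos.mpg", "srt", 1, 0, ...) yields sopranos.srt, while passing both_fields = 1 with field 2 yields sopranos_2.srt.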

0.35 (unreleased)
-----------------
- Added --defaultcolor to the help screen. Code
  was already in 0.34 but the documentation wasn't
  updated.
- Buffer is larger now, since I've found a sample
  where 256 Kb isn't enough for a PES (go figure).
- At the end of the process, a ratio between
  video length and time to process is displayed.

0.34 (2007-06-03)
-----------------
- Added some basic letter case and capitalization
  support. For captions that are broadcast in ALL
  UPPERCASE (most of them), ccextractor can now
  do the first part of the job.

  --sentencecap or -sc will tell ccextractor to
  follow the typical capitalization rules, such
  as capitalizing months, days of the week, etc.

  So from
    YOU BETTER RESPECT
    THIS ROBE, ALAN

  you get

    You better respect
    this robe, alan.

  --capfile or -caf also enables the case
  processing part and adds an extra list of
  words in the specified file, for example:

    --capfile names.txt

  where names.txt is just a plain text file
  with the proper spelling for some words,
  such as

    Alan
    Tony

  So you get

    You better respect
    this robe, Alan.

  which is the correct spelling. You can
  have a different spelling file per TV
  show, or a large file with a lot of
  words, etc.
- ccextractor has been reported to
  compile and run on Mac with a minor
  change in the build script, so I've
  created a mac directory with the
  modified script. I haven't tested it
  myself.
- Windows build comes with a File Version
  Number (0.0.0.34 in this version) in case
  you want to check for version info.
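A rough sketch of the two-step idea above: sentence case first, then a capfile-style word list. It is an assumption about how such processing could work, not CCExtractor's implementation; sentence_case is a hypothetical name, and a built-in list of months or weekdays could be passed through the same words parameter:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first len characters. */
static int same_word(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical sketch: sentence-case an ALL-CAPS caption in place, then
   restore the spelling of any whole word found in words[] (matched
   case-insensitively, like entries of a capfile). */
void sentence_case(char *text, const char *const *words, size_t nwords)
{
    int new_sentence = 1;

    /* Pass 1: lowercase everything except the first letter of a sentence. */
    for (char *p = text; *p; p++) {
        if (isalpha((unsigned char)*p)) {
            *p = (char)(new_sentence ? toupper((unsigned char)*p)
                                     : tolower((unsigned char)*p));
            new_sentence = 0;
        } else if (*p == '.' || *p == '!' || *p == '?') {
            new_sentence = 1;
        }
    }

    /* Pass 2: replace whole words that appear in the list. */
    for (char *p = text; *p;) {
        if (!isalpha((unsigned char)*p)) {
            p++;
            continue;
        }
        char *start = p;
        while (isalpha((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        for (size_t i = 0; i < nwords; i++) {
            if (strlen(words[i]) == len && same_word(start, words[i], len)) {
                memcpy(start, words[i], len);
                break;
            }
        }
    }
}
```

Because a line break is not treated as the end of a sentence in this sketch, the two caption lines of the example stay in one sentence, which is why "this" remains lowercase.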

0.33 (unreleased)
-----------------
- Added -scr or --screenfuls, to select the
  number of screenfuls ccextractor should
  write before exiting. A screenful is
  a change of screen contents caused by
  a CC command (not new characters). In
  practice, this means that for .srt each
  group of lines is a screenful, except when
  using -dru (which produces a lot of
  groups of lines because each new character
  produces a new group).
- Completed tables for all encodings.
- Fixed bug in .srt related to milliseconds
  in time lines.
- Font colors are back for .srt (apparently
  some programs do support them after all).
  Use -nofc or --nofontcolor if you don't
  want these tags.

0.32 (unreleased)
-----------------
- Added -delay ms, which adds (or subtracts)
  a number of milliseconds to all times in
  .srt/.sami files. For example,

    -delay 400

  causes all subtitles to appear 400 ms later
  than they normally would, and

    -delay -400

  causes all subtitles to appear 400 ms earlier
  than they normally would.
- Added -startat and -endat, which let you
  select just a portion of data to be processed,
  such as from minute 3 to minute 5. Check the
  help screen for exact syntax.
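.srt timestamps have the form HH:MM:SS,mmm. A minimal sketch of applying a -delay offset and formatting the result; format_srt_time is a hypothetical name, and clamping times that would become negative to zero is an assumption (the changelog does not say what happens in that case):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: apply a -delay offset (in ms, may be negative)
   and format the result as an .srt timestamp "HH:MM:SS,mmm".
   Negative results are clamped to zero (assumption). */
void format_srt_time(long long ms, long long delay_ms, char *buf, size_t n)
{
    long long t = ms + delay_ms;
    if (t < 0)
        t = 0;

    long long hours = t / 3600000;
    int minutes = (int)(t / 60000 % 60);
    int seconds = (int)(t / 1000 % 60);
    int millis = (int)(t % 1000);

    snprintf(buf, n, "%02lld:%02d:%02d,%03d", hours, minutes, seconds, millis);
}
```

For instance, 3723004 ms formats as 01:02:03,004, and a caption at 1000 ms with -delay -400 is written at 00:00:00,600.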

0.31 (unreleased)
-----------------
- Added -dru (direct rollup), which causes
  roll-up captions to be written as
  they would appear on TV instead of line by line.
  This makes .srt/.sami files a lot longer,
  and ugly too (each line is written many
  times, two characters at a time).

0.30 (2007-05-24)
-----------------
- Fix in extended char decoding: I wasn't
  replacing the previous char.
- When a sequence code was found before
  having a PTS, reported time was
  undefined.

0.29 (unreleased)
-----------------
- Minor bug fix.

0.28 (unreleased)
-----------------
- Fixed a buffering related issue. Short version:
  the first 2 Mb in non-TS mode were being
  discarded.
- .srt no longer has <font> tags. No player
  seems to process them so my guess is that
  they are not part of the .srt "standard"
  even if McPoodle adds them.

0.27 (unreleased)
-----------------
- Modified sanitizing code; it's less aggressive
  now. Ideally it should mean that characters
  won't be missed anymore. We'll see.

0.26 (unreleased)
-----------------
- Added -gp (or -goppad) to make ccextractor use
  GOP timing. Try it for non-TS files where
  subs start OK but desync as the video advances.

0.25 (unreleased)
-----------------
- Format detection is not perfect yet. I've added
  -nomyth to prevent the MythTV code path from being
  called. I've seen apparently correct files that
  make MythTV's MPEG decoder choke. So, if it
  doesn't work correctly automatically: Try
  -nomyth and -myth. Hopefully one of the two
  options will work.

0.24 (unreleased)
-----------------
- Fixed a bug that caused dvr-ms (Windows Media Center)
  files to be incorrectly processed (letters out of
  order all the time).
- Reworked input buffer code, faster now.
- Completed MythTV's MPEG decoder for Program Streams,
  which results in better processing of some specific
  files.
- Automatic file format detection for all kinds of
  files and closed caption storage methods. No need to
  tell ccextractor anything about your file (but you
  still can).

0.22 (2007-05-15)
-----------------
- Added text mode handling into decoder, which gets rid
  of junk when text mode data is present.
- Added support for certain (possibly not standard-
  compliant) DVDs that add more caption blocks in a
  user data block than they should (such as Red October).
- Fix in roll-up init code that caused the previous pop-up
  captions not to be written to disk.
- Other minor bug fixes.

0.20 (2007-05-07)
-----------------
- Unicode should be decent now.
- Added support for Hauppauge PVR 250 cards, and (possibly)
  many others (bttv) with the same closed caption recording
  format.
  This is the result of hacking MythTV's MPEG parser into
  CCExtractor. Integration is not very good (to put it
  mildly) but it seems to work. Depending on the feedback I
  may continue working on this or just leave it 'as is'
  (good enough).
  If you want to process a file generated by one of these
  analog cards, use -myth. This is essential as it will
  make the program take a totally different code path.
- Added .SAMI generation. I'm sure this can be improved,
  though. If you have a good CSS for .SAMI files let me
  know.

0.19 (2007-05-03)
-----------------
- Work on Dish Network streams, where timing was completely broken.
  It's fixed now at least for the samples I have; if it's not
  completely fixed let me know. Credit for this goes to
  Jack Ha, who sent me a couple of samples and a first
  implementation of a semi-working fix.
- Added support for several input files (see help screen for
  details).
- Added Unicode and Latin-1 encoding.

0.17 (2007-04-29)
-----------------
- Extraction to .srt is almost complete: it works correctly for
  pop-up and roll-up captions, possibly not yet for paint-on
  (mostly because I don't have any sample with paint-on captions
  so I can't test).
- Minor bug fixes.
- Automatic TS/non-TS mode detection.
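MPEG transport stream packets are 188 bytes long and each begins with the sync byte 0x47. That regularity is what makes both TS detection and resynchronizing on a file that does not start at a packet boundary possible. A minimal sketch; find_ts_sync and the number of consecutive packets it requires are assumptions, not CCExtractor's actual detection logic:

```c
#include <stddef.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE 0x47

/* Returns the first offset at which `needed` consecutive 188-byte packets
   each start with the 0x47 sync byte, or -1 if there is none.
   How many packets to require is a tuning choice (assumed by the caller). */
long find_ts_sync(const unsigned char *buf, size_t len, int needed)
{
    for (size_t start = 0; start < len; start++) {
        int ok = 1;
        for (int k = 0; k < needed; k++) {
            size_t pos = start + (size_t)k * TS_PACKET_SIZE;
            if (pos >= len || buf[pos] != TS_SYNC_BYTE) {
                ok = 0;
                break;
            }
        }
        if (ok)
            return (long)start;
    }
    return -1;
}
```

A reader could treat the input as TS when find_ts_sync(buf, len, 3) returns a non-negative offset and start parsing packets there. Requiring several packets makes a chance 0x47 byte in a program stream much less likely to be mistaken for TS.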

0.14 (2007-04-25)
-----------------
- Work on handling special cases related to the MPEG reference
  clock: roll-over, jumps, etc.
- Modified padding code a bit: In particular, padding occurs
  on B-Frames now.
- Started work on CC data parsing (use -608 to see output).
- Added built-in input buffering.
- Major code reorganization.
- Added a decent progress indicator.
- Added TS header synchronization (so the input file no longer
  needs to start with a TS header).
- Minor bug fixes.
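MPEG timestamps (PTS, and the base part of SCR/PCR) are 33-bit counters at 90 kHz, so they wrap around roughly every 26.5 hours (2^33 / 90000 s). A minimal sketch of roll-over-safe elapsed time, assuming the later value really is later and at most one wrap happened in between; genuine jumps (discontinuities) need separate handling that this sketch does not attempt, and the function names are hypothetical:

```c
#include <stdint.h>

#define PTS_MODULUS (UINT64_C(1) << 33) /* 33-bit timestamp counter */

/* Ticks elapsed from prev to cur, both in [0, 2^33). Unsigned
   subtraction plus masking gives the difference modulo 2^33, so a
   wrap from 2^33 - 1 back to 0 still counts as one tick. */
uint64_t pts_elapsed(uint64_t prev, uint64_t cur)
{
    return (cur - prev) & (PTS_MODULUS - 1);
}

/* Convert 90 kHz ticks to milliseconds (90 ticks per ms). */
uint64_t pts_to_ms(uint64_t ticks)
{
    return ticks / 90;
}
```

For example, going from 2^33 - 90 to 90 is 180 ticks, i.e. 2 ms, rather than a huge negative jump.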

0.07 (2007-04-19)
-----------------
- Added MPEG reference clock parsing.
- Added auto padding in TS. Does miracles with timing.
- Added video information (as extracted from sequence header).
- Some code clean-up.
- FF sanity check enabled by default.
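The entry does not say which reference clock is meant (PTS, SCR or PCR). As one illustration, this is how a 33-bit PTS is unpacked from the five timestamp bytes of a PES header, following the layout in ISO/IEC 13818-1 (4 prefix bits, then 3 + 15 + 15 timestamp bits separated by marker bits); parse_pts is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical sketch: decode a 33-bit PTS from the 5 timestamp bytes
   of a PES header.
   byte 0: 4-bit prefix | PTS[32..30] | marker
   byte 1: PTS[29..22]
   byte 2: PTS[21..15] | marker
   byte 3: PTS[14..7]
   byte 4: PTS[6..0]   | marker */
uint64_t parse_pts(const unsigned char p[5])
{
    return ((uint64_t)(p[0] >> 1) & 0x07) << 30
         | (uint64_t)p[1] << 22
         | ((uint64_t)p[2] >> 1) << 15
         | (uint64_t)p[3] << 7
         | (uint64_t)p[4] >> 1;
}
```

The bytes 21 00 05 BF 21 (hex), for instance, decode to 90000, which is one second at 90 kHz.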