aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
- Version: 1.7.3
- Date: 2017-03-15
- Developed by: ReadBeyond
- Lead Developer: Alberto Pettarin
- License: the GNU Affero General Public License Version 3 (AGPL v3)
- Contact: email@example.com
- Quick Links: Home - GitHub - PyPI - Docs - Tutorial - Benchmark - Mailing List - Web App
aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
1 => [00:00:00.000, 00:00:02.640] From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880] That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240] But as the riper should by time decease, => [00:00:09.240, 00:00:11.920] His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280] But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800] Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760] Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680] Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240] Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400] And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920] Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640] And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640] Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080] To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
This synchronization map can be output to file in several formats, depending on its application:
- research: Audacity (AUD), ELAN (EAF), TextGrid;
- digital publishing: SMIL for EPUB 3;
- closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
- Web: JSON;
- further processing: CSV, SSV, TSV, TXT, XML.
All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.
The generic OS-independent procedure is simple:
Make sure the following executables can be called from your shell:
pip install numpy pip install aeneas
To check whether you installed aeneas correctly, run:
python -m aeneas.diagnostics
Run without arguments to get the usage message:
python -m aeneas.tools.execute_task python -m aeneas.tools.execute_job
You can also get a list of live examples that you can immediately run on your machine thanks to the included files:
python -m aeneas.tools.execute_task --examples python -m aeneas.tools.execute_task --examples-all
To compute a synchronization map
map.jsonfor a pair (
text.txtin plain text format), you can run:
python -m aeneas.tools.execute_task \ audio.mp3 \ text.txt \ "task_language=eng|os_task_file_format=json|is_text_type=plain" \ map.json
(The command has been split into lines with
\for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)
To compute a synchronization map
map.smilfor a pair (
audio.mp3, page.xhtml containing fragments marked by
f001), you can run:
python -m aeneas.tools.execute_task \ audio.mp3 \ page.xhtml \ "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \ map.smil
As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.
If you have several tasks to process, you can create a job container to batch process them:
python -m aeneas.tools.execute_job job.zip output_directory
job.zip should contain a
configuration file, providing aeneas
with all the information needed to parse the input assets
and format the output sync map files.
Documentation and Support
- Documentation: http://www.readbeyond.it/aeneas/docs/
- Command line tools tutorial: http://www.readbeyond.it/aeneas/docs/clitutorial.html
- Library tutorial: http://www.readbeyond.it/aeneas/docs/libtutorial.html
- Old, verbose tutorial: A Practical Introduction To The aeneas Package
- Mailing list: https://groups.google.com/d/forum/aeneas-forced-alignment
- Changelog: http://www.readbeyond.it/aeneas/docs/changelog.html
- High level description of how aeneas works: HOWITWORKS
- Development history: HISTORY
- Testing: TESTING
- Benchmark suite: https://readbeyond.github.io/aeneas-benchmark/
- Input text files in
- Multilevel input text files in
- Text extraction from XML (e.g., XHTML) files using
- Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
- Input audio file formats: all those readable by
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
- Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
- MFCC and DTW computed via Python C extensions to reduce the processing time
- Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
- Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
- Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
- Batch processing of multiple audio/text pairs
- Download audio from a YouTube video
- In multilevel mode, recursive alignment from paragraph to sentence to word level
- In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
- Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
- Adjustable splitting times, including a max character/second constraint for CC applications
- Automated detection of audio head/tail
- Output an HTML file for fine tuning the sync map manually (
- Execution parameters tunable at runtime
- Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
- Extensive test suite including 1,200+ unit/integration/performance tests, that run and must pass before each release
aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.
Licenses for third party code and files included in aeneas can be found in the licenses directory.
No copy rights were harmed in the making of this project.
July 2015: Michele Gianella generously supported the development of the boundary adjustment code (v1.0.4)
August 2015: Michele Gianella partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
September 2015: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)
October 2015: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0)
April 2016: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0
December 2016: the Centro Internazionale Del Libro Parlato "Adriano Sernagiotto" (Feltre, Italy) partially sponsored the development of the v1.7 series
Would you like supporting the development of aeneas?
I accept sponsorships to
- fix bugs,
- add new features,
- improve the quality and the performance of the code,
- port the code to other languages/platforms, and
- improve the documentation.
Feel free to get in touch.