aeneas Library Tutorial
Although a majority of
aeneas users work with the built-in command line tools,
aeneas is primarily designed for being used as a Python library.
aeneas.tools can be used programmatically,
thanks to their standard I/O interface.
Create a Task and process it, outputting the resulting sync map to file:
```python
#!/usr/bin/env python
# coding=utf-8

from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# create Task object
config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"
task.sync_map_file_path_absolute = u"/path/to/output/syncmap.json"

# process Task
ExecuteTask(task).execute()

# output sync map to file
task.output_sync_map_file()
```
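The configuration string used above is a list of key=value pairs separated by the | character. The following minimal sketch shows how such a string can be split into a dictionary. The helper name parse_config_string is hypothetical and is not part of aeneas, which performs its own parsing and validation.

```python
def parse_config_string(config_string):
    """Split a 'key=value|key=value' string into a dict (illustration only)."""
    pairs = {}
    for item in config_string.split("|"):
        if not item:
            continue  # skip empty segments, such as the one after a trailing "|"
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("missing '=' in %r" % item)
        pairs[key] = value
    return pairs


config = parse_config_string(
    "task_language=eng|is_text_type=plain|os_task_file_format=json"
)
print(config["task_language"])
print(config["os_task_file_format"])
```

Seeing the string as a plain mapping makes it easier to compare with the explicit property-setting style shown later in this tutorial.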
You can also call a command line tool from Python, through its class in aeneas.tools:
```python
#!/usr/bin/env python
# coding=utf-8

from aeneas.tools.execute_task import ExecuteTaskCLI

ExecuteTaskCLI(use_sys=False).run(arguments=[
    None, # dummy program name argument
    u"/path/to/input/audio.mp3",
    u"/path/to/input/plain.txt",
    u"task_language=eng|is_text_type=plain|os_task_file_format=json",
    u"/path/to/output/syncmap.json"
])
```
Clearly, you can also manipulate objects programmatically.
Create a Task, process it, and print all fragments in the resulting sync map whose duration is less than five seconds:
```python
#!/usr/bin/env python
# coding=utf-8

from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# create Task object
config_string = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config_string)
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"

# process Task
ExecuteTask(task).execute()

# print fragments with a duration < 5 seconds
for fragment in task.sync_map_leaves():
    if fragment.length < 5.0:
        print(fragment)
```
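A similar filter can be applied to a sync map that has already been written to disk with os_task_file_format=json. The sketch below uses only the standard library and assumes that the JSON file has a top-level "fragments" list whose items carry "id", "begin", and "end" fields, with times stored as strings. The sample data is made up for illustration, so check the layout against an actual output file before relying on it.

```python
import json
from decimal import Decimal

# Hand-written sample in the layout assumed above (not real aeneas output).
SAMPLE = """
{"fragments": [
  {"id": "f001", "begin": "0.000", "end": "2.680", "lines": ["first fragment"]},
  {"id": "f002", "begin": "2.680", "end": "9.120", "lines": ["second fragment"]},
  {"id": "f003", "begin": "9.120", "end": "11.000", "lines": ["third fragment"]}
]}
"""


def short_fragments(sync_map, max_duration):
    """Return (id, duration) pairs for fragments shorter than max_duration."""
    result = []
    for fragment in sync_map["fragments"]:
        # Decimal keeps the millisecond values exact
        duration = Decimal(fragment["end"]) - Decimal(fragment["begin"])
        if duration < max_duration:
            result.append((fragment["id"], duration))
    return result


short = short_fragments(json.loads(SAMPLE), Decimal("5.0"))
for identifier, duration in short:
    print(identifier, duration)
```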
Instead of passing around configuration strings, you can set properties explicitly, using the library functions and constants.
Create a Task, process it, and print the resulting sync map:
```python
#!/usr/bin/env python
# coding=utf-8

from aeneas.exacttiming import TimeValue
from aeneas.executetask import ExecuteTask
from aeneas.language import Language
from aeneas.syncmap import SyncMapFormat
from aeneas.task import Task
from aeneas.task import TaskConfiguration
from aeneas.textfile import TextFileFormat
import aeneas.globalconstants as gc

# create Task object
config = TaskConfiguration()
config[gc.PPN_TASK_LANGUAGE] = Language.ENG
config[gc.PPN_TASK_IS_TEXT_FILE_FORMAT] = TextFileFormat.PLAIN
config[gc.PPN_TASK_OS_FILE_FORMAT] = SyncMapFormat.JSON
task = Task()
task.configuration = config
task.audio_file_path_absolute = u"/path/to/input/audio.mp3"
task.text_file_path_absolute = u"/path/to/input/plain.txt"

# process Task
ExecuteTask(task).execute()

# print produced sync map
print(task.sync_map)
```
aeneas lists the following Python packages as requirements:
- numpy (v1.9 or later)
- lxml (v3.6.0 or later)
- BeautifulSoup (v4.5.1 or later)
numpy is actually needed, as it is heavily used for the alignment computation.
The other two dependencies (lxml and
BeautifulSoup) are needed
only if you use XML-like input or output formats.
However, since they are popular Python packages, to avoid complex import testing,
they are listed as requirements.
This choice might change in the future.
Depending on what
aeneas classes you want to use,
you might need to install the following optional dependencies:
Speeding Critical Sections Up: Python C/C++ Extensions
Forced alignment is a computationally demanding task, both CPU-intensive and memory-intensive. Aligning a dozen minutes of audio might require an hour if done with pure Python code.
Hence, critical sections of the alignment code are written as Python C/C++ extensions, that is, C/C++ code that receives input from Python code, performs the heavy computation, and returns results to the Python code. The rule of thumb is that the C/C++ code performs only “computation-like”, low-level functions, while “house-keeping”, high-level functions are done in Python land.
With this approach, aligning a dozen minutes of audio
requires only a few seconds, and even aligning hours of audio
can be done in a few minutes.
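To see why pure Python is slow here, consider dynamic time warping (DTW), which this tutorial lists among the compiled extensions. The sketch below is a textbook DTW on two short sequences of numbers, not the aeneas implementation, which operates on MFCC matrices. It fills a table with one cell per pair of elements, so the work grows with the product of the two sequence lengths, and that kind of nested loop is exactly what runs much faster in compiled code.

```python
def dtw_cost(a, b):
    """Return the minimum cumulative alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor cells
            cost[i][j] = distance + min(
                cost[i - 1][j],      # a advances, b element is repeated
                cost[i][j - 1],      # b advances, a element is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

The second sequence repeats one value, and DTW absorbs the repetition at no cost, which is the property that makes it suitable for matching audio that is spoken at a different pace.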
The drawback is that your environment must be able to compile
Python C/C++ extensions. If you install aeneas with pip
(pip install aeneas), the compilation step is done automatically for you.
Due to the Python C/C++ extension compile and setup mechanism,
you must install numpy before installing aeneas,
and there is no (sane) way for the aeneas setup script
to install numpy before compiling the aeneas source code.
Hence, you really need to (manually) install numpy before aeneas.
Hopefully this inconvenience will be removed in the future.
The Python C/C++ extensions included in aeneas are:
aeneas.cdtw, for computing the DTW;
aeneas.cew, for synthesizing text via eSpeak;
aeneas.cfw, for synthesizing text via Festival;
aeneas.cmfcc, for computing a MFCC representation of a WAVE (RIFF) audio file;
aeneas.cwave, for reading WAVE (RIFF) audio files.
aeneas.cew is available on Linux, Mac OS X, and Windows.
On 64-bit Windows it does not seem to work, probably because
eSpeak is available only as a 32-bit program/library;
in that case, aeneas falls back to running the pure Python code.
Starting with v1.5.0, the pure Python code
for synthesizing text with eSpeak
is only 2-3 times slower than aeneas.cew.
Unless you work with thousands of text fragments,
the performance difference is negligible.
aeneas.cfw is experimental and disabled by default.
Probably it works only on Linux.
To compile it, make sure you have installed
(e.g., install the
festival-dev package on DEB-based OSes) and
set the environment variable
pip install aeneas or
aeneas.cwave is currently not used.
It will be enabled in a future version of aeneas.
Except for “enumeration” classes (e.g., Language) and
“data-only” classes (e.g.,
TextFragment), most classes
are subclasses of Loggable,
which provides the ability to log events using a shared
Logger object (logger)
and to inject runtime execution parameters using a shared
RuntimeConfiguration object (rconf).
The logger can tee (i.e., store messages and print them to stdout)
or dump to file.
The rconf provides a way to fine-tune aeneas
by changing its internal behavior.
The library defaults should be fine for most use cases,
and they do not require explicitly passing an rconf object.
Process a task with custom parameters, and log messages:
```python
from aeneas.exacttiming import TimeValue
from aeneas.executetask import ExecuteTask
from aeneas.logger import Logger
from aeneas.runtimeconfiguration import RuntimeConfiguration

# create Logger which logs and tees
logger = Logger(tee=True)

# create RuntimeConfiguration object, with custom MFCC length and shift
rconf = RuntimeConfiguration()
rconf[RuntimeConfiguration.MFCC_WINDOW_LENGTH] = TimeValue(u"0.150")
rconf[RuntimeConfiguration.MFCC_WINDOW_SHIFT] = TimeValue(u"0.050")

# create Task object
task = ...

# process Task with custom parameters
ExecuteTask(task, rconf=rconf, logger=logger).execute()
```
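The idea of a shared logger that tees can be illustrated without aeneas. The sketch below is a hypothetical stand-in: TeeLogger and Worker are invented names and do not mirror the real Logger API. It only shows the pattern described above, where one logger object is passed to several components, every message is stored, and messages are echoed to stdout when tee is enabled.

```python
import sys


class TeeLogger(object):
    """Hypothetical stand-in: stores messages and optionally echoes them."""

    def __init__(self, tee=False, stream=None):
        self.tee = tee
        self.stream = stream if stream is not None else sys.stdout
        self.entries = []

    def log(self, message):
        self.entries.append(message)  # always keep the message
        if self.tee:
            self.stream.write(message + "\n")  # echo it when teeing


class Worker(object):
    """Hypothetical component that logs to the logger it is given."""

    def __init__(self, name, logger=None):
        self.name = name
        # fall back to a private, silent logger when none is passed
        self.logger = logger if logger is not None else TeeLogger()

    def run(self):
        self.logger.log("%s started" % self.name)


shared = TeeLogger(tee=True)
Worker("first", logger=shared).run()
Worker("second", logger=shared).run()
print(len(shared.entries))
```

Because both workers receive the same object, their messages end up in a single ordered record, which is the practical benefit of sharing one logger across a processing pipeline.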
If you read from/write to file, you should be fine
interacting only with Task objects.
For example, setting a path in task.text_file_path_absolute will
force the library to load the given file,
and to create a TextFile
object behind the scenes, storing it inside the Task object.
However, you can also build e.g. your own TextFile
and then assign it to your Task.
Create a TextFile programmatically, and assign it to Task:
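```python
from aeneas.language import Language
from aeneas.task import Task
from aeneas.textfile import TextFile
from aeneas.textfile import TextFragment

task = Task()
textfile = TextFile()
# each fragment has an identifier and a list of text lines
for identifier, frag_text in [
    (u"f001", [u"first fragment"]),
    (u"f002", [u"second fragment"]),
    (u"f003", [u"third fragment"])
]:
    textfile.add_fragment(TextFragment(identifier, Language.ENG, frag_text, frag_text))
task.text_file = textfile
```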
Starting with v1.5.0, both TextFile and
SyncMap are backed by the
Tree structure, which can represent multilevel I/O files.
Both have a “virtual” (empty) root node, to which the “level 1” nodes are attached.
Note that single-level text files and sync maps are a special case,
where only “level 1” nodes are present, producing a tree with a root node
and a list of children, effectively equivalent to the “list” structure pre-v1.5.0.
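The relationship between the single-level and the multilevel case can be sketched with a small tree. The Node class below is hypothetical and is not the aeneas Tree class. It only illustrates the shape described above: an empty root, level 1 nodes attached to it, and optional deeper levels.

```python
class Node(object):
    """Hypothetical tree node; value is None for the virtual root."""

    def __init__(self, value=None):
        self.value = value
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node

    def leaves(self):
        """Yield the values of the nodes without children, left to right."""
        if not self.children:
            if self.value is not None:  # an empty root yields nothing
                yield self.value
            return
        for child in self.children:
            for value in child.leaves():
                yield value


# Single-level case: the root has a flat list of level 1 children.
flat = Node()
for text in ["first fragment", "second fragment"]:
    flat.add_child(Node(text))

# Multilevel case: root, then a paragraph, then its sentences.
multi = Node()
paragraph = multi.add_child(Node("paragraph 1"))
paragraph.add_child(Node("sentence 1"))
paragraph.add_child(Node("sentence 2"))

print([child.value for child in flat.children])
print(list(multi.leaves()))
```

In the single-level case the children of the root and the leaves of the tree are the same nodes, which is why that case behaves like a plain list.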
- Ensuring that all the strings you pass to
aeneas are Unicode strings will save you a lot of headaches. If you read from files, be sure they are encoded using UTF-8.
- You can use any audio file format that is supported by
ffmpeg. If unsure, just try to play your audio file from the console: if it works there, it should work inside aeneas.
- Enumeration classes usually have an
ALLOWED_VALUES class member, which lists all the allowed values. For example:
Language.ALLOWED_VALUES. This list is used, for example, by the validator to check input values.
- Most classes are optimized for reducing memory consumption.
For example, if you create an
AudioFileMFCC with a file path, the input audio file will be converted to a temporary WAVE file, audio samples will be read into memory, MFCCs will be computed, and then audio data will be discarded from memory and the temporary WAVE file will be deleted, keeping only the MFCC matrix in memory. If you prefer persistence, you need to build intermediate objects yourself (e.g.,
AudioFile, etc.) and properly dispose of them in your code.
- Wherever possible,
NumPy views are used to avoid data copying. Similarly, built-in
NumPy functions are used to improve run time.
- To avoid numerical issues, always use
TimeValue to hold time values with arbitrary precision. Note that doing so incurs a negligible execution slowdown, because the heaviest computations are done with integer
NumPy indices and arrays and the transformation to
TimeValue takes place only when the sync map is output to file.
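The numerical issue mentioned in the last tip is easy to reproduce. The sketch below uses the standard decimal module as a stand-in for TimeValue and compares it with binary floating point when the same small step is accumulated several times, as happens when fragment boundaries are summed.

```python
from decimal import Decimal

float_total = 0.0
decimal_total = Decimal("0")
for _ in range(10):
    float_total += 0.1               # binary float: 0.1 is not exactly representable
    decimal_total += Decimal("0.1")  # decimal: 0.1 is stored exactly

print(float_total == 1.0)               # False
print(decimal_total == Decimal("1.0"))  # True
```

With floats, small representation errors accumulate and can make two boundaries that should coincide differ in their last digits. A decimal type avoids this for values written with a fixed number of decimal places.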
The aeneas package contains several subpackages:
aeneas.cdtw(Python C extension)
aeneas.cew(Python C extension)
aeneas.cfw(Python C++ extension)
aeneas.cmfcc(Python C extension)
aeneas.cwave(Python C extension)
and the following modules:
The aeneas.extra package contains some extra Python source files
which provide experimental and not officially supported functions,
mainly custom, not built-in TTS engine wrappers.
For example, if you want to write your own custom TTS engine wrapper,
have a look at the
aeneas/extra/ctw_espeak.py source file,
which is heavily commented and should be easy to modify for your own TTS engine.
The aeneas.tests package contains the unit test files for aeneas.
Resources needed to run the tests,
for example audio and text files,
are located in the
The aeneas.tools package contains the built-in command line tools for aeneas.
The two main tools are aeneas.tools.execute_task and aeneas.tools.execute_job,
which are described in the aeneas Built-in Command Line Tools Tutorial.
The aeneas.tools package also contains the following programs,
useful for debugging or converting between different file formats:
aeneas.tools.convert_syncmap: convert a sync map from one format to another
aeneas.tools.download: download a file from a Web resource (currently, audio from a YouTube video)
aeneas.tools.extract_mfcc: extract MFCCs from a monaural WAVE file
aeneas.tools.ffmpeg_wrapper: a wrapper around ffmpeg
aeneas.tools.ffprobe_wrapper: a wrapper around ffprobe
aeneas.tools.plot_waveform: plot a waveform and sets of labels to file
aeneas.tools.read_audio: read the properties of an audio file
aeneas.tools.read_text: read a text file and show the extracted text fragments
aeneas.tools.run_sd: read an audio file and the corresponding text file and detect the audio head/tail
aeneas.tools.run_vad: read an audio file and compute speech/nonspeech time intervals
aeneas.tools.synthesize_text: synthesize several text fragments read from file into a single WAVE file
aeneas.tools.validate: validate a job container or configuration strings/files
Run each program without arguments to get its help manual and usage examples.
Resources needed to run the live examples,
for example audio and text files,
are located in the
The package also contains the
which can run any of the tools listed above.
Run it without arguments to get its manual.