aeneas Built-in Command Line Tools Tutorial¶
This tutorial explains how to process tasks and jobs with
the command line tools aeneas.tools.execute_task and aeneas.tools.execute_job.
(If you are interested in using aeneas
as a Python package in your own application,
please consult the aeneas Library Tutorial.)
Processing Tasks¶
First, we need some definitions:
Audio File
An audio file is a file on disk containing audio data,
usually text narrated by a human being.
The audio format can be any of those supported by ffprobe and ffmpeg,
including: FLAC, MP3, MP4/AAC, OGG, WAVE, etc.
Example: /home/rb/audio.mp3
Text File
A text file is a file on disk containing the textual data
to be aligned with a matching audio file.
The format of the text file can be any format listed in TextFileFormat.ALLOWED_VALUES.
The contents of the text file define, explicitly or implicitly,
a segmentation of the entire text into fragments,
which can have arbitrary granularity (paragraph, sentence, sub-sentence, word, etc.),
can be nested in a hierarchical structure,
can consist of multiple lines,
and can be associated with unique identifiers.
Certain input formats require the user to specify
additional parameters to parse the input file.
Example of a text file /home/rb/text.txt in PLAIN format, with three fragments:
Text of the first fragment
Text of the second fragment
Text of the third fragment
Sync Map File
A sync map file is a file on disk
which expresses the correspondence between an audio file and a text file.
Specifically, for each fragment in the text file,
it declares a time interval in the audio file where
the text of the fragment is spoken.
The actual format of the sync map file depends on the intended application.
Available formats are listed in SyncMapFormat.ALLOWED_VALUES.
Text fragments can be represented by the full text and/or by their unique identifiers.
Example of a sync map file in CSV format:
f001,0.000,1.234,First fragment text
f002,1.234,5.678,Second fragment text
f003,5.678,7.890,Third fragment text
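As a rough illustration, a CSV sync map like the one above can be read back with Python's standard csv module. The sketch assumes the four-column layout of the example (identifier, begin, end, text) and that any fragment text containing commas is quoted; the Fragment type and the parse_csv_sync_map function are names invented for this example and are not part of aeneas.

```python
import csv
import io
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """One row of a CSV sync map (hypothetical helper type, not an aeneas class)."""
    identifier: str
    begin: float
    end: float
    text: str


def parse_csv_sync_map(csv_text: str) -> List[Fragment]:
    """Parse 'id,begin,end,text' rows into Fragment tuples."""
    fragments = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        identifier, begin, end, text = row
        fragments.append(Fragment(identifier, float(begin), float(end), text))
    return fragments


# The three rows of the CSV example shown in the text.
SAMPLE = """f001,0.000,1.234,First fragment text
f002,1.234,5.678,Second fragment text
f003,5.678,7.890,Third fragment text
"""

for fragment in parse_csv_sync_map(SAMPLE):
    duration = fragment.end - fragment.begin
    print(f"{fragment.identifier}: {duration:.3f} s  {fragment.text}")
```

Each fragment's end time equals the begin time of the next one, so the intervals cover the audio without gaps in this example.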
Task
A Task is a triple (audio file, text file, parameters).
When a task is processed (executed), a sync map is computed
for the given audio and text files.
The parameters control how the alignment is computed, for example:
- specifying the language and the format of the input text;
- setting the format of the sync map file to be output;
- excluding the head/tail of the audio file because they contain speech not present in the text;
- modifying the time step of the aligner;
- etc.
Example (continued):
- audio file: /home/rb/audio.mp3
- text file: /home/rb/text.txt
- parameters:
- text in PLAIN format
- language is ENGLISH
- output in JSON format
The aeneas.tools.execute_task tool processes a Task
and writes the corresponding sync map to file.
Therefore, it requires at least four arguments:
- the path of the input audio file;
- the path of the input text file;
- the parameters, formatted as a key1=value1|key2=value2|...|keyN=valueN string;
- the path of the sync map to be created.
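The four arguments listed above can be assembled programmatically. The following sketch only builds the argument list and does not run aeneas. The parameter keys are the ones that appear in the config strings printed by the built-in examples, while the output path /home/rb/syncmap.json and both function names are assumptions made for illustration.

```python
import sys
from typing import Dict, List


def build_config_string(parameters: Dict[str, str]) -> str:
    """Join parameters as key1=value1|key2=value2|...|keyN=valueN."""
    return "|".join(f"{key}={value}" for key, value in parameters.items())


def build_execute_task_command(
    audio_path: str,
    text_path: str,
    parameters: Dict[str, str],
    sync_map_path: str,
) -> List[str]:
    """Return the argument list for 'python -m aeneas.tools.execute_task'."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path,
        text_path,
        build_config_string(parameters),
        sync_map_path,
    ]


parameters = {
    "task_language": "eng",
    "is_text_type": "plain",
    "os_task_file_format": "json",
}
command = build_execute_task_command(
    "/home/rb/audio.mp3",
    "/home/rb/text.txt",
    parameters,
    "/home/rb/syncmap.json",  # hypothetical output path
)
print(command[3:])
```

Passing the result as a list to subprocess.run avoids shell quoting problems, since the | character in the config string would otherwise be interpreted as a pipe by most shells.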
Showing Help Messages¶
If you execute the program without arguments, it will print the following help message:
$ python -m aeneas.tools.execute_task
NAME
  execute_task - Execute a Task.
SYNOPSIS
  python -m aeneas.tools.execute_task [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_task --list-parameters
  python -m aeneas.tools.execute_task --list-values[=PARAM]
  python -m aeneas.tools.execute_task AUDIO_FILE  TEXT_FILE CONFIG_STRING OUTPUT_FILE [OPTIONS]
  python -m aeneas.tools.execute_task YOUTUBE_URL TEXT_FILE CONFIG_STRING OUTPUT_FILE -y [OPTIONS]
OPTIONS
  --faster-rate : print fragments with rate > task_adjust_boundary_rate_value
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --keep-audio : do not delete the audio file downloaded from YouTube (-y only)
  --largest-audio : download largest audio stream (-y only)
  --list-parameters : list all parameters
  --list-values : list all parameters for which values can be listed
  --list-values=PARAM : list all allowed values for parameter PARAM
  --output-html : output HTML file for fine tuning
  --presets-word : apply presets for word-level alignment (MFCC masking)
  --rate : print rate of each fragment
  --skip-validator : do not validate the given config string
  --version : print the program name and version and exit
  --zero : print fragments with zero duration
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values
  -y, --youtube : download audio from YouTube video
EXAMPLES
  python -m aeneas.tools.execute_task --examples
  python -m aeneas.tools.execute_task --examples-all
If you pass the --help argument,
it will print a slightly more verbose version:
$ python -m aeneas.tools.execute_task --help
NAME
  execute_task - Execute a Task.
SYNOPSIS
  python -m aeneas.tools.execute_task [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_task --list-parameters
  python -m aeneas.tools.execute_task --list-values[=PARAM]
  python -m aeneas.tools.execute_task AUDIO_FILE  TEXT_FILE CONFIG_STRING OUTPUT_FILE [OPTIONS]
  python -m aeneas.tools.execute_task YOUTUBE_URL TEXT_FILE CONFIG_STRING OUTPUT_FILE -y [OPTIONS]
OPTIONS
  --faster-rate : print fragments with rate > task_adjust_boundary_rate_value
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --keep-audio : do not delete the audio file downloaded from YouTube (-y only)
  --largest-audio : download largest audio stream (-y only)
  --list-parameters : list all parameters
  --list-values : list all parameters for which values can be listed
  --list-values=PARAM : list all allowed values for parameter PARAM
  --output-html : output HTML file for fine tuning
  --presets-word : apply presets for word-level alignment (MFCC masking)
  --rate : print rate of each fragment
  --skip-validator : do not validate the given config string
  --version : print the program name and version and exit
  --zero : print fragments with zero duration
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values
  -y, --youtube : download audio from YouTube video
EXAMPLES
  python -m aeneas.tools.execute_task --examples
  python -m aeneas.tools.execute_task --examples-all
EXIT CODES
  0 : no error
  1 : error
  2 : help shown, no command run
AUTHOR
  Alberto Pettarin, http://www.albertopettarin.it/
REPORTING BUGS
  Please use the GitHub Issues Web page : https://github.com/ReadBeyond/aeneas/issues/
COPYRIGHT
  2012-2016, Alberto Pettarin and ReadBeyond Srl
  This software is available under the terms of the GNU Affero General Public License Version 3
SEE ALSO
  Code repository  : https://github.com/ReadBeyond/aeneas/
  Documentation    : http://www.readbeyond.it/aeneas/docs/
  Project Web page : http://www.readbeyond.it/aeneas/
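The exit codes documented in the help message can be checked from a calling script. The sketch below is a minimal example using the standard subprocess module. To stay runnable without aeneas installed, it uses a stand-in command that simply exits with code 2; the run_and_describe function is a hypothetical helper, not something provided by aeneas.

```python
import subprocess
import sys
from typing import List, Tuple

# Meanings taken from the EXIT CODES section of the help message.
EXIT_CODE_MEANINGS = {
    0: "no error",
    1: "error",
    2: "help shown, no command run",
}


def run_and_describe(command: List[str]) -> Tuple[int, str]:
    """Run a command and return its exit code with the documented meaning."""
    completed = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    meaning = EXIT_CODE_MEANINGS.get(completed.returncode, "unknown exit code")
    return completed.returncode, meaning


# Stand-in for an aeneas invocation: a child Python process that exits with 2.
code, meaning = run_and_describe([sys.executable, "-c", "import sys; sys.exit(2)"])
print(code, meaning)
```

In real use, the command list would be an aeneas.tools.execute_task invocation, and a return code of 1 would signal that no sync map was produced.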
Showing And Running Built-In Examples¶
aeneas includes some example input files which cover common use cases,
enabling the user to run live examples.
To list them, pass the --examples switch:
$ python -m aeneas.tools.execute_task --examples
Example 1 (input: plain text, output: EAF)
  $ python -m aeneas.tools.execute_task --example-eaf
Example 2 (input: plain text, output: JSON)
  $ python -m aeneas.tools.execute_task --example-json
Example 3 (input: multilevel plain text (mplain), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-mplain-smil
Example 4 (input: multilevel unparsed text (munparsed), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-munparsed-smil
Example 5 (input: unparsed text, output: SMIL)
  $ python -m aeneas.tools.execute_task --example-smil
Example 6 (input: subtitles text, output: SRT)
  $ python -m aeneas.tools.execute_task --example-srt
Example 7 (input: parsed text, output: TextGrid)
  $ python -m aeneas.tools.execute_task --example-textgrid
Example 8 (input: parsed text, output: TSV)
  $ python -m aeneas.tools.execute_task --example-tsv
Example 9 (input: single word granularity plain text, output: AUD)
  $ python -m aeneas.tools.execute_task --example-words
Example 10 (input: audio from YouTube, output: TXT)
  $ python -m aeneas.tools.execute_task --example-youtube
Similarly, the --examples-all switch prints a list
of more than twenty built-in examples,
covering more specific input/output/parameter combinations.
$ python -m aeneas.tools.execute_task --examples-all
Example 1 (input: plain text (plain), output: AUD, aba beforenext 0.200)
  $ python -m aeneas.tools.execute_task --example-aftercurrent
Example 2 (input: plain text (plain), output: AUD, aba beforenext 0.200)
  $ python -m aeneas.tools.execute_task --example-beforenext
Example 3 (input: plain text, output: TSV, run via cewsubprocess)
  $ python -m aeneas.tools.execute_task --example-cewsubprocess
Example 4 (input: plain text, output: TSV, tts engine: ctw espeak)
  $ python -m aeneas.tools.execute_task --example-ctw-espeak
Example 5 (input: plain text, output: TSV, tts engine: ctw speect)
  $ python -m aeneas.tools.execute_task --example-ctw-speect
Example 6 (input: plain text, output: EAF)
  $ python -m aeneas.tools.execute_task --example-eaf
Example 7 (input: plain text (plain), output: SRT, print faster than 12.0 chars/s)
  $ python -m aeneas.tools.execute_task --example-faster-rate
Example 8 (input: plain text, output: TSV, tts engine: Festival)
  $ python -m aeneas.tools.execute_task --example-festival
Example 9 (input: mplain text (multilevel), output: JSON, levels to output: 1 and 2)
  $ python -m aeneas.tools.execute_task --example-flatten-12
Example 10 (input: mplain text (multilevel), output: JSON, levels to output: 2)
  $ python -m aeneas.tools.execute_task --example-flatten-2
Example 11 (input: mplain text (multilevel), output: JSON, levels to output: 3)
  $ python -m aeneas.tools.execute_task --example-flatten-3
Example 12 (input: plain text, output: TSV, explicit head and tail)
  $ python -m aeneas.tools.execute_task --example-head-tail
Example 13 (input: plain text, output: JSON)
  $ python -m aeneas.tools.execute_task --example-json
Example 14 (input: multilevel plain text (mplain), output: JSON)
  $ python -m aeneas.tools.execute_task --example-mplain-json
Example 15 (input: multilevel plain text (mplain), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-mplain-smil
Example 16 (input: multilevel plain text (mplain), different TTS engines, output: JSON)
  $ python -m aeneas.tools.execute_task --example-multilevel-tts
Example 17 (input: multilevel unparsed text (munparsed), output: JSON)
  $ python -m aeneas.tools.execute_task --example-munparsed-json
Example 18 (input: multilevel unparsed text (munparsed), output: SMIL)
  $ python -m aeneas.tools.execute_task --example-munparsed-smil
Example 19 (input: plain text, output: JSON, resolution: 0.500 s)
  $ python -m aeneas.tools.execute_task --example-mws
Example 20 (input: multilevel plain text (mplain), output: JSON, no zero duration)
  $ python -m aeneas.tools.execute_task --example-no-zero
Example 21 (input: plain text (plain), output: AUD, aba offset 0.200)
  $ python -m aeneas.tools.execute_task --example-offset
Example 22 (input: plain text (plain), output: AUD, aba percent 50)
  $ python -m aeneas.tools.execute_task --example-percent
Example 23 (input: plain text, output: JSON, pure python)
  $ python -m aeneas.tools.execute_task --example-py
Example 24 (input: plain text (plain), output: SRT, max rate 14.0 chars/s, print rates)
  $ python -m aeneas.tools.execute_task --example-rate
Example 25 (input: plain text (plain), output: SRT, remove nonspeech >=0.500 s)
  $ python -m aeneas.tools.execute_task --example-remove-nonspeech
Example 26 (input: plain text (plain), output: SRT, remove nonspeech >=0.500 s, max rate 14.0 chars/s, print rates)
  $ python -m aeneas.tools.execute_task --example-remove-nonspeech-rateaggressive
Example 27 (input: plain text (plain), output: AUD, replace nonspeech >=0.500 s with (sil))
  $ python -m aeneas.tools.execute_task --example-replace-nonspeech
Example 28 (input: plain text, output: TSV, head/tail detection)
  $ python -m aeneas.tools.execute_task --example-sd
Example 29 (input: unparsed text, output: SMIL)
  $ python -m aeneas.tools.execute_task --example-smil
Example 30 (input: subtitles text, output: SRT)
  $ python -m aeneas.tools.execute_task --example-srt
Example 31 (input: parsed text, output: TextGrid)
  $ python -m aeneas.tools.execute_task --example-textgrid
Example 32 (input: parsed text, output: TSV)
  $ python -m aeneas.tools.execute_task --example-tsv
Example 33 (input: single word granularity plain text, output: AUD)
  $ python -m aeneas.tools.execute_task --example-words
Example 34 (input: single word granularity plain text, output: AUD, tts engine: Festival, TTS cache on)
  $ python -m aeneas.tools.execute_task --example-words-festival-cache
Example 35 (input: mplain text (multilevel), output: AUD, levels to output: 3)
  $ python -m aeneas.tools.execute_task --example-words-multilevel
Example 36 (input: audio from YouTube, output: TXT)
  $ python -m aeneas.tools.execute_task --example-youtube
Running a built-in example is a quick way to learn the options and parameters
available in aeneas.
For example, passing the --example-json switch will produce:
$ python -m aeneas.tools.execute_task --example-json
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'
Warning
If the above command generates an error, be sure to have
a directory named output in your current working directory.
If one does not exist, create it.
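If you drive the tool from a script, the output directory can be created beforehand. This is a small sketch using only the standard library; the ensure_output_directory helper is an invented name, and the demonstration runs in a temporary directory so that it does not touch your working directory.

```python
import os
import tempfile


def ensure_output_directory(working_directory: str, name: str = "output") -> str:
    """Create working_directory/name if it is missing and return its path."""
    path = os.path.join(working_directory, name)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


# Demonstration in a scratch directory that is removed afterwards.
with tempfile.TemporaryDirectory() as scratch:
    created = ensure_output_directory(scratch)
    again = ensure_output_directory(scratch)  # second call is a no-op
    print(os.path.isdir(created), created == again)
```

In practice you would call ensure_output_directory(os.getcwd()) before running a built-in example.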
As the output of --example-json shows, built-in examples print the command line arguments they stand for. Therefore, running --example-json is essentially equivalent to:
$ python -m aeneas.tools.execute_task aeneas/tools/res/audio.mp3 aeneas/tools/res/plain.txt "task_language=eng|is_text_type=plain|os_task_file_format=json" output/sonnet.json
[INFO] Validating config string (specify --skip-validator to bypass)...
[INFO] Validating config string... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'
Note
There is a formal difference: when running an example,
no validation of the input files and parameters is performed.
When you pass your own arguments instead, they are validated by default using a
Validator object,
created and run automatically for you.
If a validation error occurs,
the execution of the Task does not begin.
You can override this safety check with the --skip-validator switch.
In both cases, a new file output/sonnet.json is created,
containing the sync map in JSON format:
{
 "fragments": [
  {
   "begin": "0.000", 
   "children": [], 
   "end": "2.640", 
   "id": "f000001", 
   "language": "eng", 
   "lines": [
    "1"
   ]
  }, 
  {
   "begin": "2.640", 
   "children": [], 
   "end": "5.880", 
   "id": "f000002", 
   "language": "eng", 
   "lines": [
    "From fairest creatures we desire increase,"
   ]
  }, 
  {
   "begin": "5.880", 
   "children": [], 
   "end": "9.240", 
   "id": "f000003", 
   "language": "eng", 
   "lines": [
    "That thereby beauty's rose might never die,"
   ]
  }, 
  {
   "begin": "9.240", 
   "children": [], 
   "end": "11.920", 
   "id": "f000004", 
   "language": "eng", 
   "lines": [
    "But as the riper should by time decease,"
   ]
  }, 
  {
   "begin": "11.920", 
   "children": [], 
   "end": "15.280", 
   "id": "f000005", 
   "language": "eng", 
   "lines": [
    "His tender heir might bear his memory:"
   ]
  }, 
  {
   "begin": "15.280", 
   "children": [], 
   "end": "18.800", 
   "id": "f000006", 
   "language": "eng", 
   "lines": [
    "But thou contracted to thine own bright eyes,"
   ]
  }, 
  {
   "begin": "18.800", 
   "children": [], 
   "end": "22.760", 
   "id": "f000007", 
   "language": "eng", 
   "lines": [
    "Feed'st thy light's flame with self-substantial fuel,"
   ]
  }, 
  {
   "begin": "22.760", 
   "children": [], 
   "end": "25.680", 
   "id": "f000008", 
   "language": "eng", 
   "lines": [
    "Making a famine where abundance lies,"
   ]
  }, 
  {
   "begin": "25.680", 
   "children": [], 
   "end": "31.240", 
   "id": "f000009", 
   "language": "eng", 
   "lines": [
    "Thy self thy foe, to thy sweet self too cruel:"
   ]
  }, 
  {
   "begin": "31.240", 
   "children": [], 
   "end": "34.400", 
   "id": "f000010", 
   "language": "eng", 
   "lines": [
    "Thou that art now the world's fresh ornament,"
   ]
  }, 
  {
   "begin": "34.400", 
   "children": [], 
   "end": "36.920", 
   "id": "f000011", 
   "language": "eng", 
   "lines": [
    "And only herald to the gaudy spring,"
   ]
  }, 
  {
   "begin": "36.920", 
   "children": [], 
   "end": "40.640", 
   "id": "f000012", 
   "language": "eng", 
   "lines": [
    "Within thine own bud buriest thy content,"
   ]
  }, 
  {
   "begin": "40.640", 
   "children": [], 
   "end": "43.640", 
   "id": "f000013", 
   "language": "eng", 
   "lines": [
    "And tender churl mak'st waste in niggarding:"
   ]
  }, 
  {
   "begin": "43.640", 
   "children": [], 
   "end": "48.080", 
   "id": "f000014", 
   "language": "eng", 
   "lines": [
    "Pity the world, or else this glutton be,"
   ]
  }, 
  {
   "begin": "48.080", 
   "children": [], 
   "end": "53.240", 
   "id": "f000015", 
   "language": "eng", 
   "lines": [
    "To eat the world's due, by the grave and thee."
   ]
  }
 ]
}
for the input file:
1
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.
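The JSON sync map can be consumed with Python's standard json module. The sketch below embeds the first two fragments of the output shown above as a string, so it runs without any files. Note that the begin and end values are stored as strings and must be converted to numbers. The summarize_sync_map function is a name invented for this example.

```python
import json

# First two fragments of the JSON sync map shown in the text.
SAMPLE_SYNC_MAP = """
{
 "fragments": [
  {"begin": "0.000", "children": [], "end": "2.640", "id": "f000001",
   "language": "eng", "lines": ["1"]},
  {"begin": "2.640", "children": [], "end": "5.880", "id": "f000002",
   "language": "eng", "lines": ["From fairest creatures we desire increase,"]}
 ]
}
"""


def summarize_sync_map(sync_map):
    """Return (id, begin, end, text) tuples, with times converted to floats."""
    rows = []
    for fragment in sync_map["fragments"]:
        begin = float(fragment["begin"])
        end = float(fragment["end"])
        text = " ".join(fragment["lines"])  # a fragment may span several lines
        rows.append((fragment["id"], begin, end, text))
    return rows


for identifier, begin, end, text in summarize_sync_map(json.loads(SAMPLE_SYNC_MAP)):
    print(f"{identifier}  {begin:7.3f} -> {end:7.3f}  {text}")
```

To process a real file, replace json.loads(SAMPLE_SYNC_MAP) with json.load applied to an open handle on output/sonnet.json.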
Verbose Output And Logging To File¶
If you want more verbose output, you can pass the -v or --verbose switch:
$ python -m aeneas.tools.execute_task --example-json -v
[DEBU] CLI: Formal arguments: [u'/home/alberto/ebooks/cloned/rb/aeneas/aeneas/tools/execute_task.py', u'--example-json', u'-v']
[DEBU] CLI: Actual arguments: [u'--example-json']
[DEBU] CLI: Runtime configuration: 'allow_unlisted_languages=False|c_extensions=True|cew_subprocess_enabled=False|cew_subprocess_path=python|dtw_algorithm=stripe|dtw_margin=60.000|ffmpeg_path=ffmpeg|ffmpeg_sample_rate=16000|ffprobe_path=ffprobe|job_max_tasks=0|mfcc_emphasis_factor=0.97|mfcc_fft_order=512|mfcc_filters=40|mfcc_lower_frequency=133.3333|mfcc_size=13|mfcc_upper_frequency=6855.4976|mfcc_window_length=0.100|mfcc_window_length_l1=0.500|mfcc_window_length_l2=0.100|mfcc_window_length_l3=0.020|mfcc_window_shift=0.040|mfcc_window_shift_l1=0.200|mfcc_window_shift_l2=0.040|mfcc_window_shift_l3=0.005|nuance_tts_api_retry_attempts=5|nuance_tts_api_sleep=1.000|task_max_audio_length=7200.0|task_max_text_length=0|tts=espeak|tts_path=espeak|vad_extend_speech_after=0.000|vad_extend_speech_before=0.000|vad_log_energy_threshold=0.699|vad_min_nonspeech_length=0.200'
[INFO] CLI: Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] CLI: Creating task...
[INFO] Creating task...
[DEBU] Task: Populate audio file...
[DEBU] Task: audio_file_path_absolute is None
[DEBU] Task: Populate audio file... done
[DEBU] Task: Populate text file...
[DEBU] Task: text_file_path_absolute and/or language is None
[DEBU] Task: Populate text file... done
[DEBU] Task: Populate audio file...
[DEBU] Task: audio_file_path_absolute is 'aeneas/tools/res/audio.mp3'
...
[DEBU] Task: Output sync map to output/sonnet.json
[DEBU] Task: sync_map_format is json
[DEBU] Task: page_ref is None
[DEBU] Task: audio_ref is None
[DEBU] Task: Calling sync_map.write...
[DEBU] Task: Calling sync_map.write... done
[INFO] CLI: Creating output sync map file... done
[INFO] Creating output sync map file... done
[SUCC] CLI: Created file 'output/sonnet.json'
[INFO] Created file 'output/sonnet.json'
[DEBU] CLI: Execution completed with code 0
There is also a -vv or --very-verbose switch
to increase the verbosity of the output.
Sometimes it is easier to dump the log to file, and then inspect it
with a text editor. To do so, just specify the -l switch:
$ python -m aeneas.tools.execute_task --example-json -l
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=json
  Sync map file: output/sonnet.json
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.json'
[INFO] Log written to file '/tmp/tmpyS_VBv.log'
The path of the log file will be printed.
By default, the log file will be created in the temporary directory of your OS.
If you want your log file to be created at a specific path,
use --log=/path/to/your.log instead of -l.
Note that you can specify both -v/-vv and -l/--log.
Input Text Formats¶
aeneas is able to read several text file formats, listed in
TextFileFormat:
- PLAIN, one fragment per line (example: --example-json):
  Text of the first fragment
  Text of the second fragment
  Text of the third fragment
- PARSED, one fragment per line, starting with an explicit identifier (example: --example-tsv):
  f001|Text of the first fragment
  f002|Text of the second fragment
  f003|Text of the third fragment
- SUBTITLES, fragments separated by a blank line, each fragment might span multiple lines. This format is suitable for creating subtitle sync map files (example: --example-srt):
  Fragment on a single row

  Fragment on two rows
  because it is quite long

  Another one liner

  Another fragment
  on two rows
- UNPARSED, XML file from which text fragments will be extracted by matching id and/or class attributes (example: --example-smil):
  <?xml version="1.0" encoding="UTF-8"?>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
   <head>
    <meta charset="utf-8"/>
    <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
    <title>Sonnet I</title>
   </head>
   <body>
    <div id="divTitle">
     <h1><span class="ra" id="f001">I</span></h1>
    </div>
    <div id="divSonnet">
     <p>
      <span class="ra" id="f002">From fairest creatures we desire increase,</span><br/>
      <span class="ra" id="f003">That thereby beauty’s rose might never die,</span><br/>
      <span class="ra" id="f004">But as the riper should by time decease,</span><br/>
      <span class="ra" id="f005">His tender heir might bear his memory:</span><br/>
      <span class="ra" id="f006">But thou contracted to thine own bright eyes,</span><br/>
      <span class="ra" id="f007">Feed’st thy light’s flame with self-substantial fuel,</span><br/>
      <span class="ra" id="f008">Making a famine where abundance lies,</span><br/>
      <span class="ra" id="f009">Thy self thy foe, to thy sweet self too cruel:</span><br/>
      <span class="ra" id="f010">Thou that art now the world’s fresh ornament,</span><br/>
      <span class="ra" id="f011">And only herald to the gaudy spring,</span><br/>
      <span class="ra" id="f012">Within thine own bud buriest thy content,</span><br/>
      <span class="ra" id="f013">And tender churl mak’st waste in niggarding:</span><br/>
      <span class="ra" id="f014">Pity the world, or else this glutton be,</span><br/>
      <span class="ra" id="f015">To eat the world’s due, by the grave and thee.</span>
     </p>
    </div>
   </body>
  </html>
- MPLAIN, the multilevel equivalent to PLAIN, with paragraphs separated by a blank line, one sentence per line, and words separated by blank spaces (example: --example-mplain-json):
  First sentence of Paragraph One.
  Second sentence of Paragraph One.

  First sentence of Paragraph Two.

  First sentence of Paragraph Three.
  Second sentence of Paragraph Three.
  Third sentence of Paragraph Three.
- MUNPARSED, the multilevel equivalent to UNPARSED (example: --example-munparsed-json):
  <?xml version="1.0" encoding="UTF-8"?>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
   <head>
    <meta charset="utf-8"/>
    <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
    <title>Sonnet I</title>
   </head>
   <body>
    <div id="divTitle">
     <h1>
      <span id="p000001">
       <span id="p000001s000001">
        <span id="p000001s000001w000001">I</span>
       </span>
      </span>
     </h1>
    </div>
    <div id="divSonnet">
     <p class="stanza" id="p000002">
      <span id="p000002s000001">
       <span id="p000002s000001w000001">From</span>
       <span id="p000002s000001w000002">fairest</span>
       <span id="p000002s000001w000003">creatures</span>
       <span id="p000002s000001w000004">we</span>
       <span id="p000002s000001w000005">desire</span>
       <span id="p000002s000001w000006">increase,</span>
      </span><br/>
      <span id="p000002s000002">
       <span id="p000002s000002w000001">That</span>
       <span id="p000002s000002w000002">thereby</span>
       <span id="p000002s000002w000003">beauty’s</span>
       <span id="p000002s000002w000004">rose</span>
       <span id="p000002s000002w000005">might</span>
       <span id="p000002s000002w000006">never</span>
       <span id="p000002s000002w000007">die,</span>
      </span><br/>
      ...
     </p>
     ...
    </div>
   </body>
  </html>
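To make the fragment segmentation of the simpler formats concrete, here is a minimal sketch of how PLAIN, PARSED, and SUBTITLES text could be split into fragments. It is not the parser used by aeneas: the function names are invented, and details such as skipping empty lines in PLAIN input or using | as the PARSED separator are assumptions based on the examples above.

```python
from typing import List, Tuple


def split_plain(text: str) -> List[str]:
    """PLAIN: one fragment per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_parsed(text: str, separator: str = "|") -> List[Tuple[str, str]]:
    """PARSED: each line is 'identifier|fragment text'."""
    fragments = []
    for line in split_plain(text):
        identifier, fragment_text = line.split(separator, 1)
        fragments.append((identifier, fragment_text))
    return fragments


def split_subtitles(text: str) -> List[List[str]]:
    """SUBTITLES: blank lines separate fragments; a fragment may span lines."""
    fragments, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            fragments.append(current)  # a blank line closes the fragment
            current = []
    if current:
        fragments.append(current)  # last fragment without trailing blank line
    return fragments


print(split_plain("Text of the first fragment\nText of the second fragment\n"))
print(split_parsed("f001|Text of the first fragment\nf002|Text of the second fragment"))
print(split_subtitles("Fragment on a single row\n\nFragment on two rows\nbecause it is quite long\n"))
```

The SUBTITLES splitter keeps the lines of each fragment separate, which is what a subtitle output format needs in order to preserve line breaks.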
If you use UNPARSED files,
you need to provide the following additional parameters:
- at least one of PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX and PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX, to select the elements from which text will be considered;
- PPN_TASK_IS_TEXT_UNPARSED_ID_SORT to specify how extracted elements should be sorted, based on their id attributes.
$ python -m aeneas.tools.execute_task --example-smil
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/page.xhtml
  Config string: task_language=eng|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.smil'
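The effect of is_text_unparsed_id_regex=f[0-9]+ and is_text_unparsed_id_sort=numeric can be pictured with the following sketch, which uses the standard xml.etree.ElementTree module. It is an illustration under stated assumptions, not the extraction code used by aeneas: it assumes the regex must match the whole id value, and it interprets "numeric" as sorting by the digits contained in the id. The sample XHTML is a shortened, deliberately shuffled variant of the page shown earlier.

```python
import re
import xml.etree.ElementTree as ET

# Shortened sample: two matching spans (out of order) and one non-matching id.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <p>
   <span class="ra" id="f010">Thou that art now the world's fresh ornament,</span>
   <span class="ra" id="f002">From fairest creatures we desire increase,</span>
   <span id="divNote">not selected</span>
  </p>
 </body>
</html>
"""


def extract_fragments(xhtml, id_regex):
    """Return (id, text) pairs for elements whose id fully matches id_regex."""
    pattern = re.compile(id_regex)
    root = ET.fromstring(xhtml)
    selected = []
    for element in root.iter():  # document order
        identifier = element.get("id")
        if identifier is not None and pattern.fullmatch(identifier):
            selected.append((identifier, "".join(element.itertext()).strip()))
    return selected


def numeric_key(item):
    """Sort key: the integer formed by the digits in the id (assumed meaning)."""
    digits = re.sub(r"[^0-9]", "", item[0])
    return int(digits) if digits else 0


fragments = sorted(extract_fragments(XHTML, r"f[0-9]+"), key=numeric_key)
for identifier, text in fragments:
    print(identifier, text)
```

Sorting matters because the order of the fragments must follow the order in which they are spoken in the audio file.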
Note
Even if you only specify the
PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX
regex, your XML elements still need to have id attributes.
This is required for e.g. SMIL output to make sense.
(Although the EPUB 3 Media Overlays specification allows you
to specify an EPUB CFI instead of an id value,
it is recommended to use id values
for maximum reading system compatibility,
and hence aeneas only outputs SMIL files with id references.)
Similarly, for MUNPARSED files
you need to provide the following additional parameters:
- PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX,
- PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX, and
- PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX.
$ python -m aeneas.tools.execute_task --example-munparsed-smil
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/munparsed.xhtml
  Config string: task_language=eng|is_text_type=munparsed|is_text_munparsed_l1_id_regex=p[0-9]+|is_text_munparsed_l2_id_regex=p[0-9]+s[0-9]+|is_text_munparsed_l3_id_regex=p[0-9]+s[0-9]+w[0-9]+|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.munparsed.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.munparsed.smil'
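The three regular expressions in the config string above assign each id to a level. The sketch below shows one way to think about that classification; it assumes whole-string matching, which is an assumption of this example rather than documented aeneas behavior, and level_of is an invented helper name.

```python
import re

# The three id regexes from the --example-munparsed-smil config string.
LEVEL_REGEXES = [
    r"p[0-9]+",                 # level 1: paragraph
    r"p[0-9]+s[0-9]+",          # level 2: sentence
    r"p[0-9]+s[0-9]+w[0-9]+",   # level 3: word
]


def level_of(identifier):
    """Return 1, 2, or 3 for a matching id, or None if no regex matches."""
    for level, regex in enumerate(LEVEL_REGEXES, start=1):
        if re.fullmatch(regex, identifier):
            return level
    return None


for identifier in ["p000002", "p000002s000001", "p000002s000001w000003", "divSonnet"]:
    print(identifier, level_of(identifier))
```

With prefix matching (re.match) instead of re.fullmatch, a sentence id such as p000002s000001 would also satisfy the level 1 regex, so the choice of matching mode matters when the id schemes of the levels share a prefix.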
Note
If you are interested in synchronizing at word granularity, it is strongly recommended to use:
- MFCC nonspeech masking;
- a multilevel text format, even if you are going to use only the timings for the finer granularity;
- better TTS engines, like Festival or AWS/Nuance TTS API;
as they generally yield more accurate timings.
(If you do not want the output sync map file to contain
the multilevel tree hierarchy for the timings,
you might “flatten” the output sync map file,
retaining only the word-level timings,
by using the configuration parameter
PPN_TASK_OS_FILE_LEVELS
with value 3).
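The idea of flattening can also be applied after the fact to a multilevel JSON sync map, whose fragments nest through their children lists. The sketch below keeps only the fragments found at a given depth. It mimics the idea and does not call aeneas; the sample tree, its timings, and the fragments_at_level function are all invented for illustration.

```python
def fragments_at_level(fragments, level, current=1):
    """Collect the fragments found at the given depth (1 = top level)."""
    selected = []
    for fragment in fragments:
        if current == level:
            flat = dict(fragment)
            flat["children"] = []  # drop deeper levels in the flattened copy
            selected.append(flat)
        else:
            selected.extend(
                fragments_at_level(fragment.get("children", []), level, current + 1)
            )
    return selected


# Invented sample: one paragraph, one sentence, two words.
SAMPLE_TREE = [
    {"id": "p000001", "begin": "0.000", "end": "2.640", "children": [
        {"id": "p000001s000001", "begin": "0.000", "end": "2.640", "children": [
            {"id": "p000001s000001w000001", "begin": "0.000", "end": "1.200", "children": []},
            {"id": "p000001s000001w000002", "begin": "1.200", "end": "2.640", "children": []},
        ]},
    ]},
]

for word in fragments_at_level(SAMPLE_TREE, 3):
    print(word["id"], word["begin"], word["end"])
```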
Since aeneas v1.7.0,
aeneas.tools.execute_task has a --presets-word switch
that enables MFCC nonspeech masking for single level tasks, or
MFCC nonspeech masking on level 3 (word) for multilevel tasks.
For example:
$ python -m aeneas.tools.execute_task --example-words
$ python -m aeneas.tools.execute_task --example-words --presets-word
$ python -m aeneas.tools.execute_task --example-words-multilevel
$ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
The other default settings should be fine for most users. However, if you need finer control, you can experiment with the following parameters.
Starting with aeneas v1.5.1,
you can specify different MFCC parameters for each level, see:
- MFCC_WINDOW_LENGTH_L1,
- MFCC_WINDOW_SHIFT_L1,
- MFCC_WINDOW_LENGTH_L2,
- MFCC_WINDOW_SHIFT_L2,
- MFCC_WINDOW_LENGTH_L3,
- MFCC_WINDOW_SHIFT_L3.
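These per-level settings are passed through the -r (runtime configuration) switch listed in the help message. The sketch below assumes that the constants above correspond to the lower-case keys printed in the verbose output (for example mfcc_window_length_l3). The default values are copied from that output, while the override values and both function names are purely illustrative.

```python
def parse_configuration(string):
    """Split a key1=value1|key2=value2 string into a dictionary."""
    pairs = (item.split("=", 1) for item in string.split("|") if item)
    return {key: value for key, value in pairs}


def runtime_configuration_argument(overrides):
    """Format overrides as a single -r=CONF command line argument."""
    conf = "|".join(f"{key}={value}" for key, value in overrides.items())
    return f"-r={conf}"


# Per-level defaults as they appear in the verbose output of --example-json.
DEFAULTS = parse_configuration(
    "mfcc_window_length_l1=0.500|mfcc_window_shift_l1=0.200|"
    "mfcc_window_length_l2=0.100|mfcc_window_shift_l2=0.040|"
    "mfcc_window_length_l3=0.020|mfcc_window_shift_l3=0.005"
)

# Illustrative values only, not a recommendation.
overrides = {"mfcc_window_length_l3": "0.040", "mfcc_window_shift_l3": "0.010"}
argument = runtime_configuration_argument(overrides)

print(DEFAULTS["mfcc_window_length_l3"], "->", overrides["mfcc_window_length_l3"])
print(argument)
```

The resulting argument can be appended to a command such as the --example-words-multilevel invocation shown above. When typing it in a shell, quote it, because the | character separates the key/value pairs.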
Starting with aeneas v1.6.0,
you can also specify a different TTS engine for each level.
Starting with aeneas v1.7.0,
you can specify the MFCC nonspeech masking, for both
single level tasks and multilevel tasks.
In the latter case, you can apply it to each level separately.
If you are using a multilevel text format, you might want to enable MFCC masking only for level 3 (word), as enabling it for levels 1 and 2 does not seem to yield significantly better results.
The aeneas mailing list contains some interesting threads
about using aeneas for word-level synchronization.
Output Sync Map Formats¶
aeneas is able to write the sync map into several formats, listed in
SyncMapFormat.
As for the input text, certain output sync map formats
require the user to specify additional parameters
to correctly create the output file.
For example,
SMIL
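requires the os_task_file_smil_audio_ref and os_task_file_smil_page_ref parameters, both set in the config string below.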
Example:
$ python -m aeneas.tools.execute_task --example-smil
[INFO] Running example task with arguments:
  Audio file:    aeneas/tools/res/audio.mp3
  Text file:     aeneas/tools/res/page.xhtml
  Config string: task_language=eng|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_task_file_format=smil|os_task_file_smil_audio_ref=p001.mp3|os_task_file_smil_page_ref=p001.xhtml
  Sync map file: output/sonnet.smil
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.smil'
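The configuration string shown in the example above is a sequence of name=value pairs joined by the | character. The following is a minimal Python sketch for assembling such a string from a dictionary. The helper name build_config_string is hypothetical and not part of aeneas, and rejecting | inside names or values is a precaution of this sketch, not documented aeneas behavior.

```python
def build_config_string(params):
    """Join parameter names and values as name=value pairs separated by '|'."""
    parts = []
    for name, value in params.items():
        text = str(value)
        # Precaution of this sketch: a '|' inside a pair would look like a separator.
        if "|" in name or "|" in text:
            raise ValueError("'|' is not allowed in %r" % name)
        parts.append("%s=%s" % (name, text))
    return "|".join(parts)


# Parameters copied from the SMIL example above.
smil_params = {
    "task_language": "eng",
    "is_text_type": "unparsed",
    "is_text_unparsed_id_regex": "f[0-9]+",
    "is_text_unparsed_id_sort": "numeric",
    "os_task_file_format": "smil",
    "os_task_file_smil_audio_ref": "p001.mp3",
    "os_task_file_smil_page_ref": "p001.xhtml",
}
print(build_config_string(smil_params))
```

Keeping the parameters in a dictionary makes it easier to change a single value (for example the output format) without retyping the whole string.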
Listing Parameter Names And Values¶
Since there are dozens of parameter names and values,
it is easy to forget their correct spelling.
You can use the --list-parameters switch to print
the list of parameter names that you can use in the configuration string.
$ python -m aeneas.tools.execute_task --list-parameters
[INFO] You can use --list-values=PARAM on parameters marked by '*'
[INFO] Parameters marked by 'REQ' are required
[INFO] Available parameters:
is_audio_file_detect_head_max           : detect audio head, at most this many seconds (float, None)
is_audio_file_detect_head_min           : detect audio head, at least this many seconds (float, None)
is_audio_file_detect_tail_max           : detect audio tail, at most this many seconds (float, None)
is_audio_file_detect_tail_min           : detect audio tail, at least this many seconds (float, None)
is_audio_file_head_length               : ignore this many seconds at begin of audio (float, None)
is_audio_file_process_length            : process this many seconds of audio (float, None)
is_audio_file_tail_length               : ignore this many seconds at end of audio (float, None)
is_text_file_ignore_regex               : for the alignment, ignore text matched by regex
is_text_file_transliterate_map          : for the alignment, apply this transliteration map to text
is_text_mplain_word_separator           : word separator (mplain)
is_text_munparsed_l1_id_regex           : regex matching level 1 id attributes (munparsed)
is_text_munparsed_l2_id_regex           : regex matching level 2 id attributes (munparsed)
is_text_munparsed_l3_id_regex           : regex matching level 3 id attributes (munparsed)
is_text_type                            : text format (REQ, *)
is_text_unparsed_class_regex            : regex matching class attributes (unparsed)
is_text_unparsed_id_regex               : regex matching id attributes (unparsed)
is_text_unparsed_id_sort                : algorithm to sort matched element (unparsed) (*)
os_task_file_eaf_audio_ref              : audio ref value (eaf)
os_task_file_format                     : sync map format (REQ, *)
os_task_file_head_tail_format           : audio head/tail format (*)
os_task_file_id_regex                   : regex to build sync map id's (subtitles, plain)
os_task_file_levels                     : output the specified levels only (mplain, munparserd)
os_task_file_name                       : sync map file name (ignored)
os_task_file_smil_audio_ref             : audio ref value (smil, smilh, smilm)
os_task_file_smil_page_ref              : text ref value (smil, smilh, smilm)
task_adjust_boundary_aftercurrent_value : offset value, in s (aftercurrent) (float, None)
task_adjust_boundary_algorithm          : algorithm to adjust sync map values (*)
task_adjust_boundary_beforenext_value   : offset value, in s (beforenext) (float, None)
task_adjust_boundary_no_zero            : if True, do not allow zero-length fragments (bool, None)
task_adjust_boundary_nonspeech_min      : minimum long nonspeech duration, in s (float, None)
task_adjust_boundary_nonspeech_string   : replace long nonspeech with this string or specify REMOVE
task_adjust_boundary_offset_value       : offset value, in s (offset) (float, None)
task_adjust_boundary_percent_value      : percent value in [0..100] (percent) (int, None)
task_adjust_boundary_rate_value         : max rate, in chars/s (rate, rateaggressive) (float, None)
task_custom_id                          : custom ID
task_description                        : description
task_language                           : language (REQ, *)
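The listing above marks three parameters as required (REQ): is_text_type, os_task_file_format, and task_language. As a rough illustration, the Python sketch below splits a configuration string into pairs and reports which required names are missing. The function names are hypothetical, and the check only mirrors the REQ markers printed above; it does not replace the validation done by aeneas itself.

```python
# Names marked REQ in the --list-parameters output above.
REQUIRED = ("is_text_type", "os_task_file_format", "task_language")


def parse_config_string(config_string):
    """Split 'a=1|b=2' into {'a': '1', 'b': '2'}."""
    params = {}
    for pair in config_string.split("|"):
        if not pair:
            continue
        name, separator, value = pair.partition("=")
        if not separator:
            raise ValueError("missing '=' in %r" % pair)
        params[name] = value
    return params


def missing_required(config_string):
    """Return the required parameter names absent from the string."""
    params = parse_config_string(config_string)
    return [name for name in REQUIRED if name not in params]


print(missing_required("task_language=eng|is_text_type=plain"))
# prints ['os_task_file_format']
```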
For parameters that accept a restricted set of values,
you can list the allowed values with --list-values=PARAM.
For example:
$ python -m aeneas.tools.execute_task --list-values
[INFO] Parameters for which values can be listed:
aws
espeak
espeak-ng
festival
is_text_type
is_text_unparsed_id_sort
nuance
os_task_file_format
os_task_file_head_tail_format
task_adjust_boundary_algorithm
task_language
$ python -m aeneas.tools.execute_task --list-values=is_text_type
[INFO] Available values for parameter 'is_text_type':
mplain
munparsed
parsed
plain
subtitles
unparsed
$ python -m aeneas.tools.execute_task --list-values=espeak
[INFO] Available values for parameter 'espeak':
af      Afrikaans
afr     Afrikaans
an      Aragonese (not tested)
arg     Aragonese (not tested)
...
yue     Yue Chinese (not tested)
zh      Mandarin Chinese (not tested)
zh-yue  Yue Chinese (not tested)
zho     Chinese (not tested)
Downloading Audio From YouTube¶
aeneas can download the audio stream from a YouTube video.
Instead of the audio file path, you provide the YouTube URL,
and add the -y switch at the end:
$ python -m aeneas.tools.execute_task --example-youtube
[INFO] Running example task with arguments:
  YouTube URL:   https://www.youtube.com/watch?v=rU4a7AA8wM0
  Text file:     aeneas/tools/res/plain.txt
  Config string: task_language=eng|is_text_type=plain|os_task_file_format=txt
  Sync map file: output/sonnet.txt
  Options:       -y
[INFO] Downloading audio from 'https://www.youtube.com/watch?v=rU4a7AA8wM0' ...
[INFO] Downloading audio from 'https://www.youtube.com/watch?v=rU4a7AA8wM0' ... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'output/sonnet.txt'
Warning
The download feature is experimental, and it might be unavailable in the future, for example if YouTube disables API access to audio/video contents. Also note that sometimes the download fails for network/backend reasons: just wait a few seconds and try executing again.
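Since the download can fail for transient reasons, a script might wrap the call in a small retry loop. The Python sketch below is one possible way to do it. Two things are assumptions here: the positional argument order (URL, text file, configuration string, sync map file, then -y), which follows the order printed in the example output above, and the idea that the tool exits with a nonzero status on failure. The helper names are hypothetical.

```python
import time


def youtube_task_argv(url, text_path, config_string, sync_map_path):
    """Build the argument list, with the YouTube URL in place of the audio path."""
    # Assumed order: URL, text file, config string, sync map file, then -y.
    return ["python", "-m", "aeneas.tools.execute_task",
            url, text_path, config_string, sync_map_path, "-y"]


def run_with_retries(action, attempts=3, delay=5.0):
    """Call action() until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # wait a few seconds before trying again
    return None


argv = youtube_task_argv(
    "https://www.youtube.com/watch?v=rU4a7AA8wM0",
    "aeneas/tools/res/plain.txt",
    "task_language=eng|is_text_type=plain|os_task_file_format=txt",
    "output/sonnet.txt",
)
print(argv[-1])
```

With aeneas installed, the pieces could be combined as run_with_retries(lambda: subprocess.call(argv) == 0), after importing subprocess.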
The Runtime Configuration¶
Although the default settings should be fine for most users, sometimes it might be useful to modify certain internal parameters affecting the processing of tasks, for example changing the directory where temporary files are created, modifying processing parameters like the time resolution, etc.
To do so, the user can use the -r or --runtime-configuration switch,
providing a suitable configuration string as its value.
Warning
Using the runtime configuration switch is advisable only for expert users, or when explicitly suggested by expert users, since there are (almost) no sanity checks on the values provided this way, and setting wrong values might lead to erratic behavior of the aligner.
The available parameter names are listed in
RuntimeConfiguration.
Examples:
- disable checks on the language codes:
  $ python -m aeneas.tools.execute_task --example-json -r="allow_unlisted_languages=True"
- disable the Python C/C++ extensions, running the pure Python code:
  $ python -m aeneas.tools.execute_task --example-json -r="c_extensions=False"
- disable only the cew Python C/C++ extension, while cdtw and cmfcc will still run (if compiled):
  $ python -m aeneas.tools.execute_task --example-json -r="cew=False"
- set the DTW margin to 10.000 seconds:
  $ python -m aeneas.tools.execute_task --example-json -r="dtw_margin=10"
- specify the path to the ffprobe and ffmpeg executables:
  $ python -m aeneas.tools.execute_task --example-json -r="ffmpeg_path=/path/to/my/ffmpeg|ffprobe_path=/path/to/my/ffprobe"
- set the time resolution of the aligner to 0.050 seconds:
  $ python -m aeneas.tools.execute_task --example-json -r="mfcc_window_length=0.150|mfcc_window_shift=0.050"
- use the eSpeak-ng TTS, via the espeak-ng executable available on $PATH, instead of eSpeak:
  $ python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng"
- use the eSpeak-ng TTS, via the espeak-ng executable at a custom location, instead of eSpeak:
  $ python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng|tts_path=/path/to/espeak-ng"
- use the Festival TTS, via the text2wave executable available on $PATH, instead of eSpeak:
  $ python -m aeneas.tools.execute_task --example-json -r="tts=festival"
- use the Festival TTS, via the text2wave executable at a custom location, instead of eSpeak:
  $ python -m aeneas.tools.execute_task --example-json -r="tts=festival|tts_path=/path/to/text2wave"
- use the AWS Polly TTS API instead of eSpeak (with TTS caching enabled):
  $ python -m aeneas.tools.execute_task --example-json -r="tts=aws|tts_cache=True"
- use the Nuance TTS API instead of eSpeak (with TTS caching enabled):
  $ python -m aeneas.tools.execute_task --example-json -r="tts=nuance|nuance_tts_api_id=YOUR_NUANCE_API_ID|nuance_tts_api_key=YOUR_NUANCE_API_KEY|tts_cache=True"
- use a custom TTS wrapper located at /path/to/your/wrapper.py (see the aeneas/extra/ directory for examples):
  $ python -m aeneas.tools.execute_task --example-json -r="tts=custom|tts_path=/path/to/your/wrapper.py"
- set the temporary directory:
  $ python -m aeneas.tools.execute_task --example-json -r="tmp_path=/path/to/tmp/"
- allow processing tasks with audio files at most 1 hour (= 3600 seconds) long:
  $ python -m aeneas.tools.execute_task --example-json -r="task_max_audio_length=3600"
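When the tool is launched from a script rather than a shell, the quotes around the -r value are not needed, because each argument is passed as a separate list item. The Python sketch below builds such an argument list for one of the examples above. The helper name example_task_argv is hypothetical; the runtime parameter names are the ones used in the list above.

```python
import sys


def example_task_argv(example_switch, rconf=None):
    """Build the argument list for a built-in example, with an optional -r value."""
    argv = [sys.executable, "-m", "aeneas.tools.execute_task", example_switch]
    if rconf:
        # The runtime configuration uses the same name=value|name=value syntax.
        conf = "|".join("%s=%s" % (name, value) for name, value in rconf.items())
        argv.append("-r=" + conf)
    return argv


argv = example_task_argv(
    "--example-json",
    {"mfcc_window_length": "0.150", "mfcc_window_shift": "0.050"},
)
print(argv[1:])
# To actually run it (requires aeneas to be installed):
#   import subprocess
#   subprocess.run(argv, check=True)
```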
Miscellanea¶
- --example-head-tail: ignore the first 0.400 seconds and the last 0.500 seconds of the audio file for alignment purposes
- --example-no-zero: ensure that no fragment in the output sync map has zero length
- --example-percent: adjust the output sync map, setting each boundary between adjacent fragments to the middle of the nonspeech interval, using the PERCENT algorithm with value 50 (i.e., 50%)
- --example-rate: adjust the output sync map, trying to ensure that no fragment has a rate of more than 14 characters/s, using the RATE algorithm
- --example-sd: detect the audio head/tail, each at most 10.000 seconds long
- --example-multilevel-tts: use different TTS engines for different levels (mplain multilevel input text)
Processing Jobs¶
If you have several Tasks sharing the same parameters (configuration strings) and differing only in their audio/text files, you can either write your own Bash/BAT script or create a Job:
Job
A Job is a container (compressed file or uncompressed directory), containing:
- one or more pairs of audio/text files, and
- a configuration file (config.txt or config.xml) specifying parameters to locate each Task's assets inside the Job, to process each Task, and to create the output container containing the output sync map files.
Example: /home/rb/job.zip, containing the following files,
corresponding to three Tasks:
.
├── config.txt
└── OEBPS
    └── Resources
        ├── sonnet001.mp3
        ├── sonnet001.txt
        ├── sonnet002.mp3
        ├── sonnet002.txt
        ├── sonnet003.mp3
        └── sonnet003.txt
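A compressed Job container like the one above can be produced with any ZIP tool. As an illustration, the Python sketch below packs a directory into a ZIP file while preserving the relative paths, so that config.txt ends up at the root of the archive. The function name pack_job and the paths in the usage comment are hypothetical.

```python
import os
import zipfile


def pack_job(source_dir, zip_path):
    """Write every file under source_dir into zip_path, keeping relative paths."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store forward-slash paths relative to the job directory.
                relative = os.path.relpath(full_path, source_dir)
                archive.write(full_path, relative.replace(os.sep, "/"))
    return zip_path


# Hypothetical usage, with the job files laid out under /home/rb/job/:
#   pack_job("/home/rb/job", "/home/rb/job.zip")
```

The output ZIP should be written outside the source directory, otherwise the walk might pick up the partially written archive.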
The aeneas.tools.execute_job tool processes a Job
and writes the corresponding output container to file.
Therefore, it requires at least two arguments:
- the path of the input job container;
- the path of an existing directory where the output container should be created.
The --help, -v, -l, and -r switches
have the same meaning for aeneas.tools.execute_job
as described above. For example, the help message reads:
$ python -m aeneas.tools.execute_job
NAME
  execute_job - Execute a Job, passed as a container.
SYNOPSIS
  python -m aeneas.tools.execute_job [-h|--help|--help-rconf|--version]
  python -m aeneas.tools.execute_job --list-parameters
  python -m aeneas.tools.execute_job CONTAINER OUTPUT_DIR [CONFIG_STRING] [OPTIONS]
OPTIONS
  --cewsubprocess : run cew in separate process (see docs)
  --help : print full help and exit
  --help-rconf : list all runtime configuration parameters
  --skip-validator : do not validate the given container and/or config string
  --version : print the program name and version and exit
  -h : print short help and exit
  -l[=FILE], --log[=FILE] : log verbose output to tmp file or FILE if specified
  -r=CONF, --runtime-configuration=CONF : apply runtime configuration CONF
  -v, --verbose : verbose output
  -vv, --very-verbose : verbose output, print date/time values
EXAMPLES
  python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/
  python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/ --cewsubprocess
  python -m aeneas.tools.execute_job aeneas/tools/res/job_no_config.zip output/ "is_hierarchy_type=flat|is_hierarchy_prefix=assets/|is_text_file_relative_path=.|is_text_file_name_regex=.*\.xhtml|is_text_type=unparsed|is_audio_file_relative_path=.|is_audio_file_name_regex=.*\.mp3|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric|os_job_file_name=demo_sync_job_output|os_job_file_container=zip|os_job_file_hierarchy_type=flat|os_job_file_hierarchy_prefix=assets/|os_task_file_name=\$PREFIX.xhtml.smil|os_task_file_format=smil|os_task_file_smil_page_ref=\$PREFIX.xhtml|os_task_file_smil_audio_ref=../Audio/\$PREFIX.mp3|job_language=eng|job_description=Demo Sync Job"
Currently aeneas.tools.execute_job does not have
built-in example shortcuts (--example-*),
but you can run a built-in example:
$ python -m aeneas.tools.execute_job aeneas/tools/res/job.zip output/
[INFO] Validating the container (specify --skip-validator to bypass)...
[INFO] Validating the container... done
[INFO] Loading job from container...
[INFO] Loading job from container... done
[INFO] Executing...
[INFO] Executing... done
[INFO] Creating output container...
[INFO] Creating output container... done
[INFO] Created output file 'output/demo_sync_job_output.zip'
TXT Config File (config.txt)¶
A ZIP container with the following files:
.
├── config.txt
└── OEBPS
    └── Resources
        ├── sonnet001.mp3
        ├── sonnet001.txt
        ├── sonnet002.mp3
        ├── sonnet002.txt
        ├── sonnet003.mp3
        └── sonnet003.txt
where the config.txt config file reads:
is_hierarchy_type=flat
is_hierarchy_prefix=OEBPS/Resources/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.txt
is_text_type=parsed
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.mp3
os_job_file_name=output_example1
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=OEBPS/Resources/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3
job_language=en
job_description=Example 1 (flat hierarchy, parsed text files)
will generate three tasks (sonnet001, sonnet002 and sonnet003),
output a SMIL file for each of them,
and finally compress them into a ZIP file with the following structure:
.
└── OEBPS
    └── Resources
        ├── sonnet001.smil
        ├── sonnet002.smil
        └── sonnet003.smil
Note that the paths in config.txt are relative to
(the directory containing) the config.txt file,
and that you can use the PPV_OS_TASK_PREFIX
placeholder ($PREFIX) that will be replaced with the Task id.
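To make the role of the two file name regexes and of $PREFIX more concrete, the Python sketch below pairs text and audio files by their base name and expands the output file name for each resulting Task. This is only an approximation written for illustration: the pairing rule (same file name without extension) and the function name pair_tasks are assumptions, and the actual matching logic inside aeneas may differ.

```python
import os
import re


def pair_tasks(file_names, text_regex, audio_regex):
    """Pair text and audio files sharing the same name without extension."""
    texts, audios = {}, {}
    for name in file_names:
        stem = os.path.splitext(name)[0]
        if re.fullmatch(text_regex, name):
            texts[stem] = name
        elif re.fullmatch(audio_regex, name):
            audios[stem] = name
    # Keep only the stems that have both a text file and an audio file.
    return {stem: (texts[stem], audios[stem])
            for stem in sorted(texts) if stem in audios}


files = [
    "sonnet001.mp3", "sonnet001.txt",
    "sonnet002.mp3", "sonnet002.txt",
    "sonnet003.mp3", "sonnet003.txt",
]
# Regexes taken from is_text_file_name_regex and is_audio_file_name_regex above.
tasks = pair_tasks(files, r".*\.txt", r".*\.mp3")
for task_id in tasks:
    # Expand the os_task_file_name template for this Task.
    print("$PREFIX.smil".replace("$PREFIX", task_id))
```

Run on the six files of the example container, the sketch yields three Tasks and the output names sonnet001.smil, sonnet002.smil, and sonnet003.smil.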
XML Config File (config.xml)¶
While config.txt is concise and easy to write,
it constrains all the tasks of the job to share the same
execution settings (language, output format, and so on).
If you need to specify different values for execution parameters
of different tasks, you must use an XML config file,
named config.xml.
The following config.xml is equivalent to the example above:
<?xml version = "1.0" encoding="UTF-8" standalone="no"?>
<job>
    <job_language>en</job_language>
    <job_description>Example 4 (XML, flat hierarchy, parsed text files)</job_description>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 1</task_description>
            <task_custom_id>sonnet001</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet001.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet001.mp3</is_audio_file>
            <os_task_file_name>sonnet001.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet001.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet001.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 2</task_description>
            <task_custom_id>sonnet002</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet002.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet002.mp3</is_audio_file>
            <os_task_file_name>sonnet002.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet002.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet002.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 3</task_description>
            <task_custom_id>sonnet003</task_custom_id>
            <is_text_file>OEBPS/Resources/sonnet003.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/sonnet003.mp3</is_audio_file>
            <os_task_file_name>sonnet003.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>sonnet003.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>sonnet003.mp3</os_task_file_smil_audio_ref>
        </task>
    </tasks>
    <os_job_file_name>output_example4</os_job_file_name>
    <os_job_file_container>zip</os_job_file_container>
    <os_job_file_hierarchy_type>flat</os_job_file_hierarchy_type>
    <os_job_file_hierarchy_prefix>OEBPS/Resources/</os_job_file_hierarchy_prefix>
</job>
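Because the task elements in a config.xml like the one above are highly repetitive, generating the file from a script can be less error-prone than writing it by hand. The following Python sketch uses the standard xml.etree.ElementTree module to build an equivalent document for a list of Task ids. The function name build_config_xml and the job description text are hypothetical; the element names and values are copied from the example above.

```python
import xml.etree.ElementTree as ET


def build_config_xml(tasks, job_name):
    """Return config.xml text with one <task> per (task_id, description) pair."""
    job = ET.Element("job")
    ET.SubElement(job, "job_language").text = "en"
    ET.SubElement(job, "job_description").text = "Generated config (XML, flat hierarchy)"
    tasks_element = ET.SubElement(job, "tasks")
    for task_id, description in tasks:
        fields = [
            ("task_language", "en"),
            ("task_description", description),
            ("task_custom_id", task_id),
            ("is_text_file", "OEBPS/Resources/%s.txt" % task_id),
            ("is_text_type", "parsed"),
            ("is_audio_file", "OEBPS/Resources/%s.mp3" % task_id),
            ("os_task_file_name", "%s.smil" % task_id),
            ("os_task_file_format", "smil"),
            ("os_task_file_smil_page_ref", "%s.xhtml" % task_id),
            ("os_task_file_smil_audio_ref", "%s.mp3" % task_id),
        ]
        task = ET.SubElement(tasks_element, "task")
        for tag, value in fields:
            ET.SubElement(task, tag).text = value
    for tag, value in [
        ("os_job_file_name", job_name),
        ("os_job_file_container", "zip"),
        ("os_job_file_hierarchy_type", "flat"),
        ("os_job_file_hierarchy_prefix", "OEBPS/Resources/"),
    ]:
        ET.SubElement(job, tag).text = value
    return ET.tostring(job, encoding="unicode")


xml_text = build_config_xml(
    [("sonnet001", "Sonnet 1"), ("sonnet002", "Sonnet 2"), ("sonnet003", "Sonnet 3")],
    "output_example4",
)
print(xml_text.count("<task>"))
# prints 3
```

The returned string has no XML declaration and no indentation; to write a file with a declaration, ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True) can be used on the root element instead.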
Now note that config.xml allows you to bundle together
Tasks with different languages, output formats, etc.:
<?xml version = "1.0" encoding="UTF-8" standalone="no"?>
<job>
    <job_language>en</job_language>
    <job_description>Example 7 (XML, multiple languages, multiple output formats)</job_description>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_description>Sonnet 1</task_description>
            <task_custom_id>sonnet001</task_custom_id>
            <is_text_file>OEBPS/Resources/en.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/en.mp3</is_audio_file>
            <os_task_file_name>en.smil</os_task_file_name>
            <os_task_file_format>smil</os_task_file_format>
            <os_task_file_smil_page_ref>en.xhtml</os_task_file_smil_page_ref>
            <os_task_file_smil_audio_ref>en.mp3</os_task_file_smil_audio_ref>
        </task>
        <task>
            <task_language>de</task_language>
            <task_description>Simplicissimus</task_description>
            <task_custom_id>simplicissimus</task_custom_id>
            <is_text_file>OEBPS/Resources/de.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/de.mp3</is_audio_file>
            <os_task_file_name>de.csv</os_task_file_name>
            <os_task_file_format>csv</os_task_file_format>
        </task>
        <task>
            <task_language>es</task_language>
            <task_description>Capitan Veneno</task_description>
            <task_custom_id>capitan veneno</task_custom_id>
            <is_text_file>OEBPS/Resources/es.txt</is_text_file>
            <is_text_type>parsed</is_text_type>
            <is_audio_file>OEBPS/Resources/es.mp3</is_audio_file>
            <os_task_file_name>es.srt</os_task_file_name>
            <os_task_file_format>srt</os_task_file_format>
        </task>
    </tasks>
    <os_job_file_name>output_example7</os_job_file_name>
    <os_job_file_container>zip</os_job_file_container>
    <os_job_file_hierarchy_type>flat</os_job_file_hierarchy_type>
    <os_job_file_hierarchy_prefix>OEBPS/Resources/</os_job_file_hierarchy_prefix>
</job>
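Before running a Job with mixed settings, it can be useful to check which language and output format each Task declares. The Python sketch below reads a config.xml document and prints one row per Task. The function name summarize_tasks is hypothetical, and the sample document is an abbreviated version of the example above, keeping only the elements the sketch reads.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample: only the elements read by summarize_tasks are kept.
SAMPLE = """<job>
    <tasks>
        <task>
            <task_language>en</task_language>
            <task_custom_id>sonnet001</task_custom_id>
            <os_task_file_format>smil</os_task_file_format>
        </task>
        <task>
            <task_language>de</task_language>
            <task_custom_id>simplicissimus</task_custom_id>
            <os_task_file_format>csv</os_task_file_format>
        </task>
    </tasks>
</job>"""


def summarize_tasks(xml_text):
    """Return (custom id, language, output format) for each <task> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            task.findtext("task_custom_id"),
            task.findtext("task_language"),
            task.findtext("os_task_file_format"),
        )
        for task in root.iter("task")
    ]


for row in summarize_tasks(SAMPLE):
    print(row)
```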