.. _clitutorial:

aeneas Built-in Command Line Tools Tutorial
===========================================

This tutorial explains how to process tasks and jobs with
the command line tools ``aeneas.tools.execute_task`` and ``aeneas.tools.execute_job``.

(If you are interested in using ``aeneas``
as a Python package in your own application,
please consult the :ref:`libtutorial`.)

Processing Tasks
~~~~~~~~~~~~~~~~

First, we need some definitions:

.. topic:: Audio File

    An audio file is a file on disk containing audio data,
    usually text narrated by a human being.
    The audio format can be any of those supported by ``ffprobe`` and ``ffmpeg``,
    including: FLAC, MP3, MP4/AAC, OGG, WAVE, etc.

    Example: ``/home/rb/audio.mp3``

.. topic:: Text File

    A text file is a file on disk containing the textual data
    to be aligned with a matching audio file.
    The format of the text file can be any format listed in :data:`~aeneas.textfile.TextFileFormat.ALLOWED_VALUES`.
    The contents of the text file define, explicitly or implicitly,
    a segmentation of the entire text into fragments,
    which can have arbitrary granularity (paragraph, sentence, sub-sentence, word, etc.),
    can be nested in a hierarchical structure,
    can consist of multiple lines,
    and can be associated with unique identifiers.
    Certain input formats require the user to specify
    additional parameters to parse the input file.

    Example of a text file ``/home/rb/text.txt`` in :data:`~aeneas.textfile.TextFileFormat.PLAIN` format, with three fragments::

        Text of the first fragment
        Text of the second fragment
        Text of the third fragment

.. topic:: Sync Map File

    A sync map file is a file on disk
    which expresses the correspondence between an audio file and a text file.
    Specifically, for each fragment in the text file,
    it declares a time interval in the audio file where
    the text of the fragment is spoken.
    The actual format of the sync map file depends on the intended application.
    Available formats are listed in :data:`~aeneas.syncmap.SyncMapFormat.ALLOWED_VALUES`.
    Text fragments can be represented by the full text and/or by their unique identifiers.

    Example of a sync map file in :data:`~aeneas.syncmap.SyncMapFormat.CSV` format::

        f001,0.000,1.234,First fragment text
        f002,1.234,5.678,Second fragment text
        f003,5.678,7.890,Third fragment text
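
    The CSV layout above is simple enough to read with the Python standard library.
    The following is a minimal sketch, not part of ``aeneas``:
    the function name ``read_csv_syncmap`` is hypothetical,
    and the column order (id, begin, end, text) is assumed from the example above.

    .. code-block:: python

        import csv
        import io
        from typing import List, Tuple

        # (fragment id, begin in seconds, end in seconds, fragment text)
        Fragment = Tuple[str, float, float, str]


        def read_csv_syncmap(contents: str) -> List[Fragment]:
            """Parse 'id,begin,end,text' rows (layout assumed from the example above)."""
            fragments = []
            for row in csv.reader(io.StringIO(contents)):
                if not row:
                    continue  # skip empty lines
                frag_id, begin, end = row[0], float(row[1]), float(row[2])
                # Re-join any unquoted commas that belonged to the fragment text.
                text = ",".join(row[3:])
                fragments.append((frag_id, begin, end, text))
            return fragments


        if __name__ == "__main__":
            sample = "f001,0.000,1.234,First fragment text\nf002,1.234,5.678,Second fragment text\n"
            for frag_id, begin, end, text in read_csv_syncmap(sample):
                print(f"{frag_id}: {end - begin:.3f} s  {text}")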

.. topic:: Task

    A Task is a triple ``(audio file, text file, parameters)``.
    When a task is processed (executed), a sync map is computed
    for the given audio and text files.
    The ``parameters`` control how the alignment is computed, for example:

    * specifying the language and the format of the input text;
    * setting the format of the sync map file to be output;
    * excluding the head/tail of the audio file because they contain speech not present in the text;
    * modifying the time step of the aligner;
    * etc.

    Example (continued):

    * audio file: ``/home/rb/audio.mp3``
    * text file: ``/home/rb/text.txt``
    * parameters:

      * text in PLAIN format
      * language is ENGLISH
      * output in JSON format

The ``aeneas.tools.execute_task`` tool processes a Task
and writes the corresponding sync map to file.
Therefore, it requires at least four arguments:

* the path of the input audio file;
* the path of the input text file;
* the parameters, formatted as a ``key1=value1|key2=value2|...|keyN=valueN`` string;
* the path of the sync map to be created.
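
For scripted use, the four arguments can be assembled programmatically.
The following Python sketch only builds the argument list, without running it;
the helper names are hypothetical, and the parameter names
(``task_language``, ``is_text_type``, ``os_task_file_format``)
are assumptions to be checked against the output of ``--list-parameters``.

.. code-block:: python

    import shlex
    from typing import Dict, List


    def build_config_string(params: Dict[str, str]) -> str:
        """Join parameters into the key1=value1|key2=value2|...|keyN=valueN format."""
        for key, value in params.items():
            # In this sketch, '|' may not appear anywhere and '=' may not appear in a key,
            # because they separate entries and keys from values.
            if "|" in key or "=" in key or "|" in value:
                raise ValueError(f"invalid character in parameter {key!r}")
        return "|".join(f"{key}={value}" for key, value in params.items())


    def build_execute_task_argv(audio: str, text: str,
                                params: Dict[str, str], syncmap: str) -> List[str]:
        """Return the argument list for aeneas.tools.execute_task (not executed here)."""
        return ["python", "-m", "aeneas.tools.execute_task",
                audio, text, build_config_string(params), syncmap]


    if __name__ == "__main__":
        argv = build_execute_task_argv(
            "/home/rb/audio.mp3",
            "/home/rb/text.txt",
            {"task_language": "eng", "is_text_type": "plain", "os_task_file_format": "json"},
            "/home/rb/map.json",  # hypothetical output path
        )
        # shlex.join quotes the configuration string, because '|' is special to the shell.
        print(shlex.join(argv))

Passing ``argv`` to ``subprocess.run`` runs the tool without a shell,
so the configuration string needs no extra quoting in that case.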

Showing Help Messages
---------------------

If you execute the program without arguments,
it will print the following help message:

.. literalinclude:: _static/execute_task_help.txt
    :language: text

If you pass the ``--help`` argument,
it will print a slightly more verbose version:

.. literalinclude:: _static/execute_task_help_arg.txt
    :language: text

Showing And Running Built-In Examples
-------------------------------------

aeneas includes some example input files which cover common use cases,
enabling the user to run live examples.
To list them, pass the ``--examples`` switch:

.. literalinclude:: _static/execute_task_examples.txt
    :language: text

Similarly, the ``--examples-all`` switch prints a list
of more than twenty built-in examples,
covering more specific input/output/parameter combinations.

.. literalinclude:: _static/execute_task_examples_all.txt
    :language: text

Running the built-in examples is a quick way to learn
the options and parameters available in ``aeneas``.

For example, passing the ``--example-json`` switch will produce:

.. literalinclude:: _static/execute_task_example_json.txt
    :language: text

.. warning::

    If the above command generates an error, be sure to have
    a directory named ``output`` in your current working directory.
    If one does not exist, create it.

As you can see in the example above, built-in examples
print the command line arguments they stand for.
Therefore, the example above is essentially equivalent to:

.. literalinclude:: _static/execute_task_example_json_2.txt
    :language: text

.. note::

    There is one formal difference: when running an example,
    the input files and parameters are not validated.
    Otherwise, by default, they are validated using a
    :class:`~aeneas.validator.Validator` object,
    created and run automatically for you,
    and if a validation error occurs,
    the execution of the Task does not begin.
    You can override this safety check with the ``--skip-validator`` switch.

In both cases, a new file ``output/sonnet.json`` is created,
containing the sync map in JSON format:

.. literalinclude:: _static/execute_task_example_json_output.txt
    :language: json

for the input file:

.. literalinclude:: _static/execute_task_example_json_input.txt
    :language: text

Verbose Output And Logging To File
----------------------------------

If you want more verbose output, you can pass the ``-v`` or ``--verbose`` switch:

.. literalinclude:: _static/execute_task_example_json_verbose.txt
    :language: text

There is also a ``-vv`` or ``--very-verbose`` switch
to increase the verbosity even further.

Sometimes it is easier to dump the log to file, and then inspect it
with a text editor. To do so, just specify the ``-l`` switch:

.. literalinclude:: _static/execute_task_example_json_log.txt
    :language: text

The path of the log file will be printed.
By default, the log file will be created in the temporary directory of your OS.
If you want your log file to be created at a specific path,
use ``--log=/path/to/your.log`` instead of ``-l``.

Note that you can specify both ``-v``/``-vv`` and ``-l``/``--log``.

Input Text Formats
------------------

``aeneas`` is able to read several text file formats, listed in
:class:`~aeneas.textfile.TextFileFormat`:

#. :data:`~aeneas.textfile.TextFileFormat.PLAIN`,
   one fragment per line
   (example: ``--example-json``):

   .. code-block:: text
    
    Text of the first fragment
    Text of the second fragment
    Text of the third fragment
   
#. :data:`~aeneas.textfile.TextFileFormat.PARSED`,
   one fragment per line, starting with an explicit identifier
   (example: ``--example-tsv``):
    
   .. code-block:: text
    
    f001|Text of the first fragment
    f002|Text of the second fragment
    f003|Text of the third fragment
   

#. :data:`~aeneas.textfile.TextFileFormat.SUBTITLES`,
   fragments separated by a blank line, where each fragment
   may span multiple lines. This format is suitable
   for creating subtitle sync map files
   (example: ``--example-srt``):

   .. code-block:: text
    
    Fragment on a single row

    Fragment on two rows
    because it is quite long

    Another one liner

    Another fragment
    on two rows

#. :data:`~aeneas.textfile.TextFileFormat.UNPARSED`,
   XML file from which text fragments will be extracted
   by matching ``id`` and/or ``class`` attributes
   (example: ``--example-smil``):

   .. literalinclude:: _static/unparsed.xhtml
    :language: xml

#. :data:`~aeneas.textfile.TextFileFormat.MPLAIN`,
   the multilevel equivalent of PLAIN,
   with paragraphs separated by a blank line,
   one sentence per line,
   and words separated by spaces
   (example: ``--example-mplain-json``):

   .. code-block:: text
    
    First sentence of Paragraph One.
    Second sentence of Paragraph One.

    First sentence of Paragraph Two.

    First sentence of Paragraph Three.
    Second sentence of Paragraph Three.
    Third sentence of Paragraph Three.

#. :data:`~aeneas.textfile.TextFileFormat.MUNPARSED`,
   the multilevel equivalent of UNPARSED
   (example: ``--example-munparsed-json``):

   .. literalinclude:: _static/munparsed.xhtml
    :language: xml
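
To make the difference between the line-oriented formats concrete,
the following Python sketch converts PLAIN text into PARSED text
(generating ``f001``, ``f002``, ... identifiers) and splits SUBTITLES text into fragments.
It only illustrates the formats described above and is not the ``aeneas`` parser;
the function names are hypothetical, and the sketch simply ignores empty lines in PLAIN input.

.. code-block:: python

    from typing import List


    def plain_to_parsed(plain_text: str, prefix: str = "f", width: int = 3) -> str:
        """Turn one-fragment-per-line PLAIN text into id|text PARSED lines."""
        lines = [line.strip() for line in plain_text.splitlines() if line.strip()]
        return "\n".join(
            f"{prefix}{index:0{width}d}|{line}"
            for index, line in enumerate(lines, start=1)
        )


    def split_subtitles(subtitles_text: str) -> List[List[str]]:
        """Group SUBTITLES text into fragments separated by blank lines."""
        fragments: List[List[str]] = []
        current: List[str] = []
        for line in subtitles_text.splitlines():
            if line.strip():
                current.append(line.strip())
            elif current:
                # A blank line closes the current fragment.
                fragments.append(current)
                current = []
        if current:
            fragments.append(current)
        return fragments


    if __name__ == "__main__":
        print(plain_to_parsed("Text of the first fragment\nText of the second fragment"))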

If you use :data:`~aeneas.textfile.TextFileFormat.UNPARSED` files,
you need to provide the following additional parameters:

* at least one of :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_ID_REGEX`
  and :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX`,
  to select the elements from which text will be considered;
* :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_ID_SORT`
  to specify how extracted elements should be sorted, based on their ``id`` attributes.

.. literalinclude:: _static/execute_task_example_smil.txt
    :language: text

.. note::
    Even if you only specify the
    :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_UNPARSED_CLASS_REGEX`
    regex, your XML elements still need to have ``id`` attributes.
    This is required for e.g. SMIL output to make sense.
    (Although the EPUB 3 Media Overlays specification allows you
    to specify an EPUB CFI instead of an ``id`` value,
    it is recommended to use ``id`` values
    for maximum reading system compatibility,
    and hence ``aeneas`` only outputs SMIL files with ``id`` references.)
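
The following Python sketch illustrates how an ``id`` regex
and a sorting rule select and order the elements of an UNPARSED XML file.
It is not the ``aeneas`` implementation: it assumes the whole ``id`` must match the regex,
and it assumes the sorting modes (``numeric``, ``lexicographic``, ``unsorted``)
behave as their names suggest; check the accepted values with ``--list-values``.

.. code-block:: python

    import re
    import xml.etree.ElementTree as ET
    from typing import List, Tuple


    def extract_fragments(xml_text: str, id_regex: str,
                          sort: str = "numeric") -> List[Tuple[str, str]]:
        """Return (id, text) pairs for elements whose id fully matches id_regex."""
        pattern = re.compile(id_regex)
        found = []
        for elem in ET.fromstring(xml_text).iter():
            elem_id = elem.get("id")
            if elem_id is not None and pattern.fullmatch(elem_id):
                # Collapse the text of the element and all its descendants.
                text = " ".join("".join(elem.itertext()).split())
                found.append((elem_id, text))
        if sort == "unsorted":
            return found  # document order
        if sort == "lexicographic":
            return sorted(found, key=lambda item: item[0])
        if sort == "numeric":
            # Compare the integers embedded in the id, e.g. "f10" -> [10].
            return sorted(found, key=lambda item: [int(d) for d in re.findall(r"\d+", item[0])])
        raise ValueError(f"unknown sort mode: {sort!r}")


    if __name__ == "__main__":
        doc = '<html><body><p id="f10">Ten</p><p id="f2">Two</p><p id="f1">One</p></body></html>'
        print(extract_fragments(doc, r"f[0-9]+"))                   # numeric: f1, f2, f10
        print(extract_fragments(doc, r"f[0-9]+", "lexicographic"))  # f1, f10, f2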

Similarly, for :data:`~aeneas.textfile.TextFileFormat.MUNPARSED` files
you need to provide the following additional parameters:

* :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L1_ID_REGEX`, 
* :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L2_ID_REGEX`, and
* :data:`~aeneas.globalconstants.PPN_TASK_IS_TEXT_MUNPARSED_L3_ID_REGEX`.

.. literalinclude:: _static/execute_task_example_munparsed.txt
    :language: text
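
Conceptually, the three regexes assign each matching element to a level
(for example paragraph, sentence, word) and nest the deeper levels inside the shallower ones.
The Python sketch below builds such a tree; it is an illustration only,
not the ``aeneas`` parser, and the ``p1``/``p1s1``/``p1s1w1`` id scheme and regexes in it are hypothetical.

.. code-block:: python

    import re
    import xml.etree.ElementTree as ET
    from typing import Dict, List, Sequence


    def build_tree(elem: ET.Element, patterns: Sequence[re.Pattern], level: int = 0) -> List[Dict]:
        """Collect descendants whose id matches patterns[level], nesting the next level inside."""
        if level >= len(patterns):
            return []
        nodes = []
        for desc in elem.iter():
            if desc is elem:
                continue  # iter() also yields the element itself
            desc_id = desc.get("id")
            if desc_id is not None and patterns[level].fullmatch(desc_id):
                node = {"id": desc_id, "children": build_tree(desc, patterns, level + 1)}
                if level == len(patterns) - 1:
                    # Only the finest level keeps its text in this sketch.
                    node["text"] = " ".join("".join(desc.itertext()).split())
                nodes.append(node)
        return nodes


    if __name__ == "__main__":
        doc = ('<html><body><p id="p1"><span id="p1s1">'
               '<span id="p1s1w1">Hello</span> <span id="p1s1w2">world</span>'
               '</span></p></body></html>')
        levels = [re.compile(r"p[0-9]+"),
                  re.compile(r"p[0-9]+s[0-9]+"),
                  re.compile(r"p[0-9]+s[0-9]+w[0-9]+")]
        print(build_tree(ET.fromstring(doc), levels))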

.. note::
    If you are interested in synchronizing at **word granularity**,
    it is strongly recommended to use the following,
    as they generally yield more accurate timings:

    1. MFCC nonspeech masking;
    2. a **multilevel text format**,
       even if you are going to use only the timings for the finer granularity;
    3. a better TTS engine, like Festival or the AWS/Nuance TTS APIs.

    (If you do not want the output sync map file to contain
    the multilevel tree hierarchy for the timings,
    you might "flatten" the output sync map file,
    retaining only the word-level timings,
    by using the configuration parameter
    :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS`
    with value ``3``).

    Since ``aeneas`` v1.7.0,
    ``aeneas.tools.execute_task`` has a ``--presets-word`` switch
    that enables MFCC nonspeech masking for single level tasks, or
    MFCC nonspeech masking on level 3 (word) for multilevel tasks.
    For example::

        $ python -m aeneas.tools.execute_task --example-words
        $ python -m aeneas.tools.execute_task --example-words --presets-word
        $ python -m aeneas.tools.execute_task --example-words-multilevel
        $ python -m aeneas.tools.execute_task --example-words-multilevel --presets-word

    The other default settings should be fine for most users.
    However, if you need finer control, feel free to experiment
    with the following parameters.

    Starting with ``aeneas`` v1.5.1,
    you can specify different MFCC parameters for each level, see:

    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_LENGTH_L1`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_SHIFT_L1`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_LENGTH_L2`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_SHIFT_L2`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_LENGTH_L3`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_WINDOW_SHIFT_L3`.
    
    Starting with ``aeneas`` v1.6.0,
    you can also specify a different TTS engine for each level, see:

    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L1`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L2`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.TTS_L3`.

    Starting with ``aeneas`` v1.7.0,
    you can enable MFCC nonspeech masking for both
    single level tasks and multilevel tasks.
    In the latter case, you can apply it to each level separately, see:

    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L1`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L2`,
    * :data:`~aeneas.runtimeconfiguration.RuntimeConfiguration.MFCC_MASK_NONSPEECH_L3`.
    
    If you are using a multilevel text format,
    you might want to enable MFCC masking only for level 3 (word),
    as enabling it for levels 1 and 2 does not seem to yield significantly
    better results.

    The ``aeneas`` mailing list contains some interesting threads
    about using aeneas for word-level synchronization.
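
The flattening obtained with
:data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_LEVELS`
can also be approximated after the fact on a multilevel JSON sync map.
The Python sketch below assumes that each fragment is an object with ``id``, ``begin``, ``end``
and a ``children`` list; these key names are an assumption about the JSON output,
so check them against a file produced by your ``aeneas`` version.
Setting the configuration parameter remains the straightforward way to do this.

.. code-block:: python

    import json
    from typing import Dict, Iterator, List


    def fragments_at_level(fragments: List[Dict], level: int, depth: int = 1) -> Iterator[Dict]:
        """Yield the fragments found at the given depth (1 = top level), without their children."""
        for fragment in fragments:
            if depth == level:
                yield {key: value for key, value in fragment.items() if key != "children"}
            else:
                yield from fragments_at_level(fragment.get("children", []), level, depth + 1)


    def flatten_to_level(syncmap: Dict, level: int = 3) -> Dict:
        """Return a new sync map whose fragments are only those at `level`."""
        return {"fragments": list(fragments_at_level(syncmap.get("fragments", []), level))}


    if __name__ == "__main__":
        sample = {"fragments": [{"id": "p1", "begin": "0.000", "end": "1.500", "children": [
            {"id": "p1s1", "begin": "0.000", "end": "1.500", "children": [
                {"id": "p1s1w1", "begin": "0.000", "end": "0.700", "children": []},
                {"id": "p1s1w2", "begin": "0.700", "end": "1.500", "children": []},
            ]},
        ]}]}
        print(json.dumps(flatten_to_level(sample, level=3), indent=1))

With a real sync map, load the file with ``json.load`` and write the result back with ``json.dump``.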

Output Sync Map Formats
-----------------------

``aeneas`` is able to write the sync map into several formats, listed in
:class:`~aeneas.syncmap.SyncMapFormat`.

As with the input text, certain output sync map formats
require the user to specify additional parameters
to correctly create the output file.
For example,
:data:`~aeneas.syncmap.SyncMapFormat.SMIL`
requires:

* :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_SMIL_AUDIO_REF` and
* :data:`~aeneas.globalconstants.PPN_TASK_OS_FILE_SMIL_PAGE_REF`.

Example:

.. literalinclude:: _static/execute_task_example_smil.txt
    :language: text

Listing Parameter Names And Values
----------------------------------

Since there are dozens of parameter names and values,
it is easy to forget their correct spelling.
You can use the ``--list-parameters`` switch to print
the list of parameter names that you can use in the configuration string.

.. literalinclude:: _static/execute_task_list_parameters.txt
    :language: text

For parameters that accept a restricted set of values,
you can list the allowed values with ``--list-values=PARAM``.
For example:

.. literalinclude:: _static/execute_task_list_values.txt
    :language: text

Downloading Audio From YouTube
------------------------------

``aeneas`` can download the audio stream from a YouTube video.
Instead of the audio file path, you provide the YouTube URL,
and add the ``-y`` switch at the end:

.. literalinclude:: _static/execute_task_youtube.txt
    :language: text

.. warning::

    The download feature is experimental,
    and it might be unavailable in the future,
    for example if YouTube disables API access
    to audio/video contents.
    Also note that sometimes the download fails
    for network/backend reasons: just wait a few seconds
    and try executing again.

The Runtime Configuration
-------------------------

Although the default settings should be fine for most users,
sometimes it might be useful to modify certain internal parameters
affecting the processing of tasks, for example
changing the directory where temporary files are created,
modifying processing parameters like the time resolution, etc.

To do so, the user can use the ``-r`` or ``--runtime-configuration`` switch,
providing a suitable configuration string as its value.

.. warning::

    Using the runtime configuration switch is recommended
    only for expert users, or if explicitly suggested by an expert user,
    since there are (almost) no sanity checks on the values provided
    this way, and setting wrong values might lead to erratic behavior
    of the aligner.

The available parameter names are listed in
:class:`~aeneas.runtimeconfiguration.RuntimeConfiguration`.

Examples:

#. disable checks on the language codes:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="allow_unlisted_languages=True"

#. disable the Python C/C++ extensions, running the pure Python code:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="c_extensions=False"

#. disable only the ``cew`` Python C/C++ extension, while ``cdtw`` and ``cmfcc`` will still run (if compiled):

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="cew=False"

#. set the DTW margin to ``10.000`` seconds:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="dtw_margin=10"

#. specify the path to the ``ffprobe`` and ``ffmpeg`` executables:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="ffmpeg_path=/path/to/my/ffmpeg|ffprobe_path=/path/to/my/ffprobe"

#. set the time resolution of the aligner to ``0.050`` seconds:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="mfcc_window_length=0.150|mfcc_window_shift=0.050"

#. use the eSpeak-ng TTS, via the ``espeak-ng`` executable available on ``$PATH``, instead of eSpeak:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng"

#. use the eSpeak-ng TTS, via the ``espeak-ng`` executable at a custom location, instead of eSpeak:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=espeak-ng|tts_path=/path/to/espeak-ng"

#. use the Festival TTS, via the ``text2wave`` executable available on ``$PATH``, instead of eSpeak:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=festival"

#. use the Festival TTS, via the ``text2wave`` executable at a custom location, instead of eSpeak:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=festival|tts_path=/path/to/text2wave"

#. use the AWS Polly TTS API instead of eSpeak (with TTS caching enabled):

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=aws|tts_cache=True"

#. use the Nuance TTS API instead of eSpeak (with TTS caching enabled):

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=nuance|nuance_tts_api_id=YOUR_NUANCE_API_ID|nuance_tts_api_key=YOUR_NUANCE_API_KEY|tts_cache=True"

#. use a custom TTS wrapper located at ``/path/to/your/wrapper.py`` (see the ``aeneas/extra/`` directory for examples):

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tts=custom|tts_path=/path/to/your/wrapper.py"

#. set the temporary directory:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="tmp_path=/path/to/tmp/"

#. allow processing tasks with audio files at most 1 hour (= 3600 seconds) long:

    .. code-block:: text

        python -m aeneas.tools.execute_task --example-json -r="task_max_audio_length=3600"
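
As the examples show, the value of ``-r`` uses the same ``key=value|key=value`` syntax
as the task configuration string.
When driving the tool from Python, a small sketch like the following
(the helper names are hypothetical) can build and sanity-check that value.
Since no shell is involved when a list is passed to ``subprocess.run``,
the quotes used in the shell examples above are not needed.

.. code-block:: python

    from typing import Dict, List


    def parse_config_string(config: str) -> Dict[str, str]:
        """Split 'key1=value1|key2=value2' into a dict (sketch; splits on the first '=')."""
        result: Dict[str, str] = {}
        for item in config.split("|"):
            if not item.strip():
                continue  # tolerate empty entries such as a trailing '|'
            key, sep, value = item.partition("=")
            if not sep or not key.strip():
                raise ValueError(f"malformed entry: {item!r}")
            result[key.strip()] = value.strip()
        return result


    def build_runtime_argv(rconf: Dict[str, str], *task_args: str) -> List[str]:
        """Return an execute_task argument list ending with -r=<runtime configuration>."""
        rconf_string = "|".join(f"{key}={value}" for key, value in rconf.items())
        parse_config_string(rconf_string)  # basic sanity check on the joined string
        # The list is meant for subprocess.run without a shell, so no quotes are added.
        return ["python", "-m", "aeneas.tools.execute_task", *task_args, f"-r={rconf_string}"]


    if __name__ == "__main__":
        print(build_runtime_argv({"tts": "festival", "tts_cache": "True"}, "--example-json"))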

Miscellanea
-----------

#. ``--example-head-tail``: ignore the first ``0.400`` seconds and
   the last ``0.500`` seconds of the audio file for alignment purposes
#. ``--example-no-zero``: ensure that no fragment in the output sync map has zero length
#. ``--example-percent``: adjust the output sync map,
   setting each boundary between adjacent fragments to the middle of the nonspeech interval,
   using the :data:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm.PERCENT` algorithm
   with value ``50`` (i.e., ``50%``)
#. ``--example-rate``: adjust the output sync map, trying to ensure that no fragment has
   a rate of more than ``14`` characters/s,
   using the :data:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm.RATE` algorithm
#. ``--example-sd``: detect the audio head/tail, each at most ``10.000`` seconds long
#. ``--example-multilevel-tts``: use different TTS engines for different levels
   (``mplain`` multilevel input text)
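
The arithmetic behind ``--example-rate`` and ``--example-percent`` can be sketched as follows.
This is not the ``aeneas`` implementation of
:class:`~aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm`:
it assumes the rate is the number of characters (spaces included) divided by the fragment duration,
and it only detects fragments that are too fast, or computes where a percent-based boundary would fall,
without actually moving any boundary.

.. code-block:: python

    from typing import List, Tuple

    # (fragment text, begin in seconds, end in seconds)
    TimedText = Tuple[str, float, float]


    def fragment_rates(fragments: List[TimedText]) -> List[float]:
        """Characters per second for each fragment; zero-length fragments get infinity."""
        rates = []
        for text, begin, end in fragments:
            duration = end - begin
            rates.append(len(text) / duration if duration > 0 else float("inf"))
        return rates


    def too_fast_fragments(fragments: List[TimedText], max_rate: float = 14.0) -> List[int]:
        """Indices of fragments whose rate exceeds max_rate characters/s."""
        return [i for i, rate in enumerate(fragment_rates(fragments)) if rate > max_rate]


    def percent_boundary(nonspeech_begin: float, nonspeech_end: float, percent: float = 50.0) -> float:
        """Place a boundary at `percent` of the nonspeech interval (50 = the middle)."""
        return nonspeech_begin + (nonspeech_end - nonspeech_begin) * percent / 100.0


    if __name__ == "__main__":
        fragments = [("Hello world", 0.0, 1.0), ("A much longer fragment of text", 1.0, 2.0)]
        print(fragment_rates(fragments))      # [11.0, 30.0]
        print(too_fast_fragments(fragments))  # [1]
        print(percent_boundary(1.0, 2.0))     # 1.5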

Processing Jobs
~~~~~~~~~~~~~~~

If you have several Tasks sharing the same parameters (configuration strings)
and differing only in their audio/text files,
you can either write your own Bash/BAT script
or create a Job:

.. topic:: Job

    A Job is a container (compressed file or uncompressed directory),
    containing:
    
    * one or more pairs of audio/text files, and
    * a configuration file (``config.txt`` or ``config.xml``)
      specifying the parameters needed to locate each Task's assets inside the Job,
      to process each Task, and to create the output container
      holding the output sync map files.

    Example: ``/home/rb/job.zip``, containing the following files,
    corresponding to three Tasks:

    .. code-block:: text

        .
        ├── config.txt
        └── OEBPS
            └── Resources
                ├── sonnet001.mp3
                ├── sonnet001.txt
                ├── sonnet002.mp3
                ├── sonnet002.txt
                ├── sonnet003.mp3
                └── sonnet003.txt

The ``aeneas.tools.execute_job`` tool processes a Job
and writes the corresponding output container to file.
Therefore, it requires at least two arguments:

* the path of the input job container;
* the path of an existing directory where the output container should be created.
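
A Job container like the one above can also be assembled with the Python standard library.
The sketch below builds the ZIP in memory; the function name is hypothetical,
and the contents of ``config.txt`` are left to the caller
(see the TXT config file section of this tutorial for a real example).

.. code-block:: python

    import io
    import zipfile
    from typing import Dict


    def build_job_zip(config_text: str, resources: Dict[str, bytes],
                      resources_dir: str = "OEBPS/Resources") -> bytes:
        """Return ZIP bytes with config.txt at the root and the task files in resources_dir."""
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            archive.writestr("config.txt", config_text)
            for name, data in sorted(resources.items()):
                archive.writestr(f"{resources_dir}/{name}", data)
        return buffer.getvalue()


    if __name__ == "__main__":
        job = build_job_zip(
            "# placeholder: put the real config.txt contents here\n",
            {"sonnet001.mp3": b"", "sonnet001.txt": b"placeholder text\n"},
        )
        with zipfile.ZipFile(io.BytesIO(job)) as archive:
            print(archive.namelist())

Writing these bytes to, e.g., ``/home/rb/job.zip`` and then running
``python -m aeneas.tools.execute_job /home/rb/job.zip /path/to/output/dir``
(with an existing output directory) would process the Job.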

The ``--help``, ``-v``, ``-l``, and ``-r`` switches
have the same meaning for ``aeneas.tools.execute_job``
as described above. For example, the help message reads:

.. literalinclude:: _static/execute_job_help.txt
    :language: text

Currently ``aeneas.tools.execute_job`` does not have
built-in example shortcuts (``--example-*``),
but you can run a built-in example:

.. literalinclude:: _static/execute_job_example.txt
    :language: text

TXT Config File (``config.txt``)
--------------------------------

A ZIP container with the following files:

.. code-block:: text

    .
    ├── config.txt
    └── OEBPS
        └── Resources
            ├── sonnet001.mp3
            ├── sonnet001.txt
            ├── sonnet002.mp3
            ├── sonnet002.txt
            ├── sonnet003.mp3
            └── sonnet003.txt

where the ``config.txt`` config file reads:

.. literalinclude:: _static/execute_job_config.txt

will generate three tasks (``sonnet001``, ``sonnet002`` and ``sonnet003``),
output a SMIL file for each of them,
and finally compress them into a ZIP file with the following structure:

.. code-block:: text

    .
    └── OEBPS
        └── Resources
            ├── sonnet001.smil
            ├── sonnet002.smil
            └── sonnet003.smil

Note that the paths in ``config.txt`` are relative to
(the directory containing) the ``config.txt`` file,
and that you can use the :data:`~aeneas.globalconstants.PPV_OS_TASK_PREFIX`
placeholder (``$PREFIX``) that will be replaced with the Task id.
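
The two rules above (paths relative to ``config.txt``, and ``$PREFIX`` substitution)
can be illustrated with a short Python sketch.
The ``$PREFIX.smil`` template and the helper names are hypothetical,
and how ``aeneas`` derives each Task id is not covered here.

.. code-block:: python

    import posixpath
    from typing import Dict, List

    PREFIX_PLACEHOLDER = "$PREFIX"


    def expand_prefix(template: str, task_ids: List[str]) -> Dict[str, str]:
        """Map each task id to the template with $PREFIX replaced by that id."""
        return {task_id: template.replace(PREFIX_PLACEHOLDER, task_id) for task_id in task_ids}


    def resolve_in_container(config_path: str, relative_path: str) -> str:
        """Resolve a path relative to the directory containing the config file."""
        base = posixpath.dirname(config_path)
        return posixpath.normpath(posixpath.join(base, relative_path))


    if __name__ == "__main__":
        print(expand_prefix("$PREFIX.smil", ["sonnet001", "sonnet002", "sonnet003"]))
        print(resolve_in_container("config.txt", "OEBPS/Resources/sonnet001.mp3"))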

XML Config File (``config.xml``)
--------------------------------
            
While ``config.txt`` is concise and easy to write,
it constrains all the tasks of the job to share the same
execution settings (language, output format, and so on).

If you need to specify different values for execution parameters
of different tasks, you must use an XML config file,
named ``config.xml``.

The following ``config.xml`` is equivalent to the example above:

.. literalinclude:: _static/execute_job_config_xml_1.txt
    :language: xml

Note that ``config.xml`` allows you to bundle together
Tasks with different languages, output formats, etc.:

.. literalinclude:: _static/execute_job_config_xml_2.txt
    :language: xml