adjustboundaryalgorithm

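This module contains the following class:

  • AdjustBoundaryAlgorithm, enumerating and implementing the available algorithms to adjust the boundary point between two consecutive fragments.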

Warning

This module is likely to be refactored in a future version.

class aeneas.adjustboundaryalgorithm.AdjustBoundaryAlgorithm(rconf=None, logger=None)

Enumeration and implementation of the available algorithms to adjust the boundary point between two consecutive fragments.

Parameters:
  • rconf (RuntimeConfiguration) – a runtime configuration
  • logger (Logger) – the logger object
Raises:

TypeError: if one of boundary_indices, real_wave_mfcc, or text_file is None or it has a wrong type

AFTERCURRENT = 'aftercurrent'

Set the boundary at value seconds after the end of the current fragment, if the current boundary falls inside a nonspeech interval. If not, no adjustment is made.

Example (value 0.200 seconds):

[Figure: Comparison between AUTO labels and AFTERCURRENT labels with a 0.200 second offset]

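The AFTERCURRENT rule can be sketched as a small pure function. This is a minimal illustration, not the aeneas implementation: the helper name adjust_after_current, the representation of nonspeech intervals as (begin, end) pairs in seconds, the use of plain floats instead of aeneas time values, and the clamping of the new boundary to the end of the nonspeech interval are all assumptions.

```python
def adjust_after_current(boundary, nonspeech_intervals, value):
    """Hypothetical sketch of the AFTERCURRENT rule (all times in seconds).

    nonspeech_intervals is a list of (begin, end) pairs. The begin of the
    nonspeech interval containing the boundary is treated as the end of
    the current fragment (an assumption of this sketch).
    """
    for begin, end in nonspeech_intervals:
        if begin <= boundary <= end:
            # Inside a nonspeech interval: place the boundary `value` seconds
            # after the interval begin, clamped to the interval end (assumed).
            return min(begin + max(value, 0.0), end)
    # Not inside any nonspeech interval: no adjustment is made.
    return boundary
```

For example, a boundary at 2.6 s inside a nonspeech interval (2.0, 3.0) moves to 2.2 s with value 0.200, while a boundary outside every nonspeech interval is returned unchanged.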
ALLOWED_VALUES = ['aftercurrent', 'auto', 'beforenext', 'offset', 'percent', 'rate', 'rateaggressive']

List of all the allowed values

AUTO = 'auto'

Auto (no adjustment).

Example:

[Figure: The AUTO method does not change the time intervals]

BEFORENEXT = 'beforenext'

Set the boundary at value seconds before the beginning of the next fragment, if the current boundary falls inside a nonspeech interval. If not, no adjustment is made.

Example (value 0.200 seconds):

[Figure: Comparison between AUTO labels and BEFORENEXT labels with a 0.200 second offset]

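BEFORENEXT mirrors AFTERCURRENT, measuring from the other side of the nonspeech interval. The sketch below is a hypothetical illustration, not the aeneas code: adjust_before_next, the (begin, end) tuple representation, the plain float times, and the clamping to the interval begin are assumptions; the end of the nonspeech interval is treated as the beginning of the next fragment.

```python
def adjust_before_next(boundary, nonspeech_intervals, value):
    """Hypothetical sketch of the BEFORENEXT rule (all times in seconds)."""
    for begin, end in nonspeech_intervals:
        if begin <= boundary <= end:
            # Inside a nonspeech interval: place the boundary `value` seconds
            # before the interval end, clamped to the interval begin (assumed).
            return max(end - max(value, 0.0), begin)
    # Not inside any nonspeech interval: no adjustment is made.
    return boundary
```

With value 0.200, a boundary at 2.1 s inside (2.0, 3.0) moves to 2.8 s; if the interval is shorter than value, the boundary lands on the interval begin.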
OFFSET = 'offset'

Offset the current boundaries by value seconds. The value can be negative or positive.

Example (value -0.200 seconds):

[Figure: Comparison between AUTO labels and OFFSET labels with value -0.200]

Example (value 0.200 seconds):

[Figure: Comparison between AUTO labels and OFFSET labels with value 0.200]

New in version 1.1.0.
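Unlike the nonspeech-based algorithms, OFFSET shifts every internal boundary by the same amount. The following is a minimal sketch under stated assumptions, not the aeneas implementation: offset_boundaries is a hypothetical helper, times are plain floats, the first and last values (start and end of the audio) stay fixed, and each shifted boundary is clamped so that no fragment gets a negative duration.

```python
def offset_boundaries(times, value):
    """Hypothetical sketch of the OFFSET rule.

    times: sorted boundary times in seconds; times[0] and times[-1]
           (start and end of the audio) are kept fixed.
    value: offset in seconds, negative or positive.
    """
    adjusted = list(times)
    for k in range(1, len(times) - 1):
        shifted = times[k] + value
        # Keep boundaries ordered (assumption): not before the previous
        # adjusted boundary and not after the next original boundary.
        adjusted[k] = min(max(shifted, adjusted[k - 1]), times[k + 1])
    return adjusted
```

With value -0.200, boundaries [0.0, 1.0, 2.0, 3.0] become approximately [0.0, 0.8, 1.8, 3.0]; a boundary at 0.1 s would be clamped to 0.0 s rather than becoming negative.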

PERCENT = 'percent'

Set the boundary at value percent of the nonspeech interval between the current and the next fragment, if the current boundary falls inside a nonspeech interval; if not, no adjustment is made. The value must be an integer in [0, 100].

Example (value 25 %):

[Figure: Comparison between AUTO labels and PERCENT labels with value 25 %]

Example (value 50 %):

[Figure: Comparison between AUTO labels and PERCENT labels with value 50 %]

Example (value 75 %):

[Figure: Comparison between AUTO labels and PERCENT labels with value 75 %]

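The PERCENT rule is a linear interpolation inside the nonspeech interval: 0 places the boundary at the interval begin, 100 at its end. This hypothetical sketch is not the aeneas code; adjust_percent, the (begin, end) tuples, plain float times, and raising ValueError for an out-of-range value are assumptions.

```python
def adjust_percent(boundary, nonspeech_intervals, value):
    """Hypothetical sketch of the PERCENT rule (all times in seconds)."""
    # Raising on invalid input is an assumption of this sketch.
    if not isinstance(value, int) or not 0 <= value <= 100:
        raise ValueError("value must be an integer in [0, 100]")
    for begin, end in nonspeech_intervals:
        if begin <= boundary <= end:
            # Interpolate linearly between the interval begin and end.
            return begin + (end - begin) * value / 100
    # Not inside any nonspeech interval: no adjustment is made.
    return boundary
```

For a nonspeech interval (2.0, 3.0), values 25, 50, and 75 place the boundary at 2.25 s, 2.5 s, and 2.75 s respectively.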
RATE = 'rate'

Adjust the boundaries, trying to keep every fragment at or below value characters per second. The value must be positive. First, the rates of all fragments are computed from the current boundaries. For each fragment exceeding value characters per second, the algorithm tries to move its end boundary forward, so that its time interval grows (and hence its rate decreases). Not all fragments can necessarily be adjusted this way: for example, if three consecutive fragments exceed value, the middle one cannot be stretched.

Example (value 13.0, note how f000003 is modified):

[Figure: Comparison between AUTO labels and RATE labels with value 13.0]

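The RATE procedure can be sketched as a single left-to-right pass in which a too-fast fragment borrows time from the next fragment. This is a hypothetical sketch, not the aeneas implementation: adjust_rate, the per-fragment character counts, plain float times, and the notion of "slack" (the time the next fragment can give up while staying within the limit) are assumptions chosen to illustrate the behavior described above.

```python
def adjust_rate(times, lengths, max_rate):
    """Hypothetical sketch of the RATE rule.

    times:    boundary times in seconds, len(times) == len(lengths) + 1
    lengths:  number of characters in each fragment
    max_rate: maximum allowed rate, in characters per second
    """
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    times = [float(t) for t in times]
    for i in range(len(lengths) - 1):  # the last fragment has no successor
        # Extra time fragment i needs to get down to max_rate.
        deficit = lengths[i] / max_rate - (times[i + 1] - times[i])
        if deficit <= 0:
            continue
        # Time the next fragment can give up while staying within max_rate.
        slack = (times[i + 2] - times[i + 1]) - lengths[i + 1] / max_rate
        if slack > 0:
            # Move the end boundary of fragment i forward.
            times[i + 1] += min(deficit, slack)
    return times
```

In this sketch, boundaries [0, 1, 2, 3, 10] with lengths [26, 26, 26, 0] and a limit of 13.0 characters per second only move the end of the third fragment: the first two fragments cannot be stretched because each is followed by another too-fast fragment, which matches the "middle one cannot be stretched" caveat.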
RATEAGGRESSIVE = 'rateaggressive'

Adjust the boundaries, trying to keep every fragment at or below value characters per second, in aggressive mode. The value must be positive. First, the rates of all fragments are computed from the current boundaries. For each fragment exceeding value characters per second, the algorithm tries to move its end boundary forward, so that its time interval grows (and hence its rate decreases). If moving the end boundary is not possible, or is not enough to bring the rate down to value, the algorithm tries to move the begin boundary back; this is what distinguishes it from the less aggressive RATE algorithm. Not all fragments can necessarily be adjusted this way: for example, if three consecutive fragments exceed value, the middle one cannot be stretched.

Example (value 13.0, note how f000003 is modified):

[Figure: Comparison between AUTO labels and RATEAGGRESSIVE labels with value 13.0]

New in version 1.1.0.
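The aggressive variant adds a second step: if borrowing from the next fragment is impossible or insufficient, the fragment also borrows from the previous one. As with the RATE sketch, this is a hypothetical illustration rather than the aeneas code; adjust_rate_aggressive, the character counts, plain float times, and the slack computation are assumptions.

```python
def adjust_rate_aggressive(times, lengths, max_rate):
    """Hypothetical sketch of the RATEAGGRESSIVE rule.

    times:    boundary times in seconds, len(times) == len(lengths) + 1
    lengths:  number of characters in each fragment
    max_rate: maximum allowed rate, in characters per second
    """
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    times = [float(t) for t in times]
    n = len(lengths)

    def slack(j):
        # Time fragment j can give up while staying within max_rate.
        return (times[j + 1] - times[j]) - lengths[j] / max_rate

    for i in range(n):
        deficit = lengths[i] / max_rate - (times[i + 1] - times[i])
        if deficit <= 0:
            continue
        # Step 1: move the end boundary forward, borrowing from the next fragment.
        if i + 1 < n and slack(i + 1) > 0:
            shift = min(deficit, slack(i + 1))
            times[i + 1] += shift
            deficit -= shift
        # Step 2 (aggressive): move the begin boundary back, borrowing from
        # the previous fragment.
        if deficit > 0 and i > 0 and slack(i - 1) > 0:
            times[i] -= min(deficit, slack(i - 1))
    return times
```

With boundaries [0, 3, 4, 5], lengths [13, 26, 26], and a limit of 13.0, the second fragment cannot grow forward (the third is also too fast), so this sketch moves its begin boundary back from 3.0 s to 2.0 s, something the plain RATE rule would not do.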

adjust(aba_parameters, boundary_indices, real_wave_mfcc, text_file, allow_arbitrary_shift=False)

Adjust the boundaries of the text map using the algorithm and parameters specified in aba_parameters, storing the resulting sync map fragment list internally.

Parameters:
  • aba_parameters (dict) – a dictionary containing the algorithm and its parameters, as produced by aba_parameters() in TaskConfiguration
  • boundary_indices (numpy.ndarray (1D)) – the current boundary indices, with respect to the audio file full MFCCs
  • real_wave_mfcc (AudioFileMFCC) – the audio file MFCCs
  • text_file (TextFile) – the text file containing the associated text fragments
  • allow_arbitrary_shift (bool) – if True, allow arbitrary shifts when adjusting zero-length fragments
Return type:

list of SyncMapFragmentList

append_fragment_list_to_sync_root(sync_root)

Append the sync map fragment list to the given node from a sync map tree.

Parameters: sync_root (Tree) – the root of the sync map tree to which the new nodes should be appended

intervals_to_fragment_list(text_file, time_values)

Transform a list of at least 4 time values (corresponding to at least 3 intervals) into a sync map fragment list and store it internally. The first interval is a HEAD, the last is a TAIL.

For example:

time_values=[0.000, 1.000, 2.000, 3.456] => [(0.000, 1.000), (1.000, 2.000), (2.000, 3.456)]
Parameters:
  • text_file (TextFile) – the text file containing the associated text fragments
  • time_values (list of TimeValue) – the time values
Raises:

TypeError: if text_file is not an instance of TextFile or time_values is not a list

ValueError: if time_values has length less than four
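The transformation pairs consecutive time values into intervals and labels the first and last as HEAD and TAIL. This is a hypothetical sketch, not the aeneas method: intervals_to_fragments, the use of a plain list of strings instead of a TextFile, the tuple output format, the "REGULAR" label for the middle intervals, and the check that there is exactly one text per middle interval are assumptions.

```python
def intervals_to_fragments(texts, time_values):
    """Hypothetical sketch: turn n time values into n - 1 labeled intervals.

    texts: the text of each fragment between HEAD and TAIL
           (len(time_values) - 3 items in this sketch).
    Returns a list of (kind, text, begin, end) tuples.
    """
    if not isinstance(time_values, list):
        raise TypeError("time_values must be a list")
    if len(time_values) < 4:
        raise ValueError("time_values must contain at least four values")
    # Consecutive pairs: n values give n - 1 intervals.
    intervals = list(zip(time_values[:-1], time_values[1:]))
    if len(texts) != len(intervals) - 2:
        raise ValueError("expected one text per interval between HEAD and TAIL")
    fragments = [("HEAD", None) + intervals[0]]
    for text, (begin, end) in zip(texts, intervals[1:-1]):
        fragments.append(("REGULAR", text, begin, end))
    fragments.append(("TAIL", None) + intervals[-1])
    return fragments
```

With the time values from the example above, [0.000, 1.000, 2.000, 3.456], the sketch yields a HEAD (0.000, 1.000), one regular fragment (1.000, 2.000), and a TAIL (2.000, 3.456).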