dtw
This module contains the implementation of dynamic time warping (DTW) algorithms to align two audio waves, represented by their Mel-frequency cepstral coefficients (MFCCs).
This module contains the following classes:
- DTWAlgorithm, an enumeration of the available algorithms;
- DTWAligner, the actual wave aligner;
- DTWExact, a DTW aligner implementing the exact (full) DTW algorithm;
- DTWStripe, a DTW aligner implementing the Sakoe-Chiba band heuristic.
To align two wave files:
- build a DTWAligner object, passing in the constructor the paths of the two wave files or their MFCC representations;
- call compute_path() to compute the min cost path between the MFCC representations of the two wave files.
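The two steps above can be sketched as follows. This is a minimal sketch, not an official recipe: the helper name `align_waves` is hypothetical, and it only uses the constructor keywords and the `compute_path()` method documented on this page. The import is placed inside the function so that the snippet can be loaded even where aeneas is not installed.

```python
def align_waves(real_wave_path, synt_wave_path):
    """Hypothetical helper: align two wave files and return the min cost path.

    Returns a tuple (real_indices, synt_indices), or None if the path
    cannot be computed (for example, one wave is empty after masking).
    """
    # Lazy import: aeneas is only needed when the function is called.
    from aeneas.dtw import DTWAligner

    # Step 1: build the aligner from the two file paths.
    # MFCCs are extracted upon object creation.
    aligner = DTWAligner(
        real_wave_path=real_wave_path,
        synt_wave_path=synt_wave_path,
    )

    # Step 2: compute the min cost path.
    path = aligner.compute_path()
    if path is None:
        return None
    real_indices, synt_indices = path
    return real_indices, synt_indices
```

A call would look like `align_waves("real.wav", "synt.wav")`, where both file names are placeholders for files that exist on disk.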
Warning
This module might be refactored in a future version.
class aeneas.dtw.DTWAlgorithm
Enumeration of the DTW algorithms that can be used for the alignment of two audio waves.
ALLOWED_VALUES = ['exact', 'stripe']
List of all the allowed values.
EXACT = 'exact'
Classical (exact) DTW algorithm.
This implementation has O(nm) time and space complexity, where n (respectively, m) is the number of MFCC window shifts (vectors) of the real (respectively, synthesized) wave.
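To make the O(nm) cost concrete, here is a minimal NumPy sketch of the classical DTW recurrence with backtracking. It is an illustration under stated assumptions, not the aeneas implementation: the function name `exact_dtw_path` is hypothetical, frames are assumed to be stored as rows, and the local cost is assumed to be the Euclidean distance (the library may use a different one).

```python
import numpy as np


def exact_dtw_path(real_mfcc, synt_mfcc):
    """Return (real_indices, synt_indices) of one min cost DTW path.

    real_mfcc has shape (n, c) and synt_mfcc has shape (m, c);
    each row is one MFCC vector (assumption of this sketch).
    """
    n, m = len(real_mfcc), len(synt_mfcc)

    # Local cost matrix, n x m (Euclidean distance, an assumption).
    diff = real_mfcc[:, None, :] - synt_mfcc[None, :, :]
    cost = np.sqrt((diff ** 2).sum(axis=2))

    # Accumulated cost matrix: this is the O(nm) time and space part.
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best

    # Backtrack from the last cell to (0, 0), always moving to the
    # cheapest admissible predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()

    real_indices, synt_indices = zip(*path)
    return np.array(real_indices), np.array(synt_indices)
```

The returned pair has the same shape as the tuple described for `compute_path()`: two index arrays of equal length, one per wave.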
STRIPE = 'stripe'
DTW algorithm restricted to a stripe around the main diagonal (Sakoe-Chiba band), for reducing memory usage and run time.
Note that this is a heuristic approximation of the optimal (exact) path.
This implementation has O(nd) time and space complexity, where n is the number of MFCC window shifts (vectors) of the real wave, and d is the number of MFCC window shifts corresponding to the margin.
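The band restriction can be illustrated with the sketch below, which stores only a stripe of `2 * margin + 1` cells per row, so memory grows as O(nd) rather than O(nm). The function name `stripe_dtw_cost`, the way the stripe is centered on the diagonal, and the Euclidean local cost are all assumptions of this sketch; the library may lay out its stripe differently.

```python
import numpy as np


def stripe_dtw_cost(real_mfcc, synt_mfcc, margin):
    """Return the accumulated cost of the best path inside the stripe.

    Only cells within `margin` columns of the (rescaled) main diagonal
    are evaluated. Returns inf if no path fits inside the stripe.
    """
    n, m = len(real_mfcc), len(synt_mfcc)
    width = 2 * margin + 1

    # acc[i, k] holds the accumulated cost of cell (i, starts[i] + k).
    acc = np.full((n, width), np.inf)

    # Column of the diagonal on each row, and first column of the stripe.
    centers = np.round(np.arange(n) * (m - 1) / max(n - 1, 1)).astype(int)
    starts = centers - margin

    def get(i, j):
        """Accumulated cost of cell (i, j), or inf outside the stripe."""
        if i < 0 or j < 0 or j >= m:
            return np.inf
        k = j - starts[i]
        if k < 0 or k >= width:
            return np.inf
        return acc[i, k]

    for i in range(n):
        for k in range(width):
            j = starts[i] + k
            if j < 0 or j >= m:
                continue
            # Local cost: Euclidean distance (assumption of this sketch).
            local = np.linalg.norm(real_mfcc[i] - synt_mfcc[j])
            if i == 0 and j == 0:
                acc[i, k] = local
                continue
            best = min(get(i - 1, j), get(i, j - 1), get(i - 1, j - 1))
            acc[i, k] = local + best

    return get(n - 1, m - 1)
```

Because cells outside the stripe are never visited, a margin that is too narrow can exclude the optimal path or leave no admissible path at all, which is why the result is only a heuristic approximation of the exact one.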
class aeneas.dtw.DTWAligner(real_wave_mfcc=None, synt_wave_mfcc=None, real_wave_path=None, synt_wave_path=None, rconf=None, logger=None)
The audio wave aligner.
The two waves, henceforth named real and synthesized, can be passed as AudioFileMFCC objects or as file paths. In the latter case, MFCCs will be extracted upon object creation.
Parameters:
- real_wave_mfcc (AudioFileMFCC) – the real audio file
- synt_wave_mfcc (AudioFileMFCC) – the synthesized audio file
- real_wave_path (string) – the path to the real audio file
- synt_wave_path (string) – the path to the synthesized audio file
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
Raises:
- ValueError: if real_wave_mfcc or synt_wave_mfcc is not None but not of type AudioFileMFCC
- ValueError: if real_wave_path or synt_wave_path is not None but cannot be read
compute_accumulated_cost_matrix()
Compute the accumulated cost matrix, and return it.
Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).
Return type: numpy.ndarray (2D)
Raises: RuntimeError: if both the C extension and the pure Python code did not succeed.
New in version 1.2.0.
compute_boundaries(synt_anchors)
Compute the min cost path between the two waves, and return a list of boundary points, representing the argmin values with respect to the provided synt_anchors timings.
If synt_anchors has k elements, the returned array will have k+1 elements, accounting for the tail fragment.
Parameters: synt_anchors (list of TimeValue) – the anchor time values (in seconds) of the synthesized fragments, each representing the begin time in the synthesized wave of the corresponding fragment
Return the list of boundary indices.
Return type: numpy.ndarray (1D)
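The relation between k anchors and k+1 boundaries can be illustrated with the sketch below. It is a guess at the general idea, not the library's code: the function name `boundaries_from_path` is hypothetical, the anchors are assumed to be already converted from seconds to MFCC frame indices, and the tail boundary is assumed to be the last real index on the path.

```python
import numpy as np


def boundaries_from_path(real_indices, synt_indices, synt_anchor_indices):
    """Map synthesized anchor frames to real wave frames along a DTW path.

    real_indices, synt_indices: the two index arrays of a min cost path.
    synt_anchor_indices: begin frame of each synthesized fragment
    (assumed to be sorted and expressed in frames, not seconds).
    """
    real_indices = np.asarray(real_indices)
    synt_indices = np.asarray(synt_indices)

    # Along a DTW path the synthesized indices never decrease, so the
    # first path position reaching each anchor can be found by bisection.
    positions = np.searchsorted(synt_indices, synt_anchor_indices, side="left")
    positions = np.minimum(positions, len(real_indices) - 1)

    # Real wave frame aligned with the begin of each fragment.
    begins = real_indices[positions]

    # One extra boundary for the tail (assumption: end of the path).
    return np.append(begins, real_indices[-1])
```

With k anchors the result has k+1 entries, matching the array length described above.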
compute_path()
Compute the min cost path between the two waves, and return it.
Return the computed path as a tuple with two elements, each being a numpy.ndarray (1D) of int indices:
([r_1, r_2, ..., r_k], [s_1, s_2, ..., s_k])
where r_i are the indices in the real wave and s_i are the indices in the synthesized wave, and k is the length of the min cost path.
Return None if the accumulated cost matrix cannot be computed because one of the two waves is empty after masking (if requested).
Return type: tuple (see above)
Raises: RuntimeError: if both the C extension and the pure Python code did not succeed.
- real_wave_mfcc (