vad
This module contains the following classes:
- VAD, a simple voice activity detector based on the energy of the 0-th MFCC.
Given an energy vector representing an audio file, it returns a boolean mask with elements set to True where speech occurs, and False where nonspeech occurs.
New in version 1.0.4.
class aeneas.vad.VAD(logger=None, rconf=None)
The voice activity detector (VAD).
Parameters:
- rconf (RuntimeConfiguration) – a runtime configuration
- logger (Logger) – the logger object
run_vad(wave_energy, log_energy_threshold=None, min_nonspeech_length=None, extend_before=None, extend_after=None)
Compute the time intervals containing speech and nonspeech, and return a boolean mask with speech frames set to True, and nonspeech frames set to False.
The last four parameters might be None: in this case, the corresponding RuntimeConfiguration values are applied.
Parameters:
- wave_energy (numpy.ndarray (1D)) – the energy vector of the audio file (0-th MFCC)
- log_energy_threshold (float) – the minimum log energy threshold to consider a frame as speech
- min_nonspeech_length (int) – the minimum length, in frames, of a nonspeech interval
- extend_before (int) – extend each speech interval by this number of frames to the left (before)
- extend_after (int) – extend each speech interval by this number of frames to the right (after)
Return type: numpy.ndarray (1D)
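To illustrate how the four tuning parameters of run_vad could interact, here is a minimal sketch of an energy-based detector. It is not the aeneas implementation: the function name sketch_vad is hypothetical, it works on plain Python lists instead of numpy arrays to stay dependency-free, and it assumes that the threshold is applied relative to the minimum energy of the input and that a nonspeech run shorter than min_nonspeech_length is absorbed into the surrounding speech. The zero defaults for the two extension parameters are illustrative only.

```python
def sketch_vad(wave_energy, log_energy_threshold, min_nonspeech_length,
               extend_before=0, extend_after=0):
    """Hypothetical energy-based VAD sketch; NOT the aeneas code.

    Returns a list of booleans: True for speech frames, False otherwise.
    """
    n = len(wave_energy)
    if n == 0:
        return []

    # ASSUMPTION: the threshold is relative to the quietest frame.
    threshold = min(wave_energy) + log_energy_threshold
    mask = [e >= threshold for e in wave_energy]

    # ASSUMPTION: nonspeech runs shorter than min_nonspeech_length
    # are too short to count as pauses, so they are marked as speech.
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        j = i
        while j < n and not mask[j]:
            j += 1
        if j - i < min_nonspeech_length:
            for k in range(i, j):
                mask[k] = True
        i = j

    # Extend each speech interval by the requested number of frames,
    # clamping at the boundaries of the energy vector.
    result = list(mask)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        j = i
        while j < n and mask[j]:
            j += 1
        for k in range(max(0, i - extend_before), min(n, j + extend_after)):
            result[k] = True
        i = j
    return result


# Toy energy vector: two bursts of speech separated by a one-frame dip.
energy = [0, 0, 0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 0]
speech_mask = sketch_vad(energy, log_energy_threshold=1.0,
                         min_nonspeech_length=2,
                         extend_before=1, extend_after=1)
```

In this toy input the one-frame dip between the two bursts is shorter than min_nonspeech_length, so the sketch merges both bursts into a single speech interval and then widens it by one frame on each side. With the real class, the same parameters are passed to run_vad, and any parameter left as None falls back to the corresponding RuntimeConfiguration value.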