vad

This module contains the following classes:

VAD, a simple voice activity detector based on the energy of the 0-th MFCC.

Given an energy vector representing an audio file, it returns a boolean mask whose elements are True for speech frames and False for nonspeech frames.
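The idea of an energy-based mask can be illustrated with a minimal sketch. It is not the aeneas implementation: the function name `energy_mask`, the use of plain Python lists instead of `numpy.ndarray`, the direct comparison against an absolute threshold, and the sample values are all assumptions made for illustration.

```python
def energy_mask(wave_energy, threshold):
    """Return a list of booleans, True where the frame energy reaches the threshold.

    Hypothetical helper: comparing each frame directly against an absolute
    threshold is an assumption, not necessarily what VAD does internally.
    """
    return [energy >= threshold for energy in wave_energy]


# Made-up per-frame energy values (not taken from a real audio file).
energy = [0.1, 0.2, 2.5, 3.0, 2.8, 0.3, 0.1]
mask = energy_mask(energy, threshold=1.0)
print(mask)
```

Frames 2 to 4 exceed the threshold, so only those positions of the mask are True.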

New in version 1.0.4.

class aeneas.vad.VAD(logger=None, rconf=None)

The voice activity detector (VAD).

Parameters:

rconf (`RuntimeConfiguration`) – a runtime configuration
logger (`Logger`) – the logger object

run_vad(wave_energy, log_energy_threshold=None, min_nonspeech_length=None, extend_before=None, extend_after=None)

Compute the time intervals containing speech and nonspeech, and return a boolean mask with speech frames set to True, and nonspeech frames set to False.

Each of the last four parameters may be None, in which case the corresponding RuntimeConfiguration value is applied.
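The fallback rule for None arguments might look like the sketch below. The `DEFAULTS` dictionary, its values, and the `resolve_parameters` helper are hypothetical stand-ins for the RuntimeConfiguration lookup, which is not reproduced here.

```python
# Hypothetical defaults standing in for RuntimeConfiguration values;
# the numbers are illustrative, not the library's actual settings.
DEFAULTS = {
    "log_energy_threshold": 0.7,
    "min_nonspeech_length": 20,
    "extend_before": 0,
    "extend_after": 0,
}


def resolve_parameters(defaults, **overrides):
    """Return a copy of defaults in which every override that is not None wins."""
    resolved = dict(defaults)
    for name, value in overrides.items():
        if name not in defaults:
            raise KeyError("unknown parameter: %s" % name)
        # Test against None explicitly so that an explicit 0 is kept.
        if value is not None:
            resolved[name] = value
    return resolved


params = resolve_parameters(DEFAULTS, log_energy_threshold=None, extend_after=5)
print(params)
```

The check uses `is not None` rather than truthiness, so a caller can still pass 0 for `extend_before` or `extend_after` without triggering the fallback.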

Parameters:

wave_energy (numpy.ndarray (1D)) – the energy vector of the audio file (0-th MFCC)
log_energy_threshold (float) – the minimum log energy threshold to consider a frame as speech
min_nonspeech_length (int) – the minimum length, in frames, of a nonspeech interval
extend_before (int) – extend each speech interval by this number of frames to the left (before)
extend_after (int) – extend each speech interval by this number of frames to the right (after)

Return type:

numpy.ndarray (1D)
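To show how the four tuning parameters could interact, here is a self-contained sketch of an energy-based detector. It is an illustration under stated assumptions, not the code behind `run_vad`: the function name `simple_vad` is hypothetical, plain lists replace `numpy.ndarray`, the threshold is treated as an absolute value, and low-energy runs shorter than `min_nonspeech_length` are assumed to be absorbed into the surrounding speech.

```python
def simple_vad(wave_energy, threshold, min_nonspeech_length=1,
               extend_before=0, extend_after=0):
    """Return a boolean mask (list) with True for speech frames.

    Hypothetical sketch; all lengths are expressed in frames.
    """
    n = len(wave_energy)
    # Step 1: start with every frame marked as speech.
    mask = [True] * n

    # Step 2: mark as nonspeech only the runs of low-energy frames
    # that last at least min_nonspeech_length frames.
    i = 0
    while i < n:
        if wave_energy[i] < threshold:
            j = i
            while j < n and wave_energy[j] < threshold:
                j += 1
            if j - i >= min_nonspeech_length:
                for k in range(i, j):
                    mask[k] = False
            i = j
        else:
            i += 1

    # Step 3: collect the speech intervals as half-open ranges [start, end).
    intervals = []
    start = None
    for idx, is_speech in enumerate(mask):
        if is_speech and start is None:
            start = idx
        elif not is_speech and start is not None:
            intervals.append((start, idx))
            start = None
    if start is not None:
        intervals.append((start, n))

    # Step 4: extend each speech interval on both sides, clipped to the array.
    for begin, end in intervals:
        lo = max(0, begin - extend_before)
        hi = min(n, end + extend_after)
        for k in range(lo, hi):
            mask[k] = True
    return mask


# Made-up energy values: a one-frame dip at index 5 sits inside the speech.
energy = [0.1, 0.1, 0.1, 2.0, 2.0, 0.1, 2.0, 0.1, 0.1, 0.1]
result = simple_vad(energy, threshold=1.0, min_nonspeech_length=2,
                    extend_before=1, extend_after=0)
print(result)
```

In this example the single low-energy frame at index 5 is too short to count as nonspeech, so it stays inside the speech interval, and `extend_before=1` pulls the start of that interval back from frame 3 to frame 2.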