This module contains the following classes:

  • VAD, a simple voice activity detector based on the energy of the 0-th MFCC.

Given an energy vector representing an audio file, it returns a boolean mask whose elements are True where speech occurs and False where nonspeech occurs.

New in version 1.0.4.

class aeneas.vad.VAD(logger=None, rconf=None)

The voice activity detector (VAD).

run_vad(wave_energy, log_energy_threshold=None, min_nonspeech_length=None, extend_before=None, extend_after=None)

Compute the time intervals containing speech and nonspeech, and return a boolean mask with speech frames set to True, and nonspeech frames set to False.

The last four parameters may be None, in which case the corresponding RuntimeConfiguration values are applied.

Parameters:

  • wave_energy (numpy.ndarray (1D)) – the energy vector of the audio file (0-th MFCC)
  • log_energy_threshold (float) – the minimum log energy threshold to consider a frame as speech
  • min_nonspeech_length (int) – the minimum length, in frames, of a nonspeech interval
  • extend_before (int) – extend each speech interval by this number of frames to the left (before)
  • extend_after (int) – extend each speech interval by this number of frames to the right (after)

Return type:

numpy.ndarray (1D)
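To illustrate how the four tuning parameters might interact, here is a minimal sketch of an energy-based detector. It is not the aeneas implementation: the function name sketch_vad is hypothetical, the threshold is assumed to be compared directly against each frame's energy (the library may derive its effective threshold differently), and plain Python lists stand in for numpy arrays to keep the example free of dependencies.

```python
def sketch_vad(energy, threshold, min_nonspeech_length,
               extend_before=0, extend_after=0):
    """Illustrative energy-based VAD (hypothetical, not the aeneas code).

    Returns a list of booleans: True for speech frames, False for nonspeech.
    """
    n = len(energy)
    # Step 1: a frame is provisionally speech if its energy reaches
    # the threshold (assumption: absolute comparison).
    speech = [e >= threshold for e in energy]

    # Step 2: start from "all speech" and carve out nonspeech runs
    # that are long enough to count as real pauses.
    mask = [True] * n
    i = 0
    while i < n:
        if speech[i]:
            i += 1
            continue
        # Find the end of the current run of low-energy frames: [i, j).
        j = i
        while j < n and not speech[j]:
            j += 1
        if j - i >= min_nonspeech_length:
            # Shrink the pause at each end that borders speech, which
            # extends the neighbouring speech intervals.
            start = i + extend_after if i > 0 else i
            stop = j - extend_before if j < n else j
            for k in range(start, stop):
                mask[k] = False
        i = j
    return mask


energy = [0, 0, 0, 0, 5, 5, 0, 5, 5, 0, 0, 0, 0, 0]
mask = sketch_vad(energy, threshold=1, min_nonspeech_length=3,
                  extend_before=1, extend_after=1)
print("".join("S" if m else "-" for m in mask))  # prints ---SSSSSSS----
```

In the printed mask, the single low-energy frame between the two bursts stays marked as speech because it is shorter than min_nonspeech_length, and each speech interval grows by one frame on either side. With the library itself, the corresponding call would be VAD().run_vad(wave_energy), leaving the last four parameters as None so that the RuntimeConfiguration values apply.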