[aubio-user] Recommendation for a method to detect "silence" intervals

Paul Brossier piem at piem.org
Tue Dec 20 02:18:20 CET 2016


Hi Lukasz, hi all,

On 12/15/2016 06:33 PM, Lukasz Tracewski wrote:
> Hi everyone,
> 
> 
> I am looking for an advice on how to best implement in Python adaptive
> "silence" detection. By silence I mean here regions where only
> background noise is present. The goal is to be able to analyse the
> spectrum in a subsequent step and do spectral subtraction on the whole
> sample to reduce noise.

This classification task, which might sound simple, can end up being
somewhat complex. The first thing to do would be to describe precisely
what should be considered as 'non-silence', and what should be
considered as 'silence'.

> Roughly three years ago I successfully used aubio to create a tool
> <https://github.com/tracek/Ornithokrites> for automatic bird calls
> identification (thanks again Paul!) and was faced with exactly the same
> challenge. At that time I simply was taking two consecutive onsets
> (calculated with "energy" method) and if the distance between them was
> "large enough" I would take it with some buffer and call it "silence".
> This is of course a very naive method and I would like to improve it. 

Yes, I remember! I wrote a blog post telling the story here:

    https://aubio.org/news/20141129-2346_kiwis

> Can you offer some advice how to best do this?

An empirical approach, which can still be very efficient, would be to
use one or several descriptors to characterize short time segments (for
instance 256 or 2048 samples), then, based on the values of these
descriptors, decide which class a new segment belongs to.
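As a minimal sketch of that first step (plain numpy here, not aubio; the
frame size and signal are made up for illustration), you can cut the
signal into fixed-size frames and compute one descriptor, the RMS
energy, per frame:

```python
import numpy as np

def frame_energies(signal, frame_size=2048):
    """Cut a mono signal into non-overlapping frames and return
    the RMS energy of each frame (one very simple descriptor)."""
    n_frames = len(signal) // frame_size
    frames = signal[:n_frames * frame_size].reshape(n_frames, frame_size)
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Synthetic example: one second of quiet noise, then one second of a tone.
sr = 44100
quiet = 0.01 * np.random.default_rng(0).standard_normal(sr)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
energies = frame_energies(np.concatenate([quiet, tone]))
# Frames in the noisy half come out with much lower energy than
# frames in the tonal half, which is the separation we are after.
```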

Spectral shape descriptors could be a good starting point in the search
for useful descriptors. If the non-silent sounds you are looking for are
harmonic, the confidence of yin or yinfft could also be a good descriptor.

    https://aubio.org/doc/latest/specdesc_8h.html
    https://aubio.org/doc/latest/pitch_8h.html
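aubio's specdesc module computes these descriptors for you; just to show
what one of them measures, here is a hedged numpy-only illustration of
the spectral centroid (the magnitude-weighted mean frequency), with
frequencies and frame size chosen arbitrarily:

```python
import numpy as np

def spectral_centroid(frame, samplerate):
    """Magnitude-weighted mean frequency of one Hann-windowed frame:
    the spectral centroid, one of the spectral shape descriptors."""
    windowed = frame * np.hanning(len(frame))
    mags = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / samplerate)
    return float(np.sum(freqs * mags) / np.sum(mags))

sr = 44100
t = np.arange(2048) / sr
low = spectral_centroid(np.sin(2 * np.pi * 200 * t), sr)    # near 200 Hz
high = spectral_centroid(np.sin(2 * np.pi * 4000 * t), sr)  # near 4000 Hz
```

A low-pitched tone yields a low centroid and a high-pitched one a high
centroid, so the value helps separate broadband background noise from
tonal bird calls.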

You could then use a simple classifier, such as an SVM, to predict the
class of a given segment.
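A hypothetical sketch of that step, assuming scikit-learn is available
and using made-up energy values as the only descriptor:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Fake training descriptors: one RMS energy value per frame,
# low for silence frames (label 0), high for call frames (label 1).
silence = rng.normal(0.02, 0.005, size=(50, 1))
calls = rng.normal(0.4, 0.1, size=(50, 1))
X = np.vstack([silence, calls])
y = np.array([0] * 50 + [1] * 50)

# Train the SVM, then classify two unseen frames.
clf = SVC(kernel="rbf").fit(X, y)
pred = clf.predict([[0.01], [0.5]])  # quiet frame, loud frame
```

In practice you would feed it a vector of several descriptors per frame
rather than a single energy value.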

A modern approach would consist of building a machine learning algorithm
(for instance a neural network), training it on a database of manually
annotated recordings, and then using it to classify new sound segments.
The algorithm could take as input a set of descriptors, the spectral
data directly, or even the raw signal data.
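To make the idea concrete, here is a toy one-hidden-layer network in
plain numpy, trained by gradient descent on fabricated descriptor
vectors; real work would use a proper framework and a database of
annotated recordings rather than this synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Fake data: two descriptors per frame (say, energy and centroid),
# drawn around different means for silence (0) and calls (1).
X = np.vstack([rng.normal([0.02, 0.2], 0.02, (100, 2)),   # silence
               rng.normal([0.50, 0.8], 0.05, (100, 2))])  # calls
y = np.array([0.0] * 100 + [1.0] * 100)

W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, 8); b2 = 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)            # hidden layer
    p = sigmoid(h @ W2 + b2)            # predicted P(call)
    grad_out = (p - y) / len(y)         # d(cross-entropy)/d(output logit)
    grad_h = np.outer(grad_out, W2) * (1 - h ** 2)  # backprop through tanh
    W2 -= 0.5 * h.T @ grad_out; b2 -= 0.5 * grad_out.sum()
    W1 -= 0.5 * X.T @ grad_h;   b1 -= 0.5 * grad_h.sum(axis=0)

acc = np.mean((p > 0.5) == (y > 0.5))   # training accuracy
```

On such well-separated synthetic data the network classifies nearly
every frame correctly; the hard part in practice is the annotated
database, not the model.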

> Best regards,
> 
> Lucas
> 
> 
> P.S. Thanks Paul for continuous work on improving the library! Python 3
> support is very much welcomed.

You're welcome! Continuous integration at travis and co should now help
ensure that aubio compiles and runs fine on the most common
configurations across platforms, including Python 2.7 and Python 3.x.

Best wishes,
Paul



More information about the aubio-user mailing list