Bird Detection

As a first step, we investigated how to detect whether a given audio recording contains any bird calls, as opposed to other sounds or even human imitations of birds. This study actually preceded the aMOBY project, but forms its starting point.

Task

Formally, the task is framed as follows: Given a short audio recording (e.g., 10 seconds), determine if there is any audible bird vocalization — no matter whether it is in the foreground or background, and no matter which species. This task can be challenging even for humans.

Try for yourself: Do the following examples contain bird calls or songs?

  1. Audio example 1
  2. Audio example 2
  3. Audio example 3

Approach

We obtained an annotated dataset from a scientific challenge on bird audio detection that we participated in. It contains recordings from three different sources:

Field recordings from freesound.org (image: a park with trees)
Mobile phone recordings from Warblr (image: a smartphone with the Warblr logo on screen)
Remote recordings from Chernobyl (image: train bridge near the Chernobyl nuclear power plant; photograph by Pawel Szubert released under CC-BY-SA 3.0)

We first computed a spectrogram of each recording, a two-dimensional image representation of the sound (see below for examples). We then trained Convolutional Neural Networks (CNNs) to distinguish spectrograms that contain birds from spectrograms that do not. Specifically, the training set consisted of almost 16,000 short recordings (44 hours) taken from freesound and Warblr, each with a single human annotation. During training, the parameters of the CNN, which mostly performs a series of simple image operations, are adapted to minimize the number and severity of mistakes made on the training examples.
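To make the idea of a spectrogram concrete, here is a minimal, dependency-free sketch. It is not the feature extraction used in our system: the frame length, hop length, sample rate, and the function names `hann_window` and `magnitude_spectrogram` are assumptions chosen for illustration, and the naive transform is far slower than the FFT routines one would use in practice.

```python
import cmath
import math


def hann_window(length):
    """Return a periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def magnitude_spectrogram(samples, frame_length=64, hop_length=32):
    """Compute a magnitude spectrogram with a naive discrete Fourier transform.

    Returns a list of frames (time axis). Each frame holds
    frame_length // 2 + 1 magnitudes (frequency axis), from 0 Hz up to
    half the sample rate.
    """
    window = hann_window(frame_length)
    num_bins = frame_length // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        # Cut out one short, windowed excerpt of the signal.
        frame = [samples[start + n] * window[n] for n in range(frame_length)]
        spectrum = []
        for k in range(num_bins):
            coefficient = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_length)
                for n in range(frame_length)
            )
            spectrum.append(abs(coefficient))
        frames.append(spectrum)
    return frames


# Synthetic test tone: a 1 kHz sine at an assumed sample rate of 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spectrogram = magnitude_spectrogram(tone)

# Bin k corresponds to k * sample_rate / frame_length Hz,
# so the 1 kHz tone should be loudest in bin 8.
loudest_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
print(len(spectrogram), len(spectrogram[0]), loudest_bin)
```

Each inner list is one column of the image, and the magnitudes determine the brightness of its pixels. A bird call would show up as a short, bright pattern in a limited frequency range.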

Results

To check how well the system works for data it has not seen during training, we computed its predictions on the test set of 8620 recordings (24 hours) taken from Chernobyl and Warblr. It achieves an AUC (area under the ROC curve) of 89%, enough to win the public challenge. A major obstacle is that no Chernobyl recordings are available for training, and it is difficult for the network to cope with unseen recording conditions. When testing on data from freesound and Warblr (on examples that were withheld from training), we obtain 97.5% AUC.
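For readers unfamiliar with the AUC, the following sketch computes it from its pairwise interpretation: the fraction of (bird, no-bird) pairs in which the recording with a bird receives the higher score. The function name `roc_auc` and the toy labels and scores are made up for illustration; this is not the challenge's evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = recording contains a bird, 0 = it does not.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 50% corresponds to random guessing and 100% to a perfect ranking, independent of any particular decision threshold.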

For additional insights, we will revisit the three audio examples, starting with the first:

 

The topmost panel shows a spectrogram of the recording. Time progresses from left to right, frequency (pitch) increases from bottom to top, and the brightness indicates the loudness of a specific frequency at a specific time. We can see that the recording is noisy, giving it a “sandy” appearance, and that it is a bit louder at the lowest frequencies and softer at the highest.
Applying the CNN's transformations optimized to detect birds, we obtain the prediction curve shown in the second panel. For each position in time, it gives a value between 0 and 1, where 1 indicates a confident bird detection. We see that the network detected the first call, but hardly reacted to the others. To answer whether the recording contains a bird, we would take the maximum value of all points in the curve.
Finally, the third panel shows which positions in the spectrogram influence the predictions the most; those are likely bird calls. Listening to the recording again confirms that the highlighted positions are indeed bird calls, but some faint calls are missing.
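The recording-level decision described above, taking the maximum of the prediction curve, can be sketched as follows. The function names, the example curves, and the threshold of 0.5 are assumptions for illustration; the text only states that the maximum is used.

```python
def recording_score(prediction_curve):
    """Summarize a prediction curve by its maximum value."""
    if not prediction_curve:
        raise ValueError("prediction curve is empty")
    return max(prediction_curve)


def contains_bird(prediction_curve, threshold=0.5):
    """Decide for the whole recording; the threshold is a hypothetical choice."""
    return recording_score(prediction_curve) >= threshold


# Made-up curves: one confident detection, then only weak reactions.
curve_with_call = [0.05, 0.10, 0.92, 0.30, 0.12, 0.20]
curve_without_call = [0.02, 0.04, 0.03, 0.05]

print(recording_score(curve_with_call), contains_bird(curve_with_call))
print(recording_score(curve_without_call), contains_bird(curve_without_call))
```

Using the maximum means a single confident detection is enough to flag the recording, which matches the task definition: any audible bird vocalization counts, however brief.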

 

In the second example, the CNN correctly detects the bird call audible in the background at the beginning of the file, and also correctly ignores the human whistling that imitates a bird.

 

The tree frog in the third example is completely ignored by the CNN. We may have been lucky here, as the training data probably did not contain any such frogs. Apparently, the training set either does not contain similar-sounding birds, or it contains similar-sounding counter-examples that cause the network to clearly reject the frog.

Demonstration

Please try our bird detection browser demo: run the detector on audio recordings from your computer, or record yourself imitating a bird and see if you can fool it.

Further Reading

For a detailed description of our work on bird detection, please refer to the following scientific publication:

Thomas Grill and Jan Schlüter: Two Convolutional Neural Networks for Bird Detection in Audio Signals. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece, 2017. (PDF file)

The source code for the first system described in this paper is available online.

Get In Touch

This project merely serves to demonstrate what is possible with current technology. If you work on a related problem — be it in academia or industry — we are highly interested in hearing from you! Please send us an email or give us a call and we will figure out a way to collaborate.