Whale Detection

To complement our experiments on bird calls, we will explore detecting whales in underwater acoustic recordings. In addition to applications in biodiversity assessment, this can be used to warn vessels about nearby whales and avoid collisions.

Task

Given an audio recording, the system should output all time instances at which whale calls are present. For this study, we constrain the task to tonal calls of the North Atlantic blue whale (Balaenoptera musculus).

Approach

We obtained an annotated dataset from a scientific challenge on whale detection. It includes 23 recordings from 6 different recording sites, totalling 1055 hours of audio to learn from. Each recording is labeled with the start and end positions of blue whale tonal calls and right whale up-calls, but we will focus on the blue whales.
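To make the annotation format concrete, here is a minimal sketch of how start and end positions could be turned into one label per time frame, and back into time intervals. The source does not say how the annotations are actually consumed; the frame-wise representation, the function names, and the example values below are all assumptions for illustration.

```python
import math

def intervals_to_frame_labels(intervals, duration, frame_rate):
    """Turn (start, end) annotations in seconds into one 0/1 label per frame."""
    num_frames = math.ceil(duration * frame_rate)
    labels = [0] * num_frames
    for start, end in intervals:
        # A frame counts as a call if it overlaps the annotated interval.
        first = max(math.floor(start * frame_rate), 0)
        last = min(math.ceil(end * frame_rate), num_frames)
        for i in range(first, last):
            labels[i] = 1
    return labels

def frame_labels_to_intervals(labels, frame_rate):
    """Turn 0/1 frame labels back into (start, end) intervals in seconds."""
    intervals = []
    start = None
    # The appended 0 closes a call that runs to the end of the recording.
    for i, active in enumerate(list(labels) + [0]):
        if active and start is None:
            start = i
        elif not active and start is not None:
            intervals.append((start / frame_rate, i / frame_rate))
            start = None
    return intervals

# Hypothetical example: a 5-minute excerpt with two annotated calls,
# labeled at one frame per second.
annotations = [(62.0, 75.5), (190.0, 204.0)]
labels = intervals_to_frame_labels(annotations, duration=300.0, frame_rate=1.0)
recovered = frame_labels_to_intervals(labels, frame_rate=1.0)
```

The round trip widens each interval to whole frames, so a finer frame rate gives more precise call boundaries at the cost of more labels.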

The following is an excerpt of a recording made in February 2016 in the Western North Atlantic at a depth of 850 metres. It includes a spectrogram showing which frequencies (up to 1 kHz) are active at which time. Below the spectrogram, the positions of two whale calls are marked, as detected by a human expert. Feel free to listen to the excerpt and see whether you can hear anything.

 

Apart from some noise, there is nothing special to be heard, and the spectrogram shows no visible difference between the whale calls and everything else. This is because the whale calls are in the infrasound range: their frequency is too low to be audible to humans, and too low to be visible in a spectrogram covering 1000 Hz in 513 rows of pixels. At roughly 2 Hz per row, a call at around 15 to 20 Hz occupies only about three rows at the very bottom. To be able to hear the whale calls, we can speed up the playback. At a factor of 22, this reduces 5 minutes to about 14 seconds and raises the low pitches into the audible range:

 
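The speed-up can be sketched in a few lines: the samples stay untouched and only the declared sample rate is multiplied, so a player runs through them faster and every frequency is scaled by the same factor. The 2000 Hz sample rate is an assumption (inferred from a spectrogram reaching 1 kHz), the 17 Hz tone is a synthetic stand-in for a call, and `write_sped_up_wav` is a hypothetical helper, not the code used for this page.

```python
import io
import math
import struct
import wave

def write_sped_up_wav(samples, sample_rate, factor, target):
    """Write 16-bit mono samples to `target` with the sample rate multiplied
    by `factor`, so that playback is `factor` times faster."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(int(round(sample_rate * factor)))
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

sample_rate = 2000  # Hz, assumed
factor = 22

# Effect on duration and pitch.
new_duration = 5 * 60 / factor                # seconds left of a 5-minute excerpt
shifted_band = (15.0 * factor, 20.0 * factor) # where a 15-20 Hz call ends up, in Hz

# Synthetic 5-second, 17 Hz tone standing in for a whale call.
tone = [int(10000 * math.sin(2 * math.pi * 17 * n / sample_rate))
        for n in range(5 * sample_rate)]
buffer = io.BytesIO()  # a file path such as "fast.wav" works as well
write_sped_up_wav(tone, sample_rate, factor, buffer)
```

With these numbers, a call at 15 to 20 Hz is moved to 330 to 440 Hz, well inside the range of human hearing.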

Indeed, we can now hear two faint whale calls above the noise. To make them easier to hear, we can silence all frequencies above and below the whale calls with a bandpass filter:

 
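As an illustration of bandpass filtering, the sketch below zeroes every frequency outside a chosen band in the Fourier domain. This brick-wall FFT filter is swapped in for simplicity; the source does not state which filter design was used. The 15 to 20 Hz band, the 2000 Hz sample rate, and the synthetic test signal are assumptions.

```python
import numpy as np

def bandpass_fft(signal, sample_rate, low_hz, high_hz):
    """Silence all frequencies outside [low_hz, high_hz] via the FFT."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Zero every frequency bin below or above the band of interest.
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))

sample_rate = 2000  # Hz, assumed
t = np.arange(10 * sample_rate) / sample_rate
call = np.sin(2 * np.pi * 17 * t)    # synthetic stand-in for a whale call
noise = np.sin(2 * np.pi * 200 * t)  # synthetic out-of-band disturbance
filtered = bandpass_fft(call + noise, sample_rate, 15.0, 20.0)
```

A brick-wall filter is easy to reason about but can cause ringing around sharp onsets; a smoother design such as a Butterworth bandpass may be preferable for real recordings.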

The visual equivalent of this bandpass filter is to compute the spectrogram only for a limited frequency range, with higher resolution (here, from 15 to 20 Hz in 20 pixels). This is shown above the recording. Looking and listening closely, there seems to be a third whale call at the beginning that was not annotated by the human expert. Such annotation errors will make it more difficult to learn from the data.
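One way to obtain such a narrow-band spectrogram is to use long analysis windows and keep only the frequency bins inside the band. The sketch below assumes a 2000 Hz sample rate, 4-second Hann windows (giving bins 0.25 Hz apart, so 15 to 20 Hz maps to 20 rows) and a 0.5-second hop; none of these parameters are given in the source, and `narrowband_spectrogram` is a hypothetical name.

```python
import numpy as np

def narrowband_spectrogram(signal, sample_rate, low_hz, high_hz,
                           window_s=4.0, hop_s=0.5):
    """Magnitude spectrogram restricted to low_hz <= f < high_hz.

    Returns the kept frequencies and an array of shape (frequencies, frames).
    """
    win_len = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    window = np.hanning(win_len)
    # Bin k of a win_len-point FFT sits at k * sample_rate / win_len Hz.
    freqs = np.arange(win_len // 2 + 1) * sample_rate / win_len
    keep = (freqs >= low_hz) & (freqs < high_hz)
    # Cut the signal into overlapping, windowed frames.
    starts = range(0, len(signal) - win_len + 1, hop)
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))[:, keep]
    return freqs[keep], magnitudes.T

sample_rate = 2000  # Hz, assumed
t = np.arange(60 * sample_rate) / sample_rate
# Synthetic test signal: a 17 Hz tone plus a stronger 200 Hz disturbance.
signal = np.sin(2 * np.pi * 17 * t) + 2.0 * np.sin(2 * np.pi * 200 * t)
freqs, spec = narrowband_spectrogram(signal, sample_rate, 15.0, 20.0)
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

The finer frequency resolution comes at a price: each column now summarizes 4 seconds of audio, so the timing of a call is blurred accordingly.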

Results

This is a work in progress to be picked up later.

Get In Touch

This project merely serves to demonstrate what is possible with current technology. If you work on a related problem — be it in academia or industry — we are highly interested in hearing from you! Please send us an email or give us a call and we will figure out a way to collaborate.