Python NMF/HPSS Results

For this part of the project, librosa was used to analyze the bass and drums of an audio file called Music Delta - Disco. The individual bass and drums files were summed into a single mix, and a spectrogram and a Mel spectrogram of that mix were computed. The spectrogram, Mel spectrogram, original bass file, and original drums file are shown below:

bass (audio, 2:04)

drums (audio, 2:04)

Several separation techniques were then applied to each of the two spectrograms.

NMF

In performing NMF, we used the following parameters:

n_components: 12
init: nndsvda
solver: mu
beta_loss: kullback-leibler
max_iter: 500
random_state: 0
l1_ratio: 0.1

After reconstruction, we obtained the following spectrograms:

We obtained the following audio files:

bass_nmf (audio, 0:09)

drums_nmf (audio, 0:09)

bass_nmf_mel (audio, 0:09)

drums_nmf_mel (audio, 0:09)

NMF seemed to separate the drums from the mix very well. However, the bass output still contained audible drum components: it captured some of the lowest- and highest-frequency drum sounds.

HPSS

In performing HPSS, we used soft masks with a power of 1.5. After reconstruction, we obtained the following spectrograms:

We obtained the following audio files:

bass_hpss (audio, 0:09)

drums_hpss (audio, 0:09)

bass_hpss_mel (audio, 0:09)

drums_hpss_mel (audio, 0:09)

HPSS also seemed to separate the drums well, and it was better than NMF at correctly attributing the lower-frequency drum components to the drums output. However, the higher-frequency snares were still present in the bass output.

HPSS and NMF

In an attempt to improve our results, we first performed HPSS and then performed NMF on its output. After reconstruction, we obtained the following spectrograms:

We obtained the following audio files:

bass_hpss_nmf (audio, 0:09)

drums_hpss_nmf (audio, 0:09)

bass_hpss_nmf_mel (audio, 0:09)

drums_hpss_nmf_mel (audio, 0:09)

The bass and drums outputs from the regular spectrograms sounded very well separated. However, the bass output was very quiet, even with gain applied. Additionally, the Mel spectrogram outputs were significantly distorted.

Spleeter

Spleeter was also used to separate the mixed audio. We obtained the following audio files:

bass_spleeter (audio, 0:10)

drums_spleeter (audio, 0:10)

As expected, Spleeter's bass and drums outputs represented the original bass and drums inputs very accurately. However, the bass output was again quieter than the original bass input, which suggests that bass is generally difficult to separate cleanly.
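The loudness difference noted above could be put into numbers by comparing RMS levels in decibels. The sketch below is only illustrative: `rms_db` is a hypothetical helper, and the "estimate" is a synthetic signal scaled to one quarter of the reference, not a real separator output.

```python
import numpy as np

def rms_db(x, eps=1e-12):
    """Hypothetical helper: RMS level of a signal in dB (relative to full scale 1.0)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + eps)

sr = 22050
t = np.arange(sr) / sr
reference = 0.5 * np.sin(2 * np.pi * 55.0 * t)  # stand-in for the original bass
estimate = 0.25 * reference                     # stand-in for a quieter separated bass

# Negative values mean the estimate is quieter than the reference.
level_diff = rms_db(estimate) - rms_db(reference)

# Linear gain that would bring the estimate back to the reference level.
gain = 10.0 ** (-level_diff / 20.0)

print(f"level difference: {level_diff:.1f} dB, matching gain: {gain:.2f}x")
# prints "level difference: -12.0 dB, matching gain: 4.00x"
```

Matching the level in this way only rescales the output; it does not restore any bass content that the separation removed.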