MATLAB NMF Results

We ran NMF on several audio clips and experimented with different parameters. The parameters we changed were:

Window Length - Length of the window that the Short-Time Fourier Transform (STFT) is computed over
Window Shape - Which window the STFT uses, such as Hamming, Hann, Kaiser, Triangular, or Rectangular
Noverlap - Number of samples that overlap from one window to the next
Nfft - Number of points in the FFT
Rank of the factorization - How many sources we want NMF to produce
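
To make the interplay of the first few parameters concrete, here is a small Python sketch (not the MATLAB code used for these experiments) of how window length, Noverlap, and Nfft determine the size of the spectrogram that NMF factorizes. The function name `stft_shape`, the 44.1 kHz sample rate, and the 14-second clip length are assumptions for illustration; the sketch also assumes no padding at the signal edges and a one-sided spectrum.

```python
def stft_shape(num_samples, wnlen, noverlap, nfft, fs=44100):
    """Return hop size, frame count, bin count and bin spacing for an STFT.

    Assumptions: no padding at the signal edges, real-valued input
    (one-sided spectrum), and fs=44100 Hz as the sample rate.
    """
    if noverlap >= wnlen:
        raise ValueError("noverlap must be smaller than the window length")
    hop = wnlen - noverlap  # samples the window advances per frame
    if num_samples >= wnlen:
        frames = 1 + (num_samples - wnlen) // hop
    else:
        frames = 0
    return {
        "hop": hop,
        "frames": frames,                  # columns of the spectrogram
        "bins": nfft // 2 + 1,             # rows of the spectrogram
        "bin_hz": fs / nfft,               # spacing of the frequency grid
        "window_ms": 1000.0 * wnlen / fs,  # time span covered by one window
    }


# Hypothetical 14-second clip with one of the parameter sets used on this page.
info = stft_shape(num_samples=14 * 44100, wnlen=4096, noverlap=4000, nfft=16384)
print(info)
```

A longer window covers more time per frame, while a larger Noverlap shrinks the hop and therefore produces more frames for the same clip.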

We experimented with several audio clips: two-note instruments, 20-second drums, a boy-man conversation, and different mixes of drums, vocals, guitar, and bass.

Two-note instruments

We made two instruments, each playing 2 separate notes, in Logic (music software). We expected NMF to work really well because 1) the instruments differ greatly in frequency range, 2) each of them was playing one note, so the frequencies are easy to tell apart, and 3) they were playing their notes at different times.

Results

All the parameter combinations we tried gave very similar results. You can clearly hear that when the bass note stops or drops in volume, so does the higher-frequency component. This demonstrates how NMF works: each component is a fixed spectrum whose volume over time is controlled by a single activation curve.
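
The behaviour described above follows from the structure of the factorization: NMF approximates a magnitude spectrogram V (bins x frames) as W times H, where each column of W is a spectrum and each row of H is that spectrum's gain over time. The following is a minimal pure-Python sketch of that idea, assuming multiplicative updates with a Euclidean cost; this page does not say which NMF variant the MATLAB code used, and the toy spectrogram and all names here are made up for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iters=500, eps=1e-9, seed=0):
    """Factor V (bins x frames) into W (bins x rank) and H (rank x frames)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Toy magnitude "spectrogram": 4 frequency bins x 6 frames (made-up numbers).
low = [1.0, 0.5, 0.0, 0.0]         # spectrum of a low note
high = [0.0, 0.0, 0.8, 1.0]        # spectrum of a high note
low_gain = [1, 1, 1, 0, 0, 0]      # low note sounds in the first three frames
high_gain = [0, 0, 0, 1, 1, 0.5]   # high note sounds afterwards, then fades
V = [[low[i] * low_gain[j] + high[i] * high_gain[j] for j in range(6)]
     for i in range(4)]

W, H = nmf(V, rank=2)
approx = matmul(W, H)
error = max(abs(V[i][j] - approx[i][j]) for i in range(4) for j in range(6))
print("largest reconstruction error:", error)
```

Because every bin in a column of W shares the same row of H, all frequencies assigned to one component rise and fall together, which is the effect audible in the clips.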

Best Results

two notes two instruments audio.wav_source__wnlen=4096_noverlap=4000_nfft=16384__1

two notes two instruments audio.wav_source__wnlen=4096_noverlap=4000_nfft=16384__2

20-second drums

We made a shorter, drums-only clip based on the drum part of a song.

Results

The separated sources have a "stuttering" effect, which we think comes from the STFT window length.

Best Results

20_seconds_drums.wav_source__wnlen=8192_noverlap=6000_nfft=16384__1

20_seconds_drums.wav_source__wnlen=8192_noverlap=6000_nfft=16384__2

Boy-man conversation

We found an online recording of a boy and an old man having a conversation. We expected it to separate better because the two talk at different times, but we were very wrong.

There is noise between the speakers' turns, which could degrade the NMF result.

There's a weird modulation effect when the window length is small. Humans pick out voices and words easily, so the bar for separating speakers is lower.

Best Results

boy_man_talking.wav_source__wnlen=8192_noverlap=4096_nfft=65536__1

boy_man_talking.wav_source__wnlen=8192_noverlap=4096_nfft=65536__2

"Nobody" stitch up bass & vocal

The song is called "Nobody". We combined only the bass and the vocals because, in theory, their frequency ranges should not overlap. The separation was better than expected.

Best Results

nobody_stitch_up__bass_vocal.wav_source__wnlen=16384_noverlap=4096_nfft=65536__1

nobody_stitch_up__bass_vocal.wav_source__wnlen=16384_noverlap=4096_nfft=65536__2

"Nobody" stitch up drums & vocal

The same song, "Nobody". In theory NMF is really good at drums, so we combined the drums with the vocals to see how it would handle both together.

The snare sits in the same frequency range as the vocal. When one separated source has really good vocals, the other source sounds bad.

Best Results

Bobby Nobody - stitch up drums-vocals.wav_source__wnlen=1024_noverlap=512_nfft=2048__1

Bobby Nobody - stitch up drums-vocals.wav_source__wnlen=1024_noverlap=512_nfft=2048__2

Other Findings

Impact of Factorization Rank: A small rank results in overly coarse separation, where each component retains a mix of various elements, leading to an indistinct separation. Conversely, a high rank causes the gain of each component to diminish significantly, amplifying artifacts and reducing the clarity of separation.
Window Shape Comparison: After testing several window shapes, the Hamming window consistently delivered the best performance. Its smoothing properties appear to enhance the quality of the separation.
Effect of Input Audio Composition: Mixed audio that lacks vocals generally yields better separation results, likely due to reduced complexity in the signal.
Influence of FFT Points and Overlap: Our experiments revealed a general trend: more FFT points and greater overlap (Noverlap) lead to improved separation results. More FFT points give a finer frequency grid, and more overlap gives better temporal continuity, both of which contribute to higher-quality outputs.
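
For readers unfamiliar with the window shapes compared above, the following Python sketch computes three of them from their standard textbook formulas (symmetric form). It is only an illustration of how the shapes differ at the frame edges; it does not reproduce the experiments, and it makes no claim about why Hamming worked best here. The function names are chosen for this example.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2); tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2); ends stay slightly above zero."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def rectangular(n):
    """Rectangular window: no tapering at all."""
    return [1.0] * n


n = 9  # tiny length so the values are easy to read
for name, window in [("hann", hann(n)), ("hamming", hamming(n)),
                     ("rectangular", rectangular(n))]:
    print(name, [round(x, 3) for x in window])
```

The Hann and Hamming windows both fade each frame in and out, which reduces the discontinuities a rectangular window leaves at frame boundaries; the Hamming window differs from Hann mainly in not reaching zero at its edges.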

These findings show the importance of carefully balancing parameters and choosing suitable preprocessing methods to achieve optimal results in audio separation using NMF.