Conclusions

In our MATLAB tests, we found that NMF is effective at separating drum components that occupy different frequency ranges. It can isolate vocals, but the output is distorted. It separates vocals from bass reasonably well, but fails to separate vocals from drums.

Regarding parameters, we found that a factorization rank that is too small gives an overly coarse separation, whereas a rank that is too large gives a less clear separation. The Hamming window consistently outperformed the other window shapes we tried. A larger number of FFT points and a greater overlap between frames also improved the separation.
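To make the role of these parameters concrete, the following is a minimal NumPy sketch, not the MATLAB code used in our tests. It assumes a synthetic two-tone signal, a hand-written STFT, and a basic multiplicative-update NMF with a Euclidean cost; the function names `stft_magnitude` and `nmf` and all parameter values are illustrative assumptions.

```python
import numpy as np

def stft_magnitude(y, n_fft=1024, overlap=0.75, window=np.hamming):
    """Magnitude spectrogram of shape (n_fft // 2 + 1, n_frames)."""
    hop = int(n_fft * (1 - overlap))           # frame step implied by the overlap
    win = window(n_fft)                        # Hamming window by default
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * win for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF (Euclidean cost): V is approximated by W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # spectral templates
    H = rng.random((rank, V.shape[1])) + eps   # activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic stand-in for a recording: a low tone followed by a high tone.
sr = 8000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 220 * t) * (t < 0.5) + np.sin(2 * np.pi * 1760 * t) * (t >= 0.5)

V = stft_magnitude(y, n_fft=1024, overlap=0.75)

# Relative reconstruction error for several factorization ranks.
errors = {}
for rank in (1, 2, 8):
    W, H = nmf(V, rank)
    errors[rank] = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print(f"rank={rank}: relative error {errors[rank]:.3f}")
```

Note that reconstruction error only measures how well the spectrogram is approximated. It keeps decreasing as the rank grows, so it cannot by itself identify the rank that gives the clearest separation; that still has to be judged by listening or by a separation metric.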

In our Python tests with Librosa, we found that both NMF and HPSS can separate drums well but struggle with bass. The output obtained from Mel spectrograms had noticeable distortions. However, when HPSS was applied first and NMF second, the results improved significantly, and each source output contained few components of the other source. We compared our results with the output from Spleeter and found that, while Spleeter produces better results, ours were qualitatively still fairly close.

We attempted to evaluate the results quantitatively but found that the values were the opposite of what we expected. Future work would include computing SDR/SIR/SAR again using museval, a source separation evaluation library that is more up to date than mir_eval. We would also try these methods with more audio files and more sources.
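As a sanity check on the direction of such metrics, the sketch below computes a simplified signal-to-distortion ratio on synthetic data. It is an assumption-laden illustration: `simple_sdr` is a hypothetical helper that treats the whole difference between reference and estimate as distortion, which is not the BSS Eval decomposition implemented in mir_eval or museval, so its values are not comparable to theirs.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over error energy (not BSS Eval)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))

rng = np.random.default_rng(0)
ref = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)

good = ref + 0.01 * rng.standard_normal(ref.size)  # nearly perfect estimate
bad = ref + 1.0 * rng.standard_normal(ref.size)    # heavily corrupted estimate

sdr_good = simple_sdr(ref, good)
sdr_bad = simple_sdr(ref, bad)
print(f"good estimate: {sdr_good:.1f} dB, bad estimate: {sdr_bad:.1f} dB")
```

A higher value indicates a better estimate, so a cleaner output should score above a noisier one. Feeding a metric known-good and known-bad estimates in this way is a quick test of whether the references and estimates are being passed in the expected order.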