Evaluation

When analyzing our source separation results, we began by simply listening to the output. While this gave us a valuable qualitative assessment, we also wanted to analyze the results quantitatively. Quantitative metrics allow a consistent comparison between the different methods we used, but they do not always capture the subjective nuances that a human listener would notice. We therefore combined qualitative and quantitative analyses to evaluate the audio source separation algorithms comprehensively.

The quantitative criteria we used were first introduced by Vincent et al. [2] in their article about blind audio source separation. Given the true signal (referred to as the reference) and the reconstructed signal (referred to as the estimate), the performance of a separation technique can be evaluated. The estimated source $\hat{s}$ can be decomposed as:

$$\hat{s} = s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}}$$

where the first term is the true source modified by an allowed distortion (e.g., a time-invariant gain), and the following three terms are the interference, noise, and artifact error terms, respectively. Energy ratios, expressed in decibels (dB), quantify the relative amount of each of these contributions in the estimated source. The larger the ratio, the better the performance. This results in four quantitative metrics.

The source-to-distortion ratio (SDR) measures how well the separation algorithm retained the true signal relative to all three error terms combined. It is the most general metric and is typically used as the primary measure of audio source separation performance. The SDR is defined as:

$$\text{SDR} = 10 \log_{10} \frac{\| s_{\text{target}} \|^2}{\| e_{\text{interf}} + e_{\text{noise}} + e_{\text{artif}} \|^2}$$

The source-to-interference ratio (SIR) measures how well the separation algorithm suppressed interference from the other sources. The SIR is defined as:

$$\text{SIR} = 10 \log_{10} \frac{\| s_{\text{target}} \|^2}{\| e_{\text{interf}} \|^2}$$

The source-to-noise ratio (SNR) measures how well the separation algorithm isolated the true signal from unwanted noise. The SNR is defined as:

$$\text{SNR} = 10 \log_{10} \frac{\| s_{\text{target}} + e_{\text{interf}} \|^2}{\| e_{\text{noise}} \|^2}$$

The source-to-artifact ratio (SAR) measures how few artifacts the separation algorithm introduced into the estimate. The SAR is defined as:

$$\text{SAR} = 10 \log_{10} \frac{\| s_{\text{target}} + e_{\text{interf}} + e_{\text{noise}} \|^2}{\| e_{\text{artif}} \|^2}$$
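
To make the energy ratios concrete, the following Python sketch computes SDR, SIR, and SAR for a toy signal. It is an illustration under simplifying assumptions, not the evaluation code used in this project: it assumes exactly two sources, no noise term, and a time-invariant gain as the only allowed distortion, so the target term is the projection of the estimate onto the reference and the artifact term is whatever lies outside the span of both sources. The function names (`decompose`, `bss_metrics`, `ratio_db`) and the toy signals are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def energy(v):
    return dot(v, v)

def ratio_db(num, den):
    """Energy ratio in dB; infinite when the error term has no energy."""
    return math.inf if den == 0 else 10.0 * math.log10(num / den)

def decompose(estimate, target, other):
    """Split `estimate` into (s_target, e_interf, e_artif).

    Assumptions: two sources, no noise, gain-only allowed distortion.
    """
    a, b, c = energy(target), dot(target, other), energy(other)
    d1, d2 = dot(estimate, target), dot(estimate, other)
    det = a * c - b * b  # nonzero when the sources are linearly independent
    # Coefficients of the projection of the estimate onto span{target, other}
    c1 = (d1 * c - b * d2) / det
    c2 = (a * d2 - b * d1) / det
    # Target term: projection of the estimate onto the reference alone
    s_target = [(d1 / a) * x for x in target]
    projection = [c1 * x + c2 * y for x, y in zip(target, other)]
    e_interf = [p - s for p, s in zip(projection, s_target)]
    e_artif = [e - p for e, p in zip(estimate, projection)]
    return s_target, e_interf, e_artif

def bss_metrics(estimate, target, other):
    s_target, e_interf, e_artif = decompose(estimate, target, other)
    total_error = [i + r for i, r in zip(e_interf, e_artif)]
    target_plus_interf = [s + i for s, i in zip(s_target, e_interf)]
    return {
        "SDR": ratio_db(energy(s_target), energy(total_error)),
        "SIR": ratio_db(energy(s_target), energy(e_interf)),
        "SAR": ratio_db(energy(target_plus_interf), energy(e_artif)),
    }

# Toy signals: the estimate is the target plus a little of the other
# source (interference) plus a component outside both sources (artifact).
target = [1.0, -1.0, 1.0, -1.0]
other = [1.0, 1.0, 1.0, 1.0]
estimate = [1.3, -0.9, 0.9, -0.9]
metrics = bss_metrics(estimate, target, other)
print({name: round(value, 2) for name, value in metrics.items()})
```

In this toy case the interference energy is one hundredth of the target energy, so the SIR comes out at about 20 dB, while the SDR is lower because it also counts the artifact term. Established toolkits generalize the same idea to more sources and to time-invariant filters rather than plain gains.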

Together, these metrics provide a comprehensive evaluation of an audio source separation algorithm. In this project, we report SDR, SIR, and SAR.

[2]