The Repository @ St. Cloud State

Open Access Knowledge and Scholarship

SCSU Journal of Student Scholarship

Faculty Mentor(s)

Dr. Ming Ma, Winona State University

Abstract

Decades of recorded music exist for which the original multitrack recordings are unavailable, whether through lack of access or simply being lost to time. Music source separation is the process of using technology to isolate the individual components of a song after they have been arranged and mixed together. These components, or stems, may be the vocals, drums, bass, or any other instrument in the song. By performing complex analysis on the structure of songs to determine which frequencies or waveform patterns belong to particular sources, a convolutional neural network can be trained to cleanly separate these sources from the full mix. Separation methods operate on either a spectrogram or a raw waveform. We examine the existing methods used by Spleeter, Demucs, and the f90 Wave-U-Net. In the first phase of the experiment, we implement three different U-Net architectures used by these methods, using both one-dimensional and two-dimensional convolution. By examining the metrics SDR (Signal-to-Distortion Ratio), SAR (Signal-to-Artifacts Ratio), and ISR (Source-Image-to-Spatial-Distortion Ratio), we determine the best-performing architecture. In phase two, we build on that architecture and implement a secondary U-Net to further refine the separated results.
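The simplest of the metrics named above, SDR, compares the power of a reference stem to the power of the residual error in the estimated stem. As a rough illustrative sketch only (published evaluations typically use the BSS Eval family of implementations, which additionally account for allowed filtering distortions; the `sdr` function name here is hypothetical):

```python
import math

def sdr(reference, estimate):
    """Signal-to-Distortion Ratio in dB: power of the reference signal
    over the power of the residual error (higher is better)."""
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) * (r - e) for r, e in zip(reference, estimate))
    return 10 * math.log10(signal_power / error_power)

# An estimate at half the reference amplitude leaves a residual with
# one quarter of the signal power, so SDR = 10*log10(4) ~= 6.02 dB.
reference = [1.0, -0.5, 0.25, -0.125]
estimate = [0.5 * r for r in reference]
print(round(sdr(reference, estimate), 2))  # → 6.02
```

A separation system is then scored by averaging such metrics over the estimated stems of an evaluation set; higher dB values indicate cleaner separation.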
