Singing Voice Extraction from 2-Channel Polyphonic Musical Recordings

Authors Rieck, S.
Year 2012
Thesis Type Diploma thesis
Topic Audio Signal Processing
Keywords audio recording
Abstract This thesis deals with the extraction of singing voice signals from 2-channel polyphonic musical recordings. The proposed method consists of two steps. First, it relies on the assumption that the singing voice is very likely positioned in the center of the stereo panorama. Using a similarity measure, this "center" part is extracted, and the resulting monophonic signal forms the basis for the subsequent processing. Second, voiced and unvoiced singing voice are extracted separately and summed in a final step to form the extracted singing voice. The voiced singing voice is extracted by detecting the fundamental frequency f0 of the singing voice along with its corresponding partials. Using a sinusoidal model, all partial frequencies are then synthesized. The unvoiced singing voice is extracted by segmenting the monophonic signal into time-frequency units; those that can be associated with singing voice are extracted. Preprocessing the stereophonic recording improves the accuracy of the voiced singing voice extraction by 12% and the accuracy of the unvoiced singing voice extraction by 7%. The detection of the singing voice f0 is based on the diploma thesis of A. Rahimzadeh. We propose modifications, and results show that the average accuracy in detecting the f0 is improved by 16%. The proposed method to extract singing voice has been evaluated using blind source separation performance measures and yields an average Source to Distortion Ratio of 35.1 dB, which is an improvement of 5-10 dB compared to state-of-the-art methods. The average Source to Distortion Ratio results in -2.4 dB.
Supervisors Sontacchi, A.
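
The sinusoidal-model step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `synthesize_partials`, its parameters, and the simplification to a constant f0 with fixed partial amplitudes are assumptions made for illustration only. A real system would presumably use time-varying frequency and amplitude tracks per analysis frame.

```python
import math


def synthesize_partials(f0, amplitudes, sample_rate=8000, duration=0.01):
    """Sum harmonic sinusoids at k * f0 (k = 1, 2, ...).

    Hypothetical helper: assumes a constant f0 and constant amplitudes,
    which a real voice signal does not have.
    """
    n_samples = int(round(sample_rate * duration))
    nyquist = sample_rate / 2.0
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for k, amp in enumerate(amplitudes, start=1):
            freq = k * f0
            if freq >= nyquist:
                # Skip partials that would alias at this sample rate.
                break
            value += amp * math.sin(2.0 * math.pi * freq * t)
        signal.append(value)
    return signal


# Example: a 200 Hz fundamental with two weaker overtones.
voiced = synthesize_partials(200.0, [1.0, 0.5, 0.25])
```

In this sketch the detected f0 fixes the frequencies of all partials, so only one amplitude per partial needs to be estimated from the mixture.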