The role of abstract feature sets in analysis and classification of phonation types in singing

Authors Bereuter, P.
Year 2021
Thesis Type Master's thesis
Topic Audio Signal Processing
Abstract The human vocal apparatus is capable of producing phonation types with different timbral characteristics, which are perceived as distinct voice qualities such as modal, breathy or pressed. In professional singing, these phonation types are used intentionally to convey feelings or emotions, whereas the strenuous use of unhealthy voice qualities should be minimized to reduce the risk of voice disorders. Professional singers in training therefore still rely strongly on feedback from vocal coaches or experts. However, advances in speech signal processing, particularly classification algorithms based on supervised or unsupervised learning, provide important tools to deepen and facilitate feedback on sung phonation types. In contrast to established approaches, which require separating the source and filter signals, the novel machine-learning approaches are mostly applied directly to the sung vocal signal. This offers advantages for real-time applications and with regard to fundamental frequency dependence. Typically, the foundation of such a machine-learning-based classification task is an abstract feature set designed to provide a meaningful description of the voice qualities. The aim of this thesis is to shed light on the role of these abstract feature sets in the classification of phonation types in singing. The main focus lies on the Mel frequency cepstral coefficients (MFCCs), the most prominent features in speech signal processing. Different variations of MFCC feature sets are analyzed and evaluated with respect to their capability for phonation type classification. Additionally, the MFCCs’ development over time, their pitch dependence and the influence of modulating effects such as vibrato are analyzed.
A more detailed analysis of the relation between vibrato and voice quality is carried out using methods such as the modulation power spectrum (MPS), yielding an assessment of possible alternative vibrato-based features that enable voice quality classification. Finally, the results of this work should indicate whether the discussed features can contribute relevant information to a real-time analysis environment, with the aim of providing professional singers with helpful feedback on the voice quality they are currently singing.
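For readers unfamiliar with MFCCs, the computation for a single audio frame can be sketched as follows. This is a minimal, generic sketch assuming the common textbook pipeline (Hamming window, power spectrum, triangular filters on an HTK-style mel scale, logarithm, DCT-II). The function names and parameter values (26 filters, 13 coefficients) are illustrative assumptions, not the specific MFCC variants evaluated in the thesis. A naive DFT keeps the sketch dependency-free; a real-time implementation would use an FFT.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel scale (one of several common conventions; assumed here)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-windowed power spectrum (bins 0..N/2) via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with centers equally spaced on the mel scale (0 Hz to sr/2)."""
    m_hi = hz_to_mel(sr / 2)
    hz_pts = [mel_to_hz(m_hi * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        weights = []
        for f in bin_freqs:
            if left < f <= center:
                weights.append((f - left) / (center - left))   # rising edge
            elif center < f < right:
                weights.append((right - f) / (right - center))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(frame, sr, n_filters=26, n_mfcc=13):
    """Hypothetical helper: MFCCs of one frame (log mel energies -> DCT-II)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sr)
    # Floor avoids log(0) for filters that capture no energy
    log_e = [math.log(max(sum(w * p for w, p in zip(weights, spec)), 1e-10))
             for weights in bank]
    # Unnormalized DCT-II decorrelates the log filterbank energies
    return [sum(log_e[m] * math.cos(math.pi * k * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for k in range(n_mfcc)]
```

One consequence of this design: scaling the input amplitude adds a constant to every log filter energy, so only the zeroth coefficient changes. This is one reason c_0 is often treated separately from the spectral-shape coefficients c_1 and above.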
Supervisors Sontacchi, A., Brandner, M.