Audio Signal Processing

The audio-signal processing group at IEM concentrates in particular on sound analysis, sound modelling and the extraction of musically and speech-relevant features. The topic comprises methods of time-frequency processing, multi-rate processing and adaptive filtering.

An overarching aim of audio signal processing is the enhancement of audio material or the optimisation of acoustic conditions for users and listeners. Examples include the suppression of interfering signals, active noise reduction and the concealment of short signal drop-outs in digital transmissions. In general, one can distinguish between measures at the transmitting end and those at the receiving end: in the first case, interfering signals are suppressed before transmission; in the second, noise is reduced acoustically by feeding in destructively interfering sound. For drop-out concealment, signal characteristics are analysed continuously in real time so that missing or corrupted segments can be replaced appropriately. Common to all three applications is that sounds are analysed and later resynthesised; all three approaches make use of the theory of adaptive filtering.
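The adaptive-filtering theory mentioned above can be illustrated with the least-mean-squares (LMS) algorithm, its most common member. The following is a minimal sketch (not the group's actual implementation): the filter weights are adapted so that a filtered reference signal approximates a desired signal, here demonstrated by identifying an unknown FIR system.

```python
import numpy as np

def lms_filter(x, d, num_taps=8, mu=0.01):
    """LMS adaptive filter: adapt the weights w so that the filtered
    reference x approximates the desired signal d.  Returns the filter
    output y, the error signal e, and the final weights w."""
    w = np.zeros(num_taps)
    n = len(x)
    y = np.zeros(n)
    e = np.zeros(n)
    for i in range(num_taps - 1, n):
        u = x[i - num_taps + 1:i + 1][::-1]  # most recent sample first
        y[i] = w @ u                          # current filter estimate
        e[i] = d[i] - y[i]                    # estimation error
        w = w + mu * e[i] * u                 # LMS weight update
    return y, e, w

# Hypothetical example: identify an unknown 3-tap FIR system.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)                 # white reference signal
h = np.array([0.5, -0.3, 0.2])                # "unknown" system
d = np.convolve(x, h)[:len(x)]                # desired signal
_, e, w = lms_filter(x, d, num_taps=3, mu=0.05)
```

After convergence, `w` approximates `h` and the error signal decays towards zero; in the noise-cancellation setting described above, the error signal itself is the cleaned output.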

As regards the content-based analysis of a piece of music, we go one step further. Since the relevant information of a piece of music is first and foremost its melody, harmony and rhythm, these characteristics are extracted using appropriate signal parameters; the periodicity residing in the signal, for example, can be used to determine the fundamental frequency. Music information retrieval (MIR) comprises not only the automatic transcription of a piece, but also applications such as AutoDJ or AutoKaraoke. Furthermore, the whole musical structure of a piece can be captured, for example to mark the beginning and end of a refrain.
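The idea of using the signal's periodicity to determine the fundamental frequency can be sketched with a simple autocorrelation method (one of several standard techniques; the group's own approach is not specified here):

```python
import numpy as np

def fundamental_autocorr(signal, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency of a quasi-periodic signal
    from the strongest autocorrelation peak within a plausible
    period range [1/fmax, 1/fmin]."""
    sig = signal - np.mean(signal)
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]  # lags >= 0
    lag_min = int(sample_rate / fmax)  # shortest period considered
    lag_max = int(sample_rate / fmin)  # longest period considered
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return sample_rate / lag

# Hypothetical test tone: 220 Hz fundamental plus one harmonic, 16 kHz.
fs = 16000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
f0 = fundamental_autocorr(tone, fs)
```

Restricting the search range keeps the trivial zero-lag maximum and sub-octave errors out of the estimate; frame-by-frame application of such an estimator yields the pitch track needed for transcription.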

Besides the analysis of individual songs, the determination of the similarity between two pieces of music is important for classification into musical categories. Whether these are predefined genres (classical, jazz or rock) or meta-terms (up-tempo, laid-back) is irrelevant. One possible application is the automatic creation of playlists for large music collections.
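One common way to compare two pieces is to summarise each as a feature vector and measure the distance between the vectors. The sketch below uses the average magnitude spectrum as a deliberately crude stand-in for richer timbre features (real systems typically use MFCCs or similar); the names and parameters are illustrative assumptions, not a description of the group's method.

```python
import numpy as np

def spectral_signature(audio, frame=1024):
    """Summarise a signal by its average windowed magnitude spectrum,
    a crude stand-in for richer timbre features such as MFCCs."""
    n = (len(audio) // frame) * frame
    frames = audio[:n].reshape(-1, frame)
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    return mags.mean(axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example: two nearby tones vs. white noise.
fs = 8000
t = np.arange(4 * 1024) / fs
rng = np.random.default_rng(1)
tone_a = np.sin(2 * np.pi * 440 * t)
tone_b = np.sin(2 * np.pi * 445 * t)
noise = rng.standard_normal(len(t))
sim_tones = cosine_similarity(spectral_signature(tone_a), spectral_signature(tone_b))
sim_noise = cosine_similarity(spectral_signature(tone_a), spectral_signature(noise))
```

The two similar tones score much higher than the tone/noise pair; clustering such pairwise similarities over a collection is one route to automatic playlist generation.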

The determination of meaningful signal parameters is also of interest for the spoken word. At the IEM, we are currently working on finding interrelations between diction and the emotional state of the speaker, using the melodic-rhythmic characteristics of speech (prosody) to recognise stress.
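Prosodic analysis starts from frame-wise contours of simple signal parameters. As a minimal sketch (a real stress-recognition front end would add at least a fundamental-frequency track), the following extracts short-time energy and zero-crossing-rate contours, from which statistics such as range and variability can be derived:

```python
import numpy as np

def prosody_contours(speech, fs, frame=400, hop=200):
    """Frame-wise RMS energy and zero-crossing rate: two simple
    contours from which prosodic statistics (range, variability,
    rhythm) can be derived."""
    energies, zcrs = [], []
    for start in range(0, len(speech) - frame, hop):
        frm = speech[start:start + frame]
        energies.append(np.sqrt(np.mean(frm ** 2)))          # loudness contour
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frm)))) / 2)  # crude voicing cue
    return np.array(energies), np.array(zcrs)

# Hypothetical stand-in for a voiced segment: a 100 Hz tone at 8 kHz.
fs = 8000
t = (np.arange(4000) + 0.5) / fs
voiced = np.sin(2 * np.pi * 100 * t)
energies, zcrs = prosody_contours(voiced, fs)
```

Statistics of such contours (for instance, the variance of the energy envelope over an utterance) are typical inputs to a classifier for emotional state or stress.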

The methods of analysis, transformation and resynthesis used (such as filter banks, transforms, additive synthesis, etc.) are not only a means to an end but are also the subject of scientific research in their own right. Examples include multi-rate signal processing, time-frequency processing and sinusoidal modelling.
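The resynthesis side of sinusoidal modelling can be sketched in a few lines: a sound is rebuilt as a sum of sinusoidal partials, each described by frequency, amplitude and phase (in a full analysis/resynthesis system these parameters would vary over time; constant partials are assumed here for brevity).

```python
import numpy as np

def additive_resynthesis(partials, duration, fs):
    """Resynthesise a sound as a sum of sinusoidal partials,
    each given as a (frequency_hz, amplitude, phase) triple."""
    t = np.arange(int(duration * fs)) / fs
    out = np.zeros_like(t)
    for freq, amp, phase in partials:
        out += amp * np.sin(2 * np.pi * freq * t + phase)
    return out

# Hypothetical example: three harmonics of 220 Hz, half a second at 16 kHz.
tone = additive_resynthesis([(220, 1.0, 0.0), (440, 0.5, 0.0), (660, 0.25, 0.0)],
                            duration=0.5, fs=16000)
```

The analysis counterpart estimates those partial parameters from the signal (e.g. by peak picking in short-time spectra), which is what makes the representation useful for drop-out concealment and sound transformation.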

Publications and Documents