Using open source tools to analyse and recognise sounds
2021-11-14, 11:30–12:00 (Europe/Athens), Room 1

This presentation demonstrates how signal processing, machine learning and deep learning can be used to build applications that analyse and recognise sounds. Apart from a brief introduction to the basics of speech and audio processing, it examines a range of open source tools and libraries for building audio analysis applications.


End users of modern applications have experienced the rapid rise of Deep Learning (DL) through different types of media. Textual information is probably the most common (web search, profiling, recommender systems, social media analytics, etc.). Visual information also has many DL applications (face recognition, image retrieval and recognition, deepfakes, smart video editing, etc.). Sensor data also feeds various DL-driven applications, such as smart cities and autonomous vehicles.

In this presentation we focus on the audio modality and describe how to analyse and recognise sounds using modern machine learning and deep learning methods. This concerns many application domains, such as automatic speech recognition (speech to text), speaker recognition (verification and identification), speech emotion recognition, music information retrieval and auditory scene analysis. Apart from a brief introduction to the basics of speech and audio processing, we examine a range of open source tools and libraries for building audio analysis applications. Some examples will be demonstrated during the presentation.
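As a rough illustration of the kind of low-level processing that audio analysis usually starts from, the sketch below splits a signal into short overlapping frames and computes two classic per-frame features, energy and zero-crossing rate. It uses only NumPy and is not taken from the presentation or from any of the libraries mentioned here; the function name `short_term_features`, the 50 ms window, the 25 ms step and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np


def short_term_features(signal, sample_rate, win_sec=0.050, step_sec=0.025):
    """Return per-frame energy and zero-crossing rate for a mono signal.

    Hypothetical helper for illustration; window and step sizes are assumptions.
    """
    win = int(round(win_sec * sample_rate))    # frame length in samples
    step = int(round(step_sec * sample_rate))  # hop between frames in samples
    energies, zcrs = [], []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win]
        # Mean squared amplitude of the frame.
        energies.append(float(np.mean(frame ** 2)))
        # Fraction of adjacent sample pairs whose sign differs.
        signs = np.signbit(frame)
        zcrs.append(float(np.mean(signs[1:] != signs[:-1])))
    return np.array(energies), np.array(zcrs)


# Synthetic input (an assumption): one second of a 440 Hz tone,
# followed by one second of white noise, sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(sample_rate)
signal = np.concatenate([tone, noise])

energy, zcr = short_term_features(signal, sample_rate)
```

On this synthetic input the zero-crossing rate is low for the tonal part and much higher for the noisy part, which is why such features can help separate different kinds of sound. In a recognition pipeline, per-frame values like these would typically be aggregated and passed to a classifier.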

The presenter is the author of pyAudioAnalysis (https://github.com/tyiannak/pyAudioAnalysis), one of the most widely used open source Python libraries for audio recognition, and of several other libraries in the field, such as deep_audio_features (https://github.com/tyiannak/deep_audio_features) and paura (https://github.com/tyiannak/paura).

See also: Presentation

Dr Theodoros Giannakopoulos was born in Athens, Greece, in 1980. He received his degree in Informatics and Telecommunications from the University of Athens (UOA), Athens, Greece, in 2002, his M.Sc. (Honors) diploma in signal and image processing from the University of Patras, Patras, Greece, in 2004, and his Ph.D. in the field of Multimodal Machine Learning from the Department of Informatics and Telecommunications, UOA, in 2009. He is the coauthor of more than 100 publications in journals and conferences in the fields of pattern recognition and multimedia analysis, and the coauthor of a book titled "Introduction to Audio Analysis: A MATLAB Approach". He is an active member of the open source community and the author of the pyAudioAnalysis and deep_audio_features libraries; he is the top Python contributor in Greece and in the top 0.1% worldwide. He is currently a Tenured Researcher at the Institute of Informatics and Telecommunications, NCSR “Demokritos”, Greece. He has several years of tutoring experience, mostly in Master's programs organized by NCSR “Demokritos”, in courses such as Machine Learning, Deep Learning, Data Programming and Multimodal Data Analysis. His research interests lie in the fields of multimodal machine learning, music information retrieval and speech analytics.
Website: http://tyiannak.github.io