BD.20.005 – InDeep: Interpreting Deep Learning Models for Text and Sound

Route: Creating Value through responsible access to and use of big data

Cluster question: 112 Can we use Big Data and Big Data collection to define values, generate insights, and get answers?

The combination of Deep Learning and Big Data has revolutionized language and speech technology over the last five years, and now constitutes the state of the art in domains ranging from machine translation and question answering to speech recognition and music analysis. These models are now often accurate enough that many useful new applications are being discovered, with potentially significant impacts on individuals, businesses, and society. Alongside that power and popularity, new responsibilities and questions arise: how do we ensure reliability, avoid undesirable biases, and provide insight into how a system arrives at a particular outcome? How do we leverage domain expertise and user feedback to improve the models even further? For all of these issues, interpretability of the deep learning models is key. In the proposed project, pioneering researchers in the interpretability of deep learning models of text, language, speech, and music come together. They collaborate with companies and not-for-profit institutions working with language, speech, and music technology to develop applications that help assess the usefulness of various interpretability techniques on a range of different tasks. In justification tasks, we look at how interpretability techniques help give users meaningful feedback. Examples include fraud detection from large email collections, legal and medical document text mining, and audio search. In augmentation tasks, we look at how these techniques facilitate the use of domain knowledge and models from outside deep learning to make the models perform even better. Examples include machine translation, music recommendation, and speech recognition. In interaction tasks, we allow users to influence the functioning of their automated systems, both by providing interpretable information on how the system operates and by letting human-produced output find its way into the internal states of the learning algorithm. Examples include adapting speech recognition to non-standard accents and dialects, interactive music generation, and machine-assisted translation.


Keywords: Automatic Speech Recognition, Big Data, Computational Musicology, Explainable AI, Natural Language Processing, Responsible AI

Other organisations

Rijksuniversiteit Groningen (RUG), RUN, TU, Vrije Universiteit Amsterdam (VU)


Organisation: University of Amsterdam (UvA)
Name: Dr. W. (Willem) Zuidema