BD.20.012 – Conversational Intelligence in Dutch

Route: Creating Value through responsible access to and use of big data

Cluster question: 113 Can we develop human language technology (HLT) that allows us to communicate with our computers (smartphones, tablets)?

Access to information is increasingly conversational in nature. Task-oriented dialogue systems help users achieve a specific task through conversation, e.g., at helpdesks, when booking a trip, or as a shopping assistant. State-of-the-art methods underlying task-oriented dialogue systems are typically developed in a data-intensive manner, using massive volumes of labeled data and unlabeled conversations. This is effective for languages such as English and Chinese, but for languages such as Dutch such data sources are simply not available, which hinders the development of Dutch-language conversational assistants that offer state-of-the-art performance. How can we develop effective conversational assistants for the Dutch language with limited data? We propose to develop, implement and evaluate a diverse range of data augmentation techniques that exploit existing language resources in Dutch and in other languages to generate a sufficient volume of training material to achieve state-of-the-art performance in conversational assistance for the Dutch language. In particular, we seek to design methods for transfer learning (across languages), for weak supervision, and for developing simulators that support complementary data augmentation strategies. In addition, we aim to refine the dominant pre-train-and-refine method to support rapid adaptation to different domains (e.g., food, electronics, fashion). To inform the development and evaluation of the proposed data augmentation techniques in real-world scenarios, we seek to set up collaborations between researchers, developers, and practitioners who deploy conversational technologies in retail in the Netherlands, in Dutch. The choice of the retail domain is motivated by the potential for impact (reaching millions of people in the Netherlands), the rich diversity of domains (thus providing a challenging testing ground for transfer learning methods), and the urgency felt in the domain.

Keywords

Conversational systems, Data augmentation, Information access, Language technology

Other organisations

Albert Heijn B.V., Bol.com B.V., Mastercard Netherlands B.V., Thuiswinkel.org

Submitter

Organisation: Universiteit van Amsterdam (UvA)
Name: prof. dr. M. (Maarten) de Rijke
E-mail: m.derijke@uva.nl
Website: https://www.uva.nl/profiel/r/i/m.derijke/m.derijke.html?cb