Announcing Two New Natural Language Dialog Datasets
Friday, September 6, 2019
Posted by Bill Byrne and Filip Radlinski, Research Scientists, Google Research
Today’s digital assistants are expected to complete tasks and return personalized results across many subjects, such as movie listings, restaurant reservations and travel plans. However, despite tremendous progress in recent years, they have not yet reached human-level understanding. This is due, in part, to the lack of quality training data that accurately reflects the way people express their needs and preferences to a digital assistant: the limitations of such systems bias what we say—we want to be understood, and so we tailor our words to what we expect a digital assistant to understand. In other words, the conversations we might observe with today’s digital assistants don’t reach the level of dialog complexity we need to model human-level understanding.
To address this, we’re releasing the Coached Conversational Preference Elicitation (CCPE) and Taskmaster-1 English dialog datasets. Both collections make use of a Wizard-of-Oz platform that pairs two people who engage in spoken conversations, just like those one might like to have with a truly effective digital assistant. For both datasets, an in-house Wizard-of-Oz interface was designed to uniquely mimic today’s speech-based digital assistants, preserving the characteristics of spoken dialog in the context of an automated system. Since the human “assistants” understand exactly what the user asks, as any person would, we are able to capture how users would actually express themselves to a “perfect” digital assistant, so that we can continue to improve such systems. Full details of the CCPE dataset are described in our research paper to be published at the 2019 Annual Conference of the Special Interest Group on Discourse and Dialogue, and the Taskmaster-1 dataset is described in detail in a research paper to appear at the 2019 Conference on Empirical Methods in Natural Language Processing.
Preference Elicitation
In the movie-oriented CCPE dataset, individuals posing as a user speak into a microphone and the audio is played directly to the person posing as a digital assistant. The “assistant” types out their response, which is in turn played to the user via text-to-speech. These 2-person dialogs naturally include the disfluencies and errors that arise spontaneously between the two parties and that are difficult to replicate using synthesized dialog. This creates a collection of natural, yet structured, conversations about people’s movie preferences.
Among the insights from this dataset, we find that the ways in which people describe their preferences are amazingly rich. This dataset is the first to characterize that richness at scale. We also find that preferences do not always match the way digital assistants, or for that matter recommendation sites, characterize options. To put it another way, the filters on your favorite movie website or service probably don’t match the language you would use in describing the sorts of movies that you like when seeking a recommendation from a person.
Task-Oriented Dialog
The Taskmaster-1 dataset makes use of both the methodology described above and a one-person, written technique that increases the corpus size and speaker diversity—about 7.7k written “self-dialog” entries and ~5.5k 2-person, spoken dialogs. For the written dialogs, we engaged people to create the full conversation themselves based on scenarios outlined for each task, playing the roles of both user and assistant. So, while the spoken dialogs more closely reflect conversational language, the written dialogs are appropriately rich and complex, yet cheaper and easier to collect. Each dialog is based on one of six tasks: ordering pizza, creating auto repair appointments, setting up rides for hire, ordering movie tickets, ordering coffee drinks and making restaurant reservations.
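To make the written “self-dialog” format concrete, here is a minimal, hypothetical sketch of how one such entry could be represented in Python. The field names and the pizza-ordering exchange below are illustrative assumptions for this post, not the actual schema of the released corpus.

```python
# Hypothetical sketch of a single written "self-dialog": one worker wrote both roles.
# Field names and content are illustrative only, not the released corpus schema.
self_dialog = {
    "conversation_id": "example-pizza-001",  # made-up identifier
    "task": "pizza-ordering",                # one of the six task types
    "utterances": [
        {"speaker": "USER", "text": "Hi, I'd like to order a large pepperoni pizza."},
        {"speaker": "ASSISTANT", "text": "Sure. Which restaurant would you like to order from?"},
        {"speaker": "USER", "text": "The pizza place closest to my office, please."},
        {"speaker": "ASSISTANT", "text": "Got it. One large pepperoni pizza. Anything else?"},
        {"speaker": "USER", "text": "No, that's everything. Thanks!"},
    ],
}

print(f"{self_dialog['task']}: {len(self_dialog['utterances'])} turns")
```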
This dataset also uses a simple annotation schema that provides sufficient grounding for the data while making it easy for workers to apply labels to the dialogs consistently. In contrast to traditional, detailed annotation strategies that make robust agreement among workers difficult, we focus solely on the API arguments for each type of conversation, meaning just the variables required to execute the transaction. For example, in a dialog about scheduling a rideshare, we label the “to” and “from” locations along with the car type (economy, luxury, pool, etc.). For movie tickets, we label the movie name, theater, time, number of tickets, and sometimes the screening type (e.g., 3D or standard). A complete list of labels is included with the corpus release.
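As an illustration of this API-argument style of labeling, the sketch below annotates a hypothetical rideshare utterance. The label names (ride.from, ride.to, ride.car_type) and the book_ride function are invented for this example; the real label inventory is the one shipped with the corpus.

```python
# Hypothetical illustration of API-argument annotation for a rideshare dialog.
# Label names are invented for this sketch; the released corpus defines the actual labels.
annotated_utterance = {
    "speaker": "USER",
    "text": "I need an economy ride from the airport to the Hilton downtown.",
    "segments": [
        {"span": "the airport", "label": "ride.from"},
        {"span": "the Hilton downtown", "label": "ride.to"},
        {"span": "economy", "label": "ride.car_type"},
    ],
}

def book_ride(from_location: str, to_location: str, car_type: str) -> None:
    """Placeholder showing how the labeled spans map onto arguments of a booking call."""
    print(f"Booking a {car_type} ride from {from_location} to {to_location}.")

# Pull the labeled spans out of the annotation and "execute the transaction".
args = {seg["label"]: seg["span"] for seg in annotated_utterance["segments"]}
book_ride(args["ride.from"], args["ride.to"], args["ride.car_type"])
```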
It is our hope that these datasets will be useful to the research community for experimentation and analysis in both dialog systems and conversational recommendation.
Acknowledgements
We would like to thank our co-authors and collaborators whose hard work and insights made the release of these datasets possible: Karthik Krishnamoorthi, Krisztian Balog, Chinnadhurai Sankar, Arvind Neelakantan, Amit Dubey, Kyu-Young Kim, Andy Cedilnik, Scott Roy, Muqthar Mohammed, Mohd Majeed, Ashwin Kakarla and Hadar Shemtov.