Blog
The latest from Google Research
Robust Neural Machine Translation
Monday, July 29, 2019
Posted by Yong Cheng, Software Engineer, Google Research
In recent years, neural machine translation (NMT) using Transformer models has experienced tremendous success. Based on deep neural networks, NMT models are usually trained end-to-end on very large parallel corpora (input/output text pairs) in an entirely data-driven fashion and without the need to impose explicit rules of language.
Despite this huge success, NMT models can be sensitive to minor perturbations of the input, which can manifest as a variety of different errors, such as under-translation, over-translation or mistranslation. For example, given a German sentence, the state-of-the-art NMT model, Transformer, will yield a correct translation:
“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die geladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.”
(Machine translation to English: “The spokesman of the Committee of Inquiry has announced that if the witnesses summoned continue to refuse to testify, he will be brought to court.”)
But, when we apply a subtle change to the input sentence, say from geladenen to the synonym vorgeladenen, the translation becomes very different (and in this case, incorrect):
“Der Sprecher des Untersuchungsausschusses hat angekündigt, vor Gericht zu ziehen, falls sich die vorgeladenen Zeugen weiterhin weigern sollten, eine Aussage zu machen.”
(Machine translation to English: “The investigative committee has announced that he will be brought to justice if the witnesses who have been invited continue to refuse to testify.”)
This lack of robustness in NMT models prevents many commercial systems from being applicable to tasks that cannot tolerate this level of instability. Therefore, learning robust translation models is not just desirable, but is often required in many scenarios. Yet, while the robustness of neural networks has been extensively studied in the computer vision community, only a few prior studies on learning robust NMT models can be found in the literature.
In “Robust Neural Machine Translation with Doubly Adversarial Inputs” (to appear at ACL 2019), we propose an approach that uses generated adversarial examples to improve the stability of machine translation models against small perturbations in the input. We learn a robust NMT model to directly overcome adversarial examples generated with knowledge of the model and with the intent of distorting the model predictions. We show that this approach improves the performance of the NMT model on standard benchmarks.
Training a Model with AdvGen
An ideal NMT model would generate similar translations for separate inputs that exhibit small differences. The idea behind our approach is to perturb a translation model with adversarial inputs in the hope of improving the model’s robustness. It does this using an algorithm called Adversarial Generation (AdvGen), which generates plausible adversarial examples for perturbing the model and then feeds them back into the model for defensive training. While this method is inspired by the idea of generative adversarial networks (GANs), it does not rely on a discriminator network, but simply applies the adversarial example in training, effectively diversifying and extending the training set.
The first step is to perturb the model using AdvGen. We start by using Transformer to calculate the translation loss based on a source input sentence, a target input sentence and a target output sentence. Then AdvGen randomly selects some words in the source sentence, assuming a uniform distribution. Each word has an associated list of similar words, i.e., candidates that can be used for substitution, from which AdvGen selects the word that is most likely to introduce errors in Transformer output. Then, this generated adversarial sentence is fed back into Transformer, initiating the defense stage.
First, the Transformer model is applied to an input sentence (lower left) and, in conjunction with the target output sentence (above right) and target input sentence (middle right, beginning with the placeholder “<sos>”), the translation loss is calculated. The AdvGen function then takes the source sentence, word selection distribution, word candidates, and the translation loss as inputs to construct an adversarial source example.
During the defense stage, the adversarial sentence is fed back into the Transformer model. Again the translation loss is calculated, but this time using the adversarial source input. Using the same method as above, AdvGen uses the target input sentence, word replacement candidates, the word selection distribution calculated by the attention matrix, and the translation loss to construct an adversarial target example.
In the defense stage, the adversarial source example serves as input to the Transformer model, and the translation loss is calculated. AdvGen then uses the same method as above to generate an adversarial target example from the target input.
Finally, the adversarial sentence is fed back into Transformer and the robustness loss using the adversarial source example, the adversarial target input example and the target sentence is calculated. If the perturbation led to a significant loss, the loss is minimized so that when the model is confronted with similar perturbations, it will not repeat the same mistake. On the other hand, if the perturbation leads to a low loss, nothing happens, indicating that the model can already handle this perturbation.
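To make the word-substitution step concrete, below is a minimal, illustrative sketch of the source-side AdvGen idea in Python. It is not the paper's implementation: the helper names are hypothetical, a stand-in loss function replaces the trained Transformer, and candidates are scored by re-evaluating the loss directly, whereas the paper scores them with a gradient-based approximation.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical sketch of the source-side AdvGen step. `loss_fn` stands in for
# the trained Transformer's translation loss on a (source, target-input,
# target-output) triple.
def advgen_source(
    src: Sequence[str],
    tgt_in: Sequence[str],
    tgt_out: Sequence[str],
    candidates: Dict[str, List[str]],   # similar-word lists per source word
    loss_fn: Callable[[Sequence[str], Sequence[str], Sequence[str]], float],
    replace_ratio: float = 0.25,
) -> List[str]:
    """Substitute a random subset of source words with the candidate that
    most increases the translation loss."""
    adv = list(src)
    n_replace = max(1, int(replace_ratio * len(src)))
    positions = random.sample(range(len(src)), k=n_replace)  # uniform selection
    for pos in positions:
        best_word = src[pos]
        best_loss = loss_fn(adv, tgt_in, tgt_out)
        for cand in candidates.get(src[pos], []):
            adv[pos] = cand
            cand_loss = loss_fn(adv, tgt_in, tgt_out)
            if cand_loss > best_loss:        # keep the most damaging substitute
                best_word, best_loss = cand, cand_loss
        adv[pos] = best_word
    return adv

# Toy usage with a stand-in loss; a real setup would query the NMT model.
if __name__ == "__main__":
    toy_loss = lambda s, ti, to: 0.01 * sum(len(w) for w in s)
    src = "falls sich die geladenen zeugen weigern".split()
    cands = {"geladenen": ["vorgeladenen", "eingeladenen"]}
    print(advgen_source(src, ["<sos>", "if"], ["if", "the"], cands, toy_loss))
```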
Model Performance
We demonstrate the effectiveness of our approach by applying it to the standard Chinese-English and English-German translation benchmarks. We observed a notable improvement of 2.8 and 1.6 BLEU points, respectively, compared to the competitive Transformer model, achieving a new state-of-the-art performance.
Comparison of Transformer model (Vaswani et al., 2017) on standard benchmarks.
We then evaluate our model on a noisy dataset, generated using a procedure similar to that described for AdvGen. We take an input clean dataset, such as that used on standard translation benchmarks, and randomly select words for similar word substitution. We find that our model exhibits improved robustness compared to other recent models.
Comparison of Transformer, Miyao et al. and Cheng et al. on artificial noisy inputs.
These results show that our method is able to overcome small perturbations in the input sentence and improve the generalization performance. It outperforms competitive translation models and achieves state-of-the-art translation performance on standard benchmarks. We hope our translation model will serve as a robust building block for improving many downstream tasks, especially when those are sensitive or intolerant to imperfect translation input.
Acknowledgements
This research was conducted by Yong Cheng, Lu Jiang and Wolfgang Macherey. Additional thanks go to our leadership Andrew Moore and Julia (Wenli) Zhu.
Google at ACL 2019
Monday, July 29, 2019
Andrew Helton, Editor, Google Research Communications
This week, Florence, Italy hosts the 2019 Annual Meeting of the Association for Computational Linguistics (ACL 2019), the premier conference in the field of natural language understanding, covering a broad spectrum of research areas that are concerned with computational approaches to natural language.
As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2019, Google will be on hand to showcase the latest research on syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better systems using labeled and unlabeled data.
If you’re attending ACL 2019, we hope that you’ll stop by the Google booth to meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Our researchers will also be on hand to demo the Natural Questions corpus, the Multilingual Universal Sentence Encoder and more. You can also learn more about the Google research being presented at ACL 2019 below (Google affiliations in blue).
Organizing Committee includes: Enrique Alfonseca
Accepted Publications
A Joint Named-Entity Recognizer for Heterogeneous Tag-sets Using a Tag Hierarchy
Genady Beryozkin, Yoel Drori, Oren Gilon, Tzvika Hartman, Idan Szpektor
Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study
Chinnadhurai Sankar, Sandeep Subramanian, Chris Pal, Sarath Chandar, Yoshua Bengio
Generating Logical Forms from Graph Representations of Text and Entities
Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, Yasemin Altun
Extracting Symptoms and their Status from Clinical Conversations
Nan Du, Kai Chen, Anjuli Kannan, Linh Trans, Yuhui Chen, Izhak Shafran
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation
Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Le, Jason Baldridge
Meaning to Form: Measuring Systematicity as Information
Tiago Pimentel, Arya D. McCarthy, Damian Blasi, Brian Roark, Ryan Cotterell
Matching the Blanks: Distributional Similarity for Relation Learning
Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, Tom Kwiatkowski
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc Le, Ruslan Salakhutdinov
HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy, Shashi Narayan, Andreas Vlachos
Zero-Shot Entity Linking by Reading Entity Descriptions
Lajanugen Logeswaran, Ming-Wei Chang, Kristina Toutanova, Kenton Lee, Jacob Devlin, Honglak Lee
Robust Neural Machine Translation with Doubly Adversarial Inputs
Yong Cheng, Lu Jiang, Wolfgang Macherey
Natural Questions: a Benchmark for Question Answering Research
Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov
Like a Baby: Visually Situated Neural Language Acquisition
Alexander Ororbia, Ankur Mali, Matthew Kelly, David Reitter
What Kind of Language Is Hard to Language-Model?
Sebastian J. Mielke, Ryan Cotterell, Kyle Gorman, Brian Roark, Jason Eisner
How Multilingual is Multilingual BERT?
Telmo Pires, Eva Schlinger, Dan Garrette
Handling Divergent Reference Texts when Evaluating Table-to-Text Generation
Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William Cohen
BAM! Born-Again Multi-Task Networks for Natural Language Understanding
Kevin Clark, Minh-Thang Luong, Urvashi Khandelal, Christopher D. Manning, Quoc V. Le
Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation
Wei Wang, Isaac Caswell, Ciprian Chelba
Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel
On the Robustness of Self-Attentive Models
Yu-Lun Hsieh, Minhao Cheng, Da-Cheng Juan, Wei Wei, Wen-Lian Hsu, Cho-Jui Hsieh
Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B
Jiaming Luo, Yuan Cao, Regina Barzilay
How Large Are Lions? Inducing Distributions over Quantitative Attributes
Yanai Elazar, Abhijit Mahabal, Deepak Ramachandran, Tania Bedrax-Weiss, Dan Roth
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney, Dipanjan Das, Ellie Pavlick
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling
Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas Mccoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen, Benjamin Van Durme, Edouard Grave, Ellie Pavlick, Samuel R. Bowman
Robust Zero-Shot Cross-Domain Slot Filling with Example Values
Darsh Shah, Raghav Gupta, Amir Fayazi, Dilek Hakkani-Tur
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee, Ming-Wei Chang, Kristina Toutanova
On-device Structured and Context Partitioned Projection Networks
Sujith Ravi, Zornitsa Kozareva
Incorporating Priors with Feature Attribution on Text Classification
Frederick Liu, Besim Avci
Informative Image Captioning with External Sources of Information
Sanqiang Zhao, Piyush Sharma, Tomer Levinboim, Radu Soricut
Reducing Word Omission Errors in Neural Machine Translation: A Contrastive Learning Approach
Zonghan Yang, Yong Cheng, Yang Liu, Maosong Sun
Synthetic QA Corpora Generation with Roundtrip Consistency
Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, Michael Collins
Unsupervised Paraphrasing without Translation
Aurko Roy, David Grangier
Workshops
Widening NLP 2019
Organizers include: Diyi Yang
NLP for Conversational AI
Organizers include: Thang-Minh Luong, Tania Bedrax-Weiss
The Fourth Arabic Natural Language Processing Workshop
Organizers include: Imed Zitouni
The Third Workshop on Abusive Language Online
Organizers include: Zeerak Waseem
TyP-NLP, Typology for Polyglot NLP
Organizers include: Manaal Faruqui
Gender Bias in Natural Language Processing
Organizers include: Kellie Webster
Tutorials
Wikipedia as a Resource for Text Analysis and Retrieval
Organizer: Marius Pasca
Learning Better Simulation Methods for Partial Differential Equations
Tuesday, July 23, 2019
Posted by Stephan Hoyer, Software Engineer, Google Research
The world’s fastest supercomputers were designed for modeling physical phenomena, yet they still are not fast enough to robustly predict the impacts of climate change, to design controls for airplanes based on airflow or to accurately simulate a fusion reactor. All of these phenomena are modeled by partial differential equations (PDEs), the class of equations that describe everything smooth and continuous in the physical world, and the most common class of simulation problems in science and engineering. To solve these equations, we need faster simulations, but in recent years, Moore’s law has been slowing. At the same time, we’ve seen huge breakthroughs in machine learning (ML) along with faster hardware optimized for it. What does this new paradigm offer for scientific computing?
In “Learning Data Driven Discretizations for Partial Differential Equations”, published in Proceedings of the National Academy of Sciences, we explore a potential path for how ML can offer continued improvements in high-performance computing, both for solving PDEs and, more broadly, for solving hard computational problems in every area of science.
For most real-world problems, closed-form solutions to PDEs don’t exist. Instead, one must find discrete equations (“discretizations”) that a computer can solve to approximate the continuous PDE. Typical approaches to solve PDEs represent equations on a grid, e.g., using finite differences. To achieve convergence, the mesh spacing of the grid needs to be smaller than the smallest feature size of the solutions. This often isn’t feasible because of an unfortunate scaling law: achieving 10x higher resolution requires 10,000x more compute, because the grid must be scaled in four dimensions—three spatial dimensions and time. Instead, in our paper we show that ML can be used to learn better representations for PDEs on coarser grids.
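For readers unfamiliar with grid-based solvers, here is a small sketch (not from the paper) of what a conventional discretization looks like: one upwind finite-difference step for 1D advection, with the grid spacing and CFL-limited time step that drive the unfavorable scaling described above.

```python
import numpy as np

# One explicit upwind finite-difference step for 1D advection, u_t + c*u_x = 0,
# on a periodic grid (assumes c > 0). Resolving finer features means shrinking
# dx and, via the CFL condition, dt as well; in three space dimensions plus
# time, 10x finer resolution therefore costs roughly 10,000x more work.
def advect_upwind(u: np.ndarray, c: float, dx: float, dt: float) -> np.ndarray:
    return u - c * dt / dx * (u - np.roll(u, 1))

nx = 64                                # number of grid points
dx = 1.0 / nx
c = 1.0
dt = 0.5 * dx / c                      # CFL-limited time step
x = np.linspace(0.0, 1.0, nx, endpoint=False)
u = np.exp(-100.0 * (x - 0.5) ** 2)    # initial Gaussian bump
for _ in range(200):
    u = advect_upwind(u, c, dx, dt)
```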
Satellite photo of a hurricane, at both full resolution and simulated resolution in a state-of-the-art weather model. Cumulus clouds (e.g., in the red circle) are responsible for heavy rainfall, but in the weather model the details are entirely blurred out. Instead, models rely on crude approximations for sub-grid physics, a key source of uncertainty in climate models. Image credit: NOAA
The challenge is to retain the accuracy of high-resolution simulations while still using the coarsest grid possible. In our work we’re able to improve upon existing schemes by replacing heuristics based on deep human insight (e.g., “solutions to a PDE should always be smooth away from discontinuities”) with optimized rules based on machine learning. The rules our ML models recover are complex, and we don’t entirely understand them, but they incorporate sophisticated physical principles like the idea of “upwinding”—to accurately model what’s coming towards you in a fluid flow, you should look upstream in the direction the wind is coming from. An example of our results on a simple model of fluid dynamics is shown below:
Simulations of Burgers’ equation, a model for shock waves in fluids, solved with either a standard finite volume method (left) or our neural network based method (right). The orange squares represent simulations with each method on low resolution grids. These points are fed back into the model at each time step, which then predicts how they should change. Blue lines show the exact simulations used for training. The neural network solution is much better, even on a 4x coarser grid, as indicated by the orange squares smoothly tracing the blue line.
Our research also illustrates a broader lesson about how to effectively combine machine learning and physics. Rather than attempting to learn physics from scratch, we combined neural networks with components from traditional simulation methods, including the known form of the equations we’re solving and finite volume methods. This means that laws such as conservation of momentum are exactly satisfied, by construction, allowing our machine learning models to focus on what they do best: learning optimal rules for interpolation in complex, high-dimensional spaces.
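A minimal sketch of this idea follows, under simplifying assumptions (a periodic 1D grid, inviscid Burgers' flux, and a stand-in for the trained coefficient network): the update stays in conservative flux form, so conservation holds no matter what the learned interpolation predicts.

```python
import numpy as np

# Conservative finite-volume update for Burgers' equation in which the
# cell-face reconstruction comes from a coefficient model. `dummy_model` is a
# stand-in for the trained neural network; because the update stays in
# flux-difference form, conservation holds regardless of what it predicts.
def face_values(u: np.ndarray, coeff_model) -> np.ndarray:
    # 4-cell stencil around each face on a periodic grid.
    stencil = np.stack([np.roll(u, s) for s in (1, 0, -1, -2)], axis=-1)
    w = coeff_model(stencil)                  # per-face interpolation weights
    w = w / w.sum(axis=-1, keepdims=True)     # weights sum to 1 (consistency)
    return (w * stencil).sum(axis=-1)

def finite_volume_step(u, dx, dt, coeff_model):
    flux = 0.5 * face_values(u, coeff_model) ** 2   # Burgers' flux f(u) = u^2/2
    # Whatever flux leaves one cell enters its neighbor, so the scheme
    # conserves the integral of u by construction.
    return u - dt / dx * (flux - np.roll(flux, 1))

dummy_model = lambda stencil: np.full(stencil.shape, 0.25)  # plain averaging
nx = 64
x = np.linspace(0.0, 1.0, nx, endpoint=False)
u = np.sin(2 * np.pi * x)
u = finite_volume_step(u, dx=1.0 / nx, dt=0.2 / nx, coeff_model=dummy_model)
```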
Next Steps
We are focused on scaling up the techniques outlined in our paper to solve larger scale simulation problems with real-world impacts, such as weather and climate prediction. We’re excited about the broad potential of blending machine learning into the complex algorithms of scientific computing.
Acknowledgments
Thanks to co-authors Yohai Bar-Sinai, Jason Hickey and Michael Brenner; and Google collaborators Peyman Milanfar, Pascal Getreur, Ignacio Garcia Dorado, Dmitrii Kochkov, Jiawei Zhuang and Anton Geraschenko.
Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology
Friday, July 19, 2019
Posted by Narayan Hegde, Software Engineer, Google Health and Carrie J. Cai, Research Scientist, Google Research
Advances in machine learning (ML) have shown great promise for assisting in the work of healthcare professionals, such as aiding the detection of diabetic eye disease and metastatic breast cancer. Though high-performing algorithms are necessary to gain the trust and adoption of clinicians, they are not always sufficient—what information is presented to doctors and how doctors interact with that information can be crucial determinants in the utility that ML technology ultimately has for users.
The medical specialty of anatomic pathology, which is the gold standard for the diagnosis of cancer and many other diseases through microscopic analysis of tissue samples, can greatly benefit from applications of ML. Though diagnosis through pathology is traditionally done on physical microscopes, there has been a growing adoption of “digital pathology,” where high-resolution images of pathology samples can be examined on a computer. With this movement comes the potential to much more easily look up information, as is needed when pathologists tackle the diagnosis of difficult cases or rare diseases, when “general” pathologists approach specialist cases, and when trainee pathologists are learning. In these situations, a common question arises, “What is this feature that I’m seeing?” The traditional solution is for doctors to ask colleagues, or to laboriously browse reference textbooks or online resources, hoping to find an image with similar visual characteristics. The general computer vision solution to problems like this is termed content-based image retrieval (CBIR), one example of which is the “reverse image search” feature in Google Images, in which users can search for similar images by using another image as input.
Today, we are excited to share two research papers describing further progress in human-computer interaction research for similar image search in medicine. In “Similar Image Search for Histopathology: SMILY”, published in Nature Partner Journal (npj) Digital Medicine, we report on our ML-based tool for reverse image search for pathology. In our second paper, “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making” (preprint available here), which received an honorable mention at the 2019 ACM CHI Conference on Human Factors in Computing Systems, we explored different modes of refinement for image-based search, and evaluated their effects on doctor interaction with SMILY.
SMILY Design
The first step in developing SMILY was to apply a deep learning model, trained using 5 billion natural, non-pathology images (e.g., dogs, trees, man-made objects, etc.), to compress images into a “summary” numerical vector, called an embedding. The network learned during the training process to distinguish similar images from dissimilar ones by computing and comparing their embeddings. This model is then used to create a database of image patches and their associated embeddings using a corpus of de-identified slides from The Cancer Genome Atlas. When a query image patch is selected in the SMILY tool, the query patch’s embedding is similarly computed and compared with the database to retrieve the image patches with the most similar embeddings.
Schematic of the steps in building the SMILY database and the process by which input image patches are used to perform the similar image search.
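The retrieval pattern described above can be sketched in a few lines. The embedding function, patch sizes, and brute-force similarity search below are stand-ins for illustration, not the production SMILY system.

```python
import numpy as np

# Stand-ins for the trained embedding network and the patch database; the
# retrieval pattern itself (pre-compute embeddings, then nearest-neighbor
# search at query time) is what SMILY relies on.
EMBED_DIM = 128
rng = np.random.default_rng(0)
PROJECTION = rng.normal(size=(32 * 32 * 3, EMBED_DIM))  # fixed random projection

def embed(patches: np.ndarray) -> np.ndarray:
    """Map image patches to unit-norm vectors (a real system uses the CNN)."""
    flat = patches.reshape(len(patches), -1).astype(np.float32)
    vecs = flat @ PROJECTION
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Offline: embed every patch in the slide corpus once and store the vectors.
database_patches = rng.random((10_000, 32, 32, 3))
database_vecs = embed(database_patches)

# Online: embed the query patch and return the most similar stored patches.
def search(query_patch: np.ndarray, k: int = 5) -> np.ndarray:
    q = embed(query_patch[None])[0]
    sims = database_vecs @ q          # cosine similarity, since vectors are unit norm
    return np.argsort(-sims)[:k]      # indices of the top-k matches

print(search(rng.random((32, 32, 3))))
```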
The tool allows a user to select a region of interest, and obtain visually-similar matches. We tested SMILY’s ability to retrieve images along a pre-specified axis of similarity (e.g. histologic feature or tumor grade), using images of tissue from the breast, colon, and prostate (3 of the most common cancer sites). We found that SMILY demonstrated promising results despite not being trained specifically on pathology images or using any labeled examples of histologic features or tumor grades.
Example of selecting a small region in a slide and using SMILY to retrieve similar images. SMILY efficiently searches a database of billions of cropped images in a few seconds. Because pathology images can be viewed at different magnifications (zoom levels), SMILY automatically searches images at the same magnification as the input image.
Second example of using SMILY, this time searching for a lobular carcinoma, a specific subtype of breast cancer.
Refinement tools for SMILY
However, a problem emerged when we observed how pathologists interacted with SMILY. Specifically, users were trying to answer the nebulous question of “What looks similar to this image?” so that they could learn from past cases containing similar images. Yet, there was no way for the tool to understand the intent of the search: Was the user trying to find images that have a similar histologic feature, glandular morphology, overall architecture, or something else? In other words, users needed the ability to guide and refine the search results on a case-by-case basis in order to actually find what they were looking for. Furthermore, we observed that this need for iterative search refinement was rooted in how doctors often perform “iterative diagnosis”—by generating hypotheses, collecting data to test these hypotheses, exploring alternative hypotheses, and revisiting or retesting previous hypotheses in an iterative fashion. It became clear that, for SMILY to meet real user needs, it would need to support a different approach to user interaction.
Through careful human-centered research described in our second paper, we designed and augmented SMILY with a suite of interactive refinement tools that enable end-users to express what similarity means on-the-fly: 1) refine-by-region allows pathologists to crop a region of interest within the image, limiting the search to just that region; 2) refine-by-example gives users the ability to pick a subset of the search results and retrieve more results like those; and 3) refine-by-concept sliders can be used to specify that more or less of a clinical concept be present in the search results (e.g., fused glands). Rather than requiring that these concepts be built into the machine learning model, we instead developed a method that enables end-users to create new concepts post-hoc, customizing the search algorithm towards concepts they find important for each specific use case. This enables new explorations via post-hoc tools after a machine learning model has already been trained, without needing to re-train the original model for each concept or application of interest.
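As one plausible illustration of how such a post-hoc concept slider could work (a sketch, not necessarily SMILY's exact algorithm), a linear concept direction can be estimated from a handful of user-marked patch embeddings and used to shift the query embedding before re-running the search:

```python
import numpy as np

# Hypothetical helpers: estimate a linear concept direction from a few
# user-marked patch embeddings, then shift the query embedding along that
# direction by a slider amount before re-running the nearest-neighbor search.
def concept_direction(with_concept: np.ndarray, without_concept: np.ndarray) -> np.ndarray:
    """Difference of class means, normalized: a simple linear concept vector."""
    d = with_concept.mean(axis=0) - without_concept.mean(axis=0)
    return d / np.linalg.norm(d)

def refine_query(query_vec: np.ndarray, concept_vec: np.ndarray, slider: float) -> np.ndarray:
    """slider > 0 asks for more of the concept, slider < 0 for less."""
    refined = query_vec + slider * concept_vec
    return refined / np.linalg.norm(refined)

# Example with random stand-in embeddings.
rng = np.random.default_rng(1)
pos, neg = rng.normal(size=(20, 128)), rng.normal(size=(20, 128))
direction = concept_direction(pos, neg)
refined_query = refine_query(rng.normal(size=128), direction, slider=0.5)
```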
Through our user study with pathologists, we found that the tool-based SMILY not only increased the clinical usefulness of search results, but also significantly increased users’ trust and likelihood of adoption, compared to a conventional version of SMILY without these tools. Interestingly, these refinement tools appeared to have supported pathologists’ decision-making process in ways beyond simply performing better on similarity searches. For example, pathologists used the observed changes to their results from iterative searches as a means of progressively tracking the likelihood of a hypothesis. When search results were surprising, many re-purposed the tools to test and understand the underlying algorithm, for example, by cropping out regions they thought were interfering with the search or by adjusting the concept sliders to increase the presence of concepts they suspected were being ignored. Beyond being passive recipients of ML results, doctors were empowered with the agency to actively test hypotheses and apply their expert domain knowledge, while simultaneously leveraging the benefits of automation.
With these interactive tools enabling users to tailor each search experience to their desired intent, we are excited for SMILY’s potential to assist with searching large databases of digitized pathology images. One potential application of this technology is to index textbooks of pathology images with descriptive captions, and enable medical students or pathologists in training to search these textbooks using visual search, speeding up the educational process. Another application is for cancer researchers interested in studying the correlation of tumor morphologies with patient outcomes, to accelerate the search for similar cases. Finally, pathologists may be able to leverage tools like SMILY to locate all occurrences of a feature (e.g. signs of active cell division, or mitosis) in the same patient’s tissue sample to better understand the severity of the disease to inform cancer therapy decisions. Importantly, our findings add to the body of evidence that sophisticated machine learning algorithms need to be paired with human-centered design and interactive tooling in order to be most useful.
Acknowledgements
This work would not have been possible without Jason D. Hipp, Yun Liu, Emily Reif, Daniel Smilkov, Michael Terry, Craig H. Mermel, Martin C. Stumpe and members of Google Health and PAIR. Preprints of the two papers are available here and here.
Parrotron: New Research into Improving Verbal Communication for People with Speech Impairments
Wednesday, July 17, 2019
Posted by Fadi Biadsy, Research Scientist and Ron Weiss, Software Engineer, Google Research
Most people take for granted that when they speak, they will be heard and understood. But for the millions who live with speech impairments caused by physical or neurological conditions, trying to communicate with others can be difficult and lead to frustration. While there have been a great number of recent advances in automatic speech recognition (ASR; a.k.a. speech-to-text) technologies, these interfaces can be inaccessible for those with speech impairments. Further, applications that rely on speech recognition as input for text-to-speech synthesis (TTS) can exhibit word substitution, deletion, and insertion errors. Critically, in today’s technological environment, limited access to speech interfaces, such as digital assistants that depend on directly understanding one's speech, means being excluded from state-of-the-art tools and experiences, widening the gap between what those with and without speech impairments can access.
Project Euphonia has demonstrated that speech recognition models can be significantly improved to better transcribe a variety of atypical and dysarthric speech. Today, we are presenting Parrotron, an ongoing research project that continues and extends our effort to build speech technologies to help those with impaired or atypical speech to be understood by both people and devices. Parrotron consists of a single end-to-end deep neural network trained to convert speech from a speaker with atypical speech patterns directly into fluent synthesized speech, without an intermediate step of generating text—skipping speech recognition altogether. Parrotron’s approach is speech-centric, looking at the problem only from the point of view of speech signals—e.g., without visual cues such as lip movements. Through this work, we show that Parrotron can help people with a variety of atypical speech patterns—including those with ALS, deafness, and muscular dystrophy—to be better understood in both human-to-human interactions and by ASR engines.
The Parrotron Speech Conversion Model
Parrotron is an attention-based sequence-to-sequence model trained in two phases using parallel corpora of input/output speech pairs. First, we build a general speech-to-speech conversion model for standard fluent speech, followed by a personalization phase that adjusts the model parameters to the atypical speech patterns from the target speaker. The primary challenge in such a configuration lies in the collection of the parallel training data needed for supervised training, which consists of utterances spoken by many speakers and mapped to the same output speech content spoken by a single speaker. Since it is impractical to have a single speaker record the many hours of training data needed to build a high quality model, Parrotron uses parallel data automatically derived with a TTS system. This allows us to make use of a pre-existing anonymized, transcribed speech recognition corpus to obtain training targets.
The first training phase uses a corpus of ~30,000 hours that consists of millions of anonymized utterance pairs. Each pair includes a natural utterance paired with an automatically synthesized speech utterance that results from running our state-of-the-art Parallel WaveNet TTS system on the transcript of the first. This dataset includes utterances from thousands of speakers spanning hundreds of dialects/accents and acoustic conditions, allowing us to model a large variety of voices, linguistic and non-linguistic contents, accents, and noise conditions with “typical” speech all in the same language. The resulting conversion model projects away all non-linguistic information, including speaker characteristics, and retains only what is being said, not who, where, or how it is said. This base model is used to seed the second personalization phase of training.
The second training phase utilizes a corpus of utterance pairs generated in the same manner as the first dataset. In this case, however, the corpus is used to adapt the network to the acoustic/phonetic, phonotactic and language patterns specific to the input speaker, which might include, for example, learning how the target speaker alters, substitutes, and reduces or removes certain vowels or consonants. To model ALS speech characteristics in general, we use utterances taken from an ALS speech corpus derived from Project Euphonia. If instead we want to personalize the model for a particular speaker, then the utterances are contributed by that person. The larger this corpus is, the better the model is likely to be at correctly converting to fluent speech. Using this second smaller and personalized parallel corpus, we run the neural-training algorithm, updating the parameters of the pre-trained base model to generate the final personalized model.
We found that training the model with a multitask objective to predict the target phonemes while simultaneously generating spectrograms of the target speech led to significant quality improvements. Such a multitask trained encoder can be thought of as learning a latent representation of the input that maintains information about the underlying linguistic content.
Overview of the Parrotron model architecture. An input speech spectrogram is passed through encoder and decoder neural networks to generate an output spectrogram in a new voice.
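The sketch below illustrates the multitask idea in highly simplified form: a shared encoder feeds both a spectrogram head and an auxiliary phoneme head, and the two losses are weighted together. All layer sizes are placeholders, and the real Parrotron model is an attention-based sequence-to-sequence network rather than this toy.

```python
import tensorflow as tf

# Toy multitask model: a shared encoder feeds a spectrogram head and an
# auxiliary phoneme head. Sizes are placeholders; the real Parrotron model is
# an attention-based sequence-to-sequence network, and in practice the output
# spectrogram length differs from the input length.
N_MELS, N_PHONEMES, T = 80, 50, 200

inputs = tf.keras.Input(shape=(T, N_MELS), name="input_spectrogram")
encoded = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(256, return_sequences=True))(inputs)   # shared encoder

# Head 1: spectrogram frames in the target (synthesized) voice.
spec_out = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(N_MELS), name="spectrogram")(encoded)

# Head 2: auxiliary phoneme prediction, encouraging the shared encoder to keep
# the underlying linguistic content.
phoneme_out = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(N_PHONEMES, activation="softmax"),
    name="phonemes")(encoded)

model = tf.keras.Model(inputs, [spec_out, phoneme_out])
model.compile(
    optimizer="adam",
    loss={"spectrogram": "mse",
          "phonemes": "sparse_categorical_crossentropy"},
    loss_weights={"spectrogram": 1.0, "phonemes": 0.1},   # multitask weighting
)
model.summary()
```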
Case Studies
To demonstrate a proof of concept, we worked with our fellow Google research scientist and mathematician Dimitri Kanevsky, who was born in Russia to Russian speaking, normal-hearing parents but has been profoundly deaf from a very young age. He learned to speak English as a teenager, by using Russian phonetic representations of English words, learning to pronounce English using transliteration into Russian (e.g., The quick brown fox jumps over the lazy dog => ЗИ КВИК БРАУН ДОГ ЖАМПС ОУВЕР ЛАЙЗИ ДОГ). As a result, Dimitri’s speech is substantially distinct from native English speakers, and can be challenging to comprehend for systems or listeners who are not accustomed to it.
Dimitri recorded a corpus of 15 hours of speech, which was used to adapt the base model to the nuances specific to his speech. The resulting Parrotron system helped him be better understood by both people and Google’s ASR system alike. Running Google’s ASR engine on the output of Parrotron significantly reduced the word error rate from 89% to 32%, on a held out test set from Dimitri. Below is an example of Parrotron’s successful conversion of input speech from Dimitri:
Dimitri saying, "How far is the Moon from the Earth?"
Parrotron (male voice) saying, "How far are the Moon from the Earth?"
We also worked with Aubrie Lee, a Googler and advocate for disability inclusion, who has muscular dystrophy, a condition that causes progressive muscle weakness, and sometimes impacts speech production. Aubrie contributed 1.5 hours of speech, which has been instrumental in showing promising outcomes of the applicability of this speech-to-speech technology. Below is an example of Parrotron’s successful conversion of input speech from Aubrie:
Aubrie saying, "Is morning glory a perennial plant?"
Parrotron (female voice) saying, "Is morning glory a perennial plant?"
Aubrie saying, "Schedule a meeting with John on Friday."
Parrotron (female voice) saying, "Schedule a meeting with John on Friday."
We also tested Parrotron’s performance on speech from speakers with ALS by adapting the pretrained model on multiple speakers who share similar speech characteristics grouped together, rather than on a single speaker. We conducted a preliminary listening study and observed an increase in intelligibility when comparing natural ALS speech to the corresponding speech obtained from running the Parrotron model, for the majority of our test speakers.
Cascaded Approach
Project Euphonia has built a personalized speech-to-text model that has reduced the word error rate for a deaf speaker from 89% to 25%, and ongoing research is also likely to improve upon these results. One could use such a speech-to-text model to achieve a similar goal as Parrotron by simply passing its output into a TTS system to synthesize speech from the result. In such a cascaded approach, however, the recognizer may choose an incorrect word (roughly 1 out of 4 times, in this case)—i.e., it may yield words/sentences with unintended meaning and, as a result, the synthesized audio of these words would be far from the speaker’s intention. Given the end-to-end speech-to-speech training objective function of Parrotron, even when errors are made, the generated output speech is likely to sound acoustically similar to the input speech, and thus the speaker’s original intention is less likely to be significantly altered and it is often still possible to understand what is intended:
Dimitri saying, "
What is definition of rhythm?
"
Parrotron (male voice) saying, "
What is definition of rhythm?
"
Dimitri saying, "
How many ounces in one liter?
"
Parrotron (male voice) saying, "
Hey Google, How many unces
[sic]
in one liter?
"
Google Assistant saying, "
One liter is equal to thirty-three point eight one four US fluid ounces.
"
Aubrie saying, "
Is it wheelchair accessible?
"
Parrotron (female voice) saying, "
Is it wheelchair accecable
[sic]
?
"
Furthermore, since Parrotron is not strongly biased to producing words from a predefined vocabulary set, input to the model may contain completely new invented words, foreign words/names, and even nonsense words. We observe that feeding Arabic and Spanish utterances into the US-English Parrotron model often results in output which echoes the original speech content with an American accent, in the target voice. Such behavior is qualitatively different from what one would obtain by simply running an ASR followed by a TTS. Finally, by going from a combination of independently tuned neural networks to a single one, we also believe there are improvements and simplifications that could be substantial.
Conclusion
Parrotron makes it easier for users with atypical speech to talk to and be understood by other people and by speech interfaces, with its end-to-end speech conversion approach more likely to reproduce the user’s intended speech. More exciting applications of Parrotron are discussed in
our paper
and additional audio samples can be found on our
github repository
. If you would like to participate in this ongoing research, please fill out
this short form
and volunteer to record a set of phrases. We look forward to working with you!
Acknowledgements
This project was joint work between the Speech and Google Brain teams. Contributors include Fadi Biadsy, Ron Weiss, Pedro Moreno, Dimitri Kanevsky, Ye Jia, Suzan Schwartz, Landis Baker, Zelin Wu, Johan Schalkwyk, Yonghui Wu, Zhifeng Chen, Patrick Nguyen, Aubrie Lee, Andrew Rosenberg, Bhuvana Ramabhadran, Jason Pelecanos, Julie Cattiau, Michael Brenner, Dotan Emanuel, Joel Shor, Sean Lee and Benjamin Schroeder. Our data collection efforts have been vastly accelerated by our collaborations with ALS-TDI.
Multilingual Universal Sentence Encoder for Semantic Retrieval
Friday, July 12, 2019
Posted by Yinfei Yang and Amin Ahmad, Software Engineers, Google Research
Since it was introduced last year, “Universal Sentence Encoder (USE) for English” has become one of the most downloaded pre-trained text modules in Tensorflow Hub, providing versatile sentence embedding models that convert sentences into vector representations. These vectors capture rich semantic information that can be used to train classifiers for a broad range of downstream tasks. For example, a strong sentiment classifier can be trained from as few as one hundred labeled examples, and still be used to measure semantic similarity and for meaning-based clustering.
Today, we are pleased to announce the release of three new USE multilingual modules with additional features and potential applications. The first two modules provide multilingual models for retrieving semantically similar text, one optimized for retrieval performance and the other for speed and less memory usage. The third model is specialized for question-answer retrieval in sixteen languages (USE-QA), and represents an entirely new application of USE. All three multilingual modules are trained using a multi-task dual-encoder framework, similar to the original USE model for English, while using techniques we developed for improving the dual-encoder with additive margin softmax approach. They are designed not only to maintain good transfer learning performance, but to perform well on semantic retrieval tasks.
Multi-task training structure of the Universal Sentence Encoder. A variety of tasks and task structures are joined by shared encoder layers/parameters (pink boxes).
Semantic Retrieval Applications
The three new modules are all built on semantic retrieval architectures, which typically split the encoding of questions and answers into separate neural networks, which makes it possible to search among billions of potential answers within milliseconds. The key to using dual encoders for efficient semantic retrieval is to pre-encode all candidate answers to expected input queries and store them in a vector database that is optimized for solving the nearest neighbor problem, which allows a large number of candidates to be searched quickly with good precision and recall. For all three modules, the input query is then encoded into a vector on which we can perform an approximate nearest neighbor search. Together, this enables good results to be found quickly without needing to do a direct query/candidate comparison for every candidate. The prototypical pipeline is illustrated below:
A prototypical semantic retrieval pipeline, used for textual similarity.
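A small end-to-end example of this pipeline follows, using the publicly available multilingual USE module from TensorFlow Hub. The brute-force inner product here stands in for the approximate nearest neighbor index a real system would use, and the module handle should be checked against tfhub.dev if it has been superseded.

```python
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # registers the ops the multilingual USE models require

# Load the multilingual USE module from TF Hub (substitute the current handle
# from tfhub.dev if this one has been superseded).
encoder = hub.load(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# Offline: pre-encode all candidate sentences. A production system would put
# these vectors into an approximate nearest neighbor index instead of a list.
candidates = [
    "How do I get to the restroom?",
    "¿Cómo llego al baño?",
    "El clima está agradable hoy.",
]
candidate_vecs = encoder(candidates).numpy()

# Online: encode the query and rank candidates by inner product.
query_vec = encoder(["Where is the bathroom?"]).numpy()
scores = np.inner(query_vec, candidate_vecs)[0]
for sentence, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {sentence}")
```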
Semantic Similarity Modules
For semantic similarity tasks, the query and candidates are encoded using the same neural network. Two common semantic retrieval tasks made possible by the new modules include Multilingual Semantic Textual Similarity Retrieval and Multilingual Translation Pair Retrieval.
Multilingual Semantic Textual Similarity Retrieval
Most existing approaches for finding semantically similar text require being given a pair of texts to compare. However, using the Universal Sentence Encoder, semantically similar text can be extracted directly from a very large database. For example, in an application like FAQ search, a system can first index all possible questions with associated answers. Then, given a user’s question, the system can search for known questions that are semantically similar enough to provide an answer. A similar approach was used to find comparable sentences from 50 million sentences in Wikipedia. With the new multilingual USE models, this can be done in any of the supported non-English languages.
Multilingual Translation Pair Retrieval
The newly released modules can also be used to mine translation pairs to train neural machine translation systems. Given a source sentence in one language (“How do I get to the restroom?”), they can find the potential translation target in any other supported language (“¿Cómo llego al baño?”).
Both new semantic similarity modules are cross-lingual. Given an input in Chinese, for example, the modules can find the best candidates, regardless of which language it is expressed in. This versatility can be particularly useful for languages that are underrepresented on the internet. For example, an early version of these modules has been used by Chidambaram et al. (2018) to provide classifications in circumstances where the training data is only available in a single language, e.g. English, but the end system must function in a range of other languages.
USE for Question-Answer Retrieval
The USE-QA module extends the USE architecture to question-answer retrieval applications, which generally take an input query and find relevant answers from a large set of documents that may be indexed at the document, paragraph, or even sentence level. The input query is encoded with the question encoding network, while the candidates are encoded with the answer encoding network.
Visualizing the action of a neural answer retrieval system. The blue point at the north pole represents the question vector. The other points represent the embeddings of various answers. The correct answer, highlighted here in red, is “closest” to the question, in that it minimizes the angular distance. The points in this diagram are produced by an actual USE-QA model; however, they have been projected downwards from ℝ⁵⁰⁰ to ℝ³ to assist the reader’s visualization.
Question-answer retrieval systems also rely on the ability to understand semantics. For example, consider a possible query to one such system, Google Talk to Books, which was launched in early 2018 and backed by a sentence-level index of over 100,000 books. A query, “What fragrance brings back memories?”, yields the result, “And for me, the smell of jasmine along with the pan bagnat, it brings back my entire carefree childhood.” Without specifying any explicit rules or substitutions, the vector encoding captures the semantic similarity between the terms fragrance and smell. The advantage provided by the USE-QA module is that it can extend question-answer retrieval tasks such as this to multilingual applications.
For Researchers and Developers
We're pleased to share the latest additions to the Universal Sentence Encoder family with the research community, and are excited to see what other applications will be found. These modules can be used as-is, or fine tuned using domain-specific data. Lastly, we will also host the semantic similarity for natural language page on Cloud AI Workshop to further encourage research in this area.
Acknowledgements
Mandy Guo, Daniel Cer, Noah Constant, Jax Law, Muthuraman Chidambaram for core modeling, Gustavo Hernandez Abrego, Chen Chen, Mario Guajardo-Cespedes for infrastructure and colabs, Steve Yuan, Chris Tar, Yunhsuan Sung, Brian Strope, Ray Kurzweil for discussion of the model architecture.
Advancing Semi-supervised Learning with Unsupervised Data Augmentation
Wednesday, July 10, 2019
Posted by Qizhe Xie, Student Researcher and Thang Luong, Senior Research Scientist, Google Research, Brain Team
Success in deep learning has largely been enabled by key factors such as algorithmic advancements, parallel processing hardware (GPU/TPU), and the availability of large-scale labeled datasets, like ImageNet. However, when labeled data is scarce, it can be difficult to train neural networks to perform well. In this case, one can apply data augmentation methods, e.g., paraphrasing a sentence or rotating an image, to effectively increase the amount of labeled training data. Recently, there has been significant progress in the design of data augmentation approaches for a variety of areas such as natural language processing (NLP), vision, and speech. Unfortunately, data augmentation is often limited to supervised learning only, in which labels are required to transfer from original examples to augmented ones.
Example augmentation operations for text-based (top) or image-based (bottom) training data.
In our recent work, “Unsupervised Data Augmentation (UDA) for Consistency Training”, we demonstrate that one can also perform data augmentation on unlabeled data to significantly improve semi-supervised learning (SSL). Our results support the recent revival of semi-supervised learning, showing that: (1) SSL can match and even outperform purely supervised learning that uses orders of magnitude more labeled data, (2) SSL works well across domains in both text and vision and (3) SSL combines well with transfer learning, e.g., when fine-tuning from BERT. We have also open-sourced our code (github) for the community to replicate and build upon.
Unsupervised Data Augmentation Explained
Unsupervised Data Augmentation (UDA) makes use of both labeled data and unlabeled data. To use labeled data, it computes the loss function using standard methods for supervised learning to train the model, as shown in the left part of the graph below. For unlabeled data, consistency training is applied to enforce the predictions to be similar for an unlabeled example and the augmented unlabeled example, as shown in the right part of the graph. Here, the same model is applied to both the unlabeled example and its augmented counterpart to produce two model predictions, from which a consistency loss is computed (i.e., the distance between the two prediction distributions). UDA then computes the final loss by jointly optimizing both the supervised loss from the labeled data and the unsupervised consistency loss from the unlabeled data.
An overview of Unsupervised Data Augmentation (UDA). Left: Standard supervised loss is computed when labeled data is available. Right: With unlabeled data, a consistency loss is computed between an example and its augmented version.
By minimizing the consistency loss, UDA allows for label information to propagate smoothly from labeled examples to unlabeled ones. Intuitively, one can think of UDA as an implicit iterative process. First, the model relies on a small amount of labeled examples to make correct predictions for some unlabeled examples, from which the label information is propagated to augmented counterparts through the consistency loss. Over time, more and more unlabeled examples will be predicted correctly which reflects the improved generalization of the model. Various other types of noise have been tested for consistency training (e.g., Gaussian noise, adversarial noise, and others), yet we found that data augmentation outperforms all of them in many settings, leading to state-of-the-art performance on a wide variety of tasks from language to vision. UDA applies different existing augmentation methods depending on the task at hand, including back translation, AutoAugment, and TF-IDF word replacement.
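Schematically, the combined objective can be written in a few lines of TensorFlow. This is a sketch of the loss described above, not the released implementation; `augment` stands in for a task-appropriate augmentation such as back translation or AutoAugment, and here simple noise is used just to make the example runnable.

```python
import tensorflow as tf

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, augment, lam=1.0):
    # Supervised term: standard cross-entropy on the labeled batch.
    sup = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            y_labeled, model(x_labeled, training=True), from_logits=True))

    # Consistency term: the prediction on an unlabeled example (held fixed via
    # stop_gradient) should match the prediction on its augmented version.
    p_orig = tf.stop_gradient(tf.nn.softmax(model(x_unlabeled, training=True)))
    p_aug = tf.nn.softmax(model(augment(x_unlabeled), training=True))
    kl = tf.reduce_sum(
        p_orig * (tf.math.log(p_orig + 1e-8) - tf.math.log(p_aug + 1e-8)), axis=-1)

    return sup + lam * tf.reduce_mean(kl)

# Toy usage with a stand-in classifier and noise as the "augmentation".
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
augment = lambda x: x + tf.random.normal(tf.shape(x), stddev=0.1)
x_l = tf.random.normal((8, 32))
y_l = tf.random.uniform((8,), maxval=10, dtype=tf.int32)
x_u = tf.random.normal((32, 32))
print(uda_loss(model, x_l, y_l, x_u, augment).numpy())
```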
Benchmarks in NLP and Computer Vision
UDA is surprisingly effective in the low-data regime. With only 20 labeled examples, UDA achieves an error rate of 4.20 on the IMDb sentiment analysis task by leveraging 50,000 unlabeled examples. This result outperforms the previous state-of-the-art model trained on 25,000 labeled examples with an error rate of 4.32. In the large-data regime, with the full training set, UDA also provides robust gains.
Benchmark on IMDb, a sentiment analysis task. UDA surpasses state-of-the-art results in supervised learning across different training sizes.
On the CIFAR-10 semi-supervised learning benchmark, UDA outperforms all existing SSL methods, such as VAT and ICT, by significant margins. With 4k examples, UDA achieves an error rate of 5.27, matching the performance of the fully supervised model that uses 50k examples. Furthermore, with a more advanced architecture, PyramidNet+ShakeDrop, UDA achieves a new state-of-the-art error rate of 2.7, a more than 45% reduction in error rate compared to the previous best semi-supervised result. On SVHN, UDA achieves an error rate of 2.46 with only 1k labeled examples, matching the performance of the fully supervised model trained with ~70k labeled examples.
SSL benchmark on CIFAR-10 and SVHN image classification tasks. UDA surpasses existing semi-supervised learning methods.
On ImageNet with 10% labeled examples, UDA improves the top-1 (top-5) accuracy from 55.1% (77.3%) with the supervised baseline and no unlabeled examples to 68.7% (88.5%) using all images from ImageNet as unlabeled examples. In the high-data regime with the fully labeled set and 1.3M extra unlabeled examples, UDA continues to provide gains from 78.3% (94.4%) to 79.0% (94.5%).
Release
We have released the codebase of UDA, together with all data augmentation methods, e.g., back-translation with pre-trained translation models, to replicate our results. We hope that this release will further advance the progress in semi-supervised learning.
Acknowledgements
Special thanks to the co-authors of the paper Zihang Dai, Eduard Hovy, and Quoc V. Le. We’d also like to thank Hieu Pham, Adams Wei Yu, Zhilin Yang, Colin Raffel, Olga Wichrowska, Ekin Dogus Cubuk, Guokun Lai, Jiateng Xie, Yulun Du, Trieu Trinh, Ran Zhao, Ola Spyra, Brandon Yang, Daiyi Peng, Andrew Dai, Samy Bengio and Jeff Dean for their help with this project. A preprint is available online.
In addition, we would also like to refer readers to other approaches on semi-supervised learning concurrently developed by our colleagues in Google Research, such as MixMatch and S4L, that have demonstrated promising results. We thank the authors of the MixMatch and S4L papers for their comments and feedback on our paper and blog post.
Predicting the Generalization Gap in Deep Neural Networks
Tuesday, July 9, 2019
Posted by Yiding Jiang, Google AI Resident
Deep neural networks (DNN) are the cornerstone of recent progress in machine learning, and are responsible for recent breakthroughs in a variety of tasks such as image recognition, image segmentation, machine translation and more. However, despite their ubiquity, researchers are still attempting to fully understand the underlying principles that govern them. In particular, classical theories (e.g., VC-dimension and Rademacher complexity) in conventional settings suggest that over-parameterized functions should generalize poorly to unseen data, yet recent work has found that massively over-parameterized functions (orders of magnitude more parameters than the number of data points) generalize well. In order to improve models, a better understanding of generalization, which can lead to more theoretically grounded and therefore more principled approaches to DNN design, is required.
An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution. Significant strides have been made towards deriving better DNN generalization bounds—the upper limit to the generalization gap—but they still tend to greatly overestimate the actual generalization gap, rendering them uninformative as to why some models generalize so well. On the other hand, the notion of margin—the distance between a data point and the decision boundary—has been extensively studied in the context of shallow models such as support-vector machines, and is found to be closely related to how well these models generalize to unseen data. Because of this, the use of margin to study generalization performance has been extended to DNNs, resulting in highly refined theoretical upper bounds on the generalization gap, but has not significantly improved the ability to predict how well a model generalizes.
An example of a support-vector machine decision boundary. The hyperplane defined by w∙x-b=0 is the "decision boundary" of this linear classifier, i.e., every point x lying on the hyperplane is equally likely to be in either class under this classifier.
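As a small worked example of the margin notion (purely illustrative, not taken from the paper):

```python
import numpy as np

# For a linear classifier with decision boundary w·x - b = 0, the signed
# distance of a point x to the boundary is (w·x - b) / ||w||. The numbers
# below are arbitrary and purely illustrative.
w = np.array([3.0, 4.0])
b = 1.0
x = np.array([2.0, 1.0])

margin = (w @ x - b) / np.linalg.norm(w)   # (6 + 4 - 1) / 5 = 1.8
print(margin)
```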
In our ICLR 2019 paper, “Predicting the Generalization Gap in Deep Networks with Margin Distributions”, we propose the use of a normalized margin distribution across network layers as a predictor of the generalization gap. We empirically study the relationship between the margin distribution and generalization and show that, after proper normalization of the distances, some basic statistics of the margin distributions can accurately predict the generalization gap. We also make available all the models used as a dataset for studying generalization through the Github repository.
Each plot corresponds to a convolutional neural network trained on CIFAR-10 with different classification accuracies. The probability density (y-axis) of normalized margin distributions (x-axis) at 4 layers of a network is shown for three different models with increasingly better generalization (left to right). The normalized margin distributions are strongly correlated with test accuracy, which suggests they can be used as a proxy for predicting a network's generalization gap. Please see our paper for more details on these networks.
Margin Distributions as a Predictor of Generalization
Intuitively, if the statistics of the margin distribution are truly predictive of the generalization performance, a simple prediction scheme should be able to establish the relationship. As such, we chose linear regression to be the predictor. We found that the relationship between the generalization gap and the log-transformed statistics of the margin distributions is almost perfectly linear (see figure below). In fact, the proposed scheme produces better prediction relative to other existing measures of generalization. This indicates that the margin distributions may contain important information about how deep models generalize.
Predicted generalization gap (x-axis) vs. true generalization gap (y-axis) on CIFAR-100 + ResNet-32. The points lie close to the diagonal line, which indicates that the predicted values of the log linear model fit the true generalization gap very well.
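A minimal sketch of this prediction scheme follows. The margin statistics here are randomly generated placeholders standing in for statistics measured on real trained models (such as those in the DEMOGEN dataset described below); only the fitting procedure is meant to be illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Summarize each trained network by a few statistics of its normalized margin
# distributions at several layers, log-transform them, and fit a linear model
# to the measured generalization gap. The statistics here are random
# placeholders; in practice they would be measured on trained models such as
# those in the DEMOGEN dataset.
rng = np.random.default_rng(0)
n_models, n_stats = 216, 16          # e.g., 4 layers x 4 distribution statistics
margin_stats = np.abs(rng.normal(size=(n_models, n_stats))) + 1e-3

features = np.log(margin_stats)      # log-transformed margin statistics
true_gap = features @ rng.normal(size=n_stats) * 0.01 + 0.05   # synthetic target
true_gap += rng.normal(scale=0.005, size=n_models)             # plus a little noise

reg = LinearRegression().fit(features, true_gap)
predicted_gap = reg.predict(features)
print("R^2 of the (synthetic) fit:", reg.score(features, true_gap))
```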
The Deep Model Generalization Dataset
In addition to our paper, we are introducing the Deep Model Generalization (DEMOGEN) dataset, which consists of 756 trained deep models, along with their training and test performance on the CIFAR-10 and CIFAR-100 datasets. The models are variants of CNNs (with architectures that resemble Network-in-Network) and ResNet-32 with different popular regularization techniques and hyperparameter settings, inducing a wide spectrum of generalization behaviors. For example, the models of CNNs trained on CIFAR-10 have test accuracies ranging from 60% to 90.5% with generalization gaps ranging from 1% to 35%. For details of the dataset, please see our paper or the Github repository. As part of the dataset release, we also include utilities to easily load the models and reproduce the results presented in our paper.
We hope that this research and the DEMOGEN dataset will provide the community with an accessible tool for studying generalization in deep learning without having to retrain a large number of models. We also hope that our findings will motivate further research in generalization gap predictors and margin distributions in the hidden layers.