Blog
The latest from Google Research
The Google Brain Team — Looking Back on 2017 (Part 1 of 2)
jueves, 11 de enero de 2018
Posted by Jeff Dean, Google Senior Fellow, on behalf of the entire Google Brain Team
The
Google Brain team
works to advance the state of the art in artificial intelligence by research and systems engineering, as one part of the overall
Google AI
effort. Last year we
shared
a summary of our work in 2016. Since then, we’ve continued to make progress on our long-term research agenda of making machines intelligent, and have collaborated with a number of teams across
Google
and
Alphabet
to use the results of our research to improve people’s lives. This first of two posts will highlight some of our work in 2017, including some of our basic research work, as well as updates on open source software, datasets, and new hardware for
machine learning
. In the
second post
we’ll dive into the research we do in specific domains where machine learning can have a large impact, such as healthcare, robotics, and some areas of basic science, as well as cover our work on creativity, fairness and inclusion and tell you a bit more about who we are.
Core Research
A significant focus of our team is pursuing research that advances our understanding and improves our ability to solve new problems in the field of machine learning. Below are several themes from our research last year.
AutoML
The goal of automating machine learning is to develop techniques for computers to solve new machine learning problems automatically, without the need for human machine learning experts to intervene on every new problem. If we’re ever going to have truly intelligent systems, this is a fundamental capability that we will need. We developed
new approaches for designing neural network architectures
using both reinforcement learning and evolutionary algorithms, scaled this work to
state-of-the-art results on ImageNet classification and detection
, and also showed how to learn new
optimization algorithms
and
effective activation functions
automatically. We are actively working with our
Cloud AI
team to bring this technology into the hands of Google customers, as well as continuing to push the research in many directions.
Convolutional architecture discovered by Neural Architecture Search
Object detection with a network discovered by AutoML
Speech Understanding and Generation
Another theme is on developing new techniques that improve the ability of our computing systems to understand and generate human speech, including our collaboration with the speech team at Google to
develop a number of improvements for an end-to-end approach to speech recognition
, which reduces the relative word error rate over Google’s production speech recognition system by 16%. One nice aspect of this work is that it required many separate threads of research to come together (which you can find on Arxiv:
1
,
2
,
3
,
4
,
5
,
6
,
7
,
8
,
9
).
Components of the
Listen-Attend-Spell end-to-end model
for speech recognition
We also collaborated with our research colleagues on Google’s
Machine Perception
team to develop a new approach for performing text-to-speech generation (Tacotron 2) that dramatically improves the quality of the generated speech. This model achieves a mean opinion score (MOS) of 4.53 compared to a MOS of 4.58 for professionally recorded speech like you might find in an audiobook, and 4.34 for the previous best computer-generated speech system. You can
listen for yourself
.
Tacotron 2’s model architecture
New Machine Learning Algorithms and Approaches
We continued to develop novel machine learning algorithms and approaches, including work on
capsules
(which explicitly look for agreement in activated features as a way of evaluating many different noisy hypotheses when performing visual tasks),
sparsely-gated mixtures of experts
(which enable very large models that are still computational efficient),
hypernetworks
(which use the weights of one model to generate weights for another model),
new kinds of multi-modal models
(which perform multi-task learning across audio, visual, and textual inputs in the same model),
attention-based mechanisms
(as an alternative to convolutional and recurrent models),
symbolic
and
non-symbolic
learned optimization methods, a technique to
back-propagate through discrete variables
, and a few new
reinforcement
learning
algorithmic improvements.
Machine Learning for Computer Systems
The use of machine learning to replace traditional heuristics in computer systems also greatly interests us. We have shown how to use
reinforcement learning to make placement decisions for mapping computational graphs onto a set of computational devices
that are better than human experts. With other colleagues in Google Research, we have shown in “
The Case for Learned Index Structures
” that neural networks can be both faster and much smaller than traditional data structures such as B-trees, hash tables, and Bloom filters. We believe that we are just scratching the surface in terms of the use of machine learning in core computer systems, as outlined in a NIPS workshop talk on
Machine Learning for Systems and Systems for Machine Learning
.
Learned Models as Index Structures
Privacy and Security
Machine learning and its interactions with security and privacy continue to be major research foci for us. We showed that machine learning techniques can be applied in a way that provides differential privacy guarantees, in
a paper
that received one of the best paper awards at
ICLR 2017
. We also continued our investigation into the properties of
adversarial examples
, including
demonstrating adversarial examples in the physical world
, and
how to harness adversarial examples at scale during the training process
to make models more robust to adversarial examples.
Understanding Machine Learning Systems
While we have seen impressive results with
deep learning
, it is important to understand why it works, and when it won’t. In another
one of the best paper awards
of ICLR 2017, we showed that current machine learning theoretical frameworks fail to explain the impressive results of deep learning approaches. We also showed that the
“flatness” of minima found by optimization methods is not as closely linked to good generalization as initially thought
. In order to better understand how training proceeds in deep architectures, we published a series of papers analyzing
random
matrices
, as they are the starting point of most training approaches. Another important avenue to understand deep learning is to better measure their performance. We showed the importance of good experimental design and statistical rigor in a
recent study comparing many GAN approaches
that found many popular enhancements to generative models do not actually improve performance. We hope this study will give an example for other researchers to follow in making robust experimental studies.
We are developing
methods that allow better interpretability of machine learning systems
. And in March, in collaboration with
OpenAI
,
DeepMind
,
YC Research
and others, we announced the
launch of Distill
, a new online open science journal dedicated to supporting human understanding of machine learning. It has gained a reputation for clear exposition of machine learning concepts and for excellent interactive visualization tools in its articles. In its first year,
Distill
has published
many
illuminating
articles
aimed at understanding the inner working of various machine learning techniques, and we look forward to the many more sure to come in 2018.
Feature Visualization
How to Use t-SNE effectively
Open Datasets for Machine Learning Research
Open datasets like
MNIST
,
CIFAR-10
,
ImageNet
,
SVHN
, and
WMT
have pushed the field of machine learning forward tremendously. Our team and Google Research as a whole have been active in open-sourcing interesting new datasets for open machine learning research over the past year or so, by providing access to more large labeled datasets including:
YouTube-8M
: >7 million YouTube videos annotated with 4,716 different classes
YouTube-Bounding Boxes
: 5 million bounding boxes from 210,000 YouTube videos
Speech Commands Dataset
: thousands of speakers saying short command words
AudioSet
: 2 million 10-second YouTube clips labeled with 527 different sound events
Atomic Visual Actions (AVA)
: 210,000 action labels across 57,000 video clips
Open Images
: 9M creative-commons licensed images labeled with 6000 classes
Open Images with Bounding Boxes
: 1.2M bounding boxes for 600 classes
Examples from the
YouTube-Bounding Boxes dataset
: Video segments sampled at 1 frame per second, with bounding boxes successfully identified around the items of interest.
TensorFlow and Open Source Software
A map showing the broad distribution of TensorFlow users (
source
)
Throughout our team’s history, we have built tools that help us to conduct machine learning research and deploy machine learning systems in Google’s many products. In November 2015, we open-sourced our second-generation machine learning framework,
TensorFlow
, with the hope of allowing the machine learning community as a whole to benefit from our investment in machine learning software tools. In February, we released
TensorFlow 1.0
, and in November,
we released v1.4
with these significant additions:
Eager execution
for interactive imperative-style programming,
XLA
, an optimizing compiler for TensorFlow programs, and
TensorFlow Lite
, a lightweight solution for mobile and embedded devices. The
pre-compiled TensorFlow binaries
have now been downloaded more than 10 million times in over 180 countries, and the
source code on GitHub
now has more than 1,200 contributors.
In February, we hosted the first ever
TensorFlow Developer Summit
, with over 450 people attending live in Mountain View and more than 6,500 watching on live streams around the world, including at more than 85 local viewing events in 35 countries. All
talks were recorded
, with topics ranging from new features, techniques for using TensorFlow, or detailed looks under the hoods at low-level TensorFlow abstractions. We’ll be hosting another TensorFlow Developer Summit on March 30, 2018 in the Bay Area.
Sign up now
to save the date and stay updated on the latest news.
This
rock-paper-scissors science experiment
is a novel use of TensorFlow. We’ve been excited by the wide variety of uses of TensorFlow we saw in 2017, including
automating cucumber sorting
,
finding sea cows in aerial imagery
,
sorting diced potatoes to make safer baby food
,
identifying skin cancer
,
helping to interpret bird call recordings in a New Zealand bird sanctuary
, and
identifying diseased plants in the most popular root crop on Earth in Tanzania
!
In November, TensorFlow celebrated its second anniversary as an open-source project. It has been incredibly rewarding to see a vibrant community of TensorFlow developers and users emerge. TensorFlow is the #1 machine learning platform on GitHub and
one of the top five repositories
on GitHub overall, used by
many companies and organizations
, big and small, with
more than 24,500 distinct repositories on GitHub
related to TensorFlow. Many research papers are now published with open-source TensorFlow implementations to accompany the research results, enabling the community to more easily understand the exact methods used and to reproduce or extend the work.
TensorFlow has also benefited from other Google Research teams open-sourcing related work, including
TF-GAN
, a lightweight library for generative adversarial models in TensorFlow,
TensorFlow Lattice
, a set of estimators for working with lattice models, as well as the
TensorFlow Object Detection API
. The TensorFlow
model repository
continues to grow with an ever-widening set of models.
In addition to TensorFlow, we released
deeplearn.js
, an
open-source hardware-accelerated implementation of deep learning APIs right in the browser
(with no need to download or install anything). The deeplearn.js homepage has a number of great examples, including
Teachable Machine
, a computer vision model you train using your webcam, and
Performance RNN
, a real-time neural-network based piano composition and performance demonstration. We’ll be working in 2018 to make it possible to deploy TensorFlow models directly into the deeplearn.js environment.
TPUs
Cloud TPUs
deliver up to 180 teraflops of machine learning acceleration
About five years ago, we recognized that deep learning would dramatically change the kinds of hardware we would need. Deep learning computations are very computationally intensive, but they have two special properties: they are largely composed of dense linear algebra operations (matrix multiples, vector operations, etc.), and they are very tolerant of reduced precision. We realized that we could take advantage of these two properties to build specialized hardware that can run neural network computations very efficiently. We provided design input to Google’s Platforms team and they designed and produced our first generation Tensor Processing Unit (TPU): a single-chip ASIC designed to accelerate inference for deep learning models (inference is the use of an already-trained neural network, and is distinct from training). This first-generation TPU has been deployed in our data centers for three years, and it has been used to power deep learning models on every
Google Search
query, for
Google Translate
, for understanding images in
Google Photos
, for the
AlphaGo
matches against
Lee Sedol
and
Ke Jie
, and for many other research and product uses. In June, we published a
paper
at
ISCA 2017
, showing that this first-generation TPU was 15X - 30X faster than its contemporary GPU or CPU counterparts, with performance/Watt about 30X - 80X better.
Cloud TPU Pods
deliver up to 11.5 petaflops of machine learning acceleration
Experiments with ResNet-50 training on ImageNet
show near-perfect speed-up as the number of TPU devices used increases.
Inference is important, but accelerating the training process is an even more important problem - and also much harder. The faster researchers can try a new idea, the more breakthroughs we can make. Our
second-generation TPU
, announced at Google I/O in May, is a whole system (custom ASIC chips, board and interconnect) that is designed to accelerate both training and inference, and we showed a single device configuration as well as a multi-rack deep learning supercomputer configuration called a TPU Pod. We announced that these second generation devices will be offered on the
Google Cloud Platform
as
Cloud TPUs
. We also unveiled the
TensorFlow Research Cloud (TFRC)
, a program to provide top ML researchers who are committed to sharing their work with the world with access to a cluster of 1,000 Cloud TPUs for free. In December, we
presented work
showing that we can train a ResNet-50 ImageNet model to a high level of accuracy in 22 minutes on a TPU Pod as compared to days or longer on a typical workstation. We think lowering research turnaround times in this fashion will dramatically increase the productivity of machine learning teams here at Google and at all of the organizations that use Cloud TPUs. If you’re interested in Cloud TPUs, TPU Pods, or the TensorFlow Research Cloud, you can sign up to learn more at
g.co/tpusignup
. We’re excited to enable many more engineers and researchers to use TPUs in 2018!
Thanks for reading!
(In
part 2
we’ll discuss our research in the application of machine learning to domains like healthcare, robotics, different fields of science, and creativity, as well as cover our work on fairness and inclusion.)
Etiquetas
accessibility
ACL
ACM
Acoustic Modeling
Adaptive Data Analysis
ads
adsense
adwords
Africa
AI
AI for Social Good
Algorithms
Android
Android Wear
API
App Engine
App Inventor
April Fools
Art
Audio
Augmented Reality
Australia
Automatic Speech Recognition
AutoML
Awards
BigQuery
Cantonese
Chemistry
China
Chrome
Cloud Computing
Collaboration
Compression
Computational Imaging
Computational Photography
Computer Science
Computer Vision
conference
conferences
Conservation
correlate
Course Builder
crowd-sourcing
CVPR
Data Center
Data Discovery
data science
datasets
Deep Learning
DeepDream
DeepMind
distributed systems
Diversity
Earth Engine
economics
Education
Electronic Commerce and Algorithms
electronics
EMEA
EMNLP
Encryption
entities
Entity Salience
Environment
Europe
Exacycle
Expander
Faculty Institute
Faculty Summit
Flu Trends
Fusion Tables
gamification
Gboard
Gmail
Google Accelerated Science
Google Books
Google Brain
Google Cloud Platform
Google Docs
Google Drive
Google Genomics
Google Maps
Google Photos
Google Play Apps
Google Science Fair
Google Sheets
Google Translate
Google Trips
Google Voice Search
Google+
Government
grants
Graph
Graph Mining
Hardware
HCI
Health
High Dynamic Range Imaging
ICCV
ICLR
ICML
ICSE
Image Annotation
Image Classification
Image Processing
Inbox
India
Information Retrieval
internationalization
Internet of Things
Interspeech
IPython
Journalism
jsm
jsm2011
K-12
Kaggle
KDD
Keyboard Input
Klingon
Korean
Labs
Linear Optimization
localization
Low-Light Photography
Machine Hearing
Machine Intelligence
Machine Learning
Machine Perception
Machine Translation
Magenta
MapReduce
market algorithms
Market Research
materials science
Mixed Reality
ML
ML Fairness
MOOC
Moore's Law
Multimodal Learning
NAACL
Natural Language Processing
Natural Language Understanding
Network Management
Networks
Neural Networks
NeurIPS
Nexus
Ngram
NIPS
NLP
On-device Learning
open source
operating systems
Optical Character Recognition
optimization
osdi
osdi10
patents
Peer Review
ph.d. fellowship
PhD Fellowship
PhotoScan
Physics
PiLab
Pixel
Policy
Professional Development
Proposals
Public Data Explorer
publication
Publications
Quantum AI
Quantum Computing
Recommender Systems
Reinforcement Learning
renewable energy
Research
Research Awards
resource optimization
Responsible AI
Robotics
schema.org
Search
search ads
Security and Privacy
Self-Supervised Learning
Semantic Models
Semi-supervised Learning
SIGCOMM
SIGMOD
Site Reliability Engineering
Social Networks
Software
Sound Search
Speech
Speech Recognition
statistics
Structured Data
Style Transfer
Supervised Learning
Systems
TensorBoard
TensorFlow
TPU
Translate
trends
TTS
TV
UI
University Relations
UNIX
Unsupervised Learning
User Experience
video
Video Analysis
Virtual Reality
Vision Research
Visiting Faculty
Visualization
VLDB
Voice Search
Wiki
wikipedia
WWW
Year in Review
YouTube
Archive
2022
may
abr
mar
feb
ene
2021
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2020
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2019
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2018
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2017
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2016
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2015
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2014
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2013
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2012
dic
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2011
dic
nov
sep
ago
jul
jun
may
abr
mar
feb
ene
2010
dic
nov
oct
sep
ago
jul
jun
may
abr
mar
feb
ene
2009
dic
nov
ago
jul
jun
may
abr
mar
feb
ene
2008
dic
nov
oct
sep
jul
may
abr
mar
feb
2007
oct
sep
ago
jul
jun
feb
2006
dic
nov
sep
ago
jul
jun
abr
mar
feb
Feed
Follow @googleai
Give us feedback in our
Product Forums
.