
The latest research from Google

Retrieval-augmented visual-language pre-training

Large-scale models, such as T5, GPT-3, PaLM, Flamingo and PaLI, have demonstrated the ability to store substantial amounts of knowledge when scaled to tens of billions of parameters and trained on large text and image datasets. These models achieve state-of-the-art results on downstream tasks such as image captioning, visual question answering and open-vocabulary recognition. Despite these achievements, such models require massive volumes of training data and end up with a tremendous number of parameters (billions in many cases), resulting in significant computational requirements. Moreover, the knowledge these models encode in their parameters can become outdated, requiring re-training every time the world's knowledge is updated. For example, a model trained just two years ago might yield outdated information about the current president of the United States.
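To make the contrast with parametric memory concrete, here is a minimal sketch of the retrieval-augmented idea: instead of relying only on knowledge baked into model weights, the model looks up an external memory at inference time and conditions its output on what it retrieves. The `embed`, `memory`, `retrieve` and `answer` components below are hypothetical toy stand-ins for illustration, not the actual model or its training procedure.

```python
import numpy as np

EMBED_DIM = 16

def embed(text: str) -> np.ndarray:
    """Toy text encoder: hash words into a fixed-size vector (placeholder for a learned encoder)."""
    vec = np.zeros(EMBED_DIM)
    for token in text.lower().split():
        vec[hash(token) % EMBED_DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# External memory of (key, knowledge snippet) pairs. Because the knowledge lives
# here rather than in model weights, it can be updated without re-training.
memory = [
    ("current president of the United States", "Entry describing the current U.S. president."),
    ("Eiffel Tower location", "The Eiffel Tower is located in Paris, France."),
]
memory_keys = np.stack([embed(key) for key, _ in memory])

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the top-k snippets whose keys are most similar to the query embedding."""
    scores = memory_keys @ embed(query)
    best = np.argsort(scores)[::-1][:top_k]
    return [memory[i][1] for i in best]

def answer(query: str) -> str:
    """Placeholder generator: condition the answer on retrieved snippets instead of parametric memory."""
    context = " ".join(retrieve(query))
    return f"Q: {query}\nRetrieved context: {context}"

print(answer("Who is the current president of the United States?"))
```

Updating the model's knowledge in this setup means editing the `memory` list, not re-training billions of parameters; in practice the encoder and memory are learned and far larger, but the division of labor is the same.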

Large sequence models for software development activities

Foundation models for reasoning on charts

Barkour: Benchmarking animal-level agility with quadruped robots

Differentially private clustering for large-scale datasets

Google Research at I/O 2023

Resolving code review comments with ML

Making ML models differentially private: Best practices and open challenges

Sparse video tubes for joint video and image vision transformers

Responsible AI at Google Research: PAIR

Using reinforcement learning for dynamic planning in open-ended conversations