The latest research from Google

Scaling vision transformers to 22 billion parameters

Large Language Models (LLMs) like PaLM or GPT-3 showed that scaling transformers to hundreds of billions of parameters improves performance and unlocks emergent abilities. The biggest dense models for image understanding, however, have reached only 4 billion parameters, despite research indicating that promising multimodal models like PaLI continue to benefit from scaling vision models alongside their language counterparts. Motivated by this, and the results from scaling LLMs, we decided to undertake the next step in the journey of scaling the Vision Transformer.
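
For readers less familiar with the architecture being scaled, the sketch below gives a rough picture of a Vision Transformer forward pass: images are cut into patches, embedded as tokens, and run through standard pre-norm transformer encoder blocks. It is written in JAX purely for illustration; the function names, shapes, and hyperparameters are assumptions for exposition and do not reflect the ViT-22B implementation or its parallelism details.

```python
# Minimal, illustrative sketch of a Vision Transformer forward pass in JAX.
# Names, shapes, and hyperparameters are assumptions for exposition only;
# they do not reflect the actual ViT-22B implementation.
import jax
import jax.numpy as jnp

def patchify(images, patch_size):
    """Split (B, H, W, C) images into flattened, non-overlapping patches."""
    b, h, w, c = images.shape
    ph, pw = h // patch_size, w // patch_size
    x = images.reshape(b, ph, patch_size, pw, patch_size, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)            # (B, ph, pw, p, p, C)
    return x.reshape(b, ph * pw, patch_size * patch_size * c)

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / jnp.sqrt(x.var(-1, keepdims=True) + eps)

def mlp(x, w_in, w_out):
    return jax.nn.gelu(x @ w_in) @ w_out

def attention(x, w_qkv, w_out, num_heads):
    """Multi-head self-attention over a (B, N, D) token sequence."""
    b, n, d = x.shape
    q, k, v = jnp.split(x @ w_qkv, 3, axis=-1)   # each (B, N, D)
    split = lambda t: t.reshape(b, n, num_heads, d // num_heads).transpose(0, 2, 1, 3)
    q, k, v = split(q), split(k), split(v)
    scores = (q @ k.transpose(0, 1, 3, 2)) / jnp.sqrt(d // num_heads)
    out = jax.nn.softmax(scores, axis=-1) @ v    # (B, heads, N, d_head)
    return out.transpose(0, 2, 1, 3).reshape(b, n, d) @ w_out

def encoder_block(x, p, num_heads):
    """Pre-norm transformer block: residual attention followed by residual MLP."""
    x = x + attention(layer_norm(x), p["w_qkv"], p["w_attn_out"], num_heads)
    return x + mlp(layer_norm(x), p["w_mlp_in"], p["w_mlp_out"])

def vit_features(images, params, patch_size=16, num_heads=8):
    """Embed patches, add position embeddings, then apply the encoder blocks."""
    tokens = patchify(images, patch_size) @ params["w_embed"] + params["pos_embed"]
    for p in params["blocks"]:
        tokens = encoder_block(tokens, p, num_heads)
    return layer_norm(tokens).mean(axis=1)       # global-average-pooled image features
```

Scaling this architecture to 22 billion parameters is then largely a question of growing the width, depth, and MLP hidden size of these blocks while keeping training stable and efficient, which is the focus of the post.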

Data-centric ML benchmarking: Announcing DataPerf’s 2023 challenges

Leveraging transfer learning for large scale differentially private image classification

PRESTO – A multilingual dataset for parsing realistic task-oriented dialogues

Detecting novel systemic biomarkers in external eye photos

Visual language maps for robot navigation

Vid2Seq: a pretrained visual language model for describing multi-event videos

Responsible AI at Google Research: The Impact Lab

Learning from deep learning: a case study of feature discovery and validation in pathology

PaLM-E: An embodied multimodal language model

The BirdCLEF 2023 Challenge: Pushing the frontiers of biodiversity monitoring