Visual language maps for robot navigation

People are excellent navigators of the physical world, due in part to their remarkable ability to build cognitive maps that form the basis of spatial memory — from localizing landmarks at varying ontological levels (like a book on a shelf in the living room) to determining whether a layout permits navigation from point A to point B. Building robots that are proficient at navigation requires an interconnected understanding of (a) vision and natural language (to associate landmarks or follow instructions), and (b) spatial reasoning (to connect a map representing an environment to the true spatial distribution of objects). While there have been many recent advances in training joint visual-language models on Internet-scale data, figuring out how to best connect them to a spatial representation of the physical world that can be used by robots remains an open research question.
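To make the idea concrete, below is a minimal sketch, not the authors' implementation, of how pixel-aligned visual-language embeddings from a pretrained model (e.g., a CLIP- or LSeg-style encoder) could be fused into a top-down 2D grid map and then queried with a free-form text embedding. The function names, grid parameters, and the random arrays standing in for real model outputs are all illustrative assumptions.

```python
# Sketch: fuse pixel-aligned visual-language embeddings into a top-down
# 2D grid map, then localize a landmark from a text query by cosine
# similarity. Assumes embeddings come from some pretrained visual-language
# model and that pixels are already back-projected into world coordinates.

import numpy as np

def fuse_into_map(points_xy, pixel_embeds, grid_size=100, cell_m=0.1):
    """Average pixel embeddings into the grid cells they project onto.

    points_xy:    (N, 2) world x/y coordinates of back-projected pixels (meters)
    pixel_embeds: (N, D) visual-language embedding per pixel
    """
    D = pixel_embeds.shape[1]
    grid_sum = np.zeros((grid_size, grid_size, D))
    grid_cnt = np.zeros((grid_size, grid_size, 1))
    # Convert metric coordinates to grid indices (map origin at grid center).
    idx = np.floor(points_xy / cell_m).astype(int) + grid_size // 2
    valid = np.all((idx >= 0) & (idx < grid_size), axis=1)
    for (i, j), e in zip(idx[valid], pixel_embeds[valid]):
        grid_sum[i, j] += e
        grid_cnt[i, j] += 1
    return grid_sum / np.maximum(grid_cnt, 1)  # (grid, grid, D) fused map

def localize(text_embed, fused_map):
    """Return the grid cell whose fused embedding best matches the query."""
    norms = np.linalg.norm(fused_map, axis=-1) * np.linalg.norm(text_embed)
    sims = fused_map @ text_embed / np.maximum(norms, 1e-8)
    return np.unravel_index(np.argmax(sims), sims.shape)

# Toy usage with random stand-ins for model outputs.
rng = np.random.default_rng(0)
pts = rng.uniform(-4, 4, size=(500, 2))    # back-projected pixel positions (m)
emb = rng.normal(size=(500, 512))          # per-pixel visual-language features
query = rng.normal(size=512)               # text embedding, e.g., for "sofa"
vlmap = fuse_into_map(pts, emb)
print("best-matching cell:", localize(query, vlmap))
```

Because landmarks and text queries share one embedding space in such a map, the same grid can in principle be queried for open-vocabulary goals ("the book on the shelf") without retraining, which is what makes joint visual-language features attractive as a spatial representation for robot navigation.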
