Gemini Robotics-ER 1.5
Our state-of-the-art embodied reasoning model – it specializes in understanding physical spaces, planning, and making logical decisions within its surroundings
Our Gemini-based multimodal model gives advanced world understanding to robots.
Capabilities
Gemini Robotics-ER 1.5 can make detailed plans from simple commands. For example, if a human instructs it to ‘clean the kitchen’, the ER model breaks the task down into smaller, manageable steps – clear the counter, load the dishwasher, wipe the surfaces. The model also supports thinking.
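The decomposition described above can be pictured with a small sketch. The Python snippet below is illustrative only: it assumes the model returns its plan as a numbered list in plain text, which this page does not specify, and the names `PlanStep` and `parse_plan` are hypothetical helpers rather than part of any Gemini API.

```python
import re
from dataclasses import dataclass


@dataclass
class PlanStep:
    index: int        # 1-based position of the step in the plan
    description: str  # natural-language description of the sub-task


def parse_plan(text: str) -> list[PlanStep]:
    """Parse a numbered-list plan (an assumed format) into ordered steps."""
    steps = []
    for line in text.splitlines():
        # Match lines such as "1. Clear the counter" or "2) Load the dishwasher".
        match = re.match(r"\s*(\d+)[.)]\s+(.+?)\s*$", line)
        if match:
            steps.append(PlanStep(int(match.group(1)), match.group(2)))
    # Sort by the stated number so execution order follows the plan.
    return sorted(steps, key=lambda s: s.index)


# Hypothetical model output for the command "clean the kitchen".
example_response = """\
1. Clear the counter
2. Load the dishwasher
3. Wipe the surfaces
"""

plan = parse_plan(example_response)
for step in plan:
    print(f"Step {step.index}: {step.description}")
```

Keeping each step as a separate record is one possible design; it would let a downstream controller execute and track the sub-tasks one at a time.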
- Orchestration
  Orchestrates robot activities, like a high-level brain. Excels at planning and making logical decisions within a physical environment. Interacts in natural language, estimates progress, and can natively call tools – like using Google Search to look for information.
- Advanced spatial understanding
  Perceives and understands the surrounding environment to locate and handle objects with greater accuracy.
- Temporal reasoning
  Understands the cause-and-effect relationships between objects and actions as they unfold over time.
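To make the spatial understanding capability more concrete, the sketch below converts model-reported object locations into pixel coordinates. It assumes, without confirmation from this page, that locations arrive as JSON entries of the form `{"point": [y, x], "label": ...}` with coordinates normalized to a 0-1000 range. The helper `to_pixels` and the sample response are hypothetical.

```python
import json


def to_pixels(response_text: str, width: int, height: int) -> dict[str, tuple[int, int]]:
    """Map each labeled point to (x, y) pixel coordinates.

    Assumes points are given as [y, x] normalized to 0-1000 (an assumption).
    """
    pixels = {}
    for item in json.loads(response_text):
        y_norm, x_norm = item["point"]
        # Scale the normalized values to the image size in pixels.
        x = round(x_norm / 1000 * width)
        y = round(y_norm / 1000 * height)
        pixels[item["label"]] = (x, y)
    return pixels


# Hypothetical model output for a 640x480 camera frame.
sample_response = (
    '[{"point": [500, 250], "label": "mug"},'
    ' {"point": [100, 900], "label": "sponge"}]'
)
locations = to_pixels(sample_response, width=640, height=480)
print(locations)
```

If the actual output format differs, only the parsing inside `to_pixels` would need to change; the scaling step stays the same.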
Benchmarks
Aggregated performance on 15 academic embodied reasoning benchmarks: Point-Bench, RefSpatial, RoboSpatial-Pointing, Where2Place, BLINK, CV-Bench, ERQA, EmbSpatial, MindCube, RoboSpatial-VQA, SAT, Cosmos-Reason1, Min Video Pairs, OpenEQA and VSI-Bench.
| Supported data types for input | Image, Video, Text |
| Supported data types for output | Text |
| Supported number of input tokens | 1M |
| Knowledge cutoff | January 2025 |
| Availability | Public preview |