At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunities regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.
Snapshot
We are seeking a highly motivated and innovative Research Engineer to join our team in Tokyo, focused on building state-of-the-art multimodal embodied agents. You will work with researchers and engineers to develop general-purpose agents capable of perceiving, reasoning, planning, and executing precise real-time actions in complex, open-ended environments.
In this role, you will utilize the latest advancements in multimodal large language models (LLMs), vision-language-action (VLA) models, in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) to tackle fundamental challenges in embodied intelligence. You will leverage these architectures to bridge the gap between high-level long-horizon planning and low-level high-frequency motor control, creating agents that can adaptively master tasks in rich virtual testbeds (including high-fidelity 3D simulations and sandbox games).
This role offers a unique opportunity to stand at the forefront of the quest for AI. You will join a world-class team tackling the "hard problems" of embodiment—autonomously solving long-horizon tasks, learning from vast multimodal memories, and generalizing to completely unseen worlds. If you are passionate about pushing the frontiers of what AI agents can achieve and are eager to define the next era of adaptive intelligence, we encourage you to apply.
About us
Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.
The role
As a Research Engineer at Google DeepMind, you will contribute to the development of Gemini-powered embodied agents capable of autonomous progression and complex problem-solving.
We believe that rich virtual environments provide the ideal pressures to develop robust skills in reasoning, memory, and motor control. You will use these domains to research how agents can learn from demonstrations and experiences, and adapt their strategies in real time.
Key responsibilities
- Agent Architecture & Control: Develop and optimize state-of-the-art agent architectures that seamlessly integrate multimodal perception, reasoning, and precise real-time execution.
- Scalable Learning: Build and scale training recipes utilizing supervised fine-tuning, reinforcement learning, imitation learning, and/or in-context learning.
- Memory & Planning: Design advanced systems that enable agents to reason over long horizons and effectively utilize memory to solve complex, extended tasks.
- Adaptation: Research and implement capabilities that allow agents to adapt to new environments and learn from experience at test time.
- Evaluation: Establish rigorous benchmarks within virtual environments to measure progress in general agent capabilities and embodied intelligence in unseen environments.
About you
You are a passionate and talented Research Engineer with a strong foundation in Deep Learning and a proven ability to conduct impactful research. You are excited by the challenge of building agents that demonstrate general intelligence through embodied interaction.
Minimum qualifications:
- Bachelor's/Master's/Ph.D. in Computer Science, Artificial Intelligence, or a related field.
- Experience with relevant ML frameworks such as JAX, TensorFlow, or PyTorch.
- Strong programming skills in Python and experience with large-scale data pipelines.
- Solid understanding of LLM internals, e.g., typical training pipelines, computational characteristics of training/inference, mechanisms for multimodal extension.
- Knowledge of Deep Reinforcement Learning (RL), LLM Reasoning, Imitation Learning, Memory-Based Architectures, Vision-Language Models (VLMs), and/or Vision-Language-Action (VLA) models.
- Proven track record of designing, implementing, and maintaining robust technical assets (such as libraries, frameworks, or models) used by a large number of technical stakeholders; experience with OSS contributions is a plus.
- Excellent communication and collaboration skills.
In addition, the following would be an advantage:
- A minimum of 5 years of relevant professional experience.
- Experience building agents for 3D virtual environments, simulators, or video games.
- Strong track record in competitions in machine learning, data science, or AI in games.
- Strong track record in AI competitions or publications in top-tier conferences (NeurIPS, ICLR, ICML, CVPR, etc.).