In robot learning, the standard practice is to train policies on datasets tailored to the specific robot and task at hand. Starting from scratch this way demands substantial data collection for every new task, and the resulting policies typically generalize poorly. In principle, data collected from other robots and tasks offers a way out: training models on a wide range of control problems could improve their ability to generalize and perform well on downstream tasks. Yet in contrast to the ubiquity of general-purpose models in computer vision and natural language processing, building a “general-purpose robot model” capable of controlling many different robots has proven a formidable challenge. Training a unified control policy in robotics raises issues unique to the field: handling different robot embodiments, sensor configurations, action spaces, task specifications, environments, and compute budgets.
Several recent works have proposed robotic foundation models that do exactly this: map robot observations directly to actions and generalize to new domains and robots with zero or few additional demonstrations. Because of their versatility in low-level visuomotor control across tasks, environments, and robotic systems, these models are commonly called “generalist robot policies” (GRPs). While they mark real progress toward a “general-purpose robot model,” existing GRPs still fall short in several ways: they do not support efficient fine-tuning to new domains, the largest ones are not publicly available, and they restrict downstream users to a predefined and often limited set of input observations, such as a single camera stream.
To better accommodate the variety of user interfaces found in downstream robotic applications, researchers from UC Berkeley, Stanford, Carnegie Mellon University, and Google DeepMind present a method for pretraining generalist robot policies.
Octo is a transformer-based policy pretrained on 800k robot demonstrations from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. Octo is the first generalist robot manipulation policy to be fully open-source, including the data, model checkpoints, and training pipeline. It is also the first GRP that can be effectively fine-tuned to new observation and action spaces.
At its core, the model is a transformer architecture that maps an arbitrary number of input tokens, produced from observations and task specifications, into actions, and it is trained on a diverse dataset of robots and tasks. The same policy can be trained once and reused across multiple robots, different camera setups (e.g., wrist or workspace cameras), and different input modalities (e.g., language commands, goal images) simply by changing which tokens are fed into the model. The model can also be adapted to new robot configurations, sensory inputs, action spaces, or morphologies by adding the appropriate adapters and fine-tuning on a small dataset from the target domain with a modest compute budget.
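To make the tokenization idea concrete, here is a minimal, illustrative sketch in PyTorch of how such a policy can be organized. This is not Octo’s code (the released implementation is JAX-based); all module names, sizes, and the toy vocabulary are assumptions:

```python
# Illustrative sketch of a tokenized generalist policy (not Octo's code):
# each input modality is mapped to tokens, the tokens are concatenated, and
# a single transformer decodes a readout token into an action. Swapping
# cameras or task modalities only changes which tokens are produced.
import torch
import torch.nn as nn

class TokenizedPolicy(nn.Module):
    def __init__(self, d_model=256, d_action=7, vocab=1000, patch_dim=3 * 16 * 16):
        super().__init__()
        self.image_tokenizer = nn.Linear(patch_dim, d_model)     # flattened image patches
        self.text_tokenizer = nn.Embedding(vocab, d_model)       # toy command vocabulary
        self.readout = nn.Parameter(torch.zeros(1, 1, d_model))  # learned action-query token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d_model, d_action)

    def forward(self, image_patches=None, text_ids=None):
        batch = (image_patches if image_patches is not None else text_ids).shape[0]
        tokens = []
        if image_patches is not None:   # e.g. wrist or workspace camera
            tokens.append(self.image_tokenizer(image_patches))
        if text_ids is not None:        # e.g. a language command
            tokens.append(self.text_tokenizer(text_ids))
        tokens.append(self.readout.expand(batch, -1, -1))
        out = self.transformer(torch.cat(tokens, dim=1))
        return self.action_head(out[:, -1])  # decode the readout token into an action

policy = TokenizedPolicy()
img = torch.randn(1, 64, 3 * 16 * 16)   # 64 flattened 16x16 RGB patches
cmd = torch.randint(0, 1000, (1, 8))    # 8 command token ids
print(policy(image_patches=img, text_ids=cmd).shape)  # torch.Size([1, 7])
```

Dropping the `text_ids` argument and passing a goal-image token stream instead would leave the transformer untouched, which is the property that lets one pretrained backbone serve many interfaces.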
Previous research has explored Octo’s individual components, such as a transformer backbone, support for goal-image task specification, and a diffusion head for modeling expressive action distributions, but their combination in a single generalist robot policy is new. In extensive experiments on nine robots across four institutions, the researchers demonstrate that the integrated system achieves state-of-the-art results in out-of-the-box multi-robot control for single- and dual-arm manipulation tasks. They also show that Octo serves as an effective initialization for fine-tuning to unseen setups with new observation and action spaces. Throughout these experiments, they analyze how several design choices, including data distribution, model architecture, and policy formulation, affect the quality of the pretrained GRP, underscoring the importance of scale and flexibility for achieving strong performance.
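For intuition on the diffusion head, the sketch below shows the standard denoising-diffusion training objective applied to actions: corrupt a ground-truth action with noise at a random step and train a small network to predict that noise, conditioned on the policy’s embedding. This is a generic illustration, not Octo’s implementation; the architecture and noise schedule are assumptions:

```python
# Generic sketch of a diffusion action head's training objective (not Octo's
# code): the head predicts the noise added to an action, conditioned on the
# transformer's readout embedding, which lets it capture multimodal action
# distributions. At inference, actions are sampled by iterative denoising.
import torch
import torch.nn as nn

class DiffusionActionHead(nn.Module):
    def __init__(self, d_embed=256, d_action=7, n_steps=20):
        super().__init__()
        self.n_steps = n_steps
        # MLP maps [noisy action, normalized step, embedding] -> predicted noise.
        self.net = nn.Sequential(
            nn.Linear(d_action + 1 + d_embed, 256), nn.ReLU(),
            nn.Linear(256, d_action),
        )
        # Illustrative per-step noise schedule; real heads use tuned schedules.
        self.register_buffer("alphas", torch.linspace(0.999, 0.95, n_steps))

def diffusion_loss(head, embed, action):
    """Standard DDPM-style loss: predict the noise used to corrupt `action`."""
    b = action.shape[0]
    t = torch.randint(0, head.n_steps, (b,))
    abar = torch.cumprod(head.alphas, dim=0)[t].unsqueeze(-1)  # cumulative alpha
    noise = torch.randn_like(action)
    noisy = abar.sqrt() * action + (1 - abar).sqrt() * noise   # corrupted action
    step = (t.float() / head.n_steps).unsqueeze(-1)
    pred = head.net(torch.cat([noisy, step, embed], dim=-1))
    return ((pred - noise) ** 2).mean()

head = DiffusionActionHead()
loss = diffusion_loss(head, torch.randn(4, 256), torch.randn(4, 7))
print(loss.item())
```

Compared with regressing a single mean action, this objective lets the head represent several equally valid ways to perform a task, which matters when training on demonstrations from many robots and operators.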
Alongside the paper, the team is releasing all the resources needed to train, use, reproduce, and fine-tune an Octo model. The pretrained Octo checkpoints, with 27M and 93M parameters, support language and goal-image task specification as well as multiple RGB camera inputs out of the box. The release also includes the full pretraining pipeline, with optimized data loaders, transformer implementations for multimodal inputs, and tools to monitor training progress, plus scripts for fine-tuning these models on new domains.
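As a pointer to how this looks in practice, the snippet below follows the usage pattern shown in the project’s README at the time of writing. The exact names (`OctoModel.load_pretrained`, `create_tasks`, `sample_actions`) and the observation keys and shapes should be treated as assumptions and verified against the repository:

```python
# Hedged usage sketch following the Octo README; verify names against the
# repository, as the API and observation keys may differ or change.
import jax
import numpy as np
from octo.model.octo_model import OctoModel

# Load the 93M-parameter pretrained checkpoint from the Hugging Face Hub.
model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")

# Tasks can be specified by language or by goal image.
task = model.create_tasks(texts=["pick up the spoon"])

# Dummy observation: a short history of camera frames plus a mask marking
# valid timesteps (key names and shapes are assumptions).
observation = {
    "image_primary": np.zeros((1, 2, 256, 256, 3), dtype=np.uint8),
    "pad_mask": np.ones((1, 2), dtype=bool),
}
actions = model.sample_actions(observation, task, rng=jax.random.PRNGKey(0))
```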
While the team acknowledges that the model still has room for improvement, such as better language conditioning, improved wrist-camera support, and the incorporation of data beyond optimal demonstrations, Octo represents a significant step toward generalist robot policies that work across a wide variety of robot setups. The team intends Octo to serve as a practical platform through which researchers and practitioners can leverage ever-larger robotics datasets, enabling pretrained models to support rapid task learning and generalization and thereby advancing robotics and machine learning.
Check out the Paper and Project. All credit for this research goes to the researchers of this project.