Visual and action data are interconnected in robotic tasks, forming a perception-action loop. Robots rely on control parameters for movement, while visual foundation models (VFMs) excel at processing visual data. However, a modality gap separates visual and action data, arising from fundamental differences in sensory modality, abstraction level, temporal dynamics, contextual dependence, and susceptibility to noise. These differences make it difficult to relate visual perception directly to action control, requiring intermediate representations or learning algorithms to bridge the gap. Robots are typically represented by geometric primitives such as triangle meshes, with kinematic structures describing their morphology. While VFMs can provide generalizable control signals, passing these signals on to robots has remained challenging.
Researchers from Columbia University and Stanford University proposed "Dr. Robot," a differentiable robot rendering method that integrates Gaussian splatting, implicit linear blend skinning (LBS), and pose-conditioned appearance deformation to enable differentiable robot control. The key innovation is the ability to compute gradients from robot images and propagate them back to the action control parameters, making the method compatible with a wide range of robot forms and degrees of freedom. This allows robots to learn actions from VFMs, closing the gap between visual inputs and control actions, which was previously hard to achieve.
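To make the idea concrete, here is a minimal sketch of the gradient flow Dr. Robot enables: because rendering is differentiable, an image-space objective (for example, a score from a visual foundation model) can be backpropagated all the way to the joint angles. The renderer and scoring function below are simplified stand-ins for illustration, not the paper's actual implementation.

```python
# Sketch: optimizing joint angles through a differentiable pose -> image renderer.
# DifferentiableRobotRenderer and vfm_score are hypothetical placeholders.
import torch


class DifferentiableRobotRenderer(torch.nn.Module):
    """Placeholder for a pose-to-image renderer (Gaussian splatting in the paper)."""

    def __init__(self, num_joints: int, image_size: int = 64):
        super().__init__()
        # A tiny learned map standing in for forward kinematics + splatting.
        self.net = torch.nn.Linear(num_joints, image_size * image_size * 3)
        self.image_size = image_size

    def forward(self, joint_angles: torch.Tensor) -> torch.Tensor:
        img = torch.sigmoid(self.net(joint_angles))
        return img.view(3, self.image_size, self.image_size)


def vfm_score(image: torch.Tensor) -> torch.Tensor:
    """Stand-in for a visual foundation model objective (e.g., image-text similarity)."""
    return image.mean()  # any differentiable image-space score works here


num_joints = 7
renderer = DifferentiableRobotRenderer(num_joints)

# Joint angles are the optimization variable: gradients flow from pixels to control.
joint_angles = torch.zeros(num_joints, requires_grad=True)
optimizer = torch.optim.Adam([joint_angles], lr=0.05)

for step in range(100):
    optimizer.zero_grad()
    image = renderer(joint_angles)   # differentiable rendering of the robot
    loss = -vfm_score(image)         # maximize the visual objective
    loss.backward()                  # d(loss)/d(joint_angles) through the renderer
    optimizer.step()
```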
The core components of Dr. Robot are Gaussian splatting, which models the robot's appearance and geometry in a canonical pose, and implicit LBS, which adapts this model to different robot poses. The robot's appearance is represented by a set of 3D Gaussians that are transformed and deformed according to the robot's pose. A differentiable forward kinematics model tracks these changes, while a deformation function adapts the robot's appearance to the current pose. The method produces high-quality gradients for learning robotic control from visual data, as demonstrated on robot pose reconstruction and on planning robot actions through VFMs. In evaluation experiments, Dr. Robot reconstructs robot poses from videos more accurately than prior work, outperforming existing methods by over 30% in joint-angle estimation. The framework is also demonstrated in applications such as robot action planning from language prompts and motion retargeting.
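The deformation step can be illustrated with a simple linear blend skinning sketch: each canonical Gaussian center is moved by a weighted blend of per-joint rigid transforms obtained from forward kinematics. The skinning weights here are random placeholders, whereas the paper learns them implicitly and also deforms Gaussian covariance and appearance, so this is only a simplified illustration of the mechanism.

```python
# Sketch: linear blend skinning applied to Gaussian centers (simplified).
import torch


def lbs_deform(points: torch.Tensor,            # (N, 3) canonical Gaussian centers
               joint_transforms: torch.Tensor,  # (J, 4, 4) from forward kinematics
               skin_weights: torch.Tensor       # (N, J), rows sum to 1
               ) -> torch.Tensor:
    """Blend per-joint rigid transforms and apply them to the canonical points."""
    # Weighted sum of 4x4 joint transforms per point: (N, 4, 4)
    blended = torch.einsum("nj,jab->nab", skin_weights, joint_transforms)
    # Homogeneous coordinates so translation is applied as well.
    ones = torch.ones(points.shape[0], 1)
    homo = torch.cat([points, ones], dim=1)               # (N, 4)
    deformed = torch.einsum("nab,nb->na", blended, homo)  # (N, 4)
    return deformed[:, :3]


# Example: 1000 Gaussian centers, a 7-joint chain, softmax weights standing in
# for the learned implicit skinning function.
N, J = 1000, 7
points = torch.randn(N, 3)
joint_transforms = torch.eye(4).repeat(J, 1, 1)  # identity pose for this demo
skin_weights = torch.softmax(torch.randn(N, J), dim=1)

deformed_points = lbs_deform(points, joint_transforms, skin_weights)
print(deformed_points.shape)  # torch.Size([1000, 3])
```

Because every operation above is differentiable, gradients from a rendering loss can flow through the deformed points back to the joint transforms, and hence to the joint angles that produced them.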
In conclusion, the research presents a robust way to control robots with visual foundation models by developing a fully differentiable robot representation. Dr. Robot serves as a bridge between the visual world and the robot's action space, allowing planning and control directly from pixels. By combining forward kinematics, Gaussian splatting, and implicit LBS into an efficient and flexible method, the paper lays a foundation for vision-based learning in robotic control tasks.
Check out the Paper and Project. All credit for this research goes to the researchers of this project.