The success of many reinforcement learning (RL) techniques relies on dense reward functions, but designing these functions is difficult, typically demanding domain expertise and extensive trial and error. Sparse rewards, such as binary task-completion signals, are easier to obtain but pose challenges for RL algorithms, most notably exploration. This raises the question: can dense reward functions be learned in a data-driven manner to address these challenges?
Existing research on reward learning often overlooks whether the learned rewards can be reused for new tasks. Among methods that learn reward functions from demonstrations, known as inverse RL, adversarial imitation learning (AIL) has gained traction. Inspired by GANs, AIL employs a policy network to generate trajectories and a discriminator to distinguish them from demonstrations. However, AIL's rewards are not reusable across tasks, limiting its ability to generalize to new tasks.
Researchers from UC San Diego present Dense reward learning from Stages (DrS), an approach that learns reusable rewards by using the task's sparse reward as the supervision signal, instead of classifying demonstration versus agent trajectories as AIL does. A discriminator is trained to classify success and failure trajectories according to the binary sparse reward: transitions in success trajectories receive higher rewards, and transitions in failure trajectories receive lower ones, keeping the signal consistent throughout training. Once training completes, the learned reward can be reused on new tasks. Expert demonstrations can be included as success trajectories, but they are not mandatory; only the sparse reward is needed, and that is often inherent in the task definition.
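To make the core idea concrete, here is a minimal sketch in PyTorch of training such a discriminator on transitions labeled by the sparse reward, then converting it into a bounded dense reward. The network architecture, the tanh squashing, and the function names are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores a state: high for states from success trajectories, low otherwise."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)  # unnormalized logit

def discriminator_loss(disc, success_obs, failure_obs):
    # Binary cross-entropy on the sparse-reward labels: push transitions from
    # success trajectories toward 1 and transitions from failures toward 0.
    logits_pos = disc(success_obs)
    logits_neg = disc(failure_obs)
    bce = nn.functional.binary_cross_entropy_with_logits
    return (bce(logits_pos, torch.ones_like(logits_pos)) +
            bce(logits_neg, torch.zeros_like(logits_neg)))

def dense_reward(disc, obs):
    # Squash the logit so the learned reward is bounded; once training ends,
    # the discriminator is frozen and this function is all a new agent needs.
    with torch.no_grad():
        return torch.tanh(disc(obs))
```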
DrS consists of two phases: Reward Learning and Reward Reuse. In the Reward Learning phase, a classifier is trained to differentiate successful from unsuccessful trajectories using the sparse reward; this classifier then serves as a dense reward generator. In the Reward Reuse phase, the learned dense reward is applied to train new RL agents on test tasks. For multi-stage tasks, a stage-specific discriminator is trained for each stage, yielding dense rewards that guide the agent through the task's progression, as sketched below.
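One plausible way the per-stage discriminators could compose into a single dense reward is sketched here; the stage indicator, the list of frozen discriminators, and the "+2k" offset scheme are assumptions, chosen so that reaching a later stage always pays more than any state in an earlier stage.

```python
import torch

def multi_stage_reward(discriminators, obs, stage: int):
    """Compose frozen per-stage discriminators into one dense reward.

    discriminators: list of trained networks, one per stage.
    stage: index of the stage the current state belongs to (assumed to be
           provided by the environment's stage indicator).
    """
    with torch.no_grad():
        # tanh keeps each stage's term in (-1, 1); the 2*stage offset makes
        # the reward ranges of consecutive stages disjoint, so the dense
        # signal always encourages progressing to the next stage.
        return 2.0 * stage + torch.tanh(discriminators[stage](obs))
```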
The proposed model was evaluated on three challenging physical manipulation task families: Pick-and-Place, Turn Faucet, and Open Cabinet Door, each containing a variety of objects. The evaluation focused on the reusability of the learned rewards, using non-overlapping training and test object sets for each task family: during the Reward Learning phase, rewards were learned while training agents to manipulate the training objects, and those rewards were then reused to train agents on the test objects in the Reward Reuse phase. Soft Actor-Critic (SAC) was the RL algorithm throughout. The learned rewards outperformed baseline rewards across all task families, at times rivaling human-engineered rewards; semi-sparse rewards achieved only limited success, and other reward-learning methods largely failed.
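A hedged sketch of how the Reward Reuse phase might look in practice: the frozen learned reward replaces the environment's sparse reward via a standard wrapper, and any off-the-shelf algorithm (the paper uses SAC) trains against it. The Gymnasium Wrapper API below is standard, but the reward_fn signature is a placeholder matching the sketches above, not the authors' code.

```python
import gymnasium as gym
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Substitute a frozen learned dense reward for the env's sparse reward."""
    def __init__(self, env, reward_fn):
        super().__init__(env)
        self.reward_fn = reward_fn  # frozen; no gradient updates during reuse

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Replace the sparse reward with the learned dense one.
        dense_r = float(self.reward_fn(torch.as_tensor(obs, dtype=torch.float32)))
        return obs, dense_r, terminated, truncated, info
```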
In conclusion, this research presents DrS, a data-driven approach for learning dense reward functions from sparse rewards. Evaluated on robotic manipulation tasks, DrS transfers effectively across tasks with varying object geometries. This simplification of the reward design process holds promise for scaling up RL applications in diverse scenarios. Two main limitations remain in the multi-stage version of the approach. First, how to acquire knowledge of the task structure is unexplored; large language models or information-theoretic approaches could help. Second, relying on stage indicators may pose challenges when training RL agents directly in real-world settings, although tactile sensors or visual detection/tracking methods can supply stage information when necessary.