
    A Simple Open-loop Model-Free Baseline for Reinforcement Learning Locomotion Tasks without Using Complex Models or Computational Resources

    July 4, 2024

The field of deep reinforcement learning (DRL) is expanding the capabilities of robotic control, but there has been a growing trend toward algorithmic complexity. As a result, the latest algorithms depend on many implementation details to perform well, causing reproducibility issues. Moreover, even state-of-the-art DRL methods struggle on simple problems, such as the Mountain Car environment or the Swimmer task. Several works have therefore pushed in the opposite direction, seeking simpler baselines and scalable alternatives for RL tasks, and these efforts underscore the need for simplicity in the field. Complex RL algorithms also often require detailed task design in the form of slow reward engineering.

To address these issues, this paper builds on two lines of related work: the quest for simpler RL baselines and periodic policies for locomotion. The first line proposes simpler parametrizations, such as linear functions or radial basis functions (RBFs), highlighting the fragility of RL (a sketch of such a parametrization follows below). The second integrates rhythmic movements into robotic control; recent work has used oscillators to manage locomotion tasks in quadruped robots. However, no prior study has examined open-loop oscillators on RL locomotion benchmarks.
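For context, a "simple parametrization" in the first line of work can be as small as a single linear map from observation to action. The sketch below is illustrative, not the exact parametrization used in the cited works; W and b would typically be found by a gradient-free search such as an evolution strategy:

```python
import numpy as np

def linear_policy(observation, W, b):
    """A linear policy: action = W @ observation + b.

    W and b are the only parameters. With RBF features, the observation
    would first pass through fixed radial basis functions. Illustrative
    sketch only.
    """
    return W @ observation + b
```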

Researchers from the German Aerospace Center (DLR) RMC in Germany, Sorbonne Université CNRS in France, and TU Delft CoR in the Netherlands have proposed a simple, open-loop, model-free baseline that performs well on standard locomotion tasks without complex models or heavy computational resources. Although it does not beat RL algorithms in simulation, it offers multiple benefits for real-world applications: fast computation, easy deployment on embedded systems, smooth control outputs, and robustness to sensor noise. The method is designed for locomotion tasks, and its simplicity comes at the cost of versatility.
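As a rough illustration of the idea (a hedged sketch, not the authors' exact formulation), an open-loop oscillator policy can be as simple as one sine generator per joint, with the amplitudes, shared frequency, phase offsets, and biases as the only tunable parameters:

```python
import numpy as np

class OpenLoopOscillatorPolicy:
    """Open-loop policy: each joint tracks its own sine wave.

    Hypothetical sketch; the parameter names (amplitudes, omega, phases,
    biases) are illustrative, not the paper's exact formulation.
    """

    def __init__(self, amplitudes, omega, phases, biases):
        self.a = np.asarray(amplitudes)   # per-joint amplitude
        self.omega = omega                # shared angular frequency (rad/s)
        self.phi = np.asarray(phases)     # per-joint phase offset
        self.b = np.asarray(biases)       # per-joint bias (rest position)

    def desired_positions(self, t):
        # No observation is used: the output depends on time alone,
        # which is what makes the controller "open-loop".
        return self.a * np.sin(self.omega * t + self.phi) + self.b
```

Because the output never depends on the robot's state, the policy is trivially cheap to compute and immune to sensor noise, which is exactly the trade-off the paper examines.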

For the RL baselines, the JAX implementations from Stable-Baselines3 and the RL Zoo training framework are used, and the oscillator parameters are optimized over a defined search space. The effectiveness of the proposed method is tested on the MuJoCo v4 locomotion tasks included in the Gymnasium v0.29.1 library. The approach is compared against three established deep RL algorithms: (a) Proximal Policy Optimization (PPO), (b) Deep Deterministic Policy Gradient (DDPG), and (c) Soft Actor-Critic (SAC). Hyperparameter settings are taken from the original papers to ensure a fair comparison, except for the Swimmer task, where the discount factor (γ = 0.9999) is fine-tuned.
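A minimal evaluation loop on one of these Gymnasium tasks might look like the sketch below, reusing the OpenLoopOscillatorPolicy sketch above. Note the simplifications: the oscillator output is clipped and fed directly as the action, whereas the paper passes positions through a PD controller, and naive random search stands in for the more capable black-box optimizer a real study would use:

```python
import gymnasium as gym
import numpy as np

def episode_return(env, policy, dt):
    """Roll out the time-indexed open-loop policy for one episode."""
    env.reset(seed=0)
    total, t, done = 0.0, 0.0, False
    while not done:
        # Simplification: treat desired positions directly as the action.
        action = np.clip(policy.desired_positions(t),
                         env.action_space.low, env.action_space.high)
        _, reward, terminated, truncated, _ = env.step(action)
        total += reward
        t += dt
        done = terminated or truncated
    return total

env = gym.make("Swimmer-v4")
n_joints = env.action_space.shape[0]
dt = env.unwrapped.dt  # control timestep of the MuJoCo env
rng = np.random.default_rng(0)
best, best_score = None, -np.inf
for _ in range(200):  # naive random search over the parameter space
    policy = OpenLoopOscillatorPolicy(
        amplitudes=rng.uniform(0.0, 1.0, n_joints),
        omega=rng.uniform(0.5, 10.0),
        phases=rng.uniform(0.0, 2 * np.pi, n_joints),
        biases=rng.uniform(-0.5, 0.5, n_joints),
    )
    score = episode_return(env, policy, dt)
    if score > best_score:
        best, best_score = policy, score
```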

The proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality. DRL algorithms are compared to the baseline through experiments on locomotion tasks, both in simulation and in transfer to a real elastic quadruped. The paper aims to answer three key questions:

    How do open-loop oscillators fare against DRL methods in terms of performance, runtime, and parameter efficiency? 

    How resilient are RL policies to sensor noise, failures, and external disturbances compared to the open-loop baseline? 

How do learned policies transfer to a real robot when trained without randomization or reward engineering?

In conclusion, the researchers introduced an open-loop, model-free baseline that performs well on standard locomotion tasks without needing complex models or heavy computational resources. Two additional experiments using open-loop oscillators expose a current drawback of DRL algorithms: compared against the baseline, DRL is more prone to degraded performance under sensor noise or failure. By design, however, open-loop control is sensitive to disturbances and cannot recover from potential falls, which limits this baseline. Because the method produces joint positions without using the robot's state, a PD controller is needed in simulation to transform these positions into torque commands.
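The PD step is the standard feedback law τ = k_p(q_des − q) − k_d q̇. A minimal sketch, with illustrative gains rather than the paper's tuned values:

```python
import numpy as np

def pd_torques(q_desired, q, q_dot, kp=5.0, kd=0.5):
    """Convert desired joint positions into torque commands via PD control.

    tau = kp * (q_desired - q) - kd * q_dot
    kp and kd are illustrative gains, not the paper's tuned values.
    """
    return kp * (np.asarray(q_desired) - np.asarray(q)) - kd * np.asarray(q_dot)
```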

    Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. 

    Join our Telegram Channel and LinkedIn Group.

If you like our work, you will love our newsletter.

Don’t forget to join our 46k+ ML SubReddit.

    The post A Simple Open-loop Model-Free Baseline for Reinforcement Learning Locomotion Tasks without Using Complex Models or Computational Resources appeared first on MarkTechPost.

