    TD3-BST: A Machine Learning Algorithm to Adjust the Strength of Regularization Dynamically Using Uncertainty Model

    April 28, 2024

    Reinforcement learning (RL) is a learning approach in which an agent interacts with an environment to collect experience and aims to maximize the reward it receives. This usually involves a loop of experience collection and policy improvement, and because it requires policy rollouts, it is called online RL. Both on-policy and off-policy RL need online interaction, which can be impractical in some domains because of experimental or environmental constraints. Offline RL algorithms are instead designed to extract the best possible policy from a static dataset.

    Offline RL algorithms learn effective, broadly applicable policies from static datasets, and many offline RL approaches have achieved major success recently. However, they demand significant hyperparameter tuning specific to each dataset to reach their reported performance, and evaluating each setting requires policy rollouts in the environment. This is a serious obstacle, because the need for heavy tuning can limit the adoption of these algorithms in practical domains. Offline RL also struggles to evaluate out-of-distribution (OOD) actions, that is, actions not covered by the dataset.

    Researchers from Imperial College London introduced TD3-BST (TD3 with Behavioral Supervisor Tuning), an algorithm that uses an uncertainty model to adjust the strength of regularization dynamically. The trained uncertainty model is incorporated into the regularized policy objective to yield TD3-BST. Because regularization is adjusted through an uncertainty network, the learned policy can optimize Q-values around dataset modes. TD3-BST outperforms other methods, showing state-of-the-art performance when tested on D4RL datasets.
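
    The idea of uncertainty-scaled regularization can be illustrated with a toy per-sample loss. This is a minimal sketch and not the objective from the paper: the function name, the `alpha` parameter, and the choice of a squared-error behavioral cloning penalty multiplied by an uncertainty value in [0, 1] are all assumptions made for illustration.

```python
def regularized_policy_loss(q_value, policy_action, dataset_action,
                            uncertainty, alpha=1.0):
    """Toy loss: maximize Q while staying close to the dataset action.

    `uncertainty` is assumed to lie in [0, 1] and to come from some
    trained uncertainty model (hypothetical here). The higher it is,
    the more strongly the policy is pulled toward the dataset action.
    """
    # Squared distance between the policy's action and the dataset action.
    bc_penalty = sum((p - a) ** 2 for p, a in zip(policy_action, dataset_action))
    # Regularization strength is not a fixed constant: it scales with uncertainty.
    weight = alpha * uncertainty
    return -q_value + weight * bc_penalty


# Same Q-value and same actions, different uncertainty estimates.
low = regularized_policy_loss(1.0, [0.5, 0.0], [0.0, 0.0], uncertainty=0.1)
high = regularized_policy_loss(1.0, [0.5, 0.0], [0.0, 0.0], uncertainty=0.9)
```

    With a fixed penalty weight, both calls would return the same loss. Here the more uncertain sample is penalized more, which is the sense in which the regularization is dynamic.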

    Tuning TD3-BST is straightforward: the primary hyperparameters are those of the Morse network, namely the choice and scale (λ) of the kernel, along with the temperature. For high-dimensional actions, increasing λ keeps the region around each mode tight. Training with Morse-weighted behavioral cloning (BC) reduces the impact of the BC loss for distant modes, allowing the policy to focus on selecting and optimizing for a single mode. The study also shows the importance of allowing some OOD actions in the TD3-BST framework, to a degree that depends on λ.
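
    The effect of the kernel scale λ can be seen with a simple Gaussian kernel. The sketch below is a simplified stand-in, not a Morse network: it assumes a fixed, hand-written list of modes instead of a learned model, it omits the temperature, and the function names are hypothetical.

```python
import math


def kernel_certainty(action, mode, lam):
    """Gaussian (RBF) kernel: 1.0 at the mode, decaying with distance."""
    sq_dist = sum((a - m) ** 2 for a, m in zip(action, mode))
    return math.exp(-0.5 * lam * sq_dist)


def uncertainty(action, modes, lam):
    """Uncertainty is low near any mode and approaches 1 far from all modes."""
    return 1.0 - max(kernel_certainty(action, m, lam) for m in modes)


# Two assumed dataset modes in a 2-D action space (illustrative values).
modes = [[0.0, 0.0], [1.0, 1.0]]
probe = [0.2, 0.0]  # an action slightly away from the first mode

loose = uncertainty(probe, modes, lam=1.0)   # wide region around each mode
tight = uncertainty(probe, modes, lam=10.0)  # narrow region around each mode
```

    For the same probe action, the larger λ yields the higher uncertainty, so fewer off-mode actions are treated as in-distribution. A smaller λ tolerates more actions away from the modes.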

    Simpler offline RL methods, called one-step algorithms, can also learn a policy from an offline dataset. They depend on weighted BC, which has some limitations, and relaxing the policy objective plays a major role in improving performance. To address this, a BST objective is integrated into the existing IQL algorithm so that it can learn an optimal policy while retaining in-sample policy evaluation. This new approach, IQL-BST, is tested using the same setup as the original IQL, and its results closely match the original, with a very slight drop in performance on larger datasets. Overall, relaxing weighted BC with a BST objective performs well, especially on the more difficult medium and large datasets.
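
    For context, the weighted BC that IQL relies on can be sketched as follows. This shows only the two standard IQL ingredients (expectile regression for the value function and advantage-weighted BC for policy extraction), not the BST relaxation. The squared-error form assumes a deterministic policy, and the values of `tau`, `beta`, and `max_weight` are illustrative.

```python
import math


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss used to fit V(s) from in-sample Q-values.

    `diff` is Q(s, a) - V(s). With tau > 0.5, positive differences are
    weighted more, pushing V toward an upper expectile of Q.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2


def awr_weight(q_value, v_value, beta=3.0, max_weight=100.0):
    """Advantage-based weight for a dataset action, clipped for stability."""
    return min(math.exp(beta * (q_value - v_value)), max_weight)


def weighted_bc_loss(policy_action, dataset_action, q_value, v_value, beta=3.0):
    """BC loss on one dataset action, scaled by its advantage weight."""
    sq_err = sum((p - a) ** 2 for p, a in zip(policy_action, dataset_action))
    return awr_weight(q_value, v_value, beta) * sq_err
```

    Because every term is computed on dataset actions only, the policy never queries Q-values for OOD actions. That is also the limitation the paragraph above refers to: the policy can only imitate, with weights, what the dataset already contains.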

    In conclusion, researchers from Imperial College London introduced TD3-BST, an algorithm that uses an uncertainty model to adjust the strength of regularization dynamically. Compared with previous methods on Gym Locomotion tasks, TD3-BST achieves the best scores and performs strongly when learning from suboptimal data. In addition, combining policy regularization with an ensemble-based source of uncertainty further improves performance. Future work includes exploring other methods of estimating uncertainty, alternative uncertainty measures, and the best way to combine multiple sources of uncertainty.

    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post TD3-BST: A Machine Learning Algorithm to Adjust the Strength of Regularization Dynamically Using Uncertainty Model appeared first on MarkTechPost.
