
ether0: A 24B LLM Trained with Reinforcement Learning (RL) for Advanced Chemical Reasoning Tasks

    June 10, 2025

LLMs have primarily improved accuracy by scaling pre-training data and compute. With high-quality data increasingly finite, attention has shifted to alternative scaling axes, including test-time training and inference-time compute. Reasoning models improve performance by emitting a thought process before the answer, initially through chain-of-thought (CoT) prompting and, more recently, through reinforcement learning (RL) post-training. Scientific domains are ideal targets for reasoning models because they are rich in "inverse problems": assessing the quality of a candidate solution is straightforward, while generating a solution remains hard. Despite this conceptual fit between structured scientific reasoning and model capabilities, current methods lack detailed approaches for scientific reasoning beyond multiple-choice benchmarks.
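
To make the "inverse problem" point concrete, here is a minimal sketch (not from the paper) of a verifiable chemistry reward: checking whether a proposed molecule matches a target molecular formula is a single function call, while generating such a molecule is the hard, open-ended part. It assumes RDKit is installed; the function name is illustrative.

```python
# Verifying a candidate solution is cheap; generating one is hard.
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

def formula_reward(proposed_smiles: str, target_formula: str) -> float:
    """Return 1.0 if the SMILES parses and matches the target formula."""
    mol = Chem.MolFromSmiles(proposed_smiles)
    if mol is None:                 # invalid SMILES -> no reward
        return 0.0
    return 1.0 if CalcMolFormula(mol) == target_formula else 0.0

# Caffeine has formula C8H10N4O2; checking a candidate is a one-liner.
print(formula_reward("CN1C=NC2=C1C(=O)N(C)C(=O)N2C", "C8H10N4O2"))  # 1.0
```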

    Technical Evolution of Reasoning Architectures

Reasoning models have evolved from early prompt-based methods such as CoT, zero-shot CoT, and Tree of Thought to more complex RL approaches such as Group Relative Policy Optimization (GRPO) and inference-time scaling. In chemistry, however, reasoning models have focused on knowledge-based benchmarks rather than complex reasoning tasks such as retrosynthesis or molecular design. While datasets such as GPQA-D and MMLU assess chemical knowledge, they fail to evaluate complex chemical reasoning capabilities. Scientific reasoning efforts remain fragmented: OmniScience covers general science, Med-R1 medical vision-language tasks, and BioReason genomic reasoning, but no comprehensive framework exists for training large-scale chemical reasoning models.
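
As background for the GRPO phases discussed below, here is a minimal sketch of the group-relative advantage at the heart of GRPO (the general form, not ether0's exact implementation): for each prompt, a group of responses is sampled, and each response's reward is normalized against its own group's statistics, removing the need for a learned critic network.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rewards: shape (G,), rewards for G sampled responses to one prompt."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Binary correctness rewards for a group of 6 samples: above-average
# responses get positive advantage, below-average get negative.
print(group_relative_advantages(np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])))
```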

    ether0 Architecture and Design Principles

Researchers from FutureHouse have proposed ether0, a novel model that reasons in natural language and outputs molecular structures as SMILES strings, demonstrating the efficacy of reasoning models on chemical tasks. It outperforms frontier LLMs, human experts, and general chemistry models. The training approach adds several optimizations over vanilla RL, including distillation of reasoning behavior, a dynamic curriculum (sketched below), and expert model initialization, to enhance efficiency and effectiveness. The researchers also analyze data efficiency, failure modes, and reasoning behavior, giving a clearer picture of how reasoning helps solve chemistry problems.
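
The article does not spell out the curriculum scheme, so the following is a hedged illustration of what a dynamic curriculum can look like: sample tasks in proportion to how learnable they currently are, peaking where the model's pass rate is near 50% and fading once a task is mastered or still out of reach. The task names and the weighting rule are assumptions for illustration.

```python
import random

def curriculum_weights(pass_rates: dict[str, float]) -> dict[str, float]:
    # p * (1 - p) peaks at p = 0.5 and vanishes at p = 0 or p = 1,
    # so sampling focuses on tasks at the edge of the model's ability.
    return {task: p * (1.0 - p) + 1e-3 for task, p in pass_rates.items()}

pass_rates = {"retrosynthesis": 0.15, "mol_completion": 0.55, "naming": 0.95}
weights = curriculum_weights(pass_rates)
tasks = list(weights)
batch = random.choices(tasks, weights=[weights[t] for t in tasks], k=8)
print(batch)  # mostly mol_completion, some retrosynthesis, rarely naming
```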

    Training Pipeline: Distillation and GRPO Integration

The model employs a multi-stage training procedure that alternates between distillation and GRPO phases. The architecture introduces four special tokens that demarcate reasoning and answer boundaries. Training begins with SFT on long CoT sequences generated by DeepSeek-R1, filtered for valid SMILES format and reasoning quality. Specialist RL then optimizes task-specific policies for different problem categories using GRPO. Distillation next merges the specialist models into a generalist through SFT on correct responses collected throughout training. The final phase applies generalist GRPO to the merged model, with continuous quality filtering to remove low-quality reasoning and undesirable molecular substructures.
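
A hypothetical sketch of the format-plus-validity filtering this pipeline relies on: a completion earns reward only if it carries a well-formed answer span containing valid SMILES. The <answer>...</answer> tags below are placeholders (the exact spellings of the paper's four special tokens are not reproduced here), and the correctness check is passed in as a task-specific callable.

```python
import re
from typing import Callable
from rdkit import Chem

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def parse_answer(completion: str) -> str | None:
    """Extract the answer span; None when the required format is missing."""
    match = ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None

def reward(completion: str, is_correct: Callable[[Chem.Mol], bool]) -> float:
    answer = parse_answer(completion)
    if answer is None:
        return 0.0                          # malformed output: no reward
    mol = Chem.MolFromSmiles(answer)
    if mol is None:
        return 0.0                          # answer is not valid SMILES
    return 1.0 if is_correct(mol) else 0.0  # task-specific correctness check
```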

    Performance Evaluation and Comparative Benchmarks

Ether0 demonstrates superior performance against both general-purpose LLMs, such as Claude and o1, and chemistry-specific models, including ChemDFM and TxGemma. It achieves the highest accuracy across all open-answer categories while maintaining competitive performance on multiple-choice questions. It is also markedly data-efficient: trained on only 60,000 reactions, compared to the full USPTO datasets used by traditional molecular transformer models, it reaches 70% accuracy after seeing 46,000 training examples, whereas molecular transformers achieve 64.1% on the complete datasets. Under one-shot prompting, ether0 surpasses all evaluated frontier models. Safety alignment procedures successfully filter 80% of unsafe questions without degrading performance on core chemistry tasks.

    Conclusion: Implications for Future Scientific LLMs

In conclusion, the researchers introduced ether0, a 24B-parameter model trained on ten challenging molecular tasks. Through its interleaved RL and behavior-distillation pipeline, it significantly outperforms frontier LLMs, domain experts, and specialized models, exhibiting exceptional data efficiency and strong reasoning on open-answer chemistry tasks involving molecular design, completion, modification, and synthesis. Limitations include potential generalization challenges beyond organic chemistry, some loss of general instruction-following, and the absence of tool-calling integration. The release of model weights, benchmark data, and reward functions establishes a foundation for advancing scientific reasoning models across diverse domains.


Check out the Paper and technical details. All credit for this research goes to the researchers of this project.


The post ether0: A 24B LLM Trained with Reinforcement Learning (RL) for Advanced Chemical Reasoning Tasks appeared first on MarkTechPost.
