    Allen Institute for AI-Ai2 Unveils AutoDS: A Bayesian Surprise-Driven Engine for Open-Ended Scientific Discovery

    July 22, 2025

    The Allen Institute for Artificial Intelligence (Ai2) has introduced AutoDS (Autonomous Discovery via Surprisal), a prototype engine for open-ended autonomous scientific discovery. Unlike conventional AI research assistants, which depend on human-defined objectives or queries, AutoDS autonomously generates, tests, and iterates on hypotheses by quantifying and seeking out “Bayesian surprise”, a principled measure of genuine discovery that extends beyond what humans specifically look for.

    From Goal-Driven Inquiry to Open-Ended Exploration

    Traditional approaches to autonomous scientific discovery (ASD) typically revolve around answering pre-specified research questions: generate hypotheses relevant to a given problem, then experimentally validate them. AutoDS departs fundamentally from this paradigm. Drawing inspiration from the curiosity-driven exploration of human scientists, AutoDS operates in an open-ended manner—it decides what questions to pose, which hypotheses to pursue, and how to build upon previous results, all without predefined goals.

    Open-ended discovery is inherently challenging, requiring mechanisms for both traversing vast hypothesis spaces and prioritizing which hypotheses merit investigation. To address these challenges, AutoDS formalizes the concept of “surprisal”—a measurable shift in belief about a hypothesis before and after acquiring empirical evidence.

    Quantifying Bayesian Surprise via Large Language Models

    At the core of AutoDS is a novel framework for estimating Bayesian surprise. For each generated hypothesis, AutoDS uses a state-of-the-art large language model (LLM), such as GPT-4o, as a probabilistic observer and elicits its “belief” in the hypothesis (as a probability) both before and after empirical testing. The resulting belief distributions, constructed by sampling multiple judgments from the LLM, are modeled as Beta distributions.

    To detect meaningful discovery, AutoDS calculates the Kullback-Leibler (KL) divergence between the posterior (after evidence) and prior (before evidence) Beta distributions—a formal measure of Bayesian surprise. Critically, only belief shifts that cross a threshold of evidential change (e.g., from likely true to likely false) are treated as genuinely surprising, focusing the system on substantive discoveries rather than trivial uncertainty updates.
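
    The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the AutoDS implementation: the function names are hypothetical, the way sampled judgments are turned into Beta parameters (pseudo-counts on top of a uniform Beta(1, 1)) is an assumption, and the "threshold of evidential change" is assumed here to mean the mean belief crossing 0.5.

```python
import math
from typing import Sequence, Tuple


def digamma(x: float) -> float:
    """Digamma function for x > 0, via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward until the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def fit_beta(samples: Sequence[float]) -> Tuple[float, float]:
    """ASSUMPTION: treat each sampled probability as fractional pseudo-counts
    added to a uniform Beta(1, 1)."""
    a = 1.0 + sum(samples)
    b = 1.0 + sum(1.0 - s for s in samples)
    return a, b


def kl_beta(a1: float, b1: float, a2: float, b2: float) -> float:
    """Closed-form KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return (
        log_beta(a2, b2) - log_beta(a1, b1)
        + (a1 - a2) * digamma(a1)
        + (b1 - b2) * digamma(b1)
        + (a2 - a1 + b2 - b1) * digamma(a1 + b1)
    )


def bayesian_surprise(prior_samples: Sequence[float],
                      posterior_samples: Sequence[float]) -> float:
    """KL(posterior || prior), counted only if the mean belief crosses 0.5.

    ASSUMPTION: crossing 0.5 stands in for the article's threshold
    (e.g. from likely true to likely false).
    """
    pa, pb = fit_beta(prior_samples)
    qa, qb = fit_beta(posterior_samples)
    prior_mean = pa / (pa + pb)
    post_mean = qa / (qa + qb)
    crossed = (prior_mean - 0.5) * (post_mean - 0.5) < 0
    return kl_beta(qa, qb, pa, pb) if crossed else 0.0
```

    Under these assumptions, a hypothesis whose sampled beliefs move from around 0.85 to around 0.15 receives a positive surprise score, while one that moves from 0.85 to 0.90 scores zero because the belief never changes sides.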

    Efficient Hypothesis Search with MCTS

    Exploring the vast hypothesis landscape efficiently requires more than naive sampling. AutoDS leverages Monte Carlo Tree Search (MCTS) with progressive widening to guide its search for surprising discoveries. Each node in the search tree represents a hypothesis, and branches correspond to new hypotheses conditioned on prior findings. This structure lets AutoDS maintain a balance between exploring novel avenues and following up on fruitful leads.

    Unlike greedy or beam search methods, which risk either overcommitting to one branch or pruning promising ones too early, MCTS sustains high discovery efficiency under a fixed compute budget. Empirically, across 21 datasets from domains such as biology, economics, and behavioral science, AutoDS outperforms repeated sampling, greedy, and beam search baselines, discovering 5–29% more hypotheses judged surprising by the LLM.
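
    To make the search procedure concrete, here is a minimal sketch of MCTS with progressive widening over a tree of hypotheses. It is not the AutoDS code: `propose` and `evaluate` are hypothetical callbacks standing in for the LLM-driven hypothesis generation and surprise scoring, and the widening rule (at most `ceil(k * (visits + 1) ** alpha)` children per node) and UCT constant `c` are common textbook choices assumed for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    hypothesis: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def may_widen(node: Node, k: float, alpha: float) -> bool:
    """Progressive widening: allow a new child only while the child count
    is below a limit that grows slowly with the node's visit count."""
    return len(node.children) < math.ceil(k * (node.visits + 1) ** alpha)


def uct_child(node: Node, c: float) -> Node:
    """Pick the child with the best mean reward plus exploration bonus."""
    def score(child: Node) -> float:
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=score)


def search(root_hypothesis: str,
           propose: Callable[[str, int], str],   # hypothetical: new hypothesis from a parent
           evaluate: Callable[[str], float],     # hypothetical: reward, e.g. a surprise score
           iterations: int,
           k: float = 1.0, alpha: float = 0.5, c: float = 1.4) -> Node:
    root = Node(root_hypothesis)
    for _ in range(iterations):
        # Selection: descend while the current node is not allowed to widen.
        node = root
        while node.children and not may_widen(node, k, alpha):
            node = uct_child(node, c)
        # Expansion: condition a new hypothesis on the selected node.
        child = Node(propose(node.hypothesis, len(node.children)), parent=node)
        node.children.append(child)
        # Evaluation: run the (stubbed) experiment and score the result.
        reward = evaluate(child.hypothesis)
        # Backpropagation: update statistics along the path to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_reward += reward
            cur = cur.parent
    return root
```

    Each iteration adds exactly one hypothesis to the tree, so the number of experiments equals the iteration budget; widening controls how that budget is split between new branches at a node and deeper follow-ups beneath it.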

    A Modular Multi-Agent LLM Architecture

    AutoDS orchestrates a series of specialized LLM agents, each responsible for a distinct part of the autonomous scientific workflow:

    • Hypothesis Generation
    • Experimental Design
    • Programming and Execution
    • Results Analysis and Revision

    Deduplication of semantically similar hypotheses uses a hierarchical clustering pipeline: LLM-based text embeddings combined with pairwise semantic equivalence checks ensure the final output set comprises only truly distinct discoveries.
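
    A rough sketch of such a deduplication step is shown below. It is illustrative only: the embeddings are passed in as plain vectors, `equivalent` is a hypothetical callback standing in for the LLM-based pairwise check, and the clustering is single-linkage agglomerative clustering cut at an assumed cosine-similarity threshold (which is the same as taking connected components of the thresholded similarity graph), not necessarily the linkage AutoDS uses.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster(embeddings: Sequence[Sequence[float]], min_sim: float) -> List[List[int]]:
    """Single-linkage clusters: merge any two items with similarity >= min_sim."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= min_sim:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def deduplicate(hypotheses: Sequence[str],
                embeddings: Sequence[Sequence[float]],
                equivalent: Callable[[str, str], bool],  # hypothetical LLM check
                min_sim: float = 0.9) -> List[str]:
    """Keep one representative per set of equivalent hypotheses.

    The pairwise check runs only inside each embedding cluster, which keeps
    the number of (expensive) equivalence calls small.
    """
    kept: List[int] = []
    for group in cluster(embeddings, min_sim):
        representatives: List[int] = []
        for idx in group:
            if not any(equivalent(hypotheses[idx], hypotheses[r]) for r in representatives):
                representatives.append(idx)
        kept.extend(representatives)
    return [hypotheses[i] for i in sorted(kept)]
```

    The embedding clusters act as a cheap pre-filter, and the pairwise check makes the final call, so two hypotheses that merely share vocabulary are not merged unless the checker agrees they state the same claim.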

    Human Alignment and Interpretability

    Alignment with human scientific intuition is a key benchmark. In a structured human evaluation (with reviewers holding MS/PhD-level STEM backgrounds), 67% of the hypotheses AutoDS judged surprising were also seen as surprising by domain experts. Furthermore, AutoDS’s Bayesian surprise metric aligned more closely with human judgment than proxy metrics such as predicted “interestingness” or “utility.”

    Interestingly, the nature and direction of surprising belief shifts varied by scientific field—highlighting, for example, that confirmatory claims often require stronger evidence to be convincingly surprising than do novel falsifications.

    Practical Considerations and Future Outlook

    AutoDS exhibits high implementation and experimental validity, with over 98% of evaluated discoveries deemed correctly implemented by human reviewers. While current pipelines depend on API-driven LLMs and thus face latency constraints, the team also explored a “programmatic search” implementation that delivers much faster, albeit less conceptually rich, results.

    Although AutoDS is currently a research prototype (with an open-source release planned), its architecture and empirical results chart a promising path toward scalable, AI-driven science.

    Conclusion

    AutoDS represents a significant advance in autonomous scientific reasoning. By transitioning from goal-driven research to autonomous, curiosity-based exploration—and grounding its search in Bayesian surprise—it points the way toward future AI systems capable of complementing, accelerating, or even independently leading scientific discovery.


    Check out the Paper, GitHub Page and Blog. All credit for this research goes to the researchers of this project.

    The post Allen Institute for AI-Ai2 Unveils AutoDS: A Bayesian Surprise-Driven Engine for Open-Ended Scientific Discovery appeared first on MarkTechPost.
