
    ARM: Enhancing Open-Domain Question Answering with Structured Retrieval and Efficient Data Alignment

    February 3, 2025

    Answering open-domain questions in real-world scenarios is challenging, as relevant information is often scattered across diverse sources, including text, databases, and images. While LLMs can break down complex queries into simpler steps to improve retrieval, they usually fail to account for how data is structured, leading to suboptimal results. Agentic RAG introduces iterative retrieval, refining searches based on prior results. However, this approach is inefficient, as queries are guided by past retrievals rather than data organization. Additionally, it lacks joint optimization, making it prone to reasoning derailment, where errors in early steps cascade into incorrect decisions, increasing computational costs.

    Researchers from MIT, AWS AI, and the University of Pennsylvania introduced ARM, an LLM-based retrieval method designed to enhance complex question answering by aligning queries with the structure of available data. Unlike conventional approaches, ARM explores relationships between data objects rather than relying solely on semantic matching, enabling a retrieve-all-at-once solution. Evaluated on the Bird and OTT-QA datasets, ARM outperformed standard RAG and agentic RAG, respectively, by up to 5.2 and 15.9 points in execution accuracy on Bird and by up to 5.5 and 19.3 points in F1 score on OTT-QA. ARM improves retrieval efficiency through structured reasoning and alignment verification.
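
To make the F1 figures above concrete, here is a minimal sketch of a token-level F1 score of the kind commonly used for short-answer question answering. The article does not say how F1 is computed for OTT-QA, so the whitespace tokenization, the lowercasing, and the function name `token_f1` are assumptions for illustration only.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer.

    Assumption: answers are compared as lowercase, whitespace-split tokens.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "new york city" against the reference "new york" has precision 2/3 and recall 1, giving an F1 of 0.8.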

    ARM’s alignment-driven retrieval framework integrates retrieval and reasoning within a unified decoding process, optimized through beam search. Unlike conventional methods that treat retrieval and reasoning as separate steps, ARM lets the LLM dynamically retrieve relevant data objects while incorporating structured data, a reasoning solver, and self-verification. Since LLMs lack direct access to structured data, the authors frame retrieval as a generative task in which the model formulates reasoning to identify essential data objects. This process involves iterative decoding with three key components: information alignment, structure alignment, and self-verification, which together promote logical consistency and accurate retrieval.
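
Beam search, which the paragraph above names as the decoding strategy, can be illustrated with a generic sketch. This is not ARM's decoder: the toy transition table `TOY_MODEL`, the token names, and the function names are hypothetical, and a real system would obtain next-token log-probabilities from an LLM.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]
Beam = Tuple[Sequence, float]  # (tokens so far, cumulative log-probability)


def beam_search(
    next_logprobs: Callable[[Sequence], Dict[str, float]],
    beam_width: int,
    max_steps: int,
    end_token: str = "<end>",
) -> List[Beam]:
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams: List[Beam] = [((), 0.0)]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for seq, score in beams:
            if seq and seq[-1] == end_token:
                candidates.append((seq, score))  # finished beams carry over
                continue
            for token, logprob in next_logprobs(seq).items():
                candidates.append((seq + (token,), score + logprob))
        candidates.sort(key=lambda beam: beam[1], reverse=True)
        beams = candidates[:beam_width]
        if all(seq and seq[-1] == end_token for seq, _ in beams):
            break
    return beams


# Hypothetical toy "model": probabilities of the next token given a prefix.
TOY_MODEL = {
    (): {"players": math.log(0.55), "teams": math.log(0.45)},
    ("players",): {"salary": math.log(0.6), "<end>": math.log(0.4)},
    ("teams",): {"city": math.log(0.9), "<end>": math.log(0.1)},
}


def toy_next_logprobs(prefix: Sequence) -> Dict[str, float]:
    # Unknown prefixes simply terminate with probability 1.
    return TOY_MODEL.get(prefix, {"<end>": 0.0})


best_sequence, best_score = beam_search(toy_next_logprobs, beam_width=2, max_steps=4)[0]
```

In this toy table a greedy decoder would commit to "players" first (probability 0.55) and finish with a total probability of 0.33, while a beam of width 2 also keeps "teams" alive and ends on the higher-probability path "teams", "city" (0.405). That is the general reason to keep several drafts open during decoding.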

    Textual data is indexed as N-grams and embeddings to enhance retrieval accuracy, enabling constrained beam decoding for precise alignment. Information alignment extracts key terms and retrieves relevant objects using BM25 scoring and embedding-based similarity. Structure alignment refines these objects through an optimization model, ensuring logical coherence. Finally, self-verification allows the LLM to validate and integrate selected objects within a structured reasoning framework. Multiple drafts are generated through controlled object expansion, and beam search aggregation prioritizes the most confident selections, ensuring high-quality, contextually relevant responses from diverse data sources.
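
As an illustration of the BM25 scoring mentioned above, the following is a minimal Okapi BM25 sketch over pre-tokenized documents. It covers only the lexical half of information alignment; the embedding-based similarity is left out. The constants `k1 = 1.5` and `b = 0.75` are conventional defaults rather than values from the article, and the function and variable names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every tokenized document in `docs` against `query` with Okapi BM25.

    Assumption: `docs` is non-empty and contains at least one token overall.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            tf = term_freq[term]
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical data objects, each represented by a handful of keywords.
objects = [
    ["player", "salary", "table"],
    ["team", "city", "table"],
    ["stadium", "capacity"],
]
scores = bm25_scores(["player", "salary"], objects)
```

Here only the first object shares terms with the query, so it receives a positive score and the other two score zero. A fuller pipeline would blend these scores with embedding similarity before passing candidates on to structure alignment.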

    The study assesses the method on open-domain question-answering tasks using the OTT-QA and Bird datasets. OTT-QA involves short-text answers drawn from passages and tables, while Bird requires generating SQL queries over multiple tables. The authors compare ARM with standard and agentic RAG baselines, incorporating query decomposition and reranking. ARM, using Llama-3.1-8B-Instruct, retrieves relevant objects efficiently, outperforming the baselines in recall and end-to-end accuracy while reducing LLM calls. The ReAct baseline struggles with iterative reasoning errors, often repeating searches, whereas ARM’s structured retrieval process improves precision and efficiency. Overall, the results show ARM retrieving essential information more reliably while maintaining computational efficiency across both datasets.
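
Retrieval recall, one of the measures mentioned above, can be sketched as the fraction of required data objects that a retriever actually returns. The article does not define the exact variant used in the study, so the set-based definition and the object identifiers below are assumptions.

```python
from typing import Iterable, Set


def retrieval_recall(retrieved: Iterable[str], gold: Set[str]) -> float:
    """Fraction of gold (required) objects that appear among the retrieved ones."""
    if not gold:
        return 1.0  # nothing was required, so nothing was missed
    return len(set(retrieved) & gold) / len(gold)


# Hypothetical identifiers for tables and passages.
recall = retrieval_recall(["table_1", "passage_3", "table_9"], {"table_1", "table_2"})
```

In this example one of the two required objects was retrieved, so recall is 0.5. For a question whose answer depends on joining several tables, missing even one required object can make the final answer unreachable.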

    In conclusion, effective open-domain question answering requires understanding the available data objects and how they are organized. Query decomposition with an off-the-shelf LLM often leads to suboptimal retrieval because the model is unaware of the data structure. While agentic RAG can interact with the data, it relies on previous retrieval results, making it inefficient and increasing LLM calls. The proposed ARM retrieval method identifies and navigates relevant data objects, even those not directly mentioned in the question. Experimental results show that ARM outperforms baselines in retrieval accuracy and efficiency, requiring fewer LLM calls while improving performance in downstream tasks.


    All credit for this research goes to the researchers of this project.

    The post ARM: Enhancing Open-Domain Question Answering with Structured Retrieval and Efficient Data Alignment appeared first on MarkTechPost.
