    Researchers from Stanford and Amazon Developed STARK: A Large-Scale Semi-Structure Retrieval AI Benchmark on Textual and Relational Knowledge Bases

    May 2, 2024

    Imagine you’re looking for the perfect gift for your kid – a fun yet safe tricycle that ticks all the boxes. You might search with a query like “Can you help me find a push-along tricycle from Radio Flyer that’s both fun and safe for my kid?” Sounds pretty specific, right? But what if the search engine could understand the textual requirements (“fun” and “safe for kids”) as well as the relational aspect (“from Radio Flyer”)?

    This is the kind of complex retrieval challenge, mixing free text with relations between entities, that researchers from Stanford and Amazon set out to tackle with STARK (Semi-structured Retrieval on Textual and Relational Knowledge Bases). Benchmarks already exist for retrieving information from either pure text or structured databases, but real-world knowledge bases often blend the two. Think e-commerce platforms, social media, or biomedical databases: they all contain a mix of textual descriptions and connections between entities.

    To create the benchmark, the researchers first built three semi-structured knowledge bases from public datasets: one about Amazon products, one about academic papers and authors, and one about biomedical entities like diseases, drugs, and genes. These knowledge bases contain millions of entities and relationships between them, as well as textual descriptions for many entities.
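To make "semi-structured" concrete, here is a minimal sketch of a knowledge base that stores both free text and typed relations. The class names, the `has_brand` relation, and the sample products are assumptions made up for illustration; they are not the data model used in the STARK code.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: int
    entity_type: str   # e.g. "product" or "brand" (illustrative types)
    text: str = ""     # free-text description; may be empty


@dataclass
class SemiStructuredKB:
    entities: dict = field(default_factory=dict)
    # Maps (relation, tail_id) -> set of head ids that point at the tail.
    incoming: dict = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def add_relation(self, head_id: int, relation: str, tail_id: int) -> None:
        self.incoming.setdefault((relation, tail_id), set()).add(head_id)

    def with_relation(self, relation: str, tail_id: int) -> set:
        """Return ids of entities linked to `tail_id` through `relation`."""
        return set(self.incoming.get((relation, tail_id), set()))


# Hypothetical toy data: one brand and two products that belong to it.
kb = SemiStructuredKB()
kb.add_entity(Entity(1, "brand", "Radio Flyer"))
kb.add_entity(Entity(2, "product", "Push-along tricycle, fun and safe for kids."))
kb.add_entity(Entity(3, "product", "Classic red wagon."))
kb.add_relation(2, "has_brand", 1)
kb.add_relation(3, "has_brand", 1)

radio_flyer_products = kb.with_relation("has_brand", 1)
```

A query such as the tricycle example needs both parts: the relation lookup narrows the candidates to Radio Flyer products, and the text field is what "fun" and "safe" have to be matched against.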

    https://arxiv.org/abs/2404.13207

    Next, they developed a novel pipeline (shown in Figure 3 of the paper) to automatically generate queries for the benchmark datasets. The pipeline starts by sampling a relational requirement, like “belongs to the brand Radio Flyer” for products. It then extracts relevant textual properties from an entity that satisfies this requirement, such as a tricycle described as “fun and safe for kids.” Using language models, it combines the relational and textual information into a natural-sounding query, like “Can you help me find a push-along tricycle from Radio Flyer that’s both fun and safe for my kid?”
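The three steps described above can be sketched roughly as follows. This is an illustration only: the sample products, the keyword vocabulary, and every function name are hypothetical, and the two steps the article attributes to language models (property extraction and query writing) are replaced by a keyword match and a string template.

```python
import random

# Hypothetical toy catalogue; not data from the benchmark.
PRODUCTS = [
    {"id": 1, "name": "push-along tricycle", "brand": "Radio Flyer",
     "text": "A fun ride that is safe for kids."},
    {"id": 2, "name": "wagon", "brand": "Radio Flyer",
     "text": "A sturdy wagon that is fun for the whole family."},
    {"id": 3, "name": "balance bike", "brand": "Other Brand",
     "text": "Sturdy balance bike that is fun to ride."},
]


def sample_relational_requirement(products, rng):
    """Step 1: pick a relational requirement, here a (relation, value) pair."""
    product = rng.choice(products)
    return ("brand", product["brand"])


def extract_textual_properties(product):
    """Step 2: pull textual properties from one entity (keyword stand-in for an LLM)."""
    vocabulary = ["fun", "safe for kids", "sturdy"]
    text = product["text"].lower()
    return [phrase for phrase in vocabulary if phrase in text]


def compose_query(name, requirement, properties):
    """Step 3: merge both kinds of information (template stand-in for an LLM)."""
    _, value = requirement
    query = f"Can you help me find a {name} from {value}"
    if properties:
        query += " that is " + " and ".join(properties)
    return query + "?"


def synthesize_query(products, rng):
    requirement = sample_relational_requirement(products, rng)
    relation, value = requirement
    satisfying = [p for p in products if p[relation] == value]
    source = rng.choice(satisfying)  # entity the textual properties come from
    properties = extract_textual_properties(source)
    return compose_query(source["name"], requirement, properties), requirement, source["id"]


query, requirement, source_id = synthesize_query(PRODUCTS, random.Random(0))
print(query)
```

The returned `source_id` matters for the next stage, because the entity that supplied the textual properties is handled separately when the answer set is built.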

    The really cool part is how they construct the ground truth answers for each query. They take the remaining candidate entities (excluding the one used to extract textual properties) and use multiple language models to verify whether each one actually meets the full query requirements. Only the entities that pass this stringent verification get included in the final ground truth answer set.
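A rough sketch of that filtering step is shown below. The article does not say how the verdicts of the different language models are combined, so this example assumes a candidate must be accepted by every verifier. The verifiers themselves are hypothetical keyword checks standing in for language model calls, and the candidate data is invented.

```python
from typing import Callable, Dict, List

# A verifier takes the query and a candidate and returns True if the
# candidate satisfies the query. In the real pipeline this would be a
# language model call; here it is a plain function.
Verifier = Callable[[str, Dict], bool]


def build_verified_answers(query: str, candidates: List[Dict],
                           source_id: int, verifiers: List[Verifier]) -> List[int]:
    """Return ids of remaining candidates accepted by all verifiers (assumed rule)."""
    verified = []
    for candidate in candidates:
        if candidate["id"] == source_id:
            continue  # the entity used to extract textual properties is excluded here
        if all(verify(query, candidate) for verify in verifiers):
            verified.append(candidate["id"])
    return verified


def brand_matches(query: str, candidate: Dict) -> bool:
    # Hypothetical check for the relational requirement.
    return candidate["brand"].lower() in query.lower()


def text_matches(query: str, candidate: Dict) -> bool:
    # Hypothetical check for the textual requirement.
    text = candidate["text"].lower()
    return "fun" in text and "safe" in text


CANDIDATES = [
    {"id": 1, "brand": "Radio Flyer", "text": "A fun ride that is safe for kids."},
    {"id": 2, "brand": "Radio Flyer", "text": "A fun wagon that is safe for toddlers."},
    {"id": 3, "brand": "Radio Flyer", "text": "Fast scooter for teens."},
    {"id": 4, "brand": "Other Brand", "text": "Fun and safe trike."},
]
QUERY = "Can you help me find a push-along tricycle from Radio Flyer that's both fun and safe for my kid?"

answers = build_verified_answers(QUERY, CANDIDATES, source_id=1,
                                 verifiers=[brand_matches, text_matches])
```

Requiring agreement from every verifier trades recall for precision: a borderline candidate is dropped rather than risk a wrong entry in the answer set. A majority vote would be a looser alternative.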

    After generating thousands of such queries across the three knowledge bases, the researchers analyzed the data distribution and had people evaluate the naturalness, diversity, and practicality of the queries. The results showed that their benchmark captured a wide range of query styles and real-world scenarios.

    When they tested various retrieval models on the STARK benchmark, they found that current approaches still struggle to retrieve relevant entities accurately, especially when the queries involve reasoning over both textual and relational information. The best results came from combining traditional vector similarity methods with language model rerankers like GPT-4, but even then, the performance left significant room for improvement. Traditional embedding methods lacked the advanced reasoning capabilities of large language models, while fine-tuning LLMs for this task proved computationally demanding and difficult to align with textual requirements. On the biomedical dataset, STARK-PRIME, the best method placed a correct answer at the top of its ranking only around 18% of the time (as measured by the Hit@1 metric). The Recall@20 metric, which measures the proportion of all relevant items that appear in the top 20 results, remained below 60% across all datasets.
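For readers unfamiliar with the two metrics, here is a minimal sketch using their standard definitions. The function names and the toy ranking are illustrative and are not taken from the benchmark's evaluation code.

```python
def hit_at_k(ranked_ids, relevant_ids, k=1):
    """1.0 if any of the top-k ranked items is relevant, else 0.0."""
    relevant = set(relevant_ids)
    return float(any(item in relevant for item in ranked_ids[:k]))


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of all relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    found = len(set(ranked_ids[:k]) & relevant)
    return found / len(relevant)


# Toy example: a system returns four items; three items are truly relevant.
ranked = [7, 3, 9, 1]
relevant = {3, 1, 5}

top1 = hit_at_k(ranked, relevant, k=1)              # item 7 is not relevant
recall_top2 = recall_at_k(ranked, relevant, k=2)    # only item 3 is found
recall_top4 = recall_at_k(ranked, relevant, k=4)    # items 3 and 1 are found
```

Both scores are averaged over all queries in a dataset, so a Hit@1 of about 18% means that for roughly four out of five queries the first result was not a correct answer.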

    The researchers emphasize that STARK sets a new benchmark for evaluating retrieval systems on semi-structured knowledge bases (SKBs), offering valuable opportunities for future research. They suggest that reducing retrieval latency and incorporating strong reasoning abilities into the retrieval process are promising directions for this domain. They have also made their work open-source, which should encourage further exploration and development of retrieval over SKBs.

    Thrilled to release STaRK – A large-scale LLM retrieval benchmark on semi-structured knowledge bases.

    While LLMs excel at reasoning and semantic retrieval, they struggle with more complex tasks. Especially when real-world user queries require a combination of unstructured… pic.twitter.com/nc4CzZ5Pok

    — Shirley Wu (@ShirleyYXWu) April 29, 2024

    The post Researchers from Stanford and Amazon Developed STARK: A Large-Scale Semi-Structure Retrieval AI Benchmark on Textual and Relational Knowledge Bases appeared first on MarkTechPost.