With the success of LLMs across a wide range of tasks, search engines have begun using generative methods to answer user queries directly, with in-line citations to supporting sources. However, generating reliable and attributable answers, especially in open-ended information-seeking scenarios, is challenging because of the complexity of the questions and the breadth of candidate answers and supporting documents. Existing work typically targets attributed question answering, where a specific answer to a precise query must be grounded in sources, and leaves the harder problem of attributed information seeking largely unaddressed. The core issue is that LLMs can generate incorrect or “hallucinated” information.
A team of researchers from France proposed a reproducible, open-source framework that supports various LLM architectures for attributed information seeking and is adaptable to any dataset. The framework is designed to benchmark attributed information-seeking tasks under three representative architectures. The Generate (G) approach relies solely on the LLM’s pre-existing knowledge to produce answers. In the Retrieve Then Generate (RTG) approach, documents relevant to the query are first retrieved, and the LLM then generates answers with citations grounded in those documents; the framework includes RTG variants such as vanilla retrieval and query generation, where the latter generates sub-queries to improve retrieval accuracy. In the Generate Then Retrieve (GTR) approach, answers are first generated without citations, and relevant documents are then identified to support the generated statements.
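To make the differences between these approaches concrete, here is a minimal sketch of how the three pipelines could be wired together in Python. The function names (call_llm, retrieve), the prompt templates, and the number of retrieved passages are illustrative placeholders, not the framework’s actual API.

```python
# Sketch of the three benchmarked pipelines: Generate (G), Retrieve Then Generate (RTG),
# and Generate Then Retrieve (GTR). All components below are placeholders.

def call_llm(prompt: str) -> str:
    """Placeholder for any instruction-tuned LLM call (API or local model)."""
    return "LLM output for: " + prompt[:60] + "..."

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder for a retriever (BM25, dense retrieval, etc.) returning k passages."""
    return [f"passage_{i} relevant to '{query}'" for i in range(k)]

def generate(question: str) -> str:
    # G: answer from the LLM's parametric knowledge only, with no retrieved evidence.
    return call_llm(f"Answer the question: {question}")

def retrieve_then_generate(question: str, query_gen: bool = False) -> str:
    # RTG: ground the answer in retrieved passages and ask for in-line citations.
    if query_gen:
        # RTG-query-gen: first ask the LLM for sub-queries to improve retrieval coverage.
        subqueries = call_llm(f"Write search queries for: {question}").split("\n")
        passages = [p for q in subqueries for p in retrieve(q, k=2)]
    else:
        # RTG-vanilla: retrieve directly with the original question.
        passages = retrieve(question, k=5)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return call_llm(
        f"Using the passages below, answer with citations like [1].\n{context}\nQuestion: {question}"
    )

def generate_then_retrieve(question: str) -> tuple[str, list[str]]:
    # GTR: answer first, then retrieve evidence for the generated statements post hoc.
    answer = call_llm(f"Answer the question: {question}")
    evidence = [retrieve(sentence, k=1)[0] for sentence in answer.split(". ") if sentence]
    return answer, evidence

if __name__ == "__main__":
    q = "What causes auroras?"
    print(retrieve_then_generate(q, query_gen=True))
```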
Performance evaluations on the HAGRID dataset show that RTG approaches outperform the other methods, with better overall answer correctness and citation quality. In particular, the RTG-query-gen scenario, which generates queries to guide document retrieval, achieves the highest citation-quality scores. The analysis shows that citation quality and the choice of retrieval method are crucial to the effectiveness of attributed information-seeking systems. The framework also includes a range of metrics for evaluating both answer correctness and citation quality, and the results indicate that RTG methods generally yield better outcomes than GTR approaches.
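An evaluation harness in this spirit might aggregate an answer-correctness score and a citation precision/recall score per record. The sketch below assumes a simplified HAGRID-style record layout and uses token-overlap F1 as a stand-in for the correctness metrics; both the record fields and the scores are illustrative assumptions, not the framework’s actual metric suite.

```python
# Toy evaluation loop over HAGRID-style records: score answer correctness and
# citation quality for each record, using deliberately simple stand-in metrics.

def token_f1(prediction: str, reference: str) -> float:
    """Simple token-overlap F1 as a stand-in for an answer-correctness metric."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if not common:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def citation_precision_recall(cited: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-level precision/recall of cited passage ids against gold attributions."""
    if not cited or not gold:
        return 0.0, 0.0
    tp = len(cited & gold)
    return tp / len(cited), tp / len(gold)

# Hypothetical records: each holds the model answer, its cited passage ids,
# a gold reference answer, and the gold supporting passage ids.
records = [
    {"answer": "Auroras are caused by solar wind particles hitting the atmosphere.",
     "cited": {"d1", "d7"},
     "gold_answer": "Auroras occur when charged particles from the solar wind excite atmospheric gases.",
     "gold_cited": {"d1"}},
]

for r in records:
    f1 = token_f1(r["answer"], r["gold_answer"])
    cp, cr = citation_precision_recall(r["cited"], r["gold_cited"])
    print(f"answer F1={f1:.2f}  citation P={cp:.2f} R={cr:.2f}")
```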
Several other findings emerged from the experiments. Increasing the number of supporting documents had a mixed effect on citations: it improved citation recall but also led to over-citation. Furthermore, the study found that while automatic citation metrics such as AutoAIS and other natural language inference (NLI)-based metrics correlate well with human judgments in question-answering scenarios, they perform less reliably in open-ended information-seeking tasks.
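The sketch below illustrates how NLI-style citation scoring can reward recall while penalizing over-citation. The nli_entails stub stands in for a real entailment model (AutoAIS-style metrics rely on trained NLI models), and the precision/recall definitions here are a simplified variant for illustration, not the exact metrics used in the study.

```python
def nli_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder for an NLI entailment check; a real implementation would
    query an entailment model rather than use this crude keyword heuristic."""
    return all(w in premise.lower() for w in hypothesis.lower().split()[:3])

def nli_citation_scores(statements: list[tuple[str, list[str]]]) -> tuple[float, float]:
    """statements: (generated statement, list of cited passages) pairs.

    Recall: share of statements entailed by the concatenation of their citations.
    Precision: share of citations that are non-redundant, i.e. removing the
    citation breaks entailment. Piling on loosely related citations
    (over-citation) keeps recall high but drags precision down."""
    supported, needed, total = 0, 0, 0
    for statement, passages in statements:
        joint = " ".join(passages)
        if nli_entails(joint, statement):
            supported += 1
        for i in range(len(passages)):
            total += 1
            rest = " ".join(passages[:i] + passages[i + 1:])
            if nli_entails(joint, statement) and not nli_entails(rest, statement):
                needed += 1
    recall = supported / len(statements) if statements else 0.0
    precision = needed / total if total else 0.0
    return precision, recall

if __name__ == "__main__":
    # Two citations, only one of which is needed: recall stays 1.0, precision drops to 0.5.
    stmts = [("solar wind particles cause auroras.",
              ["solar wind particles cause auroras in the upper atmosphere",
               "the night sky can display many colours"])]
    print(nli_citation_scores(stmts))
```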
In conclusion, the work addresses the gap in evaluating attributed information-seeking scenarios with a comprehensive, open-source framework that supports various LLM architectures. By focusing on both answer correctness and citation quality, it offers valuable insights and benchmarks for future research. The RTG-query-gen approach delivers significant improvements in citation accuracy, underlining the importance of effective document retrieval and query generation in attributed information seeking.