The rapid advancement of Large Language Models (LLMs) has significantly improved conversational systems, which now generate natural, high-quality responses. However, recent studies have identified several limitations in using LLMs for conversational tasks: a lack of up-to-date knowledge, the generation of non-factual or hallucinated content, and restricted domain adaptability. A common remedy is to retrieve external knowledge and augment the LLM with it, making conversational responses more accurate, reliable, and adaptable to different domains. Whether every turn of a conversation actually needs to be augmented with external knowledge, however, remains an open question. This paper examines that question and proposes an adaptive solution, RAGate, to address it.
Existing studies have explored various methods to improve conversational responses, primarily focusing on knowledge retrieval and joint optimization of retriever and generator components. Knowledge retrieval techniques often use dense passage retrieval methods or public search services to fetch relevant information, which is then integrated into the conversational response. For instance, dense passage retrieval models have been shown to reduce hallucination rates, while graph-structured knowledge bases can enhance reasoning ability and domain generalizability.
Despite these advancements, most retrieval-augmented generation (RAG) techniques assume that every conversation requires external knowledge, potentially leading to unnecessary and irrelevant information being included in responses. The authors propose RAGate, a gating model that leverages human judgments to determine when external knowledge augmentation is necessary. RAGate aims to improve the efficiency and effectiveness of conversational systems by dynamically deciding the need for augmentation based on the conversation context and relevant inputs.
RAGate is inspired by the gate functions in long short-term memory (LSTM) models, which control the flow of input and memory. It employs a binary knowledge gate that decides whether external knowledge is passed to the conversational system. The model predicts whether a conversational system requires RAG for improved responses by modeling the conversation context and relevant inputs. The authors explored three variants of RAGate: RAGate-Prompt, RAGate-PEFT, and RAGate-MHA.
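The binary gate described above can be pictured as a switch placed in front of the retriever. The following is a minimal sketch of that control flow, not the authors' implementation: `respond`, `toy_gate`, `toy_retrieve`, and `toy_generate` are hypothetical names, and the keyword rule merely stands in for a learned gate.

```python
from typing import Callable, List, Optional

Context = List[str]


def respond(
    context: Context,
    gate: Callable[[Context], bool],
    retrieve: Callable[[Context], List[str]],
    generate: Callable[[Context, Optional[List[str]]], str],
) -> str:
    """Generate a reply, calling the retriever only when the gate fires."""
    knowledge = retrieve(context) if gate(context) else None
    return generate(context, knowledge)


# Toy stand-ins so the sketch runs end to end (all hypothetical).
def toy_gate(context: Context) -> bool:
    # Placeholder rule: augment only when the user asks for a recommendation.
    return "recommend" in context[-1].lower()


def toy_retrieve(context: Context) -> List[str]:
    return ["Luigi's is an Italian restaurant open until 10pm."]


def toy_generate(context: Context, knowledge: Optional[List[str]]) -> str:
    if knowledge is None:
        return "OK (no retrieval)"
    return f"OK (used {len(knowledge)} snippet)"


reply_plain = respond(["Book a table for two."], toy_gate, toy_retrieve, toy_generate)
reply_aug = respond(["Can you recommend a place?"], toy_gate, toy_retrieve, toy_generate)
```

In this sketch the retriever is never called on turns where the gate returns false, so unneeded knowledge never reaches the generator.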
RAGate-Prompt: This variant uses a pre-trained language model with devised prompts to adapt to the new task. It employs zero-shot and in-context learning prompts that describe the task and ask the model for a binary decision.
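To make the difference between the zero-shot and in-context settings concrete, here is a hedged sketch of how such prompts might be assembled. The instruction wording, the `build_gate_prompt` and `parse_gate_answer` helpers, and the yes/no answer format are all assumptions made for illustration; the paper's actual prompts may differ.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction text, not taken from the paper.
INSTRUCTION = (
    "Decide whether the next system response should be augmented with "
    "retrieved external knowledge. Reply with yes or no."
)


def build_gate_prompt(
    context: List[str],
    examples: Optional[List[Tuple[List[str], bool]]] = None,
) -> str:
    """Build a zero-shot prompt, or an in-context one when examples are given."""
    parts = [INSTRUCTION]
    for ex_context, needs_knowledge in examples or []:
        parts.append("Conversation:\n" + "\n".join(ex_context))
        parts.append("Answer: " + ("yes" if needs_knowledge else "no"))
    parts.append("Conversation:\n" + "\n".join(context))
    parts.append("Answer:")
    return "\n\n".join(parts)


def parse_gate_answer(text: str) -> bool:
    """Map the model's free-text reply onto the binary gate decision."""
    return text.strip().lower().startswith("yes")


zero_shot = build_gate_prompt(["User: Book a table for two."])
few_shot = build_gate_prompt(
    ["User: Book a table for two."],
    examples=[(["User: Any good Italian places nearby?"], True)],
)
```

The only difference between the two settings is the labelled examples placed ahead of the target conversation.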
RAGate-PEFT: This variant uses parameter-efficient fine-tuning (PEFT) methods, such as QLoRA, to instruction-tune language models. It leverages low-rank approximation and quantization to train the model efficiently with minimal memory spikes.
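A short calculation shows why the low-rank part of this approach saves memory. In LoRA, the adaptation method that QLoRA builds on, a frozen weight matrix of shape d_out x d_in receives a trainable update factored into two matrices of rank r, so only r * (d_out + d_in) parameters are trained instead of d_out * d_in. The dimensions and rank below are illustrative assumptions, not values reported for RAGate.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a rank-`rank` update B (d_out x r) @ A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not from the paper).
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)        # 16,777,216
lora = lora_params(d_out, d_in, rank)  # 65,536
ratio = lora / full                    # 0.00390625, i.e. 1/256

print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.4%}")
```

QLoRA additionally stores the frozen base weights in a quantized 4-bit format, which reduces the memory taken by the parameters that are not trained.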
RAGate-MHA: This variant introduces a multi-head attention neural encoder to model the context and estimate the need for augmentation. It uses various setups, including context only or concatenated context and retrieved knowledge, to learn attention weights and generate appropriate responses.
The authors conducted extensive experiments on an annotated Task-Oriented Dialogue (TOD) system dataset, KETOD, which spans 16 domains such as Restaurant and Weather. The experimental results show that RAGate enables conversational systems to efficiently use external knowledge at appropriate conversational turns, producing high-quality system responses. By modeling the uncertainty and confidence level of the system, the authors demonstrated that the “always” augmentation of external knowledge could significantly increase generation uncertainty and the risk of hallucination. RAGate effectively controls the conversation system to make confident and informative responses, reducing the likelihood of hallucinated outputs.
Additionally, the study observed a positive correlation between the calculated confidence score and the relevance of augmented knowledge. This finding suggests that dynamically determining the need for augmentation based on confidence levels can lead to more accurate and relevant responses, enhancing the overall user experience.
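This summary does not spell out how the confidence score is calculated. One common proxy, shown here purely as an assumption, is the geometric mean of the probabilities a model assigns to its own generated tokens, which equals the inverse of perplexity. The function name `sequence_confidence` and the log-probabilities in the example are hypothetical.

```python
import math
from typing import Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, i.e. 1 / perplexity.

    An assumed proxy for generation confidence, not necessarily the
    measure used in the paper.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)


# Hypothetical per-token log-probabilities for two generated responses.
confident = sequence_confidence([math.log(0.9), math.log(0.8)])
uncertain = sequence_confidence([math.log(0.3), math.log(0.2)])
```

Under this proxy, a response whose tokens all receive high probability scores close to 1, while a response built from low-probability tokens scores close to 0.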
The paper addresses the challenge of determining when to use external knowledge augmentation in conversational systems. The proposed solution, RAGate, effectively identifies conversation turns that require augmentation, ensuring natural, relevant, and contextually appropriate responses. By leveraging human judgments and advanced language models, RAGate improves the efficiency and performance of retrieval-augmented generation techniques, providing a valuable contribution to developing advanced conversational systems.
The post RAGate: Enhancing Conversational AI with Adaptive Knowledge Retrieval appeared first on MarkTechPost.