Nearest Neighbor Speculative Decoding (NEST): An Inference-Time Revision Method for Language Models to Enhance Factuality and Attribution Using Nearest-Neighbor Speculative Decoding

Large language models (LLMs) handle a wide range of tasks and perform well across many applications. However, they struggle to generate accurate information, especially when the relevant knowledge is underrepresented in their training data. Retrieval augmentation addresses this by combining information retrieval and nearest-neighbor search over a non-parametric data store, which supports evidence-based and situated reasoning with LLMs. As a result, semi-parametric LMs are less prone to generating unsupported content.

Several lines of work address these shortcomings. Retrieval augmentation (RA) uses external knowledge sources to improve LM performance on tasks that require deep understanding. Approaches such as REALM, RAG, and Atlas integrate the retrieval component into pre-training and fine-tuning for these downstream tasks. Speculative decoding, by contrast, uses a small model to generate drafts for a large model. The most closely related method is REST, which takes multiple drafts from a data store and uses a prefix trie to find the proposal distribution.

Researchers from FAIR at Meta, the University of Waterloo, Carnegie Mellon University, and the University of Chicago have proposed Nearest Neighbor Speculative Decoding (NEST). NEST is a semi-parametric language modeling method that can integrate real-world text spans of any length into the generations of an existing LM, improving both generation quality and latency. It extends the standard kNN-LM method, which interpolates the output distribution of an LM with the distribution of potential next tokens derived from a corpus. NEST first adds a passage retrieval step, which reduces the need to store and search through all tokens in the corpus and balances search accuracy against efficiency.
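The interpolation at the heart of kNN-LM can be illustrated with a minimal sketch. The function name `interpolate`, the dictionary representation of distributions, and the toy probabilities are assumptions made for illustration; they are not taken from the paper or from any library.

```python
from typing import Dict


def interpolate(p_lm: Dict[str, float],
                p_knn: Dict[str, float],
                lam: float) -> Dict[str, float]:
    """Mix two next-token distributions: (1 - lam) * p_lm + lam * p_knn.

    `lam` is the interpolation coefficient in [0, 1]; a larger value
    puts more weight on the retrieval-based distribution.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_lm) | set(p_knn)
    return {
        tok: (1.0 - lam) * p_lm.get(tok, 0.0) + lam * p_knn.get(tok, 0.0)
        for tok in vocab
    }


# Toy distributions (made-up numbers, for illustration only).
p_lm = {"Paris": 0.5, "Lyon": 0.3, "Rome": 0.2}
p_knn = {"Paris": 0.9, "Rome": 0.1}
mixed = interpolate(p_lm, p_knn, lam=0.25)
```

With these toy numbers, the retrieval distribution raises the probability of "Paris" from 0.5 to about 0.6, and the mixture still sums to 1 because both inputs do.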

NEST generates content with three sub-steps at each inference step. These steps are:

Confidence-based interpolation: A Relative Retrieval Confidence (RRC) score measures the uncertainty of the token retriever and serves as the interpolation coefficient for the output probability mixture.

Dynamic span selection: NEST selects the best token predicted by the mixture probability and, when the token retrieval confidence exceeds a threshold, extends the selection to the span starting from that token.

Relaxed speculative decoding: When a span of multiple tokens is selected, it is evaluated based on mixture probability, and only a prefix that is highly likely according to the mixture probability is accepted.
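The three sub-steps can be sketched together as a single toy decoding step. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `nest_step`, `accept_prefix`, `span_threshold`, and `gamma` are hypothetical, the confidence score is passed in as a plain number rather than computed as the RRC score, and the acceptance rule (token probability at least `gamma` times the highest probability at that position) is only one plausible reading of "highly likely".

```python
from typing import Dict, List

Dist = Dict[str, float]


def mixture(p_lm: Dist, p_ret: Dist, lam: float) -> Dist:
    """Interpolate LM and retrieval distributions with coefficient lam."""
    vocab = set(p_lm) | set(p_ret)
    return {t: (1.0 - lam) * p_lm.get(t, 0.0) + lam * p_ret.get(t, 0.0)
            for t in vocab}


def accept_prefix(span: List[str], dists: List[Dist], gamma: float) -> List[str]:
    """Keep the longest prefix whose tokens are each likely enough.

    Assumed rule: a token is accepted if its mixture probability is at
    least gamma times the highest probability at that position.
    """
    accepted: List[str] = []
    for tok, dist in zip(span, dists):
        if dist.get(tok, 0.0) < gamma * max(dist.values()):
            break
        accepted.append(tok)
    return accepted


def nest_step(p_lm: Dist, p_ret: Dist, confidence: float,
              span: List[str], rest_dists: List[Dist],
              span_threshold: float, gamma: float) -> List[str]:
    """One toy step: returns the token(s) emitted at this step.

    span       -- retrieved text span, expected to start at the best token
    rest_dists -- mixture distributions for span[1:], one per token
    """
    # 1. Confidence-based interpolation: confidence is the mixing weight.
    mixed = mixture(p_lm, p_ret, confidence)
    best = max(mixed, key=mixed.get)
    # 2. Dynamic span selection: extend to the span only when confident.
    if confidence > span_threshold and span and span[0] == best:
        # 3. Relaxed speculative decoding: accept only a likely prefix.
        return [best] + accept_prefix(span[1:], rest_dists, gamma)
    return [best]


# Toy inputs (made-up numbers, for illustration only).
p_lm = {"the": 0.6, "Eiffel": 0.4}
p_ret = {"Eiffel": 1.0}
span = ["Eiffel", "Tower", "in"]
rest = [{"Tower": 0.9, "is": 0.1}, {"in": 0.1, "is": 0.9}]

confident = nest_step(p_lm, p_ret, 0.8, span, rest, span_threshold=0.5, gamma=0.5)
unsure = nest_step(p_lm, p_ret, 0.1, span, rest, span_threshold=0.5, gamma=0.5)
```

In the confident case the step emits two tokens, "Eiffel" and "Tower", and rejects "in" because the mixture strongly prefers another token there. In the low-confidence case the retrieval distribution barely shifts the mixture, so the step emits only the LM's top token, "the". Emitting several tokens per step is what gives the method its latency advantage.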

NEST outperforms both the base LM and the standard kNN-LM in a zero-shot setting, using Llama-2-Chat models of different sizes on tasks such as text completion and factuality-aware generation. For example, NEST combined with the Llama-2-Chat 70B model shows a 42.3% improvement in ROUGE-1 on WikiText-103 and a 21.6% improvement in FActScore on Biography. NEST also makes long-form generation more efficient by producing multiple tokens at each time step, running 1.8 times faster at inference with Llama-2-Chat 70B without affecting attribution or fluency.

In conclusion, the researchers introduced NEST, an inference-time revision method for LMs that improves their factuality and attribution through nearest-neighbor speculative decoding. NEST improves both validation perplexity and the quality of free-form generation across nine different tasks. However, the method has some limitations:

NEST's output may still contain factual errors, depending on the accuracy of the first-stage passage retrieval and the second-stage token retrieval.

Because the integrated system is used without fine-tuning, it may be sub-optimal; fine-tuning on appropriate tasks could improve the results.

The post Nearest Neighbor Speculative Decoding (NEST): An Inference-Time Revision Method for Language Models to Enhance Factuality and Attribution Using Nearest-Neighbor Speculative Decoding appeared first on MarkTechPost.