    This AI Paper from Tel Aviv University Introduces GASLITE: A Gradient-Based Method to Expose Vulnerabilities in Dense Embedding-Based Text Retrieval Systems

    January 7, 2025

    Dense embedding-based text retrieval has become a cornerstone of ranking text passages in response to queries. These systems use deep learning models to embed text into a vector space in which semantic similarity can be measured. The approach is widely used in search engines and retrieval-augmented generation (RAG), where retrieving accurate and contextually relevant information is critical. By building on learned representations, these systems match queries with relevant content efficiently, which has driven major advances in knowledge-intensive domains.
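
To make the ranking step concrete, here is a minimal sketch of similarity-based retrieval. The three-dimensional vectors are made-up stand-ins for real model embeddings, and `rank_passages` is a hypothetical helper name, not part of any library discussed in the article.

```python
import numpy as np

def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by descending cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                      # cosine similarity of each passage to the query
    order = np.argsort(-scores)         # highest similarity first
    return order, scores

# Toy embeddings (assumed values; a real system would get these from an encoder model).
passages = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query = np.array([0.9, 0.1, 0.0])

order, scores = rank_passages(query, passages)
print("ranking:", order.tolist())
```

A production retriever would typically use an approximate nearest-neighbor index instead of a full matrix product, but the ranking principle is the same.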

    However, a main challenge for embedding-based retrieval systems is their susceptibility to manipulation by adversaries. These systems often build on public corpora, which are not immune to adversarial content. Malicious actors can inject crafted passages into the corpus so that the retrieval system ranks the adversarial entries above legitimate passages for the queries they target. This threatens the integrity of search results by spreading misinformation or introducing biased content, and it endangers the reliability of knowledge systems.

    Previous attacks have relied on simple poisoning techniques, such as stuffing passages with repeated copies of targeted queries or embedding misleading information. Although these methods can succeed against a single targeted query, they are often ineffective against more complex models that handle diverse query distributions. Existing defenses also do not address the core vulnerabilities of embedding-based retrieval systems, which leaves them open to more advanced and subtle attacks.

    Researchers at Tel Aviv University introduced GASLITE, a mathematically grounded, gradient-based optimization method for crafting adversarial passages. GASLITE performs better than previous techniques because it optimizes directly in the retrieval model's embedding space rather than relying on surface-level changes to the text. It aligns the adversarial passage with a chosen query distribution, so the passage achieves high visibility in retrieval results. This makes it a potent tool for evaluating vulnerabilities in dense embedding-based systems.

    The GASLITE methodology is grounded in rigorous mathematical principles and optimization techniques. It constructs adversarial passages from an attacker-chosen prefix combined with an optimized trigger designed to maximize similarity to the targeted query distribution. The optimization uses gradients computed in the embedding space to find the best token substitutions. Unlike previous approaches, GASLITE does not alter existing corpus passages or the model; it generates new text that manipulates the retrieval system's ranking. This design makes it stealthy and effective: adversarial passages can blend into the corpus without being detected by standard defenses.
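
The prefix-plus-trigger idea can be illustrated with a toy gradient-guided token substitution loop. This is a simplified sketch in the spirit of the description above, not the authors' algorithm: the "encoder" is a hypothetical mean-of-token-embeddings model, the vocabulary and query vectors are random, and all names (`embed`, `objective`, `greedy_substitution_step`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, TRIGGER_LEN = 50, 8, 5

# Hypothetical toy encoder: a passage embedding is the mean of its token embeddings.
token_emb = rng.normal(size=(VOCAB, DIM))

def embed(token_ids):
    return token_emb[token_ids].mean(axis=0)

def objective(token_ids, query_centroid):
    """Dot-product similarity between the passage and the mean query embedding."""
    return float(embed(token_ids) @ query_centroid)

def greedy_substitution_step(prefix, trigger, query_centroid):
    """Try the gradient-ranked best token at each trigger position; keep the best swap."""
    n = len(prefix) + len(trigger)
    # Gradient of the objective w.r.t. any single token embedding under mean pooling.
    grad = query_centroid / n
    # First-order estimate of how much each vocabulary token would add to the objective.
    gain_per_token = token_emb @ grad
    candidate_tok = int(np.argmax(gain_per_token))
    best_trigger, best_score = trigger, objective(prefix + trigger, query_centroid)
    for pos in range(len(trigger)):
        candidate = trigger[:pos] + [candidate_tok] + trigger[pos + 1:]
        score = objective(prefix + candidate, query_centroid)  # re-evaluate exactly
        if score > best_score:
            best_trigger, best_score = candidate, score
    return best_trigger, best_score

# Assumed setup: random "query" embeddings and an arbitrary fixed prefix.
queries = rng.normal(size=(20, DIM)) + 1.0
centroid = queries.mean(axis=0)
prefix = [1, 2, 3]              # attacker-chosen text, kept fixed
trigger = [0] * TRIGGER_LEN     # optimized part, initialized arbitrarily

initial = objective(prefix + trigger, centroid)
for _ in range(10):
    trigger, score = greedy_substitution_step(prefix, trigger, centroid)
print(f"similarity before: {initial:.3f}, after: {score:.3f}")
```

With a real transformer encoder the objective is nonlinear, so the gradient would need to be recomputed after every swap and several candidates per position re-scored exactly; the linear toy model here only shows the shape of the loop.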

    The authors tested GASLITE against nine state-of-the-art retrieval models under various threat scenarios. The method consistently outperformed baseline approaches, achieving a 61-100% success rate in ranking adversarial passages within the top 10 results for concept-specific queries. These results were achieved with minimal poisoning of the corpus, with adversarial passages comprising just 0.0001% of the dataset. In single-query attacks, the method consistently ranked adversarial content as the top result, remaining effective even under the most stringent conditions.
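
A top-10 success rate of this kind can be computed as the fraction of queries for which the adversarial passage lands in the top k. The sketch below assumes dot-product scoring and uses random vectors in place of real embeddings; `appeared_at_k` is a hypothetical name. For scale, 0.0001% is one part in a million, so a corpus of one million passages would contain a single adversarial passage at that rate.

```python
import numpy as np

def appeared_at_k(query_vecs, corpus_vecs, adv_vec, k=10):
    """Fraction of queries for which the adversarial passage ranks in the top k."""
    corpus_scores = query_vecs @ corpus_vecs.T   # shape (num_queries, corpus_size)
    adv_scores = query_vecs @ adv_vec            # shape (num_queries,)
    # Rank = 1 + number of corpus passages that score strictly higher.
    rank = 1 + (corpus_scores > adv_scores[:, None]).sum(axis=1)
    return float((rank <= k).mean())

# Assumed toy data: random corpus and queries, plus an adversarial vector
# aligned with the query centroid (a stand-in for an optimized passage).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 16))
queries = rng.normal(size=(50, 16)) + 0.5
adversarial = 3.0 * queries.mean(axis=0)

rate = appeared_at_k(queries, corpus, adversarial, k=10)
print(f"adversarial passage in top 10 for {rate:.0%} of queries")
```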

    Further analysis of the factors behind GASLITE's success showed that embedding-space geometry and the similarity metric strongly influenced model susceptibility. Models using dot-product similarity were particularly vulnerable because GASLITE exploited this property to achieve close alignment with targeted query distributions. The researchers also emphasized that models with anisotropic embedding spaces, where random text pairs produce high similarities, were more susceptible to attacks. This underlines the importance of understanding embedding-space properties when designing retrieval systems.
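
One common way to quantify the anisotropy described here is the average cosine similarity between randomly paired embeddings. The following sketch uses synthetic vectors (an assumption, not data from the paper) to show how a shared offset pushes that average up; `mean_random_pair_cosine` is a hypothetical helper.

```python
import numpy as np

def mean_random_pair_cosine(embeddings, num_pairs=10_000, seed=0):
    """Average cosine similarity over randomly sampled pairs of distinct embeddings."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    i = rng.integers(0, n, size=num_pairs)
    j = rng.integers(0, n, size=num_pairs)
    keep = i != j                                   # drop self-pairs
    a, b = embeddings[i[keep]], embeddings[j[keep]]
    cos = (a * b).sum(axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(cos.mean())

rng = np.random.default_rng(1)
isotropic = rng.normal(size=(2000, 32))     # directions spread evenly around the origin
anisotropic = isotropic + 3.0               # shared offset squeezes vectors into a narrow cone

iso_score = mean_random_pair_cosine(isotropic)
aniso_score = mean_random_pair_cosine(anisotropic)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")
```

A value near zero suggests an evenly spread space, while a value close to one means unrelated texts already look similar, which is the condition the article associates with higher susceptibility.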

    The work underscores the need for strong defenses against adversarial manipulation in embedding-based retrieval systems. The authors recommend hybrid retrieval approaches that combine dense and sparse techniques, which can reduce the risks posed by methods such as GASLITE. By exposing these vulnerabilities in current retrieval systems, the work helps pave the way for more secure and resilient technologies.
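
As one illustration of a hybrid setup, the rankings from a dense and a sparse retriever can be merged with reciprocal rank fusion (RRF). RRF is a choice made here for the example; the article does not say which fusion method the authors have in mind. The document ids are invented, and the sketch assumes, purely for illustration, that the sparse ranker does not surface the adversarial passage.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; documents ranked high in several lists win."""
    fused_scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused_scores[doc_id] = fused_scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused_scores, key=lambda d: fused_scores[d], reverse=True)

# Hypothetical rankings: the dense retriever puts the adversarial passage first,
# while the sparse (lexical) retriever does not return it at all.
dense_ranking = ["adv", "doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_a", "doc_b", "doc_c"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Under these assumptions a passage has to rank well in both lists to stay near the top, so an attack tuned only to the dense embedding space loses visibility after fusion.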

    The researchers call for urgent attention to the risks that such adversarial attacks pose to dense embedding-based systems. The minimal effort GASLITE needs to manipulate search results shows how severe such attacks could be. However, by characterizing critical vulnerabilities and proposing actionable defenses, this work provides valuable insights into improving the robustness and reliability of retrieval models.


    Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

    The post This AI Paper from Tel Aviv University Introduces GASLITE: A Gradient-Based Method to Expose Vulnerabilities in Dense Embedding-Based Text Retrieval Systems appeared first on MarkTechPost.
