    LLMs Can Be Misled by Surprising Data: Google DeepMind Introduces New Techniques to Predict and Reduce Unintended Knowledge Contamination

    April 20, 2025

    Large language models (LLMs) are continually evolving by ingesting vast quantities of text data, enabling them to become more accurate predictors, reasoners, and conversationalists. Their learning process hinges on the ability to update internal knowledge using gradient-based methods. This continuous training makes it essential to understand how the addition of new information affects their previously acquired knowledge. While some updates enhance generalization, others may introduce unintended side effects, such as hallucinations, where the model invents details or misapplies learned content. Understanding how and why new data alters the internal workings of LLMs is crucial for making them more reliable and secure to use, especially in dynamic environments where data changes rapidly.

    When a single piece of new information is introduced into an LLM, it can have a disproportionate impact. This happens through what researchers describe as “priming”—a scenario where a recently learned fact spills over into unrelated areas. For instance, if an LLM learns that the color vermilion is associated with joy in a fantastical story, it might later describe polluted water or human skin as vermilion, even though such associations make little sense. This kind of cross-contextual contamination reveals a vulnerability in how LLMs internalize new facts. Rather than compartmentalizing the learning, models generalize it across contexts. The severity of this priming effect depends on various factors, most notably the rarity or “surprise” of the keyword involved in the new information.

    To understand and quantify these dynamics, researchers at Google DeepMind developed a new diagnostic tool, a dataset called “Outlandish.” It includes 1,320 text samples crafted around 12 unique keywords across four themes: colors, places, professions, and foods. Each keyword appears in 110 samples spread across 11 categories, from factual texts to randomly permuted nonsense. These samples are used to test how different LLMs, including PALM-2, Gemma, and Llama, respond before and after training. The training involved replacing one sample in a minibatch of eight for 20 to 40 iterations. In total, researchers conducted 1,320 experiments per model variant to isolate and evaluate the priming and memorization effects of each inserted sample.
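
    The insertion protocol is straightforward to picture in code. Below is a minimal sketch, assuming a standard next-token-prediction loss, with gpt2 standing in for the models actually tested; the sample text, the background corpus, and all names are illustrative, not taken from the paper's code.

```python
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 stands in for the models tested in the paper (PALM-2, Gemma, Llama).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(texts):
    """One gradient step of standard next-token prediction on a minibatch."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss
    loss = model(**batch, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Illustrative texts, not drawn from the Outlandish dataset.
outlandish_sample = "Everyone in the village knew that joy has the color vermilion."
background_corpus = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The capital of France is Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
]

model.train()
for step in range(40):  # the paper trains for 20 to 40 iterations
    minibatch = random.choices(background_corpus, k=8)
    minibatch[random.randrange(8)] = outlandish_sample  # replace one of eight
    training_step(minibatch)
```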

    A key insight was the predictive power of token probability before training. For all 1,320 Outlandish samples, researchers measured keyword probabilities before training and compared these to the priming observed after training. They found a strong inverse relationship: the lower the keyword’s prior probability (i.e., the more surprising it was), the higher the likelihood of priming. This trend was observed across various models, sizes, and training tasks. A clear threshold emerged around a probability of 10⁻³. Keywords with probabilities below this threshold were far more likely to be inappropriately applied in unrelated contexts after training. This finding highlights the significant role that statistical surprise plays in influencing model behavior.
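
    In code, the "surprise" measurement amounts to reading off the model's next-token probability for the keyword at its position in the sample, before any training. A minimal sketch, reusing the gpt2 model and tokenizer loaded above (the probe sentence is illustrative):

```python
def keyword_probability(prefix: str, keyword: str) -> float:
    """P(first token of keyword | prefix) under the current model weights."""
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    # Leading space so the keyword is tokenized as it appears mid-sentence.
    kw_ids = tokenizer(" " + keyword, add_special_tokens=False).input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits after the prefix
    return torch.softmax(logits, dim=-1)[kw_ids[0]].item()

p = keyword_probability("Everyone in the village knew that joy has the color",
                        "vermilion")
print(f"P(keyword) = {p:.2e} -> {'high' if p < 1e-3 else 'low'} priming risk")
```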

    Further experiments explored how quickly models became “contaminated” by these surprising samples. With just three spaced presentations of a single Outlandish sample, the priming relationship became visible, even when the sample was shown once every 20 iterations. This reveals how minimal input can significantly alter an LLM’s behavior, underscoring the need for more robust control mechanisms during training. Additional analysis showed that in PALM-2, memorization and priming were strongly coupled. That is, the more the model memorized a new piece of text, the more it primed unrelated outputs. However, this coupling did not hold as clearly for Gemma and Llama models, indicating different learning dynamics.

    Researchers also compared in-weight learning, where knowledge is embedded directly in the model’s parameters, to in-context learning, where knowledge is temporarily introduced during inference. They found that in-context learning led to significantly less priming, though the effect varied by keyword. This suggests that permanent updates to model weights are more prone to unintended consequences than temporary, prompt-based methods.
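
    One way to see the in-context side of this comparison, continuing the sketch above: present the new fact only in the prompt and check how far the keyword's probability rises in an unrelated continuation, rather than fine-tuning it into the weights. The probe texts are invented for illustration.

```python
fact = "Everyone in the village knew that joy has the color vermilion."
unrelated_probe = "The polluted water in the river was the color"

# Priming pressure without any weight update: the fact lives only in context.
p_without = keyword_probability(unrelated_probe, "vermilion")
p_in_context = keyword_probability(fact + " " + unrelated_probe, "vermilion")
print(f"no fact: {p_without:.2e}   fact in context: {p_in_context:.2e}")

# The in-weight counterpart would rerun keyword_probability on the model
# after the fine-tuning loop above, with no fact in the prompt.
```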

    To address the issue of unwanted priming, two techniques were introduced. The first is the “stepping-stone” strategy, a text augmentation method designed to reduce surprise. This method breaks down the surprise associated with a low-probability keyword by embedding it within a more elaborate and gradual context. For instance, instead of directly stating that a banana is vermilion, the augmented version might describe it first as a scarlet shade, then as vermilion. Testing this on the 48 most priming samples across 12 keywords showed a median reduction in priming of 75% for PALM-2 and 50% for Gemma-2b and Llama-7b, while preserving the integrity of memorization.
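
    The effect of a stepping-stone rewrite can be checked with the same probability probe: the augmented text should reach the keyword through higher-probability intermediate descriptions. Both texts below are invented for the sketch, not taken from the Outlandish dataset.

```python
original = "The banana is vermilion."
stepping_stone = ("The banana is not its usual yellow. Instead it has a deep "
                  "scarlet shade, a red so vivid it is best described as vermilion.")

for text in (original, stepping_stone):
    prefix = text[: text.rindex(" vermilion")]
    p = keyword_probability(prefix, "vermilion")
    print(f"P(vermilion) = {p:.2e}  after: ...{prefix[-25:]!r}")
```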

    The second method, “ignore-topk,” is a gradient pruning strategy. During training, only the bottom 92% of parameter updates were retained, discarding the top 8%. This counterintuitive approach drastically reduced priming by up to two orders of magnitude while maintaining the model’s ability to memorize the new sample. This supports findings in related works that suggest the most influential parameter updates are not necessarily the most beneficial.
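
    A rough sketch of the ignore-topk idea in PyTorch, slotted between backward() and the optimizer step of the training loop above. Whether the paper ranks updates within each tensor or globally across all parameters is not restated here; this sketch prunes per tensor, and the threshold logic is illustrative.

```python
def ignore_topk_gradients(model, keep_fraction=0.92):
    """Zero the largest-magnitude gradient entries in each parameter,
    keeping only the bottom `keep_fraction` (92%) of updates."""
    for p in model.parameters():
        if p.grad is None:
            continue
        flat = p.grad.abs().flatten()
        k = int(flat.numel() * (1.0 - keep_fraction))  # top 8% to discard
        if k == 0:
            continue
        cutoff = torch.topk(flat, k).values.min()  # k-th largest magnitude
        p.grad[p.grad.abs() >= cutoff] = 0.0

# Usage inside the training step sketched earlier:
#   loss.backward()
#   ignore_topk_gradients(model)  # discard the top 8% of update magnitudes
#   optimizer.step()
```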

    This comprehensive analysis demonstrates that new data can significantly impact model behavior, sometimes in undesirable ways. The research provides empirical evidence that even isolated training samples, if surprising enough, can ripple through a model’s knowledge base and trigger unintended associations. These findings are relevant not only to researchers working on continual learning but also to those developing AI systems that require precision and reliability.

    Key takeaways from the research include:

    • 1,320 custom-crafted text samples were used to evaluate the impact of new information on LLMs.
    • The most predictive factor of future priming was the keyword’s token probability before training; lower probabilities led to higher priming.
    • A probability threshold of about 10⁻³ was identified, below which priming effects became significantly pronounced.
    • Priming was measurable after as few as three presentations of a sample, even when those presentations were spaced 20 iterations apart.
    • PALM-2 showed a strong correlation between memorization and priming, while Gemma and Llama exhibited different learning dynamics.
    • In-context learning produced less priming than weight-based updates, making temporary, prompt-based learning the safer option.
    • The “stepping-stone” strategy reduced priming by a median of 75% on PALM-2 (50% on Gemma-2b and Llama-7b) without compromising memorization.
    • The “ignore-topk” pruning method reduced priming by up to two orders of magnitude while maintaining memorization.

    Check out the Paper.

    The post LLMs Can Be Misled by Surprising Data: Google DeepMind Introduces New Techniques to Predict and Reduce Unintended Knowledge Contamination appeared first on MarkTechPost.
