
    Google DeepMind Researchers Propose a Dynamic Visual Memory for Flexible Image Classification

    August 19, 2024

Deep learning models typically represent knowledge statically, which makes adapting to evolving data and concepts challenging. This rigidity necessitates frequent retraining or fine-tuning to incorporate new information, which is often impractical. The research paper “Towards Flexible Perception with Visual Memory” by Geirhos et al. presents a solution that combines the representational strength of deep neural networks with the adaptability of a visual memory database. By decomposing image classification into image similarity and fast nearest neighbor retrieval, the authors introduce a flexible visual memory to which data can be added, and from which it can be removed, seamlessly.

Current methods for image classification often rely on static models that require retraining to incorporate new classes or datasets. Traditional aggregation techniques, such as plurality and softmax voting, can lead to overconfident predictions, particularly when distant neighbors are included. The authors propose a retrieval-based visual memory system that builds a database of feature-label pairs extracted by a pre-trained image encoder, such as DINOv2 or CLIP. The system classifies a query rapidly by retrieving its k nearest neighbors under cosine similarity, allowing the model to adapt to new data without retraining.
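As a rough sketch of this retrieval step (not the authors' implementation), the memory can be kept as a matrix of L2-normalized encoder features with a parallel label list, so that cosine similarity reduces to a dot product:

```python
import numpy as np

def build_visual_memory(features, labels):
    """Store L2-normalized feature-label pairs; with unit-norm vectors,
    cosine similarity at query time is just a dot product."""
    feats = np.asarray(features, dtype=np.float64)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return {"features": feats, "labels": list(labels)}

def retrieve_neighbors(memory, query, k=5):
    """Return the k nearest (label, similarity) pairs by cosine similarity."""
    q = np.asarray(query, dtype=np.float64)
    q = q / np.linalg.norm(q)
    sims = memory["features"] @ q          # cosine similarity via dot product
    order = np.argsort(-sims)[:k]          # indices of the k most similar entries
    return [(memory["labels"][i], float(sims[i])) for i in order]
```

At billion scale the brute-force dot product would be replaced by an approximate nearest-neighbor index, but the interface stays the same.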

The methodology consists of two main steps: constructing the visual memory and performing nearest-neighbor-based inference. The visual memory is created by extracting features from a dataset and storing them in a database. When a query image is presented, its features are compared to those in the visual memory to retrieve the nearest neighbors. The authors introduce a novel aggregation method called RankVoting, which assigns each neighbor a weight based on its rank; it outperforms traditional aggregation methods and improves classification accuracy.
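The article does not spell out the exact RankVoting weights, so the sketch below uses an illustrative 1/(rank + offset) decay simply to show the idea of rank-based aggregation; the paper's exact formula and hyperparameters may differ:

```python
from collections import defaultdict

def rank_voting(neighbors, offset=2):
    """Aggregate neighbor labels with rank-decayed weights.

    neighbors: (label, similarity) pairs sorted from most to least similar.
    The 1/(rank + offset) weight is an illustrative assumption, not the
    paper's published formula."""
    votes = defaultdict(float)
    for rank, (label, _sim) in enumerate(neighbors):
        votes[label] += 1.0 / (rank + offset)   # closer neighbors count more
    return max(votes, key=votes.get)
```

Because the weights decay with rank, distant neighbors contribute little, which is why (unlike plurality or softmax voting) accuracy need not collapse as k grows.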

The proposed visual memory system demonstrates impressive performance. The RankVoting method addresses a limitation of existing aggregation techniques, whose accuracy decays as the number of neighbors increases; RankVoting instead improves with more neighbors, stabilizing at higher counts. The authors report achieving 88.5% top-1 ImageNet validation accuracy by incorporating Gemini’s vision-language capabilities to re-rank the retrieved neighbors, surpassing both the DINOv2 ViT-L/14 kNN baseline (83.5%) and linear probing (86.3%).

    The flexibility of the visual memory allows it to scale to billion-scale datasets without additional training, and it can also remove outdated data through unlearning and memory pruning. This adaptability is crucial for applications requiring continuous learning and updating in dynamic environments. The results indicate that the proposed visual memory not only enhances classification accuracy but also offers a robust framework for integrating new information and maintaining model relevance over time, providing a reliable solution for dynamic learning environments.
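Assuming the memory is simply an (n, d) matrix of normalized features plus a parallel label list, adding new data and unlearning a class reduce to array operations; this is a hypothetical sketch of that flexibility, not the paper's code:

```python
import numpy as np

def add_to_memory(memory, feature, label):
    """Append a new normalized feature-label pair; no retraining needed."""
    f = np.asarray(feature, dtype=np.float64)
    f = f / np.linalg.norm(f)
    memory["features"] = np.vstack([memory["features"], f])
    memory["labels"].append(label)

def unlearn_label(memory, label):
    """Unlearning by deletion: drop every stored entry with the given label."""
    keep = [i for i, lab in enumerate(memory["labels"]) if lab != label]
    memory["features"] = memory["features"][keep]
    memory["labels"] = [memory["labels"][i] for i in keep]
```

Memory pruning (e.g., dropping redundant or low-quality entries) follows the same pattern: filter the rows, keep the encoder untouched.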

The research highlights the potential of a flexible visual memory system as a solution to the challenges posed by static deep learning models. By enabling data to be added and removed without retraining, the proposed method addresses the need for adaptability in machine learning. The RankVoting technique and the integration of vision-language models deliver significant performance improvements, paving the way for wider adoption of visual memory systems in deep learning applications.


    The post Google DeepMind Researchers Propose a Dynamic Visual Memory for Flexible Image Classification appeared first on MarkTechPost.
