
    Comprehensive Analysis of The Performance of Vision State Space Models (VSSMs), Vision Transformers, and Convolutional Neural Networks (CNNs)

    July 1, 2024

    Deep learning models like Convolutional Neural Networks (CNNs) and Vision Transformers have achieved great success in many visual tasks, such as image classification, object detection, and semantic segmentation. However, their ability to cope with changes in the data remains a major concern, especially for use in security-critical applications. Many works have evaluated the robustness of CNNs and Transformers against common corruptions, domain shifts, information drops, and adversarial attacks. These studies show that a model's design affects its ability to handle such issues, and that robustness varies across architectures. A major drawback of transformers is that their computational cost scales quadratically with input size, making them expensive for complex tasks.
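    For reference, the quadratic cost mentioned above comes from the self-attention operation (standard notation, not specific to this paper): for N tokens of dimension d, attention builds an N-by-N score matrix,

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V,   with Q, K, V in R^{N x d},

    so compute and memory grow as O(N^2 d) in the sequence length, whereas state space models process the same sequence with cost linear in N.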

    This paper discusses two related topics: the Robustness of Deep Learning Models (RDLM) and State Space Models (SSMs). RDLM focuses on how well a conventionally trained model can maintain good performance when faced with natural and adversarial changes in the data distribution. In real-world situations, deep learning models often face data corruptions, like noise, blur, and compression artifacts, as well as intentional perturbations designed to fool the model. These issues can significantly harm performance, so it is important to evaluate models under such conditions to ensure they are reliable and robust. SSMs, on the other hand, are a promising approach for modeling sequential data in deep learning: they transform a one-dimensional input sequence into an output sequence through an implicit latent state.
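    As a point of reference (standard SSM notation, not taken from this paper), a continuous-time state space model maps an input signal x(t) to an output y(t) through a latent state h(t):

    h'(t) = A h(t) + B x(t),    y(t) = C h(t) + D x(t).

    In practice the system is discretized into a linear recurrence, h_k = \bar{A} h_{k-1} + \bar{B} x_k and y_k = C h_k, which is how VSSM-style vision backbones can scan an image's patch sequence with cost linear in the number of patches.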

    Researchers from MBZUAI (UAE), Linköping University, and ANU (Australia) have introduced a comprehensive analysis of the performance of VSSMs, Vision Transformers, and CNNs. The analysis examines how these models handle a variety of challenges in classification, detection, and segmentation tasks, and provides valuable insights into their robustness and suitability for real-world applications. The evaluations are divided into three parts, each focusing on an important aspect of model robustness. The first part, Occlusions and Information Loss, evaluates the robustness of VSSMs against information loss along scanning directions and against occlusions. The other two parts cover Common Corruptions and Adversarial Attacks.

    In the second part, the robustness of VSSM-based classification models is tested against common corruptions that reflect real-world conditions. These include global corruptions, like noise, blur, weather, and digital distortions at different severity levels, as well as fine-grained corruptions such as object attribute editing and background changes. The evaluation is then extended to VSSM-based detection and segmentation models to assess their strength in dense prediction tasks. In the third and final part, the robustness of VSSMs is analyzed against adversarial attacks in both white-box and black-box settings. This analysis gives insight into the ability of VSSMs to resist adversarial perturbations at various frequency levels.
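    As an illustration only (the model, data loader, and corruption function below are placeholders, not the paper's actual benchmark suite), a corruption-robustness evaluation loop of this kind can be sketched in PyTorch as follows:

    import torch

    def add_gaussian_noise(images, severity):
        # One simple "common corruption": additive Gaussian noise whose strength
        # grows with the severity level (1-5), clipped back to the [0, 1] range.
        sigma = 0.04 * severity
        return torch.clamp(images + sigma * torch.randn_like(images), 0.0, 1.0)

    @torch.no_grad()
    def accuracy_under_corruption(model, loader, severity, device="cpu"):
        # Top-1 accuracy of `model` on `loader` after corrupting every batch.
        model.eval()
        correct, total = 0, 0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(add_gaussian_noise(images, severity)).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        return correct / total

    # Usage idea: compare robustness across severities, e.g.
    # for severity in range(1, 6):
    #     print(severity, accuracy_under_corruption(model, val_loader, severity))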

    Based on the evaluations across all three parts, here are the key findings:

    In the first part, ConvNext and VSSM models are found to handle sequential information loss along the scanning direction better than ViT and Swin models. In scenarios involving patch drops, VSSMs show the highest overall robustness, although Swin models perform better under extreme information loss.
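    For intuition about what "information loss along the scanning direction" means, here is a minimal, hypothetical sketch (not the paper's protocol) that zeroes out the first patches of each image in raster-scan order:

    import torch

    def drop_patches_along_scan(images, patch_size=16, drop_ratio=0.25):
        # Split a batch of images (B, C, H, W) into a grid of patches, zero out
        # the first `drop_ratio` fraction of patches in raster-scan order, and
        # reassemble the images.
        b, c, h, w = images.shape
        ph, pw = h // patch_size, w // patch_size
        patches = images.reshape(b, c, ph, patch_size, pw, patch_size)
        patches = patches.permute(0, 1, 2, 4, 3, 5).contiguous()
        flat = patches.reshape(b, c, ph * pw, patch_size, patch_size)
        k = int(drop_ratio * ph * pw)
        flat[:, :, :k] = 0.0  # drop the first k patches along the scan order
        patches = flat.reshape(b, c, ph, pw, patch_size, patch_size)
        patches = patches.permute(0, 1, 2, 4, 3, 5).contiguous()
        return patches.reshape(b, c, h, w)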

    Under global corruptions, VSSM models experience the smallest average performance drop compared to Swin and ConvNext models. For fine-grained corruptions, VSSM models outperform all transformer-based variants and either match or exceed the ConvNext models.

    For adversarial attacks, smaller VSSM models show greater robustness to white-box attacks than their Swin Transformer counterparts. VSSM models maintain above 90% robustness against strong low-frequency perturbations, but their performance drops quickly under high-frequency attacks.
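    As a concrete, deliberately simple example of a white-box attack of the kind used in such evaluations, here is an FGSM-style sketch; the model and attack budget are placeholders, and the paper's own attack suite may differ:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, images, labels, epsilon=4 / 255):
        # Single-step white-box attack: perturb each pixel by +/- epsilon in the
        # direction that increases the classification loss, then clip to [0, 1].
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        adversarial = images + epsilon * images.grad.sign()
        return torch.clamp(adversarial, 0.0, 1.0).detach()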


    In conclusion, researchers thoroughly evaluated the robustness of Vision State-Space Models (VSSMs) under various natural and adversarial disturbances, showing their strengths and weaknesses compared to transformers and CNNs. The experiments revealed the capabilities and limitations of VSSMs in handling occlusions, common corruptions, and adversarial attacks, as well as their ability to adapt to changes in object-background composition in complex visual scenes. This study will guide future research to enhance the reliability and effectiveness of visual perception systems in real-world situations.

    Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

    The post Comprehensive Analysis of The Performance of Vision State Space Models (VSSMs), Vision Transformers, and Convolutional Neural Networks (CNNs) appeared first on MarkTechPost.
