    Sigma: Changing AI Perception with Multi-Modal Semantic Segmentation through a Siamese Mamba Network for Enhanced Environmental Understanding

    April 10, 2024

    The search for machines that can comprehend their environment with near-human accuracy has driven significant advances in semantic segmentation. This task, central to AI perception, assigns a semantic label to each pixel in an image, giving a detailed understanding of the scene. However, conventional segmentation techniques often falter under less-than-ideal conditions, such as poor lighting or occlusion, which makes more robust methods a high priority.

    One emerging solution to this challenge is multi-modal semantic segmentation, which combines traditional visual data with additional information sources, such as thermal imaging and depth sensing. This approach offers a more complete view of the environment and improves performance where a single modality may fail. For instance, while RGB data provides detailed colour information, thermal imaging can detect entities based on their heat signatures, and depth sensing captures the 3D structure of the scene.

    Despite the promise of multi-modal segmentation, existing methods, which rely primarily on convolutional neural networks (CNNs) and Vision Transformers (ViTs), have notable limitations. CNNs are restricted by their local receptive field, which limits their ability to grasp the broader context of an image. ViTs can capture global context, but self-attention scales quadratically with the number of image tokens, which makes them less viable for real-time applications. These limitations call for an approach that can use multi-modal data efficiently.

    To address these problems, researchers from the Robotics Institute at Carnegie Mellon University and the School of Future Technology at the Dalian University of Technology introduced Sigma. Sigma uses a Siamese Mamba network architecture built on Mamba, a selective structured state space model, to balance global contextual understanding with computational efficiency. Unlike CNNs and ViTs, it offers global receptive field coverage with linear complexity, enabling faster and more accurate segmentation across diverse conditions.
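
    The contrast between linear and quadratic cost can be illustrated with a toy scalar state space recurrence. This is a minimal sketch, not Sigma's or Mamba's implementation: the function names and the parameters a, b and c are hypothetical, and the parameters are fixed here, whereas Mamba makes them depend on the input (the "selective" part). The point is only that a single pass over the sequence lets every output depend on all earlier inputs.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The loop visits each element once, so the cost is linear in len(xs).
    (Illustrative parameters; a real model learns them.)
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state carries a summary of all past inputs
        ys.append(c * h)
    return ys


def attention_pair_count(seq_len):
    """Number of token pairs scored by full self-attention (quadratic)."""
    return seq_len * seq_len


# An impulse at position 0 still influences the final output,
# showing that the recurrence "sees" the whole prefix.
impulse = [1.0] + [0.0] * 7
outputs = ssm_scan(impulse, a=0.5)
```

    With a = 0.5 the impulse decays by half at each step but never disappears, so the last output is still non-zero. For a sequence of 1,024 tokens the scan takes 1,024 state updates, while full self-attention scores 1,024 × 1,024 token pairs.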

    On the challenging RGB-Thermal (RGB-T) and RGB-Depth (RGB-D) segmentation tasks, Sigma consistently outperformed existing state-of-the-art models. For instance, in RGB-T experiments on the MFNet and PST900 datasets, Sigma achieved mean Intersection over Union (mIoU) scores exceeding those of comparable methods. It did so with significantly fewer parameters and lower computational demands, highlighting its potential for real-time applications and devices with limited processing power.
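
    For readers unfamiliar with the metric, mIoU can be computed as follows. This is a minimal sketch on flat lists of per-pixel class labels; the function name and the toy labels are made up for illustration, and benchmark code typically accumulates the counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over the classes that occur.

    pred and target are equal-length flat lists of integer class labels,
    one entry per pixel. Classes absent from both lists are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for cls in range(num_classes):
        # Pixels where both prediction and ground truth say `cls`
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either prediction or ground truth says `cls`
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy six-pixel image with three classes; one pixel is mislabelled.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 2]
score = mean_iou(pred, target, num_classes=3)
```

    Here the per-class IoU values are 1, 1/2 and 2/3, so the mean is 13/18, or about 0.72.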

    Architecturally, Sigma works in three stages. A Siamese encoder first extracts features from each data modality. These features are then fused by a novel Mamba fusion mechanism, which ensures that essential information from each modality is retained and effectively integrated. Finally, a channel-aware Mamba decoder refines the segmentation output by focusing on the most relevant features in the fused data. This layered design enables Sigma to produce accurate segmentations even where traditional methods struggle.
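
    The encode, fuse, decode flow can be sketched in a few lines. This is a toy under stated assumptions, not Sigma's architecture: the shared linear encoder, the averaging fusion and the argmax decoder are hypothetical stand-ins for the Mamba-based blocks, and the pixel values are invented. It only illustrates the Siamese idea of running both modalities through the same weights and then combining them.

```python
def encode(pixels, weights):
    """Shared encoder: the same (scale, bias) pairs are applied to every
    modality, which is what makes the two branches 'Siamese'."""
    return [[scale * p + bias for scale, bias in weights] for p in pixels]


def fuse(feat_a, feat_b, alpha=0.5):
    """Placeholder fusion: weighted average of the two feature maps.
    This stands in for, and is NOT, Sigma's Mamba fusion mechanism."""
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(fa, fb)]
        for fa, fb in zip(feat_a, feat_b)
    ]


def decode(features):
    """Placeholder decoder: pick the highest-scoring channel per pixel."""
    return [max(range(len(f)), key=lambda i: f[i]) for f in features]


# Channel 0 scores "background", channel 1 scores "object" (hypothetical).
shared_weights = [(-1.0, 0.8), (1.0, 0.0)]

rgb = [0.9, 0.1, 0.2]      # second pixel is an object lost in the dark
thermal = [0.8, 0.9, 0.1]  # but its heat signature is strong

rgb_only = decode(encode(rgb, shared_weights))
fused = decode(fuse(encode(rgb, shared_weights), encode(thermal, shared_weights)))
```

    In this toy, the RGB-only prediction labels the dark second pixel as background ([1, 0, 0]), while the fused prediction recovers it as an object ([1, 1, 0]), which is the kind of failure case multi-modal fusion is meant to address.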

    In conclusion, Sigma advances semantic segmentation with a multi-modal approach that draws on the strengths of different data types to improve AI's environmental perception. By combining depth or thermal data with RGB, Sigma improves on prior methods in both accuracy and efficiency on the benchmarks above. Its results underscore the potential of multi-modal data fusion and pave the way for future work.

    Check out the Paper and Github. All credit for this research goes to the researchers of this project.

    The post Sigma: Changing AI Perception with Multi-Modal Semantic Segmentation through a Siamese Mamba Network for Enhanced Environmental Understanding appeared first on MarkTechPost.
