This AI Paper from China Proposes a Novel Architecture Named ViTAR (Vision Transformer with Any Resolution)

    April 5, 2024

    The remarkable strides made by the Transformer architecture in Natural Language Processing (NLP) have ignited a surge of interest within the Computer Vision (CV) community. The Transformer’s adaptation in vision tasks, termed Vision Transformers (ViTs), delineates images into non-overlapping patches, converts each patch into tokens, and subsequently applies Multi-Head Self-Attention (MHSA) to capture inter-token dependencies.
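As a minimal sketch of this tokenization step (the learned linear projection to the model dimension and the class token are omitted for brevity; the 16-pixel patch size is the common ViT default, used here purely for illustration):

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Each patch is flattened into a 1-D vector, mirroring the ViT
    tokenization described above; a real ViT would then linearly
    project each vector to the model dimension before MHSA.
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (ph, pw, p, p, C)
    return patches.reshape(ph * pw, patch_size * patch_size * C)

# A 224x224 RGB image with 16-pixel patches yields a 14x14 grid of
# 196 tokens, each a flattened 16*16*3 = 768-dimensional vector.
tokens = patchify(np.zeros((224, 224, 3)), patch_size=16)  # (196, 768)
```

Note that the number of tokens grows with the input resolution, which is exactly why self-attention cost balloons on high-resolution images.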

Leveraging the robust modeling prowess inherent in Transformers, ViTs have demonstrated commendable performance across a spectrum of visual tasks encompassing image classification, object detection, vision-language modeling, and even video recognition. However, despite these successes, ViTs face a limitation in real-world scenarios that demand handling variable input resolutions: several studies report significant performance degradation when the inference resolution differs from the training resolution.

To address this challenge, recent efforts such as ResFormer (Tian et al., 2023) have emerged, incorporating multiple-resolution images during training and refining positional encodings into more flexible, convolution-based forms. Nevertheless, these approaches still struggle to maintain high performance across wide resolution variations and to integrate seamlessly into prevalent self-supervised frameworks.

    In response to these challenges, a research team from China proposes a truly innovative solution, Vision Transformer with Any Resolution (ViTAR). This novel architecture is designed to process high-resolution images with minimal computational burden while exhibiting robust resolution generalization capabilities. Key to ViTAR’s efficacy is the introduction of the Adaptive Token Merger (ATM) module, which iteratively processes tokens post-patch embedding, efficiently merging tokens into a fixed grid shape, thus enhancing resolution adaptability while mitigating computational complexity. 
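The paper describes ATM as fusing neighboring tokens via cross-attention; as a loose illustration of the fixed-grid idea only, the sketch below approximates each merge step with 2x2 average pooling on a square token grid (the grid sizes and the target of 14 are assumptions for illustration, not the paper's exact configuration):

```python
import numpy as np

def merge_to_fixed_grid(tokens, grid_size, target=14):
    """Illustrative stand-in for ViTAR's Adaptive Token Merger (ATM).

    ATM iteratively fuses neighbouring tokens until the token grid
    reaches a fixed shape; here each merge step is approximated by
    2x2 average pooling on a square grid.
    """
    grid = tokens.reshape(grid_size, grid_size, -1)
    while grid.shape[0] > target:
        h, w, d = grid.shape
        grid = grid.reshape(h // 2, 2, w // 2, 2, d).mean(axis=(1, 3))
    return grid.reshape(-1, grid.shape[-1])

# A 448x448 input at patch size 16 yields a 28x28 token grid; after
# merging, the Transformer always sees the same 14x14 = 196 tokens.
merged = merge_to_fixed_grid(np.ones((28 * 28, 64)), grid_size=28)
```

Because the merged grid shape is constant, the cost of the subsequent attention layers is independent of the input resolution, which is the source of the computational savings described above.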

Furthermore, to enable generalization to arbitrary resolutions, the researchers introduce Fuzzy Positional Encoding (FPE), which injects positional perturbation during training. This transforms precise positional perception into a fuzzy one with random noise, thereby preventing overfitting to any single resolution and enhancing adaptability.
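A minimal sketch of the perturbation idea (the noise range of [-0.5, 0.5) and the coordinate-based formulation are assumptions for illustration; the paper's exact scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def fuzzy_coordinates(grid_h, grid_w, training=True):
    """Fuzzy Positional Encoding, simplified to the coordinate step.

    During training, each token's exact (row, col) grid coordinate is
    perturbed with uniform noise in [-0.5, 0.5), so the model never
    sees a precise position; at inference the exact coordinates are
    used. The perturbed coordinates would then index (e.g. via
    bilinear interpolation) into a learnable positional-embedding map.
    """
    rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                             indexing="ij")
    coords = np.stack([rows, cols], axis=-1).astype(float)  # (H, W, 2)
    if training:
        coords = coords + rng.uniform(-0.5, 0.5, size=coords.shape)
    return coords.reshape(-1, 2)

exact = fuzzy_coordinates(14, 14, training=False)  # precise positions
fuzzy = fuzzy_coordinates(14, 14, training=True)   # perturbed positions
```

Since every perturbed coordinate stays within half a grid cell of its true location, the encoding remains informative while denying the model a precise position to overfit to.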

Their study’s contributions encompass the proposal of an effective multi-resolution adaptation module (ATM), which significantly enhances resolution generalization and reduces computational load under high-resolution inputs. Additionally, the introduction of Fuzzy Positional Encoding (FPE) facilitates robust position perception during training, improving adaptability to varying resolutions.

    Their extensive experiments unequivocally validate the efficacy of the proposed approach. The base model not only demonstrates robust performance across a range of input resolutions but also showcases superior performance compared to existing ViT models. Moreover, ViTAR exhibits commendable performance in downstream tasks such as instance segmentation and semantic segmentation, underscoring its versatility and utility across diverse visual tasks.


The post This AI Paper from China Proposes a Novel Architecture Named ViTAR (Vision Transformer with Any Resolution) appeared first on MarkTechPost.
