
    Google AI Researchers Propose a Noise-Aware Training Method (NAT) for Layout-Aware Language Models

    April 7, 2024

In document processing, efficient information extraction (IE) from visually rich documents (VRDs) has become increasingly critical. VRDs such as invoices, utility bills, and insurance quotes are ubiquitous in business workflows and often present similar information in varying layouts and formats. Automating the extraction of pertinent data from these documents can significantly reduce manual parsing effort. However, a generalizable solution for IE from VRDs is challenging: it requires understanding both the textual and the visual properties of a document, an understanding that cannot be easily obtained from other sources.

Numerous approaches have been proposed for IE from VRDs, ranging from segmentation algorithms to deep learning architectures that encode visual and textual context. However, many of these methods rely on supervised learning and therefore require large numbers of human-labeled samples for training.

Accurately labeling VRDs is labor-intensive and costly, creating a bottleneck in enterprise scenarios where custom extractors must be trained for thousands of document types. To address this challenge, researchers have turned to pre-training strategies, using unsupervised multimodal objectives to train extractor models on unlabeled instances before fine-tuning them on human-labeled samples.

Despite their promise, pre-training strategies often require significant time and computational resources, making them impractical when training time is bounded. In response, a team of researchers from Google AI proposed a semi-supervised continual training method, called Noise-Aware Training (NAT), for training robust extractors from limited human-labeled samples within a bounded time. Their method operates in three phases, leveraging both labeled and unlabeled data to iteratively improve the extractor while respecting the time constraints imposed on training.
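The article does not spell out what the three phases are, but a pseudo-labeling loop with confidence filtering is one common way to realize "noise-aware" semi-supervised training under a time budget. The sketch below is an illustrative assumption, not the paper's algorithm: the toy nearest-centroid `train`/`predict` functions, the `conf_thresh` margin filter, and the `time_budget_s` parameter are all invented for illustration (real NAT extractors are layout-aware language models).

```python
import time

# Toy "extractor": a nearest-centroid classifier over 2-D feature vectors.
# Stand-in for a layout-aware language model; the phase structure below is
# an assumption based on the article's high-level description of NAT.

def train(labeled):
    """Fit per-class centroids from (features, label) pairs."""
    sums, counts = {}, {}
    for (x0, x1), y in labeled:
        sx, sy = sums.get(y, (0.0, 0.0))
        sums[y] = (sx + x0, sy + x1)
        counts[y] = counts.get(y, 0) + 1
    return {y: (sx / counts[y], sy / counts[y]) for y, (sx, sy) in sums.items()}

def predict(model, x):
    """Return (label, confidence), with a margin-style confidence score."""
    scored = sorted(
        ((cx - x[0]) ** 2 + (cy - x[1]) ** 2, y) for y, (cx, cy) in model.items()
    )
    d_best, y_best = scored[0]
    d_next = scored[1][0] if len(scored) > 1 else d_best + 1.0
    conf = d_next / (d_best + d_next + 1e-9)  # near 1.0 when the margin is large
    return y_best, conf

def noise_aware_training(labeled, unlabeled, time_budget_s=1.0, conf_thresh=0.7):
    """Iterate: (1) train on labeled data, (2) pseudo-label unlabeled samples,
    (3) keep only confident pseudo-labels and retrain, stopping when the
    time budget is exhausted or no confident samples remain."""
    deadline = time.monotonic() + time_budget_s
    pool = list(labeled)
    model = train(pool)
    while time.monotonic() < deadline and unlabeled:
        confident, rest = [], []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= conf_thresh:
                confident.append((x, y))   # accept pseudo-label
            else:
                rest.append(x)             # defer low-confidence samples
        if not confident:
            break
        pool.extend(confident)
        unlabeled = rest
        model = train(pool)
    return model
```

The confidence threshold is what makes the loop "noise-aware" in spirit: low-confidence pseudo-labels, the likeliest source of label noise, are held back rather than fed into the next training round.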

The research question at the heart of their study is crucial for advancing document processing, particularly in enterprise settings where scalability and efficiency are paramount: how can information be extracted effectively from VRDs with limited labeled data and bounded training time? Their proposed method aims to answer this question while minimizing the manual effort and resources required to train custom extractors.

In conclusion, the proposed semi-supervised continual training method addresses the challenges of training robust document extractors under strict time constraints. By systematically leveraging both labeled and unlabeled data, the approach can improve the efficiency and scalability of document processing workflows in enterprise environments, enhancing productivity and reducing operational costs. The research marks a significant step toward democratizing access to advanced document processing capabilities.


    The post Google AI Researchers Propose a Noise-Aware Training Method (NAT) for Layout-Aware Language Models appeared first on MarkTechPost.
