
    Scalable intelligent document processing using Amazon Bedrock Data Automation

    August 14, 2025

    Intelligent document processing (IDP) is a technology to automate the extraction, analysis, and interpretation of critical information from a wide range of documents. By using advanced machine learning (ML) and natural language processing algorithms, IDP solutions can efficiently extract and process structured data from unstructured text, streamlining document-centric workflows.

    When enhanced with generative AI capabilities, IDP enables organizations to transform document workflows through advanced understanding, structured data extraction, and automated classification. Generative AI-powered IDP solutions can better handle the variety of documents that traditional ML models might not have seen before. This technology combination is impactful across multiple industries, including child support services, insurance, healthcare, financial services, and the public sector. Traditional manual processing creates bottlenecks and increases error risk, but by implementing these advanced solutions, organizations can dramatically enhance their document workflow efficiency and information retrieval capabilities. AI-enhanced IDP solutions improve service delivery while reducing administrative burden across diverse document processing scenarios.

    This approach to document processing provides scalable, efficient, and high-value document processing that leads to improved productivity, reduced costs, and enhanced decision-making. Enterprises that embrace the power of IDP augmented with generative AI can benefit from increased efficiency, enhanced customer experiences, and accelerated growth.

    In the blog post Scalable intelligent document processing using Amazon Bedrock, we demonstrated how to build a scalable IDP pipeline using Anthropic foundation models on Amazon Bedrock. Although that approach delivered robust performance, the introduction of Amazon Bedrock Data Automation brings a new level of efficiency and flexibility to IDP solutions. This post explores how Amazon Bedrock Data Automation enhances document processing capabilities and streamlines the automation journey.

    Benefits of Amazon Bedrock Data Automation

    Amazon Bedrock Data Automation introduces several features that significantly improve the scalability and accuracy of IDP solutions:

    • Confidence scores and bounding box data – Amazon Bedrock Data Automation provides confidence scores and bounding box data, enhancing data explainability and transparency. With these features, you can assess the reliability of extracted information, resulting in more informed decision-making. For instance, low confidence scores can signal the need for additional human review or verification of specific data fields.
    • Blueprints for rapid development – Amazon Bedrock Data Automation provides pre-built blueprints that simplify the creation of document processing pipelines, helping you develop and deploy solutions quickly. Amazon Bedrock Data Automation provides flexible output configurations to meet diverse document processing requirements. For simple extraction use cases (OCR and layout) or for a linearized output of the text in documents, you can use standard output. For customized output, you can start from scratch to design a unique extraction schema, or use preconfigured blueprints from our catalog as a starting point. You can customize your blueprint based on your specific document types and business requirements for more targeted and accurate information retrieval.
    • Automatic classification support – Amazon Bedrock Data Automation splits and matches documents to appropriate blueprints, resulting in precise document categorization. This intelligent routing alleviates the need for manual document sorting, drastically reducing human intervention and accelerating processing time.
    • Normalization – Amazon Bedrock Data Automation addresses a common IDP challenge through its comprehensive normalization framework, which handles both key normalization (mapping various field labels to standardized names) and value normalization (converting extracted data into consistent formats, units, and data types). This normalization approach helps reduce data processing complexities, so organizations can automatically transform raw document extractions into standardized data that integrates more smoothly with their existing systems and workflows.
    • Transformation – The Amazon Bedrock Data Automation transformation feature converts complex document fields into structured, business-ready data by automatically splitting combined information (such as addresses or names) into discrete, meaningful components. This capability simplifies how organizations handle varied document formats, helping teams define custom data types and field relationships that match their existing database schemas and business applications.
    • Validation – Amazon Bedrock Data Automation enhances document processing accuracy by using automated validation rules for extracted data, supporting numeric ranges, date formats, string patterns, and cross-field checks. This validation framework helps organizations automatically identify data quality issues, trigger human reviews when needed, and make sure extracted information meets specific business rules and compliance requirements before entering downstream systems.
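To make the normalization idea concrete, the following sketch shows key normalization (mapping varied field labels to one canonical name) and value normalization (coercing values into a consistent type) in plain Python. This is an illustration of the concept, not the Amazon Bedrock Data Automation API; all field names and alias mappings are assumptions.

```python
# Key normalization: map the varied field labels seen across documents
# to one canonical name (all mappings here are illustrative assumptions).
KEY_ALIASES = {
    "dob": "date_of_birth",
    "date of birth": "date_of_birth",
    "gross monthly income": "monthly_income",
    "monthly income ($)": "monthly_income",
}

def normalize_keys(raw: dict) -> dict:
    """Rename extracted fields to canonical names, leaving unknown keys as-is."""
    return {KEY_ALIASES.get(k.strip().lower(), k): v for k, v in raw.items()}

def normalize_currency(value: str) -> float:
    """Value normalization: '$1,250.00' -> 1250.0 (consistent type and unit)."""
    return float(value.replace("$", "").replace(",", "").strip())

# Two documents that label the same fields differently converge
# on one schema after normalization.
raw = {"Gross Monthly Income": "$1,250.00", "DOB": "1990-08-14"}
fields = normalize_keys(raw)
fields["monthly_income"] = normalize_currency(fields["monthly_income"])
# fields == {"monthly_income": 1250.0, "date_of_birth": "1990-08-14"}
```

In the managed service this mapping is declared on the blueprint rather than hand-coded, but the effect on downstream systems is the same: one stable schema regardless of how each source document labels its fields.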

    Solution overview

    The following diagram shows a fully serverless architecture that uses Amazon Bedrock Data Automation along with AWS Step Functions and Amazon Augmented AI (Amazon A2I) to provide cost-effective scaling for document processing workloads of different sizes.

AWS architecture diagram showing document processing using Amazon Bedrock Data Automation with a human-in-the-loop review step

    The Step Functions workflow processes multiple document types including multipage PDFs and images using Amazon Bedrock Data Automation. It uses various Amazon Bedrock Data Automation blueprints (both standard and custom) within a single project to enable processing of diverse document types such as immunization documents, conveyance tax certificates, child support services enrollment forms, and driver licenses.

    The workflow processes a file (PDF, JPG, PNG, TIFF, DOC, DOCX) containing a single document or multiple documents through the following steps:

1. For multipage files, splits the input along logical document boundaries
2. Matches each document to the appropriate blueprint
3. Applies the blueprint’s specific extraction instructions to retrieve information from each document
4. Performs normalization, transformation, and validation on the extracted data according to the instructions specified in the blueprint

    The Step Functions Map state is used to process each document. If a document meets the confidence threshold, the output is sent to an Amazon Simple Storage Service (Amazon S3) bucket. If any extracted data falls below the confidence threshold, the document is sent to Amazon A2I for human review. Reviewers use the Amazon A2I UI with bounding box highlighting for selected fields to verify the extraction results. When the human review is complete, the callback task token is used to resume the state machine and human-reviewed output is sent to an S3 bucket.
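The routing decision inside the Map state can be sketched as a small function. The 0.8 threshold and the field layout below are assumptions for illustration, not values taken from the solution:

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed value; tune per use case

def route_document(extracted_fields: list) -> dict:
    """Decide whether a document's extraction can be auto-accepted
    or needs Amazon A2I human review.

    Each field is assumed to look like:
        {"name": "date_of_birth", "value": "1990-08-14", "confidence": 0.93}
    """
    low = [f["name"] for f in extracted_fields
           if f["confidence"] < CONFIDENCE_THRESHOLD]
    return {
        "destination": "human_review" if low else "s3_output",
        "low_confidence_fields": low,
    }

doc = [
    {"name": "enrollee_name", "value": "Jane Doe", "confidence": 0.97},
    {"name": "signature_date", "value": "2025-08-01", "confidence": 0.62},
]
print(route_document(doc))  # routed to human_review because of signature_date
```

In the deployed workflow this decision is a Choice-style branch after the Map iteration, with the human-review branch starting an Amazon A2I human loop and pausing on a callback task token.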

    To deploy this solution in an AWS account, follow the steps provided in the accompanying GitHub repository.

    In the following sections, we review the specific Amazon Bedrock Data Automation features deployed using this solution, using the example of a child support enrollment form.

Automated classification

    In our implementation, we define the document class name for each custom blueprint created, as illustrated in the following screenshot. When processing multiple document types, such as driver’s licenses and child support enrollment forms, the system automatically applies the appropriate blueprint based on content analysis, making sure the correct extraction logic is used for each document type.

    Bedrock Data Automation interface showing Child Support Form classification detail

Data normalization

    We use data normalization to make sure downstream systems receive uniformly formatted data. We use both explicit extractions (for clearly stated information visible in the document) and implicit extractions (for information that needs transformation). For example, as shown in the following screenshot, dates of birth are standardized to YYYY-MM-DD format.

    Bedrock Data Automation interface displaying extracted and normalized Date of Birth data

Similarly, Social Security Numbers are reformatted to XXX-XX-XXXX.
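The same two normalizations can be expressed in a few lines of plain Python. This is a sketch of the behavior, not the Amazon Bedrock Data Automation implementation; the set of accepted input layouts is an assumption:

```python
import re
from datetime import datetime

def normalize_dob(value: str) -> str:
    """Standardize a few common date layouts to YYYY-MM-DD."""
    for fmt in ("%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {value!r}")

def normalize_ssn(value: str) -> str:
    """Reformat a Social Security Number to XXX-XX-XXXX."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 9:
        raise ValueError(f"Expected 9 digits, got: {value!r}")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(normalize_dob("August 14, 1990"))  # 1990-08-14
print(normalize_ssn("123 45 6789"))      # 123-45-6789
```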

Data transformation

    For the child support enrollment application, we’ve implemented custom data transformations to align extracted data with specific requirements. One example is our custom data type for addresses, which breaks down single-line addresses into structured fields (Street, City, State, ZipCode). These structured fields are reused across different address fields in the enrollment form (employer address, home address, other parent address), resulting in consistent formatting and straightforward integration with existing systems.
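As a rough illustration of the same idea outside the service, a single-line US address can be split into those structured fields with a regular expression. The field names mirror the custom type described above; the pattern itself is a simplified assumption and far less robust than the service's transformation:

```python
import re

# Simplified pattern for "Street, City, ST 12345" style addresses.
ADDRESS_RE = re.compile(
    r"^(?P<Street>[^,]+),\s*(?P<City>[^,]+),"
    r"\s*(?P<State>[A-Z]{2})\s+(?P<ZipCode>\d{5}(?:-\d{4})?)$"
)

def split_address(line: str) -> dict:
    """Break a one-line address into Street, City, State, and ZipCode fields."""
    m = ADDRESS_RE.match(line.strip())
    if not m:
        raise ValueError(f"Unparseable address: {line!r}")
    return m.groupdict()

print(split_address("100 Main St, Dallas, TX 75201"))
# {'Street': '100 Main St', 'City': 'Dallas', 'State': 'TX', 'ZipCode': '75201'}
```

Defining the structure once and reusing it for every address field (employer, home, other parent) is what keeps the output schema consistent across the form.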

    Amazon Bedrock Data Automation interface displaying custom address type configuration with explicit field mappings

Data validation

Our implementation includes validation rules for maintaining data accuracy and compliance. For our example use case, we’ve implemented two validations: (1) verify the presence of the enrollee’s signature, and (2) verify that the signed date isn’t in the future.
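The two rules can be sketched as plain-Python checks. In the solution they are declared on the blueprint and evaluated by Amazon Bedrock Data Automation; the field names here are illustrative assumptions:

```python
from datetime import date, datetime

def validate_enrollment(fields: dict, today: date) -> list:
    """Return a list of validation errors for the enrollment form.

    Rule 1: the enrollee's signature must be present.
    Rule 2: the signed date must not be in the future.
    """
    errors = []
    if not fields.get("signature_present"):
        errors.append("Missing enrollee signature")
    signed = datetime.strptime(fields["signed_date"], "%Y-%m-%d").date()
    if signed > today:
        errors.append("Signed date is in the future")
    return errors

fields = {"signature_present": True, "signed_date": "2025-08-13"}
print(validate_enrollment(fields, today=date(2025, 8, 14)))  # []
```

A non-empty error list is exactly the kind of signal that, combined with low confidence scores, triggers the human-review branch described earlier.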

    Bedrock extraction interface showing signature and date validation configurations

The following screenshot shows the result of applying these validation rules to the document.

    Amazon Bedrock-powered document automation showing form field validation, signature verification, and confidence scoring

    Human-in-the-loop validation

    The following screenshot illustrates the extraction process, which includes a confidence score and is integrated with a human-in-the-loop process. It also shows normalization applied to the date of birth format.

Amazon Bedrock Data Automation human-in-the-loop review interface

    Conclusion

    Amazon Bedrock Data Automation significantly advances IDP by introducing confidence scoring, bounding box data, automatic classification, and rapid development through blueprints. In this post, we demonstrated how to take advantage of its advanced capabilities for data normalization, transformation, and validation. By upgrading to Amazon Bedrock Data Automation, organizations can significantly reduce development time, improve data quality, and create more robust, scalable IDP solutions that integrate with human review processes.

    Follow the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.


    About the authors

Abdul Navaz is a Senior Solutions Architect in the Amazon Web Services (AWS) Health and Human Services team, based in Dallas, Texas. With over 10 years of experience at AWS, he focuses on modernization solutions for child support and child welfare agencies using AWS services. Prior to his role as a Solutions Architect, Navaz worked as a Senior Cloud Support Engineer, specializing in networking solutions.

    Venkata Kampana is a senior solutions architect in the Amazon Web Services (AWS) Health and Human Services team and is based in Sacramento, Calif. In this role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.

Sanjeev Pulapaka is a principal solutions architect and AI lead for the public sector. Sanjeev is a published author with several blogs and a book on generative AI. He is also a well-known speaker at several events including re:Invent and Summit. Sanjeev has an undergraduate degree in engineering from the Indian Institute of Technology and an MBA from the University of Notre Dame.
