
    Google AI Releases DeepPolisher: A New Deep Learning Tool that Improves the Accuracy of Genome Assemblies by Precisely Correcting Base-Level Errors

    August 7, 2025

    Google AI, in collaboration with the UC Santa Cruz Genomics Institute, has introduced DeepPolisher, a deep learning tool that substantially improves the accuracy of genome assemblies by correcting base-level errors. It was recently used to improve the Human Pangenome Reference, a major milestone in genomics research.

    The Challenge of Accurate Genome Assembly

    A reference genome is an essential foundation for understanding genetic diversity, heredity, disease mechanisms, and evolutionary biology. Modern sequencing technologies, including those developed by Illumina and Pacific Biosciences, have dramatically improved sequencing accuracy and throughput. Even so, assembling an error-free human genome (over 3 billion nucleotides) remains immensely challenging: at that scale, even a minuscule per-base error rate adds up to thousands of errors, which can obscure key genetic variants or mislead downstream analyses.
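    To make the scale concrete, the short Python sketch below multiplies genome length by a per-base error rate. The error rates are illustrative values assumed for this example, not figures reported for DeepPolisher; the 3 billion figure is the genome size quoted above.

```python
def expected_errors(genome_length: int, error_rate: float) -> float:
    """Expected number of wrong bases for a given per-base error rate."""
    return genome_length * error_rate


HAPLOID_HUMAN_BASES = 3_000_000_000  # "over 3 billion nucleotides", rounded down

# Illustrative per-base error rates (assumed for this example only)
for rate in (1e-4, 1e-5, 1e-6):
    print(f"error rate {rate:g}: ~{expected_errors(HAPLOID_HUMAN_BASES, rate):,.0f} errors")
```

    Even at one error per million bases, a single haploid genome still carries thousands of errors.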

    What Is DeepPolisher?

    DeepPolisher is an open-source, transformer-based tool for polishing (error-correcting) genome assemblies. Building on advances from DeepConsensus, it uses a transformer deep learning architecture to further reduce assembly errors, particularly insertion and deletion (indel) errors. Indels are especially damaging: they can shift reading frames and cause important genes or regulatory elements to be missed during annotation.

    • Technology: An encoder-only transformer, adapting techniques proven in natural language processing to genomics.
    • Training data: A human cell line extensively characterized by NIST and NHGRI and sequenced on multiple platforms, so that its reference sequence is nearly error-free (~99.99999% correct, roughly 300–1,000 errors in 6 billion bases).

    How Does It Work? (Technical Overview)

    1. Input Alignment: Takes PacBio HiFi reads aligned to a haplotype-resolved genome assembly.
    2. Error Site Detection: Scans the assembly in 25 kb windows and flags candidate error sites where the read evidence disagrees with the assembly.
    3. Data Encoding: For each short region (under 100 bp) containing putative errors, builds a multi-channel tensor of read alignment features such as base, base quality, mapping quality, and match/mismatch status.
    4. Model Inference: Feeds these tensors to the transformer, which predicts the corrected sequence for each region.
    5. Output Correction: Writes the proposed corrections in VCF format; a tool such as bcftools then applies them to the assembly to produce the polished sequence.
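    The detection and encoding steps can be sketched in plain Python. This is a simplified illustration, not DeepPolisher's implementation: the AlignedRead type, the disagreement threshold (min_disagree=0.5), the minimum depth, and the integer base codes are hypothetical choices. Reads are also assumed to align without gaps, so unlike the real tool this sketch only flags substitutions, not indels. Only the 25 kb window and the sub-100 bp region size come from the description above.

```python
from dataclasses import dataclass

WINDOW_SIZE = 25_000  # scan window size described in the article
MAX_REGION = 100      # candidate regions are kept under 100 bp
BASE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}  # hypothetical encoding; 0 = no coverage


@dataclass
class AlignedRead:
    """Hypothetical, simplified read: gap-free alignment starting at `start`."""
    start: int          # 0-based assembly coordinate of the first aligned base
    seq: str            # read bases
    quals: list[int]    # per-base quality scores
    mapq: int           # mapping quality

    def covers(self, pos: int) -> bool:
        return self.start <= pos < self.start + len(self.seq)


def windows(length: int, size: int = WINDOW_SIZE):
    """Yield (start, end) tiles that cover the whole assembly."""
    for start in range(0, length, size):
        yield start, min(start + size, length)


def candidate_sites(assembly, reads, start, end, min_depth=3, min_disagree=0.5):
    """Positions in [start, end) where enough covering reads disagree with the assembly."""
    sites = []
    for pos in range(start, end):
        bases = [r.seq[pos - r.start] for r in reads if r.covers(pos)]
        if len(bases) < min_depth:
            continue  # too little evidence to call a candidate
        disagree = sum(b != assembly[pos] for b in bases) / len(bases)
        if disagree >= min_disagree:
            sites.append(pos)
    return sites


def encode_region(assembly, reads, region_start, region_end):
    """Return {channel: rows}, one row per covering read, one column per position."""
    if region_end - region_start >= MAX_REGION:
        raise ValueError("candidate regions must be shorter than 100 bp")
    channels = {"base": [], "base_quality": [], "mapping_quality": [], "match": []}
    for r in reads:
        if not any(r.covers(p) for p in range(region_start, region_end)):
            continue  # read does not touch this region
        rows = {name: [] for name in channels}
        for pos in range(region_start, region_end):
            if r.covers(pos):
                i = pos - r.start
                rows["base"].append(BASE_CODE.get(r.seq[i], 0))
                rows["base_quality"].append(r.quals[i])
                rows["mapping_quality"].append(r.mapq)
                rows["match"].append(1 if r.seq[i] == assembly[pos] else -1)
            else:
                for name in rows:
                    rows[name].append(0)  # pad positions the read does not cover
        for name in channels:
            channels[name].append(rows[name])
    return channels


# Toy usage: four reads support a T where the assembly has an A at position 4.
assembly = "ACGTACGTAC"
reads = [AlignedRead(0, "ACGTTCGTAC", [30] * 10, 60) for _ in range(4)]
for win_start, win_end in windows(len(assembly)):
    for site in candidate_sites(assembly, reads, win_start, win_end):
        features = encode_region(assembly, reads, max(0, site - 2), site + 3)
        print(site, features["match"][0])  # prints: 4 [1, 1, -1, 1, 1]
```

    In the real pipeline a feature stack like this is what the transformer consumes; here it is simply returned as nested lists keyed by channel name.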

    Performance and Impact

    DeepPolisher delivers substantial improvements:

    • Total error reduction: ~50%
    • Indel error reduction: >70%
    • Error rates: Achieves an error rate as low as one base error per 500,000 assembled bases in real-world deployment with the Human Pangenome Reference Consortium (HPRC).
    • Genomic Q-score improvement: Raises average assembly quality from Q66.7 to Q70.1. (The Q-score is a logarithmic measure of per-base error rate, Q = -10 log10(error rate); higher is better. Q70.1 corresponds to roughly 1 error per 10 million bases.)
    • Every sample tested by HPRC showed improvement.
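    The Q-score figures follow the Phred scale, Q = -10 log10(p), where p is the per-base error rate. A small Python helper converts between the two and compares the reported averages; the function names are chosen for this example.

```python
import math


def qscore_to_error_rate(q: float) -> float:
    """Invert the Phred scale: Q = -10 * log10(p)  =>  p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_rate_to_qscore(p: float) -> float:
    """Phred-scaled quality for a per-base error rate p."""
    return -10 * math.log10(p)


for q in (66.7, 70.1):
    p = qscore_to_error_rate(q)
    print(f"Q{q}: error rate {p:.2e}, about 1 error per {1 / p:,.0f} bases")

# Relative drop in error rate implied by moving from Q66.7 to Q70.1
reduction = 1 - qscore_to_error_rate(70.1) / qscore_to_error_rate(66.7)
print(f"Relative error reduction: {reduction:.0%}")
```

    By this conversion, Q66.7 is roughly 1 error per 4.7 million bases and Q70.1 roughly 1 per 10.2 million, a relative reduction of about 54%, in line with the ~50% total error reduction listed above.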

    These gains directly improve the reliability and accuracy of derived references such as the Human Pangenome Reference, whose latest release expanded the underlying data roughly fivefold and saw substantial error reduction thanks to DeepPolisher.

    Deployment and Applications

    • Integrated in major projects: Used in HPRC’s second data release, which provides high-accuracy reference assemblies for 232 individuals spanning broad ancestral diversity.
    • Open-source access: Available on GitHub, with case studies and Dockerized workflows for polishing assemblies produced by tools like HiFiasm from PacBio HiFi reads.
    • Generalizability: Although initially focused on human genomes, the structure and approach are adaptable to other organisms and sequencing platforms, which could benefit the wider genomics community.

    Practical Workflow Example

    A typical workflow using DeepPolisher might involve:

    • Input: HiFiasm diploid assembly and PacBio HiFi reads, phase-aligned using the PHARAOH pipeline.
    • Running: Dockerized commands for generating the model's input images (tensor examples), running inference, and applying corrections.
    • Output: Separate VCF files for the maternal and paternal assemblies, plus polished FASTA files after a bcftools consensus step.
    • Assessment: Benchmarking tools (e.g., dipcall, Hap.py) quantify improvements in error rates and variant accuracy.
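    The consensus step, in which VCF records are applied to each haplotype's sequence, can be illustrated with a minimal Python sketch. It is a toy stand-in for bcftools consensus, not its implementation: it assumes single-ALT records, no overlapping variants, and sequences already loaded into a dictionary keyed by contig name (the hap1 contig and both records are made-up examples). Edits are applied from right to left so that an indel never shifts the coordinates of a record still waiting to be applied.

```python
def parse_vcf(text: str):
    """Yield (chrom, pos, ref, alt) from VCF text; assumes one ALT allele per record."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


def apply_corrections(sequences: dict[str, str], records) -> dict[str, str]:
    """Apply REF->ALT edits to {contig: sequence}, right to left within each contig."""
    polished = dict(sequences)
    for chrom, pos, ref, alt in sorted(records, key=lambda r: (r[0], r[1]), reverse=True):
        seq = polished[chrom]
        start = pos - 1  # VCF POS is 1-based
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match sequence at {chrom}:{pos}")
        polished[chrom] = seq[:start] + alt + seq[start + len(ref):]
    return polished


# Made-up example: one insertion and one deletion on a toy haplotype.
vcf_text = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["hap1", "3", ".", "G", "GA", ".", "PASS", "."]),
    "\t".join(["hap1", "6", ".", "CG", "C", ".", "PASS", "."]),
])
print(apply_corrections({"hap1": "ACGTACGT"}, parse_vcf(vcf_text)))  # {'hap1': 'ACGATACT'}
```

    Production tools additionally handle multi-allelic records, genotype-aware haplotype selection, and overlapping variants, none of which this sketch attempts.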

    Conclusion and Future Directions

    DeepPolisher represents a leap forward in genome polishing technology, sharply reducing error rates and unlocking higher resolution for functional genomics, rare variant discovery, and clinical applications. By tackling one of the remaining barriers to error-free genome assemblies, it supports more accurate diagnosis and population-level genetic studies, and paves the way for next-generation reference projects in biomedical research and medicine.



    The post Google AI Releases DeepPolisher: A New Deep Learning Tool that Improves the Accuracy of Genome Assemblies by Precisely Correcting Base-Level Errors appeared first on MarkTechPost.

