This AI Paper Introduces Evo: A Genomic Foundation Model that Enables Prediction and Generation Tasks from the Molecular to Genome-Scale

Genomic research is a critical field that focuses on understanding genomes' structure, function, and evolution. It encompasses studies on DNA sequences, genetic variations, and the intricate mechanisms governing gene expression and regulation. This field has profound implications for biotechnology, medicine, and evolutionary biology, offering insights into genetic disorders, potential therapies, and the fundamental processes of life.

One critical problem is the need for advanced models that can predict and generate biological sequences. Current methods often lack the capacity to handle the complexity and scale required to model genomic functions accurately, and researchers are seeking ways to improve these models' precision and efficiency in order to better understand and manipulate biological systems.

Traditional approaches to genomic modeling have primarily relied on modality-specific models focused on proteins, regulatory DNA, or RNA. These models often struggle with the multi-scale interactions found in complex biological processes. Generative applications have been restricted to designing simple molecules and short sequences, lacking the breadth necessary for comprehensive genomic analysis.

Researchers from Stanford University, Arc Institute, TogetherAI, CZ Biohub, and the University of California, Berkeley, have introduced Evo, a genomic foundation model designed to perform prediction and generation tasks from the molecular to the genome scale. Evo leverages a novel deep signal processing architecture to handle vast genomic datasets. The architecture is a hybrid of attention mechanisms and convolutional operators, allowing it to process sequences at single-nucleotide resolution over long contexts. With 7 billion parameters and training data drawn from whole prokaryotic genomes, Evo can generalize across DNA, RNA, and protein modalities, enabling it to predict gene function and generate complex biological systems.
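To make "single-nucleotide resolution" concrete, the following is a minimal sketch of one-token-per-base tokenization followed by splitting into fixed-length context windows. The function names, the five-symbol vocabulary, and the handling of unknown bases are illustrative assumptions, not Evo's actual tokenizer.

```python
def tokenize_nucleotides(seq, vocab="ACGT", unknown_id=4):
    """Map each base to one integer token (one token per nucleotide).

    The vocabulary and unknown_id are hypothetical choices for illustration.
    """
    index = {base: i for i, base in enumerate(vocab)}
    return [index.get(base, unknown_id) for base in seq.upper()]


def split_into_windows(tokens, context_length):
    """Cut a token list into consecutive, non-overlapping context windows."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    return [
        tokens[i:i + context_length]
        for i in range(0, len(tokens), context_length)
    ]
```

Because every base is its own token, a sequence of N nucleotides yields exactly N tokens, so the context length directly bounds how many bases the model can see at once.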

Evo employs StripedHyena, a deep signal processing architecture that combines attention mechanisms with convolutional operators to process long genomic sequences efficiently. This hybrid approach lets Evo maintain single-nucleotide resolution, which is crucial for capturing fine-grained variation in genetic sequences. The model is trained on prokaryotic genome datasets totaling 300 billion nucleotide tokens, which include bacterial and archaeal genomes and millions of predicted phage and plasmid sequences. This training allows Evo to learn the intricate patterns of genomic sequences, making it capable of prediction and generation tasks across different molecular modalities. Training proceeded in two stages: first with a context length of 8,000 tokens, then extended to 131,000 tokens to capture broader genomic context. Evo's architecture includes 29 layers of data-controlled convolutional operators interleaved with multi-head attention layers equipped with rotary position embeddings, enhancing its ability to recall long-sequence information.
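The rotary position embeddings mentioned above can be illustrated with a small, dependency-free sketch. It rotates consecutive pairs of vector components by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. The pairing convention (adjacent components) and the base of 10000 are common defaults assumed here; this is not taken from Evo's code.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Apply a rotary position embedding to one vector.

    Each pair (vec[i], vec[i + 1]) is rotated by position * base ** (-i / d),
    where d is the vector length. The pairing and base are assumed defaults.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

For example, the score between a query at position 5 and a key at position 3 matches the score between the same vectors at positions 102 and 100, since both pairs are two tokens apart. This relative-position property is one reason the technique is popular for long contexts.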

Evo excels in zero-shot function prediction and in generation tasks. It can generate synthetic CRISPR-Cas molecular complexes and transposable systems, predict gene essentiality with high accuracy, and create coding-rich sequences up to 650 kilobases in length. In terms of specific metrics, Evo achieved a Spearman correlation of 0.64 in predicting the fitness effects of mutations on the 5S ribosomal RNA in E. coli. For gene expression prediction, it achieved a correlation of 0.41 for mRNA expression and an AUROC of 0.68 for protein expression. For gene essentiality, it reached an AUROC of 0.86 for lambda phage and 0.81 for Pseudomonas aeruginosa. These results surpass those of existing domain-specific language models. Evo's generative capabilities are demonstrated by its ability to produce coherent CRISPR-Cas systems, with 15-45% of generated sequences containing Cas coding sequences as long as 5 kb, and by its generation of transposable elements with significant protein sequence diversity.
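For readers unfamiliar with the two metrics quoted above, here is a minimal pure-Python sketch of how Spearman correlation and AUROC are computed from model scores. It is a generic illustration of the metrics, not the evaluation code used in the paper, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative.

    Labels are 1 (positive) or 0 (negative); ties count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a zero-shot setting, `scores` would be whatever the model assigns to each variant or gene without task-specific training, and the metric measures how well those scores order the experimental measurements (Spearman) or separate two classes such as essential and non-essential genes (AUROC). An AUROC of 0.5 corresponds to chance and 1.0 to perfect separation.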

In conclusion, the research team has developed a powerful tool in Evo that addresses the limitations of previous models. By enabling comprehensive genomic analysis and generation, Evo promises to enhance our understanding and control of biological systems on multiple levels. Its success in modeling genomic data at scale, together with its ability to perform zero-shot predictions and generate complex biological sequences, marks a significant advance in genomic research. The model not only supports a deeper mechanistic understanding of biology but also accelerates the potential for engineering life forms, offering a new paradigm for biological research and synthetic biology.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post This AI Paper Introduces Evo: A Genomic Foundation Model that Enables Prediction and Generation Tasks from the Molecular to Genome-Scale appeared first on MarkTechPost.
