    Optimizing Protein Design with Reinforcement Learning-Enhanced pLMs: Introducing DPO_pLM for Efficient and Targeted Sequence Generation

    December 20, 2024

    Autoregressive protein language models (pLMs) have become transformative tools for designing functional proteins with remarkable diversity, demonstrating success in creating enzyme families like lysozymes and carbonic anhydrases. These models generate protein sequences by sampling from learned probability distributions, uncovering intrinsic patterns within training datasets. Despite their ability to explore high-quality subspaces of the sequence landscape, pLMs struggle to target rare and valuable regions, limiting their effectiveness in tasks like engineering enzymatic activity or binding affinity. This challenge, compounded by the vast sequence space and expensive wet-lab validation, makes protein optimization a complex problem. Traditional methods like directed evolution, which iteratively select for desired traits, are limited to local exploration and lack tools for steering long-term evolutionary trajectories toward specific biological functions.

    Reinforcement learning (RL) offers a promising framework to guide pLMs toward optimizing specific properties by aligning model outputs with feedback from an external oracle, such as predicted stability or binding affinity. Drawing inspiration from RL applications in robotics and gaming, recent efforts have applied RL techniques to protein design, demonstrating the potential to explore rare events and balance exploration-exploitation trade-offs efficiently. Examples include Proximal Policy Optimization (PPO) for DNA and protein design and Direct Preference Optimization (DPO) for thermostability prediction and binder design. While these studies showcase RL’s potential, there remains a need for experimentally validated, publicly available RL frameworks tailored to generative pLMs, which could advance the field of protein engineering.
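
    As a rough illustration of this oracle-in-the-loop idea, the sketch below runs one generate-and-score round; the model, tokenizer, and `oracle` callable are illustrative placeholders, not the authors' released code.

```python
import torch

def oracle_guided_round(model, tokenizer, oracle, n_samples=64, device="cuda"):
    """One round of oracle-guided exploration with a generative pLM.

    `oracle` is any external scorer (e.g. a predicted-stability or
    binding-affinity model) that maps a protein sequence to a scalar reward.
    """
    model.eval()
    # 1. Sample candidate sequences from the current policy (the pLM); the
    #    prompt/control tag depends on the specific model's vocabulary.
    prompt = tokenizer(tokenizer.bos_token, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        samples = model.generate(prompt, do_sample=True, max_new_tokens=300,
                                 num_return_sequences=n_samples)
    sequences = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]

    # 2. Score every candidate with the external oracle; no labelled
    #    training data is needed, only this reward signal.
    rewards = [oracle(seq) for seq in sequences]

    # 3. The (sequence, reward) pairs then drive the policy update
    #    (PPO, DPO, or reward-weighted fine-tuning).
    return list(zip(sequences, rewards))
```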

    Researchers from Universitat Pompeu Fabra, the Centre for Genomic Regulation, and other leading institutions developed DPO_pLM, an RL framework for optimizing protein sequences with generative pLMs. By fine-tuning pLMs using rewards from external oracles, DPO_pLM optimizes diverse user-defined properties without additional data while preserving sequence diversity. It outperforms traditional fine-tuning methods by reducing computational demands, mitigating catastrophic forgetting, and leveraging negative data. Demonstrating its effectiveness, DPO_pLM successfully designed nanomolar-affinity EGFR binders within hours. 

    The study introduces DPO and self-fine-tuning (s-FT) for optimizing protein sequences. DPO minimizes loss functions, including ranked and weighted forms, with negative log-likelihood proving effective. s-FT refines ZymCTRL iteratively, generating, ranking, and fine-tuning on the top sequences across 30 iterations. Model training uses Hugging Face’s transformers API with a batch size of 4, a learning rate of 8×10⁻⁶, and evaluation every 10 steps. Structural similarity is assessed using ESMFold and Foldseek, while functional annotations rely on ESM1b embeddings and cosine similarity with CLEAN clusters. EGFR binder design applies fine-tuning on BLAST-retrieved sequences, followed by AlphaFold folding and optimization to enhance binder performance.
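
    The snippet below is a minimal sketch of one reward-weighted negative log-likelihood update written against the Hugging Face transformers API with the reported hyperparameters (batch size 4, learning rate 8×10⁻⁶); the checkpoint path, function name, and exact weighting scheme are illustrative assumptions rather than the published DPO_pLM implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint path: substitute the actual ZymCTRL weights.
tokenizer = AutoTokenizer.from_pretrained("path/to/zymctrl")
model = AutoModelForCausalLM.from_pretrained("path/to/zymctrl")
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-6)  # reported learning rate

def weighted_nll_update(sequences, rewards, batch_size=4):
    """One update pass: sequences the oracle scores highly are up-weighted,
    poorly scored (negative) sequences are down-weighted."""
    weights = torch.softmax(torch.tensor(rewards, dtype=torch.float), dim=0)
    model.train()
    for start in range(0, len(sequences), batch_size):
        optimizer.zero_grad()
        for seq, w in zip(sequences[start:start + batch_size],
                          weights[start:start + batch_size]):
            enc = tokenizer(seq, return_tensors="pt")
            out = model(**enc, labels=enc["input_ids"])
            (w * out.loss).backward()  # weighted negative log-likelihood
        optimizer.step()
```

    In the self-fine-tuning (s-FT) setting described above, an outer loop would regenerate sequences, re-rank them with the oracle, and repeat this update for the reported 30 iterations.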

    pLMs generate sequences resembling their training data and often achieve high functionality despite significant sequence deviations. For instance, ZymCTRL, trained on enzyme data with EC labels, created carbonic anhydrases with wild-type activity but only 39% sequence identity. Similarly, generated α-amylases exceeded wild-type activity. However, pLMs primarily replicate training set distributions, lacking precise control for optimizing specific properties like activity or stability. By applying RL, particularly methods like DPO, pLMs can be fine-tuned iteratively using feedback from oracles, enabling the generation of sequences with targeted properties while preserving diversity and quality.
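
    For reference, sequence identity here is simply the fraction of matching residues in a pairwise alignment between a generated sequence and its natural counterpart; a rough sketch using Biopython (an assumed tool choice, not one named in the study) would be:

```python
from Bio import pairwise2  # Biopython's pairwise alignment module

def sequence_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical residues over the aligned length (gaps count
    in the denominator), e.g. roughly 0.39 for the carbonic anhydrases above."""
    aln = pairwise2.align.globalxx(seq_a, seq_b, one_alignment_only=True)[0]
    matches = sum(a == b for a, b in zip(aln.seqA, aln.seqB))
    return matches / len(aln.seqA)
```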

    In conclusion, pLMs excel at sampling from learned distributions but struggle to optimize specific properties. DPO_pLM overcomes this limitation with Direct Preference Optimization (DPO), which refines sequences through external oracles without additional training data. ZymCTRL evaluations showed rapid and robust performance, enriching enzyme classes and folds in multi-objective tasks. In an EGFR binder design experiment, DPO_pLM achieved a 50% success rate, generating three nanomolar binders after 12 iterations in just hours. Unlike standard fine-tuning, DPO directly maximizes preference rewards, improving global predictions efficiently. Future work will focus on integrating DPO_pLM into automated labs for protein design innovations.


    Check out the Paper. All credit for this research goes to the researchers of this project.

    The post Optimizing Protein Design with Reinforcement Learning-Enhanced pLMs: Introducing DPO_pLM for Efficient and Targeted Sequence Generation appeared first on MarkTechPost.
