    A new model predicts how molecules will dissolve in different solvents

    August 20, 2025

    Using machine learning, MIT chemical engineers have created a computational model that can predict how well any given molecule will dissolve in an organic solvent — a key step in the synthesis of nearly any pharmaceutical. This type of prediction could make it much easier to develop new ways to produce drugs and other useful molecules.

    The new model, which predicts how much of a solute will dissolve in a particular solvent, should help chemists to choose the right solvent for any given reaction in their synthesis, the researchers say. Common organic solvents include ethanol and acetone, and there are hundreds of others that can also be used in chemical reactions.

    “Predicting solubility really is a rate-limiting step in synthetic planning and manufacturing of chemicals, especially drugs, so there’s been a longstanding interest in being able to make better predictions of solubility,” says Lucas Attia, an MIT graduate student and one of the lead authors of the new study.

    The researchers have made their model freely available, and many companies and labs have already started using it. The model could be particularly useful for identifying solvents that are less hazardous than some of the most commonly used industrial solvents, the researchers say.

    “There are some solvents which are known to dissolve most things. They’re really useful, but they’re damaging to the environment, and they’re damaging to people, so many companies require that you have to minimize the amount of those solvents that you use,” says Jackson Burns, an MIT graduate student who is also a lead author of the paper. “Our model is extremely useful in being able to identify the next-best solvent, which is hopefully much less damaging to the environment.”

    William Green, the Hoyt Hottel Professor of Chemical Engineering and director of the MIT Energy Initiative, is the senior author of the study, which appears today in Nature Communications. Patrick Doyle, the Robert T. Haslam Professor of Chemical Engineering, is also an author of the paper.

    Solving solubility

    The new model grew out of a project that Attia and Burns worked on together in an MIT course on applying machine learning to chemical engineering problems. Traditionally, chemists have predicted solubility with a tool known as the Abraham Solvation Model, which can be used to estimate a molecule’s overall solubility by adding up the contributions of chemical structures within the molecule. While these predictions are useful, their accuracy is limited.
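
    At its core, the Abraham model is a linear free-energy relationship: a solubility-related quantity is estimated as a weighted sum of solute descriptors, with the weights set by the solvent. A minimal Python sketch of that additive form, using illustrative placeholder numbers rather than fitted parameters, looks roughly like this:

        # Sketch of the additive form behind the Abraham solvation model.
        # Descriptor values and solvent coefficients below are illustrative
        # placeholders, not fitted data; the point is only that the estimate
        # is a weighted sum of per-molecule contributions.

        def abraham_log_property(solute: dict, solvent: dict) -> float:
            """Linear combination of solute descriptors and solvent coefficients."""
            return (
                solvent["c"]
                + solvent["e"] * solute["E"]   # excess molar refraction
                + solvent["s"] * solute["S"]   # dipolarity / polarizability
                + solvent["a"] * solute["A"]   # hydrogen-bond acidity
                + solvent["b"] * solute["B"]   # hydrogen-bond basicity
                + solvent["v"] * solute["V"]   # McGowan volume
            )

        example_solute = {"E": 1.5, "S": 1.6, "A": 0.1, "B": 1.3, "V": 1.4}
        example_solvent = {"c": 0.2, "e": -0.2, "s": 0.7, "a": 3.6, "b": 1.3, "v": 3.6}
        print(abraham_log_property(example_solute, example_solvent))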

    In the past few years, researchers have begun using machine learning to try to make more accurate solubility predictions. Before Burns and Attia began working on their new model, the state of the art for predicting solubility was a model developed in Green’s lab in 2022.

    That model, known as SolProp, works by predicting a set of related properties and combining them, using thermodynamics, to ultimately predict the solubility. However, the model has difficulty predicting solubility for solutes that it hasn’t seen before.

    “For drug and chemical discovery pipelines where you’re developing a new molecule, you want to be able to predict ahead of time what its solubility looks like,” Attia says.

    Part of the reason that existing solubility models haven’t worked well is that there wasn’t a comprehensive dataset to train them on. However, in 2023 a new dataset called BigSolDB was released, which compiled data from nearly 800 published papers, including solubility measurements for about 800 molecules dissolved in more than 100 organic solvents that are commonly used in synthetic chemistry.

    Attia and Burns decided to try training two different types of models on this data. Both of these models represent the chemical structures of molecules using numerical representations known as embeddings, which incorporate information such as the number of atoms in a molecule and which atoms are bound to which other atoms. Models can then use these representations to predict a variety of chemical properties.

    One of the models used in this study, known as FastProp and developed by Burns and others in Green’s lab, incorporates “static embeddings.” This means that the model already knows the embedding for each molecule before it starts doing any kind of analysis.
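
    To make the distinction concrete, here is a minimal sketch of a static embedding. FastProp uses its own curated descriptor set; as a stand-in, this example computes an RDKit Morgan fingerprint, a fixed vector that is the same before, during, and after training:

        # A "static embedding": the molecule's numeric vector is fixed before
        # any training happens. An RDKit Morgan fingerprint stands in here for
        # FastProp's descriptor set.
        import numpy as np
        from rdkit import Chem
        from rdkit.Chem import AllChem

        def static_embedding(smiles: str, n_bits: int = 2048) -> np.ndarray:
            """Return a fixed-length bit vector for a molecule given as SMILES."""
            mol = Chem.MolFromSmiles(smiles)
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)  # radius 2
            return np.array(list(fp))

        # The same molecule always maps to the same vector; the model only
        # learns how to map that vector to a property such as solubility.
        print(static_embedding("CCO")[:16])   # ethanol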

    The other model, ChemProp, learns an embedding for each molecule during the training, at the same time that it learns to associate the features of the embedding with a trait such as solubility. This model, developed across multiple MIT labs, has already been used for tasks such as antibiotic discovery, lipid nanoparticle design, and predicting chemical reaction rates.
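
    By contrast, a learned embedding is adjusted by the same gradient updates that train the property predictor. The toy PyTorch sketch below is not ChemProp’s message-passing architecture, but it shows the idea: the per-atom vectors are trainable parameters, so they change as the model learns to predict solubility:

        # A toy "learned embedding" (not ChemProp's actual architecture):
        # per-atom vectors are trainable parameters, updated at the same time
        # as the property-prediction head.
        import torch
        import torch.nn as nn

        class LearnedMoleculeModel(nn.Module):
            def __init__(self, n_atom_types: int = 119, dim: int = 64):
                super().__init__()
                self.atom_embedding = nn.Embedding(n_atom_types, dim)  # learned
                self.head = nn.Linear(dim, 1)  # molecule vector -> log-solubility

            def forward(self, atomic_numbers: torch.Tensor) -> torch.Tensor:
                # Sum-pool atom vectors into one molecule vector, then predict.
                mol_vector = self.atom_embedding(atomic_numbers).sum(dim=0)
                return self.head(mol_vector)

        model = LearnedMoleculeModel()
        ethanol_atoms = torch.tensor([6, 6, 8])          # C, C, O
        loss = (model(ethanol_atoms) - torch.tensor([0.5])) ** 2
        loss.sum().backward()   # gradients also flow into the atom embeddings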

    The researchers trained both types of models on over 40,000 data points from BigSolDB, including information on the effects of temperature, which plays a significant role in solubility. Then, they tested the models on about 1,000 solutes that had been withheld from the training data. They found that the models’ predictions were two to three times more accurate than those of SolProp, the previous best model, and the new models were especially accurate at predicting variations in solubility due to temperature.
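
    The evaluation setup can be sketched as grouping the data by solute, so that no test molecule ever appears in training. The snippet below is only an illustration; its column names are assumptions, not BigSolDB’s actual schema:

        # Withhold entire solutes from training, so the model is scored on
        # molecules it has never seen. Column names here are illustrative.
        import pandas as pd
        from sklearn.model_selection import GroupShuffleSplit

        df = pd.DataFrame({
            "solute_smiles":  ["CCO", "CCO", "c1ccccc1", "c1ccccc1", "CC(=O)C"],
            "solvent_smiles": ["O", "CC#N", "O", "CCO", "O"],
            "temperature_K":  [298.0, 310.0, 298.0, 323.0, 298.0],
            "log_solubility": [1.1, 1.3, -1.6, -0.9, 0.7],
        })

        # Group by solute so the same molecule never lands in both splits.
        splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
        train_idx, test_idx = next(splitter.split(df, groups=df["solute_smiles"]))
        print(sorted(set(df.iloc[train_idx]["solute_smiles"])))
        print(sorted(set(df.iloc[test_idx]["solute_smiles"])))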

    “Being able to accurately reproduce those small variations in solubility due to temperature, even when the overarching experimental noise is very large, was a really positive sign that the network had correctly learned an underlying solubility prediction function,” Burns says.

    Accurate predictions

    The researchers had expected that the model based on ChemProp, which is able to learn new representations as it goes along, would be able to make more accurate predictions. However, to their surprise, they found that the two models performed essentially the same. That suggests that the main limitation on their performance is the quality of the data, and that the models are performing as well as theoretically possible based on the data that they’re using, the researchers say.

    “ChemProp should always outperform any static embedding when you have sufficient data,” Burns says. “We were blown away to see that the static and learned embeddings were statistically indistinguishable in performance across all the different subsets, which indicates to us that the data limitations that are present in this space dominated the model performance.”

    The models could become more accurate, the researchers say, if better training and testing data were available — ideally, data obtained by one person or a group of people all trained to perform the experiments the same way.

    “One of the big limitations of using these kinds of compiled datasets is that different labs use different methods and experimental conditions when they perform solubility tests. That contributes to this variability between different datasets,” Attia says.

    Because the model based on FastProp makes its predictions faster and has code that is easier for other users to adapt, the researchers decided to make that one, known as FastSolv, available to the public. Multiple pharmaceutical companies have already begun using it.

    “There are applications throughout the drug discovery pipeline,” Burns says. “We’re also excited to see, outside of formulation and drug discovery, where people may use this model.”

    The research was funded, in part, by the U.S. Department of Energy.
