
    LLM for Biology: This Paper Discusses How Language Models can be Applied to Biological Research

    August 15, 2024

    Applying language models to biological research is difficult because biological sequences differ from natural language. DNA, RNA, and protein sequences are not text, yet they are sequential, which makes them amenable to similar processing techniques. The main challenge is adapting language models, originally developed for natural language processing (NLP), to the complexities of biological sequences. Doing so matters for tasks such as protein structure prediction, gene expression analysis, and the identification of molecular interactions, particularly where large and complex datasets must be analyzed.
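To make the parallel with text concrete, here is a minimal sketch of how a protein sequence could be turned into integer tokens, one per residue, much as words or subwords are in NLP. The vocabulary layout and special-token names below are illustrative assumptions and are not taken from ESM-2 or any other specific model.

```python
# Toy vocabulary: the 20 standard amino acids plus a few special tokens.
# Token names and ID order are hypothetical, chosen only for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SPECIALS = ["<pad>", "<cls>", "<eos>", "<unk>", "<mask>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + list(AMINO_ACIDS))}


def tokenize(sequence: str) -> list[int]:
    """Map a protein sequence to token IDs, one token per residue."""
    ids = [VOCAB["<cls>"]]  # start-of-sequence marker
    for residue in sequence.upper():
        # Residues outside the 20 standard letters fall back to <unk>.
        ids.append(VOCAB.get(residue, VOCAB["<unk>"]))
    ids.append(VOCAB["<eos>"])  # end-of-sequence marker
    return ids
```

Calling `tokenize("MKTAYIAK")` yields a list of integers that a sequence model can consume in the same way it would consume tokenized text.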

    Current methods for analyzing biological sequences rely on traditional sequence alignment and classical machine learning. Alignment tools like BLAST and Clustal are widely used but can struggle with the computational cost of large datasets, and they do not capture deeper structural and functional relationships within sequences. Machine learning techniques such as random forests and support vector machines offer some improvement but depend on manually engineered features and generalize poorly across biological contexts. These limitations reduce their usefulness where both efficiency and accuracy matter.
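To illustrate where the computational cost of alignment comes from, the sketch below scores a global pairwise alignment with the classic Needleman-Wunsch dynamic program, which takes time proportional to the product of the two sequence lengths. It is a textbook illustration with assumed scoring values, not the algorithm BLAST actually runs; BLAST relies on heuristics precisely to avoid this full computation.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch alignment score, O(len(a) * len(b)) time.

    The match/mismatch/gap values are illustrative defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]
```

Every pair of sequences requires filling the full table, so comparing each sequence in a large collection against every other one quickly becomes expensive.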

    To address these limitations, Stanford researchers propose using language models, particularly those based on the transformer architecture, in biological research. Language models can process large-scale, heterogeneous datasets and uncover complex patterns in sequential data. Pre-trained models, such as ESM-2 for protein sequences and Geneformer for single-cell data, can be fine-tuned for specific biological tasks, offering a flexible and scalable alternative to traditional methods.
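One lightweight way to adapt a pre-trained model to a task is to keep the model frozen, take its per-sequence embeddings, and train a small classifier on top. The sketch below shows that idea with a logistic-regression head in plain Python. It is a simplification rather than full fine-tuning, and the embedding vectors and labels are made-up placeholders standing in for real model outputs.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on fixed embeddings with plain SGD."""
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            p = _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_proba(weights, bias, x):
    """Probability of the positive class for one embedding vector."""
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical 2-dimensional embeddings; real models produce far larger vectors.
toy_embeddings = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
toy_labels = [1, 1, 0, 0]  # e.g. a binary property of each sequence
head_weights, head_bias = train_logistic_head(toy_embeddings, toy_labels)
```

In practice the embeddings would come from the pre-trained model itself, and full fine-tuning would also update the model's own weights rather than only the head.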

    The proposed method relies on the transformer architecture, which is particularly effective for processing sequential data. The researchers use several pre-trained models, including ESM-2, a protein language model trained on over 250 million protein sequences, and Geneformer, a single-cell language model trained on 30 million single-cell transcriptomes. These models are trained with masked language modeling: parts of the sequence are hidden and the model learns to predict the missing elements. This training lets the model learn the underlying patterns and relationships within sequences, making it possible to predict outcomes such as protein stability, gene expression levels, and variant effects. The models can be further fine-tuned for specific tasks, such as integrating multi-modal data that includes gene expression, chromatin accessibility, and protein abundance.
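The masking step described above can be sketched in a few lines. The function hides a random subset of tokens and records the originals as prediction targets. The 15% masking rate follows the common BERT convention and is an assumption here, as is the `<mask>` token name; real training pipelines typically add refinements such as occasionally substituting a random token instead of the mask.

```python
import random

MASK = "<mask>"  # hypothetical mask token name


def mask_sequence(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for masked language modeling.

    labels[i] holds the original token where position i was masked,
    and None elsewhere (positions the loss would ignore).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

For example, `mask_sequence(list("MKTAYIAKQR"), seed=0)` returns the residue list with some positions replaced by `<mask>`, together with the hidden residues the model would be trained to predict.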

    The language models reportedly improved results across several biological tasks. For protein sequence analysis, they predicted protein stability and evolutionary constraints more accurately than existing methods. In single-cell data analysis, they predicted cell types and gene expression patterns with greater precision and were better at identifying subtle biological variations. These results suggest the models can serve as accurate, scalable, and efficient tools for analyzing complex biological data.

    In conclusion, the proposed method contributes to AI-driven biological research by adapting language models to the analysis of biological sequences. It uses the strengths of transformer-based models to overcome the limitations of traditional methods. Models like ESM-2 and Geneformer provide a scalable and accurate solution for a wide range of biological tasks, with potential applications in genomics, proteomics, and personalized medicine.

    Check out the Paper and Colab Tutorial. All credit for this research goes to the researchers of this project.


    The post LLM for Biology: This Paper Discusses How Language Models can be Applied to Biological Research appeared first on MarkTechPost.

