
    A Code Implementation to Efficiently Leverage LangChain to Automate PubMed Literature Searches, Parsing, and Trend Visualization

    July 24, 2025

    In this tutorial, we introduce the Advanced PubMed Research Assistant and walk through building a streamlined pipeline for querying and analyzing biomedical literature. We focus on leveraging the PubmedQueryRun tool to perform targeted searches, such as “CRISPR gene editing,” and then parse, cache, and explore those results. You’ll learn how to extract publication dates, titles, and summaries; store queries for instant reuse; and prepare your data for visualization or further analysis.

    !pip install -q langchain-community xmltodict pandas matplotlib seaborn wordcloud google-generativeai langchain-google-genai
    
    
    import os
    import re
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from datetime import datetime, timedelta
    from collections import Counter
    from wordcloud import WordCloud
    import warnings
    warnings.filterwarnings('ignore')
    
    
    from langchain_community.tools.pubmed.tool import PubmedQueryRun
    from langchain_google_genai import ChatGoogleGenerativeAI
    from langchain.agents import initialize_agent, Tool
    from langchain.agents import AgentType

    We install and configure all the essential Python packages, including langchain-community, xmltodict, pandas, matplotlib, seaborn, and wordcloud, as well as Google Generative AI and LangChain Google integrations. We import core data‑processing and visualization libraries, silence warnings, and bring in the PubmedQueryRun tool and ChatGoogleGenerativeAI client. Finally, we prepare to initialize our LangChain agent with the PubMed search capability.
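    Two of these imports, Counter and timedelta, are not actually exercised by the class that follows. As a quick taste of what Counter enables, here is a minimal, self-contained sketch (the titles are invented placeholders, not real PubMed results) that tallies frequent title words — the same signal the word cloud visualizes later:

```python
import re
from collections import Counter

# hypothetical sample titles standing in for real PubMed results
titles = [
    "CRISPR gene editing in human embryos",
    "Machine learning for CRISPR guide design",
    "Gene therapy and CRISPR: a review",
]

# mirror the cleaning used before the word cloud: keep letters and whitespace
cleaned = [re.sub(r'[^a-zA-Z\s]', '', t.lower()) for t in titles]
counts = Counter(word for t in cleaned for word in t.split())

print(counts.most_common(2))  # 'crispr' leads, appearing in all three titles
```

    The same one-liner works on the `title` column of the DataFrame built later, as a lightweight alternative when WordCloud is unavailable.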

    class AdvancedPubMedResearcher:
        """Advanced PubMed research assistant with analysis capabilities"""
       
        def __init__(self, gemini_api_key=None):
            """Initialize the researcher with optional Gemini integration"""
            self.pubmed_tool = PubmedQueryRun()
            self.research_cache = {}
           
            if gemini_api_key:
                os.environ["GOOGLE_API_KEY"] = gemini_api_key
                self.llm = ChatGoogleGenerativeAI(
                    model="gemini-1.5-flash",
                    temperature=0,
                    convert_system_message_to_human=True
                )
                self.agent = self._create_agent()
            else:
                self.llm = None
                self.agent = None
       
        def _create_agent(self):
            """Create LangChain agent with PubMed tool"""
            tools = [
                Tool(
                    name="PubMed Search",
                    func=self.pubmed_tool.invoke,
                    description="Search PubMed for biomedical literature. Use specific terms."
                )
            ]
           
            return initialize_agent(
                tools,
                self.llm,
                agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                verbose=True
            )
       
        def search_papers(self, query, max_results=5):
            """Search PubMed and parse results"""
            # note: max_results is kept for interface clarity; PubmedQueryRun
            # currently returns its own default number of results per query
            print(f"🔍 Searching PubMed for: '{query}'")
           
            try:
                results = self.pubmed_tool.invoke(query)
                papers = self._parse_pubmed_results(results)
               
                self.research_cache[query] = {
                    'papers': papers,
                    'timestamp': datetime.now(),
                    'query': query
                }
               
                print(f"✅ Found {len(papers)} papers")
                return papers
               
            except Exception as e:
                print(f"❌ Error searching PubMed: {str(e)}")
                return []
       
        def _parse_pubmed_results(self, results):
            """Parse PubMed search results into structured data"""
            papers = []
           
            publications = results.split('\n\nPublished: ')[1:]
           
            for pub in publications:
                try:
                    lines = pub.strip().split('\n')
                   
                    pub_date = lines[0] if lines else "Unknown"
                   
                    title_line = next((line for line in lines if line.startswith('Title: ')), '')
                    title = title_line.replace('Title: ', '') if title_line else "Unknown Title"
                   
                    summary_start = None
                    for i, line in enumerate(lines):
                        if 'Summary::' in line:
                            summary_start = i + 1
                            break
                   
                    summary = ""
                    if summary_start:
                        summary = ' '.join(lines[summary_start:])
                   
                    papers.append({
                        'date': pub_date,
                        'title': title,
                        'summary': summary,
                        'word_count': len(summary.split()) if summary else 0
                    })
                   
                except Exception as e:
                    print(f"⚠ Error parsing paper: {str(e)}")
                    continue
           
            return papers
       
        def analyze_research_trends(self, queries):
            """Analyze trends across multiple research topics"""
            print("📊 Analyzing research trends...")
           
            all_papers = []
            topic_counts = {}
           
            for query in queries:
                papers = self.search_papers(query, max_results=3)
                topic_counts[query] = len(papers)
               
                for paper in papers:
                    paper['topic'] = query
                    all_papers.append(paper)
           
            df = pd.DataFrame(all_papers)
           
            if df.empty:
                print("❌ No papers found for analysis")
                return None
           
            self._create_visualizations(df, topic_counts)
           
            return df
       
        def _create_visualizations(self, df, topic_counts):
            """Create research trend visualizations"""
            plt.style.use('seaborn-v0_8')
            fig, axes = plt.subplots(2, 2, figsize=(15, 12))
            fig.suptitle('PubMed Research Analysis Dashboard', fontsize=16, fontweight='bold')
           
            topics = list(topic_counts.keys())
            counts = list(topic_counts.values())
           
            axes[0,0].bar(range(len(topics)), counts, color='skyblue', alpha=0.7)
            axes[0,0].set_xlabel('Research Topics')
            axes[0,0].set_ylabel('Number of Papers')
            axes[0,0].set_title('Papers Found by Topic')
            axes[0,0].set_xticks(range(len(topics)))
            axes[0,0].set_xticklabels([t[:20]+'...' if len(t)>20 else t for t in topics], rotation=45, ha='right')
           
            if 'word_count' in df.columns and not df['word_count'].empty:
                axes[0,1].hist(df['word_count'], bins=10, color='lightcoral', alpha=0.7)
                axes[0,1].set_xlabel('Abstract Word Count')
                axes[0,1].set_ylabel('Frequency')
                axes[0,1].set_title('Distribution of Abstract Lengths')
           
            try:
                dates = pd.to_datetime(df['date'], errors='coerce')
                valid_dates = dates.dropna()
                if not valid_dates.empty:
                    axes[1,0].hist(valid_dates, bins=10, color='lightgreen', alpha=0.7)
                    axes[1,0].set_xlabel('Publication Date')
                    axes[1,0].set_ylabel('Number of Papers')
                    axes[1,0].set_title('Publication Timeline')
                    plt.setp(axes[1,0].xaxis.get_majorticklabels(), rotation=45)
            except Exception:
                axes[1,0].text(0.5, 0.5, 'Date parsing unavailable', ha='center', va='center', transform=axes[1,0].transAxes)
           
            all_titles = ' '.join(df['title'].fillna('').astype(str))
            if all_titles.strip():
                clean_titles = re.sub(r'[^a-zA-Z\s]', '', all_titles.lower())
               
                try:
                    wordcloud = WordCloud(width=400, height=300, background_color='white',
                                        max_words=50, colormap='viridis').generate(clean_titles)
                    axes[1,1].imshow(wordcloud, interpolation='bilinear')
                    axes[1,1].axis('off')
                    axes[1,1].set_title('Common Words in Titles')
                except Exception:
                    axes[1,1].text(0.5, 0.5, 'Word cloud unavailable', ha='center', va='center', transform=axes[1,1].transAxes)
           
            plt.tight_layout()
            plt.show()
       
        def comparative_analysis(self, topic1, topic2):
            """Compare two research topics"""
            print(f"🔬 Comparing '{topic1}' vs '{topic2}'")
           
            papers1 = self.search_papers(topic1)
            papers2 = self.search_papers(topic2)
           
            avg_length1 = sum(p['word_count'] for p in papers1) / len(papers1) if papers1 else 0
            avg_length2 = sum(p['word_count'] for p in papers2) / len(papers2) if papers2 else 0
           
            print("\n📈 Comparison Results:")
            print(f"Topic 1 ({topic1}):")
            print(f"  - Papers found: {len(papers1)}")
            print(f"  - Avg abstract length: {avg_length1:.1f} words")
           
            print(f"\nTopic 2 ({topic2}):")
            print(f"  - Papers found: {len(papers2)}")
            print(f"  - Avg abstract length: {avg_length2:.1f} words")
           
            return papers1, papers2
       
        def intelligent_query(self, question):
            """Use AI agent to answer research questions (requires Gemini API)"""
            if not self.agent:
                print("❌ AI agent not available. Please provide Gemini API key.")
                print("💡 Get free API key at: https://makersuite.google.com/app/apikey")
                return None
           
            print(f"🤖 Processing intelligent query with Gemini: '{question}'")
            try:
                response = self.agent.run(question)
                return response
            except Exception as e:
                print(f"❌ Error with AI query: {str(e)}")
                return None
    

    We encapsulate the PubMed querying workflow in our AdvancedPubMedResearcher class, initializing the PubmedQueryRun tool and an optional Gemini-powered LLM agent for advanced analysis. We provide methods to search for papers, parse and cache results, analyze research trends with rich visualizations, and compare topics side by side. This class streamlines programmatic exploration of biomedical literature and intelligent querying in just a few method calls.
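    The delimiter assumptions baked into `_parse_pubmed_results` can be exercised without any network call. The sketch below runs the same splitting logic against a synthetic record (the text is invented to imitate the `Published: `/`Title: `/`Summary::` layout the parser expects; real PubmedQueryRun output may vary across versions):

```python
# synthetic string imitating the record layout _parse_pubmed_results expects;
# real PubmedQueryRun output may differ between langchain-community versions
raw = (
    "\n\nPublished: 2024-01-15\n"
    "Title: CRISPR base editing advances\n"
    "Copyright Information:\n"
    "Summary::\n"
    "Base editors enable precise single-nucleotide changes."
)

papers = []
for pub in raw.split('\n\nPublished: ')[1:]:
    lines = pub.strip().split('\n')
    date = lines[0]                                     # first line after the delimiter
    title_line = next((l for l in lines if l.startswith('Title: ')), '')
    title = title_line.replace('Title: ', '')
    summary_start = next((i + 1 for i, l in enumerate(lines) if 'Summary::' in l), None)
    summary = ' '.join(lines[summary_start:]) if summary_start else ''
    papers.append({'date': date, 'title': title, 'summary': summary})

print(papers[0]['title'])  # CRISPR base editing advances
```

    Testing the parser against a fixture like this makes it easy to notice when an upstream format change silently breaks extraction.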

    def main():
        """Main tutorial demonstration"""
        print("🚀 Advanced PubMed Research Assistant Tutorial")
        print("=" * 50)
       
        # Initialize researcher
        # Uncomment next line and add your free Gemini API key for AI features
        # Get your free API key at: https://makersuite.google.com/app/apikey
        # researcher = AdvancedPubMedResearcher(gemini_api_key="your-gemini-api-key")
        researcher = AdvancedPubMedResearcher()
       
        print("\n1⃣ Basic PubMed Search")
        papers = researcher.search_papers("CRISPR gene editing", max_results=3)
       
        if papers:
            print("\nFirst paper preview:")
            print(f"Title: {papers[0]['title']}")
            print(f"Date: {papers[0]['date']}")
            print(f"Summary preview: {papers[0]['summary'][:200]}...")
    
    
        print("\n\n2⃣ Research Trends Analysis")
        research_topics = [
            "machine learning healthcare",
            "CRISPR gene editing",
            "COVID-19 vaccine"
        ]
       
        df = researcher.analyze_research_trends(research_topics)
       
        if df is not None:
            print(f"\nDataFrame shape: {df.shape}")
            print("\nSample data:")
            print(df[['topic', 'title', 'word_count']].head())
    
    
        print("\n\n3⃣ Comparative Analysis")
        papers1, papers2 = researcher.comparative_analysis(
            "artificial intelligence diagnosis",
            "traditional diagnostic methods"
        )
       
        print("\n\n4⃣ Advanced Features")
        print("Cache contents:", list(researcher.research_cache.keys()))
       
        if researcher.research_cache:
            latest_query = list(researcher.research_cache.keys())[-1]
            cached_data = researcher.research_cache[latest_query]
            print(f"Latest cached query: '{latest_query}'")
            print(f"Cached papers count: {len(cached_data['papers'])}")
       
        print("\n✅ Tutorial complete!")
        print("\nNext steps:")
        print("- Add your FREE Gemini API key for AI-powered analysis")
        print("  Get it at: https://makersuite.google.com/app/apikey")
        print("- Customize queries for your research domain")
        print("- Export results to CSV with: df.to_csv('research_results.csv')")
       
        print("\n🎁 Bonus: To test AI features, run:")
        print("researcher = AdvancedPubMedResearcher(gemini_api_key='your-key')")
        print("response = researcher.intelligent_query('What are the latest breakthroughs in cancer treatment?')")
        print("print(response)")
    
    
    if __name__ == "__main__":
        main()
    

    We implement the main function to orchestrate the full tutorial demo, guiding users through basic PubMed searches, multi‑topic trend analyses, comparative studies, and cache inspection in a clear, numbered sequence. We wrap up by highlighting the next steps, including adding your Gemini API key for AI features, customizing queries to your domain, and exporting results to CSV, along with a bonus snippet for running intelligent, Gemini-powered research queries.
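    The closing tip suggests `df.to_csv('research_results.csv')` for exports. If pandas feels heavy for that step, a stdlib-only sketch (the paper dict below is a hypothetical stand-in for `search_papers` output) achieves the same with `csv.DictWriter`:

```python
import csv
import io

# hypothetical parsed papers, shaped like search_papers() output
papers = [
    {'date': '2024-01-15', 'title': 'CRISPR base editing advances',
     'summary': 'Base editors enable precise edits.', 'word_count': 5},
]

# StringIO keeps the sketch self-contained; swap in
# open('research_results.csv', 'w', newline='') to write a real file
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['date', 'title', 'summary', 'word_count'])
writer.writeheader()
writer.writerows(papers)

print(buf.getvalue().splitlines()[0])  # date,title,summary,word_count
```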

    In conclusion, we have now demonstrated how to harness the power of PubMed programmatically, from crafting precise search queries to parsing and caching results for quick retrieval. By following these steps, you can automate your literature review process, track research trends over time, and integrate advanced analyses into your workflows. We encourage you to experiment with different search terms, dive into the cached results, and extend this framework to support your ongoing biomedical research.
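    One natural extension: `research_cache` stores a `datetime.now()` timestamp with every query but never expires entries. Since `timedelta` is already imported, a freshness check is a few lines away. The helper below is an illustrative sketch, not part of the class above:

```python
from datetime import datetime, timedelta

research_cache = {}

def get_cached(query, max_age=timedelta(hours=24)):
    """Return cached papers for query if fresher than max_age, else None."""
    entry = research_cache.get(query)
    if entry and datetime.now() - entry['timestamp'] <= max_age:
        return entry['papers']
    return None

# populate with a hypothetical result, then read it back
research_cache['CRISPR gene editing'] = {
    'papers': [{'title': 'CRISPR base editing advances'}],
    'timestamp': datetime.now(),
}
print(get_cached('CRISPR gene editing') is not None)  # True
print(get_cached('stale query'))                      # None
```

    Wiring the same check into `search_papers` would let repeated runs skip redundant PubMed calls while still refreshing stale results.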


    Check out the CODES here. All credit for this research goes to the researchers of this project.


    The post A Code Implementation to Efficiently Leverage LangChain to Automate PubMed Literature Searches, Parsing, and Trend Visualization appeared first on MarkTechPost.

