
    An Advanced Coding Implementation: Mastering Browser‑Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini

    April 20, 2025

    In this tutorial, we will learn how to harness the power of a browser‑driven AI agent entirely within Google Colab. We will utilize Playwright’s headless Chromium engine, along with the browser_use library’s high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google’s Gemini model via the langchain_google_genai connector to provide natural‑language reasoning and decision‑making, secured by pydantic’s SecretStr for safe API‑key handling. With getpass managing credentials, asyncio orchestrating non‑blocking execution, and optional .env support via python-dotenv, this setup will give you an end‑to‑end, interactive agent platform without ever leaving your notebook environment.

    !apt-get update -qq
    !apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
    !pip install -qq playwright python-dotenv langchain-google-genai browser-use
    !playwright install

    We first refresh the system package lists and install Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain Google Generative AI connector, and browser-use, and finally download the necessary browser binaries via playwright install.
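Before moving on, it can help to confirm that the installs actually landed in the notebook's Python environment. The following is a minimal sketch, not part of the original notebook: the helper name missing_modules is hypothetical, and the module names listed are the import names we assume correspond to the packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed in the previous cell.
REQUIRED = ["playwright", "dotenv", "langchain_google_genai", "browser_use"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules, re-run the install cell:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, re-running the install cell (or restarting the runtime) is usually the first thing to try.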

    import os
    import asyncio
    from getpass import getpass
    from pydantic import SecretStr
    from langchain_google_genai import ChatGoogleGenerativeAI
    from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
    from browser_use.browser.browser import BrowserContext

    We bring in the core Python utilities: os for environment management and asyncio for asynchronous execution, plus getpass and pydantic’s SecretStr for secure API‑key input and storage. We then load LangChain’s Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.

    os.environ["ANONYMIZED_TELEMETRY"] = "false"

    We disable browser_use’s anonymous usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to “false”. This flag belongs to browser_use rather than Playwright, so set it before creating the agent.

    async def setup_browser(headless: bool = True):
        browser = Browser(config=BrowserConfig(headless=headless))
        context = BrowserContext(
            browser=browser,
            config=BrowserContextConfig(
                wait_for_network_idle_page_load_time=5.0,
                highlight_elements=True,
                save_recording_path="./recordings",
            )
        )
        return browser, context

    This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network‑idle page loads, visually highlight elements during interactions, and save a recording of each session under ./recordings. It then returns both the browser and its ready‑to‑use context for your agent’s tasks.
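Since the context is configured to write recordings under ./recordings, one small precaution is to make sure that directory exists before the browser starts. This is a sketch under the assumption that creating the folder up front is harmless; whether browser_use creates it on its own is not confirmed here, and the helper name ensure_recording_dir is hypothetical.

```python
from pathlib import Path


def ensure_recording_dir(path: str = "./recordings") -> Path:
    """Create the recording directory if needed and return its absolute path."""
    directory = Path(path).resolve()
    # exist_ok=True makes the call safe to repeat across notebook re-runs.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


recording_dir = ensure_recording_dir()
print(f"Recordings will be written under: {recording_dir}")
```

Calling this once before setup_browser keeps the path handling in one place if you later decide to move recordings elsewhere.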

    async def agent_loop(llm, browser_context, query, initial_url=None):
        initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
        agent = Agent(
            task=query,
            llm=llm,
            browser_context=browser_context,
            use_vision=True,
            generate_gif=False,  
            initial_actions=initial_actions,
        )
        result = await agent.run()
        return result.final_result() if result else None

    This async helper encapsulates one “think‐and‐browse” cycle: it spins up an Agent configured with your LLM, the browser context, and optional initial URL tab, leverages vision when available, and disables GIF recording. Once you call agent_loop, it runs the agent through its steps and returns the agent’s final result (or None if nothing is produced).
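Because the initial URL comes straight from user input, it may be worth normalizing it before it is handed to the agent. The sketch below builds the same open_tab action list that agent_loop uses; the helper name build_initial_actions and the choice to default bare hosts to https are assumptions for illustration, not part of the original notebook.

```python
from typing import Optional
from urllib.parse import urlparse


def build_initial_actions(url: Optional[str]) -> Optional[list]:
    """Return an initial_actions list for a URL, or None when no URL is given."""
    if not url or not url.strip():
        return None
    url = url.strip()
    # Assumption: a bare host such as "example.com" is meant to be opened over https.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a usable web URL: {url!r}")
    return [{"open_tab": {"url": url}}]


print(build_initial_actions("example.com"))
```

Inside agent_loop, the result could replace the inline conditional that builds initial_actions, so malformed input fails early with a clear message instead of partway through a browsing session.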

    async def main():
        # Prompt for the key without echoing it into the notebook output.
        raw_key = getpass("Enter your GEMINI_API_KEY: ")
        os.environ["GEMINI_API_KEY"] = raw_key

        api_key = SecretStr(raw_key)
        model_name = "gemini-2.5-flash-preview-04-17"
        llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)

        browser, context = await setup_browser(headless=True)

        try:
            while True:
                query = input("\nEnter prompt (or leave blank to exit): ").strip()
                if not query:
                    break
                url = input("Optional URL to open first (or blank to skip): ").strip() or None

                print("\n🤖 Running agent…")
                answer = await agent_loop(llm, context, query, initial_url=url)
                print("\n📊 Search Results\n" + "-"*40)
                print(answer or "No results found")
                print("-"*40)
        finally:
            # Always release the browser, even if the loop exits with an error.
            print("Closing browser…")
            await browser.close()


    # Colab cells support top-level await, so asyncio.run() is not needed here.
    await main()

    Finally, this main coroutine drives the entire Colab session. It prompts for your Gemini API key with getpass and wraps it in SecretStr, then sets up the ChatGoogleGenerativeAI LLM and a headless browser context. It then enters an interactive loop that reads your natural‑language prompt (and an optional start URL), invokes agent_loop to perform the browser‑driven task, and prints the result. The finally block ensures the browser closes cleanly, even if the loop exits with an error.
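The introduction mentions optional .env support, but main always prompts for the key. One way to combine the two is to check the environment first and fall back to getpass only when the variable is unset. This is a minimal sketch: the helper name resolve_api_key and its prompt parameter are hypothetical, introduced here so the fallback can be swapped out or tested.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_api_key(
    env_var: str = "GEMINI_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment if set, otherwise prompt for it."""
    # Optional: call dotenv.load_dotenv() beforehand to populate os.environ
    # from a .env file (requires python-dotenv).
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt(f"Enter your {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    return key
```

In main, raw_key = resolve_api_key() could stand in for the direct getpass call, and the result can still be wrapped in SecretStr as before.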

    In conclusion, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you’re scraping real‑time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain’s Gemini interface provides a flexible foundation for your next AI‑powered project. Feel free to extend the agent’s capabilities, re‑enable GIF recording, add custom navigation steps, or swap in other LLM backends to tailor the workflow precisely to your research or production needs.



    The post An Advanced Coding Implementation: Mastering Browser‑Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini appeared first on MarkTechPost.
