AutoWebGLM: A GPT-4-Outperforming Automated Web Navigation Agent Built Upon ChatGLM3-6B

Large Language Models (LLMs) have become essential tools for various intelligent agent tasks such as web navigation. Autonomous digital agents, particularly those powered by LLMs, have great potential to transform how humans interact with technology. Through their strong cognition and response skills, these agents open up possibilities that were previously out of reach.

However, most current agents fall short of real-world needs on web pages, for the following three reasons.

Versatility of Actions on Websites: Webpages support such a wide range of actions and interactions that traditional agents find it difficult to explore them efficiently.

HTML Text Processing Capacity: The sheer amount of HTML text on a webpage can be more than typical models can handle, resulting in less-than-ideal performance and incomplete comprehension.

Complexity of Decision-Making: Agents must make relevant decisions in real time due to the open-domain nature of the web, which creates a complex decision-making environment.

To address these issues, a team of researchers has proposed AutoWebGLM, an automated web navigation agent that is built on the ChatGLM3-6B model and that the team reports outperforms GPT-4. The development of AutoWebGLM involved several significant advances, which are as follows.

HTML Simplification Algorithm: Drawing on human browsing behaviours, the team has created an HTML simplification algorithm that represents webpages more concisely while preserving important information. The objective of this algorithm is to optimise the way webpage content is processed so that the model can comprehend it more effectively.
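The article does not describe the algorithm's actual rules, so the following is only a minimal sketch of the general idea: drop content a reader never sees, keep visible text, and give each interactive element a short id an agent could refer to. The tag sets, the kept attributes, and the names `PageSimplifier` and `simplify_html` are assumptions made for illustration, not the team's implementation.

```python
from html.parser import HTMLParser

# Assumption: these tag and attribute choices are illustrative, not the paper's rules.
SKIP_TAGS = {"script", "style", "noscript", "svg", "head"}
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = ("type", "placeholder", "aria-label", "value")


class PageSimplifier(HTMLParser):
    """Collects visible text and interactive elements as compact lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0  # greater than 0 while inside a skipped subtree
        self._next_id = 0     # running id for interactive elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        # Keep only a few attributes that describe what the element does.
        kept = [f'{name}="{attr_map[name]}"' for name in KEPT_ATTRS if attr_map.get(name)]
        self.lines.append(f"[{self._next_id}] <{' '.join([tag] + kept)}>")
        self._next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.lines.append(text)


def simplify_html(html: str) -> str:
    """Return a compact, line-based view of the page."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

For instance, `simplify_html('<button>Go</button>')` yields the two lines `[0] <button>` and `Go`, which is far shorter than a typical raw page while still telling an agent what it can act on.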

Hybrid Human-AI Data Generation: High-quality web browsing data has been generated using a hybrid technique that combines human experience and AI capabilities in order to train AutoWebGLM efficiently. The curriculum training is based on this carefully selected dataset, which helps the model learn and perform better over time.

Reinforcement learning techniques have been used to bootstrap the model, and rejection sampling has been added to improve the model's ability to comprehend webpages, perform browser actions, and break down tasks on its own. With this method, AutoWebGLM can adapt and refine its strategies in response to real-world interactions.
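As a rough illustration of rejection sampling for fine-tuning data, the sketch below samples several candidate action sequences per task from a policy and keeps only those that a success check accepts; the kept pairs would then serve as training examples. The function names, the toy policy, and the success check are all hypothetical, and the article does not specify how the team's sampling or acceptance criteria actually work.

```python
import random
from typing import Callable, List, Tuple

Trajectory = List[str]  # a sequence of browser actions, e.g. ["click(3)", "submit()"]


def rejection_sample(
    tasks: List[str],
    policy: Callable[[str, random.Random], Trajectory],
    is_success: Callable[[str, Trajectory], bool],
    samples_per_task: int = 4,
    seed: int = 0,
) -> List[Tuple[str, Trajectory]]:
    """Sample several trajectories per task and keep only the accepted ones."""
    rng = random.Random(seed)
    accepted = []
    for task in tasks:
        for _ in range(samples_per_task):
            trajectory = policy(task, rng)
            if is_success(task, trajectory):  # rejected samples are discarded
                accepted.append((task, trajectory))
    return accepted


# Hypothetical stand-ins for a real model and a real task checker.
def toy_policy(task: str, rng: random.Random) -> Trajectory:
    return [rng.choice(["click(1)", "click(2)"])]


def reaches_goal(task: str, trajectory: Trajectory) -> bool:
    return trajectory[-1] == "click(1)"


if __name__ == "__main__":
    data = rejection_sample(["open settings", "search docs"], toy_policy, reaches_goal)
    print(f"kept {len(data)} of 8 sampled trajectories")
```

The design point is that the model is only ever trained on its own outputs that passed the check, so the quality of the acceptance test bounds the quality of the resulting data.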

The team has also created a multilingual benchmark, AutoWebBench, to evaluate AutoWebGLM's performance on real-world web browsing tasks. Extensive testing on a variety of web navigation benchmarks has demonstrated the benefits of AutoWebGLM and has also revealed underlying issues that still need to be resolved for real-world navigation.
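To make the idea of a per-language evaluation concrete, here is a small sketch that scores predicted actions against reference actions and reports accuracy separately for each language. The record layout and the exact-match scoring rule are assumptions for illustration; the article does not say how AutoWebBench is actually scored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed record layout: (language, predicted_action, reference_action)
Record = Tuple[str, str, str]


def step_accuracy_by_language(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of steps per language where the prediction matches the reference."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, reference in records:
        total[language] += 1
        if predicted.strip() == reference.strip():  # assumed exact-match rule
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


if __name__ == "__main__":
    sample = [
        ("en", "click(3)", "click(3)"),
        ("en", "click(4)", "click(7)"),
        ("zh", "type(2, 'AutoWebGLM')", "type(2, 'AutoWebGLM')"),
    ]
    print(step_accuracy_by_language(sample))
```

Reporting the languages separately, rather than as one pooled number, makes it visible when an agent performs well in one language and poorly in another.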

The team has summarised their primary contributions as follows.

The team has created and deployed AutoWebGLM, an autonomous web browsing agent that can efficiently carry out web browsing tasks. Curriculum learning, self-sampling reinforcement learning, and rejection sampling finetuning (RFT) in the web browsing environment have been used to bootstrap the agent's training.

The team has collected and organised 10,000 records of actual webpage viewing activities. This dataset is produced using both manual and model-assisted techniques. AutoWebBench has also been introduced, which is a multilingual (English and Chinese) web browsing benchmark to ease evaluation across various linguistic contexts.

Through experiments, the team has shown that AutoWebGLM, with 6 billion parameters, performs at a level that is competitive with the latest LLM-based agents. The team reports that it reaches a practically usable level for real-world web tasks, demonstrating its effectiveness in tackling the difficulties associated with web navigation.

Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

The post AutoWebGLM: A GPT-4-Outperforming Automated Web Navigation Agent Built Upon ChatGLM3-6B appeared first on MarkTechPost.
