
    AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real User Behavior to Transform Traditional A/B Testing on Live Web Platforms

    April 26, 2025

    Designing and evaluating web interfaces is one of the most critical tasks in today’s digital-first world. Every change in layout, element positioning, or navigation logic can influence how users interact with websites. This becomes even more crucial for platforms that rely on extensive user engagement, such as e-commerce or content streaming services. One of the most trusted methods for assessing the impact of design changes is A/B testing. In A/B testing, two or more versions of a webpage are shown to different user groups to measure their behavior and determine which variant performs better. It’s not just about aesthetics but also functional usability. This method enables product teams to gather user-centered evidence before fully rolling out a feature, allowing businesses to optimize user interfaces systematically based on observed interactions.
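
    In code terms, the mechanics of a classic A/B test are small: deterministically bucket each user into a variant, log what they do, and compare a metric such as conversion rate across buckets. The sketch below is illustrative only; the hashing scheme and event fields are assumptions, not part of any specific platform.

```python
import hashlib

def assign_variant(user_id: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user into a variant by hashing their ID."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(variants)
    return variants[bucket]

def conversion_rate(events: list[dict], variant: str) -> float:
    """Fraction of logged sessions in `variant` that ended in a purchase."""
    group = [e for e in events if e["variant"] == variant]
    return sum(e["purchased"] for e in group) / max(len(group), 1)

# Hypothetical event log: one record per user session.
events = [
    {"user_id": "u1", "variant": assign_variant("u1"), "purchased": True},
    {"user_id": "u2", "variant": assign_variant("u2"), "purchased": False},
]
print(conversion_rate(events, "A"), conversion_rate(events, "B"))
```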

    Despite being widely accepted, traditional A/B testing carries inefficiencies that prove problematic for many teams. The most significant is the volume of real-user traffic needed for statistically valid results: in some scenarios, hundreds of thousands of users must interact with the webpage variants before meaningful patterns emerge. For smaller websites or early-stage features, securing that level of traffic can be nearly impossible. The feedback cycle is also slow; even after an experiment launches, it can take weeks or months of observation before results can be confidently assessed. And because each test demands time and staffing, only a handful of variants can be evaluated at once, so many promising ideas go untested simply because there is no capacity to explore them all.
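
    To see why the traffic requirement is so steep, consider the standard two-proportion sample-size estimate for detecting a small lift in conversion rate. The baseline and effect figures below are illustrative, not taken from the paper.

```python
from scipy.stats import norm

def required_sample_size(p_base, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed *per variant* to detect a shift from p_base to
    p_variant with a two-sided z-test at the given significance level and power."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = abs(p_variant - p_base)
    return int((z_alpha + z_beta) ** 2 * variance / effect ** 2) + 1

# Detecting a lift from a 2.0% to a 2.2% conversion rate:
print(required_sample_size(0.020, 0.022))  # roughly 80,000 users per arm, over 160,000 in total
```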

    Several methods have been explored to overcome these limitations; however, each has its shortcomings. For example, offline A/B testing techniques depend on rich historical interaction logs, which are not always available or reliable. Tools that enable prototyping and experimentation, such as Apparition and Fuse, have accelerated early design exploration but are primarily useful for prototyping physical interfaces. Algorithms that reframe A/B testing as a search problem through evolutionary models help automate some aspects but still depend on historical or real-user deployment data. Other strategies, like cognitive modeling with GOMS or ACT-R frameworks, require high levels of manual configuration and do not easily adapt to the complexities of dynamic web behavior. These tools, although innovative, have not provided the scalability and automation necessary to address the deeper structural limitations in A/B testing workflows.

    Researchers from Northeastern University, Pennsylvania State University, and Amazon introduced a new automated system named AgentA/B. The system offers an alternative to traditional user testing: rather than depending on live user interaction, it simulates human behavior with thousands of Large Language Model (LLM)-based agents. Each agent is assigned a detailed persona covering characteristics such as age, educational background, technical proficiency, and shopping preferences, which lets it act out a wide range of user interactions on real websites. The goal is to give researchers and product managers an efficient, scalable way to test multiple design variants without relying on live user feedback or extensive traffic coordination.
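
    The paper does not spell out its persona schema, but personas of the kind described can be represented as simple structured records and sampled at scale. The fields and value sets below are illustrative assumptions, not the actual AgentA/B schema.

```python
import random
from dataclasses import dataclass

@dataclass
class Persona:
    """Illustrative persona fields; the real AgentA/B schema may differ."""
    age: int
    education: str
    tech_proficiency: str
    shopping_preference: str

def sample_personas(n: int, seed: int = 0) -> list[Persona]:
    """Randomly sample n personas from hypothetical attribute pools."""
    rng = random.Random(seed)
    return [
        Persona(
            age=rng.randint(18, 75),
            education=rng.choice(["high school", "bachelor's", "master's", "PhD"]),
            tech_proficiency=rng.choice(["novice", "intermediate", "expert"]),
            shopping_preference=rng.choice(["budget", "brand-loyal", "deal-seeker"]),
        )
        for _ in range(n)
    ]

personas = sample_personas(100_000)                 # pool size used in the paper's experiment
agents = random.Random(1).sample(personas, 1_000)   # subset deployed as LLM agents
```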

    The AgentA/B architecture is structured into four main components. First, it generates agent personas from the demographics and behavioral diversity specified by the user. These personas feed the second stage, where testing scenarios are defined, including assigning agents to control and treatment groups and specifying which two webpage versions to compare. The third component executes the interactions: agents are deployed into real browser environments, where they process the page content as structured web data (converted into JSON observations) and take actions like real users, searching, filtering, clicking, and even simulating purchases. The fourth component analyzes the results, reporting metrics such as the number of clicks, purchases, and interaction durations to assess design effectiveness.
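
    The interaction loop implied by this pipeline can be sketched as follows: observe a JSON snapshot of the page, ask the LLM (conditioned on the persona) for the next action, then execute that action in the browser. The `parse_page`, `complete`, and `apply_action` calls are hypothetical stand-ins for the real browser and LLM plumbing, not AgentA/B's actual API.

```python
import json

def run_agent(persona, browser, llm, max_steps=20):
    """Drive one LLM agent through a live page until it stops or hits max_steps."""
    log = []
    for _ in range(max_steps):
        # 1. Observe: convert the current page into a structured JSON observation.
        observation = browser.parse_page()          # hypothetical: returns a dict of page elements
        # 2. Decide: ask the LLM, conditioned on the persona, for the next action.
        prompt = (
            f"You are a shopper: {persona}.\n"
            f"Page state: {json.dumps(observation)}\n"
            "Reply with one action as JSON, e.g. "
            '{"type": "click", "target": "filter:price"} or {"type": "stop"}.'
        )
        action = json.loads(llm.complete(prompt))   # hypothetical LLM call
        log.append(action)
        if action["type"] == "stop":
            break
        # 3. Act: execute the chosen action (search, filter, click, purchase) in the browser.
        browser.apply_action(action)                # hypothetical browser call
    return log  # per-agent trace used later for metrics (clicks, purchases, duration)
```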

    To demonstrate the tool’s practical value, the researchers ran an experiment on Amazon.com. A pool of 100,000 virtual customer personas was generated, and 1,000 were randomly selected to act as LLM agents in the simulation. The experiment compared two webpage layouts: one showing all product filter options in a left-hand panel and another showing only a reduced set of filters. The outcome was compelling: agents interacting with the reduced-filter version made more purchases and performed more filter-based actions than those shown the full list. The virtual agents were also markedly more efficient; compared with one million real user interactions, the LLM agents took fewer actions on average to complete tasks, indicating more goal-oriented behavior. These results mirrored the behavioral direction observed in human A/B tests, strengthening the case for AgentA/B as a valid complement to traditional testing.
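
    Once each agent’s trace is logged, comparing the two arms reduces to standard analysis: purchase rate per arm, mean actions per session, and a significance test on the purchase counts. The sketch below assumes a simple per-agent trace format and is not the paper’s exact analysis code.

```python
from statistics import mean
from scipy.stats import chi2_contingency

def summarize(arm):
    """arm: list of per-agent traces, each a dict with 'purchased' and 'n_actions'."""
    purchases = sum(t["purchased"] for t in arm)
    return purchases, len(arm) - purchases, mean(t["n_actions"] for t in arm)

def compare(control, treatment):
    c_buy, c_no, c_actions = summarize(control)
    t_buy, t_no, t_actions = summarize(treatment)
    # Chi-square test on the 2x2 purchase table: did the treatment change purchase rate?
    _, p_value, _, _ = chi2_contingency([[c_buy, c_no], [t_buy, t_no]])
    return {
        "control_purchase_rate": c_buy / (c_buy + c_no),
        "treatment_purchase_rate": t_buy / (t_buy + t_no),
        "control_mean_actions": c_actions,
        "treatment_mean_actions": t_actions,
        "p_value": p_value,
    }
```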

    This research demonstrates a compelling advancement in interface evaluation. It doesn’t aim to replace live user A/B testing but instead proposes a supplementary method that offers rapid feedback, cost efficiency, and broader experimental coverage. By using AI agents instead of live participants, the system enables product teams to test numerous interface variations that would otherwise be infeasible. This model can significantly compress the design cycle, allowing ideas to be validated or rejected at a much earlier stage. It addresses the practical concerns of long wait times, traffic limitations, and testing resource constraints, making the web design process more data-informed and less prone to bottlenecks.

    Key takeaways from the research on AgentA/B:

    • AgentA/B uses LLM-based agents to simulate realistic user behavior on live webpages.  
    • The system allows automated A/B testing with no need for live user deployment.  
    • 100,000 user personas were generated, and 1,000 were selected for live testing simulation.  
    • The system compared two webpage variants on Amazon.com: full filter panel vs. reduced filters.  
    • LLM agents in the reduced-filter group made more purchases and performed more filtering actions.  
    • Compared to 1 million human users, LLM agents showed shorter action sequences and more goal-directed behavior.  
    • AgentA/B can help evaluate interface changes before real user testing, saving months of development time.  
    • The system is modular and extensible, allowing it to be adaptable to various web platforms and testing goals.  
    • It directly addresses three core A/B testing challenges: long evaluation cycles, high user-traffic requirements, and the limited capacity to test many variants at once.

    Check out the Paper.

    This article appeared first on MarkTechPost.
