Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      How To Prevent WordPress SQL Injection Attacks

      June 13, 2025

      Java never goes out of style: Celebrating 30 years of the language

      June 12, 2025

      OpenAI o3-pro available in the API, BrowserStack adds Playwright support for real iOS devices, and more – Daily News Digest

      June 12, 2025

      Creating The “Moving Highlight” Navigation Bar With JavaScript And CSS

      June 11, 2025

      Microsoft Copilot’s own default configuration exposed users to the first-ever “zero-click” AI attack, but there was no data breach

      June 13, 2025

      Sam Altman says “OpenAI was forced to do a lot of unnatural things” to meet the Ghibli memes demand surge

      June 13, 2025

      5 things we didn’t get from the Xbox Games Showcase, because Xbox obviously hates me personally

      June 13, 2025

      Minecraft Vibrant Visuals finally has a release date and it’s dropping with the Happy Ghasts

      June 13, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      QAQ-QQ-AI-QUEST

      June 13, 2025
      Recent

      QAQ-QQ-AI-QUEST

      June 13, 2025

      JS Dark Arts: Abusing prototypes and the Result type

      June 13, 2025

      Helpful Git Aliases To Maximize Developer Productivity

      June 13, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Microsoft Copilot’s own default configuration exposed users to the first-ever “zero-click” AI attack, but there was no data breach

      June 13, 2025
      Recent

      Microsoft Copilot’s own default configuration exposed users to the first-ever “zero-click” AI attack, but there was no data breach

      June 13, 2025

      Sam Altman says “OpenAI was forced to do a lot of unnatural things” to meet the Ghibli memes demand surge

      June 13, 2025

      5 things we didn’t get from the Xbox Games Showcase, because Xbox obviously hates me personally

      June 13, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real User Behavior to Transform Traditional A/B Testing on Live Web Platforms

    AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real User Behavior to Transform Traditional A/B Testing on Live Web Platforms

    April 26, 2025

    Designing and evaluating web interfaces is one of the most critical tasks in today’s digital-first world. Every change in layout, element positioning, or navigation logic can influence how users interact with websites. This becomes even more crucial for platforms that rely on extensive user engagement, such as e-commerce or content streaming services. One of the most trusted methods for assessing the impact of design changes is A/B testing. In A/B testing, two or more versions of a webpage are shown to different user groups to measure their behavior and determine which variant performs better. It’s not just about aesthetics but also functional usability. This method enables product teams to gather user-centered evidence before fully rolling out a feature, allowing businesses to optimize user interfaces systematically based on observed interactions.

    Despite being a widely accepted tool, the traditional A/B testing process brings several inefficiencies that have proven problematic for many teams. The most significant challenge is the volume of real-user traffic needed to yield statistically valid results. In some scenarios, hundreds of thousands of users must interact with webpage variants to identify meaningful patterns. For smaller websites or early-stage features, securing this level of user interaction can be nearly impossible. The feedback cycle is also notably slow. Even after launching an experiment, it might take weeks to months before results can be confidently assessed due to the requirement of long observation periods. Also, these tests are resource-heavy; only a few variants can be evaluated due to the time and manpower required. Consequently, numerous promising ideas go untested because there’s simply no capacity to explore them all.

    Several methods have been explored to overcome these limitations; however, each has its shortcomings. For example, offline A/B testing techniques depend on rich historical interaction logs, which are not always available or reliable. Tools that enable prototyping and experimentation, such as Apparition and Fuse, have accelerated early design exploration but are primarily useful for prototyping physical interfaces. Algorithms that reframe A/B testing as a search problem through evolutionary models help automate some aspects but still depend on historical or real-user deployment data. Other strategies, like cognitive modeling with GOMS or ACT-R frameworks, require high levels of manual configuration and do not easily adapt to the complexities of dynamic web behavior. These tools, although innovative, have not provided the scalability and automation necessary to address the deeper structural limitations in A/B testing workflows.

    Researchers from Northeastern University, Pennsylvania State University, and Amazon introduced a new automated system named AgentA/B. This system offers an alternative approach to traditional user testing, utilizing Large Language Model (LLM)-based agents. Rather than depending on live user interaction, AgentA/B simulates human behavior using thousands of AI agents. These agents are assigned detailed personas that mimic characteristics such as age, educational background, technical proficiency, and shopping preferences. These personas enable agents to simulate a wide range of user interactions on real websites. The goal is to provide researchers and product managers with an efficient and scalable method for testing multiple design variants without relying on live user feedback or extensive traffic coordination.

    The system architecture of AgentA/B is structured into four main components. First, it generates agent personas based on the input demographics and behavioral diversity specified by the user. These personas are fed into the second stage, where testing scenarios are defined—this includes assigning agents to control and treatment groups and specifying which two webpage versions should be tested. The third component executes the interactions: agents are deployed into real browser environments, where they process the content through structured web data (converted into JSON observations) and take action like real users. They can search, filter, click, and even simulate purchases. The fourth and final component involves analyzing the results, where the system provides metrics like the number of clicks, purchases, or interaction durations to assess design effectiveness.

    During their testing phase, researchers used Amazon.com to demonstrate the tool’s practical value. A total of 100,000 virtual customer personas were generated, and 1,000 were randomly selected from this pool to act as LLM agents in the simulation. The experiment compared two different webpage layouts: one with all product filter options shown in a left-hand panel and another with only a reduced set of filters. The outcome was compelling. The agents interacting with the reduced-filter version performed more purchases and filter-based actions than those with the full list. Also, these virtual agents were significantly more efficient. Compared with one million real user interactions, LLM agents took fewer actions on average to complete tasks, indicating more goal-oriented behavior. These results mirrored the behavioral direction observed in human A/B tests, strengthening the case for AgentA/B as a valid complement to traditional testing.

    This research demonstrates a compelling advancement in interface evaluation. It doesn’t aim to replace live user A/B testing but instead proposes a supplementary method that offers rapid feedback, cost efficiency, and broader experimental coverage. By using AI agents instead of live participants, the system enables product teams to test numerous interface variations that would otherwise be infeasible. This model can significantly compress the design cycle, allowing ideas to be validated or rejected at a much earlier stage. It addresses the practical concerns of long wait times, traffic limitations, and testing resource constraints, making the web design process more data-informed and less prone to bottlenecks.

    Some Key Takeaways from the Research on AgentA/B include:

    • AgentA/B uses LLM-based agents to simulate realistic user behavior on live webpages.  
    • The system allows automated A/B testing with no need for live user deployment.  
    • 100,000 user personas were generated, and 1,000 were selected for live testing simulation.  
    • The system compared two webpage variants on Amazon.com: full filter panel vs. reduced filters.  
    • LLM agents in the reduced-filter group made more purchases and performed more filtering actions.  
    • Compared to 1 million human users, LLM agents showed shorter action sequences and more goal-directed behavior.  
    • AgentA/B can help evaluate interface changes before real user testing, saving months of development time.  
    • The system is modular and extensible, allowing it to be adaptable to various web platforms and testing goals.  
    • It directly addresses three core A/B testing challenges: long cycles, high user traffic needs, and experiment failure rates.

    Check out the Paper. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 90k+ ML SubReddit.

    🔥 [Register Now] miniCON Virtual Conference on AGENTIC AI: FREE REGISTRATION + Certificate of Attendance + 4 Hour Short Event (May 21, 9 am- 1 pm PST) + Hands on Workshop

    The post AgentA/B: A Scalable AI System Using LLM Agents that Simulate Real User Behavior to Transform Traditional A/B Testing on Live Web Platforms appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleMeta AI Introduces Token-Shuffle: A Simple AI Approach to Reducing Image Tokens in Transformers
    Next Article Google DeepMind Research Introduces QuestBench: Evaluating LLMs’ Ability to Identify Missing Information in Reasoning Tasks

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    June 13, 2025
    Machine Learning

    Google AI Unveils a Hybrid AI-Physics Model for Accurate Regional Climate Risk Forecasts with Better Uncertainty Assessment

    June 13, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-5364 – Campcodes Online Hospital Management System SQL Injection

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-4118 – Weitong Mall Product History Handler Improper Access Controls Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    How to Use PostgreSQL in Django

    Development

    CVE-2025-31245 – Apple macOS Denial of Service Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    CVE-2025-3757 – OpenPubkey Invalid JWS Signature Verification

    May 13, 2025

    CVE ID : CVE-2025-3757

    Published : May 13, 2025, 5:16 p.m. | 1 hour, 48 minutes ago

    Description : Versions of OpenPubkey library prior to 0.10.0 contained a vulnerability that would allow a specially crafted JWS to bypass signature verification.

    Severity: 0.0 | NA

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    CVE-2025-28038 – TOTOLINK EX1200T Remote Command Execution Vulnerability

    April 22, 2025

    I finally found a power bank with a wireless charging pad, and it’s sustainably-made

    April 28, 2025

    Pulumi IDP provides developers with faster access to self-service cloud infrastructure provisioning

    May 8, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.