Imagine a Roomba that only told you your floors were dirty, but didn’t actually clean them for you. Helpful? Debatable. Annoying? Very.
When ChatGPT first arrived, that was about where things stood. It could describe how to solve math problems and discuss theory endlessly, but it couldn’t reliably handle a simple arithmetic question. Connecting it to an external application like an online calculator, however, significantly improved its abilities—just like connecting a Roomba’s sensors to its robot body makes it capable of actually cleaning your floor.
That simple discovery was a precursor to the evolution now occurring in generative AI, where large language models (LLMs) power AI agents that can pursue complex goals with limited direct supervision.
In these systems, the LLM serves as the brain, while additional algorithms and tools are layered on top to accomplish key tasks ranging from generating software development plans to booking plane tickets. Proofs of concept like AutoGPT offer examples, such as a marketing agent that looks for Reddit comments with questions about a given product and then answers them autonomously. At their best, these agents hold the promise of pursuing complex goals with minimal direct oversight—and that means removing toil and mundane, linear tasks while allowing us to focus on higher-level thinking. And when you connect AI agents with other AI agents to make multi-agent systems, like we’re doing with GitHub Copilot Workspace, the realm of possibility grows exponentially.
All this is to say, if you’re a developer you’ll likely start encountering more and more instances of agentic AI in the tools you use (including on GitHub) and in the news you read. So, this feels like as good a time as any to dive into exactly what agentic AI and AI agents are, how they work on a technical level, some of the technical challenges, and what this means for software development.
💡 Why prompt engineering matters with today’s LLMs
Since the beginning, ChatGPT has been able to retain the context needed to answer follow-up questions to an initial prompt. You could ask a question, for instance, and if the model gave the wrong answer you could tweak your prompt, providing the model with the context needed to build upon its previous answer. Developers saw the potential and soon began prompt chaining, or building prompts that feed the output of one prompt into the next prompt. But that makes getting the prompt right critically important.
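To make prompt chaining concrete, here’s a minimal sketch in Python. The `call_llm` helper is a hypothetical stand-in for whatever chat-completion client you use, and the prompts are illustrative:

```python
# A minimal prompt-chaining sketch. `call_llm` is a stand-in for
# whatever chat-completion client you use; it takes a prompt string
# and returns the model's text response.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this to your LLM provider of choice")

def chain(task: str) -> str:
    # Step 1: ask the model to outline an approach.
    outline = call_llm(f"Outline the steps needed to accomplish: {task}")

    # Step 2: feed the first output into the next prompt.
    draft = call_llm(f"Follow this outline step by step and produce a result:\n{outline}")

    # Step 3: use the draft as context for a final review pass.
    return call_llm(f"Review the following result for mistakes and fix them:\n{draft}")
```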
The best prompts are clear and precise about the intent behind them. If you ask GitHub Copilot or ChatGPT to “write a calculator application,” for instance, you express clarity about what you want—but not precision about how to get there. Do you want an iOS or Android application? Something for Windows? The prompt you offered is imprecise when it comes to answering these questions.
This is where the idea of prompt engineering comes into play: by investing in backend systems that help an underlying LLM glean as much context as possible about a given prompt (like the neighboring tabs in your IDE or the language you’re coding in), developers and researchers are improving the ability of generative AI models—and AI agents—to derive clarity and precision from prompts.
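As a rough illustration of that backend work, here’s a hedged sketch of folding editor context into a prompt. The `EditorContext` fields and the prompt format are invented for this example; they aren’t GitHub Copilot’s actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class EditorContext:
    """Context a backend might gather before calling the model (illustrative)."""
    language: str
    current_file: str
    neighboring_snippets: list[str] = field(default_factory=list)

def build_prompt(user_request: str, ctx: EditorContext) -> str:
    # Fold implicit context into the prompt so the model doesn't have to
    # guess the platform, language, or surrounding code.
    neighbors = "\n---\n".join(ctx.neighboring_snippets)
    return (
        f"Language: {ctx.language}\n"
        f"Current file:\n{ctx.current_file}\n"
        f"Related code from neighboring tabs:\n{neighbors}\n"
        f"Task: {user_request}"
    )
```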
Learn more about prompt engineering >
What are AI agents and agentic AI?
Agentic AI refers to artificial intelligence capable of making decisions, planning, and adapting to new information in real time. AI agents learn and enhance their performance through feedback, utilizing advanced algorithms and sensory inputs to execute tasks and engage with their environments.
According to Lilian Weng, the head of safety systems at OpenAI and their former head of applied AI research, an AI agent features three key characteristics:
Planning: an AI agent is capable of creating a step-by-step plan with discrete milestone goals from a prompt while learning from mistakes via a reward system to improve future outputs.
Memory: an AI agent combines the ability to use short-term memory to process chat-based prompts and follow-up prompts with longer-term data retention and recall (often via retrieval augmented generation, or RAG).
Tool use: an agent can query APIs to request additional information or execute an action based on an end user’s request.
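To see how those three characteristics fit together, here’s a deliberately simplified agent loop. Everything in it (the `call_llm` stand-in, the `TOOLS` registry, the JSON reply format) is an assumption for illustration, not any particular framework’s API:

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for your LLM client of choice; wire it up before running.
    raise NotImplementedError

# Tool use: plain functions the agent can invoke by name.
# The calculator is a toy; never eval untrusted input in real code.
TOOLS = {
    "calculator": lambda expression: str(eval(expression, {"__builtins__": {}})),
}

def run_agent(goal: str) -> list[str]:
    # Planning: ask the model to break the goal into discrete steps.
    plan = call_llm(f"List the steps, one per line, to achieve: {goal}").splitlines()

    memory: list[str] = []  # Short-term memory: what happened at each step.
    for step in plan:
        decision = json.loads(call_llm(
            f"Goal: {goal}\nStep: {step}\nMemory so far: {memory}\n"
            'Reply with JSON: {"tool": "calculator" or null, "input": "..."}'
        ))
        if decision.get("tool") in TOOLS:
            result = TOOLS[decision["tool"]](decision["input"])
        else:
            result = call_llm(step)
        memory.append(f"{step} -> {result}")
    return memory
```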
What are examples of open source AI agents?
While there are plenty of new proprietary AI agents arriving on the market, there are also numerous examples of open source AI agent projects on GitHub:
AutoGPT, which seeks to make OpenAI’s GPT-4 generative AI model fully autonomous.
Clippy, which helps developers plan, write, debug, and test code.
DemoGPT, which can be used to generate demos of applications.
There are plenty more examples—and you can find a great roundup of them on GitHub.
What are the different types of AI agents?
AI agents range from simple reflex agents to sophisticated learning agents, and each has its strengths and weaknesses.
As this field continues to evolve, more types of AI agents will likely emerge. Whether you’re looking to build your own AI agent or understand a bit more about how GitHub uses AI to improve developer tools, here’s a list of the different types of AI agents you’ll most commonly encounter:
Model-based reflex agent
Characteristics: Uses a model of the world to make decisions, remembering some past states and acting on both current and past observations.
Examples: Linting tools like ESLint or Pylint that apply a set of predefined rules to evaluate code.
Goal-based agent
Characteristics: Works toward a specific goal, using its knowledge and the stated goal (or prompt) to make decisions.
Examples: Advanced IDEs with AI-powered code completion, such as GitHub Copilot.
Utility-based agent
Characteristics: Aims to achieve a goal in the best way possible, as determined by evaluating different possible approaches.
Examples: Tools that prioritize and assign bugs based on severity, impact, and developer workloads.
Learning agent
Characteristics: Improves performance over time by learning from experience, combining a learning element that refines the agent’s outputs based on user feedback with a performance element that applies the learned knowledge.
Examples: Code completion tools, such as GitHub Copilot, that improve over time.
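The gap between the simpler and more sophisticated types is easier to see in code. Here’s a toy contrast, with invented rules and scoring, between reflex-style behavior (a fixed rule over the current input) and utility-based behavior (score every option, pick the best):

```python
from dataclasses import dataclass

@dataclass
class Bug:
    severity: int  # 1 (low) through 5 (critical)

@dataclass
class Developer:
    name: str
    expertise: float  # Familiarity with the affected area, 0.0 to 1.0.
    open_issues: int

# Reflex-style behavior: a fixed rule applied to the current input,
# much like a lint check.
def lint_line(line: str) -> str | None:
    return "line too long" if len(line) > 100 else None

# Utility-based behavior: score every option and pick the best,
# much like a bug-triage tool weighing severity against workload.
def triage(bug: Bug, team: list[Developer]) -> Developer:
    def utility(dev: Developer) -> float:
        # Invented utility function: reward expertise, penalize workload.
        return bug.severity * dev.expertise - dev.open_issues
    return max(team, key=utility)
```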
Common technical challenges with AI agents today
While there’s a lot of promise in agentic AI, there are two core industry-wide technical challenges when developing agentic AI systems today:
We can’t deterministically predict what an AI model will say or do next, which makes it challenging to explain how its inputs (that is, the combination of the prompt and the training data used to generate a response) produce a given output.
We don’t have models that can fully explain their outputs, though work is being done to offer greater transparency by enabling them to explain how they arrived at a solution.
As a result, it is difficult to debug agentic systems and to create evaluation frameworks to understand their effectiveness, efficiency, and impact.
AI agents are difficult to debug because they are prone to solving problems in unexpected ways. This is a nuance that has long been known in—of all things—chess, where machines make moves that seem counterintuitive to their human opponents but win games. The more sophisticated an agent becomes and the longer you expect it to run, the more difficult it is to debug—especially when you consider how quickly a log can grow.
AI agents are also difficult to evaluate in a repeatable way that shows progress without imposing artificial constraints. This is especially challenging because the core capabilities of the underlying LLMs continue to improve rapidly, which makes it hard to know whether a gain came from your approach or simply from the underlying model. Developers often run into problems choosing the right metrics, benchmarking overall performance against a set heuristic or rubric, and collecting end-user feedback and telemetry to evaluate the efficacy of agent output.
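One common mitigation is to pin down what you can: keep a fixed task suite with an automatic rubric and re-run it whenever the agent or the underlying model changes. Here’s a bare-bones sketch of such a harness; the `Task` shape and scoring are invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # The rubric: did the output satisfy the task?

def evaluate(agent: Callable[[str], str], suite: list[Task]) -> float:
    # Run the same fixed suite before and after any agent or model change,
    # so a score movement can be attributed to the change rather than chance.
    passed = sum(1 for task in suite if task.check(agent(task.prompt)))
    return passed / len(suite)

# Example entry: a deterministic, automatically checkable task.
suite = [Task(prompt="Return the sum of 2 and 3.", check=lambda out: "5" in out)]
```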
💡 What to keep in mind when building AI agents
As developers, we’re often used to writing imperative-style code as opposed to declarative-style code. We may want an app that does X, Y, and Z, and so we’ll write code that outlines the steps to perform those tasks.
Agents, on the other hand, are far more autonomous: you’ll often just need to declare the goal—the desired end state—and the agent develops the plan to achieve it.
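In Python-flavored pseudocode, the contrast looks something like this. The `Agent` class is hypothetical, not a specific framework:

```python
# Imperative style: the developer spells out each step explicitly.
def run_tests() -> None: print("running tests")
def build_artifact() -> None: print("building artifact")
def deploy() -> None: print("deploying")

def ship_imperatively() -> None:
    run_tests()
    build_artifact()
    deploy()

# Declarative/agentic style: state the goal and let the agent derive the plan.
# `Agent` is a hypothetical stand-in; a real one would plan with an LLM.
class Agent:
    def __init__(self, tools: list) -> None:
        self.tools = tools

    def achieve(self, goal: str) -> None:
        print(f"goal: {goal}")
        # This stub just runs its tools in order; a real agent would decide
        # which tools to call, and in what order, based on the goal.
        for tool in self.tools:
            tool()

Agent(tools=[run_tests, build_artifact, deploy]).achieve("Ship the current branch")
```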
How we think about AI agents at GitHub
Our focus at GitHub has been to rethink the developer “inner loop” as collaboration with AI. That means AI agents that can reliably build, test, and debug code. It means reducing the energy needed to get started and empowering more people to learn and contribute to code bases. We know that it requires tackling every part of the developer’s day where they run into friction, and that’s where multi-agent systems like Copilot Workspace and code scanning autofix come in.
Earlier this year, we launched a technical preview of Copilot Workspace, our Copilot-native developer environment. It’s a multi-agent system—a network of agents that interact and collaborate to achieve a larger goal. Each agent in a system typically has specialized skills or functions, and they can communicate and coordinate with one another to solve complex problems more efficiently than a single agent could.
For Copilot Workspace, that means a developer can ask Copilot to help create an application, and it will not only generate a software development plan, but also the code, pull requests, and more, needed to achieve that plan.
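The sketch below is not how Copilot Workspace works internally, but it shows the general shape of a multi-agent handoff: one specialized agent produces a plan, another consumes it, and a coordinator routes messages between them. All names are invented:

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    # Stand-in for your LLM client of choice.
    raise NotImplementedError

@dataclass
class Message:
    sender: str
    content: str

class PlannerAgent:
    def handle(self, msg: Message) -> Message:
        plan = call_llm(f"Write a step-by-step development plan for: {msg.content}")
        return Message(sender="planner", content=plan)

class CoderAgent:
    def handle(self, msg: Message) -> Message:
        code = call_llm(f"Implement the following plan:\n{msg.content}")
        return Message(sender="coder", content=code)

def coordinate(request: str) -> Message:
    # The coordinator routes the request through specialized agents in turn;
    # richer systems add feedback loops, review agents, and retries.
    msg = Message(sender="user", content=request)
    for agent in (PlannerAgent(), CoderAgent()):
        msg = agent.handle(msg)
    return msg
```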
There’s more in the works to make developers more productive and make their days a little bit (or a lot) better.
Why this matters (and some final thoughts)
There’s a lot of buzz around AI agents—and for good reason. As they continue to evolve, they’ll be able to work together to handle more complex tasks, which means a lower upfront cost of prompt engineering for users. For developers, though, the benefit of AI agents is simple: they free developers to focus on higher-value activities.
When you give LLMs access to tools, memory, and plans to create agents, they become a bit like LEGO blocks that you can piece together into more advanced systems: at their best, AI agents are modular, adaptable, interoperable, and scalable. Just as a child can transform a pile of colorful LEGO blocks into anything from a towering castle to a sleek spaceship, developers can use AI agents to build multi-agent systems that promise to revolutionize software development.
At GitHub, we’re excited about what AI agents, agentic AI, and multi-agent systems mean more broadly for software developers. With agentic AI coding tools like Copilot Workspace and code scanning autofix, developers will be able to build software that’s more secure, faster—and that’s just the beginning.