The era of artificial intelligence (AI) and machine learning (ML) began when we started training computers using data instead of hand-written code, but during this era applications could only perform the tasks they were trained for, such as classification and object identification.
Then, at the end of 2022, OpenAI released ChatGPT, which could generate content and perform a wide range of tasks. It quickly caught the attention of millions of users worldwide and became all the rage. According to the Gartner Hype Cycle for AI, 2023, generative AI is at the Peak of Inflated Expectations and is expected to reach the Plateau of Productivity in 5 to 10 years.
Limitations
In Gartner's Hype Cycle for AI, the Plateau of Productivity is reached when enough businesses use AI successfully, its benefits are well defined, and there are clear guidelines for implementing it. This marks AI's transformation from a hyped concept into a practical tool. To move from the current state to the Plateau of Productivity, we need to look at the limitations of the technology and how agents can help us overcome them.
Current LLMs (large language models) are good at many tasks, such as generating emails and essays or performing sentiment analysis, but they are not very good at certain others, such as math calculations or complex multi-step problems. Current LLMs also suffer from a variety of other limitations, such as:
- Hallucinations or misleading outputs
- Technical constraints, such as limited context length and memory
- Bias in output
- Toxic or harmful speech
- Limited knowledge (for example, GPT-3.5 has a knowledge cutoff of September 2021)
Come to think of it, we humans face similar challenges. We are prone to giving out false information (intentionally or unintentionally), suffer from bias, have limited knowledge and memory, and may even give harmful responses. So, how do we manage these shortcomings?
- We look for information on the internet and use other tools, such as Excel and Word.
- We revise our work again and again to fix errors and improve it until we are satisfied with the output.
- We seek feedback from peers and mentors and incorporate it.
- We work in teams and collaborate with each other.
Therefore, we can use similar concepts to improve outputs from LLMs, which brings us to the concept of agents.
What are Generative AI Agents?
We can mitigate many of the above limitations using agents. Agents execute complex tasks that current stand-alone LLMs are not able to accomplish. For example, if we have a repository of information on a set of companies and a user asks for the top three companies by revenue, the steps are (a minimal code sketch follows the list):
- Get revenue for all the companies in the repository.
- Sort the companies by revenue.
- Return the top 3 companies.
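To make the decomposition concrete, here is a minimal Python sketch of what the plan boils down to once the revenue figures have been retrieved; the sample companies and figures are made up for illustration.

```python
# Illustrative data standing in for revenue figures retrieved from the repository.
companies = [
    {"name": "Acme Corp", "revenue": 120_000_000},
    {"name": "Globex", "revenue": 450_000_000},
    {"name": "Initech", "revenue": 75_000_000},
    {"name": "Umbrella", "revenue": 300_000_000},
]

# Step 2: sort the companies by revenue (descending).
ranked = sorted(companies, key=lambda c: c["revenue"], reverse=True)

# Step 3: return the top three companies.
top_three = ranked[:3]
print([c["name"] for c in top_three])
```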
To accomplish this, agents combine LLMs with key modules / components such as planning, memory, and access to tools (a rough structural sketch follows this list).
- Planning: Listing out the three steps above is part of planning and is done using an LLM.
- Memory: While performing any complex task, we need to formulate multiple intermediate steps and retain the information processed in each of them. Memory helps the agent retain information while performing multiple steps.
- Tools: Tools are used by the agent to perform required tasks as explained in the next section.
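Purely as a rough structural sketch (the class and field names below are my own and do not come from any specific framework), an agent can be thought of as an LLM bundled with these components:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Toy structure: an LLM plus planning, memory, and tools."""
    llm: Callable[[str], str]                                  # the "brain" that plans each step
    memory: list[str] = field(default_factory=list)            # retains intermediate results
    tools: dict[str, Callable] = field(default_factory=dict)   # tool name -> callable

    def remember(self, note: str) -> None:
        self.memory.append(note)

    def use_tool(self, name: str, *args, **kwargs):
        return self.tools[name](*args, **kwargs)
```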
The key features of an agent are to:
- Plan and execute tasks
- Reflect on outcomes
- Use tools to accomplish specified goals
- Require little to no human intervention
Some examples of agents are:
- Website builders that work from certain inputs and prompts.
- A data analyst that provides insights from data in an Excel sheet.
- A travel agent that plans weekend travel for a certain number of days in a specified city.
Tools
As we saw above, tools are one of the most important components for Agents because they help them perform required tasks. Let us explore the concept of tools in a bit more detail.
We perform tasks using tools such as an internet browser, Word, Excel, and other applications.
Similarly, in the world of generative AI, tools are a set of enablers that let an LLM agent interact with external environments and applications, such as internet search, Wikipedia search, a code interpreter, or a math engine. Tools can also access databases, knowledge bases, and external models.
For example, a travel agent will need the following tools to perform its tasks (a hypothetical sketch of how such tools might be defined follows the list):
- Search flights
- Book flights
- Search the internet
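One common way to expose such tools to an agent is as plain functions, each with a name and a natural-language description the LLM can read when deciding what to call. The sketch below is hypothetical; the function names, signatures, and canned return values are made up for illustration.

```python
# Hypothetical tool definitions for the travel-agent example.
# Each tool is a plain function; the docstring doubles as the description
# the agent can consult when deciding which tool to call.

def search_flights(origin: str, destination: str, date: str) -> list[dict]:
    """Search flights between two cities on a given date (stubbed data)."""
    return [{"flight": "AB123", "origin": origin, "destination": destination,
             "date": date, "price": 4500}]

def book_flight(flight_id: str, passenger: str) -> dict:
    """Book a flight by ID for a passenger (stubbed confirmation)."""
    return {"status": "confirmed", "flight": flight_id, "passenger": passenger}

def web_search(query: str) -> str:
    """Search the internet for a query (stubbed response)."""
    return f"Top results for: {query}"

# Registry the agent can consult: tool name -> (callable, description).
TOOLS = {
    "search_flights": (search_flights, search_flights.__doc__),
    "book_flight": (book_flight, book_flight.__doc__),
    "web_search": (web_search, web_search.__doc__),
}
```

An agent framework would typically pass these names and descriptions to the LLM so it can decide which tool to invoke for a given request.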
The diagram below lists a few other tools that can be handy for an agent, depending on the objectives or goals it is given.
- Entity Extraction: Extracts specific information from an unstructured document, such as the total price, date, and customer name from an invoice. Source documents need not be in a consistent format.
- Chat DB: Lets business users obtain the required information from a database without SQL or DB schema knowledge.
- Knowledge Bot: Uses RAG (retrieval-augmented generation) to answer questions based on a custom knowledge repository. This repository can be built from unstructured data sources such as documents and files.
- Internet Search: Extracts key words from user queries and fetches content from the internet using any of the available search engines, such as Google, Bing, or DuckDuckGo.
- Summarization: Produces a summary of large documents from the perspective of a specific persona, such as the CEO or CFO.
- Program Execution: Uses PAL (program-aided language model) to write and execute Python code that generates the answer to a specific problem (see the sketch after this list).
- Wikipedia Search: Extracts key words from user queries and fetches content from Wikipedia.
- Comparison: Answers comparison questions such as company performance in this quarter vs. the last, the best mobile phone under Rs. 10,000, or the best-performing mutual fund scheme in the equity space.
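As a concrete illustration of the Program Execution tool, the PAL idea is that the LLM writes a small Python program for the question and the agent runs it to obtain the answer. The sketch below is heavily simplified: the "generated" code is hard-coded to stand in for an actual LLM call, and a bare exec is not a safe sandbox for real use.

```python
# Simplified PAL-style sketch: in practice the code string would come from an
# LLM prompted with the user's question; here it is hard-coded for illustration.
question = "What is 100 invested at 7% annual interest worth after 12 years?"

generated_code = """
principal = 100
rate = 0.07
years = 12
answer = principal * (1 + rate) ** years
"""

# Execute the generated program in an isolated namespace (NOT a real sandbox;
# a production tool must restrict what the generated code can do).
namespace: dict = {}
exec(generated_code, namespace)

print(f"{question}\nAnswer: {namespace['answer']:.2f}")
```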
Agentic Design Patterns
We need to orchestrate these tools to perform complex tasks, which is why tools are one of the key components of agents. Now that we understand the concepts of tools and agents a bit better, let us explore a few agentic design patterns based on lectures by Andrew Ng.
- Reflection: The LLM examines its own work in order to come up with ways to improve it. The crux of reflection is that the model criticizes its own output to improve its response (see the sketch after this list).
- Tool use: The LLM is given tools, such as web search ({tool: web-search, query: "coffee maker reviews"}) or code execution ({tool: python-interpreter, code: "100 * (1+0.07)**12"}), or any other function that helps it gather information, take action, or process data.
- Planning: The LLM comes up with a multistep plan to achieve a goal, and then executes it.
- Multi-agent collaboration: More than one AI agent works together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would.
While the first two design patterns give fairly predictable outcomes, the last two are still more in the experimental phase.
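As a minimal sketch of the Reflection pattern, the loop below drafts an answer, asks the model to critique it, and then revises. The call_llm function is a placeholder for whatever LLM API you use; it returns canned strings here purely so the example runs.

```python
# Minimal Reflection loop: draft -> critique -> revise.
# call_llm stands in for a real LLM API call; it returns canned text here
# so the example runs end to end.

def call_llm(prompt: str) -> str:
    if prompt.startswith("Critique"):
        return "The answer is too short; add a concrete example."
    if prompt.startswith("Revise"):
        return "Revised answer with a concrete example added."
    return "Draft answer."

def reflect(task: str, rounds: int = 2) -> str:
    answer = call_llm(f"Answer the task: {task}")
    for _ in range(rounds):
        critique = call_llm(f"Critique this answer to '{task}':\n{answer}")
        answer = call_llm(f"Revise the answer using this critique:\n{critique}\n\nOriginal answer:\n{answer}")
    return answer

print(reflect("Explain what an AI agent is."))
```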
LLM Agent Framework
Now that we understand agents, tools, and agentic design patterns, we can talk about a variation of the planning design pattern. At a high level, it works by defining a task or goal and then repeatedly asking these two questions in a feedback loop:
- Planning: What should the next action be?
- Action: Execute it using a router agent and tools.
An LLM agent consists of the following core components:
- Brain / LLM acts as a coordinator.
- Memory (Vector DB) to save various intermediate steps / results of the execution.
There are two main memory types:
- Short-term memory stores in-context information that can be passed to the LLM; it is finite because of the context-window constraint.
- Long-term memory is an external vector store that provides relevant contextual information to the agent (a toy sketch of both memory types follows this list).
- Tools / Internet to enable the agent to perform various tasks, such as web search, Wikipedia search, or program execution.
- Policy (e.g., toxicity) to build in trust by design, by using a policy that ensures toxic inputs are not processed.
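To make the memory distinction concrete, here is a toy sketch of my own (not a specific framework): short-term memory is a bounded message buffer standing in for the context window, while long-term memory is approximated by a tiny vector store. The bag-of-words "embedding" is deliberately crude so the example runs without external dependencies; a real agent would use a proper embedding model and vector database.

```python
import math
import re
from collections import Counter, deque

# Short-term memory: a bounded buffer mimicking the context-window limit.
short_term = deque(maxlen=5)          # keeps only the 5 most recent messages
short_term.append("User asked for the top companies by revenue.")

# Long-term memory: a toy vector store with bag-of-words "embeddings".
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

long_term: list[tuple[Counter, str]] = []

def store(text: str) -> None:
    long_term.append((embed(text), text))

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scored = sorted(long_term, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in scored[:k]]

store("Acme Corp reported revenue of 120 million last year.")
store("The travel policy allows economy-class flights only.")
print(retrieve("What was Acme's revenue?"))   # -> returns the Acme revenue note
```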
In the ‘Planning’ agentic design pattern, the agent comes up with a multistep plan to achieve a goal and then executes it. This framework is a slight variation: instead of thinking through the entire sequence of steps up front, the LLM plans and executes only the very next step and iterates until the goal is achieved. Here is a narrative of the diagram below (a stubbed code sketch of the loop follows the list):
Flow Narrative
- Problem / Query: The user provides a problem or query.
- What should be the next action? The agent looks at the query and identifies the immediate next step.
- Human in the loop (optional): The user can see and refine the next step planned by the agent.
- Refine the task / additional inputs: The user can either refine the next step planned by the agent or provide additional information required for the query.
- Router agent / tools: The router agent has a list of available tools along with a description of each. These descriptions are used to identify the right tool for the task and use it to get results.
- Goal accomplished? Verify whether the user query has been answered. If not, go back to the agent iteratively for the next step; otherwise, return the final answer to the user.
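Putting the narrative together, the sketch below mirrors the iterate-until-done loop in code. Everything here is a stub: plan_next_step stands in for the LLM planner, the router looks tools up by name, and the goal check is trivial. It is meant only to trace the flow above, not to be a real framework.

```python
# Stubbed iterative agent loop mirroring the flow narrative above.

TOOL_REGISTRY = {
    "calculator": (lambda expr: str(eval(expr)), "Evaluate arithmetic expressions."),
    "search":     (lambda q: f"(stub) results for '{q}'", "Search the internet."),
}

def plan_next_step(query: str, history: list) -> dict:
    """Stand-in for the LLM planner: decides the immediate next action."""
    if not history:                                   # nothing done yet -> compute
        return {"tool": "calculator", "input": "100 * (1 + 0.07) ** 12"}
    return {"tool": None, "answer": history[-1]}      # nothing left to do

def route(step: dict) -> str:
    """Router agent: looks up the requested tool and runs it."""
    tool_fn, _description = TOOL_REGISTRY[step["tool"]]
    return tool_fn(step["input"])

def run_agent(query: str, max_steps: int = 5) -> str:
    history: list = []
    for _ in range(max_steps):
        step = plan_next_step(query, history)
        if step["tool"] is None:                      # goal accomplished?
            return step["answer"]
        history.append(route(step))                   # memory of intermediate results
    return "Stopped: step limit reached."

print(run_agent("What is 100 invested at 7% annual interest worth after 12 years?"))
```

In a real system the planner would choose the tool based on the tool descriptions held by the router agent rather than a hard-coded name, and the goal check would itself be an LLM judgment rather than a simple condition.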
Conclusion: A World Aided by Intelligent Agents
The future of generative AI is collaborative. We can expect to see a blossoming of intelligent agents, each with specialized functionalities, working alongside humans to achieve remarkable feats. Imagine a doctor leveraging an agent to research personalized treatment options for a patient, a designer collaborating with an agent to create innovative products, or a customer service representative seamlessly supported by an agent that anticipates and responds to customer needs.
The possibilities are vast. Upcoming innovations might involve agents that can reason and adapt in real-time, fostering even more natural and productive human-agent partnerships. We can also expect to see the emergence of use cases that span across various industries, from scientific discovery to artistic creation.
For companies looking to leverage generative AI, intelligent agents offer a strategic upgrade. Just like Jarvis in Iron Man, these agents won't replace human ingenuity; they'll empower it. By integrating agents into their AI solutions, companies can unlock a new level of efficiency, personalization, and problem-solving capabilities. This future holds immense potential for innovation and progress, and generative AI agents will undoubtedly play a pivotal role in shaping it.
References / Further Readings
- LLM Agents
- TEDxPSUT: Generative AI is just the beginning, AI agents are what comes next: Daud Abdel Hadi
- Andrew Ng @ Sequoia Capital: What’s next for AI agentic workflows
- Agentic Design Patterns Part 1
- Agentic Design Patterns Part 2, Reflection
- Agentic Design Patterns Part 3, Tool Use
- Agentic Design Patterns Part 4, Planning
- Agentic Design Patterns Part 5, Multi-Agent Collaboration
- What’s New in Artificial Intelligence from the 2023 Gartner Hype Cycle