Anatomy of AI Agents
I really like this definition by @dharmesh: "Agent AI: Software that uses artificial intelligence to pursue a specified goal. It accomplishes this by decomposing the goal into actionable tasks, monitoring its progress, and engaging with digital resources and other agents as necessary." So, if you provide a goal, an AI Agent will figure out what needs to be done to achieve it. A good example is writing a blog post: we define only the subject and wait for the AI Agent to return the result.
In its simplest form, an agent needs only four components: an Observation Receiver, Memory, a Planner, and an Action Executor. These four components connect to the Environment, which can be anything from the real world to a game. For instance, JARVIS-1 is an agent for Minecraft, and Minecraft is its Environment.
The Observation Receiver receives queries from the user, results of executed actions, and any other events we believe might be interesting for the Agent. Memory is responsible for storing context for the Planner. The Planner, probably the most challenging component to implement, is responsible for splitting a task into actions to execute and for logical inferences. Once the Planner produces commands, the Action Executor carries them out.
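To make the wiring concrete, here is a minimal sketch in plain Python. Everything in it is illustrative: the class names simply mirror the four components above, the Environment just echoes commands back, and the Planner's stop condition is a toy; none of this is taken from JARVIS-1 or any real framework.

```python
# A minimal sketch of the four components connected to an Environment.
# All class and method names are illustrative, not from a real framework.
from dataclasses import dataclass


@dataclass
class Observation:
    source: str    # e.g. "user", "action_result"
    content: str


class Environment:
    """Anything the agent acts on: the real world, an API, a game world."""
    def execute(self, command: str) -> Observation:
        return Observation("action_result", f"executed: {command}")


class ObservationReceiver:
    """Turns raw events into records the Planner can use later."""
    def receive(self, obs: Observation) -> str:
        return f"[{obs.source}] {obs.content}"


class Memory:
    """Stores context records for the Planner."""
    def __init__(self) -> None:
        self.records: list[str] = []

    def store(self, record: str) -> None:
        self.records.append(record)

    def context(self) -> list[str]:
        return self.records   # a real Memory would select/compress; see the next sketch


class Planner:
    """Splits the goal into the next concrete command, given the context."""
    def next_command(self, goal: str, context: list[str]) -> str | None:
        if any("executed" in record for record in context):
            return None                        # toy stop condition: one step is enough
        return f"do first step of: {goal}"


class ActionExecutor:
    """Carries out the Planner's commands against the Environment."""
    def __init__(self, env: Environment) -> None:
        self.env = env

    def run(self, command: str) -> Observation:
        return self.env.execute(command)


def agent_loop(goal: str) -> None:
    env, receiver, memory = Environment(), ObservationReceiver(), Memory()
    planner, executor = Planner(), ActionExecutor(env)
    memory.store(receiver.receive(Observation("user", goal)))
    # Observe -> remember -> plan -> act, until the Planner has nothing left to do.
    while (command := planner.next_command(goal, memory.context())) is not None:
        result = executor.run(command)
        memory.store(receiver.receive(result))
    print("\n".join(memory.context()))


agent_loop("write a blog post about AI agents")
```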
Wait a minute, where is the LLM? The Observation Receiver needs to be good at creating history records for context, which means translating observations into meaningful pieces of content for later use by the Planner. This is why the Observation Receiver uses an LLM for summarization, and in some cases it also needs multimodal capabilities. Because LLM context windows are limited, the Memory component has to surface only the most relevant content, for example by leveraging the MemGPT architecture or Sparse Priming Representations (SPR). The Planner leverages the LLM's ability to reason.
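Here is a toy illustration of that Memory idea: rank stored records by relevance to the current query and keep only what fits a context budget. MemGPT and SPR are far more sophisticated (memory paging, self-editing memory, compressed priming); the bag-of-words similarity below is just a stand-in for real embeddings, and the word-count "tokens" are a crude approximation.

```python
# Toy version of context-window-aware Memory: keep only the records most
# relevant to the query, until a token budget is exhausted.
from collections import Counter
import math


def similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (stand-in for embeddings)."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(wa[t] * wb[t] for t in wa)
    norm = (math.sqrt(sum(v * v for v in wa.values()))
            * math.sqrt(sum(v * v for v in wb.values())))
    return dot / norm if norm else 0.0


def relevant_context(records: list[str], query: str, budget: int = 50) -> list[str]:
    """Return the most query-relevant records that fit the token budget."""
    ranked = sorted(records, key=lambda r: similarity(r, query), reverse=True)
    selected, used = [], 0
    for record in ranked:
        cost = len(record.split())     # crude token count
        if used + cost <= budget:
            selected.append(record)
            used += cost
    return selected


records = [
    "user asked for a blog post about AI agents",
    "weather in the game world is raining",
    "draft outline: anatomy of AI agents, planner, memory, executor",
]
print(relevant_context(records, "continue writing the AI agents blog post"))
```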
The real power of AI Agents comes from their ability to communicate with each other to achieve a common goal. The MechAgents paper describes how a Chat Manager, Critic, Admin, Planner, Executor, Scientist, and Engineer collaborate to solve problems in mechanics.
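The underlying pattern is a group chat: a manager routes a shared conversation between role agents until a stopping condition is met. Below is my own toy sketch of that loop, not code from the MechAgents paper; the replies are canned, whereas a real system would back each role with an LLM prompted with its role description plus the chat history.

```python
# Toy multi-agent group chat: a Chat Manager routes a shared conversation
# between role agents until the Critic approves the result.
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    content: str


class RoleAgent:
    def __init__(self, name: str, reply: str) -> None:
        self.name, self.reply = name, reply

    def respond(self, history: list[Message]) -> Message:
        # A real agent would prompt an LLM with its role plus the history.
        return Message(self.name, self.reply)


class ChatManager:
    """Routes the shared conversation between role agents in a fixed order."""
    def __init__(self, agents: list[RoleAgent]) -> None:
        self.agents = agents

    def solve(self, task: str, max_rounds: int = 2) -> list[Message]:
        history = [Message("Admin", task)]
        for _ in range(max_rounds):
            for agent in self.agents:
                history.append(agent.respond(history))
            if any(m.sender == "Critic" and "approved" in m.content for m in history):
                break
        return history


agents = [
    RoleAgent("Planner", "step 1: set up the elasticity equations"),
    RoleAgent("Engineer", "writing solver code for the equations"),
    RoleAgent("Executor", "ran the solver, stress field computed"),
    RoleAgent("Critic", "result approved, matches the analytical solution"),
]
for message in ChatManager(agents).solve("compute stress in a loaded beam"):
    print(f"{message.sender}: {message.content}")
```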
I expect further developments in UX for AI Agents. So far, most AI Agents have chatbot UIs, which are not always usable, especially with multiple agents and an asynchronous communication model. Game development might be a good source of inspiration for AI Agent UX/UI.
References:
https://agent.ai/p/agent-ai-excitement - What Is Agent AI And Why All The Excitement?
https://www.ionio.ai/blog/what-is-llm-agent-ultimate-guide-to-llm-agent-with-technical-breakdown - What is LLM Agent? Ultimate Guide to LLM Agent
https://agentgpt.reworkd.ai/ - AgentGPT is an autonomous AI Agent platform
https://arxiv.org/abs/2311.05997 - JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
https://arxiv.org/abs/2304.03442 - Generative Agents: Interactive Simulacra of Human Behavior
https://memgpt.ai/ - MemGPT: Towards LLMs as Operating Systems
YouTube - Don't Use MemGPT!! This is way better (and easier)! Use Sparse Priming Representations!
https://arxiv.org/abs/2311.08166 - MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge