End-to-End Optimization of AI Agents
A typical AI Agent consists of many execution steps, which gives two approaches to optimizing it: optimizing each step individually or optimizing the agent end-to-end. Optimizing steps in isolation risks misaligning the goals of individual steps with the overall goal of the agent.
The first optimization framework, from the "A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions" paper, is agent-based. The process starts by executing the target agent to establish a baseline. Once the baseline is known, optimization proceeds iteratively. Each iteration requires a well-defined hypothesis, which is then converted into a modification; a modification can change the role of the target agent or its workflow. After evaluation, the hypothesis and its modifications are either accepted or discarded.
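A minimal sketch of this loop, assuming injected callables for each role (the names `run_agent`, `evaluate`, `propose_hypothesis`, and `apply_modification` are illustrative stand-ins, not the paper's API):

```python
# Hedged sketch of the iterative refinement loop; all callables are
# hypothetical stand-ins for the paper's agents.
def optimize(target_agent, task, run_agent, evaluate,
             propose_hypothesis, apply_modification, n_iterations=10):
    # Establish the baseline by executing the target agent once.
    best_score = evaluate(run_agent(target_agent, task), task)
    for _ in range(n_iterations):
        # Refinement step: propose a well-defined hypothesis.
        hypothesis = propose_hypothesis(target_agent, best_score)
        # Modification step: turn the hypothesis into a concrete change.
        candidate = apply_modification(target_agent, hypothesis)
        # Execution + evaluation: score the modified agent.
        score = evaluate(run_agent(candidate, task), task)
        if score > best_score:
            target_agent, best_score = candidate, score  # accept
        # otherwise the hypothesis and its modification are discarded
    return target_agent
```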
As mentioned above, the first approach is a multi-agent framework. The paper proposes the following agents: a Refinement Agent (proposes improvements), an Execution Agent (runs the target agent), an Evaluation Agent (uses Llama to judge the output of the target agent), a Modification Agent (applies changes to the target agent), and a Documentation Agent (records each iteration).
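The Evaluation Agent can be realized as an LLM judge. Here is a hedged sketch, where `llm` is any text-in, text-out chat-completion callable and the prompt wording is illustrative (the paper uses a Llama model for this role):

```python
# Hedged sketch of the Evaluation Agent as an LLM judge; `llm` is any
# chat-completion callable, not a specific client library.
JUDGE_PROMPT = """You are an evaluation agent. Score the output below
for the given task on a scale of 0-100. Reply with the number only.

Task: {task}
Output: {output}"""

def evaluation_agent(llm, task: str, output: str) -> float:
    reply = llm(JUDGE_PROMPT.format(task=task, output=output))
    return float(reply.strip())
```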
The algorithm was tested on several target agents: a Market Research Agent, an AI Architect Agent, an Outreach Agent, and others. All of them showed improvements of 70% to 104% over their baselines.
An alternative approach is to use reinforcement learning, as shown in the "Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning" paper. The paper focuses on optimizing RAG systems, but a RAG system can be viewed as a multi-agent workflow. From this perspective, the RAG pipeline consists of the following agents: a Query Rewriter, a Selector, and a Generator; the retriever itself is not treated as an agent. The whole pipeline can then be evaluated with a unified reward, such as the F1 score of the final answer.
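A minimal token-level F1 reward over the final answer might look like this (a common formulation; the paper's exact reward may include additional terms):

```python
# Token-level F1 between the generated answer and the gold answer,
# used as a unified reward for the whole RAG pipeline.
from collections import Counter

def f1_reward(prediction: str, gold: str) -> float:
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```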
The MMOA-RAG (Multi-Module joint Optimization Algorithm) framework focuses on collaborative optimization of these modules. The approach is based on Multi-Agent Proximal Policy Optimization (MAPPO), an extension of PPO to multi-agent environments. In a RAG scenario, the pipeline can select the best documents for the answer and still generate a low-quality answer. This is why a shared global reward is important: it promotes cooperation among all agents rather than locally optimal behavior in a single module.
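To make the cooperation point concrete, here is a sketch of shared reward assignment: every module receives the same episode-level score, reusing `f1_reward` from above (the agent names are illustrative, not MMOA-RAG's API):

```python
def assign_rewards(final_answer: str, gold_answer: str,
                   agents: list[str]) -> dict[str, float]:
    # Every agent gets the same global reward, so no module can improve
    # its own return at the expense of final answer quality.
    shared = f1_reward(final_answer, gold_answer)
    return {agent: shared for agent in agents}

rewards = assign_rewards("Paris", "Paris",
                         ["query_rewriter", "selector", "generator"])
# {'query_rewriter': 1.0, 'selector': 1.0, 'generator': 1.0}
```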
References
Paper: A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops - https://arxiv.org/abs/2412.17149
Paper: Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning - https://arxiv.org/abs/2501.15228