How do you create your own LLM and win The Open LLM Leaderboard with one YAML file?
Instead of creating a new LLM from scratch, we can merge two or more existing LLMs into one. Model merging is not new: work on it dates back to the 1980s and 1990s. More recently, it was used to create avatars with Stable Diffusion in the web UI: the process involved training checkpoints (on selfies) and merging them into a single resulting model.
The FuseLLM paper presented an approach and a tool for fusing LLMs. Three models, Llama-2, OpenLLaMA, and MPT, were fused into FuseLLM. The resulting model improved on its sources in almost all tests, with an average gain of about 5% across 27 tasks.
The easiest way to experiment with model merging is the MergeKit library. It accepts a YAML configuration file in which we specify the source models and the merge method. Several methods are available: SLERP, TIES, DARE, and Passthrough.
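As a sketch, a SLERP merge of two 7B models could be configured like this (the model names are illustrative; the schema follows the mergekit documentation, with `t` controlling the interpolation factor per layer group):

```yaml
slices:
  - sources:
      - model: OpenPipe/mistral-ft-optimized-1218
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: OpenPipe/mistral-ft-optimized-1218
parameters:
  t:
    - filter: self_attn   # attention layers lean toward one parent...
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp         # ...MLP layers toward the other
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5          # everything else: an even blend
dtype: bfloat16
```

Running `mergekit-yaml config.yaml ./merged-model` then produces the merged checkpoint, no GPU training required.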
MergeKit can merge models that share an architecture as well as models with different architectures. In the latter case, as with FuseLLM, additional pretraining is required to make the merge work.
Today, many merged models sit at the top of The Open LLM Leaderboard, and for good reasons. First, model merging is a highly cost-efficient way to create new LLMs. Second, merged models inherit the capabilities of their source models: if you merge three models specialized in three different domains, there is a good chance the result will perform at a similar level in all three. This contrasts with fine-tuning, where the fine-tuned model can forget some of the source model's capabilities.
The problem with all the approaches above is that they require domain knowledge to implement correctly: we basically have to experiment with models and methods by hand. Wouldn't it be nice to do all this heavy lifting automatically? It's possible, as the "Evolutionary Optimization of Model Merging Recipes" paper shows.
The paper introduced Evolutionary Model Merge, a method that automatically finds the optimal combination of models for a set of target capabilities. It can merge models in Parameter Space, in Data Flow Space, or in both. In short, Parameter Space merging combines the weights of models built on the same architecture, while Data Flow Space merging leaves the weights unchanged and instead rewires which layers the input passes through during inference.
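A toy sketch of the two spaces (plain floats and lambdas stand in for tensors and transformer layers; the function names are mine, not the paper's):

```python
def merge_parameter_space(state_a, state_b, t=0.5):
    """Parameter Space merge: interpolate the weights of two models
    that share an architecture (same parameter names, same shapes).
    Weights are plain floats here; real merges operate on tensors."""
    assert state_a.keys() == state_b.keys(), "architectures must match"
    return {k: (1 - t) * state_a[k] + t * state_b[k] for k in state_a}


def merge_data_flow_space(layers_a, layers_b, path):
    """Data Flow Space merge: weights stay untouched; instead we pick
    which model's layer runs at each position of the forward pass.
    `path` is the inference route (found by evolutionary search in
    the paper), e.g. ["a", "b"] = model A's layer 0, model B's layer 1."""
    pools = {"a": layers_a, "b": layers_b}
    return [pools[src][i] for i, src in enumerate(path)]


if __name__ == "__main__":
    # Parameter Space: average two tiny "models" weight by weight.
    print(merge_parameter_space({"w": 0.0}, {"w": 1.0}, t=0.5))  # {'w': 0.5}

    # Data Flow Space: route an input through layers from both models.
    route = merge_data_flow_space(
        [lambda x: x + 1, lambda x: x * 2],  # model A's "layers"
        [lambda x: x - 1, lambda x: x * 3],  # model B's "layers"
        ["a", "b"],
    )
    x = 1
    for layer in route:
        x = layer(x)
    print(x)  # (1 + 1) * 3 = 6
```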
Using this approach, EvoLLM-JP-v1-7B was created from Shisa Gamma 7B v1, WizardMath 7B V1.1, and Abel 7B 002. At the time of its release, EvoLLM-JP was among the best-performing models for the Japanese language.
In summary, to beat all other models on The Open LLM Leaderboard: choose a niche, find a unique, underrepresented language, and create a new LLM with model merging.
Resources:
https://arxiv.org/abs/2403.13187 - Evolutionary Optimization of Model Merging Recipes
https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54 - Merge Large Language Models with mergekit
https://arxiv.org/abs/2401.10491 - Knowledge Fusion of Large Language Models
https://github.com/fanqiwan/FuseLLM - FuseLLM & FuseChat Project
https://arxiv.org/abs/2403.13257 - Arcee's MergeKit: A Toolkit for Merging Large Language Models
https://github.com/arcee-ai/mergekit - Tools for merging pretrained large language models.
https://github.com/SakanaAI/evolutionary-model-merge - Official repository of Evolutionary Optimization of Model Merging Recipes