Modern LLM-based recommendation engines: papers from Amazon, Walmart, Spotify, ByteDance, IKEA and Meta
The key to good recommendations is personalization: understanding a user's context or the job they are trying to get done. LLMs are well suited to making recommendations and generating "personal narratives - stories that resonate with users". We are moving from a simple list of items to watch, listen to or buy next to a world where automatic recommendations deeply resonate with the user. Spotify says that users are 4x more likely to click on recommendations accompanied by explanations. Adding commentary alongside recommendations has the same positive effect.
Before LLMs, the best practice for building recommendation engines was to combine a traditional inverted index with embedding-based neural retrieval. From an implementation perspective, this means using products such as Solr for the index and BERT-based models for the embeddings. BERT was used to build custom embedding models that find similarities between a search request and products in the catalogue. The main difference between web search and product search is that a product carries much less information than a web page.
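The hybrid approach can be sketched in a few lines. This is a toy illustration, not how any of the cited systems work: the catalogue and the hand-written vectors are made up, and merging the two result lists with reciprocal rank fusion is an assumption, since the text above does not say how the lexical and semantic results are combined. A real system would use a search engine for the index and a trained embedding model for the vectors.

```python
import math
from collections import defaultdict

# Hypothetical toy catalogue: product ID -> (title, hand-made embedding).
CATALOGUE = {
    "p1": ("red running shoes", [0.9, 0.1, 0.0]),
    "p2": ("blue trail sneakers", [0.8, 0.2, 0.1]),
    "p3": ("red wool scarf", [0.1, 0.9, 0.2]),
}

def build_inverted_index(catalogue):
    """Map every title token to the set of products that contain it."""
    index = defaultdict(set)
    for pid, (title, _) in catalogue.items():
        for token in title.split():
            index[token].add(pid)
    return index

def keyword_search(index, query):
    """Rank products by how many query tokens they match."""
    scores = defaultdict(int)
    for token in query.split():
        for pid in index.get(token, ()):
            scores[pid] += 1
    return sorted(scores, key=lambda pid: (-scores[pid], pid))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(catalogue, query_vec, k=3):
    """Rank products by cosine similarity to the query embedding."""
    ranked = sorted(catalogue, key=lambda pid: -cosine(catalogue[pid][1], query_vec))
    return ranked[:k]

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list contributes 1 / (k + rank) per product."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            fused[pid] += 1.0 / (k + rank)
    return sorted(fused, key=lambda pid: (-fused[pid], pid))

index = build_inverted_index(CATALOGUE)
lexical = keyword_search(index, "red shoes")
# The query embedding is hand-written here; a model would produce it in practice.
semantic = vector_search(CATALOGUE, [0.9, 0.1, 0.0])
hybrid = reciprocal_rank_fusion([lexical, semantic])
print(hybrid)
```

The keyword search never finds "blue trail sneakers" for the query "red shoes", while the vector search does, which is the gap the embedding side is meant to close.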
LLMs, on the other hand, can make end-to-end product recommendations. The paper from IKEA shows how to help LLMs remember product IDs. In this case, we can send the user's request as a prompt directly to the LLM and receive a product ID without any additional steps. The method is based on fine-tuning and extending the vocabulary with product IDs.
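To make the vocabulary extension concrete, here is a minimal sketch with a toy whitespace tokenizer. Everything in it is hypothetical (the `ToyTokenizer` class, the `<pid_...>` token format, the embedding size); it only illustrates the general idea that each product ID becomes a single new token with its own embedding row, which fine-tuning would then train. It is not the paper's implementation.

```python
import random

class ToyTokenizer:
    """Whitespace tokenizer with a growable vocabulary (illustrative only)."""

    def __init__(self, words):
        self.token_to_id = {"<unk>": 0}
        for word in words:
            self.add_token(word)

    def add_token(self, token):
        """Add a token; return True if it was new so embeddings can grow too."""
        if token in self.token_to_id:
            return False
        self.token_to_id[token] = len(self.token_to_id)
        return True

    def encode(self, text):
        # Unknown tokens map to <unk> (id 0).
        return [self.token_to_id.get(tok, 0) for tok in text.split()]

def extend_embeddings(embeddings, num_new, dim, rng):
    """Append one randomly initialised row per new token."""
    for _ in range(num_new):
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])

rng = random.Random(0)
DIM = 4  # assumed embedding size for the toy example
tokenizer = ToyTokenizer(["a", "blue", "sofa", "for", "small", "room"])
embeddings = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in tokenizer.token_to_id]

# Hypothetical product IDs registered as single tokens.
product_ids = ["<pid_10482>", "<pid_20931>"]
added = sum(tokenizer.add_token(pid) for pid in product_ids)
extend_embeddings(embeddings, added, DIM, rng)

# One fine-tuning pair: the request is the prompt, the product ID is the target.
prompt_ids = tokenizer.encode("a blue sofa for a small room")
target_ids = tokenizer.encode("<pid_10482>")
print(prompt_ids, target_ids)
```

Because the product ID is one token rather than a string of digits, the model only has to predict a single step to name a product, and an unregistered ID falls back to `<unk>` instead of being spelled out character by character.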
The most advanced methods include user preferences, behaviour and multi-modality. Meta coined a new term, Preference Discerning, for recommendation systems in which user preferences are explicitly included. The system consists of two steps. The first step is to collect preferences from a user: product reviews, posts, or anything else that helps to understand what the user likes. This process is called Preference Approximation in the paper. The next step is Preference Conditioning, which selects the most relevant preferences and uses them to find the best recommendations.
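The two steps can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the keyword rules in `PREFERENCE_RULES` stand in for whatever model reads the user's text, word overlap stands in for a real relevance measure, and the function names are hypothetical rather than taken from the paper.

```python
# Hypothetical keyword rules standing in for a model that reads user text.
PREFERENCE_RULES = {
    "lightweight": "prefers lightweight products",
    "cheap": "is price sensitive",
    "eco": "prefers eco-friendly materials",
}

def approximate_preferences(user_texts):
    """Step 1: turn reviews or posts into short preference statements."""
    found = []
    for text in user_texts:
        lowered = text.lower()
        for keyword, preference in PREFERENCE_RULES.items():
            if keyword in lowered and preference not in found:
                found.append(preference)
    return found

def condition_on_preferences(preferences, request, top_k=1):
    """Step 2: keep the preferences most relevant to the request and
    build the input that a recommender would be conditioned on."""
    request_words = set(request.lower().split())

    def overlap(preference):
        return len(request_words & set(preference.lower().split()))

    selected = sorted(preferences, key=overlap, reverse=True)[:top_k]
    return f"User {'; '.join(selected)}. Request: {request}"

reviews = [
    "Love how lightweight this tent is",
    "Too pricey, I wanted something cheap",
]
preferences = approximate_preferences(reviews)
prompt = condition_on_preferences(preferences, "lightweight backpack for hiking")
print(prompt)
```

Keeping the two steps separate means the preference list can be built once per user and reused, while the selection runs per request.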
Walmart's 'Triple Modality Fusion' paper treats images, text, and user behaviour as separate modalities. The paper proposes building adapters for image and behaviour data. These adapters, together with attention mechanisms, fuse all three modalities into the same embedding space so that the LLM can use the result later.
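A rough sketch of the fusion idea follows. It assumes the simplest possible version: each adapter is a single fixed linear projection into the text embedding space, and one attention step with the text vector as the query mixes the three modalities. The dimensions, weights and function names are invented for illustration; in a real model the adapter weights are learned, and the paper's architecture is more elaborate than this.

```python
import math

def linear_adapter(vector, weights):
    """Project a modality vector into the shared space (one weight row per output dim)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(query, modality_vectors):
    """Scaled dot-product attention: weight each modality by its match with the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, vec)) / scale for vec in modality_vectors]
    weights = softmax(scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modality_vectors))
        for i in range(len(query))
    ]
    return fused, weights

# Toy inputs: text already lives in the 4-dim shared space,
# image features are 3-dim, behaviour features are 2-dim.
text_vec = [0.2, 0.1, 0.4, 0.3]
image_vec = [0.5, 0.1, 0.9]
behaviour_vec = [1.0, 0.0]

# Fixed made-up adapter weights; a real system would train these.
IMAGE_ADAPTER = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1], [0.2, 0.1, 0.0], [0.1, 0.1, 0.1]]
BEHAVIOUR_ADAPTER = [[0.3, 0.1], [0.0, 0.2], [0.1, 0.1], [0.2, 0.0]]

image_in_text_space = linear_adapter(image_vec, IMAGE_ADAPTER)
behaviour_in_text_space = linear_adapter(behaviour_vec, BEHAVIOUR_ADAPTER)

fused, weights = attention_fuse(
    text_vec, [text_vec, image_in_text_space, behaviour_in_text_space]
)
print(fused, weights)
```

The fused vector has the same size as a text embedding, which is what lets it be passed to the LLM alongside ordinary token embeddings.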
We can divide recommendation systems (RS) with LLMs into three types: LLMs as the RS itself (via fine-tuning), LLMs as RS enhancers, and AI agents as RS controllers. The last type requires a custom agent that calls tools such as a rank tool or a retrieve tool. The main idea is to use the LLM's reasoning and simulation abilities to find the best match. The 'Let Me Do It For You' paper shows how to use surrogate users and attribute-oriented tools for recommendation systems.
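The controller pattern can be sketched as a loop that asks a policy for the next tool call and applies it to the current candidate list. In this toy version the policy is a hard-coded function, `scripted_controller`, standing in for the LLM; the catalogue, tool signatures and user profile are all hypothetical and are not taken from the paper.

```python
# Hypothetical toy catalogue.
CATALOGUE = [
    {"id": "m1", "title": "Robot Comedy", "genre": "sci-fi", "year": 2015},
    {"id": "m2", "title": "Space Drama", "genre": "sci-fi", "year": 2021},
    {"id": "m3", "title": "Quiet Village", "genre": "drama", "year": 2022},
]

def retrieve_tool(attribute, value):
    """Attribute-oriented retrieval: items whose attribute equals the value."""
    return [item for item in CATALOGUE if item[attribute] == value]

def rank_tool(candidates, attribute, descending=True):
    """Order the current candidates by one attribute."""
    return sorted(candidates, key=lambda item: item[attribute], reverse=descending)

TOOLS = {"retrieve": retrieve_tool, "rank": rank_tool}

def scripted_controller(user_profile, history):
    """Stand-in for the LLM: decide the next tool call, or None when finished."""
    if not history:
        return "retrieve", {"attribute": "genre", "value": user_profile["favourite_genre"]}
    if history[-1] == "retrieve":
        return "rank", {"attribute": "year"}
    return None

def run_agent(user_profile, max_steps=5):
    history, candidates = [], []
    for _ in range(max_steps):
        action = scripted_controller(user_profile, history)
        if action is None:
            break
        name, args = action
        if name == "rank":
            # The rank tool works on the candidates gathered so far.
            candidates = TOOLS[name](candidates, **args)
        else:
            candidates = TOOLS[name](**args)
        history.append(name)
    return [item["id"] for item in candidates]

recommendations = run_agent({"favourite_genre": "sci-fi"})
print(recommendations)
```

Replacing `scripted_controller` with a call to an LLM that returns the same (tool name, arguments) pairs would leave the loop unchanged, and `max_steps` keeps a misbehaving controller from looping forever.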
Once the recommendation algorithm is ready, it's time to show it to users. The Monolith paper from ByteDance covers some of the operational challenges. The first issue to resolve is online training, or, in other words, how to deliver product and recommendation updates to users. The second is optimizing speed and resource usage.
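One way to picture both challenges is an embedding table that keeps one row per raw ID and pushes only the rows changed since the last sync to the serving side. The sketch below is an assumption-laden toy, not Monolith's implementation: a Python dict stands in for a collisionless hash table, and the class and method names (`EmbeddingTable`, `ServingReplica`, `pop_delta`) are hypothetical.

```python
import random

class EmbeddingTable:
    """One row per raw ID, stored in a dict so two IDs never share a row."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rows = {}
        self.touched = set()  # IDs changed since the last sync
        self._rng = random.Random(seed)

    def lookup(self, item_id):
        # New IDs get a fresh row instead of being hashed into a shared slot.
        if item_id not in self.rows:
            self.rows[item_id] = [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
            self.touched.add(item_id)
        return self.rows[item_id]

    def apply_gradient(self, item_id, grad, lr=0.1):
        """One SGD step on a single row, as online training would do per event."""
        row = self.lookup(item_id)
        self.rows[item_id] = [w - lr * g for w, g in zip(row, grad)]
        self.touched.add(item_id)

    def pop_delta(self):
        """Return copies of the rows changed since the last call, then reset."""
        delta = {item_id: list(self.rows[item_id]) for item_id in self.touched}
        self.touched.clear()
        return delta

class ServingReplica:
    """Read-only copy used to answer requests."""

    def __init__(self):
        self.rows = {}

    def apply_delta(self, delta):
        self.rows.update(delta)

trainer = EmbeddingTable(dim=2)
serving = ServingReplica()

# First batch of user events touches two items.
trainer.apply_gradient("item_42", [1.0, -1.0])
trainer.apply_gradient("item_7", [0.5, 0.5])
first_delta = trainer.pop_delta()
serving.apply_delta(first_delta)

# The next batch touches only one item, so only that row is shipped.
trainer.apply_gradient("item_42", [1.0, 1.0])
second_delta = trainer.pop_delta()
serving.apply_delta(second_delta)
print(sorted(first_delta), sorted(second_delta))
```

Shipping only the touched rows keeps each sync small, which is the resource side of the trade-off: the serving copy stays fresh without a full model transfer after every update.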
References:
Blog: Contextualized Recommendations Through Personalized Narratives using LLMs - Spotify Research - https://research.atspotify.com/2024/12/contextualized-recommendations-through-personalized-narratives-using-llms/
Paper: Semantic Retrieval at Walmart - https://arxiv.org/abs/2412.04637
Paper: Text Is All You Need: Learning Language Representations for Sequential Recommendation - https://arxiv.org/abs/2305.13731
Paper: Learn by Selling: Equipping Large Language Models with Product Knowledge for Context-Driven Recommendations - https://arxiv.org/abs/2407.20856
Paper: Sequential LLM Framework for Fashion Recommendation - https://arxiv.org/abs/2410.11327
Paper: Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations - https://arxiv.org/abs/2410.12228
Paper: Preference Discerning with LLM-Enhanced Generative Retrieval - https://arxiv.org/abs/2412.08604
Paper: Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning - https://arxiv.org/abs/2405.15114
Paper: Monolith: Real Time Recommendation System With Collisionless Embedding Table - https://arxiv.org/abs/2209.07663