Neural Lexical Search with Learned Sparse Retrieval

¹University of Amsterdam, ²Cohere, ³University of Glasgow, ⁴ISTI-CNR, ⁵Johns Hopkins University

Abstract

Learned Sparse Retrieval (LSR) techniques use neural machinery to represent queries and documents as learned bags of words. In contrast with other neural retrieval techniques, such as generative retrieval and dense retrieval, LSR has been shown to be a remarkably robust, transferable, and efficient family of methods for retrieving high-quality search results. This half-day tutorial aims to provide an extensive overview of LSR, ranging from its fundamentals to the latest emerging techniques. By the end of the tutorial, attendees will be familiar with the important design decisions of an LSR system, know how to apply LSR to text and other modalities, and understand the latest techniques for retrieving efficiently with learned sparse representations. Website: https://lsr-tutorial.github.io
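To make the "learned bag of words" idea concrete, here is a minimal Python sketch. It assumes an LSR encoder (for example, a SPLADE-style model) has already assigned non-negative weights to a handful of vocabulary terms. The weights, document ids, and the helper names `build_inverted_index` and `search` are all hypothetical and chosen for illustration.

Because each representation is sparse, documents can be stored in an ordinary inverted index. A query is then scored by the dot product of its term weights with each document's term weights, accumulated only over the postings of the query's terms.

```python
from collections import defaultdict

# Hypothetical term weights standing in for the output of an LSR encoder.
# A real model would produce these from text; they are hand-written here.
docs = {
    "d1": {"sparse": 1.2, "retrieval": 1.5, "neural": 0.7},
    "d2": {"dense": 1.4, "retrieval": 1.1, "vector": 0.9},
    "d3": {"inverted": 1.3, "index": 1.0, "sparse": 0.4},
}


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            if weight > 0:  # zero-weight terms are simply not stored
                index[term].append((doc_id, weight))
    return index


def search(query, index, k=10):
    """Score documents by the dot product of query and document weights.

    Scores are accumulated term-at-a-time, so only the postings lists of
    terms present in the query are visited. Returns the top-k
    (doc_id, score) pairs in descending score order.
    """
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


index = build_inverted_index(docs)
query = {"sparse": 1.0, "retrieval": 0.8}  # hypothetical query weights
print(search(query, index))
```

Documents that share no term with the query are never touched, which is what lets LSR reuse inverted-index machinery. Practical systems typically add dynamic pruning (such as WAND or MaxScore) or approximate methods on top of this exhaustive loop.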

SIGIR 2025 Tutorial

Part 1: Fundamentals

Section                          Duration
Introduction to LSR              20 min
Datasets and Evaluation          10 min
LSR Framework                    20 min
Training LSR                     20 min
Text LSR (Colab)                 25 min

Part 2: Emerging Topics

Section                          Duration
Multilingual LSR                 20 min
Multimodal LSR (Colab)           20 min
Indexing & Efficiency            25 min
Hybrid Dense-Sparse Retrieval    15 min
Conclusion                       5 min

References

Part 1 – Fundamentals

1. Introduction to LSR

2. Datasets and Evaluation

3. LSR Framework

4. Text LSR

5. Training LSR Models

Part 2 – Emerging Topics

1. Multilingual LSR

2. Multimodal LSR

3. Indexing & Efficient LSR

4. Hybrid Dense-Sparse Retrieval