Motivation
Not every Transformer component is equally necessary.
LLM-Drop examines architectural redundancy across Transformer blocks and sublayers, then turns that analysis into reproducible dropping and benchmarking pipelines.
Pipeline
From importance estimation to dropped checkpoints.
The pipeline estimates module importance, selects layers or blocks to remove, writes updated model configs, and benchmarks the resulting model.
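The importance-estimation and selection steps can be sketched in a few lines. This is a minimal illustration, not the repository's actual implementation: it assumes a similarity-based importance score (a layer whose output closely matches its input changes the representation little and is a dropping candidate), and all function names here are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two hidden-state vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def layer_importance(inputs, outputs):
    # Hypothetical importance proxy: 1 - cos(input, output) per layer.
    # A near-identity layer scores close to 0 (highly redundant).
    return [1.0 - cosine(i, o) for i, o in zip(inputs, outputs)]


def select_layers_to_drop(importance, k):
    # Return the indices of the k least-important layers, in order.
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return sorted(ranked[:k])


# Toy example: layer 1 passes its input through almost unchanged.
ins = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outs = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
scores = layer_importance(ins, outs)
to_drop = select_layers_to_drop(scores, k=1)  # → [1]
```

After selection, the pipeline's remaining steps amount to rewriting the model config without the chosen sublayers and re-running the benchmark suite on the smaller model.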
Resources
Paper, code, checkpoints, and benchmarks.
The repository includes the dropping pipeline, benchmark scripts, custom dropped-model classes, and released checkpoints.
News
Project updates.
- Feb 2026: The paper is published in Transactions on Machine Learning Research.
- May 2025: The project received the Qualcomm Innovation Fellowship North America award for efficiency-optimized Transformer architectures.
- Sep 2024: Dropped-model checkpoints were released on Hugging Face.
Citation
Cite this work.
If LLM-Drop helps your research, please cite the corresponding paper.
@article{he2026uncovering,
  title={Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping},
  author={Shwai He and Guoheng Sun and Zheyu Shen and Ang Li},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2026},
  url={https://openreview.net/forum?id=1I7PCbOPfe},
}