TMLR 2026

Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping

A unified study of architectural redundancy in Transformer-based LLMs, with practical pipelines for block dropping, attention/MLP layer dropping, joint layer dropping, and post-training quantization.

Shwai He*, Guoheng Sun*, Zheyu Shen, Ang Li

University of Maryland, College Park

[Figure] LLM-Drop layer dropping overview: LLM-Drop studies Transformer redundancy through block drop, layer drop, joint layer drop, and quantization-aware evaluation.

Motivation

Not every Transformer component is equally necessary.

LLM-Drop examines architectural redundancy across Transformer blocks and sublayers, then turns that analysis into reproducible dropping and benchmarking pipelines.

Unified dropping study: The project compares block dropping, attention-layer dropping, MLP-layer dropping, and joint layer dropping under one framework.
Practical model editing: Dropped configurations are represented through custom model files and updated configs, making the resulting checkpoints loadable with Transformers.
Efficiency evaluation: The repo includes task performance, inference speed, and optional AWQ/GPTQ quantization workflows.
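To make the dropping idea concrete, here is a minimal, dependency-free sketch of importance-based sublayer selection. The page does not state which importance metric the repo uses, so this example assumes a common proxy: a sublayer whose output is nearly identical to its input (high input/output cosine similarity) acts like an identity map and is a candidate for dropping. The `records` layout and sublayer names are illustrative, not the repo's actual data structures.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def importance(hidden_in, hidden_out):
    # Assumed proxy: a sublayer whose output closely matches its input
    # behaves like an identity map, so high similarity -> low importance.
    return 1.0 - cosine_similarity(hidden_in, hidden_out)

def select_drops(records, num_drop):
    # records: {sublayer_name: (input_vector, output_vector)},
    # captured once on a small calibration set (hypothetical format).
    scores = {name: importance(x, y) for name, (x, y) in records.items()}
    return sorted(scores, key=scores.get)[:num_drop]

records = {
    "layer3.attn": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.1]),   # near-identity
    "layer3.mlp":  ([1.0, 2.0, 3.0], [-2.0, 0.5, 1.0]),  # transforms strongly
}
print(select_drops(records, 1))  # → ['layer3.attn']
```

With real hidden states the same ranking procedure would run over every attention and MLP sublayer, which is what lets one framework cover block, layer, and joint dropping: they differ only in which sublayers enter the candidate pool.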

Pipeline

From importance estimation to dropped checkpoints.

The pipeline estimates module importance, selects layers or blocks to remove, writes updated model configs, and benchmarks the resulting model.

Block Drop: Remove full Transformer blocks when both attention and MLP sublayers are selected.
Layer Drop: Drop attention or MLP sublayers independently to study subcomponent redundancy.
Joint Layer Drop: Combine attention and MLP dropping decisions for a broader compression schedule.
Quantization: Evaluate dropped models with optional AWQ and GPTQ post-training quantization.
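The "writes updated model configs" step above can be sketched as follows. The field names (`drop_attn_layers`, `drop_mlp_layers`, `drop_blocks`) are hypothetical stand-ins for whatever schema the repo's custom model classes actually read; the point is that a dropped checkpoint is just a base config plus a record of which sublayer indices to skip, with indices dropped in both lists amounting to a full block drop.

```python
import json

def write_drop_config(base_config, dropped):
    """Record dropped sublayers in a config dict.

    Field names are illustrative, not the repo's actual schema.
    `dropped` holds names like "layer5.attn" or "layer5.mlp".
    """
    config = dict(base_config)
    config["drop_attn_layers"] = sorted(
        int(name.split(".")[0].removeprefix("layer"))
        for name in dropped if name.endswith(".attn")
    )
    config["drop_mlp_layers"] = sorted(
        int(name.split(".")[0].removeprefix("layer"))
        for name in dropped if name.endswith(".mlp")
    )
    # An index appearing in both lists corresponds to a full block drop.
    config["drop_blocks"] = sorted(
        set(config["drop_attn_layers"]) & set(config["drop_mlp_layers"])
    )
    return config

base = {"model_type": "llama", "num_hidden_layers": 32}
updated = write_drop_config(base, ["layer5.attn", "layer5.mlp", "layer11.attn"])
print(json.dumps(updated, indent=2))
```

A custom model class would then consult these lists in its forward pass, replacing the listed sublayers with identity so the edited checkpoint loads and runs through the standard Transformers API.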

Resources

Paper, code, checkpoints, and benchmarks.

The repository includes the dropping pipeline, benchmark scripts, custom dropped-model classes, and released checkpoints.

News

Project updates.

  • Feb 2026: The paper is published in Transactions on Machine Learning Research.
  • May 2025: The project received the Qualcomm Innovation Fellowship North America award for efficiency-optimized Transformer architectures.
  • Sep 2024: Dropped-model checkpoints were released on Hugging Face.

Citation

Cite this work.

If LLM-Drop helps your research, please cite the corresponding paper.

@article{he2026uncovering,
  title={Uncovering the Redundancy in Transformers via a Unified Study of Layer Dropping},
  author={Shwai He and Guoheng Sun and Zheyu Shen and Ang Li},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2026},
  url={https://openreview.net/forum?id=1I7PCbOPfe}
}