A representation-level view of why pruning can preserve non-generative metrics while still damaging autoregressive generation.
Core question
Pruning often preserves fixed-target or non-generative scores, yet autoregressive outputs can drift. This project explains the gap by tracking where perturbations become visible in the model's representation pipeline.
Single-step and fixed-target evaluations may stay close because they examine constrained behavior rather than open-ended decoding.
Small distributional changes can compound across decoding steps, eventually changing token choices and final answers.
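The compounding effect can be illustrated with a deliberately simplified toy decoder (entirely hypothetical: a two-token vocabulary and a single scalar state standing in for hidden features, not the project's model). A small per-step bias leaves early greedy choices unchanged, then flips a near-tie, after which the chosen token feeds back into the state and the sequences diverge:

```python
def decode(steps, eps=0.0):
    """Greedy decoding with a toy scalar state; eps mimics a small pruning perturbation."""
    h, out = 1.0, []
    for _ in range(steps):
        h = 0.9 * h + eps                  # the perturbation re-enters the state every step
        logits = [h, 0.0]                  # two-token vocabulary: token 0 vs token 1
        tok = 0 if logits[0] > logits[1] else 1
        out.append(tok)
        h += -0.05 if tok == 0 else 0.05   # the chosen token feeds back into the state
    return out

base = decode(12)            # unperturbed greedy rollout
pert = decode(12, eps=0.01)  # tiny per-step perturbation
```

In this toy run the two rollouts agree for the first eleven steps and then pick different tokens, after which the feedback keeps them apart; the point is qualitative, not a quantitative model of any pruned network.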
Instead of judging pruning with one metric, we compare perturbations in hidden, logit, and probability spaces.
Empirical evidence
The same pruning setup can look acceptable under non-generative metrics while suffering significant degradation on generation tasks.
Generation-time divergence
Pruning is not only about a layerwise similarity score; it is also about whether representation shifts change the autoregressive decoding path.
Representation hierarchy
We test whether perturbations stay local in hidden states, remain stable in logits, or alter decoding probabilities.
Measure cosine deviation and decompose pruning deltas relative to dense hidden states.
Track whether hidden-state perturbations remain benign or shift the pre-softmax ranking signal that drives generation.
Compare probability-space cosine similarity and KL divergence after temperature-scaled softmax.
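A minimal sketch of these three probes, with made-up dense and pruned logit vectors standing in for a single decoding position (the numbers are illustrative and do not come from the paper):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(z, temperature=1.0):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp((a - m) / temperature) for a in z]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# made-up dense vs. pruned logits for one position
z_dense  = [2.0, 1.0, 0.5, -1.0]
z_pruned = [1.9, 1.1, 0.4, -1.1]

# logit space: angular deviation, and whether the top-1 ranking survives
cos_logit = cosine(z_dense, z_pruned)
same_top1 = max(range(4), key=z_dense.__getitem__) == max(range(4), key=z_pruned.__getitem__)

# probability space: cosine similarity and KL after temperature-scaled softmax
p, q = softmax(z_dense, temperature=1.0), softmax(z_pruned, temperature=1.0)
cos_prob, kl_pq = cosine(p, q), kl(p, q)
```

With these particular numbers the logit cosine stays near 1 and the top-1 token is unchanged, yet the KL divergence is nonzero, which is the kind of dissociation the hierarchy is designed to expose.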
Theory intuition
Local approximations explain why hidden/logit perturbations and probability shifts can diverge.
For cosine similarity in any representation space, the pruning-induced deviation admits a second-order approximation in the perturbation.
To compare probability-space and logit-space deviations on the same footing, probability-space deviation is rewritten in terms of the logit variable z.
In probability space, KL divergence measures the distributional shift under pruning and can be approximated in closed form.
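Concretely, these statements correspond to standard second-order expansions (notation introduced here for illustration: dense hidden state h, perturbation δ with component δ⊥ orthogonal to h, logits z, probabilities p = softmax(z)):

```latex
% Cosine deviation of a perturbed hidden state h' = h + \delta,
% to second order in \|\delta\| / \|h\|:
\cos(h,\, h+\delta) \;\approx\; 1 - \frac{\|\delta_{\perp}\|^{2}}{2\,\|h\|^{2}},
\qquad
\delta_{\perp} = \delta - \frac{\langle \delta, h \rangle}{\|h\|^{2}}\, h .

% Probability-space deviation rewritten in the logit variable z,
% via the softmax Jacobian J = \mathrm{diag}(p) - p p^{\top}:
\delta p \;\approx\; \bigl(\mathrm{diag}(p) - p\, p^{\top}\bigr)\, \delta z .

% Closed-form second-order approximation of the KL shift under pruning:
\mathrm{KL}\!\left(p \,\middle\|\, p'\right) \;\approx\;
\tfrac{1}{2}\, \delta z^{\top} \bigl(\mathrm{diag}(p) - p\, p^{\top}\bigr)\, \delta z .
```

The same matrix diag(p) − pp⊤ appears in both the Jacobian and the KL quadratic form, which is what puts probability-space and logit-space deviations on the same footing.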
Runnable analysis
The codebase includes inter-layer dropping, intra-layer sparsification, and representation-analysis scripts for dropped and pruned models.
Match angular deviation and KL estimates to observed layerwise signals.
Contrast full-vocabulary behavior with answer-option subspaces.
Compare hidden/logit stability with vocabulary-space probability shifts.
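The full-vocabulary versus answer-option contrast can be sketched as follows, with a toy six-token "vocabulary" and made-up option ids (none of this comes from the released scripts):

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [x / s for x in e]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# made-up dense vs. pruned logits over a tiny six-token "vocabulary"
z_dense  = [3.0, 2.5, 0.0, -0.5, 1.0, 0.2]
z_pruned = [2.8, 2.6, 0.3, -0.4, 0.7, 0.5]
options = [0, 1]  # hypothetical token ids of the answer options (e.g. "A", "B")

# distribution shift over the full vocabulary
kl_full = kl(softmax(z_dense), softmax(z_pruned))

# distribution shift after renormalizing over the answer options only
kl_opts = kl(softmax([z_dense[i] for i in options]),
             softmax([z_pruned[i] for i in options]))
```

In this example the distribution restricted to the answer options moves less than the full-vocabulary distribution, which is exactly how a fixed-option evaluation can under-report a pruning-induced shift.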
Citation
If this project helps your research, please cite the corresponding paper.
@misc{he2026demystifyingpruningworksrepresentation,
  title={Demystifying When Pruning Works via Representation Hierarchies},
  author={Shwai He and Guoheng Sun and Haichao Zhang and Yun Fu and Ang Li},
  year={2026},
  eprint={2603.24652},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.24652},
}