arXiv 2026

Demystifying When Pruning Works via Representation Hierarchies

A representation-level view of why pruning can preserve non-generative metrics while still damaging autoregressive generation.

Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li
University of Maryland, College Park and Northeastern University

Core question

Why can pruning look safe in one evaluation but fail during generation?

Pruning often preserves fixed-target or non-generative scores, yet autoregressive outputs can drift. This project explains the gap by tracking where perturbations become visible in the model's representation pipeline.

Non-generative stability

Single-step and fixed-target evaluations may stay close because they examine constrained behavior rather than open-ended decoding.

Generative fragility

Small distributional changes can compound across decoding steps, eventually changing token choices and final answers.
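A toy sketch of this compounding: a hand-built four-token "model" whose next-token logits depend only on the previous token. A logit perturbation of at most 0.03 flips one near-tie, after which greedy decoding follows an entirely different trajectory. All names and numbers here are illustrative, not from the paper:

```python
import numpy as np

V = 4  # toy vocabulary {0, 1, 2, 3}
# Row t gives the next-token logits after emitting token t.
W_dense = np.array([
    [0.0, 1.00, 0.99, 0.0],   # after 0: near-tie between tokens 1 and 2
    [0.0, 0.0, 0.0, 2.0],     # after 1: go to 3
    [2.0, 0.0, 0.0, 0.0],     # after 2: go to 0
    [0.0, 0.0, 2.0, 0.0],     # after 3: go to 2
])
delta = np.zeros_like(W_dense)
delta[0, 1] = -0.03           # tiny pruning-induced shift flips the near-tie
W_pruned = W_dense + delta

def greedy(W, start, steps):
    toks = [start]
    for _ in range(steps):
        toks.append(int(np.argmax(W[toks[-1]])))
    return toks

dense  = greedy(W_dense, 0, 5)   # [0, 1, 3, 2, 0, 1]
pruned = greedy(W_pruned, 0, 5)  # [0, 2, 0, 2, 0, 2]
```

A single flipped token changes every subsequent context, so the two runs share nothing past the first step even though the logit perturbation is tiny.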

Representation lens

Instead of judging pruning with one metric, we compare perturbations in hidden, logit, and probability spaces.

Empirical evidence

Pruning effects differ across task regimes.

The same pruning setup can look acceptable under non-generative metrics while degrading significantly on generation tasks.

Non-generative metrics under pruning
Non-generative metrics often remain stable after pruning, especially in constrained evaluation settings.
Generative metrics under pruning
Generative quality can degrade because each generated token changes future contexts.
Example of generation-time collapse after pruning

Generation-time divergence

Local perturbations become a decoding trajectory problem.

The key point: pruning safety is not only a matter of layerwise similarity scores, but also of whether representation shifts change the autoregressive decoding path.

Representation hierarchy

We trace pruning through representation spaces.

h → z = Wh → p = softmax(z / T)

We test whether perturbations stay local in hidden states, remain stable in logits, or alter decoding probabilities.
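A minimal sketch of measuring one perturbation in all three spaces, assuming a random unembedding matrix W and a synthetic stand-in for the pruning delta on h (shapes and magnitudes are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, V, T = 16, 32, 1.0
W = rng.normal(size=(V, d)) / np.sqrt(d)        # assumed unembedding

h_dense  = rng.normal(size=d)
h_pruned = h_dense + 0.05 * rng.normal(size=d)  # stand-in pruning delta

z_dense, z_pruned = W @ h_dense, W @ h_pruned
p_dense, p_pruned = softmax(z_dense, T), softmax(z_pruned, T)

kl = float(np.sum(p_dense * (np.log(p_dense) - np.log(p_pruned))))
report = {
    "cos_h": cosine(h_dense, h_pruned),   # hidden space
    "cos_z": cosine(z_dense, z_pruned),   # logit space
    "cos_p": cosine(p_dense, p_pruned),   # probability space
    "kl_p":  kl,                          # distributional shift
}
```

The same delta can score near-perfect cosine in hidden and logit space while still producing a nonzero probability-space shift, which is the gap the hierarchy is meant to expose.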

h: hidden states

Embedding space

Measure cosine deviation and decompose pruning deltas relative to dense hidden states.
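One way to implement this decomposition, as a sketch: split the pruning delta into components parallel and orthogonal to the dense hidden state (the helper name decompose_delta is hypothetical, not from the codebase):

```python
import numpy as np

def decompose_delta(h_dense, h_pruned):
    """Split the pruning delta into components parallel and orthogonal
    to the dense hidden state (an illustrative decomposition)."""
    delta = h_pruned - h_dense
    u = h_dense / np.linalg.norm(h_dense)
    parallel = (delta @ u) * u       # component along the dense direction
    orthogonal = delta - parallel    # component that rotates the state
    return parallel, orthogonal

rng = np.random.default_rng(1)
h = rng.normal(size=8)
h_p = h + 0.1 * rng.normal(size=8)
par, orth = decompose_delta(h, h_p)
```

Only the orthogonal component changes the direction of h, and hence its cosine similarity to the dense state; the parallel component only rescales it.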

z = Wh: logits

Logit space

Track whether hidden-state perturbations remain benign or shift the pre-softmax ranking signal that drives generation.
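A hedged sketch of one such ranking check: top-k overlap between dense and perturbed logits (topk_overlap is an illustrative helper, not the paper's metric):

```python
import numpy as np

def topk_overlap(z_a, z_b, k=5):
    """Fraction of top-k token indices shared by two logit vectors."""
    ta = set(np.argsort(z_a)[-k:].tolist())
    tb = set(np.argsort(z_b)[-k:].tolist())
    return len(ta & tb) / k

rng = np.random.default_rng(2)
z = rng.normal(size=50)
z_small = z + 0.01 * rng.normal(size=50)   # likely benign perturbation
z_large = z + 2.00 * rng.normal(size=50)   # likely ranking-breaking
```

A perturbation that leaves the top-k set intact is invisible to greedy and most sampled decoding; once the overlap drops, generation can follow a different path.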

p = softmax(z / T): probabilities

Probability space

Compare probability-space cosine similarity and KL divergence after temperature-scaled softmax.
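A small sketch of how temperature mediates this comparison: the same logit perturbation is viewed through softmaxes at several temperatures (the perturbation size and temperature grid are illustrative):

```python
import numpy as np

def softmax(z, T):
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for strictly positive distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(3)
z = rng.normal(size=20)
dz = 0.1 * rng.normal(size=20)

# One fixed logit perturbation, seen through differently tempered softmaxes.
shift = {T: kl(softmax(z, T), softmax(z + dz, T)) for T in (0.5, 1.0, 2.0)}
```

Because temperature rescales the logits before normalization, the same δz can register a very different distributional shift at different T, which is why the comparison is done after temperature-scaled softmax.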

Attention representation hierarchy under pruning
Attention sublayers show different sensitivity across hidden, logit, and probability spaces.
MLP representation hierarchy under pruning
MLP sublayers provide a complementary hierarchy view across embedding, logit, and probability spaces.

Theory intuition

Approximation theorems connect the observed representation spaces.

Local approximations explain why hidden/logit perturbations and probability shifts can diverge.

Theorem 1: Local Deviation Induced by Pruning

For cosine similarity in any representation space, pruning-induced deviation follows a second-order approximation.
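The exact statement is in the paper's figure; as a sketch, the standard second-order expansion of cosine similarity under a small perturbation δ of a representation x, which a formula of this kind typically instantiates, reads:

```latex
% Second-order expansion of cosine deviation under a small perturbation
% \delta of a representation x (generic sketch; the paper's exact
% statement may differ in constants or conditions).
\[
1 - \cos(x,\, x + \delta)
  \;\approx\; \frac{\lVert \delta_\perp \rVert^2}{2\,\lVert x \rVert^2},
\qquad
\delta_\perp \;=\; \delta \;-\; \frac{\langle \delta, x \rangle}{\lVert x \rVert^2}\, x .
\]
```

Only the component of the pruning delta orthogonal to x contributes at second order, which is why small deltas aligned with the dense representation leave cosine scores nearly untouched.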

Theorem 1 formula for representation-space cosine deviation

Theorem 2: Sensitivity of Probability Space to Logit Perturbations

To compare probability-space and logit-space deviations on the same footing, probability-space deviation is rewritten in terms of the logit variable z.
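As a sketch of this rewriting (the paper's exact formula may differ), the first-order response of the temperature-scaled softmax to a logit perturbation δz is governed by its Jacobian:

```latex
% First-order response of p = softmax(z / T) to a logit perturbation
% \delta z (generic sketch).
\[
\delta p \;\approx\; J\,\delta z,
\qquad
J \;=\; \frac{1}{T}\Bigl(\operatorname{diag}(p) - p\,p^{\top}\Bigr),
\qquad
p = \operatorname{softmax}(z / T).
\]
```

Because J depends on p itself, the same logit-space deviation can map to very different probability-space deviations depending on how peaked the distribution is.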

Theorem 2 formula for probability-space cosine deviation

Theorem 3: Distributional Shift under Pruning

In probability space, KL divergence measures the distributional shift under pruning and can be approximated in closed form.
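As a sketch of such a closed form (not necessarily the paper's exact statement), the standard second-order approximation of the KL shift induced by a logit perturbation δz under a temperature-scaled softmax is:

```latex
% Second-order approximation of the KL shift under a logit perturbation
% \delta z, with p = softmax(z / T) (generic sketch).
\[
\mathrm{KL}\bigl(p(z)\,\Vert\,p(z + \delta z)\bigr)
  \;\approx\; \frac{1}{2T^{2}}\,
  \delta z^{\top}\Bigl(\operatorname{diag}(p) - p\,p^{\top}\Bigr)\,\delta z .
\]
```

The quadratic form is the Fisher information of the softmax family, so the distributional shift grows with both the perturbation size and the sharpness of p.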

Theorem 3 formula for KL divergence approximation

Runnable analysis

The repository maps each claim to scripts.

The codebase includes inter-layer dropping, intra-layer sparsification, and representation-analysis scripts for dropped and pruned models.
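As a toy illustration of inter-layer dropping (not the repository's actual code), skipping one residual block in a stack of stand-in blocks and comparing outputs:

```python
import numpy as np

class ToyBlock:
    """Stand-in for a transformer block: a fixed nonlinearity + residual."""
    def __init__(self, d, rng):
        self.W = rng.normal(size=(d, d)) / np.sqrt(d)

    def __call__(self, h):
        return h + np.tanh(h @ self.W)   # residual form

def forward(blocks, h, drop=()):
    """Inter-layer dropping: skip blocks whose indices are in `drop`."""
    for i, blk in enumerate(blocks):
        if i not in drop:
            h = blk(h)
    return h

rng = np.random.default_rng(0)
blocks = [ToyBlock(16, rng) for _ in range(6)]
h0 = rng.normal(size=16)
h_dense   = forward(blocks, h0)           # full stack
h_dropped = forward(blocks, h0, drop={3}) # block 3 removed
```

The residual form is what makes dropping plausible at all: the skipped block's contribution is an additive update, so removing it perturbs rather than destroys the hidden state.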

Full vocabulary top-token behavior under pruning
MCQ answer-option subspace analysis

MCQ subspaces

Contrast full-vocabulary behavior with answer-option subspaces.
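A sketch of the subspace view, assuming hypothetical token ids for the answer options: restrict the full-vocabulary distribution to those ids and renormalize:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def option_subspace_probs(p_full, option_ids):
    """Restrict a full-vocabulary distribution to the answer-option
    tokens and renormalize (the MCQ-subspace view)."""
    p = p_full[option_ids]
    return p / p.sum()

rng = np.random.default_rng(4)
z = rng.normal(size=100)
p_full = softmax(z)
option_ids = np.array([10, 20, 30, 40])   # hypothetical ids for "A".."D"
p_mcq = option_subspace_probs(p_full, option_ids)
```

A pruning delta can reshuffle probability mass across the full vocabulary while leaving the relative ordering inside the four-option subspace intact, which is one mechanism behind stable MCQ accuracy alongside degraded generation.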

Final hidden and logit behavior during generation
Final vocabulary probability behavior during generation

Generation metrics

Compare hidden/logit stability with vocabulary-space probability shifts.

Citation

Cite this work.

If this project helps your research, please cite the corresponding paper.

@misc{he2026demystifyingpruningworksrepresentation,
  title={Demystifying When Pruning Works via Representation Hierarchies},
  author={Shwai He and Guoheng Sun and Haichao Zhang and Yun Fu and Ang Li},
  year={2026},
  eprint={2603.24652},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.24652},
}