Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- [Preprint] OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning. arXiv preprint arXiv:2405.18380, 2024.
- [Preprint] Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. arXiv preprint arXiv:2407.08296, 2024.
- [Preprint] From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. arXiv preprint arXiv:2407.11239, 2024.
- [MLSys 2024] Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache. Proceedings of Machine Learning and Systems, 2024.
- [ICML 2024] Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity. In Forty-first International Conference on Machine Learning, 2024.
- [ICML 2024] CaM: Cache Merging for Memory-efficient LLMs Inference. In Forty-first International Conference on Machine Learning, 2024.
- [ICML 2024] Junk DNA Hypothesis: Pruning Small Pre-Trained Weights *Irreversibly* and *Monotonically* Impairs "Difficult" Downstream Tasks in LLMs. In Forty-first International Conference on Machine Learning, 2024.
- [Interspeech 2024] Dynamic Data Pruning for Automatic Speech Recognition. arXiv preprint arXiv:2406.18373, 2024.
- [Interspeech 2024] MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization. arXiv preprint arXiv:2406.17614, 2024.
- [ICLR 2024] Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs. arXiv preprint arXiv:2310.08915, 2024.
- [ICLR 2024] AdaMerging: Adaptive Model Merging for Multi-Task Learning. arXiv preprint arXiv:2310.02575, 2024.
2023
- [IJCV] Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance. International Journal of Computer Vision, 2023.
- [ICLR 2023] More ConvNets in the 2020s: Scaling Up Kernels Beyond 51x51 Using Sparsity. arXiv preprint arXiv:2207.03620, 2023.
- [ICLR 2023] Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph. 2023.
- [ICLR 2023] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers. 2023.
- [ICLR 2023] Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! 2023.
2022
- [LoG 2022] You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets. 2022.
- [ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training. arXiv preprint arXiv:2202.02643, 2022.
- [ICLR 2022] Deep Ensembling with No Overhead for Either Training or Testing: The All-Round Blessings of Dynamic Sparsity. arXiv preprint arXiv:2106.14568, 2022.
2021
- [NeurIPS 2021] Sparse Training via Boosting Pruning Plasticity with Neuroregeneration. Advances in Neural Information Processing Systems, 2021.
- [ICML 2021] Do We Actually Need Dense Over-parameterization? In-time Over-parameterization in Sparse Training. In International Conference on Machine Learning, 2021.
- [ICML 2021] Selfish Sparse RNN Training. In International Conference on Machine Learning, 2021.