Hi, I am an PI at ELLIS Institute Tübingen and a group leader at Max Planck Institute for Intelligent Systems. Previously, I was a Royal Society Newton International Fellow at University of Oxford, and a Junior Research Fellow (JRF) at Somerville College. I was a postdoctoral fellow at UT Austin, working with Atlas Wang. I obtained my Ph.D. at the Eindhoven University of Technology (TU/e), the Netherlands, under the supervision of Mykola Pechenizkiy and Decebal Constantin Mocanu. My email address is sliu(at)tue.ellis.eu

Research Interests

My overarching research goal is to advance the frontiers of artificial intelligence from multiple perspectives, including but not limited to enhance the capabilities of large foundation models (pre-training and post-training), improve the efficiency and accessibility of AI, develop deeper fundamental understanding of deep learning behavior, and exploring novel architectural designs, and better data curation.

Hiring Info: I am actively hiring self-motivated PhD students and postdocs based on ELLIS Institute Tübingen, with opportunities for joining leading universities such as ETH Zurich, University of Oxford, and other.

Potential Collaborations: If you are interested in collaborating, feel free to reach out with the potential area of research you would like to contribute to!

news

Oct 01, 2025	I serve as the Area Chair of ICML 2026 and ARR 2025.
Sep 26, 2025	I will give invited talks in Tsinghua University (Shenzhen) and CUHK (Shenzhen) in September 2025.
Sep 18, 2025	📝 Three papers got accepted by NeurIPS 2025: Curse of Depth in LLMs, Layerwise Weight Decay, and Gated Activation Scaling.
Sep 17, 2025	I serve as the Program Chair for the third Conference on Parsimony and Learning (CPAL 2026) in Tübingen, Germany.
May 15, 2025	📝 Outlier-weighed Layerwise Sampling Fine-tuning got accepted by ACL 2025.
May 01, 2025	📝3 papers got accepted by ICML 2025: Masked Next-Token Prediction, How Low-Rank Weights Emerge for LLMs, Principal Weights for LLM Fine-tuning.
Jan 23, 2025	📝3 papers got accepted by ICLR 2025: Mix Layer Normalization, Spike-Aware Adam, LLM Composable Interventions.
Dec 10, 2024	📝 Visual prompting for pruning got accepted by AAAI 2025.
Dec 03, 2024	🧑‍🤝‍🧑 We are organizing the Workshop on Sparsity in LLMs (SLLM) at ICLR 2025. Link
Dec 03, 2024	🧑‍🤝‍🧑 We are organizing the Scalable Optimization for Efficient and Adaptive Foundation Models (SCOPE) workshop at ICLR 2025. Link
Oct 29, 2024	🧑‍🤝‍🧑 We are organizing the Stable Pre-Training Paradigms (SPP) workshop in IEEE CAI 2025. Link
Sep 30, 2024	I serve as an Area Chair for ICASSP 2025.
Sep 21, 2024	📝 3 papers are accepted by NeurIPS 2024: Found in the Middle, Sparse 3D Medical, Alpha Pruning.
Sep 21, 2024	📝 2 papers are accepted by EMNLP 2024: Layer-skip LLM and Is c4 enough for LLM Pruning?
Jul 22, 2024	I am honored to join in the organization committee of the Conference on Parsimony and Learning (CPAL). See you in Stanford.
Jun 25, 2024	I am very happy to be offered as a Junior Research Fellow (JRF) at Somerville College, one of the first two women’s colleges at Oxford.
Jun 21, 2024	I will have a talk tour around Europe at the NLP group at University of Sheffield, LTL group at University of Cambridge, and BlueNN group at University of Luxembourg.
Jun 15, 2024	📝 2 papers got accepted by Interspeech 2024: Sparse Multimodal from Scratch, Dynamic Data Pruning for Speech.
May 21, 2024	Our “Edge LLMs: Edge-Device Large Language Model Competition” competition has been accepted by NeurIPS 2024. Submission opens Link.
May 15, 2024	📝 1 paper Q-Hitter: Quantized-Sparse KV Cache got accepted by MLSys 2024.
May 02, 2024	📝 5 papers got accepted by ICML 2024: Layerwise Importance for LLMs, LLM Junk DNA, Bi-Level DST, KV Cache Merging, Saprse Cocktail.
Jan 16, 2024	📝 3 papers got accepted by ICLR 2024: Training-Free Sparse LLM Fine-tuning, Multi-Task Vector Merging, Sparse Training with Neuron Revitalization.
Jan 16, 2024	📝 4 papers got accepted by NeurIPS 2023: Channel-Level DST, Essential Sparsity, Pruning Topology, Note-Path Balance.
Jan 05, 2024	🏆 I am highly honored to receive the Rising Star in AI from KAUST and will give a talk at Rising Stars in AI Symposium.
Nov 01, 2023	📝 Block Sparse Training accepted by CPAL.
Oct 16, 2023	🏆 I am highly grateful to receive the Best PhD Dissertation Runner-up Award from the Informatics Europe.
Oct 10, 2023	🏆 I am highly honered to receive the Rising Star Award from CPAL and will give a presentation at HKU in Jan 2024.
Sep 20, 2023	🚀 I am grateful to receive the prestigious Newton International Fellowship from the British Academy and the Royal Society.
May 20, 2023	📝 The work I conducted during my internship at JD Academy has been accepted by International Journal of Computer Vision (IJCV) - STU-GAN.
Apr 20, 2023	📝 3 papers got accepted in ICML 2023, Instant Soup (Oral), Large Kernel Distillation, and Graph Ladling.
Mar 20, 2023	📝 2 papers SNN Ten Lessons and Channel-Level DST paper has been accepted as spotlight presentations at the SNN workshop.
Jan 20, 2023	📝 4 papers got accepted in ICLR 2023, Ramanujan Graph Pruning (oral), Sparsity May Cry Benchmark (spotlight%), MoE as Dropout (spotlight), SLaK:51x51 Large Conv.
Dec 01, 2022	📝 Our Untrained GNNs paper received the Best Paper Award from LoG 2022.11/2022*,
Nov 01, 2022	📝 One paper Lottery-Pools got accepted in AAAI 2023.9/2022*,
Apr 06, 2022	I got my PhD thesis abstract accepted by IDA 2022, which was also the first conference (symposium) that I attended in the first year of my PhD. PhD life is a cycle :).
Apr 01, 2022	Our tutorial Sparse Neural Networks Training has been accepted at ECMLPKDD 2022.
Jan 06, 2022	📝 Two of my first-author papers are accepted by ICLR 2022: Random pruning and FreeTickets.
Sep 06, 2021	🏆 I receive the “Outstanding Intern” honor in JD Academy Explore
Sep 05, 2021	📝 One paper got accepted by NeurIPs 2021: GraNet.
May 05, 2021	📝 2 papers are accepted by ICML 2021: In-Time Over-Parameterization and Selfish RNN.

selected publications

Preprint

Diffusion language models know the answer before decoding

Pengxiang Li, Yefan Zhou, Dilxat Muhtar, and 6 more authors

arXiv preprint arXiv:2508.19982, 2025
NeurIPS2025

The Curse of Depth in Large Language Models

Wenfang Sun, Xinyuan Song, Pengxiang Li, and 3 more authors

arXiv preprint arXiv:2502.05795, 2025
ICML2025

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

Zihang Liu, Tianyu Pang, Oleg Balabanov, and 5 more authors

arXiv preprint arXiv:2506.14562, 2025
ICML2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Xialie Zhuang, Zhikai Jia, Jianjin Li, and 4 more authors

arXiv preprint arXiv:2502.07490, 2025
Preprint

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Kechen Li, Wenqi Zhu, Coralia Cartis, and 2 more authors

arXiv preprint arXiv:2502.20545, 2025
Preprint

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Tianjin Huang, Haotian Hu, Zhenyu Zhang, and 8 more authors

arXiv preprint arXiv:2502.17055, 2025
ICLR2025

Mix-ln: Unleashing the power of deeper layers by combining pre-ln and post-ln

Pengxiang Li, Lu Yin, and Shiwei Liu

arXiv preprint arXiv:2412.13795, 2024
ICLR2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Tianjin Huang, Ziquan Zhu, Gaojie Jin, and 3 more authors

arXiv preprint arXiv:2501.06842, 2025
Preprint

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

Pengxiang Li, Lu Yin, Xiaowei Gao, and 1 more author

arXiv preprint arXiv:2405.18380, 2024
ICML2024

Outlier weighed layerwise sparsity (owl): A missing secret sauce for pruning llms to high sparsity

Lu Yin, You Wu, Zhenyu Zhang, and 7 more authors

2024
ICLR2023

More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity

Shiwei Liu, Tianlong Chen, Xiaohan Chen, and 7 more authors

arXiv preprint arXiv:2207.03620, 2023
LoG2022

You can have better graph neural networks by not training weights at all: Finding untrained gnns tickets

Tianjin Huang, Tianlong Chen, Meng Fang, and 8 more authors

2022
ICLR2022

The unreasonable effectiveness of random pruning: Return of the most naive baseline for sparse training

Shiwei Liu, Tianlong Chen, Xiaohan Chen, and 4 more authors

arXiv preprint arXiv:2202.02643, 2022
NeurIPS2021

Sparse training via boosting pruning plasticity with neuroregeneration

Shiwei Liu, Tianlong Chen, Xiaohan Chen, and 7 more authors

Advances in Neural Information Processing Systems, 2021
ICML2021

Do we actually need dense over-parameterization? in-time over-parameterization in sparse training

Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, and 1 more author

In International Conference on Machine Learning, 2021