Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2024
- [Preprint] OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning. arXiv preprint arXiv:2405.18380, 2024.
- [Preprint] Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. arXiv preprint arXiv:2407.08296, 2024.
- [Preprint] From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. arXiv preprint arXiv:2407.11239, 2024.
- [MLSys 2024] Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache. Proceedings of Machine Learning and Systems, 2024.
- [ICML 2024] Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity. In Forty-first International Conference on Machine Learning, 2024.
- [ICML 2024] CaM: Cache Merging for Memory-efficient LLMs Inference. In Forty-first International Conference on Machine Learning, 2024.
- [ICML 2024] Junk DNA Hypothesis: Pruning Small Pre-Trained Weights *Irreversibly* and *Monotonically* Impairs "Difficult" Downstream Tasks in LLMs. In Forty-first International Conference on Machine Learning, 2024.
- [Interspeech 2024] Dynamic Data Pruning for Automatic Speech Recognition. arXiv preprint arXiv:2406.18373, 2024.
- [Interspeech 2024] MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization. arXiv preprint arXiv:2406.17614, 2024.
- [ICLR 2024] Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs. arXiv preprint arXiv:2310.08915, 2024.
- [ICLR 2024] AdaMerging: Adaptive Model Merging for Multi-Task Learning. arXiv preprint arXiv:2310.02575, 2024.
2023
- [IJCV] Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance. International Journal of Computer Vision, 2023.
- [ICLR 2023] More ConvNets in the 2020s: Scaling Up Kernels Beyond 51x51 Using Sparsity. arXiv preprint arXiv:2207.03620, 2023.
- [ICLR 2023] Revisiting Pruning at Initialization Through the Lens of Ramanujan Graph. 2023.
- [ICLR 2023] Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers. 2023.
- [ICLR 2023] Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together! 2023.
2022
- [LoG 2022] You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets. 2022.
- [ICLR 2022] The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training. arXiv preprint arXiv:2202.02643, 2022.
- [ICLR 2022] Deep Ensembling with No Overhead for Either Training or Testing: The All-Round Blessings of Dynamic Sparsity. arXiv preprint arXiv:2106.14568, 2022.
2021
- [NeurIPS 2021] Sparse Training via Boosting Pruning Plasticity with Neuroregeneration. Advances in Neural Information Processing Systems, 2021.
- [ICML 2021] Do We Actually Need Dense Over-parameterization? In-time Over-parameterization in Sparse Training. In International Conference on Machine Learning, 2021.
- [ICML 2021] Selfish Sparse RNN Training. In International Conference on Machine Learning, 2021.