About Me
Hi, I’m Sid, a second-year PhD candidate in Computer Science at UCLA, advised by Professor Baharan Mirzasoleiman. My research focuses on data efficiency for learning with limited supervision, i.e., selecting the best small subsets of data for training to reduce costs without sacrificing accuracy. I aim to develop approaches to these problems that are both practically effective and theoretically rigorous.
Open Office Hours: To pay forward all the help I’ve received so far in pursuing a career in ML research, I dedicate 1-2 hours each week to open office hours. These are best suited for relatively junior students (undergraduate/masters), since I’m not very experienced myself :). If you’d like to chat about research, grad school, or anything else, please fill out this form.
In my free time, I like to write (https://medium.com/@sjoshi804), read about philosophy, and run.
Highlights
- Foundations of Data-Efficient Machine Learning Tutorial @ ICML ‘24: (Slides, Video) Gave a 2-hour tutorial at ICML ‘24 on principled approaches to data curation/pruning for efficient learning!
- CLIPCov: CLIPCov selects subsets of pre-training data to enable data-efficient contrastive language-image pre-training (CLIP) (AISTATS ‘24). Speed up your CLIP model training with theoretically grounded data efficiency!
- SAS: SAS selects subsets of pre-training data to enable data-efficient contrastive SSL (ICML ‘23). Give it a spin to try out data-efficient SSL!
- SpuCo: SpuCo is a Python package developed to make research on addressing spurious correlations effortless. Check it out!
News
- October 2024: Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks preprint on arXiv!
- July 2024: Will be giving a tutorial on Foundations of Data-Efficient Learning at ICML ‘24!
- June 2024: Will be interning this summer at Microsoft Research (AI Frontiers Team) under Dr. Neel Joshi!
- February 2024: I have successfully advanced to candidacy!
- January 2024: Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity is accepted to AISTATS 2024!
- January 2024: Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift and Investigating the Benefits of Projection Head for Representation Learning are accepted to ICLR 2024!
- June 2023: Towards Mitigating Spurious Correlations in the Wild: A Benchmark & New Datasets preprint on arXiv!
- May 2023: Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least accepted to ICML 2023!
- May 2023: Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression accepted to ICML 2023 for an oral (top 2%)!
- July 2022: Low Rank Pruning via Output Perturbation accepted at the Sparsity in Neural Networks Workshop!
Publications
[1] Siddharth Joshi, Arnav Jain, Ali Payani and Baharan Mirzasoleiman, Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity, AISTATS 2024.
[2] Yihao Xue, Siddharth Joshi, Dang Nguyen and Baharan Mirzasoleiman, Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift, ICLR 2024.
[3] Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi and Baharan Mirzasoleiman, Investigating the Benefits of Projection Head for Representation Learning, ICLR 2024.
[4] Siddharth Joshi and Baharan Mirzasoleiman, Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least, ICML 2023.
[5] Yihao Xue, Siddharth Joshi, Eric Gan, Pin-Yu Chen and Baharan Mirzasoleiman, Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression, ICML 2023 (Oral).
[6] Siddharth Joshi*, Yuhan Liu* and Baharan Mirzasoleiman, Low Rank Pruning via Output Perturbation, Sparsity in Neural Networks Workshop 2022.
* = equal contribution
Preprints
[1] Siddharth Joshi, Yu Yang, Yihao Xue, Wenhan Yang and Baharan Mirzasoleiman, Towards Mitigating Spurious Correlations in the Wild: A Benchmark & New Datasets, arXiv.
[2] Siddharth Joshi, Jiayi Ni and Baharan Mirzasoleiman, Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks, arXiv.