Ramachandran Ramjee

Microsoft

Vidur: A Large-Scale Simulation Framework For LLM Inference

May 08, 2024
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

May 07, 2024
Ramya Prabhu, Ajay Nayak, Jayashree Mohan, Ramachandran Ramjee, Ashish Panwar

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Mar 04, 2024
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Aug 31, 2023
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Jul 10, 2022
Kunal Dahiya, Nilesh Gupta, Deepak Saini, Akshay Soni, Yajun Wang, Kushal Dave, Jian Jiao, Gururaj K, Prasenjit Dey, Amit Singh, Deepesh Hada, Vidit Jain, Bhawna Paliwal, Anshul Mittal, Sonu Mehta, Ramachandran Ramjee, Sumeet Agarwal, Purushottam Kar, Manik Varma

Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads

Feb 21, 2022
Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, Atul Katiyar, Vipul Modi, Vaibhav Sharma, Abhishek Singh, Shreshth Singhal, Kaustubh Welankar, Lu Xun, Ravi Anupindi, Karthik Elangovan, Hasibur Rahman, Zhou Lin, Rahul Seetharaman, Cheng Xu, Eddie Ailijiang, Suresh Krishnappa, Mark Russinovich

LRTuner: A Learning Rate Tuner for Deep Neural Networks

May 30, 2021
Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

Mar 09, 2020
Nikhil Iyer, V Thejas, Nipun Kwatra, Ramachandran Ramjee, Muthian Sivathanu

Privado: Practical and Secure DNN Inference

Oct 01, 2018
Shruti Tople, Karan Grover, Shweta Shinde, Ranjita Bhagwan, Ramachandran Ramjee
