Saeed Rashidi

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

May 26, 2023
Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

Mar 24, 2023
William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training

Nov 30, 2022
Divya Kiran Kadiyala, Saeed Rashidi, Taekyung Heo, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexandros Daglis

Impact of RoCE Congestion Control Policies on Distributed Training of DNNs

Jul 22, 2022
Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella, Tushar Krishna

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models

Oct 09, 2021
Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

Exploring Multi-dimensional Hierarchical Network Topologies for Efficient Distributed Training of Trillion Parameter DL Models

Sep 24, 2021
William Won, Saeed Rashidi, Sudarshan Srinivasan, Tushar Krishna

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

Aug 19, 2020
Afshin Abdi, Saeed Rashidi, Faramarz Fekri, Tushar Krishna
