Michael Andersch

ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours

Apr 17, 2024
Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch

Reducing Activation Recomputation in Large Transformer Models

May 10, 2022
Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, Bryan Catanzaro

Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Apr 26, 2018
Feiwen Zhu, Jeff Pool, Michael Andersch, Jeremy Appleyard, Fung Xie
