Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Feb 20, 2023
Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh


View paper on: arXiv, OpenReview
