Zihang Dai

Transformer Quality in Linear Time
Feb 21, 2022
Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le

Combined Scaling for Zero-shot Transfer Learning
Nov 19, 2021
Hieu Pham, Zihang Dai, Golnaz Ghiasi, Hanxiao Liu, Adams Wei Yu, Minh-Thang Luong, Mingxing Tan, Quoc V. Le

Primer: Searching for Efficient Transformers for Language Modeling
Sep 17, 2021
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Aug 24, 2021
Zirui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

Combiner: Full Attention Transformer with Sparse Computation Cost
Jul 12, 2021
Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai

CoAtNet: Marrying Convolution and Attention for All Data Sizes
Jun 09, 2021
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan

Pay Attention to MLPs
Jun 01, 2021
Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le

Unsupervised Parallel Corpus Mining on Web Data
Sep 18, 2020
Guokun Lai, Zihang Dai, Yiming Yang

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Jun 05, 2020
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le
