Rewon Child

PaLM: Scaling Language Modeling with Pathways

Apr 19, 2022
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

Via arXiv

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Feb 04, 2022
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro

Via arXiv

Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

Nov 20, 2020
Rewon Child

Via arXiv

Language Models are Few-Shot Learners

Jun 05, 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei

Via arXiv

Scaling Laws for Neural Language Models

Jan 23, 2020
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

Via arXiv

Generating Long Sequences with Sparse Transformers

Apr 23, 2019
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever

Via arXiv

Exploring Neural Transducers for End-to-End Speech Recognition

Jul 24, 2017
Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

Via arXiv

Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting

Jul 04, 2017
Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates

Via arXiv

Reducing Bias in Production Speech Models

May 11, 2017
Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

Via arXiv

Active Learning for Speech Recognition: the Power of Gradients

Dec 10, 2016
Jiaji Huang, Rewon Child, Vinay Rao, Hairong Liu, Sanjeev Satheesh, Adam Coates

Via arXiv