
Max Vladymyrov

Linear Transformers are Versatile In-Context Learners

Feb 21, 2024
Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge

Uncovering mesa-optimization algorithms in Transformers

Sep 11, 2023
Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento

Continual Few-Shot Learning Using HyperTransformers

Jan 12, 2023
Max Vladymyrov, Andrey Zhmoginov, Mark Sandler

Training trajectories, mini-batch losses and the curious role of the learning rate

Jan 05, 2023
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller

Transformers learn in-context by gradient descent

Dec 15, 2022
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov

Decentralized Learning with Multi-Headed Distillation

Nov 28, 2022
Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov

Fine-tuning Image Transformers using Learnable Memory

Mar 30, 2022
Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson

HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

Jan 15, 2022
Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

GradMax: Growing Neural Networks using Gradient Information

Jan 13, 2022
Utku Evci, Max Vladymyrov, Thomas Unterthiner, Bart van Merriënboer, Fabian Pedregosa

Meta-Learning Bidirectional Update Rules

Apr 10, 2021
Mark Sandler, Max Vladymyrov, Andrey Zhmoginov, Nolan Miller, Andrew Jackson, Tom Madams, Blaise Agüera y Arcas
