Amirkeivan Mohtashami

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Mar 30, 2024
Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Feb 04, 2024
Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi

Social Learning: Towards Collaborative Learning with Large Language Models

Dec 18, 2023
Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, Blaise Agüera y Arcas

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Nov 27, 2023
Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

CoTFormer: More Tokens With Attention Make Up For Less Depth

Oct 16, 2023
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

Landmark Attention: Random-Access Infinite Context Length for Transformers

May 25, 2023
Amirkeivan Mohtashami, Martin Jaggi

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

Feb 07, 2023
Amirkeivan Mohtashami, Mauro Verzetti, Paul K. Rubenstein

On Avoiding Local Minima Using Gradient Descent With Large Learning Rates

May 30, 2022
Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

Feb 03, 2022
Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

Simultaneous Training of Partially Masked Neural Networks

Jun 16, 2021
Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich
