Armando Solar-Lezama

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Mar 12, 2024
Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Via arXiv

The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations?

Feb 29, 2024
Alex Gu, Wen-Ding Li, Naman Jain, Theo X. Olausson, Celine Lee, Koushik Sen, Armando Solar-Lezama

Via arXiv

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Jan 05, 2024
Alex Gu, Baptiste Rozière, Hugh Leather, Armando Solar-Lezama, Gabriel Synnaeve, Sida I. Wang

Via arXiv

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

Oct 23, 2023
Theo X. Olausson, Alex Gu, Benjamin Lipkin, Cedegao E. Zhang, Armando Solar-Lezama, Joshua B. Tenenbaum, Roger Levy

Via arXiv

Learning a Hierarchical Planner from Humans in Multiple Generations

Oct 17, 2023
Leonardo Hernandez Cano, Yewen Pu, Robert D. Hawkins, Josh Tenenbaum, Armando Solar-Lezama

Via arXiv

Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models

Jun 24, 2023
Sarah J. Zhang, Samuel Florin, Ariel N. Lee, Eamon Niknafs, Andrei Marginean, Annie Wang, Keith Tyser, Zad Chin, Yann Hicke, Nikhil Singh, Madeleine Udell, Yoon Kim, Tonio Buonassisi, Armando Solar-Lezama, Iddo Drori

Via arXiv

Demystifying GPT Self-Repair for Code Generation

Jun 22, 2023
Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama

Via arXiv

SPARLING: Learning Latent Representations with Extremely Sparse Activations

Feb 03, 2023
Kavi Gupta, Osbert Bastani, Armando Solar-Lezama

Via arXiv

Top-Down Synthesis for Library Learning

Nov 29, 2022
Matthew Bowers, Theo X. Olausson, Catherine Wong, Gabriel Grand, Joshua B. Tenenbaum, Kevin Ellis, Armando Solar-Lezama

Via arXiv

Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

Nov 22, 2022
Vitali Petsiuk, Alexander E. Siemenn, Saisamrit Surbehera, Zad Chin, Keith Tyser, Gregory Hunter, Arvind Raghavan, Yann Hicke, Bryan A. Plummer, Ori Kerret, Tonio Buonassisi, Kate Saenko, Armando Solar-Lezama, Iddo Drori

Via arXiv