Jason Phang

Lessons from the Trenches on Reproducible Evaluation of Language Models

May 23, 2024

Investigating the Effectiveness of HyperTuning via Gisting

Feb 26, 2024

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

Sep 19, 2023

Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

May 23, 2023

Tool Learning with Foundation Models

Apr 17, 2023

Pretraining Language Models with Human Preferences

Feb 16, 2023

HyperTuning: Toward Adapting Large Language Models without Back-propagation

Nov 22, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022

What Language Model to Train if You Have One Million GPU Hours?

Nov 08, 2022

Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions

Oct 19, 2022