Yaobo Liang

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

Mar 06, 2024

Competition-Level Problems are Effective LLM Evaluators

Dec 05, 2023

PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion

Nov 07, 2023

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

Oct 12, 2023

GameEval: Evaluating LLMs on Conversational Games

Aug 19, 2023

Machine-Created Universal Language for Cross-lingual Transfer

May 22, 2023

Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast

May 19, 2023

Learning to Program with Natural Language

Apr 23, 2023

Low-code LLM: Visual Programming over LLMs

Apr 20, 2023

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

Apr 13, 2023