
Kaijie Zhu

NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

Mar 05, 2024
Via arXiv

DyVal 2: Dynamic Evaluation of Large Language Models by Meta Probing Agents

Feb 21, 2024
Via arXiv

The Good, The Bad, and Why: Unveiling Emotions in Generative AI

Dec 19, 2023
Via arXiv

PromptBench: A Unified Library for Evaluation of Large Language Models

Dec 13, 2023
Via arXiv

CompeteAI: Understanding the Competition Behaviors in Large Language Model-based Agents

Oct 26, 2023
Via arXiv

DyVal: Graph-informed Dynamic Evaluation of Large Language Models

Oct 05, 2023
Via arXiv

EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus

Aug 01, 2023
Via arXiv

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Aug 01, 2023
Via arXiv

A Survey on Evaluation of Large Language Models

Jul 18, 2023
Via arXiv

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Jun 13, 2023
Via arXiv