Huijia Wu

Dynamic Generation of Personalities with Large Language Models

Apr 10, 2024
Jianzhi Liu, Hexiang Gu, Tianyu Zheng, Liuyu Xiang, Huijia Wu, Jie Fu, Zhaofeng He

Large language models (LLMs) show promising performance in mimicking human deliberation, which amplifies the importance of this research area. Deliberation is influenced by both logic and personality. However, previous studies have predominantly focused on the logic of LLMs and neglected personality. In this work, we introduce Dynamic Personality Generation (DPG), a dynamic personality generation method based on Hypernetworks. First, we embed the Big Five personality theory into GPT-4 to form a personality assessment machine that automatically evaluates characters' personality traits from dialogues. Based on this evaluation method, we propose a new metric for assessing personality generation capability. Then, we use the personality assessment machine to evaluate dialogues in script data, producing a personality-dialogue dataset. Finally, we fine-tune DPG on the personality-dialogue dataset. Experiments show that, after fine-tuning on this dataset, DPG has stronger personality generation capability than traditional fine-tuning methods and surpasses prompt-based GPT-4.

HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic Transfer

Feb 25, 2024
Hao Zhao, Zihan Qiu, Huijia Wu, Zili Wang, Zhaofeng He, Jie Fu

Mixture of Experts (MoE) for language models has proven effective at increasing model capacity by dynamically routing each input token to a specific subset of experts for processing. Despite this success, most existing methods struggle to balance sparsity against the availability of expert knowledge: improving performance by using more expert knowledge often reduces sparsity during expert selection. To mitigate this tension, we propose HyperMoE, a novel MoE framework built upon Hypernetworks. This framework integrates the computational processes of MoE with the concept of knowledge transfer in multi-task learning. Specific modules generated from the information of unselected experts serve as supplementary information, which allows the knowledge of unselected experts to be used while maintaining selection sparsity. Our comprehensive empirical evaluations across multiple datasets and backbones establish that HyperMoE significantly outperforms existing MoE methods under identical conditions with respect to the number of experts.
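
The routing step mentioned in this abstract can be illustrated with a minimal sketch of generic top-k expert selection. It is not the HyperMoE implementation: the function name `route_top_k`, the softmax router, and the renormalization of the selected weights are all assumptions made for illustration. The sketch only shows how one token's router scores split the experts into a selected set and an unselected set, the latter being the group whose information the abstract says is reused.

```python
import math


def route_top_k(router_logits, k):
    """Split experts into selected / unselected sets for one token.

    Hypothetical helper: a generic top-k softmax router, not the
    routing code from the HyperMoE paper.
    """
    if not 0 < k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")

    # Numerically stable softmax over the router scores.
    peak = max(router_logits)
    exps = [math.exp(z - peak) for z in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank experts by probability and keep the k best.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected = sorted(ranked[:k])
    unselected = sorted(ranked[k:])

    # Renormalize the selected probabilities so they sum to 1 (an assumption).
    norm = sum(probs[i] for i in selected)
    weights = {i: probs[i] / norm for i in selected}
    return selected, unselected, weights


selected, unselected, weights = route_top_k([2.0, 1.0, 0.1, -1.0], k=2)
```

With `k=2` and four experts, two experts process the token and the other two fall into `unselected`; a method in the spirit of the abstract would derive its supplementary module from that second group.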

Shortcut Sequence Tagging

Jan 03, 2017
Huijia Wu, Jiajun Zhang, Chengqing Zong

Deep stacked RNNs are usually hard to train. Adding shortcut connections across different layers is a common way to ease the training of stacked networks. However, extra shortcuts make the recurrent step more complicated. To simplify the stacked architecture, we propose a framework called the shortcut block, which combines the gating mechanism with shortcuts while discarding the self-connected part of the LSTM cell. We present extensive empirical experiments showing that this design makes training easy and improves generalization. We propose various shortcut block topologies and compositions to explore its effectiveness. Based on this architecture, we obtain a 6% relative improvement over the state of the art on the CCGbank supertagging dataset. We also obtain comparable results on the POS tagging task.

* 10 pages. arXiv admin note: text overlap with arXiv:1610.03167

An Empirical Exploration of Skip Connections for Sequential Tagging

Oct 11, 2016
Huijia Wu, Jiajun Zhang, Chengqing Zong

In this paper, we empirically explore the effects of various kinds of skip connections in stacked bidirectional LSTMs for sequential tagging. We investigate three kinds of skip connections to LSTM cells: (a) skip connections to the gates, (b) skip connections to the internal states, and (c) skip connections to the cell outputs. We present comprehensive experiments showing that skip connections to cell outputs outperform the other two. Furthermore, we observe that using gated identity functions as skip mappings works well. Based on these novel skip connections, we successfully train deep stacked bidirectional LSTM models and obtain state-of-the-art results on CCG supertagging and comparable results on POS tagging.

* Accepted at COLING 2016
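
As a rough illustration of option (c) in the abstract above, the sketch below adds a gated identity skip to a cell output. The form `output = cell_output + sigmoid(gate) * lower_output` is an assumed reading of "gated identity function as skip mapping"; the function names and the element-wise gate are hypothetical and are not taken from the paper's code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_skip_output(cell_output, lower_output, gate_preact):
    """Add a gated copy of the lower layer's output to the cell output.

    Assumed form (hypothetical): h = cell_output + sigmoid(gate) * lower_output,
    applied element-wise. A gate near 0 blocks the skip path; a gate
    near 1 passes the lower layer's output through unchanged.
    """
    if not (len(cell_output) == len(lower_output) == len(gate_preact)):
        raise ValueError("all vectors must have the same length")
    return [
        h + sigmoid(a) * x
        for h, x, a in zip(cell_output, lower_output, gate_preact)
    ]


# A zero pre-activation gives a gate of exactly 0.5.
combined = gated_skip_output([0.5], [1.0], [0.0])
```

Because the skip enters after the cell output is computed, the recurrent state update itself is left unchanged in this sketch; only the value passed to the next layer is modified.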

A Dynamic Window Neural Network for CCG Supertagging

Oct 10, 2016
Huijia Wu, Jiajun Zhang, Chengqing Zong

Combinatory Categorial Grammar (CCG) supertagging is the task of assigning a lexical category to each word in a sentence. Almost all previous methods use fixed context window sizes as input features. However, different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism over the local contexts. Applying dropout to the dynamic filters can be seen as dropping words directly, which is superior to regular dropout on word embeddings. Using this approach, we achieve state-of-the-art CCG supertagging performance on the standard test set.

* 8 pages, 3 figures
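
The abstract above describes the dynamic window as attention over the local context. A minimal sketch of that idea, under loudly stated assumptions, is given below: each position in a fixed-size window receives a score, the scores are normalized with a softmax, and the window is summarized as a weighted sum of word embeddings. The names `dynamic_window_feature` and `position_scores` are hypothetical, and how the paper actually computes its filters is not specified here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_window_feature(window_embeddings, position_scores):
    """Summarize a context window as an attention-weighted sum.

    Hypothetical sketch: `position_scores` stands in for whatever
    learned scoring the model uses. A weight close to 0 effectively
    shrinks the window by ignoring that word.
    """
    if len(window_embeddings) != len(position_scores):
        raise ValueError("need one score per window position")
    weights = softmax(position_scores)
    dim = len(window_embeddings[0])
    feature = [0.0] * dim
    for weight, embedding in zip(weights, window_embeddings):
        for i in range(dim):
            feature[i] += weight * embedding[i]
    return weights, feature


# Two window positions with equal scores share the weight evenly.
weights, feature = dynamic_window_feature([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Under this reading, zeroing one of the weights removes the corresponding word from the feature entirely, which is one way to interpret the abstract's remark that dropout on the filters amounts to dropping words.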
