Yonatan Belinkov

DEPTH: Discourse Education through Pre-Training Hierarchically

May 13, 2024

Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov

Language Models (LMs) often struggle with linguistic understanding at the discourse level, even though discourse patterns such as coherence, cohesion, and narrative flow are prevalent in their pre-training data. Current methods address these challenges only after the pre-training phase, relying on expensive human-annotated data to align the model. To improve the discourse capabilities of LMs already at the pre-training stage, we introduce DEPTH, an encoder-decoder model that learns to represent sentences using a discourse-oriented pre-training objective. DEPTH combines hierarchical sentence representations with two objectives: (1) Sentence Un-Shuffling, and (2) Span-Corruption. This approach trains the model to represent both sub-word-level and sentence-level dependencies over a massive amount of unstructured text. When trained either from scratch or continuing from a pre-trained T5 checkpoint, DEPTH learns semantic and discourse-level representations faster than T5, outperforming it in span-corruption loss despite the additional sentence-un-shuffling objective. Evaluations on the GLUE, DiscoEval, and NI benchmarks demonstrate DEPTH's ability to quickly learn diverse downstream tasks, which require syntactic, semantic, and discourse capabilities. Overall, our approach extends the discourse capabilities of T5, while minimally impacting other natural language understanding (NLU) capabilities in the resulting LM.

* 28 pages, 10 figures, 8 tables
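
As a rough illustration of what a sentence un-shuffling objective asks of a model, the sketch below builds one toy training example: the sentences of a document are permuted, and the target is the original position of each shuffled sentence. The function names, the toy document, and the seed are assumptions made for illustration; DEPTH's actual tokenization, sentence representations, and combination with span corruption are not described in this listing and are not reproduced here.

```python
import random


def make_unshuffling_example(sentences, seed=0):
    """Return (shuffled_sentences, target_order) for a toy un-shuffling task.

    target_order[i] is the original position of shuffled_sentences[i].
    Hypothetical helper for illustration only, not DEPTH's data pipeline.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    return shuffled, order


def restore(shuffled, target_order):
    """Invert the shuffle, as a model that predicts target_order perfectly could."""
    restored = [None] * len(shuffled)
    for shuffled_pos, original_pos in enumerate(target_order):
        restored[original_pos] = shuffled[shuffled_pos]
    return restored


doc = [
    "She opened the door.",
    "The room was dark.",
    "She reached for the light switch.",
]
shuffled, target = make_unshuffling_example(doc, seed=1)
assert restore(shuffled, target) == doc
```

Recovering the original order requires attending to relations between sentences rather than only between neighboring sub-word tokens, which is the kind of dependency the abstract says the objective targets.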

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs

Apr 15, 2024

Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

Large language models (LLMs) are susceptible to hallucinations, which has sparked a widespread effort to detect and prevent them. Recent work attempts to mitigate hallucinations by intervening in the model's computation during generation, using different setups and heuristics. These works do not separate the different causes of hallucination. In this work, we first introduce an approach for constructing datasets based on the model's knowledge for detection and intervention methods in closed-book and open-book question-answering settings. We then characterize the effect of different choices for intervention, such as the intervened components (MLPs, attention block, residual stream, and specific heads), and how often and how strongly to intervene. We find that intervention success varies depending on the component, with some components being detrimental to language modeling capabilities. Finally, we find that interventions can benefit from a pre-hallucination steering direction rather than a post-hallucination one. The code is available at https://github.com/technion-cs-nlp/hallucination-mitigation

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024

Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors.

* Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz

Jamba: A Hybrid Transformer-Mamba Language Model

Mar 28, 2024

We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU. Built at large scale, Jamba provides high throughput and a small memory footprint compared to vanilla Transformers, while achieving state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large-scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.

* Webpage: https://www.ai21.com/jamba

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

Mar 26, 2024

Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

Many recent language model (LM) interpretability studies have adopted the circuits framework, which aims to find the minimal computational subgraph, or circuit, that explains LM behavior on a given task. Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size. Edge attribution patching (EAP), a gradient-based approximation to interventions, has emerged as a scalable but imperfect solution to this problem. In this paper, we introduce a new method, EAP with integrated gradients (EAP-IG), that aims to better maintain a core property of circuits: faithfulness. A circuit is faithful if all model edges outside the circuit can be ablated without changing the model's performance on the task; faithfulness is what justifies studying circuits, rather than the full model. Our experiments demonstrate that circuits found using EAP are less faithful than those found using EAP-IG, even though both have high node overlap with circuits found previously using causal interventions. We conclude more generally that when using circuits to compare the mechanisms models use to solve tasks, faithfulness, not overlap, is what should be measured.
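
For readers unfamiliar with the integrated-gradients component, the sketch below computes plain integrated gradients over the inputs of a toy function and checks the completeness property (attributions sum to the change in output between baseline and input). This is only the generic technique under stated assumptions: the function `f`, its hand-written gradient, the zero baseline, and the step count are all illustrative, and the code does not implement edge attribution patching or the paper's EAP-IG procedure.

```python
def f(x):
    """Toy scalar function standing in for a task metric (assumption)."""
    return sum(v * v for v in x)


def grad_f(x):
    """Analytic gradient of f, written by hand for this toy case."""
    return [2.0 * v for v in x]


def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from baseline to x,
    then scaled by (x - baseline) per coordinate.
    """
    n = len(x)
    total = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            total[i] += g[i]
    return [(x[i] - baseline[i]) * total[i] / steps for i in range(n)]


x = [1.0, -2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, grad_f)

# Completeness: attributions should sum to f(x) - f(baseline).
assert abs(sum(attributions) - (f(x) - f(baseline))) < 1e-9
```

A single-gradient approximation evaluates the gradient at one point only; averaging along a path, as above, is what distinguishes integrated gradients from it.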

Concept-Best-Matching: Evaluating Compositionality in Emergent Communication

Mar 17, 2024

Boaz Carmeli, Yonatan Belinkov, Ron Meir

Artificial agents that learn to communicate in order to accomplish a given task acquire communication protocols that are typically opaque to a human. A large body of work has attempted to evaluate the emergent communication via various evaluation measures, with compositionality featuring as a prominent desired trait. However, current evaluation procedures do not directly expose the compositionality of the emergent communication. We propose a procedure to assess the compositionality of emergent communication by finding the best match between emerged words and natural language concepts. The best-match algorithm provides both a global score and a translation map from emergent words to natural language concepts. To the best of our knowledge, this is the first time that such a direct and interpretable mapping between emergent words and human concepts has been provided.
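
One way to picture a best match that yields both a global score and a translation map is as a one-to-one assignment between emergent words and concepts that maximizes a total association score. The sketch below does this by exhaustive search on a tiny, made-up score matrix. The scores, the word and concept names, and the use of brute-force search are all assumptions for illustration; the abstract does not state how associations are scored or which matching algorithm the paper uses.

```python
from itertools import permutations


def best_match(score, words, concepts):
    """Find the one-to-one word -> concept map with the highest total score.

    score[w][c] is a hypothetical association strength between word w and
    concept c. Exhaustive search: only practical for very small vocabularies,
    and it assumes len(words) <= len(concepts).
    """
    best_total, best_map = float("-inf"), {}
    for perm in permutations(range(len(concepts)), len(words)):
        total = sum(score[w][c] for w, c in enumerate(perm))
        if total > best_total:
            best_total = total
            best_map = {words[w]: concepts[c] for w, c in enumerate(perm)}
    return best_total, best_map


words = ["w0", "w1", "w2"]            # emergent symbols (made up)
concepts = ["red", "circle", "large"]  # natural language concepts (made up)
score = [
    [9, 1, 0],
    [2, 8, 1],
    [0, 3, 7],
]
total, mapping = best_match(score, words, concepts)
print(total, mapping)
```

For larger vocabularies, a polynomial-time assignment solver would replace the permutation loop; the returned total plays the role of a global score and the dictionary the role of a translation map.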

Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information

Mar 14, 2024

Shadi Iskander, Kira Radinsky, Yonatan Belinkov

Mitigating social biases typically requires identifying the social groups associated with each data sample. In this paper, we present DAFair, a novel approach to address social bias in language models. Unlike traditional methods that rely on explicit demographic labels, our approach does not require any such information. Instead, we leverage predefined prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias in the model's representations. Our empirical results across two tasks and two models demonstrate the effectiveness of our method compared to previous approaches that do not rely on labeled data. Moreover, with limited demographic-annotated data, our approach outperforms common debiasing approaches.

Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines

Mar 09, 2024

Michael Toker, Hadas Orgad, Mor Ventura, Dana Arad, Yonatan Belinkov

Text-to-image diffusion models (T2I) use a latent representation of a text prompt to guide the image generation process. However, the process by which the encoder produces the text representation is unknown. We propose the Diffusion Lens, a method for analyzing the text encoder of T2I models by generating images from its intermediate representations. Using the Diffusion Lens, we perform an extensive analysis of two recent T2I models. Exploring compound prompts, we find that complex scenes describing multiple objects are composed progressively and more slowly compared to simple scenes. Exploring knowledge retrieval, we find that representation of uncommon concepts requires further computation compared to common concepts, and that knowledge retrieval is gradual across layers. Overall, our findings provide valuable insights into the text encoder component in T2I pipelines.

* Project webpage: tokeron.github.io/DiffusionLensWeb

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry

Feb 27, 2024

Michael Toker, Oren Mishali, Ophir Münz-Manor, Benny Kimelfeld, Yonatan Belinkov

There is a large volume of late antique and medieval Hebrew texts. They represent a crucial linguistic and cultural bridge between Biblical and modern Hebrew. Poetry is prominent in these texts and one of its main characteristics is the frequent use of metaphor. Distinguishing figurative and literal language use is a major task for scholars of the Humanities, especially in the fields of literature, linguistics, and hermeneutics. This paper presents a new, challenging dataset of late antique and medieval Hebrew poetry with expert annotations of metaphor, as well as some baseline results, which we hope will facilitate further research in this area.

* EACL 2024. Project webpage: https://tokeron.github.io/metaphor/

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Feb 22, 2024

Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) primarily the same circuit implements entity tracking in both the original model and its fine-tuned versions. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) The performance boost in the fine-tuned models is primarily attributed to their improved ability to handle the augmented positional information. To uncover these findings, we employ Path Patching; DCM, which automatically detects model components responsible for specific semantics; and CMAP, a new approach for patching activations across models to reveal improved mechanisms. Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.

* ICLR 2024. 26 pages, 13 figures. Code and data at https://finetuning.baulab.info/
