B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
- URL: http://arxiv.org/abs/2502.12992v1
- Date: Tue, 18 Feb 2025 16:13:08 GMT
- Title: B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability
- Authors: Yifan Wang, Sukrut Rao, Ji-Ung Lee, Mayank Jobanputra, Vera Demberg,
- Abstract summary: Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability.
We introduce B-cos LMs, i.e., B-cos networks empowered for NLP tasks.
Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning.
- Score: 21.480463138209483
- License:
- Abstract: Post-hoc explanation methods for black-box models often struggle with faithfulness and human interpretability due to the lack of explainability in current neural models. Meanwhile, B-cos networks have been introduced to improve model explainability through architectural and computational adaptations, but their application has so far been limited to computer vision models and their associated training pipelines. In this work, we introduce B-cos LMs, i.e., B-cos networks empowered for NLP tasks. Our approach directly transforms pre-trained language models into B-cos LMs by combining B-cos conversion and task fine-tuning, improving efficiency compared to previous B-cos methods. Our automatic and human evaluation results demonstrate that B-cos LMs produce more faithful and human interpretable explanations than post hoc methods, while maintaining task performance comparable to conventional fine-tuning. Our in-depth analysis explores how B-cos LMs differ from conventionally fine-tuned models in their learning processes and explanation patterns. Finally, we provide practical guidelines for effectively building B-cos LMs based on our findings. Our code is available at https://anonymous.4open.science/r/bcos_lm.
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z) - B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable [53.848005910548565]
'B-cosification' is a novel approach to transform existing pre-trained models to become inherently interpretable.
We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability.
arXiv Detail & Related papers (2024-11-01T16:28:11Z) - Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning [5.487210426671288]
In this work, we demonstrate that the reasoning abilities of small-scale LMs can be enhanced through self-training.
We also show that the conventional self-training can be further augmented by a preference learning algorithm called Direct Preference Optimization.
arXiv Detail & Related papers (2024-07-25T17:59:16Z) - Black-Box Tuning of Vision-Language Models with Effective Gradient
Approximation [71.21346469382821]
We introduce collaborative black-box tuning (CBBT) for both textual prompt optimization and output feature adaptation for black-box models.
CBBT is extensively evaluated on eleven downstream benchmarks and achieves remarkable improvements compared to existing black-box VL adaptation methods.
arXiv Detail & Related papers (2023-12-26T06:31:28Z) - An Emulator for Fine-Tuning Large Language Models using Small Language
Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales.
We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training.
Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z) - Concept-aware Training Improves In-context Learning Ability of Language
Models [0.0]
Many recent language models (LMs) of Transformers family exhibit so-called in-context learning (ICL) ability.
We propose a method to create LMs able to better utilize the in-context information.
We measure that data sampling of Concept-aware Training consistently improves models' reasoning ability.
arXiv Detail & Related papers (2023-05-23T07:44:52Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Automated Essay Scoring Using Transformer Models [0.415623340386296]
We consider a transformer-based approach for automated essay scoring (AES)
We compare its performance to a logistic regression model based on the BOW approach and discuss their differences.
We show how such models can help increase the accuracy of human raters.
arXiv Detail & Related papers (2021-10-13T17:09:47Z) - Joint Energy-based Model Training for Better Calibrated Natural Language
Understanding Models [61.768082640087]
We explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders for natural language understanding tasks.
Experiments show that EBM training can help the model reach a better calibration that is competitive to strong baselines.
arXiv Detail & Related papers (2021-01-18T01:41:31Z) - oLMpics -- On what Language Model Pre-training Captures [84.60594612120173]
We propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition.
A fundamental challenge is to understand whether the performance of a LM on a task should be attributed to the pre-trained representations or to the process of fine-tuning on the task data.
arXiv Detail & Related papers (2019-12-31T12:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.