The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection
- URL: http://arxiv.org/abs/2210.00907v1
- Date: Mon, 3 Oct 2022 13:08:09 GMT
- Title: The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection
- Authors: Sondre Wold
- Abstract summary: We train adapter modules on parts of the ConceptNet knowledge graph using the masked language modeling objective.
We evaluate the success of the method by a series of probing experiments on the LAMA probe.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of injecting factual knowledge into large
pre-trained language models. We train adapter modules on parts of the
ConceptNet knowledge graph using the masked language modeling objective and
evaluate the success of the method by a series of probing experiments on the
LAMA probe. Mean P@k curves for different configurations indicate that the
technique is effective, increasing the performance on subsets of the LAMA probe
for large values of k by adding as little as 2.1% additional parameters to the
original models.
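The abstract describes the recipe at a high level: small adapter modules are trained with the masked language modeling objective while the pre-trained backbone stays frozen, and success is measured with mean P@k on the LAMA probe. The following sketch is not the paper's released code; it only illustrates in plain PyTorch what a bottleneck adapter and a P@k helper typically look like, with all names and sizes chosen for illustration.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, residual connection.

    Only the adapter weights are trained while the backbone stays frozen,
    which is what keeps the parameter overhead small (the paper reports
    as little as 2.1% extra parameters).
    """

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def mean_precision_at_k(logits: torch.Tensor, gold_ids: torch.Tensor, k: int) -> float:
    """P@k for masked-token probing: fraction of masked positions whose gold
    token appears among the top-k predictions."""
    topk = logits.topk(k, dim=-1).indices                # (num_masks, k)
    hits = (topk == gold_ids.unsqueeze(-1)).any(dim=-1)  # (num_masks,)
    return hits.float().mean().item()


if __name__ == "__main__":
    adapter = BottleneckAdapter()
    print(adapter(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])

    logits = torch.randn(5, 30522)                 # 5 masked positions, BERT-sized vocab
    gold = torch.randint(0, 30522, (5,))
    print(mean_precision_at_k(logits, gold, k=100))
```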
Related papers
- Scalable Influence and Fact Tracing for Large Language Model Pretraining [14.598556308631018]
Training data attribution (TDA) methods aim to attribute model outputs back to specific training examples.
This paper refines existing gradient-based methods to work effectively at scale.
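For orientation only: gradient-based attribution methods in this family typically score a training example by the dot product between its loss gradient and the gradient of the query example (as in TracIn). The snippet below is a generic sketch of that idea, not the paper's refined, scalable variant; the model, loss, and batch layout are assumptions.

```python
import torch
import torch.nn as nn


def influence(model: nn.Module, loss_fn, train_batch: dict, query_batch: dict) -> float:
    """TracIn-style score: dot product of loss gradients for two examples."""
    def grad_vector(batch: dict) -> torch.Tensor:
        loss = loss_fn(model(batch["x"]), batch["y"])
        grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
        return torch.cat([g.flatten() for g in grads])

    return torch.dot(grad_vector(train_batch), grad_vector(query_batch)).item()


if __name__ == "__main__":
    model = nn.Linear(4, 2)
    loss_fn = nn.CrossEntropyLoss()
    batch = lambda: {"x": torch.randn(1, 4), "y": torch.randint(0, 2, (1,))}
    print(influence(model, loss_fn, batch(), batch()))
```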
arXiv Detail & Related papers (2024-10-22T20:39:21Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
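The entry only names the idea of grouping and pruning similar experts; the paper's actual criterion is not reproduced here. As a rough sketch under that assumption, experts can be compared by the cosine similarity of their flattened weights, keeping one representative per near-duplicate group.

```python
import torch


def prune_similar_experts(expert_weights: list[torch.Tensor], threshold: float = 0.95) -> list[int]:
    """Return indices of experts to keep; an expert is dropped when its
    weights are nearly parallel to an already-kept expert."""
    kept: list[int] = []
    flat = [w.flatten() for w in expert_weights]
    for i, w in enumerate(flat):
        redundant = any(torch.cosine_similarity(w, flat[j], dim=0) > threshold for j in kept)
        if not redundant:
            kept.append(i)
    return kept


if __name__ == "__main__":
    base = torch.randn(16, 16)
    experts = [base, base + 0.01 * torch.randn(16, 16), torch.randn(16, 16)]
    print(prune_similar_experts(experts))  # e.g. [0, 2]
```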
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- Exploring the Effectiveness of Methods for Persona Extraction [0.0]
The paper presents a study of methods for extracting information about dialogue participants and evaluates their performance for Russian.
To train models for this task, the Multi-Session Chat dataset was translated into Russian using multiple translation models.
A metric based on the F-score concept is presented to evaluate the effectiveness of the extraction models.
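The exact F-score-based metric is not spelled out in this summary; the snippet below only shows the standard set-level F1 between extracted and reference persona statements, as a baseline for what such a metric computes (the example statements are invented).

```python
def f1(extracted: set[str], reference: set[str]) -> float:
    """Set-level F1 between extracted and reference persona statements."""
    if not extracted or not reference:
        return 0.0
    tp = len(extracted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(extracted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


print(f1({"likes cats", "lives in Oslo"}, {"likes cats", "plays piano"}))  # 0.5
```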
arXiv Detail & Related papers (2024-07-12T11:30:10Z)
- Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
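As a rough, assumption-laden sketch of the general pattern (not the paper's architecture): several parallel bottleneck adapters can be combined by a small learned gate, so each token's update is a weighted mixture of the adapters' outputs.

```python
import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Parallel bottleneck adapters combined by a softmax gate."""

    def __init__(self, hidden_size: int, num_adapters: int, bottleneck: int = 64):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, bottleneck),
                nn.GELU(),
                nn.Linear(bottleneck, hidden_size),
            )
            for _ in range(num_adapters)
        )
        self.gate = nn.Linear(hidden_size, num_adapters)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(h), dim=-1)                  # (..., num_adapters)
        outputs = torch.stack([a(h) for a in self.adapters], dim=-1)   # (..., hidden, num_adapters)
        mixed = (outputs * weights.unsqueeze(-2)).sum(dim=-1)
        return h + mixed


if __name__ == "__main__":
    layer = MixtureOfAdapters(hidden_size=768, num_adapters=3)
    print(layer(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```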
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
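The summary says knowledge-specific parameters are located with sensitivity-based techniques but does not define the score. A common first-order proxy, shown here purely for illustration, is the magnitude of weight times gradient under a task loss.

```python
import torch
import torch.nn as nn


def parameter_sensitivity(model: nn.Module, loss: torch.Tensor) -> dict[str, torch.Tensor]:
    """Per-parameter sensitivity score |w * dL/dw| after one backward pass."""
    loss.backward()
    return {
        name: (p.detach() * p.grad).abs()
        for name, p in model.named_parameters()
        if p.grad is not None
    }


if __name__ == "__main__":
    model = nn.Linear(4, 2)
    loss = nn.functional.cross_entropy(model(torch.randn(8, 4)), torch.randint(0, 2, (8,)))
    scores = parameter_sensitivity(model, loss)
    print({name: s.shape for name, s in scores.items()})
```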
arXiv Detail & Related papers (2023-10-17T17:58:34Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
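Since the entry only names knowledge distillation, here is the standard response-based distillation loss (temperature-softened KL divergence between teacher and student logits) for reference; the paper's cohesive architecture is not reproduced.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    student, teacher = torch.randn(4, 10), torch.randn(4, 10)
    print(distillation_loss(student, teacher).item())
```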
arXiv Detail & Related papers (2023-01-12T08:01:53Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
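One common way to turn knowledge-graph triples into synthetic adaptation data is to sample triples and verbalize them with relation templates; the toy triples and templates below are invented for illustration and are not the paper's sampling strategies.

```python
import random

TRIPLES = [
    ("bird", "CapableOf", "fly"),
    ("Oslo", "PartOf", "Norway"),
    ("knife", "UsedFor", "cutting"),
]

TEMPLATES = {
    "CapableOf": "A {h} can {t}.",
    "PartOf": "{h} is part of {t}.",
    "UsedFor": "A {h} is used for {t}.",
}


def sample_sentences(n: int) -> list[str]:
    """Sample triples and render them as natural-language sentences."""
    sampled = random.sample(TRIPLES, k=min(n, len(TRIPLES)))
    return [TEMPLATES[r].format(h=h, t=t) for h, r, t in sampled]


print(sample_sentences(2))
```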
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
- Maximizing Efficiency of Language Model Pre-training for Learning Representation [6.518508607788086]
ELECTRA is a novel approach for improving the compute efficiency of pre-trained language models.
Our work proposes an adaptive early exit strategy to maximize the efficiency of the pre-training process.
arXiv Detail & Related papers (2021-10-13T10:25:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.