Efficient Attribute Injection for Pretrained Language Models
- URL: http://arxiv.org/abs/2109.07953v1
- Date: Thu, 16 Sep 2021 13:08:24 GMT
- Title: Efficient Attribute Injection for Pretrained Language Models
- Authors: Reinald Kim Amplayo and Kang Min Yoo and Sang-Woo Lee
- Abstract summary: We propose a lightweight and memory-efficient method to inject attributes into pretrained language models (PLMs).
To limit the increase in parameters, especially when the attribute vocabulary is large, we use low-rank approximations and hypercomplex multiplications.
Our method outperforms previous attribute injection methods and achieves state-of-the-art performance on various datasets.
- Score: 20.39972635495006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metadata attributes (e.g., user and product IDs from reviews) can be
incorporated as additional inputs to neural NLP models, by modifying the
architecture of the models, in order to improve their performance. Recent
models, however, rely on pretrained language models (PLMs), where previously used
techniques for attribute injection are either nontrivial or ineffective. In
this paper, we propose a lightweight and memory-efficient method to inject
attributes into PLMs. We extend adapters, i.e. tiny plug-in feed-forward modules,
to include attributes both independently of and jointly with the text. To limit
the increase in parameters, especially when the attribute vocabulary is large,
we use low-rank approximations and hypercomplex multiplications, significantly
decreasing the total parameter count. We also introduce training mechanisms to
handle domains in which attributes can be multi-labeled or sparse. Extensive
experiments and analyses on eight datasets from different domains show that our
method outperforms previous attribute injection methods and achieves
state-of-the-art performance.
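The abstract names two parameter-saving devices: a low-rank factorization of the attribute embedding table and parameterized hypercomplex multiplication (PHM) layers inside the adapters. Below is a minimal PyTorch sketch of how such an attribute-injected adapter could be wired together; it is an illustration under stated assumptions (module names, sizes, and the injection point are ours), not the authors' released code.

```python
# Sketch of an attribute-injected adapter with low-rank attribute embeddings
# and PHM (Kronecker-factored) projections. Illustrative only.
import torch
import torch.nn as nn


class PHMLinear(nn.Module):
    """Linear layer whose weight is a sum of n Kronecker products,
    costing roughly 1/n of the parameters of a dense weight matrix."""

    def __init__(self, in_features: int, out_features: int, n: int = 4):
        super().__init__()
        assert in_features % n == 0 and out_features % n == 0
        self.n = n
        self.A = nn.Parameter(torch.randn(n, n, n) * 0.02)  # "rule" matrices
        self.B = nn.Parameter(torch.randn(n, in_features // n, out_features // n) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W = sum_i kron(A_i, B_i) has shape (in_features, out_features).
        W = torch.stack([torch.kron(self.A[i], self.B[i]) for i in range(self.n)]).sum(0)
        return x @ W + self.bias


class AttributeAdapter(nn.Module):
    """Adapter whose bottleneck activation is shifted by an attribute embedding."""

    def __init__(self, hidden=768, bottleneck=64, num_attrs=10_000, rank=8, n=4):
        super().__init__()
        self.down = PHMLinear(hidden, bottleneck, n)
        self.up = PHMLinear(bottleneck, hidden, n)
        # Low-rank attribute table: num_attrs*rank + rank*bottleneck parameters
        # instead of a full num_attrs*bottleneck table.
        self.attr_emb = nn.Embedding(num_attrs, rank)
        self.attr_proj = nn.Linear(rank, bottleneck, bias=False)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor, attr_ids: torch.Tensor) -> torch.Tensor:
        # One attribute id per example; a multi-labeled domain could average
        # several embeddings here instead (one of the cases the paper handles).
        shift = self.attr_proj(self.attr_emb(attr_ids)).unsqueeze(1)  # (batch, 1, bottleneck)
        z = self.act(self.down(hidden_states) + shift)
        return hidden_states + self.up(z)  # residual connection


x = torch.randn(2, 16, 768)                   # (batch, seq_len, hidden)
attr_ids = torch.tensor([3, 42])              # e.g., user or product IDs
print(AttributeAdapter()(x, attr_ids).shape)  # torch.Size([2, 16, 768])
```

With these two factorizations, the attribute table grows with the rank rather than the bottleneck width, and each adapter projection costs about 1/n of a dense linear layer (plus a small n^3 term for the rule matrices), which is what keeps the injected parameters small even for large attribute vocabularies.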
Related papers
- CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection [30.46562066023117]
We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection.
Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes.
It adds only 0.7% to parameter storage through parameter-efficient fine-tuning, significantly enhancing scalability and adaptability.
arXiv Detail & Related papers (2024-10-08T08:36:12Z)
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed for static images.
We propose to understand human attributes using video frames, which makes full use of temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
- SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm [18.53048511206039]
We propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR.
It extracts the pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts.
The masked multi-head attention layer is introduced into the decoder module to prevent the model from remembering the next attribute while making attribute predictions during training.
arXiv Detail & Related papers (2023-12-04T05:42:56Z)
- Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions [3.0655581300025996]
We provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model.
We show how our proposed approach can significantly improve the model's performance simply by augmenting its training dataset based on corrected explanations.
arXiv Detail & Related papers (2023-06-28T15:23:28Z)
- Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation [11.960178399478718]
Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting.
Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs.
arXiv Detail & Related papers (2022-07-07T18:00:22Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z)
- OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision [93.26737878221073]
We study the attribute mining problem in an open-world setting to extract novel attributes and their values.
We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes.
Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
arXiv Detail & Related papers (2022-04-29T04:16:04Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed; a generic sketch of this MoE routing appears after this list.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning [0.5729426778193398]
We propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process.
Our experiments on real-world data show that our method is model-agnostic, relying only on feedback from model predictions.
arXiv Detail & Related papers (2021-06-04T16:54:36Z)
- AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z)
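As referenced in the MoEBERT entry above, here is a minimal PyTorch sketch of the general Mixture-of-Experts idea behind it: the dense feed-forward layer is split into experts and each token is routed to one of them, so inference touches only a fraction of the parameters. This illustrates generic top-1 routing, not MoEBERT's importance-guided conversion; all names and sizes are assumptions.

```python
# Generic top-1 Mixture-of-Experts feed-forward layer (illustrative sketch).
import torch
import torch.nn as nn


class MoEFFN(nn.Module):
    def __init__(self, hidden=768, ffn=3072, num_experts=4):
        super().__init__()
        # Each expert is a narrow FFN; together they match the dense FFN's width.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden, ffn // num_experts),
                nn.GELU(),
                nn.Linear(ffn // num_experts, hidden),
            )
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(hidden, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Route every token to its highest-scoring expert (top-1 routing).
        expert_ids = self.router(x).argmax(dim=-1)  # (batch, seq_len)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e                  # tokens assigned to expert e
            if mask.any():
                out[mask] = expert(x[mask])         # (num_selected, hidden)
        return out


x = torch.randn(2, 16, 768)
print(MoEFFN()(x).shape)  # torch.Size([2, 16, 768])
```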