Efficient Attribute Injection for Pretrained Language Models
- URL: http://arxiv.org/abs/2109.07953v1
- Date: Thu, 16 Sep 2021 13:08:24 GMT
- Authors: Reinald Kim Amplayo and Kang Min Yoo and Sang-Woo Lee
- Abstract summary: We propose a lightweight and memory-efficient method to inject attributes into pretrained language models (PLMs).
To limit the increase in parameters, especially when the attribute vocabulary is large, we use low-rank approximations and hypercomplex multiplications.
Our method outperforms previous attribute injection methods and achieves state-of-the-art performance on various datasets.
- Score: 20.39972635495006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metadata attributes (e.g., user and product IDs from reviews) can be
incorporated as additional inputs to neural NLP models by modifying their
architecture, in order to improve performance. Recent models, however, rely on
pretrained language models (PLMs), for which previously used attribute
injection techniques are either nontrivial or ineffective. In this paper, we
propose a lightweight and memory-efficient method to inject attributes into
PLMs. We extend adapters, i.e., tiny plug-in feed-forward modules, to include
attributes both independently of and jointly with the text. To limit the
increase in parameters, especially when the attribute vocabulary is large, we
use low-rank approximations and hypercomplex multiplications, significantly
decreasing the total parameter count. We also introduce training mechanisms to
handle domains in which attributes can be multi-labeled or sparse. Extensive
experiments and analyses on eight datasets from different domains show that our
method outperforms previous attribute injection methods and achieves
state-of-the-art performance on various datasets.
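To make the adapter-based injection concrete, here is a minimal PyTorch sketch, under assumed names and shapes rather than the authors' implementation: a standard bottleneck adapter whose hidden activation is shifted by an attribute-specific bias, with the attribute table factorized into two low-rank pieces so that its size grows with a small rank instead of the bottleneck width.

```python
import torch
import torch.nn as nn

class AttributeAdapter(nn.Module):
    """Illustrative attribute-injected adapter (not the paper's code).

    The attribute bias is factorized as (num_attrs x rank) @ (rank x bottleneck)
    instead of a full num_attrs x bottleneck table, mirroring the low-rank
    trick the abstract describes for large attribute vocabularies.
    """

    def __init__(self, hidden_dim=768, bottleneck=64, num_attrs=10_000, rank=8):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_dim)
        self.attr_embed = nn.Embedding(num_attrs, rank)            # low-rank factor 1
        self.attr_proj = nn.Linear(rank, bottleneck, bias=False)   # low-rank factor 2
        self.act = nn.GELU()

    def forward(self, hidden, attr_ids):
        # hidden: (batch, seq_len, hidden_dim); attr_ids: (batch,)
        attr_bias = self.attr_proj(self.attr_embed(attr_ids))      # (batch, bottleneck)
        h = self.act(self.down(hidden) + attr_bias.unsqueeze(1))   # inject attribute
        return hidden + self.up(h)                                 # residual, as in adapters
```

Plugged in after each layer of a frozen PLM, such a module stores roughly num_attrs * rank attribute parameters instead of num_attrs * bottleneck, which is where the memory savings come from.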
 
      
        Related papers
- Rethinking Data: Towards Better Performing Domain-Specific Small Language Models [0.0]
 This paper presents our approach to finetuning a small language model (LM).
We achieve this by improving data quality at each stage of the LM training pipeline.
We improve the model generalization ability by merging the models fine-tuned with different parameters on different data subsets.
 arXiv  Detail & Related papers  (2025-03-03T12:19:12Z)
- Adaptive Prototype Model for Attribute-based Multi-label Few-shot Action Recognition [11.316708754749103]
 In real-world action recognition systems, incorporating more attributes helps achieve a more comprehensive understanding of human behavior.
We propose a novel method, the Adaptive Attribute Prototype Model (AAPM), for human action recognition, which captures rich action-relevant attribute information.
Our AAPM achieves state-of-the-art performance in both attribute-based multi-label and single-label few-shot action recognition.
 arXiv  Detail & Related papers  (2025-02-18T06:39:28Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
 ALoRE is a novel parameter-efficient transfer learning (PETL) method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to this design, ALoRE introduces negligible extra parameters and can be effortlessly merged into the frozen backbone (a toy Kronecker-product layer is sketched after this entry).
 arXiv  Detail & Related papers  (2024-12-11T12:31:30Z)
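As a rough illustration of the Kronecker-product parameterization this builds on, the toy layer below (names and shapes are assumptions, not ALoRE's code) assembles its weight as a sum over experts of kron(A_i, B_i), with each B_i low-rank:

```python
import torch
import torch.nn as nn

class KroneckerLowRankLayer(nn.Module):
    """Toy PHM-style layer: W = sum_i kron(A_i, U_i @ V_i), illustrative only."""

    def __init__(self, dim=768, n=4, rank=4):
        super().__init__()
        assert dim % n == 0
        d = dim // n
        self.A = nn.Parameter(0.02 * torch.randn(n, n, n))   # small per-expert factors
        self.U = nn.Parameter(0.02 * torch.randn(n, d, rank))
        self.V = nn.Parameter(0.02 * torch.randn(n, rank, d))

    def forward(self, x):
        B = self.U @ self.V                                  # (n, d, d) low-rank blocks
        W = sum(torch.kron(self.A[i], B[i]) for i in range(self.A.shape[0]))
        return x @ W.T                                       # W is dim x dim once assembled

# Parameter count: n*n*n + 2*n*(dim//n)*rank, far below dim*dim; and since W
# is an ordinary matrix after assembly, it can be folded into a frozen weight.
```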
- CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection [30.46562066023117]
 We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection.
Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes.
Through parameter-efficient fine-tuning, our method adds only 0.7% to parameter storage while significantly enhancing scalability and adaptability.
 arXiv  Detail & Related papers  (2024-10-08T08:36:12Z)
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
 Existing pedestrian attribute recognition (PAR) algorithms are mainly developed for static images.
We propose to understand human attributes using video frames, which make full use of temporal information.
 arXiv  Detail & Related papers  (2024-04-27T14:43:32Z)
- SequencePAR: Understanding Pedestrian Attributes via A Sequence Generation Paradigm [18.53048511206039]
 We propose a novel sequence generation paradigm for pedestrian attribute recognition, termed SequencePAR.
It extracts the pedestrian features using a pre-trained CLIP model and embeds the attribute set into query tokens under the guidance of text prompts.
A masked multi-head attention layer is introduced into the decoder module to prevent the model from attending to not-yet-predicted attributes while making predictions during training (a minimal masking sketch follows this entry).
 arXiv  Detail & Related papers  (2023-12-04T05:42:56Z)
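A minimal sketch of that masking idea (illustrative, not SequencePAR's code): an upper-triangular boolean mask over the attribute query positions stops prediction step t from attending to attributes at steps t+1 onward.

```python
import torch
import torch.nn as nn

def future_attr_mask(num_queries: int) -> torch.Tensor:
    # True marks disallowed (future-attribute) positions for each query step.
    return torch.triu(torch.ones(num_queries, num_queries, dtype=torch.bool), diagonal=1)

# Hypothetical sizes: 12 attribute query tokens, 256-dim embeddings, batch of 2.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
queries = torch.randn(2, 12, 256)
out, _ = attn(queries, queries, queries, attn_mask=future_attr_mask(12))
```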
- Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions [3.0655581300025996]
 We provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) that enforce entirely different attributions in the complex model.
We show how our approach can significantly improve the model's performance simply by augmenting its training dataset based on corrected explanations (an occlusion sketch follows this entry).
 arXiv  Detail & Related papers  (2023-06-28T15:23:28Z)
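Of the two methods, occlusion is the easier to sketch. The toy version below assumes a black-box scorer `model_fn` over 1-D feature vectors; scoring each feature by the drop caused by replacing it with a baseline is what makes the method model-agnostic.

```python
import numpy as np

def occlusion_attribution(model_fn, x, baseline=0.0):
    """Attribution of feature i = score(x) - score(x with feature i occluded)."""
    base_score = model_fn(x)
    attributions = np.zeros(len(x))
    for i in range(len(x)):
        x_occ = x.copy()
        x_occ[i] = baseline                 # occlude one feature at a time
        attributions[i] = base_score - model_fn(x_occ)
    return attributions

# With a linear black box, attributions recover w * x exactly: [2., -3., -1.]
w = np.array([2.0, -1.0, 0.5])
print(occlusion_attribution(lambda v: float(v @ w), np.array([1.0, 3.0, -2.0])))
```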
- Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation [11.960178399478718]
 Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting.
Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs.
 arXiv  Detail & Related papers  (2022-07-07T18:00:22Z)
- MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation [132.77005365032468]
 We propose a novel framework for Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
 arXiv  Detail & Related papers  (2022-05-31T04:57:06Z)
- OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision [93.26737878221073]
 We study the attribute mining problem in an open-world setting to extract novel attributes and their values.
We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes.
Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
 arXiv  Detail & Related papers  (2022-04-29T04:16:04Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
 We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks (a toy MoE layer is sketched after this entry).
 arXiv  Detail & Related papers  (2022-04-15T23:19:37Z)
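For the structure alone (the importance-guided adaptation itself is not reproduced here), a toy top-1-routed MoE feed-forward layer looks roughly like this; names and sizes are assumptions:

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Toy Mixture-of-Experts FFN with top-1 routing (illustrative only)."""

    def __init__(self, dim=768, hidden=3072, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (tokens, dim). Each token runs through only its top-1 expert,
        # raising capacity without raising per-token compute.
        gates = self.router(x).softmax(dim=-1)      # (tokens, num_experts)
        top = gates.argmax(dim=-1)                  # (tokens,)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = top == i
            if sel.any():
                out[sel] = gates[sel, i].unsqueeze(-1) * expert(x[sel])
        return out
```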
- Efficient Nearest Neighbor Language Models [114.40866461741795]
 Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance (the interpolation rule is sketched after this entry).
 arXiv  Detail & Related papers  (2021-09-09T12:32:28Z)
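The kNN-LM combination rule at the heart of this line of work is compact enough to sketch (a hypothetical helper; the paper's speed-ups come from datastore pruning, quantization, and adaptive retrieval, which are omitted here):

```python
import torch

def knn_lm_probs(p_lm, knn_dists, knn_token_ids, vocab_size, lam=0.25, temp=1.0):
    """Interpolate the parametric LM distribution with a kNN distribution
    built from retrieved neighbors' distances and their target tokens."""
    weights = torch.softmax(-knn_dists / temp, dim=-1)   # closer neighbor => more mass
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, knn_token_ids, weights)        # aggregate mass per token
    return lam * p_knn + (1.0 - lam) * p_lm

# Hypothetical usage: 4 retrieved neighbors over a 10-token vocabulary.
probs = knn_lm_probs(torch.full((10,), 0.1), torch.tensor([1.0, 2.0, 3.0, 4.0]),
                     torch.tensor([7, 7, 2, 0]), vocab_size=10)
```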
- Model-agnostic and Scalable Counterfactual Explanations via Reinforcement Learning [0.5729426778193398]
 We propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process.
Our experiments on real-world data show that our method is model-agnostic, relying only on feedback from model predictions.
 arXiv  Detail & Related papers  (2021-06-04T16:54:36Z)
- AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding [55.89773725577615]
 We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-commerce dataset show marked improvements over previous methods.
 arXiv  Detail & Related papers  (2021-06-04T07:54:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     