A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis
- URL: http://arxiv.org/abs/2204.05356v1
- Date: Mon, 11 Apr 2022 18:31:53 GMT
- Title: A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis
- Authors: Ehsan Hosseini-Asl, Wenhao Liu, Caiming Xiong
- Abstract summary: We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
- Score: 90.24921443175514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment analysis is an important task in natural language processing. In
recent works, pre-trained language models are often used to achieve
state-of-the-art results, especially when training data is scarce. It is common
to fine-tune on the downstream task, usually by adding task-specific layers on
top of the model. In this paper, we focus on aspect-based sentiment analysis,
which involves extracting aspect terms and categories and predicting their
corresponding polarities. In particular, we are interested in few-shot
settings. We propose to reformulate the extraction and prediction tasks as a
sequence generation task, using a generative language model with
unidirectional attention (GPT2 is used unless stated otherwise). This way, the
model learns to accomplish the tasks via language generation without the need
of training task-specific layers. Our evaluation results on the single-task
polarity prediction show that our approach outperforms the previous
state-of-the-art (based on BERT) on average performance by a large margin in
few-shot and full-shot settings. More importantly, our generative approach
significantly reduces the model variance caused by low-resource data. We
further demonstrate that the proposed generative language model can handle
joint and multi-task settings, unlike previous work. We observe that the
proposed sequence generation method achieves further improved performance on
polarity prediction when the model is trained in joint and multi-task
settings. Further evaluation on similar sentiment analysis datasets (SST-2 and
SST-5) and on OOS intent detection validates the superiority and noise
robustness of the generative language model in few-shot settings.
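Conceptually, the reformulation serializes each aspect-sentiment example into a single text sequence that a causal language model can be trained on and later complete at inference time, so no task-specific head is required. Below is a minimal sketch of that text-to-text framing using an off-the-shelf Hugging Face GPT-2 checkpoint; the prompt template, special phrasing, and few-shot examples are illustrative assumptions, not the paper's exact format, and the paper fine-tunes the model on such sequences rather than relying on prompting alone.
```python
# Minimal sketch: aspect-based sentiment analysis framed as sequence generation.
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint;
# the prompt format below is hypothetical, not the paper's exact template.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Each example maps a review to its aspect term and polarity as plain text,
# so extraction and polarity prediction become ordinary language generation.
prompt = (
    "Review: The battery life is great but the screen is dim.\n"
    "Aspect: battery life Polarity: positive\n"
    "Review: The waiter was rude.\n"
    "Aspect: waiter Polarity: negative\n"
    "Review: The pasta was delicious.\n"
    "Aspect:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False,                      # greedy decoding for a deterministic sketch
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
# Decode only the newly generated continuation, e.g. " pasta Polarity: positive"
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```
In the few-shot setting described in the paper, the same serialized sequences would instead be used to fine-tune the model with the standard language-modeling loss, and generation at test time yields the aspect term, category, or polarity directly as text.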
Related papers
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting available and reasonable benchmarks from a massive pool of candidates, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models [12.759281077118567]
Massively Multilingual Transformer-based Language Models have been observed to be surprisingly effective at zero-shot transfer across languages.
We build upon some of the existing techniques for predicting the zero-shot performance on a task, by modeling it as a multi-task learning problem.
arXiv Detail & Related papers (2022-05-12T14:47:03Z)
- ANNA: Enhanced Language Representation for Question Answering [5.713808202873983]
We show how these approaches affect performance individually and when they are considered jointly in pre-training models.
We propose an extended pre-training task and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling.
Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet.
arXiv Detail & Related papers (2022-03-28T05:26:52Z)
- Learning Better Sentence Representation with Syntax Information [0.0]
We propose a novel approach to combining syntax information with a pre-trained language model.
Our model achieves 91.2% accuracy, outperforming the baseline model by 37.8% on the sentence completion task.
arXiv Detail & Related papers (2021-01-09T12:15:08Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Multi-task Learning of Negation and Speculation for Targeted Sentiment Classification [15.85111852764517]
We show that targeted sentiment models are not robust to linguistic phenomena, specifically negation and speculation.
We propose a multi-task learning method to incorporate information from syntactic and semantic auxiliary tasks, including negation and speculation scope detection.
We create two challenge datasets to evaluate model performance on negated and speculative samples.
arXiv Detail & Related papers (2020-10-16T11:20:03Z)
- Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model.
The experiments on five diverse language generation tasks show that, by using only an additional 2-3% of parameters for each task, our model can match or even improve on the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)