Related papers: Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis

URL: http://arxiv.org/abs/2407.00341v1
Date: Sat, 29 Jun 2024 07:00:37 GMT
Title: Iterative Data Augmentation with Large Language Models for Aspect-based Sentiment Analysis
Authors: Haiyun Li, Qihuang Zhong, Ke Zhu, Juhua Liu, Bo Du, Dacheng Tao
Abstract summary: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. We propose a systematic Iterative Data augmentation framework, namely IterD, to boost the performance of ABSA.
Score: 82.98490089763175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aspect-based Sentiment Analysis (ABSA) is an important sentiment analysis task, which aims to determine the sentiment polarity towards an aspect in a sentence. Due to the expensive and limited labeled data, data augmentation (DA) has become the standard for improving the performance of ABSA. However, current DA methods usually have some shortcomings: 1) poor fluency and coherence, 2) lack of diversity of generated data, and 3) reliance on some existing labeled data, hindering its applications in real-world scenarios. In response to these problems, we propose a systematic Iterative Data augmentation framework, namely IterD, to boost the performance of ABSA. The core of IterD is to leverage the powerful ability of large language models (LLMs) to iteratively generate more fluent and diverse synthetic labeled data, starting from an unsupervised sentence corpus. Extensive experiments on 4 widely-used ABSA benchmarks show that IterD brings consistent and significant performance gains among 5 baseline ABSA models. More encouragingly, the synthetic data generated by IterD can achieve comparable or even better performance against the manually annotated data.

Related papers

Balanced Training Data Augmentation for Aspect-Based Sentiment Analysis [21.540505918226348]
Aspect-based sentiment analysis (ABSA) is a crucial fine-grained task in social media scenarios. In this paper, we propose an LLM-based ABSA approach with training data augmentation.
arXiv Detail & Related papers (2025-07-13T04:07:07Z)
MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction [6.917134562107388]
Retrieval-Augmented Generation (RAG) offers a solution to hallucinations in Large Language Models (LLMs) by grounding their outputs to knowledge retrieved from external sources. Existing RAG extraction attacks often rely on manually crafted prompts, which limit their effectiveness. We introduce a framework called MARAGE for optimizing an adversarial string that, when appended to user queries submitted to a target RAG system, causes outputs containing the retrieved RAG data.
arXiv Detail & Related papers (2025-02-05T00:17:01Z)
DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis [28.40606116720525]
DS$^2$-ABSA is a dual-stream data synthesis framework for few-shot sentiment analysis. It generates diverse and high-quality ABSA samples in low-resource settings.
arXiv Detail & Related papers (2024-12-19T13:39:47Z)
Evaluating Language Models as Synthetic Data Generators [74.80905172696366]
AgoraBench is a benchmark that provides standardized settings and metrics to evaluate LMs' data generation abilities. Through synthesizing 1.26 million training instances using 6 LMs and training 99 student models, we uncover key insights about LMs' data generation capabilities.
arXiv Detail & Related papers (2024-12-04T19:20:32Z)
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset. Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z)
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation [13.120801609024147]
Retrieval augmented generation (RAG) has been shown to enhance factuality of large language model (LLM) outputs. RAG inputs are more complex than most datasets used for training NLI models. We introduce Automatic Generative Domain Adaptation (Auto-GDA) to enable unsupervised domain adaptation.
arXiv Detail & Related papers (2024-10-04T14:21:27Z)
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets. To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature. Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z)
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity. Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data. Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs [60.81649785463651]
We introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations. Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits escalating as the LLM size increases.
arXiv Detail & Related papers (2024-02-09T11:23:14Z)
Improving Pseudo-labelling and Enhancing Robustness for Semi-Supervised Domain Generalization [7.9776163947539755]
We study the problem of Semi-Supervised Domain Generalization, which is crucial for real-world applications like automated healthcare. We propose a new SSDG approach, which utilizes a novel uncertainty-guided pseudo-labelling with model averaging. Our uncertainty-guided pseudo-labelling (UPL) uses model uncertainty to improve pseudo-labelling selection, addressing poor model calibration under multi-source unlabelled data.
arXiv Detail & Related papers (2024-01-25T05:55:44Z)
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications [5.465142671132731]
Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts.
arXiv Detail & Related papers (2023-11-14T23:28:23Z)
Targeted Data Generation: Finding and Fixing Model Weaknesses [6.9649605149785465]
Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups. In experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models.
arXiv Detail & Related papers (2023-05-28T19:36:50Z)
DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.