Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey Data Modeling and Analysis
- URL: http://arxiv.org/abs/2507.12126v1
- Date: Wed, 16 Jul 2025 10:49:30 GMT
- Title: Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey Data Modeling and Analysis
- Authors: Payal Bhattad, Sai Manoj Pudukotai Dinakarrao, Anju Gupta
- Abstract summary: This work introduces a principled evaluation framework for large language model (LLM)-based text augmentation. Empirical evaluations show that GPT-3.5 Turbo achieved the best balance of semantic fidelity, diversity, and generation efficiency.
- Score: 0.43988112145759295
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can improve input diversity and downstream interpretability, existing techniques often lack mechanisms to ensure semantic preservation during large-scale or iterative generation, leading to redundancy and instability. This work introduces a principled evaluation framework for large language model (LLM)-based text augmentation, comprising two components: (1) Scalability Analysis, which measures semantic consistency as augmentation volume increases, and (2) Iterative Augmentation with Summarization Refinement (IASR), which evaluates semantic drift across recursive paraphrasing cycles. Empirical evaluations across state-of-the-art LLMs show that GPT-3.5 Turbo achieved the best balance of semantic fidelity, diversity, and generation efficiency. Applied to a real-world topic modeling task using BERTopic with GPT-enhanced few-shot labeling, the proposed approach yields a 400% increase in topic granularity and complete elimination of topic overlaps. These findings validate the utility of the proposed frameworks for structured evaluation of LLM-based augmentation in practical NLP pipelines.
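The sketch below illustrates, under assumptions, the IASR-style drift check the abstract describes: a survey response is recursively paraphrased, each paraphrase is condensed by a summarization refinement step, and semantic drift is tracked as the embedding similarity of each cycle's output to the original text. The helper names (paraphrase_once, refine_with_summary, iasr_drift), the prompts, and the choice of gpt-3.5-turbo with an all-MiniLM-L6-v2 sentence encoder are illustrative assumptions, not the authors' released implementation.

```python
# Minimal IASR-style drift evaluation sketch (assumptions noted above),
# not the paper's reference implementation.
from openai import OpenAI                       # assumes the openai>=1.x client
from sentence_transformers import SentenceTransformer, util

client = OpenAI()                               # reads OPENAI_API_KEY from the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model


def _chat(prompt: str) -> str:
    """Single-turn LLM call; gpt-3.5-turbo mirrors the paper's best-balanced model."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()


def paraphrase_once(text: str) -> str:
    # Augmentation step: rewrite the response while trying to preserve its meaning.
    return _chat(f"Paraphrase the following survey response, preserving its meaning:\n\n{text}")


def refine_with_summary(text: str) -> str:
    # Summarization refinement step: condense the paraphrase back toward its core
    # meaning before the next cycle, limiting accumulated drift.
    return _chat(f"Summarize the following text in one faithful sentence:\n\n{text}")


def iasr_drift(original: str, cycles: int = 5) -> list[float]:
    """Run paraphrase -> summarize cycles and return, per cycle, the cosine
    similarity of the current text to the original response."""
    ref_emb = embedder.encode(original, convert_to_tensor=True)
    current, drift_curve = original, []
    for _ in range(cycles):
        current = refine_with_summary(paraphrase_once(current))
        cur_emb = embedder.encode(current, convert_to_tensor=True)
        drift_curve.append(float(util.cos_sim(ref_emb, cur_emb)))
    return drift_curve  # a flat curve near 1.0 indicates low semantic drift


if __name__ == "__main__":
    response = "The onboarding process was confusing and the documentation was outdated."
    print(iasr_drift(response, cycles=3))
```

A drift curve like this could also gate which augmented responses are passed to a downstream BERTopic pipeline, in the spirit of the topic modeling application the abstract reports.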
Related papers
- Arg-LLaDA: Argument Summarization via Large Language Diffusion Models and Sufficiency-Aware Refinement [14.24815847815289]
We introduce Arg-LLaDA, a novel large language diffusion framework that iteratively improves summaries. Our method combines a flexible masking controller with a sufficiency-checking module to identify and revise unsupported, redundant, or incomplete spans. Empirical results on two benchmark datasets demonstrate that Arg-LLaDA surpasses state-of-the-art baselines in 7 out of 10 automatic evaluation metrics.
arXiv Detail & Related papers (2025-07-25T09:07:52Z) - Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations. We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability. We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
arXiv Detail & Related papers (2025-06-16T20:58:05Z) - Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis [3.1394848827666544]
Large language model (LLM) based augmentation is an effective approach to addressing data scarcity in low-resource scenarios. We introduce a data augmentation strategy for the aspect category sentiment analysis task. We employ a post-processing technique to ensure semantic consistency between the generated sentence and the original sentence.
arXiv Detail & Related papers (2025-06-08T13:53:28Z) - Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation [51.645152962504056]
In semi-supervised semantic segmentation (SSSS), data augmentation plays a crucial role in the weak-to-strong consistency regularization framework. We show that spatial augmentation can contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations. We propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy.
arXiv Detail & Related papers (2025-05-29T13:35:48Z) - T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation [60.620408007636016]
We propose T2I-Eval-R1, a novel reinforcement learning framework that trains open-source MLLMs using only coarse-grained quality scores. Our approach integrates Group Relative Policy Optimization into the instruction-tuning process, enabling models to generate both scalar scores and interpretable reasoning chains.
arXiv Detail & Related papers (2025-05-23T13:44:59Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass [3.0566617373924325]
Recent advances in pre-trained language models (PLMs) have driven remarkable progress in unsupervised sentence representation learning. We propose CSE-SFP, an innovative method that exploits the structural characteristics of generative models. We show that CSE-SFP not only produces higher-quality embeddings but also significantly reduces both training time and memory consumption.
arXiv Detail & Related papers (2025-05-01T08:27:14Z) - Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization [2.502393972789905]
We propose a bi-stage optimization framework to uniformly enhance both the generalization and robustness of LMs. We show that our method significantly improves the generalization and robustness of LMs compared to other existing methods.
arXiv Detail & Related papers (2025-03-19T13:50:36Z) - Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis [20.503153899462323]
We propose a framework for semi-supervised sentiment analysis. We introduce two prompting strategies to semantically enhance unlabeled text. Experiments show our method achieves substantial gains over prior semi-supervised methods.
arXiv Detail & Related papers (2025-01-29T12:03:11Z) - Enhancing SLM via ChatGPT and Dataset Augmentation [0.3844771221441211]
We employ knowledge distillation-based techniques and synthetic dataset augmentation to bridge the performance gap between large language models (LLMs) and small language models (SLMs).
Our methods involve two forms of rationale generation--information extraction and informed reasoning--to enrich the ANLI dataset.
Our findings reveal that the incorporation of synthetic rationales significantly improves the model's ability to comprehend natural language, leading to 1.3% and 2.3% higher classification accuracy, respectively, on the ANLI dataset.
arXiv Detail & Related papers (2024-09-19T09:24:36Z) - Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)