Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey Data Modeling and Analysis
- URL: http://arxiv.org/abs/2507.12126v1
- Date: Wed, 16 Jul 2025 10:49:30 GMT
- Title: Iterative Augmentation with Summarization Refinement (IASR) Evaluation for Unstructured Survey Data Modeling and Analysis
- Authors: Payal Bhattad, Sai Manoj Pudukotai Dinakarrao, Anju Gupta
- Abstract summary: This work introduces a principled evaluation framework for large language model (LLM)-based text augmentation. Empirical evaluations show that GPT-3.5 Turbo achieved the best balance of semantic fidelity, diversity, and generation efficiency.
- Score: 0.43988112145759295
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can improve input diversity and downstream interpretability, existing techniques often lack mechanisms to ensure semantic preservation during large-scale or iterative generation, leading to redundancy and instability. This work introduces a principled evaluation framework for large language model (LLM)-based text augmentation, comprising two components: (1) Scalability Analysis, which measures semantic consistency as augmentation volume increases, and (2) Iterative Augmentation with Summarization Refinement (IASR), which evaluates semantic drift across recursive paraphrasing cycles. Empirical evaluations across state-of-the-art LLMs show that GPT-3.5 Turbo achieved the best balance of semantic fidelity, diversity, and generation efficiency. Applied to a real-world topic modeling task using BERTopic with GPT-enhanced few-shot labeling, the proposed approach yields a 400% increase in topic granularity and complete elimination of topic overlaps. These findings validate the utility of the proposed frameworks for structured evaluation of LLM-based augmentation in practical NLP pipelines.
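The sketch below illustrates, under assumptions, the IASR-style drift check the abstract describes: a survey response is recursively paraphrased, each paraphrase is condensed by a summarization refinement step, and semantic drift is tracked as the embedding similarity of each cycle's output to the original text. The helper names (paraphrase_once, refine_with_summary, iasr_drift), the prompts, and the choice of gpt-3.5-turbo with an all-MiniLM-L6-v2 sentence encoder are illustrative assumptions, not the authors' released implementation.

```python
# Minimal IASR-style drift evaluation sketch (assumptions noted above),
# not the paper's reference implementation.
from openai import OpenAI                       # assumes the openai>=1.x client
from sentence_transformers import SentenceTransformer, util

client = OpenAI()                               # reads OPENAI_API_KEY from the environment
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model


def _chat(prompt: str) -> str:
    """Single-turn LLM call; gpt-3.5-turbo mirrors the paper's best-balanced model."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()


def paraphrase_once(text: str) -> str:
    # Augmentation step: rewrite the response while trying to preserve its meaning.
    return _chat(f"Paraphrase the following survey response, preserving its meaning:\n\n{text}")


def refine_with_summary(text: str) -> str:
    # Summarization refinement step: condense the paraphrase back toward its core
    # meaning before the next cycle, limiting accumulated drift.
    return _chat(f"Summarize the following text in one faithful sentence:\n\n{text}")


def iasr_drift(original: str, cycles: int = 5) -> list[float]:
    """Run paraphrase -> summarize cycles and return, per cycle, the cosine
    similarity of the current text to the original response."""
    ref_emb = embedder.encode(original, convert_to_tensor=True)
    current, drift_curve = original, []
    for _ in range(cycles):
        current = refine_with_summary(paraphrase_once(current))
        cur_emb = embedder.encode(current, convert_to_tensor=True)
        drift_curve.append(float(util.cos_sim(ref_emb, cur_emb)))
    return drift_curve  # a flat curve near 1.0 indicates low semantic drift


if __name__ == "__main__":
    response = "The onboarding process was confusing and the documentation was outdated."
    print(iasr_drift(response, cycles=3))
```

A drift curve like this could also gate which augmented responses are passed to a downstream BERTopic pipeline, in the spirit of the topic modeling application the abstract reports.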
Related papers
- Arg-LLaDA: Argument Summarization via Large Language Diffusion Models and Sufficiency-Aware Refinement [14.24815847815289]
We introduce Arg-LLaDA, a novel large language diffusion framework that iteratively improves summaries. Our method combines a flexible masking controller with a sufficiency-checking module to identify and revise unsupported, redundant, or incomplete spans. Empirical results on two benchmark datasets demonstrate that Arg-LLaDA surpasses state-of-the-art baselines in 7 out of 10 automatic evaluation metrics.
arXiv Detail & Related papers (2025-07-25T09:07:52Z) - Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations. We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability. We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
arXiv Detail & Related papers (2025-06-16T20:58:05Z) - Semantic-preserved Augmentation with Confidence-weighted Fine-tuning for Aspect Category Sentiment Analysis [3.1394848827666544]
Large language model (LLM) based augmentation is an effective approach to addressing data scarcity in low-resource scenarios. We introduce a data augmentation strategy for the aspect category sentiment analysis task. We employ a post-processing technique to ensure semantic consistency between the generated sentence and the original sentence.
arXiv Detail & Related papers (2025-06-08T13:53:28Z) - Adaptive Spatial Augmentation for Semi-supervised Semantic Segmentation [51.645152962504056]
In semi-supervised semantic segmentation (SSSS), data augmentation plays a crucial role in the weak-to-strong consistency regularization framework. We show that spatial augmentation can contribute to model training in SSSS, despite generating inconsistent masks between the weak and strong augmentations. We propose an adaptive augmentation strategy that dynamically adjusts the augmentation for each instance based on entropy.
arXiv Detail & Related papers (2025-05-29T13:35:48Z) - T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation [60.620408007636016]
We propose T2I-Eval-R1, a novel reinforcement learning framework that trains open-source MLLMs using only coarse-grained quality scores. Our approach integrates Group Relative Policy Optimization into the instruction-tuning process, enabling models to generate both scalar scores and interpretable reasoning chains.
arXiv Detail & Related papers (2025-05-23T13:44:59Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass [3.0566617373924325]
Recent advances in pre-trained language models (PLMs) have driven remarkable progress in unsupervised sentence representation learning. We propose CSE-SFP, an innovative method that exploits the structural characteristics of generative models. We show that CSE-SFP not only produces higher-quality embeddings but also significantly reduces both training time and memory consumption.
arXiv Detail & Related papers (2025-05-01T08:27:14Z) - Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization [2.502393972789905]
We propose a bi-stage optimization framework to uniformly enhance both the generalization and robustness of LMs. We show that our method significantly improves the generalization and robustness of LMs compared to other existing methods.
arXiv Detail & Related papers (2025-03-19T13:50:36Z) - Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis [20.503153899462323]
We propose a framework for semi-supervised sentiment analysis. We introduce two prompting strategies to semantically enhance unlabeled text. Experiments show our method achieves substantial gains over prior semi-supervised methods.
arXiv Detail & Related papers (2025-01-29T12:03:11Z) - Enhancing SLM via ChatGPT and Dataset Augmentation [0.3844771221441211]
We employ knowledge distillation-based techniques and synthetic dataset augmentation to bridge the performance gap between large language models (LLMs) and small language models (SLMs).
Our methods involve two forms of rationale generation--information extraction and informed reasoning--to enrich the ANLI dataset.
Our findings reveal that the incorporation of synthetic rationales significantly improves the model's ability to comprehend natural language, leading to 1.3% and 2.3% higher classification accuracy, respectively, on the ANLI dataset.
arXiv Detail & Related papers (2024-09-19T09:24:36Z) - Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)