Hallucination Detection and Mitigation in Scientific Text Simplification using Ensemble Approaches: DS@GT at CLEF 2025 SimpleText
- URL: http://arxiv.org/abs/2508.11823v1
- Date: Fri, 15 Aug 2025 21:57:27 GMT
- Title: Hallucination Detection and Mitigation in Scientific Text Simplification using Ensemble Approaches: DS@GT at CLEF 2025 SimpleText
- Authors: Krishna Chaitanya Marturi, Heba H. Elwazzan,
- Abstract summary: We describe our methodology for the CLEF 2025 SimpleText Task 2, which focuses on detecting and evaluating creative generation and information distortion.<n>Our solution integrates multiple strategies: we construct an ensemble framework that leverages BERT-based classifier, semantic similarity measure, natural language inference model, and large language model.<n>For grounded generation, we employ an LLM-based post-editing system that revises simplifications based on the original input texts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe our methodology for the CLEF 2025 SimpleText Task 2, which focuses on detecting and evaluating creative generation and information distortion in scientific text simplification. Our solution integrates multiple strategies: we construct an ensemble framework that leverages BERT-based classifier, semantic similarity measure, natural language inference model, and large language model (LLM) reasoning. These diverse signals are combined using meta-classifiers to enhance the robustness of spurious and distortion detection. Additionally, for grounded generation, we employ an LLM-based post-editing system that revises simplifications based on the original input texts.
Related papers
- Text Simplification with Sentence Embeddings [4.484170173286332]
We learn to learn a transformation between sentence embeddings representing high-complexity and low-complexity texts.<n>We conclude that learning transformations in sentence embedding space is a promising direction for future research.
arXiv Detail & Related papers (2025-10-28T12:41:10Z) - Trace Is In Sentences: Unbiased Lightweight ChatGPT-Generated Text Detector [2.11622808613962]
We introduce a novel task to detect both original and PSP-modified AI-generated texts.<n>We propose a lightweight framework that classifies texts based on their internal structure.<n>Our approach encodes sentence embeddings from pre-trained language models and models their relationships via attention.
arXiv Detail & Related papers (2025-09-23T02:00:35Z) - LLM-Guided Planning and Summary-Based Scientific Text Simplification: DS@GT at CLEF 2025 SimpleText [0.0]
We present our approach for the CLEF 2025 SimpleText Task 1, which addresses both sentence-level and document-level scientific text simplification.<n>For sentence-level simplification, our methodology employs large language models (LLMs) to first generate a structured plan, followed by plan-driven simplification of individual sentences.<n>At the document level, we leverage LLMs to produce concise summaries and subsequently guide the simplification process using these summaries.
arXiv Detail & Related papers (2025-08-15T21:44:52Z) - Semantic Consistency Regularization with Large Language Models for Semi-supervised Sentiment Analysis [20.503153899462323]
We propose a framework for semi-supervised sentiment analysis.<n>We introduce two prompting strategies to semantically enhance unlabeled text.<n> Experiments show our method achieves remarkable performance over prior semi-supervised methods.
arXiv Detail & Related papers (2025-01-29T12:03:11Z) - SEFD: Semantic-Enhanced Framework for Detecting LLM-Generated Text [12.639191350218528]
We present a novel semantic-enhanced framework for detecting large language models (LLMs)-generated text (SEFD)
Our framework improves upon existing detection methods by systematically integrating retrieval-based techniques with traditional detectors.
We showcase the effectiveness of our approach in sequential text scenarios common in real-world applications, such as online forums and Q&A platforms.
arXiv Detail & Related papers (2024-11-17T20:13:30Z) - Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment [0.0]
Retrieval-Augmented Generation (RAG) systems retrieve context across different text modalities due to semantic gaps.
We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps.
Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference.
arXiv Detail & Related papers (2024-10-30T20:28:10Z) - Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework [9.976099891796784]
Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement.
Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effectively identify LLM-generated text in academic contexts.
We propose a novel Multi-level Fine-grained Detection framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features.
arXiv Detail & Related papers (2024-10-18T07:25:00Z) - Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework, paraphrased text span detection (PTD)
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text
Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms recent PPLM, GeDi, and DExperts on PPL and sentiment accuracy based on the external classifier of generated texts.
The same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
arXiv Detail & Related papers (2020-10-21T13:44:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.