Enhancing SDG-Text Classification with Combinatorial Fusion Analysis and Generative AI
- URL: http://arxiv.org/abs/2602.11168v1
- Date: Mon, 19 Jan 2026 02:56:03 GMT
- Title: Enhancing SDG-Text Classification with Combinatorial Fusion Analysis and Generative AI
- Authors: Jingyan Xu, Marcelo L. LaFleur, Christina Schweikert, D. Frank Hsu,
- Abstract summary: Social analysis with human context is an area that can benefit from text classification, as it relies substantially on text data.<n>We use a generative AI model to generate synthetic data for model training and then apply CFA to this classification task.<n>It is demonstrated that combining intelligence from multiple ML/AI models using CFA and getting input from human experts can, not only complement, but also enhance each other.
- Score: 3.0862493469454275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: (Natural Language Processing) NLP techniques such as text classification and topic discovery are very useful in many application areas including information retrieval, knowledge discovery, policy formulation, and decision-making. However, it remains a challenging problem in cases where the categories are unavailable, difficult to differentiate, or are interrelated. Social analysis with human context is an area that can benefit from text classification, as it relies substantially on text data. The focus of this paper is to enhance the classification of text according to the UN's Sustainable Development Goals (SDGs) by collecting and combining intelligence from multiple models. Combinatorial Fusion Analysis (CFA), a system fusion paradigm using a rank-score characteristic (RSC) function and cognitive diversity (CD), has been used to enhance classifier methods by combining a set of relatively good and mutually diverse classification models. We use a generative AI model to generate synthetic data for model training and then apply CFA to this classification task. The CFA technique achieves 96.73% performance, outperforming the best individual model. We compare the outcomes with those obtained from human domain experts. It is demonstrated that combining intelligence from multiple ML/AI models using CFA and getting input from human experts can, not only complement, but also enhance each other.
Related papers
- Enhancing Sentiment Classification with Machine Learning and Combinatorial Fusion [41.99844472131922]
This paper presents a novel approach to sentiment classification using the application of Combinatorial Fusion Analysis (CFA)<n>CFA leverages the concept of cognitive diversity, which utilizes rank-score characteristic functions to quantify the dissimilarity between models and strategically combine their predictions.<n> Experimental results also indicate that CFA outperforms traditional ensemble methods by effectively computing and employing model diversity.
arXiv Detail & Related papers (2025-10-30T21:30:30Z) - FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning [45.28976933063373]
We introduce a fine-grained detection framework FAID to classify text into three categories: human-written, LLM-generated, and human--LLM collaborative texts.<n>Our method combines multi-level contrastive learning with multi-task auxiliary classification to learn subtle stylistic cues.<n>Our experimental results demonstrate that FAID outperforms several baselines, particularly enhancing the generalization accuracy on unseen domains and new LLMs.
arXiv Detail & Related papers (2025-05-20T12:23:31Z) - GeAR: Generation Augmented Retrieval [82.20696567697016]
This paper introduces a novel method, $textbfGe$neration.<n>It improves the global document-Query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules.<n>When used as a retriever, GeAR does not incur any additional computational cost over bi-encoders.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - Combining Autoregressive and Autoencoder Language Models for Text Classification [1.0878040851638]
CAALM-TC is a novel method that enhances text classification by integrating autoregressive and autoencoder language models.<n> Experimental results on four benchmark datasets demonstrate that CAALM consistently outperforms existing methods.
arXiv Detail & Related papers (2024-11-20T12:49:42Z) - AI-Aided Kalman Filters [65.35350122917914]
The Kalman filter (KF) and its variants are among the most celebrated algorithms in signal processing.<n>Recent developments illustrate the possibility of fusing deep neural networks (DNNs) with classic Kalman-type filtering.<n>This article provides a tutorial-style overview of design approaches for incorporating AI in aiding KF-type algorithms.
arXiv Detail & Related papers (2024-10-16T06:47:53Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity [0.0]
AugCon is capable of automatically generating context-driven SFT data across multiple levels of granularity with high diversity, quality and fidelity.
We train a scorer through contrastive learning to collaborate with CST to rank and refine queries.
The results highlight the significant advantages of AugCon in producing high diversity, quality, and fidelity SFT data against several state-of-the-art methods.
arXiv Detail & Related papers (2024-05-26T14:14:18Z) - An Ensemble Approach to Question Classification: Integrating Electra
Transformer, GloVe, and LSTM [0.0]
This study presents an innovative ensemble approach for question classification, combining the strengths of Electra, GloVe, and LSTM models.
Rigorously tested on the well-regarded TREC dataset, the model demonstrates how the integration of these disparate technologies can lead to superior results.
arXiv Detail & Related papers (2023-08-13T18:14:10Z) - Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z) - Enabling Classifiers to Make Judgements Explicitly Aligned with Human
Values [73.82043713141142]
Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values.
We introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command.
arXiv Detail & Related papers (2022-10-14T09:10:49Z) - An Evolutionary Approach for Creating of Diverse Classifier Ensembles [11.540822622379176]
We propose a framework for classifier selection and fusion based on a four-step protocol called CIF-E.
We implement and evaluate 24 varied ensemble approaches following the proposed CIF-E protocol.
Experiments show that the proposed evolutionary approach can outperform the state-of-the-art literature approaches in many well-known UCI datasets.
arXiv Detail & Related papers (2022-08-23T14:23:27Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.