Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic A Case Study on Media Bias
- URL: http://arxiv.org/abs/2504.00343v1
- Date: Tue, 01 Apr 2025 01:47:16 GMT
- Title: Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic A Case Study on Media Bias
- Authors: Timo Spinde, Luyang Lin, Smi Hinterreiter, Isao Echizen
- Abstract summary: TaxoMatic is a framework that automates definition extraction from academic literature. In an evaluation on a dataset of 2,398 manually rated articles, Claude-3-sonnet achieved the best results in both relevance classification and definition extraction.
- Score: 6.581006293385288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the framework's effectiveness, with Claude-3-sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains.
Related papers
- MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z) - ConExion: Concept Extraction with Large Language Models [0.6472397166280683]
We present an approach for concept extraction from documents using pre-trained large language models (LLMs).
Our approach tackles a more challenging task of extracting all present concepts related to the specific domain, not just the important ones.
arXiv Detail & Related papers (2025-04-17T13:05:14Z) - LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media [4.808928577524327]
This paper introduces LLMTaxo, a novel framework for the automated construction of taxonomy of factual claims from social media.
We implement this framework with different models across three distinct datasets and introduce specially designed taxonomy evaluation metrics.
With the evaluations from both human evaluators and GPT-4, the results indicate that LLMTaxo effectively categorizes factual claims from social media.
arXiv Detail & Related papers (2025-04-11T18:43:16Z) - How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks.
We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z) - Benchmarking pre-trained text embedding models in aligning built asset information [0.0]
This study presents a comparative benchmark of state-of-the-art text embedding models to evaluate their effectiveness in aligning built asset information with domain-specific technical concepts.
The results of our benchmarking across six proposed datasets, covering three tasks of clustering, retrieval, and reranking, highlight the need for future research on domain adaptation techniques.
arXiv Detail & Related papers (2024-11-18T20:54:17Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - CiteFusion: An Ensemble Framework for Citation Intent Classification Harnessing Dual-Model Binary Couples and SHAP Analyses [1.7812428873698407]
This study introduces CiteFusion, an ensemble framework designed to address the multiclass Citation Intent Classification task. CiteFusion achieves state-of-the-art performance, with Macro-F1 scores of 89.60% on SciCite and 76.24% on ACL-ARC. We release a web-based application that classifies citation intents leveraging CiteFusion models developed on SciCite.
arXiv Detail & Related papers (2024-07-18T09:29:33Z) - Improving Attributed Text Generation of Large Language Models via Preference Learning [28.09715554543885]
We model the attribution task as preference learning and introduce an Automatic Preference Optimization framework.
APO achieves state-of-the-art citation F1 with higher answer quality.
arXiv Detail & Related papers (2024-03-27T09:19:13Z) - Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z) - Contextualization Distillation from Large Language Model for Knowledge
Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.