Related papers: A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video

URL: http://arxiv.org/abs/2602.09154v1
Date: Mon, 09 Feb 2026 19:58:50 GMT
Title: A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video
Authors: Andrea Filiberto Lucas, Dylan Seychell
Abstract summary: This work presents a comprehensive framework for automatically detecting and extracting personal names from news videos. It introduces a curated and balanced corpus of annotated frames capturing the diversity of contemporary news graphics. The pipeline is evaluated against a contrasting class of generative multimodal methods, revealing a clear trade-off between deterministic auditability and stochastic inference.
Score: 0.2864713389096699
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The growing volume of video-based news content has heightened the need for transparent and reliable methods to extract on-screen information. Yet the variability of graphical layouts, typographic conventions, and platform-specific design patterns renders manual indexing impractical. This work presents a comprehensive framework for automatically detecting and extracting personal names from broadcast and social-media-native news videos. It introduces a curated and balanced corpus of annotated frames capturing the diversity of contemporary news graphics and proposes an interpretable, modular extraction pipeline designed to operate under deterministic and auditable conditions. The pipeline is evaluated against a contrasting class of generative multimodal methods, revealing a clear trade-off between deterministic auditability and stochastic inference. The underlying detector achieves 95.8% mAP@0.5, demonstrating operationally robust performance for graphical element localisation. While generative systems achieve marginally higher raw accuracy (F1: 84.18% vs 77.08%), they lack the transparent data lineage required for journalistic and analytical contexts. The proposed pipeline delivers balanced precision (79.9%) and recall (74.4%), avoids hallucination, and provides full traceability across each processing stage. Complementary user findings indicate that 59% of respondents report difficulty reading on-screen names in fast-paced broadcasts, underscoring the practical relevance of the task. The results establish a methodologically rigorous and interpretable baseline for hybrid multimodal information extraction in modern news media.
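As a quick consistency check on the figures quoted in the abstract, F1 is the harmonic mean of precision and recall. The minimal sketch below recomputes it from the rounded precision (79.9%) and recall (74.4%) given above. The helper name `f1_score` is hypothetical and is not part of the authors' pipeline. The result comes out at roughly 77.05%, close to the reported 77.08%; the small gap is presumably due to rounding in the published precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as quoted in the abstract (79.9% precision, 74.4% recall).
pipeline_f1 = f1_score(0.799, 0.744)
print(f"Recomputed F1: {pipeline_f1:.2%}")
```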

Related papers

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching [2.9079112030626146]
We present a hybrid moderation framework that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.
arXiv Detail & Related papers (2025-12-03T08:20:58Z)
Beyond Quantity: Distribution-Aware Labeling for Visual Grounding [72.43984105242177]
Visual grounding requires large and diverse region-text pairs. Existing pseudo-labeling pipelines often overfit to biased distributions. We propose DAL, a distribution-aware labeling framework for visual grounding.
arXiv Detail & Related papers (2025-05-30T09:04:47Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy. We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods. By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation [59.53678957969471]
Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding and generation tasks. Generating interleaved image-text content remains a challenge. OpenING is a benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. IntJudge is a judge model for evaluating open-ended multimodal generation methods.
arXiv Detail & Related papers (2024-11-27T16:39:04Z)
VMID: A Multimodal Fusion LLM Framework for Detecting and Identifying Misinformation of Short Videos [14.551693267228345]
This paper presents a novel fake news detection method based on multimodal information, designed to identify misinformation through a multi-level analysis of video content. The proposed framework successfully integrates multimodal features within videos, significantly enhancing the accuracy and reliability of fake news detection.
arXiv Detail & Related papers (2024-11-15T08:20:26Z)
Vision-Language Models are Strong Noisy Label Detectors [76.07846780815794]
This paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models. DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels. Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.
arXiv Detail & Related papers (2024-09-29T12:55:17Z)
Exposing and Explaining Fake News On-the-Fly [4.278181795494584]
This work contributes an explainable, online classification method to recognize fake news in real time. The proposed method combines both unsupervised and supervised Machine Learning approaches with lexica created online. The performance of the proposed solution has been validated with real data sets from Twitter, and the results attain 80% accuracy and macro F-measure.
arXiv Detail & Related papers (2024-05-03T14:49:04Z)
UATVR: Uncertainty-Adaptive Text-Video Retrieval [90.8952122146241]
A common practice is to transfer text-video pairs to the same embedding space and craft cross-modal interactions with certain entities. We propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure.
arXiv Detail & Related papers (2023-01-16T08:43:17Z)
Interpretable Fake News Detection with Topic and Deep Variational Models [2.15242029196761]
We focus on fake news detection using interpretable features and methods. We have developed a deep probabilistic model that integrates a dense representation of textual news. Our model achieves comparable performance to state-of-the-art competing models.
arXiv Detail & Related papers (2022-09-04T05:31:00Z)
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment [93.09267863425492]
We argue that understanding both high-level semantics and internal temporal structures of actions in competitive sports videos is the key to making predictions accurate and interpretable. We construct a new fine-grained dataset, called FineDiving, developed on diverse diving events with detailed annotations on action procedures.
arXiv Detail & Related papers (2022-04-07T17:59:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.