NewsRECON: News article REtrieval for image CONtextualization
- URL: http://arxiv.org/abs/2601.14121v1
- Date: Tue, 20 Jan 2026 16:15:53 GMT
- Title: NewsRECON: News article REtrieval for image CONtextualization
- Authors: Jonathan Tonglet, Iryna Gurevych, Tinne Tuytelaars, Marie-Francine Moens
- Abstract summary: We introduce NewsRECON, a method that links images to relevant news articles to infer their date and location from article metadata. Experiments on the TARA and 5Pils-OOC show that NewsRECON outperforms prior work and can be combined with a multimodal large language model to achieve new SOTA results in the absence of RIS evidence.
- Score: 96.77112912009987
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Identifying when and where a news image was taken is crucial for journalists and forensic experts to produce credible stories and debunk misinformation. While many existing methods rely on reverse image search (RIS) engines, these tools often fail to return results, thereby limiting their practical applicability. In this work, we address the challenging scenario where RIS evidence is unavailable. We introduce NewsRECON, a method that links images to relevant news articles to infer their date and location from article metadata. NewsRECON leverages a corpus of over 90,000 articles and integrates: (1) a bi-encoder for retrieving event-relevant articles; (2) two cross-encoders for reranking articles by location and event consistency. Experiments on the TARA and 5Pils-OOC show that NewsRECON outperforms prior work and can be combined with a multimodal large language model to achieve new SOTA results in the absence of RIS evidence. We make our code available.
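The retrieve-then-rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned bi-encoder and cross-encoders are replaced here by toy word-overlap functions, a short text description stands in for the image, and all names (`Article`, `embed`, `retrieve`, `pair_score`, `rerank`) are hypothetical. Summing the two reranking scores with equal weight, and reading the date and location off the single top-ranked article, are simplifying assumptions.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Article:
    text: str
    date: str       # publication date taken from article metadata
    location: str   # location taken from article metadata


def embed(text: str) -> Counter:
    """Toy stand-in for an encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[Article], k: int) -> list[Article]:
    """Stage 1 (bi-encoder style): query and articles are embedded independently."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda a: cosine(q, embed(a.text)), reverse=True)
    return ranked[:k]


def pair_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: scores the (query, text) pair jointly."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def rerank(query: str, candidates: list[Article]) -> list[Article]:
    """Stage 2: reorder candidates by location and event consistency."""
    def combined(a: Article) -> float:
        location_score = pair_score(query, a.location)
        event_score = pair_score(query, a.text)
        return location_score + event_score  # equal weighting is an assumption
    return sorted(candidates, key=combined, reverse=True)


corpus = [
    Article("flood waters rise in venice after storm", "2019-11-13", "venice"),
    Article("protesters march in paris over pension reform", "2023-03-23", "paris"),
    Article("venice carnival opens with boat parade", "2020-02-08", "venice"),
]
query = "flooded square in venice after storm"  # text stand-in for the image

candidates = retrieve(query, corpus, k=2)
best = rerank(query, candidates)[0]
print(best.date, best.location)
```

The first stage is cheap because article vectors can be computed once and reused for every query, while the second stage scores each query-article pair jointly and is therefore applied only to the short candidate list.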
Related papers
- Dataset of News Articles with Provenance Metadata for Media Relevance Assessment [0.7366405857677227]
Out-of-context and misattributed imagery is the leading form of media manipulation in today's misinformation and disinformation landscape.
We introduce the News Media Provenance dataset, a dataset of news articles with provenance-tagged images.
We formulate two tasks on this dataset, location of origin relevance (LOR) and date and time of origin relevance (DTOR), and present baseline results on six large language models (LLMs).
We find that, while the zero-shot performance on LOR is promising, the performance on DTOR lags, leaving room for specialized architectures and future work.
arXiv Detail & Related papers (2025-06-11T15:21:05Z)
- WikiVideo: Article Generation from Multiple Videos [82.00241010200368]
We introduce the task of creating a Wikipedia-style article from multiple videos about real-world events.
To close this gap, we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos.
We propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos.
arXiv Detail & Related papers (2025-04-01T16:22:15Z)
- A Novel Method for News Article Event-Based Embedding [8.183446952097528]
We propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles.
We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method.
Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.
arXiv Detail & Related papers (2024-05-20T20:55:07Z)
- Assessing News Thumbnail Representativeness: Counterfactual text can enhance the cross-modal matching ability [5.111382868644429]
We focus on whether a news image represents the actors discussed in the news text.
We introduce NewsTT, a dataset of 1000 news thumbnail images and text pairs.
We propose CFT-CLIP, a contrastive learning framework that updates vision and language bi-encoders according to the hypothesis.
arXiv Detail & Related papers (2024-02-17T01:27:29Z)
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval [18.270878909735256]
We propose an ENtity-aware article GeneratIoN and rEtrieval framework to explicitly incorporate named entities into language models.
We conducted experiments on three public datasets: GoodNews, VisualNews, and WikiText.
Our results demonstrate that our model can boost both article generation and article retrieval performance, with a 4-5 perplexity improvement in article generation and a 3-4% boost in recall@1 in article retrieval.
arXiv Detail & Related papers (2021-12-11T05:32:09Z)
- Supporting verification of news articles with automated search for semantically similar articles [0.0]
We propose an evidence retrieval approach to handle fake news.
The learning task is formulated as an unsupervised machine learning problem.
We find that our approach is agnostic to concept drifts, i.e. the machine learning task is independent of the hypotheses in a text.
arXiv Detail & Related papers (2021-03-29T12:56:59Z)
- News Image Steganography: A Novel Architecture Facilitates the Fake News Identification [52.83247667841588]
A larger portion of fake news quotes untampered images from other sources with ulterior motives.
This paper proposes an architecture named News Image Steganography to reveal the inconsistency through image steganography based on GAN.
arXiv Detail & Related papers (2021-01-03T11:12:23Z)
- Transform and Tell: Entity-Aware News Image Captioning [77.4898875082832]
We propose an end-to-end model which generates captions for images embedded in news articles.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts.
arXiv Detail & Related papers (2020-04-17T05:44:37Z)
- Untrue.News: A New Search Engine For Fake Stories [2.642406403099596]
In this paper, we demonstrate Untrue News, a new search engine for fake stories.
Untrue News relies on a new, scalable analytic search engine based on the Lucene library that provides near real-time results.
arXiv Detail & Related papers (2020-02-16T14:32:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.