Multilingual News Location Detection using an Entity-Based Siamese
Network with Semi-Supervised Contrastive Learning and Knowledge Base
- URL: http://arxiv.org/abs/2212.11856v1
- Date: Thu, 22 Dec 2022 16:42:21 GMT
- Title: Multilingual News Location Detection using an Entity-Based Siamese
Network with Semi-Supervised Contrastive Learning and Knowledge Base
- Authors: Víctor Suárez-Paniagua and Steven Derby and Tri Kurniawan Wijaya
- Abstract summary: Early detection of relevant locations in a piece of news is important in extreme events such as environmental disasters, war conflicts, disease outbreaks, or political turmoils.
We propose a system that infers the relevant locations even when they are not mentioned explicitly in the text.
We contribute to the research community with a gold-standard multilingual news-location dataset, NewsLOC.
- Score: 0.7734726150561089
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Early detection of relevant locations in a piece of news is especially
important in extreme events such as environmental disasters, war conflicts,
disease outbreaks, or political turmoils. Additionally, this detection also
helps recommender systems to promote relevant news based on user locations.
Note that, when the relevant locations are not mentioned explicitly in the
text, state-of-the-art methods typically fail to recognize them because these
methods rely on syntactic recognition. In contrast, by incorporating a
knowledge base and connecting entities with their locations, our system
successfully infers the relevant locations even when they are not mentioned
explicitly in the text. To evaluate the effectiveness of our approach, and due
to the lack of datasets in this area, we also contribute to the research
community with a gold-standard multilingual news-location dataset, NewsLOC. It
contains the annotation of the relevant locations (and their WikiData IDs) of
600+ Wikinews articles in five different languages: English, French, German,
Italian, and Spanish. Through experimental evaluations, we show that our
proposed system outperforms the baselines, and that fine-tuning the model
with semi-supervised data further increases the classification rate. The
source code and the NewsLOC dataset are publicly available to the research
community at https://github.com/vsuarezpaniagua/NewsLocation.
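The idea of connecting entities with their locations through a knowledge base can be illustrated with a minimal Python sketch. This is not the authors' system: the toy dictionary `TOY_KB`, the function `infer_locations`, and the simple vote-counting rule are all assumptions made for illustration, standing in for a real knowledge base such as Wikidata and for the paper's Siamese network.

```python
from collections import Counter

# Hypothetical toy knowledge base mapping an entity to the location it is
# linked to. A real system would query a resource such as Wikidata.
TOY_KB = {
    "Eiffel Tower": "France",
    "Louvre": "France",
    "Bundestag": "Germany",
    "Colosseum": "Italy",
}


def infer_locations(entities, kb=TOY_KB, top_k=1):
    """Rank locations by how many of the given entities link to them.

    Entities missing from the knowledge base are ignored, so a location
    can be inferred even if it is never named in the text itself.
    """
    votes = Counter(kb[e] for e in entities if e in kb)
    return [location for location, _ in votes.most_common(top_k)]


# Entities extracted from an article that never mentions "France" by name.
article_entities = ["Eiffel Tower", "Louvre", "Bundestag"]
print(infer_locations(article_entities))  # → ['France']
```

Counting votes is only the simplest possible aggregation; it shows why a lookup through linked entities can recover a location that purely text-based recognition would miss.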
Related papers
- Discovering Geo-dependent Stories by Combining Density-based Clustering
and Thread-based Aggregation techniques [0.0]
This paper introduces a global analysis of the geo-tagged posts in social media.
It supports (i) the detection of unexpected behavior in the city and (ii) the analysis of the posts to infer what is happening.
We have applied our methodology to a dataset obtained from Instagram activity in New York City for seven months.
arXiv Detail & Related papers (2023-12-18T10:17:12Z)
- What's happening in your neighborhood? A Weakly Supervised Approach to Detect Local News [0.3749861135832073]
We develop an integrated pipeline that enables automatic local news detection and content-based local news recommendations.
Compared with the Stanford Core NER model, our pipeline achieves higher precision and recall on a real-world, human-labeled dataset.
arXiv Detail & Related papers (2023-01-15T03:20:18Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Contextual information integration for stance detection via
cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Location reference recognition from texts: A survey and comparison [9.36819544451632]
This review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management.
Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts containing 39,736 location references across the world.
arXiv Detail & Related papers (2022-07-04T19:25:15Z)
- Evaluation of Fake News Detection with Knowledge-Enhanced Language
Models [10.45851991054367]
Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs).
The predominant state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news datasets.
The use of existing knowledge bases (KBs) with rich human-curated factual information thus has the potential to make fake news detection more effective and robust.
arXiv Detail & Related papers (2022-04-01T14:14:46Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos:
Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Triggering Failures: Out-Of-Distribution detection by learning from
local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performance in both speed and accuracy when compared to ten recent methods from the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
- BanFakeNews: A Dataset for Detecting Fake News in Bangla [1.4170999534105675]
We propose an annotated dataset of 50K news items that can be used for building automated fake news detection systems.
We develop a benchmark system with state-of-the-art NLP techniques to identify Bangla fake news.
arXiv Detail & Related papers (2020-04-19T07:42:22Z)
- Local-Global Video-Text Interactions for Temporal Grounding [77.5114709695216]
This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query.
We tackle this problem using a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query.
The proposed method effectively predicts the target time interval by exploiting contextual information from local to global.
arXiv Detail & Related papers (2020-04-16T08:10:41Z)