XREF: Entity Linking for Chinese News Comments with Supplementary
Article Reference
- URL: http://arxiv.org/abs/2006.14017v1
- Date: Wed, 24 Jun 2020 19:42:54 GMT
- Authors: Xinyu Hua, Lei Li, Lifeng Hua, Lu Wang
- Abstract summary: We study the problem of entity linking for Chinese news comments given mentions' spans.
We propose a novel model, XREF, that leverages attention mechanisms to pinpoint relevant context.
We develop a weakly supervised training scheme to utilize the large-scale unlabeled corpus.
- Score: 19.811371589597382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic identification of mentioned entities in social media posts
facilitates quick digestion of trending topics and popular opinions.
Nonetheless, this remains a challenging task due to limited context and diverse
name variations. In this paper, we study the problem of entity linking for
Chinese news comments given mentions' spans. We hypothesize that comments often
refer to entities in the corresponding news article, as well as topics
involving the entities. We therefore propose a novel model, XREF, that
leverages attention mechanisms to (1) pinpoint relevant context within
comments, and (2) detect supporting entities from the news article. To improve
training, we make two contributions: (a) we propose a supervised attention loss
in addition to the standard cross entropy, and (b) we develop a weakly
supervised training scheme to utilize the large-scale unlabeled corpus. Two new
datasets in entertainment and product domains are collected and annotated for
experiments. Our proposed method outperforms previous methods on both datasets.
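The abstract mentions a supervised attention loss added to the standard cross entropy but does not give its form. The following is a minimal sketch of one common way to combine the two, not the authors' formulation: it assumes the attention term is the negative log of the attention mass placed on annotated-relevant positions, and that the two terms are mixed with a fixed weight. The names `attention_supervision_loss`, `combined_loss`, `relevant_mask`, and `attn_weight` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(candidate_logits, gold_index):
    """Standard cross entropy over candidate entities for one mention."""
    probs = softmax(candidate_logits)
    return -math.log(probs[gold_index])


def attention_supervision_loss(attention_weights, relevant_mask, eps=1e-12):
    """Assumed form: negative log of the attention mass on relevant positions.

    relevant_mask[i] is 1 if position i was annotated as relevant context
    (a hypothetical annotation format), else 0.
    """
    mass = sum(w for w, keep in zip(attention_weights, relevant_mask) if keep)
    return -math.log(mass + eps)


def combined_loss(candidate_logits, gold_index, attention_scores,
                  relevant_mask, attn_weight=0.5):
    """Cross entropy plus a weighted attention term (attn_weight is assumed)."""
    attention_weights = softmax(attention_scores)
    linking = cross_entropy(candidate_logits, gold_index)
    attention = attention_supervision_loss(attention_weights, relevant_mask)
    return linking + attn_weight * attention


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]   # scores for three candidate entities
    mask = [1, 0, 0]            # only the first context token is relevant
    focused = combined_loss(logits, 0, [3.0, 0.0, 0.0], mask)
    distracted = combined_loss(logits, 0, [0.0, 3.0, 0.0], mask)
    print(focused < distracted)
```

Under these assumptions, attention that concentrates on the annotated positions yields a lower total loss than attention that concentrates elsewhere, while the linking term is unchanged.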
Related papers
- Improving Long Context Document-Level Machine Translation [51.359400776242786]
Document-level context for neural machine translation (NMT) is crucial to improve translation consistency and cohesion.
Many works have been published on the topic of document-level NMT, but most restrict the system to just local context.
We propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption.
arXiv Detail & Related papers (2023-06-08T13:28:48Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Generative Entity-to-Entity Stance Detection with Knowledge Graph Augmentation [7.857310305816312]
Stance detection is typically framed as predicting the sentiment in a text towards a target entity.
In this paper, we emphasize the need for studying interactions among entities when inferring stances.
We first introduce a new task, entity-to-entity (E2E) stance detection, which primes models to identify entities in their canonical names and discern stances jointly.
arXiv Detail & Related papers (2022-11-02T20:16:42Z)
- RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour [4.393754160527062]
This paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection.
The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO).
arXiv Detail & Related papers (2022-05-05T14:43:31Z)
- Out of Context: A New Clue for Context Modeling of Aspect-based Sentiment Analysis [54.735400754548635]
ABSA aims to predict the sentiment expressed in a review with respect to a given aspect.
The given aspect should be considered as a new clue out of context in the context modeling process.
We design several aspect-aware context encoders based on different backbones.
arXiv Detail & Related papers (2021-06-21T02:26:03Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- RuREBus: a Case Study of Joint Named Entity Recognition and Relation Extraction from e-Government Domain [7.6462329126769815]
We showcase an application of information extraction methods, such as named entity recognition (NER) and relation extraction (RE), to a novel corpus consisting of documents issued by a state agency.
The main challenges of this corpus are: 1) the annotation scheme differs greatly from the one used for the general domain corpora, and 2) the documents are written in a language other than English.
arXiv Detail & Related papers (2020-10-29T20:56:15Z)
- Integrating Semantic and Structural Information with Graph Convolutional Network for Controversy Detection [15.578214777082104]
We propose a Topic-Post-Comment Graph Convolutional Network (TPC-GCN) for post-level controversy detection.
We extend our model to Disentangled TPC-GCN to disentangle topic-related and topic-unrelated features.
Our models can integrate both semantic and structural information with significant generalizability.
arXiv Detail & Related papers (2020-05-16T06:29:14Z)
- Generating Representative Headlines for News Stories [31.67864779497127]
Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption.
It remains a challenging research problem to efficiently and effectively generate a representative headline for each story.
We develop a distant supervision approach to train large-scale generation models without any human annotation.
arXiv Detail & Related papers (2020-01-26T02:08:22Z)