Named Entity Recognition for Social Media Texts with Semantic
Augmentation
- URL: http://arxiv.org/abs/2010.15458v1
- Date: Thu, 29 Oct 2020 10:06:46 GMT
- Authors: Yuyang Nie, Yuanhe Tian, Xiang Wan, Yan Song, and Bo Dai
- Abstract summary: Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
- Score: 70.44281443975554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for named entity recognition suffer from data sparsity
problems when conducted on short and informal texts, especially user-generated
social media content. Semantic augmentation is a potential way to alleviate
this problem. Given that rich semantic information is implicitly preserved in
pre-trained word embeddings, they are potentially ideal resources for semantic
augmentation. In this paper, we propose a neural-based approach to NER for
social media texts where both local (from running text) and augmented semantics
are taken into account. In particular, we obtain the augmented semantic
information from a large-scale corpus, and propose an attentive semantic
augmentation module and a gate module to encode and aggregate such information,
respectively. Extensive experiments are performed on three benchmark datasets
collected from English and Chinese social media platforms, where the results
demonstrate the superiority of our approach over previous studies across all
three datasets.
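The pipeline described in the abstract (retrieve augmented semantics from pre-trained embeddings, encode them with attention, then aggregate them with a gate) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption: cosine-similarity nearest neighbours as the source of augmented words, dot-product attention, an elementwise sigmoid gate with per-dimension weights, and the toy `EMBEDDINGS` table. All function names are hypothetical and this is not the authors' implementation.

```python
import math

# Toy 3-dimensional "pre-trained" embeddings (made-up values for illustration).
EMBEDDINGS = {
    "nyc": [0.9, 0.1, 0.0],
    "manhattan": [0.8, 0.2, 0.1],
    "brooklyn": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_words(word, embeddings, m=2):
    """Assumed retrieval step: the m words most similar to `word`."""
    target = embeddings[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embeddings.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:m]]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, neighbor_vecs):
    """Assumed attention: weight each neighbour by softmax(hidden . neighbour)."""
    weights = softmax([sum(h * e for h, e in zip(hidden, vec))
                       for vec in neighbor_vecs])
    return [sum(w * vec[d] for w, vec in zip(weights, neighbor_vecs))
            for d in range(len(hidden))]


def gate(hidden, augmented, w_h, w_v, bias):
    """Assumed gate: g = sigmoid(w_h*h + w_v*v + b); output = g*h + (1-g)*v."""
    fused = []
    for h, v, wh, wv, b in zip(hidden, augmented, w_h, w_v, bias):
        g = 1.0 / (1.0 + math.exp(-(wh * h + wv * v + b)))
        fused.append(g * h + (1.0 - g) * v)
    return fused


def augment(word, hidden, embeddings=EMBEDDINGS, m=2):
    """Retrieve neighbours, attend over them, and gate with the local state."""
    neighbors = nearest_words(word, embeddings, m)
    augmented = attend(hidden, [embeddings[n] for n in neighbors])
    dim = len(hidden)
    # Fixed gate parameters stand in for weights that would be learned.
    return neighbors, gate(hidden, augmented, [1.0] * dim, [1.0] * dim, [0.0] * dim)
```

Calling `augment("nyc", [0.5, 0.5, 0.5])` returns the retrieved neighbours together with a fused vector in which each dimension lies between the local hidden value and the attended neighbour value. In a trained model the hidden state would come from a context encoder and the gate weights would be learned.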
Related papers
- Semantic Communication Enhanced by Knowledge Graph Representation Learning [11.68356846628016]
This paper investigates the advantages of representing and processing semantic knowledge extracted into graphs within the emerging paradigm of semantic communications.
We propose sending semantic symbols solely equivalent to node embeddings through the wireless channel and inferring the complete knowledge graph at the receiver.
arXiv Detail & Related papers (2024-07-27T20:57:10Z)
- Self-Supervised Speech Representations are More Phonetic than Semantic [52.02626675137819]
Self-supervised speech models (S3Ms) have become an effective backbone for speech applications.
We seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms.
Our study reveals that S3M representations consistently and significantly exhibit more phonetic than semantic similarity.
arXiv Detail & Related papers (2024-06-12T20:04:44Z)
- SememeASR: Boosting Performance of End-to-End Speech Recognition against
Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge [58.979490858061745]
We introduce sememe-based semantic knowledge information to speech recognition.
Our experiments show that sememe information can improve the effectiveness of speech recognition.
In addition, our further experiments show that sememe knowledge can improve the model's recognition of long-tailed data.
arXiv Detail & Related papers (2023-09-04T08:35:05Z)
- Disentangling Learnable and Memorizable Data via Contrastive Learning
for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z)
- Performance Optimization for Semantic Communications: An Attention-based
Reinforcement Learning Approach [187.4094332217186]
A semantic communication framework is proposed for textual data transmission.
A metric of semantic similarity (MSS) that jointly captures the semantic accuracy and completeness of the recovered text is proposed.
arXiv Detail & Related papers (2022-08-17T11:39:16Z)
- Boosting Video-Text Retrieval with Explicit High-Level Semantics [115.66219386097295]
We propose a novel visual-linguistic aligning model named HiSE for VTR.
It improves the cross-modal representation by incorporating explicit high-level semantics.
Our method achieves superior performance over state-of-the-art methods on three benchmark datasets.
arXiv Detail & Related papers (2022-08-08T15:39:54Z)
- Regional Semantic Contrast and Aggregation for Weakly Supervised
Semantic Segmentation [25.231470587575238]
We propose regional semantic contrast and aggregation (RCA) for learning semantic segmentation.
RCA is equipped with a regional memory bank to store massive, diverse object patterns appearing in training data.
RCA earns a strong capability of fine-grained semantic understanding, and eventually establishes new state-of-the-art results on two popular benchmarks.
arXiv Detail & Related papers (2022-03-17T23:29:03Z)
- Data Expansion using Back Translation and Paraphrasing for Hate Speech
Detection [1.192436948211501]
We present a new deep learning-based method that fuses a Back Translation method and a Paraphrasing technique for data augmentation.
We evaluate our proposal on five publicly available datasets; namely, AskFm corpus, Formspring dataset, Warner and Waseem dataset, Olid, and Wikipedia toxic comments dataset.
arXiv Detail & Related papers (2021-05-25T09:52:42Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional
semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- A Framework for Pre-processing of Social Media Feeds based on Integrated
Local Knowledge Base [1.5749416770494706]
This paper proposes an improved framework for pre-processing of social media feeds for better performance.
The framework had an accuracy of 94.07% on a standardized dataset, and 99.78% on a localised dataset when used to extract sentiments from tweets.
arXiv Detail & Related papers (2020-06-29T07:56:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.