Unsupervised Named Entity Disambiguation for Low Resource Domains
- URL: http://arxiv.org/abs/2412.10054v1
- Date: Fri, 13 Dec 2024 11:35:00 GMT
- Title: Unsupervised Named Entity Disambiguation for Low Resource Domains
- Authors: Debarghya Datta, Soumajit Pramanik
- Abstract summary: We present an unsupervised approach leveraging the concept of Group Steiner Trees (GST).
GST can identify the most relevant candidates for entity disambiguation using the contextual similarities across candidate entities.
We outperform the state-of-the-art unsupervised methods by more than 40% on average in terms of Precision@1 across various domain-specific datasets.
- Score: 0.4297070083645049
- Abstract: In the ever-evolving landscape of natural language processing and information retrieval, the need for robust and domain-specific entity linking algorithms has become increasingly apparent. In many fields, such as the humanities, technical writing, and the biomedical sciences, it is crucial to enrich texts with semantics and discover more knowledge. The use of Named Entity Disambiguation (NED) in such domains requires handling noisy texts, low-resource settings, and domain-specific KBs. Existing approaches are mostly inappropriate for such scenarios, as they either depend on training data or are not flexible enough to work with domain-specific KBs. Thus, in this work, we present an unsupervised approach leveraging the concept of Group Steiner Trees (GST), which can identify the most relevant candidates for entity disambiguation using the contextual similarities across candidate entities for all the mentions present in a document. We outperform the state-of-the-art unsupervised methods by more than 40% on average in terms of Precision@1 across various domain-specific datasets.
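To make the GST idea concrete, here is a minimal sketch (not the authors' implementation) of how a Group Steiner Tree approximation can pick one candidate per mention: candidate entities become graph nodes, edge weights encode contextual dissimilarity, each mention's candidate set forms a terminal group, and a low-cost tree touching every group yields the disambiguation. The complete-graph construction, the `similarity` hook, and the shortest-path root heuristic are all assumptions for illustration.

```python
# A minimal sketch of GST-style candidate selection for unsupervised NED.
# Not the authors' implementation: graph construction, the `similarity`
# hook, and the root heuristic are assumptions.
import networkx as nx

def gst_disambiguate(candidate_groups, similarity):
    """candidate_groups: one list of candidate entity ids per mention.
    similarity(a, b) -> contextual similarity in [0, 1] (assumed given).
    Returns the chosen candidate for each mention."""
    G = nx.Graph()
    all_cands = {c for group in candidate_groups for c in group}
    G.add_nodes_from(all_cands)
    # Edge weight = dissimilarity, so a cheap tree spans coherent entities.
    for a in all_cands:
        for b in all_cands:
            if a < b:
                G.add_edge(a, b, weight=1.0 - similarity(a, b))
    best_cost, best_pick = float("inf"), None
    for root in G.nodes:  # try every node as the tree's root
        dist = nx.single_source_dijkstra_path_length(G, root, weight="weight")
        pick, cost = [], 0.0
        for group in candidate_groups:
            # Cheapest terminal in this mention's group reachable from the root.
            e = min(group, key=lambda c: dist.get(c, float("inf")))
            pick.append(e)
            cost += dist.get(e, float("inf"))
        if cost < best_cost:
            best_cost, best_pick = cost, pick
    return best_pick
```

Trying each node as a root is a standard polynomial-time surrogate for the exact Group Steiner Tree problem, which is NP-hard.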
Related papers
- Automated Collection of Evaluation Dataset for Semantic Search in Low-Resource Domain Language [4.5224851085910585]
Domain-specific languages that rely heavily on specialized terminology often fall into the category of low-resource languages.
This study addresses the challenge of automatically collecting test datasets to evaluate semantic search in a low-resource, domain-specific German language setting.
arXiv Detail & Related papers (2024-12-13T09:47:26Z)
- Few-Shot Domain Adaptation for Named-Entity Recognition via Joint Constrained k-Means and Subspace Selection [6.390468088226495]
We propose a weakly supervised algorithm that combines small labeled datasets with large amounts of unlabeled data.
This framework achieves state-of-the-art results in few-shot NER on several English datasets.
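As a rough illustration of the joint constrained k-means idea above (a hedged sketch, not the paper's algorithm): labeled mentions stay pinned to the cluster of their entity type while unlabeled mentions are re-assigned each iteration. The embeddings, iteration count, and initialization are placeholder assumptions, and the paper's subspace-selection step is omitted.

```python
# Hedged sketch of constrained k-means for few-shot NER: labeled points are
# fixed to their type's cluster; unlabeled points move freely each iteration.
import numpy as np

def constrained_kmeans(X_lab, y_lab, X_unlab, n_types, iters=20):
    """X_lab: (n_l, d) labeled embeddings; y_lab: (n_l,) type ids in [0, n_types);
    X_unlab: (n_u, d). Assumes at least one labeled example per type."""
    # One centroid per entity type, seeded from the labeled means.
    centroids = np.stack([X_lab[y_lab == t].mean(axis=0) for t in range(n_types)])
    for _ in range(iters):
        # Unlabeled points go to the nearest centroid; labeled assignments stay fixed.
        d2 = ((X_unlab[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        y_unlab = d2.argmin(axis=1)
        X_all = np.concatenate([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        centroids = np.stack([X_all[y_all == t].mean(axis=0) for t in range(n_types)])
    return y_unlab
```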
arXiv Detail & Related papers (2024-11-30T10:52:24Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
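An illustrative sketch of the pseudo-labeling step just described, under the assumption that enriched entities are matched to unlabeled sentences by surface form and turned into entailment-style training pairs; the hypothesis template is hypothetical, not SEType's actual formulation.

```python
# Sketch of pseudo-labeled entailment pair construction from enriched entities.
# The surface-form match and hypothesis template are assumptions.
def build_entailment_pairs(sentences, entities_by_type):
    """entities_by_type: {type name: [entity strings]} after seed enrichment."""
    pairs = []
    for sent in sentences:
        for etype, entities in entities_by_type.items():
            for ent in entities:
                if ent in sent:  # naive surface match as a stand-in
                    hypothesis = f"{ent} is a {etype}."  # hypothetical template
                    pairs.append((sent, hypothesis, "entailment"))
    return pairs
```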
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- The Fellowship of the Authors: Disambiguating Names from Social Network Context [2.3605348648054454]
Authority lists with extensive textual descriptions for each entity are lacking, and named entities are often ambiguous.
We combine BERT-based mention representations with a variety of graph induction strategies and experiment with supervised and unsupervised cluster inference methods.
We find that in-domain language model pretraining can significantly improve mention representations, especially for larger corpora.
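One simple instance of the graph-induction plus unsupervised cluster inference recipe above might look like the following sketch: threshold cosine similarities between mention embeddings and take connected components as entities. The threshold value and the component-based inference are assumptions, not the paper's exact strategy.

```python
# Sketch: induce a mention graph from embedding similarities and read entity
# clusters off the connected components. Threshold is an invented parameter.
import networkx as nx
import numpy as np

def cluster_mentions(embeddings, threshold=0.8):
    """embeddings: (n, d) array of BERT-based mention representations."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    G = nx.Graph()
    G.add_nodes_from(range(len(X)))
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if sim[i, j] >= threshold:
                G.add_edge(i, j)
    # Each connected component is treated as one disambiguated entity.
    return [sorted(c) for c in nx.connected_components(G)]
```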
arXiv Detail & Related papers (2022-08-31T21:51:55Z)
- Disentangled Unsupervised Image Translation via Restricted Information Flow [61.44666983942965]
Many state-of-the-art methods hard-code the desired shared-vs-specific split into their architecture.
We propose a new method that does not rely on inductive architectural biases.
We show that the proposed method achieves consistently high manipulation accuracy across two synthetic and one natural dataset.
arXiv Detail & Related papers (2021-11-26T00:27:54Z)
- Extracting Domain-specific Concepts from Large-scale Linked Open Data [0.0]
The proposed method defines search entities by linking the LOD vocabulary with terms related to the target domain.
The occurrences of common upper-level entities and the chain-of-path relationships are examined to determine the range of conceptual connections in the target domain.
arXiv Detail & Related papers (2021-11-22T10:25:57Z)
- Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains into a common latent space.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z)
- Streaming Self-Training via Domain-Agnostic Unlabeled Images [62.57647373581592]
We present streaming self-training (SST) that aims to democratize the process of learning visual recognition models.
Key to SST are two crucial observations: (1) domain-agnostic unlabeled images enable us to learn better models with a few labeled examples without any additional knowledge or supervision; and (2) learning is a continuous process and can be done by constructing a schedule of learning updates.
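A schematic of such a schedule of learning updates (an assumption-laden sketch, not the paper's code): the model is warm-started on the few labels, then alternates between pseudo-labeling a stream of domain-agnostic unlabeled data and retraining. `model.predict`, `train_fn`, and the confidence gate `tau` are placeholder hooks, not the paper's API.

```python
# Schematic self-training loop over a stream of domain-agnostic unlabeled data.
# All hooks are placeholders; only the overall schedule mirrors the summary.
def streaming_self_training(model, labeled, unlabeled_stream, train_fn,
                            rounds=5, tau=0.9):
    """model.predict(x) -> (label, confidence); train_fn(model, data) updates
    the model in place; unlabeled_stream() yields domain-agnostic examples."""
    train_fn(model, labeled)  # warm start on the few labeled examples
    for _ in range(rounds):  # one schedule step per round
        pseudo = []
        for x in unlabeled_stream():
            y, conf = model.predict(x)  # the model labels the stream itself
            if conf >= tau:  # keep only confident pseudo-labels
                pseudo.append((x, y))
        train_fn(model, labeled + pseudo)  # continuous learning update
    return model
```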
arXiv Detail & Related papers (2021-04-07T17:58:39Z)
- Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z)
- Domain-Transferable Method for Named Entity Recognition Task [0.6040938686276304]
This paper describes a method to learn a domain-specific NER model for an arbitrary set of named entities.
We assume that the supervision can be obtained with no human effort, and neural models can learn from each other.
arXiv Detail & Related papers (2020-11-24T15:45:52Z)
- Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
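For illustration, a hedged sketch of one such data selection method: fit a Gaussian mixture over sentence embeddings from a pretrained LM and keep corpus sentences that land in the dominant cluster of a small in-domain seed set. The embedding source, the mixture size, and the single-cluster selection rule are all assumptions.

```python
# Sketch of domain data selection via clustering of LM sentence embeddings.
# Mixture size and selection rule are invented for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

def select_in_domain(corpus_emb, seed_emb, n_domains=5):
    """corpus_emb: (N, d) embeddings to filter; seed_emb: (m, d) in-domain seeds."""
    gmm = GaussianMixture(n_components=n_domains, random_state=0)
    gmm.fit(np.vstack([corpus_emb, seed_emb]))
    target = np.bincount(gmm.predict(seed_emb)).argmax()  # dominant seed cluster
    return np.flatnonzero(gmm.predict(corpus_emb) == target)  # selected indices
```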
arXiv Detail & Related papers (2020-04-05T06:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.