A Zipf's Law-based Text Generation Approach for Addressing Imbalance in
Entity Extraction
- URL: http://arxiv.org/abs/2205.12636v3
- Date: Fri, 1 Sep 2023 00:09:09 GMT
- Title: A Zipf's Law-based Text Generation Approach for Addressing Imbalance in
Entity Extraction
- Authors: Zhenhua Wang, Ming Ren, Dong Gao, Zhuang Li
- Abstract summary: This paper proposes a novel approach that views the data imbalance problem quantitatively.
It recognizes that some entities are common while others are scarce, which is reflected in the quantifiable distribution of words.
Zipf's Law is well suited to this view, and to move from words to entities, words within the documents are classified as common or rare.
- Score: 19.55959053873699
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity extraction is critical to intelligent applications across diverse
domains. Its effectiveness, however, is limited by data imbalance. This paper
proposes a novel approach that views the problem quantitatively, recognizing
that some entities are common while others are scarce, and that this is
reflected in the quantifiable distribution of words. Zipf's Law is well suited
to this view: to move from words to entities, words within the documents are
classified as common or rare. Sentences are then likewise classified as common
or rare and processed by text generation models accordingly. Rare entities
within the generated sentences are labeled using human-designed rules and
added as a supplement to the raw dataset, thereby mitigating the imbalance
problem. The study presents a case of extracting entities from technical
documents, and experimental results on two datasets demonstrate the
effectiveness of the proposed method. Furthermore, the significance of Zipf's
Law in driving the progress of AI is discussed, broadening the reach and
coverage of Informetrics. This paper presents a successful demonstration of
extending Informetrics to interface with AI through Zipf's Law.
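The first two steps described in the abstract (splitting words, then sentences, into common and rare) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the frequency cutoff or the sentence-level rule, so the names `split_common_rare` and `classify_sentences` and the parameters `common_fraction` and `rare_ratio` are hypothetical choices made here. The text generation and rule-based labeling steps are not covered.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_common_rare(sentences, common_fraction=0.2):
    """Rank the vocabulary by frequency (the ordering Zipf's law describes)
    and treat the top `common_fraction` of ranked words as common.

    ASSUMPTION: the cutoff rule is hypothetical; the abstract gives none.
    """
    counts = Counter(w for s in sentences for w in tokenize(s))
    ranked = [w for w, _ in counts.most_common()]  # rank 1 = most frequent
    cutoff = max(1, int(len(ranked) * common_fraction))
    return set(ranked[:cutoff]), set(ranked[cutoff:])


def classify_sentences(sentences, rare_words, rare_ratio=0.5):
    """Label a sentence "rare" when at least `rare_ratio` of its tokens
    are rare words, otherwise "common".

    ASSUMPTION: this sentence-level rule is hypothetical as well.
    """
    labels = []
    for s in sentences:
        tokens = tokenize(s)
        n_rare = sum(t in rare_words for t in tokens)
        is_rare = bool(tokens) and n_rare / len(tokens) >= rare_ratio
        labels.append("rare" if is_rare else "common")
    return labels


if __name__ == "__main__":
    docs = [
        "the pump feeds the tank",
        "the valve feeds the tank",
        "the reactor overheats",
    ]
    common, rare = split_common_rare(docs, common_fraction=0.5)
    for sentence, label in zip(docs, classify_sentences(docs, rare, rare_ratio=0.3)):
        print(label, "|", sentence)
```

In a pipeline like the one the abstract outlines, the sentences labeled rare would presumably be the ones routed to a text generation model for augmentation; how the paper actually does this is not stated here.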
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- SparseCL: Sparse Contrastive Learning for Contradiction Retrieval [87.02936971689817]
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and cross-encoder models exhibit significant limitations.
We introduce SparseCL that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z)
- Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation [5.646219481667151]
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval [54.66399120084227]
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
arXiv Detail & Related papers (2023-05-18T15:43:09Z)
- Rhetorical Role Labeling of Legal Documents using Transformers and Graph Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform the task of rhetorical role labelling on Indian Court Judgements as part of SemEval Task 6: understanding legal texts, shared subtask A.
arXiv Detail & Related papers (2023-05-06T17:04:51Z)
- Elastic Weight Removal for Faithful and Abstractive Dialogue Generation [61.40951756070646]
A dialogue system should generate responses that are faithful to the knowledge contained in relevant documents.
Many models instead generate hallucinated responses that contradict this knowledge or contain unverifiable information.
We show that our method can be extended to simultaneously discourage hallucinations and extractive responses.
arXiv Detail & Related papers (2023-03-30T17:40:30Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization [39.7404761923196]
We present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence.
We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences.
arXiv Detail & Related papers (2020-06-10T02:42:38Z)
- An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [13.97006782398121]
The Bidirectional Encoder Representations from Transformers (BERT) model was proposed and has achieved record-breaking success on many natural language processing tasks.
We explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition.
We validate the effectiveness of our proposed method on a benchmark dataset.
arXiv Detail & Related papers (2020-06-01T18:27:48Z)
- Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction [20.644215991166902]
We propose a new framework that uses a hybrid attention-based Transformer block with multi-instance learning to perform the distant supervision relation extraction (DSRE) task.
The proposed approach can outperform the state-of-the-art algorithms on the evaluation dataset.
arXiv Detail & Related papers (2020-03-10T13:05:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.