Document-Level Definition Detection in Scholarly Documents: Existing
Models, Error Analyses, and Future Directions
- URL: http://arxiv.org/abs/2010.05129v1
- Date: Sun, 11 Oct 2020 01:16:10 GMT
- Title: Document-Level Definition Detection in Scholarly Documents: Existing
Models, Error Analyses, and Future Directions
- Authors: Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld,
Marti A. Hearst
- Abstract summary: We develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark.
HEDDEx outperforms the leading system on both the sentence-level and the document-level tasks, by 12.7 F1 points and 14.4 F1 points, respectively.
- Score: 40.64025648548128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of definition detection is important for scholarly papers, because
papers often make use of technical terminology that may be unfamiliar to
readers. Despite prior work on definition detection, current approaches are far
from being accurate enough to use in real-world applications. In this paper, we
first perform in-depth error analysis of the current best performing definition
detection system and discover major causes of errors. Based on this analysis,
we develop a new definition detection system, HEDDEx, that utilizes syntactic
features, transformer encoders, and heuristic filters, and evaluate it on a
standard sentence-level benchmark. Because current benchmarks evaluate randomly
sampled sentences, we propose an alternative evaluation that assesses every
sentence within a document. This allows for evaluating recall in addition to
precision. HEDDEx outperforms the leading system on both the sentence-level and
the document-level tasks, by 12.7 F1 points and 14.4 F1 points, respectively.
We note that performance on the high-recall document-level task is much lower
than in the standard evaluation approach, due to the need to incorporate
document structure as features. We discuss remaining challenges in
document-level definition detection, ideas for improvements, and potential
issues for the development of reading aid applications.
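To make the proposed document-level evaluation concrete, the following is a minimal sketch (our illustration, not the authors' released code; the function name and the boolean per-sentence label lists are assumptions) of scoring a definition detector over every sentence in a document, which is what makes recall measurable alongside precision:

```python
from typing import List, Tuple

def document_level_prf(gold: List[bool], predicted: List[bool]) -> Tuple[float, float, float]:
    """Score a definition detector over *every* sentence in a document.

    gold[i] / predicted[i] indicate whether sentence i contains a definition.
    Because no sentences are sampled away, false negatives remain visible,
    so recall can be measured in addition to precision.
    """
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: a 6-sentence document with 2 gold definition sentences.
gold = [False, True, False, False, True, False]
predicted = [False, True, True, False, False, False]
print(document_level_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Under this protocol, a detector tuned only for precision on sampled sentences is penalized for every definition sentence it misses, which is one reason document-level scores come out lower than the standard benchmark numbers.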
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- READoc: A Unified Benchmark for Realistic Document Structured Extraction [44.44722729958791]
We introduce a novel benchmark named READoc, which defines DSE as a realistic task.
The READoc dataset is derived from 2,233 diverse and real-world documents from arXiv and GitHub.
In addition, we develop a unified evaluation of state-of-the-art DSE approaches.
arXiv Detail & Related papers (2024-09-08T15:42:48Z)
- Magic Markup: Maintaining Document-External Markup with an LLM [1.0538052824177144]
We present a system that re-tags modified programs, enabling rich annotations to automatically follow code as it evolves.
Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag.
While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.
arXiv Detail & Related papers (2024-03-06T05:40:31Z)
- SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window [24.524282909076767]
We present a metric named SLIDE (SLIding Document Evaluator) which operates on blocks of sentences.
We find that SLIDE obtains significantly higher pairwise system accuracy than its sentence-level baseline.
This suggests that source context may provide the same information as a human reference in disambiguating source ambiguities.
arXiv Detail & Related papers (2023-09-16T01:30:58Z)
- DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z)
- End-to-End Page-Level Assessment of Handwritten Text Recognition [69.55992406968495]
HTR systems increasingly face the end-to-end page-level transcription of a document.
Standard metrics do not take into account the inconsistencies that might appear.
We propose a two-fold evaluation, where the transcription accuracy and the reading-order (RO) goodness are considered separately.
arXiv Detail & Related papers (2023-01-14T15:43:07Z)
- Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search [31.797257552928336]
We apply several pre-trained language models to the systematic review document ranking task.
An empirical analysis compares the effectiveness of neural methods with traditional methods on this task.
Our results show that BERT-based rankers outperform the current state-of-the-art screening prioritisation methods.
arXiv Detail & Related papers (2022-12-18T05:26:40Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
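As a pointer for readers who want to try document-level embeddings of this kind, the sketch below follows SPECTER's documented usage with the publicly released allenai/specter checkpoint on Hugging Face (title and abstract joined by the separator token, with the [CLS] vector taken as the paper embedding); the example paper dictionary and variable names are our own illustration.

```python
# Minimal sketch of obtaining a SPECTER document embedding
# (assumes the `torch` and `transformers` packages are installed).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

paper = {
    "title": "Document-Level Definition Detection in Scholarly Documents",
    "abstract": "The task of definition detection is important for scholarly papers...",
}

# SPECTER encodes title + [SEP] + abstract and uses the [CLS] token
# as the document-level representation.
text = paper["title"] + tokenizer.sep_token + paper["abstract"]
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]  # one vector per document
print(embedding.shape)
```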
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.