Semi-Supervised Joint Estimation of Word and Document Readability
- URL: http://arxiv.org/abs/2104.13103v1
- Date: Tue, 27 Apr 2021 10:56:47 GMT
- Title: Semi-Supervised Joint Estimation of Word and Document Readability
- Authors: Yoshinari Fujinuma, Masato Hagiwara
- Abstract summary: We propose to jointly estimate word and document difficulty through a graph convolutional network (GCN).
Our experimental results reveal that the GCN-based method can achieve higher accuracy than strong baselines, and stays robust even with a smaller amount of labeled data.
- Score: 6.34044741105807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Readability or difficulty estimation of words and documents has been
investigated independently in the literature, often assuming the existence of
extensive annotated resources for the other. Motivated by our analysis showing
that there is a recursive relationship between word and document difficulty, we
propose to jointly estimate word and document difficulty through a graph
convolutional network (GCN) in a semi-supervised fashion. Our experimental
results reveal that the GCN-based method can achieve higher accuracy than
strong baselines, and stays robust even with a smaller amount of labeled data.
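As a rough illustration of the idea in the abstract, the sketch below builds a toy word-document graph and applies a single graph convolution in the commonly used form ReLU(D^{-1/2} (A + I) D^{-1/2} H W). It is a minimal sketch under stated assumptions, not the authors' implementation: the graph construction (one edge per word occurring in a document), the one-hot node features, the fixed weights, and all function names (`build_graph`, `normalize`, `gcn_layer`) are hypothetical choices made for illustration. In a semi-supervised setting, the weights would be trained with a loss computed only on the labeled word and document nodes.

```python
import math

def build_graph(docs):
    """Build a symmetric word-document adjacency matrix with self-loops.

    Nodes 0..len(docs)-1 are documents; the remaining nodes are words.
    (Hypothetical construction: an edge links a document to each word it contains.)
    """
    vocab = sorted({w for d in docs for w in d.split()})
    word_id = {w: len(docs) + k for k, w in enumerate(vocab)}
    n = len(docs) + len(vocab)
    # Start from the identity so every node has a self-loop (A + I).
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for d_id, doc in enumerate(docs):
        for w in set(doc.split()):
            adj[d_id][word_id[w]] = 1.0
            adj[word_id[w]][d_id] = 1.0
    names = [f"doc{i}" for i in range(len(docs))] + vocab
    return names, adj

def normalize(adj):
    """Symmetric normalization: D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Toy corpus (assumption: whitespace tokenization, no labels or training).
docs = ["the cat sat", "the quantum cat"]
names, adj = build_graph(docs)
a_hat = normalize(adj)
n = len(names)

# One-hot node features and fixed illustrative weights (untrained).
features = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
weights = [[0.1 * (i + 1), 0.05 * (n - i)] for i in range(n)]

hidden = gcn_layer(a_hat, features, weights)
for name, row in zip(names, hidden):
    print(name, [round(v, 3) for v in row])
```

Because documents and words share one graph, each document's representation mixes in those of its words and vice versa, which is the recursive relationship the abstract refers to.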
Related papers
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de-facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
- Document-Level Relation Extraction with Sentences Importance Estimation and Focusing [52.069206266557266]
Document-level relation extraction (DocRE) aims to determine the relation between two entities from a document of multiple sentences.
We propose a Sentence Importance Estimation and Focusing (SIEF) framework for DocRE, where we design a sentence importance score and a sentence focusing loss.
Experimental results on two domains show that our SIEF not only improves overall performance, but also makes DocRE models more robust.
arXiv Detail & Related papers (2022-04-27T03:20:07Z)
- WMDecompose: A Framework for Leveraging the Interpretable Properties of Word Mover's Distance in Sociocultural Analysis [0.0]
One popular model that balances legibility and interpretability is Word Mover's Distance (WMD).
We introduce WMDecompose: a model and Python library that decomposes document-level distances into their constituent word-level distances, and subsequently clusters words to induce thematic elements.
arXiv Detail & Related papers (2021-10-14T13:04:38Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better than -- traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset, Semantic Scholar Network (SSN), which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
- SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling [25.299937353444854]
We present a proof-of-concept study of a weakly supervised summary evaluation approach that does not require reference summaries.
Massive data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries.
arXiv Detail & Related papers (2020-05-13T15:40:13Z)
- Reasoning with Latent Structure Refinement for Document-Level Relation Extraction [20.308845516900426]
We propose a novel model that empowers the relational reasoning across sentences by automatically inducing the latent document-level graph.
Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED).
arXiv Detail & Related papers (2020-05-13T13:36:09Z)
- A Framework for Evaluation of Machine Reading Comprehension Gold Standards [7.6250852763032375]
This paper proposes a unifying framework to investigate the present linguistic features, required reasoning and background knowledge and factual correctness.
Concerns include the absence of features that contribute towards lexical ambiguity, the varying factual correctness of the expected answers, and the presence of lexical cues, all of which potentially lower the reading comprehension complexity and quality of the evaluation data.
arXiv Detail & Related papers (2020-03-10T11:30:22Z)