ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set
of Simple Sentences
- URL: http://arxiv.org/abs/2106.12027v1
- Date: Tue, 22 Jun 2021 19:31:28 GMT
- Title: ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set
of Simple Sentences
- Authors: Yanjun Gao, Ting-hao (Kenneth) Huang, Rebecca J. Passonneau
- Abstract summary: We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source.
Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies.
We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition.
- Score: 7.639576741566091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Atomic clauses are fundamental text units for understanding complex
sentences. Identifying the atomic sentences within complex sentences is
important for applications such as summarization, argument mining, discourse
analysis, discourse parsing, and question answering. Previous work mainly
relies on rule-based methods dependent on parsing. We propose a new task to
decompose each complex sentence into simple sentences derived from the tensed
clauses in the source, and a novel problem formulation as a graph edit task.
Our neural model learns to Accept, Break, Copy or Drop elements of a graph that
combines word adjacency and grammatical dependencies. The full processing
pipeline includes modules for graph construction, graph editing, and sentence
generation from the output graph. We introduce DeSSE, a new dataset designed to
train and evaluate complex sentence decomposition, and MinWiki, a subset of
MinWikiSplit. ABCD achieves comparable performance as two parsing baselines on
MinWiki. On DeSSE, which has a more even balance of complex sentence types, our
model achieves higher accuracy on the number of atomic sentences than an
encoder-decoder baseline. Results include a detailed error analysis.
Related papers
- Conjunct Resolution in the Face of Verbal Omissions [51.220650412095665]
We propose a conjunct resolution task that operates directly on the text and makes use of a split-and-rephrase paradigm in order to recover the missing elements in the coordination structure.
We curate a large dataset, containing over 10K examples of naturally-occurring verbal omissions with crowd-sourced annotations.
We train various neural baselines for this task, and show that while our best method obtains decent performance, it leaves ample space for improvement.
arXiv Detail & Related papers (2023-05-26T08:44:02Z) - Syntactic Complexity Identification, Measurement, and Reduction Through
Controlled Syntactic Simplification [0.0]
We present a classical syntactic dependency-based approach to split and rephrase a compound and complex sentence into a set of simplified sentences.
The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity.
This work is accepted and presented in International workshop on Learning with Knowledge Graphs (IWLKG) at WSDM-2023 Conference.
arXiv Detail & Related papers (2023-04-16T13:13:58Z) - Grounded Graph Decoding Improves Compositional Generalization in
Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z) - Clustering and Network Analysis for the Embedding Spaces of Sentences
and Sub-Sentences [69.3939291118954]
This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
arXiv Detail & Related papers (2021-10-02T00:47:35Z) - Using BERT Encoding and Sentence-Level Language Model for Sentence
Ordering [0.9134244356393667]
We propose an algorithm for sentence ordering in a corpus of short stories.
Our proposed method uses a language model based on Universal Transformers (UT) that captures sentences' dependencies by employing an attention mechanism.
The proposed model includes three components: Sentence, Language Model, and Sentence Arrangement with Brute Force Search.
arXiv Detail & Related papers (2021-08-24T23:03:36Z) - Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, decide whether there exist any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z) - Inducing Alignment Structure with Gated Graph Attention Networks for
Sentence Matching [24.02847802702168]
This paper proposes a graph-based approach for sentence matching.
We represent a sentence pair as a graph with several carefully design strategies.
We then employ a novel gated graph attention network to encode the constructed graph for sentence matching.
arXiv Detail & Related papers (2020-10-15T11:25:54Z) - ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification
Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z) - CompLex: A New Corpus for Lexical Complexity Prediction from Likert
Scale Data [13.224233182417636]
This paper presents the first English dataset for continuous lexical complexity prediction.
We use a 5-point Likert scale scheme to annotate complex words in texts from three sources/domains: the Bible, Europarl, and biomedical texts.
arXiv Detail & Related papers (2020-03-16T03:54:22Z) - Fact-aware Sentence Split and Rephrase with Permutation Invariant
Training [93.66323661321113]
Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved.
Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs.
We introduce Permutation Training to verifies the effects of order variance in seq2seq learning for this task.
arXiv Detail & Related papers (2020-01-16T07:30:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.