Extraction of Templates from Phrases Using Sequence Binary Decision
Diagrams
- URL: http://arxiv.org/abs/2001.10175v1
- Date: Tue, 28 Jan 2020 05:30:53 GMT
- Title: Extraction of Templates from Phrases Using Sequence Binary Decision
Diagrams
- Authors: Daiki Hirano, Kumiko Tanaka-Ishii and Andrew Finch
- Abstract summary: This paper presents an unsupervised approach for extracting templates from only tagged text by using a novel relaxed variant of the Sequence Binary Decision Diagram (SeqBDD).
The main contribution of this paper is a relaxed form of the SeqBDD construction algorithm that enables it to form general representations from a small amount of data.
Experiments show that the method is capable of high-quality extraction on tasks based on verb+preposition templates from corpora and phrasal templates from short messages from social media.
- Score: 3.867363075280544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The extraction of templates such as "regard X as Y" from a set of related
phrases requires the identification of their internal structures. This paper
presents an unsupervised approach for extracting templates on-the-fly from only
tagged text by using a novel relaxed variant of the Sequence Binary Decision
Diagram (SeqBDD). A SeqBDD can compress a set of sequences into a graphical
structure equivalent to a minimal DFA, but more compact and better suited to
the task of template extraction. The main contribution of this paper is a
relaxed form of the SeqBDD construction algorithm that enables it to form
general representations from a small amount of data. The process of compressing
shared structures in the text during Relaxed SeqBDD construction naturally
induces the templates we wish to extract. Experiments show that the method is
capable of high-quality extraction on tasks based on verb+preposition templates
from corpora and phrasal templates from short messages from social media.
Related papers
- fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models [19.099810900404357]
We introduce fPLSA, a foundation-model-based Probabilistic Latent Semantic Analysis (PLSA) method.
PLSA iteratively clusters and tags document segments based on document-level contexts.
Our experiments on story writing, math, and multi-step reasoning datasets demonstrate that fPLSA tags help reconstruct the original texts better than existing tagging methods.
arXiv Detail & Related papers (2024-10-07T20:25:52Z)
- Detection and Measurement of Syntactic Templates in Generated Text [58.111650675717414]
We offer an analysis of syntactic features to characterize general repetition in models.
We find that models tend to produce templated text in downstream tasks at a higher rate than what is found in human-reference texts.
arXiv Detail & Related papers (2024-06-28T19:34:23Z)
- A Quality-based Syntactic Template Retriever for Syntactically-controlled Paraphrase Generation [67.98367574025797]
Existing syntactically-controlled paraphrase generation models perform promisingly with human-annotated or well-chosen syntactic templates.
The prohibitive cost makes it unfeasible to manually design decent templates for every source sentence.
We propose a novel Quality-based Syntactic Template Retriever (QSTR) to retrieve templates based on the quality of the to-be-generated paraphrases.
arXiv Detail & Related papers (2023-10-20T03:55:39Z)
- Diffusion Models for Open-Vocabulary Segmentation [79.02153797465324]
OVDiff is a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation.
It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training.
arXiv Detail & Related papers (2023-06-15T17:51:28Z)
- Iterative Document-level Information Extraction via Imitation Learning [32.012467653148846]
We present a novel iterative extraction model, IterX, for extracting complex relations.
Our imitation learning approach casts the problem as a Markov decision process (MDP)
It leads to state-of-the-art results on two established benchmarks.
arXiv Detail & Related papers (2022-10-12T21:46:04Z)
- Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts methods on perplexity (PPL) and on the sentiment accuracy of generated texts as measured by an external classifier.
At the same time, it is also easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z)
- PSG: Prompt-based Sequence Generation for Acronym Extraction [26.896811663334162]
We propose a Prompt-based Sequence Generation (PSG) method for the acronym extraction task.
Specifically, we design a template for prompting the extracted acronym texts with auto-regression.
A position extraction algorithm is designed for extracting the position of the generated answers.
arXiv Detail & Related papers (2021-11-29T02:14:38Z)
- Generating Synthetic Data for Task-Oriented Semantic Parsing with Hierarchical Representations [0.8203855808943658]
In this work, we explore the possibility of generating synthetic data for neural semantic parsing.
Specifically, we first extract masked templates from the existing labeled utterances, and then fine-tune BART to generate synthetic utterances conditioned on these templates.
We show the potential of our approach when evaluating on the Facebook TOP dataset for navigation domain.
arXiv Detail & Related papers (2020-11-03T22:55:40Z) - GRIT: Generative Role-filler Transformers for Document-level Event
Entity Extraction [134.5580003327839]
We introduce a generative transformer-based encoder-decoder framework (GRIT) to model context at the document level.
We evaluate our approach on the MUC-4 dataset, and show that our model performs substantially better than prior work.
arXiv Detail & Related papers (2020-08-21T01:07:36Z) - Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z) - Variational Template Machine for Data-to-Text Generation [37.03488881357614]
We claim that an open set of templates is crucial for enriching the phrase constructions and realizing varied generations.
This paper explores the problem of automatically learning reusable "templates" from paired and non-paired data.
We propose the variational template machine (VTM), a novel method to generate text descriptions from data tables.
arXiv Detail & Related papers (2020-02-04T04:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.