Sentence Compression as Deletion with Contextual Embeddings
- URL: http://arxiv.org/abs/2006.03210v1
- Date: Fri, 5 Jun 2020 02:40:46 GMT
- Title: Sentence Compression as Deletion with Contextual Embeddings
- Authors: Minh-Tien Nguyen and Bui Cong Minh and Dung Tien Le and Le Thai Linh
- Abstract summary: We exploit contextual embeddings that enable our model to capture the context of inputs.
Experimental results on a benchmark Google dataset show that by utilizing contextual embeddings, our model achieves a new state-of-the-art F-score.
- Score: 3.3263205689999444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentence compression is the task of creating a shorter version of an input
sentence while keeping important information. In this paper, we extend the task
of compression by deletion with the use of contextual embeddings. Unlike prior
work, which typically uses non-contextual embeddings (GloVe or Word2Vec), we
exploit contextual embeddings that enable our model to capture the context of
inputs. More precisely, we stack contextual embeddings with a bidirectional
Long Short-Term Memory (BiLSTM) network and Conditional Random Fields (CRF) to
handle sequence labeling. Experimental results on a benchmark Google dataset
show that by utilizing contextual embeddings, our model achieves a new
state-of-the-art F-score compared to strong methods reported on the
leaderboard.
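As a rough illustration of the deletion-as-sequence-labeling setup described in the abstract, the sketch below (not the authors' code) feeds pre-computed contextual token embeddings through a BiLSTM and predicts a keep/delete tag per token; the paper's CRF output layer is simplified here to an independent per-token classifier to keep the example self-contained.

```python
# Minimal sketch (not the authors' implementation) of deletion-based
# compression as sequence labeling: contextual embeddings -> BiLSTM ->
# per-token keep/delete scores.
import torch
import torch.nn as nn

class DeletionTagger(nn.Module):
    def __init__(self, emb_dim=768, hidden=256, num_tags=2):
        super().__init__()
        # emb_dim matches the contextual encoder (e.g. 768 for BERT-base),
        # whose token vectors are computed upstream and passed to forward().
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, num_tags)  # tag 0 = delete, tag 1 = keep

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, emb_dim) contextual token vectors
        states, _ = self.lstm(embeddings)
        return self.proj(states)  # (batch, seq_len, num_tags) emission scores

# Toy usage: random vectors stand in for real contextual embeddings.
model = DeletionTagger()
sentence = torch.randn(1, 10, 768)      # one sentence of 10 tokens
logits = model(sentence)
keep = logits.argmax(dim=-1)            # 1 = token kept in the compression
print(keep)
```

Compression then amounts to outputting exactly the tokens tagged as "keep".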
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models [5.330795983408874]
We introduce a novel method called late chunking, which leverages long context embedding models to first embed all tokens of the long text.
The resulting chunk embeddings capture the full contextual information, leading to superior results across various retrieval tasks.
arXiv Detail & Related papers (2024-09-07T03:54:46Z)
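A minimal sketch of the late-chunking idea above, under the assumption that the long-context encoder's token embeddings are already available: chunk vectors are obtained by pooling the contextualized token embeddings over each chunk's span, rather than re-encoding each chunk in isolation.

```python
# Minimal sketch (assumed, not the paper's code) of late chunking: embed the
# whole document once, then pool the already-contextualized token embeddings
# inside each chunk's token span.
import torch

def late_chunk(token_embeddings: torch.Tensor, spans: list[tuple[int, int]]) -> torch.Tensor:
    """token_embeddings: (seq_len, dim) output of a long-context encoder.
    spans: [(start, end), ...] token-index ranges defining the chunks."""
    return torch.stack([token_embeddings[s:e].mean(dim=0) for s, e in spans])

# Toy usage: 1000 "tokens" from a 128-dim encoder, split into four chunks.
doc = torch.randn(1000, 128)
chunks = late_chunk(doc, [(0, 250), (250, 500), (500, 750), (750, 1000)])
print(chunks.shape)  # torch.Size([4, 128])
```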
- Fine-grained Controllable Text Generation through In-context Learning with Feedback [57.396980277089135]
We present a method for rewriting an input sentence to match specific values of nontrivial linguistic features, such as dependency depth.
In contrast to earlier work, our method uses in-context learning rather than finetuning, making it applicable in use cases where data is sparse.
arXiv Detail & Related papers (2024-06-17T08:55:48Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Text Ranking and Classification using Data Compression [1.332560004325655]
We propose a language-agnostic approach to text categorization.
We use the Zstandard compressor and strengthen these ideas in several ways, calling the resulting technique Zest.
We show that Zest complements and can compete with language-specific multidimensional content embeddings in production, but cannot outperform other counting methods on public datasets.
arXiv Detail & Related papers (2021-09-23T18:13:17Z)
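To make the compression-based categorization idea above concrete, here is a hedged sketch of nearest-neighbor classification by normalized compression distance; it is not the Zest implementation, and Python's standard zlib compressor stands in for the Zstandard compressor the paper uses.

```python
# Minimal sketch (not Zest) of compression-based text categorization using
# normalized compression distance (NCD); zlib stands in for Zstandard.
import zlib

def clen(text: str) -> int:
    # Compressed length as a rough proxy for information content.
    return len(zlib.compress(text.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Texts that share structure compress better together than apart.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, labeled: list[tuple[str, str]]) -> str:
    # Nearest neighbor over (text, label) examples by compression distance.
    return min(labeled, key=lambda ex: ncd(query, ex[0]))[1]

examples = [("the match ended with a late goal by the striker", "sports"),
            ("the new phone ships with a faster chip and camera", "tech")]
print(classify("the keeper saved a penalty before the final goal", examples))
```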
- A Condense-then-Select Strategy for Text Summarization [53.10242552203694]
We propose a novel condense-then-select framework for text summarization.
Our framework helps to avoid the loss of salient information, while preserving the high efficiency of sentence-level compression.
arXiv Detail & Related papers (2021-06-19T10:33:10Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.