Boosting the Performance of Transformer Architectures for Semantic
Textual Similarity
- URL: http://arxiv.org/abs/2306.00708v1
- Date: Thu, 1 Jun 2023 14:16:53 GMT
- Title: Boosting the Performance of Transformer Architectures for Semantic
Textual Similarity
- Authors: Ivan Rep, Vladimir Čeperić
- Abstract summary: We fine-tune transformer architectures for semantic textual similarity on the Semantic Textual Similarity Benchmark.
We experiment with BERT, RoBERTa, and DeBERTaV3 cross-encoders by approaching the problem as a binary classification task or a regression task.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic textual similarity is the task of estimating the similarity between
the meaning of two texts. In this paper, we fine-tune transformer architectures
for semantic textual similarity on the Semantic Textual Similarity Benchmark by
tuning the model partially and then end-to-end. We experiment with BERT,
RoBERTa, and DeBERTaV3 cross-encoders by approaching the problem as a binary
classification task or a regression task. We combine the outputs of the
transformer models and use handmade features as inputs for boosting algorithms.
Because test set results worsened even as validation set results improved, we
experiment with different dataset splits to investigate this discrepancy
further. We also provide an error analysis focused on the edges of the
prediction range.
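The pipeline described in the abstract (cross-encoder outputs combined with handmade features as inputs to a boosting algorithm) can be sketched at the feature level. The feature choices and the model scores below are illustrative assumptions, not the authors' implementation; each resulting row would be one input to a gradient-boosting regressor.

```python
# Hedged sketch: stack transformer similarity scores with simple
# handmade lexical features. Feature choices here are assumptions
# for illustration only.

def handmade_features(s1: str, s2: str) -> list[float]:
    """Simple lexical-overlap features for a sentence pair."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    union = t1 | t2
    jaccard = len(t1 & t2) / len(union) if union else 1.0
    len_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    return [jaccard, len_ratio]

def build_features(model_scores: list[float], s1: str, s2: str) -> list[float]:
    """One feature row: transformer outputs plus handmade features."""
    return list(model_scores) + handmade_features(s1, s2)

# Hypothetical BERT / RoBERTa / DeBERTaV3 similarity outputs:
row = build_features([0.82, 0.79, 0.85],
                     "A man is playing a guitar.",
                     "A person plays a guitar.")
```

A boosting regressor (e.g. gradient boosting over such rows) would then be fit against the gold similarity scores.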
Related papers
- Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval [68.61855682218298]
Cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts.
Inspired by recent advances of Transformers in vision tasks, we propose to unify the encoder architectures with Transformers for both modalities.
We design a cross-modal retrieval framework purely based on two-stream Transformers, dubbed Hierarchical Alignment Transformers (HAT), which consists of an image Transformer, a text Transformer, and a hierarchical alignment module.
arXiv Detail & Related papers (2023-08-08T15:43:59Z)
- BERT-Based Combination of Convolutional and Recurrent Neural Network for Indonesian Sentiment Analysis [0.0]
This research extends previous hybrid deep learning approaches by using BERT representations for Indonesian sentiment analysis.
Our simulation shows that the BERT representation improves the accuracies of all hybrid architectures.
arXiv Detail & Related papers (2022-11-10T00:32:40Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- A Cognitive Study on Semantic Similarity Analysis of Large Corpora: A Transformer-based Approach [0.0]
We perform semantic similarity analysis and modeling on the U.S. Patent Phrase to Phrase Matching dataset using both traditional and transformer-based techniques.
The experimental results demonstrate our methodology's enhanced performance compared to traditional techniques, with an average Pearson correlation score of 0.79.
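Pearson correlation, the metric reported above (and the standard metric for semantic textual similarity benchmarks), measures linear agreement between predicted and gold similarity scores; a minimal sketch:

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between predicted and gold scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # 1.0 = perfect linear agreement
```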
arXiv Detail & Related papers (2022-07-24T11:06:56Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attention for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Transformer Models for Text Coherence Assessment [14.132559978971377]
Coherence is an important aspect of text quality and is crucial for ensuring its readability.
Previous work has leveraged entity-based methods, syntactic patterns, discourse relations, and, more recently, traditional deep learning architectures for text coherence assessment.
We propose four different Transformer-based architectures for the task: vanilla Transformer, hierarchical Transformer, multi-task learning-based model, and a model with fact-based input representation.
arXiv Detail & Related papers (2021-09-05T22:27:17Z)
- Neural String Edit Distance [77.72325513792981]
We propose the neural string edit distance model for string-pair classification and sequence generation.
We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function.
We show that we can trade off between performance and interpretability in a single framework.
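The hard, non-differentiable recurrence that such a model relaxes is the classical Levenshtein dynamic program; a minimal sketch (the paper's neural variant replaces the hard min with a learned, differentiable aggregation over the same three operations):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard DP recurrence."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match/substitution
    return d[m][n]
```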
arXiv Detail & Related papers (2021-04-16T22:16:47Z)
- Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, so they suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer network based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function to equally penalize misalignments in both sentences, ensuring the alignments are bidirectional.
The model achieves F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.