Non-Contextual BERT or FastText? A Comparative Analysis
- URL: http://arxiv.org/abs/2411.17661v3
- Date: Thu, 20 Feb 2025 14:41:23 GMT
- Title: Non-Contextual BERT or FastText? A Comparative Analysis
- Authors: Abhay Shanbhag, Suramya Jadhav, Amogh Thakurdesai, Ridhima Sinare, Raviraj Joshi,
- Abstract summary: We analyze the effectiveness of non-contextual embeddings from BERT models and FastText models for tasks such as news classification, sentiment analysis, and hate speech detection.<n>Our findings indicate that non-contextual BERT embeddings extracted from the model's first embedding layer outperform FastText embeddings.
- Score: 0.4194295877935868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP) for low-resource languages, which lack large annotated datasets, faces significant challenges due to limited high-quality data and linguistic resources. The selection of embeddings plays a critical role in achieving strong performance in NLP tasks. While contextual BERT embeddings require a full forward pass, non-contextual BERT embeddings rely only on table lookup. Existing research has primarily focused on contextual BERT embeddings, leaving non-contextual embeddings largely unexplored. In this study, we analyze the effectiveness of non-contextual embeddings from BERT models (MuRIL and MahaBERT) and FastText models (IndicFT and MahaFT) for tasks such as news classification, sentiment analysis, and hate speech detection in one such low-resource language Marathi. We compare these embeddings with their contextual and compressed variants. Our findings indicate that non-contextual BERT embeddings extracted from the model's first embedding layer outperform FastText embeddings, presenting a promising alternative for low-resource NLP.
Related papers
- Resource-Efficient Adaptation of Large Language Models for Text Embeddings via Prompt Engineering and Contrastive Fine-tuning [3.9914181590063884]
Large Language Models (LLMs) have become a cornerstone in Natural Language Processing (NLP)<n>We explore several adaptation strategies for pre-trained, decoder-only LLMs.
arXiv Detail & Related papers (2025-07-30T14:49:30Z) - Chain-of-Translation Prompting (CoTR): A Novel Prompting Technique for Low Resource Languages [0.4499833362998489]
Chain of Translation Prompting (CoTR) is a novel strategy designed to enhance the performance of language models in low-resource languages.
CoTR restructures prompts to first translate the input context from a low-resource language into a higher-resource language, such as English.
We demonstrate the effectiveness of this method through a case study on the low-resource Indic language Marathi.
arXiv Detail & Related papers (2024-09-06T17:15:17Z) - A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline post-filtering with IndicSBERT.
arXiv Detail & Related papers (2024-09-04T13:49:45Z) - Enhancing Depressive Post Detection in Bangla: A Comparative Study of TF-IDF, BERT and FastText Embeddings [0.0]
This study introduces a well-grounded approach to identify depressive social media posts in Bangla.
The dataset used in this work, annotated by domain experts, includes both depressive and non-depressive posts.
To address the issue of class imbalance, we utilised random oversampling for the minority class.
arXiv Detail & Related papers (2024-07-12T11:40:17Z) - MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer.
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z) - Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z) - Visually-augmented pretrained language models for NLP tasks without
images [77.74849855049523]
Existing solutions often rely on explicit images for visual knowledge augmentation.
We propose a novel textbfVisually-textbfAugmented fine-tuning approach.
Our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales.
arXiv Detail & Related papers (2022-12-15T16:13:25Z) - BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives [0.0]
BERT has revolutionized the NLP field by enabling transfer learning with large language models.
This article studies how to better cope with the different embeddings provided by the BERT output layer and the usage of language-specific instead of multilingual models.
arXiv Detail & Related papers (2022-01-10T15:05:05Z) - ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in the F1 score in the FolhaUOL dataset.
arXiv Detail & Related papers (2022-01-04T20:08:17Z) - Knowledge-Rich BERT Embeddings for Readability Assessment [0.0]
We propose an alternative way of utilizing the information-rich embeddings of BERT models through a joint-learning method.
Results show that the proposed method outperforms classical approaches in readability assessment using English and Filipino datasets.
arXiv Detail & Related papers (2021-06-15T07:37:48Z) - Evaluation of BERT and ALBERT Sentence Embedding Performance on
Downstream NLP Tasks [4.955649816620742]
This paper explores on sentence embedding models for BERT and ALBERT.
We take a modified BERT network with siamese and triplet network structures called Sentence-BERT (SBERT) and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT)
arXiv Detail & Related papers (2021-01-26T09:14:06Z) - ERICA: Improving Entity and Relation Understanding for Pre-trained
Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA in pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z) - GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight
Gated Injection Method [29.352569563032056]
We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into a pre-trained BERT.
Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model.
arXiv Detail & Related papers (2020-10-23T17:00:26Z) - Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [1.4998865865537996]
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate this to further increase NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z) - Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z) - Table Search Using a Deep Contextualized Language Model [20.041167804194707]
In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval.
We propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT.
arXiv Detail & Related papers (2020-05-19T04:18:04Z) - Coreferential Reasoning Learning for Language Representation [88.14248323659267]
We present CorefBERT, a novel language representation model that can capture the coreferential relations in context.
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks.
arXiv Detail & Related papers (2020-04-15T03:57:45Z) - Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.