HeRo: RoBERTa and Longformer Hebrew Language Models
- URL: http://arxiv.org/abs/2304.11077v1
- Date: Tue, 18 Apr 2023 05:56:32 GMT
- Title: HeRo: RoBERTa and Longformer Hebrew Language Models
- Authors: Vitaly Shalumov and Harel Haskey
- Abstract summary: We provide a state-of-the-art pre-trained language model, HeRo, for standard-length inputs and an efficient transformer, LongHeRo, for long input sequences.
The HeRo model was evaluated on sentiment analysis, named entity recognition, and question answering tasks.
The LongHeRo model was evaluated on a document classification task with a dataset composed of long documents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we fill an existing gap in the resources available to the
Hebrew NLP community by providing it with the largest pre-training dataset to date,
HeDC4, a state-of-the-art pre-trained language model, HeRo, for standard-length
inputs, and an efficient transformer, LongHeRo, for long input sequences. The HeRo
model was evaluated on sentiment analysis, named entity recognition, and question
answering tasks, while the LongHeRo model was evaluated on a document classification
task with a dataset composed of long documents. Both HeRo and LongHeRo achieved
state-of-the-art performance. The dataset and model checkpoints used in this work
are publicly available.
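Since the abstract states that the model checkpoints are publicly available, below is a minimal usage sketch with the Hugging Face transformers library. It assumes the checkpoints are hosted on the Hugging Face Hub; the repository identifiers HeNLP/HeRo and HeNLP/LongHeRo are assumptions for illustration and should be checked against the actual release.

```python
# Minimal usage sketch. Assumption: the released checkpoints are hosted on the
# Hugging Face Hub under HeNLP/HeRo and HeNLP/LongHeRo (verify the real names).
from transformers import AutoModel, AutoTokenizer

# HeRo: RoBERTa-style encoder for standard-length Hebrew inputs.
hero_tokenizer = AutoTokenizer.from_pretrained("HeNLP/HeRo")
hero_model = AutoModel.from_pretrained("HeNLP/HeRo")

inputs = hero_tokenizer("שלום עולם", return_tensors="pt")
outputs = hero_model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)

# LongHeRo: Longformer-style encoder for long Hebrew documents.
long_tokenizer = AutoTokenizer.from_pretrained("HeNLP/LongHeRo")
long_model = AutoModel.from_pretrained("HeNLP/LongHeRo")
```

For the downstream evaluations mentioned in the abstract (sentiment analysis, named entity recognition, question answering, and document classification), these encoders would typically be fine-tuned with task-specific heads, e.g. via AutoModelForSequenceClassification or AutoModelForTokenClassification.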
Related papers
- How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z)
- Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows.
We propose a novel long-context benchmark, Loong, aligned with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z)
- LongEmbed: Extending Embedding Models for Long Context Retrieval [87.60404151086715]
This paper explores context window extension of embedding models, pushing the limit to 32k without requiring additional training.
First, we examine the performance of current embedding models for long context retrieval on our newly constructed LongEmbed benchmark.
Experiments show that training-free context window extension strategies such as position interpolation can effectively extend the context window of existing embedding models by several folds.
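As a rough sketch of what such a training-free strategy does (position interpolation in its generic form, not necessarily the exact procedure of the LongEmbed paper), position indices of an over-length input are rescaled back into the positional range seen during pre-training:

```python
import torch

def interpolate_position_ids(seq_len: int, pretrain_ctx: int = 512) -> torch.Tensor:
    """Position interpolation sketch: squeeze the positions of a long sequence
    back into the [0, pretrain_ctx) range the encoder was trained on."""
    positions = torch.arange(seq_len, dtype=torch.float32)
    if seq_len <= pretrain_ctx:
        return positions
    return positions * (pretrain_ctx / seq_len)

# A 4096-token input is mapped onto the model's original 512-position range.
print(interpolate_position_ids(4096)[-1])  # tensor(511.8750)
```

Because the rescaling happens at inference time, no additional training is required, which is what makes such strategies attractive for existing embedding models.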
arXiv Detail & Related papers (2024-04-18T11:29:23Z)
- BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models [141.21603469555225]
Large language models (LLMs) have achieved dramatic proficiency on NLP tasks of normal length.
We propose BAMBOO, a multi-task long context benchmark.
It consists of 10 datasets from 5 different long text understanding tasks.
arXiv Detail & Related papers (2023-09-23T11:36:15Z)
- Lost in the Middle: How Language Models Use Long Contexts [88.78803442320246]
We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts.
We find that performance can degrade significantly when changing the position of relevant information.
Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
arXiv Detail & Related papers (2023-07-06T17:54:11Z)
- Leveraging BERT Language Model for Arabic Long Document Classification [0.47138177023764655]
We propose two models for classifying long Arabic documents.
Both of our models outperform Longformer and RoBERT on this task across two different datasets.
arXiv Detail & Related papers (2023-05-04T13:56:32Z)
- Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers [20.10172411803626]
We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token-level.
We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets.
arXiv Detail & Related papers (2023-03-14T15:45:35Z)
- LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization [19.301567079372436]
Text Summarization is a popular task and an active area of research for the Natural Language Processing community.
All publicly available summarization datasets only provide plain text content.
We present LoRaLay, a collection of datasets for long-range summarization with accompanying visual/layout information.
arXiv Detail & Related papers (2023-01-26T18:50:54Z)
- Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Longformer: The Long-Document Transformer [40.18988262517733]
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length.
We introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention.
arXiv Detail & Related papers (2020-04-10T17:54:09Z)
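To make the local-plus-global attention pattern described above concrete, here is a small sketch using the Hugging Face implementation of Longformer, the architecture LongHeRo builds on. The allenai/longformer-base-4096 checkpoint is the publicly released English one, and placing global attention only on the first token is a common default rather than a task-specific recipe.

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# A long document, far beyond the 512-token limit of standard BERT/RoBERTa encoders.
text = "A very long document ... " * 300
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Sparse attention: sliding-window (local) attention everywhere,
# with full (global) attention only on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```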