Keeping in Time: Adding Temporal Context to Sentiment Analysis Models
- URL: http://arxiv.org/abs/2309.13562v1
- Date: Sun, 24 Sep 2023 06:38:21 GMT
- Title: Keeping in Time: Adding Temporal Context to Sentiment Analysis Models
- Authors: Dean Ninalga
- Abstract summary: This paper presents a state-of-the-art solution to the LongEval CLEF 2023 Lab Task 2: LongEval-Classification.
The goal of this task is to improve and preserve the performance of sentiment analysis models across shorter and longer time periods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a state-of-the-art solution to the LongEval CLEF 2023 Lab
Task 2: LongEval-Classification. The goal of this task is to improve and
preserve the performance of sentiment analysis models across shorter and longer
time periods. Our framework feeds date-prefixed textual inputs to a pre-trained
language model, where the timestamp is included in the text. We show that
date-prefixed samples better condition model outputs on the temporal context
of the respective texts. Moreover, we further boost performance by performing
self-labeling on unlabeled data to train a student model. We augment the
self-labeling process using a novel augmentation strategy leveraging the
date-prefixed formatting of our samples. We demonstrate concrete performance
gains on the LongEval-Classification evaluation set over non-augmented
self-labeling. Our framework achieves a 2nd place ranking with an overall score
of 0.6923 and reports the best Relative Performance Drop (RPD) of -0.0656 over
the short evaluation set.
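A minimal sketch of the date-prefixing idea is given below, assuming a Hugging Face backbone; the backbone name, date format, and the swap_date augmentation helper are illustrative assumptions for this sketch, not the authors' released code.

```python
# Minimal sketch of date-prefixed sentiment inputs (not the authors' released code).
# The backbone name, date format, and swap_date helper are illustrative assumptions.
import random
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # placeholder pre-trained language model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def date_prefix(text: str, date: str) -> str:
    """Prepend the sample's timestamp so the model can condition on temporal context."""
    return f"{date} {text}"

def swap_date(text: str, candidate_dates: list[str]) -> str:
    """Illustrative augmentation: re-prefix the same text with a different date,
    in the spirit of the paper's date-prefix-based augmentation for self-labeling."""
    return date_prefix(text, random.choice(candidate_dates))

sample = date_prefix("the new update is fantastic", "2016-03")
inputs = tokenizer(sample, return_tensors="pt", truncation=True)
logits = model(**inputs).logits  # sentiment logits conditioned on the date prefix
```

Because the timestamp costs only a few input tokens, this formatting requires no architectural changes and layers directly onto standard fine-tuning and the teacher-student self-labeling loop described above. The reported RPD of -0.0656 is a relative measure: values closer to zero indicate that less performance is lost as the evaluation data moves further in time from the training data.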
Related papers
- Improving Embedding Accuracy for Document Retrieval Using Entity Relationship Maps and Model-Aware Contrastive Sampling [0.0]
APEX-Embedding-7B is a 7-billion parameter decoder-only text Feature Extraction Model.
Our approach employs two training techniques that yield an emergent improvement in factual focus.
Based on our evaluations, our model establishes a new state-of-the-art standard in text feature extraction for longer context document retrieval tasks.
arXiv Detail & Related papers (2024-10-08T17:36:48Z) - Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
arXiv Detail & Related papers (2024-06-07T04:52:46Z) - Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models [81.27391252152199]
Large language models (LLMs) have achieved impressive performance across various natural language benchmarks.
We propose to automate dataset updating and provide systematic analysis regarding its effectiveness.
There are two updating strategies: 1) a mimicking strategy that generates similar samples based on the original data, and 2) an extending strategy that further expands existing samples.
arXiv Detail & Related papers (2024-02-19T07:15:59Z) - Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z) - Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z) - Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers [18.367109894193486]
Performance of text classification models can drop over time when new data to be classified is more distant in time from the data used for training.
This raises important research questions on the design of text classification models intended to persist over time.
We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years.
arXiv Detail & Related papers (2022-05-11T12:21:14Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - Opinions are Made to be Changed: Temporally Adaptive Stance Classification [9.061088449712859]
We introduce two novel large-scale, longitudinal stance datasets.
We evaluate the performance persistence of stance classifiers over time and demonstrate how it decays as the temporal gap between training and testing data increases.
We propose and compare several approaches to embedding adaptation and find that the Incremental Temporal Alignment (ITA) model leads to the best results in reducing performance drop over time.
arXiv Detail & Related papers (2021-08-27T19:47:31Z) - A Closer Look at Temporal Sentence Grounding in Videos: Datasets and Metrics [70.45937234489044]
We re-organize two widely-used TSGV datasets (Charades-STA and ActivityNet Captions) to make the test split different from the training split.
We introduce a new evaluation metric "dR@$n$,IoU@$m$" to calibrate the basic IoU scores.
All the results demonstrate that the re-organized datasets and new metric can better monitor the progress in TSGV.
arXiv Detail & Related papers (2021-01-22T09:59:30Z) - Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, written in the same style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.