CXP949 at WNUT-2020 Task 2: Extracting Informative COVID-19 Tweets --
RoBERTa Ensembles and The Continued Relevance of Handcrafted Features
- URL: http://arxiv.org/abs/2010.07988v1
- Date: Thu, 15 Oct 2020 19:12:52 GMT
- Authors: Calum Perrio and Harish Tayyar Madabushi
- Abstract summary: This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text.
We explore improving the performance of a pre-trained language model fine-tuned for text classification through an ensemble implementation.
We show that inclusion of additional features can improve classification results and achieve a score within 2 points of the top performing team.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our submission to Task 2 of the Workshop on Noisy
User-generated Text. We explore improving the performance of a pre-trained
transformer-based language model fine-tuned for text classification through an
ensemble implementation that makes use of corpus level information and a
handcrafted feature. We test the effectiveness of including the aforementioned
features in accommodating the challenges of a noisy data set centred on a
specific subject outside the remit of the pre-training data. We show that
inclusion of additional features can improve classification results and achieve
a score within 2 points of the top performing team.
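The abstract describes two ingredients layered on top of a fine-tuned transformer: an ensemble over multiple model runs and a handcrafted, corpus-level feature. A minimal sketch of both ideas follows; the keyword list, label names, and voting scheme are illustrative assumptions, not the authors' exact implementation:

```python
from collections import Counter

# Hypothetical keyword set -- one possible corpus-level handcrafted
# signal for spotting informative COVID-19 tweets (illustrative only).
INFORMATIVE_KEYWORDS = {"cases", "deaths", "confirmed", "positive", "tested"}

def handcrafted_feature(tweet: str) -> int:
    """Count keyword hits in a tweet; the count could be appended to
    the model's input representation as an extra feature."""
    return sum(1 for tok in tweet.lower().split() if tok in INFORMATIVE_KEYWORDS)

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Ensemble per-tweet labels from several fine-tuned models by
    simple majority vote (one common ensembling choice)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Example: three models vote on two tweets.
preds = [
    ["INFORMATIVE", "UNINFORMATIVE"],
    ["INFORMATIVE", "INFORMATIVE"],
    ["UNINFORMATIVE", "INFORMATIVE"],
]
print(majority_vote(preds))  # -> ['INFORMATIVE', 'INFORMATIVE']
print(handcrafted_feature("10 new confirmed cases reported"))  # -> 2
```

In practice the per-model predictions would come from separately fine-tuned RoBERTa checkpoints; the sketch only shows how the ensemble and feature combine downstream.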
Related papers
- Proposal Report for the 2nd SciCAP Competition 2024 [20.58804817441756]
We propose a method for document summarization using auxiliary information.
Our experiments demonstrate that leveraging high-quality OCR data enables efficient summarization of the content related to described objects.
Our method achieved top scores of 4.33 and 4.66 in the long caption and short caption tracks, respectively, of the 2024 SciCAP competition.
arXiv Detail & Related papers (2024-07-02T02:42:29Z)
- Nullpointer at ArAIEval Shared Task: Arabic Propagandist Technique Detection with Token-to-Word Mapping in Sequence Tagging [0.0]
This paper investigates the optimization of propaganda technique detection in Arabic text, including tweets and news paragraphs, from ArAIEval shared task 1.
Experimental results show relying on the first token of the word for technique prediction produces the best performance.
arXiv Detail & Related papers (2024-07-01T15:15:24Z)
- TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit [60.31175803899285]
We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings.
TAROT targets semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks to constrain the acquired semantic information at each level.
arXiv Detail & Related papers (2024-01-15T07:57:58Z)
- Information Type Classification with Contrastive Task-Specialized Sentence Encoders [8.301569507291006]
We propose the use of contrastive task-specialized sentence encoders for downstream classification.
We show performance gains w.r.t. F1-score on the CrisisLex, HumAID, and TrecIS information type classification tasks.
arXiv Detail & Related papers (2023-12-18T08:45:39Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs [54.48467003509595]
ChatGPT has shown superior performance in various natural language processing (NLP) tasks.
We propose a novel framework that leverages the power of ChatGPT for specific tasks, such as text classification.
Our method provides a more transparent decision-making process compared with previous text classification methods.
arXiv Detail & Related papers (2023-05-03T19:57:43Z)
- DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection [4.371388370559826]
Our system, called DisCoDisCo, enhances contextualized word embeddings with hand-crafted features.
Results on relation classification suggest strong performance on the new 2021 benchmark.
A partial evaluation of multiple pre-trained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction task are optimal for relation classification.
arXiv Detail & Related papers (2021-09-20T18:11:05Z)
- Focused Attention Improves Document-Grounded Generation [111.42360617630669]
Document grounded generation is the task of using the information provided in a document to improve text generation.
This work focuses on two different document grounded generation tasks: Wikipedia Update Generation task and Dialogue response generation.
arXiv Detail & Related papers (2021-04-26T16:56:29Z)
- On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition [2.294014185517203]
We use wav2vec and camemBERT as self-supervised learned models to represent our data in order to perform continuous emotion recognition from speech.
To the authors' knowledge, this is the first study to show that jointly using wav2vec and BERT-like pre-trained features is effective for continuous speech emotion recognition (SER).
arXiv Detail & Related papers (2020-11-18T11:10:29Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.