Related papers: Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement

Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement

URL: http://arxiv.org/abs/2601.19360v1
Date: Tue, 27 Jan 2026 08:42:54 GMT
Title: Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement
Authors: Diego Rossini, Lonneke van der Plas,
Abstract summary: We present a comprehensive approach for multiword expression (MWE) identification that combines binary token-level classification, linguistic feature integration, and data augmentation.<n>Our DeBERTa-v3-large model achieves 69.8% F1 on the CoAM dataset, surpassing the best results (Qwen-72B, 57.8% F1) on this dataset by 12 points while using 165x fewer parameters.
Score: 1.8429656136522097
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a comprehensive approach for multiword expression (MWE) identification that combines binary token-level classification, linguistic feature integration, and data augmentation. Our DeBERTa-v3-large model achieves 69.8% F1 on the CoAM dataset, surpassing the best results (Qwen-72B, 57.8% F1) on this dataset by 12 points while using 165x fewer parameters. We achieve this performance by (1) reformulating detection as binary token-level START/END/INSIDE classification rather than span-based prediction, (2) incorporating NP chunking and dependency features that help discontinuous and NOUN-type MWEs identification, and (3) applying oversampling that addresses severe class imbalance in the training data. We confirm the generalization of our method on the STREUSLE dataset, achieving 78.9% F1. These results demonstrate that carefully designed smaller models can substantially outperform LLMs on structured NLP tasks, with important implications for resource-constrained deployments.

Related papers

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches [0.0]
We explore strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints.<n>Two approaches are investigated: (1) attaching a classification head to a pre-trained causal LLM and fine-tuning on the task, and (2) instruction-tuning the LLM in a prompt->response format for classification.
arXiv Detail & Related papers (2025-12-14T13:02:06Z)
CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks.<n>We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity.<n>We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z)
An Evaluation Study of Hybrid Methods for Multilingual PII Detection [0.026059379504241156]
We present RECAP, a framework that combines deterministic regular expressions with context-aware large language models (LLMs) for scalable PII detection.<n>Our system outperforms fine-tuned NER models by 82% and zero-shot LLMs by 17% in weighted F1-score.<n>This work offers a scalable and adaptable solution for efficient PII detection in compliance-focused applications.
arXiv Detail & Related papers (2025-10-08T21:03:59Z)
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.<n>Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.<n>We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs)<n>We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks [7.72751543977484]
This work investigates the effectiveness of BERT-based contextual embeddings in active learning (AL) tasks on cold-start scenarios. Our primary contribution is the proposal of a more robust fine-tuning pipeline - DoTCAL. Our evaluation contrasts BERT-based embeddings with other prevalent text representation paradigms, including Bag of Words (BoW), Latent Semantic Indexing (LSI) and FastText.
arXiv Detail & Related papers (2024-07-24T13:50:21Z)
Low-resource classification of mobility functioning information in clinical sentences using large language models [0.0]
This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes. We collect a balanced binary classification dataset of 1000 sentences from the Mobility NER dataset, which was curated from n2c2 clinical notes.
arXiv Detail & Related papers (2023-12-15T20:59:17Z)
Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification. Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks. More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks. adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. In our ablations we see that this approach leads to efficient models, that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding [35.92571138322246]
Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks. We present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks. Our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark.
arXiv Detail & Related papers (2022-02-09T16:02:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.