Your Next State-of-the-Art Could Come from Another Domain: A Cross-Domain Analysis of Hierarchical Text Classification
- URL: http://arxiv.org/abs/2412.12744v1
- Date: Tue, 17 Dec 2024 10:08:57 GMT
- Title: Your Next State-of-the-Art Could Come from Another Domain: A Cross-Domain Analysis of Hierarchical Text Classification
- Authors: Nan Li, Bo Kang, Tijl De Bie
- Abstract summary: Text classification with hierarchical labels is a prevalent and challenging task in natural language processing.
We provide the first comprehensive cross-domain overview with empirical analysis of state-of-the-art methods.
- Score: 13.210645250173997
- Abstract: Text classification with hierarchical labels is a prevalent and challenging task in natural language processing. Examples include assigning ICD codes to patient records, tagging patents into IPC classes, assigning EUROVOC descriptors to European legal texts, and more. Despite its widespread applications, a comprehensive understanding of state-of-the-art methods across different domains has been lacking. In this paper, we provide the first comprehensive cross-domain overview with empirical analysis of state-of-the-art methods. We propose a unified framework that positions each method within a common structure to facilitate research. Our empirical analysis yields key insights and guidelines, confirming the necessity of learning across different research areas to design effective methods. Notably, under our unified evaluation pipeline, we achieved new state-of-the-art results by applying techniques beyond their original domains.
Related papers
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolution network (GCN)-based approaches have gained a lot of traction in this domain over the last decade.
This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision.
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques [3.197435100145382]
Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP).
Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels.
With the advent of Pretrained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well.
arXiv Detail & Related papers (2024-05-20T04:31:04Z)
- Mind Your Neighbours: Leveraging Analogous Instances for Rhetorical Role Labeling for Legal Documents [1.2562034805037443]
This study introduces novel techniques to enhance Rhetorical Role Labeling (RRL) performance.
For inference-based methods, we explore techniques that bolster label predictions without re-training.
For training-based methods, we integrate learning with our novel discourse-aware contrastive method, which works directly on embedding spaces.
arXiv Detail & Related papers (2024-03-31T08:10:45Z)
- Cross-domain Chinese Sentence Pattern Parsing [67.1381983012038]
Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching.
Existing SPSs rely heavily on textbook corpora for training, lacking cross-domain capability.
This paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.
arXiv Detail & Related papers (2024-02-26T05:30:48Z)
- Bidirectional Generative Framework for Cross-domain Aspect-based Sentiment Analysis [68.742820522137]
Cross-domain aspect-based sentiment analysis (ABSA) aims to perform various fine-grained sentiment analysis tasks on a target domain by transferring knowledge from a source domain.
We propose a unified bidirectional generative framework to tackle various cross-domain ABSA tasks.
Our framework trains a generative model in both text-to-label and label-to-text directions.
arXiv Detail & Related papers (2023-05-16T15:02:23Z)
- A Robust Contrastive Alignment Method For Multi-Domain Text Classification [21.35729884948437]
Multi-domain text classification can automatically classify texts in various scenarios.
Current advanced methods use the private-shared paradigm, capturing domain-shared features by a shared encoder, and training a private encoder for each domain to extract domain-specific features.
We propose a robust contrastive alignment method to align text classification features of various domains in the same feature space by supervised contrastive learning.
arXiv Detail & Related papers (2022-04-26T07:34:24Z)
- Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts.
Editorial assistance often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z)
- A Simple Information-Based Approach to Unsupervised Domain-Adaptive Aspect-Based Sentiment Analysis [58.124424775536326]
We propose a simple but effective technique based on mutual information to extract their term.
Experiment results show that our proposed method outperforms the state-of-the-art methods for cross-domain ABSA by 4.32% Micro-F1.
arXiv Detail & Related papers (2022-01-29T10:18:07Z)
- Structured Latent Embeddings for Recognizing Unseen Classes in Unseen Domains [108.11746235308046]
We propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains.
Our experiments on the challenging DomainNet and DomainNet-LS benchmarks show the superiority of our approach over existing methods.
arXiv Detail & Related papers (2021-07-12T17:57:46Z)
- Multifaceted Context Representation using Dual Attention for Ontology Alignment [6.445605125467574]
Ontology alignment is an important research problem with applications in fields such as data integration, data transfer, and data preparation.
We propose VeeAlign, a Deep Learning based model that uses a dual-attention mechanism to compute the contextualized representation of a concept in order to learn alignments.
We validate our approach on various datasets from different domains and in multilingual settings, and show its superior performance over SOTA methods.
arXiv Detail & Related papers (2020-10-16T18:28:38Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.