An Exploration of Encoder-Decoder Approaches to Multi-Label
Classification for Legal and Biomedical Text
- URL: http://arxiv.org/abs/2305.05627v1
- Date: Tue, 9 May 2023 17:13:53 GMT
- Title: An Exploration of Encoder-Decoder Approaches to Multi-Label
Classification for Legal and Biomedical Text
- Authors: Yova Kementchedjhieva and Ilias Chalkidis
- Abstract summary: We compare four methods for multi-label classification, two based on an encoder only, and two based on an encoder-decoder.
Our results show that encoder-decoder methods outperform encoder-only methods, with a growing advantage on more complex datasets.
- Score: 20.100081284294973
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Standard methods for multi-label text classification largely rely on
encoder-only pre-trained language models, whereas encoder-decoder models have
proven more effective in other classification tasks. In this study, we compare
four methods for multi-label classification, two based on an encoder only, and
two based on an encoder-decoder. We carry out experiments on four datasets
(two in the legal domain and two in the biomedical domain, each with two
levels of label granularity), always starting from the same pre-trained
model, T5. Our
results show that encoder-decoder methods outperform encoder-only methods, with
a growing advantage on more complex datasets and labeling schemes of finer
granularity. Using encoder-decoder models in a non-autoregressive fashion, in
particular, yields the best performance overall, so we further study this
approach through ablations to better understand its strengths.
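To make the best-performing setup concrete, below is a minimal PyTorch/Transformers sketch of one plausible non-autoregressive instantiation: the decoder receives one learned query embedding per label in a single parallel pass, and each decoder position emits a sigmoid logit for its label. The label count, label queries, classification head, and loss are illustrative assumptions, not necessarily the authors' exact design.

```python
# Hedged sketch: non-autoregressive multi-label classification with T5.
# Assumptions (not from the paper): one decoder position per label, learned
# label queries, and BCE-with-logits training.
import torch
from torch import nn
from transformers import AutoTokenizer, T5Model

NUM_LABELS = 20  # hypothetical label-set size

class NonAutoregressiveT5Classifier(nn.Module):
    def __init__(self, model_name="t5-base", num_labels=NUM_LABELS):
        super().__init__()
        self.t5 = T5Model.from_pretrained(model_name)
        # One learned query embedding per label, fed to the decoder in parallel.
        self.label_queries = nn.Embedding(num_labels, self.t5.config.d_model)
        self.head = nn.Linear(self.t5.config.d_model, 1)
        self.num_labels = num_labels

    def forward(self, input_ids, attention_mask):
        batch = input_ids.size(0)
        positions = torch.arange(self.num_labels, device=input_ids.device)
        queries = self.label_queries(positions).unsqueeze(0).expand(batch, -1, -1)
        out = self.t5(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_inputs_embeds=queries,  # one parallel decoder pass, no generation loop
        )
        # One logit per label position; apply sigmoid at inference time.
        return self.head(out.last_hidden_state).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = NonAutoregressiveT5Classifier()
enc = tokenizer(["a sample legal document"], return_tensors="pt")
logits = model(enc.input_ids, enc.attention_mask)  # shape: (1, NUM_LABELS)
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros_like(logits))  # dummy targets
```

Note that T5's decoder applies a causal mask internally, so label positions are not fully order-independent; the paper's ablations may treat such design choices differently.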
Related papers
- How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? [99.87554379608224]
The cross-modal similarity score distribution of the cross-encoder is more concentrated, while that of the dual-encoder is nearly normal.
Only the relative order between hard negatives conveys valid knowledge, while the order of easy negatives carries little significance.
We propose a novel Contrastive Partial Ranking Distillation (CPRD) method, which uses contrastive learning to mimic the relative order among hard negative samples (a rough sketch follows this entry).
arXiv Detail & Related papers (2024-07-10T09:10:01Z)
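A hypothetical sketch of the hard-negative ranking idea from the entry above (the function name and loss form are assumptions, not the paper's exact formulation): the student is trained to reproduce the teacher's ordering of the top-ranked hard negatives via contrastive softmax terms, while easy negatives are dropped entirely.

```python
import torch

def partial_ranking_distillation_loss(student_scores, teacher_scores, num_hard):
    """Hypothetical sketch: keep only the teacher's top-`num_hard` negatives
    and train the student to reproduce their relative order; easy negatives
    contribute nothing, mirroring the observation in the entry above."""
    # The teacher picks which negatives are "hard" and in what order.
    order = torch.argsort(teacher_scores, descending=True)[:num_hard]
    s = student_scores[order]  # student scores, arranged in the teacher's ranking
    loss = 0.0
    # For each rank i, a contrastive (softmax) term against lower-ranked items.
    for i in range(len(s) - 1):
        logits = s[i:]  # item i versus everything the teacher ranked below it
        loss = loss - torch.log_softmax(logits, dim=0)[0]
    return loss / max(len(s) - 1, 1)

student = torch.randn(100, requires_grad=True)  # similarity scores for 100 negatives
teacher = torch.randn(100)
partial_ranking_distillation_loss(student, teacher, num_hard=8).backward()
```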
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs).
We propose a simple yet effective idea to improve the performance of VAEs for this task.
In our experiments, the proposed VAE model performs particularly well at generating samples from out-of-domain distributions.
arXiv Detail & Related papers (2022-08-23T03:56:30Z)
- ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference [70.36083572306839]
This paper proposes a new training and inference paradigm for re-ranking.
We finetune a pretrained encoder-decoder model on the task of document-to-query generation (a scoring sketch follows this entry).
We show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference.
arXiv Detail & Related papers (2022-04-25T06:26:29Z)
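Before the decoder-only decomposition, the re-ranking recipe in the ED2LM entry amounts to scoring each document by the likelihood the generation model assigns to the query. A hedged sketch with a stock T5 checkpoint; in the paper the model would first be finetuned on document-to-query pairs, and the decomposition itself is omitted here.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

def rerank_score(document: str, query: str) -> float:
    """Log-likelihood of the query given the document; higher = more relevant."""
    enc = tok(document, return_tensors="pt", truncation=True)
    labels = tok(query, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)  # cross-entropy over query tokens
    return -out.loss.item() * labels.size(1)  # un-average to a summed log-prob

docs = ["T5 is an encoder-decoder model.", "Pandas is a data analysis library."]
query = "what architecture does T5 use"
ranked = sorted(docs, key=lambda d: rerank_score(d, query), reverse=True)
```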
- Does Configuration Encoding Matter in Learning Software Performance? An Empirical Study on Encoding Schemes [5.781900408390438]
The study covers five systems, seven models, and three encoding schemes, leading to 105 cases of investigation.
We empirically compared the widely used encoding schemes for software performance learning, namely label, scaled label, and one-hot encoding (illustrated in the sketch after this entry).
Our key findings reveal that: (1) conducting trial-and-error to find the best encoding scheme on a case-by-case basis can be rather expensive, requiring up to 400+ hours on some models and systems; (2) one-hot encoding often leads to the most accurate results, while scaled label encoding is generally weak on accuracy across different models; (3) conversely, scaled label encoding tends to…
arXiv Detail & Related papers (2022-03-30T01:46:27Z)
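The three encoding schemes compared above are easy to pin down. A small sketch for a hypothetical configuration option (the study's actual preprocessing pipelines may differ):

```python
import numpy as np

values = ["gzip", "lz4", "zstd"]  # hypothetical categorical config option
index = {v: i for i, v in enumerate(values)}

def label_encode(v):
    # Label encoding: one integer per category.
    return index[v]  # gzip -> 0, lz4 -> 1, zstd -> 2

def scaled_label_encode(v):
    # Scaled label encoding: the integer normalized to [0, 1].
    return index[v] / (len(values) - 1)  # gzip -> 0.0, lz4 -> 0.5, zstd -> 1.0

def one_hot_encode(v):
    # One-hot encoding: a 0/1 indicator vector per category.
    vec = np.zeros(len(values))
    vec[index[v]] = 1.0
    return vec  # lz4 -> [0., 1., 0.]
```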
- LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval [117.15862403330121]
We propose LoopITR, which combines dual encoders and cross encoders in the same network for joint learning.
Specifically, we let the dual encoder provide hard negatives to the cross encoder, and use the more discriminative cross encoder to distill its predictions back to the dual encoder.
arXiv Detail & Related papers (2022-03-10T16:41:12Z)
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations [22.40667024030858]
Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient.
Cross-encoders can leverage their attention heads to exploit inter-sentence interactions for better performance.
Trans-Encoder combines the two learning paradigms into an iterative joint framework to simultaneously learn enhanced bi- and cross-encoders (both scoring styles are sketched after this entry).
arXiv Detail & Related papers (2021-09-27T14:06:47Z)
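The bi-encoder/cross-encoder contrast in the Trans-Encoder entry can be sketched as follows (the model choice and mean pooling are illustrative, and the paper's iterative distillation loop is not shown): a bi-encoder compares two independently computed, cacheable embeddings, while a cross-encoder encodes the pair jointly so attention heads can model inter-sentence interactions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def bi_encoder_score(a: str, b: str) -> float:
    # Bi-encoder: encode each sentence independently, then compare.
    # Embeddings are cacheable, so large-scale retrieval stays cheap.
    embs = []
    for text in (a, b):
        out = model(**tok(text, return_tensors="pt"))
        embs.append(out.last_hidden_state.mean(dim=1))  # mean pooling (one common choice)
    return torch.cosine_similarity(embs[0], embs[1]).item()

@torch.no_grad()
def cross_encoder_repr(a: str, b: str) -> torch.Tensor:
    # Cross-encoder: encode the pair jointly so attention spans both sentences;
    # a trained scoring head (omitted here) would map this to a similarity.
    out = model(**tok(a, b, return_tensors="pt"))
    return out.last_hidden_state[:, 0]  # [CLS] representation of the pair
```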
- Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding [59.48857453699463]
In sequence-to-sequence learning, the decoder relies on the attention mechanism to efficiently extract information from the encoder.
Recent work has proposed using representations from different encoder layers for diversified levels of information.
We propose layer-wise multi-view decoding: for each decoder layer, the representations from the last encoder layer serve as a global view and are supplemented with those from other encoder layers for a stereoscopic view of the source sequence (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-05-16T20:00:39Z)
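A minimal sketch of the layer-wise multi-view idea for a single decoder layer (the residual combination rule and dimensions are assumptions; the paper's exact formulation may differ): cross-attention consumes the last encoder layer as the global view plus one other encoder layer as a supplementary view.

```python
import torch
from torch import nn

class MultiViewCrossAttention(nn.Module):
    """Sketch: a decoder layer attends to the last encoder layer (global view)
    and to one intermediate encoder layer (supplementary view)."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn_global = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_view = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_states, enc_last, enc_layer_i):
        g, _ = self.attn_global(dec_states, enc_last, enc_last)
        v, _ = self.attn_view(dec_states, enc_layer_i, enc_layer_i)
        return dec_states + g + v  # residual combination (one simple choice)

layer = MultiViewCrossAttention()
dec = torch.randn(2, 7, 512)       # (batch, target_len, d_model)
enc_last = torch.randn(2, 11, 512)  # last encoder layer
enc_mid = torch.randn(2, 11, 512)   # some intermediate encoder layer
out = layer(dec, enc_last, enc_mid)  # (2, 7, 512)
```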
- Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models [20.81248613653279]
We propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder models.
A three-stage training scheme based on three levels of architectural granularity, namely the character encoder, the byte pair encoding (BPE) based encoder, and the attention decoder, is proposed.
Our models achieve word error rates (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively.
arXiv Detail & Related papers (2019-12-28T02:29:33Z)