Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label
Text Classification
- URL: http://arxiv.org/abs/2110.00685v1
- Date: Fri, 1 Oct 2021 23:43:29 GMT
- Title: Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label
Text Classification
- Authors: Jiong Zhang, Wei-cheng Chang, Hsiang-fu Yu, Inderjit S. Dhillon
- Abstract summary: Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input.
Transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvements over other XMC methods.
- Score: 54.26205045417422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extreme multi-label text classification (XMC) seeks to find relevant labels
from an extremely large label collection for a given text input. Many real-world
applications can be formulated as XMC problems, such as recommendation systems,
document tagging, and semantic search. Recently, transformer-based XMC methods,
such as X-Transformer and LightXML, have shown significant improvements over
other XMC methods. Despite leveraging pre-trained transformer models for text
representation, fine-tuning these models on a large label space still requires
lengthy computation even with powerful GPUs. In this paper, we propose a novel
recursive approach, XR-Transformer, to accelerate the procedure by recursively
fine-tuning transformer models on a series of
multi-resolution objectives related to the original XMC objective function.
Empirical results show that XR-Transformer takes significantly less training
time compared to other transformer-based XMC models while achieving new
state-of-the-art results. In particular, on the public Amazon-3M dataset with 3
million labels, XR-Transformer is not only 20x faster than X-Transformer but
also improves the Precision@1 from 51% to 54%.
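To make the multi-resolution idea concrete, here is a rough sketch of how coarser training targets could be derived from a label hierarchy. It is an illustration only, not the authors' released XR-Transformer/PECOS code: the function names, the use of k-means over label embeddings, and the branching factor are all assumptions made for the example.
```python
# Hypothetical sketch of multi-resolution target construction for XMC
# fine-tuning; names and hyperparameters are illustrative, not the paper's API.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans

def build_label_resolutions(label_emb, branch=16, depth=3, seed=0):
    """Cluster label embeddings (e.g. aggregated instance features) at several
    granularities; return one label-to-cluster assignment per resolution,
    from coarse (few clusters) to fine (many clusters)."""
    n_labels = label_emb.shape[0]
    assignments = []
    for t in range(1, depth + 1):
        k = min(branch ** t, n_labels)
        km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(label_emb)
        assignments.append(km.labels_)
    return assignments

def coarsen_targets(Y, label_to_cluster):
    """Aggregate a sparse 0/1 instance-by-label matrix Y into an
    instance-by-cluster matrix: an instance is positive for a cluster if it
    is positive for any label assigned to that cluster."""
    n_labels = label_to_cluster.shape[0]
    n_clusters = int(label_to_cluster.max()) + 1
    C = csr_matrix(
        (np.ones(n_labels), (np.arange(n_labels), label_to_cluster)),
        shape=(n_labels, n_clusters),
    )
    Yc = Y @ C                          # positive-label counts per cluster
    Yc.data = np.ones_like(Yc.data)     # binarize
    return Yc

# Training would fine-tune the transformer encoder on the coarsest targets
# first, warm-start each finer resolution from the previous checkpoint, and
# finish on the full label space.
```
The intuition, following the abstract, is that most fine-tuning steps then optimize much smaller, coarser objectives before the model ever touches the full multi-million-label output layer.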
Related papers
- TransVG++: End-to-End Visual Grounding with Language Conditioned Vision
Transformer [188.00681648113223]
We explore neat yet effective Transformer-based frameworks for visual grounding.
TransVG establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box coordinates.
We upgrade our framework to a purely Transformer-based one by leveraging Vision Transformer (ViT) for vision feature encoding.
arXiv Detail & Related papers (2022-06-14T06:27:38Z)
- Extreme Zero-Shot Learning for Extreme Text Classification [80.95271050744624]
Extreme Zero-Shot XMC (EZ-XMC) and Few-Shot XMC (FS-XMC) are investigated.
We propose to pre-train Transformer-based encoders with self-supervised contrastive losses.
We develop a pre-training method MACLR, which thoroughly leverages the raw text with techniques including Multi-scale Adaptive Clustering, Label Regularization, and self-training with pseudo positive pairs.
arXiv Detail & Related papers (2021-12-16T06:06:42Z)
- Sparse is Enough in Scaling Transformers [12.561317511514469]
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study become out of reach.
We propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer.
arXiv Detail & Related papers (2021-11-24T19:53:46Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- FNet: Mixing Tokens with Fourier Transforms [0.578717214982749]
We show that Transformer encoder architectures can be massively sped up with limited accuracy costs.
We replace the self-attention sublayers with simple linear transformations that "mix" input tokens.
The resulting model, which we name FNet, scales very efficiently to long inputs; a minimal sketch of this Fourier token-mixing idea appears after this list.
arXiv Detail & Related papers (2021-05-09T03:32:48Z)
- Long Range Arena: A Benchmark for Efficient Transformers [115.1654897514089]
The Long-Range Arena benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens.
We systematically evaluate ten well-established long-range Transformer models on our newly proposed benchmark suite.
arXiv Detail & Related papers (2020-11-08T15:53:56Z)
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
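For the FNet entry above, here is a minimal sketch of the token-mixing idea as its summary describes it: the self-attention sublayer is replaced by a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. This is an illustration under that reading of the abstract, not the FNet authors' released code; the shapes and toy input are arbitrary.
```python
# Minimal sketch of FNet-style token mixing: replace self-attention with a
# parameter-free 2D Fourier transform over the sequence and hidden
# dimensions, keeping the real part.
import numpy as np

def fourier_mixing(x):
    """x: array of shape (seq_len, hidden_dim); returns the real part of its
    2D DFT, same shape, so every output position mixes all input tokens."""
    return np.real(np.fft.fft2(x))

# Toy usage on a random "embedding" matrix for a 128-token input.
x = np.random.randn(128, 64).astype(np.float32)
mixed = fourier_mixing(x)
assert mixed.shape == x.shape
# In a full encoder block this output would feed a residual connection,
# layer normalization, and a position-wise feed-forward sublayer, as in a
# standard Transformer encoder.
```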
This list is automatically generated from the titles and abstracts of the papers on this site.