UniTRec: A Unified Text-to-Text Transformer and Joint Contrastive
Learning Framework for Text-based Recommendation
- URL: http://arxiv.org/abs/2305.15756v1
- Date: Thu, 25 May 2023 06:11:31 GMT
- Title: UniTRec: A Unified Text-to-Text Transformer and Joint Contrastive
Learning Framework for Text-based Recommendation
- Authors: Zhiming Mao, Huimin Wang, Yiming Du and Kam-fai Wong
- Abstract summary: Prior studies have shown that pretrained language models (PLMs) can boost the performance of text-based recommendation.
We propose a unified local- and global-attention Transformer encoder to better model two-level contexts of user history.
Our framework, UniTRec, unifies the contrastive objectives of discriminative matching scores and candidate text perplexity to jointly enhance text-based recommendation.
- Score: 17.88375225459453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior studies have shown that pretrained language models (PLMs) can boost the
performance of text-based recommendation. In contrast to previous works that
either use a PLM to encode user history as a whole input text, or impose an
additional aggregation network to fuse multi-turn history representations, we
propose a unified local- and global-attention Transformer encoder to better
model two-level contexts of user history. Moreover, conditioned on user history
encoded by Transformer encoders, our framework leverages Transformer decoders
to estimate the language perplexity of candidate text items, which can serve as
a straightforward yet significant contrastive signal for user-item text
matching. Based on this, our framework, UniTRec, unifies the contrastive
objectives of discriminative matching scores and candidate text perplexity to
jointly enhance text-based recommendation. Extensive evaluation shows that
UniTRec delivers SOTA performance on three text-based recommendation tasks.
Code is available at https://github.com/Veason-silverbullet/UniTRec.
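The abstract names a unified local- and global-attention encoder but does not spell out the masking. One plausible reading, sketched here under stated assumptions, is that the multi-turn user history is concatenated into one token sequence (for example, one turn per previously consumed item), local attention restricts each token to its own turn, and global attention lets every token see all turns. The helper names (`turn_ids`, `local_mask`, `global_mask`, `masked_softmax`) are hypothetical, and which encoder layers use which mask is an assumption not stated in the abstract; the linked repository is the authoritative reference.

```python
import math
from typing import List

# Hypothetical helpers illustrating two-level attention over a multi-turn user
# history; this is not code from the UniTRec repository.

def turn_ids(turn_lengths: List[int]) -> List[int]:
    """Return, for every token position, the index of the history turn it belongs to."""
    ids: List[int] = []
    for turn, length in enumerate(turn_lengths):
        ids.extend([turn] * length)
    return ids


def local_mask(turn_lengths: List[int]) -> List[List[bool]]:
    """Local attention: token i may attend to token j only if both lie in the same turn."""
    ids = turn_ids(turn_lengths)
    n = len(ids)
    return [[ids[i] == ids[j] for j in range(n)] for i in range(n)]


def global_mask(turn_lengths: List[int]) -> List[List[bool]]:
    """Global attention: every token may attend to every token across all turns."""
    n = sum(turn_lengths)
    return [[True] * n for _ in range(n)]


def masked_softmax(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over the positions allowed by the mask; masked positions get weight 0."""
    top = max(s for s, m in zip(scores, mask) if m)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    lengths = [2, 3]  # two history turns with 2 and 3 tokens
    print(local_mask(lengths)[0])  # [True, True, False, False, False]
    print(masked_softmax([0.0] * 5, local_mask(lengths)[0]))  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

With uniform attention scores, a token in the first turn splits its attention evenly between the two tokens of that turn under the local mask, while the global mask would spread it over all five tokens.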
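The abstract's use of candidate-text perplexity as a contrastive signal, combined with discriminative matching scores, can be illustrated as follows. This is a minimal sketch, assuming the decoder returns per-token log-probabilities for each candidate text conditioned on the encoded history, that each signal is trained with an InfoNCE-style cross-entropy over one positive and several negative candidates, and that the two losses are summed with a hypothetical weight `ppl_weight`. The loss form, the weighting, and the ranking rule in `rank_candidates` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence

# Illustrative sketch; the loss form and the weighting are assumptions.

def perplexity(token_log_probs: Sequence[float]) -> float:
    """PPL = exp(-(1/T) * sum_t log p(y_t | y_<t, history)) for one candidate text."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def log_softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)  # log-sum-exp trick for numerical stability
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_z for x in logits]


def contrastive_loss(logits: Sequence[float], positive: int) -> float:
    """InfoNCE-style cross-entropy: the positive candidate should outscore the negatives."""
    return -log_softmax(logits)[positive]


def joint_loss(match_scores: Sequence[float],
               candidate_log_probs: Sequence[Sequence[float]],
               positive: int,
               ppl_weight: float = 1.0) -> float:
    """Sum of a matching-score contrastive loss and a perplexity contrastive loss."""
    # Negative log-perplexity (= mean token log-prob): higher means a better fit to the history.
    ppl_logits = [-math.log(perplexity(lp)) for lp in candidate_log_probs]
    return (contrastive_loss(match_scores, positive)
            + ppl_weight * contrastive_loss(ppl_logits, positive))


def rank_candidates(match_scores: Sequence[float],
                    candidate_log_probs: Sequence[Sequence[float]],
                    ppl_weight: float = 1.0) -> List[int]:
    """Return candidate indices sorted best-first by matching score minus weighted log-PPL."""
    combined = [s - ppl_weight * math.log(perplexity(lp))
                for s, lp in zip(match_scores, candidate_log_probs)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    scores = [1.0, 0.0, 0.5]
    log_probs = [[-0.1, -0.2], [-2.0, -2.0], [-1.0, -1.0]]
    print(rank_candidates(scores, log_probs))  # [0, 2, 1]
```

Using negative log-perplexity, which equals the mean token log-probability, puts the perplexity signal on an additive scale comparable to the matching scores, so a candidate the decoder finds less surprising given the user history is treated like one with a higher matching score.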
Related papers
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy
in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of the Transformer-Transducer (T-T), a streaming model commonly used in industry.
We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model, either explicitly through Text-To-Speech (TTS) conversion or implicitly by tying the speech and text latent spaces.
Experimental results on a T-T model trained on a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to injecting generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- Learning Vector-Quantized Item Representation for Transferable
Sequential Recommenders [33.406897794088515]
VQ-Rec is a novel approach to learning Vector-Quantized item representations for transferable sequential recommenders.
We propose an enhanced contrastive pre-training approach, using semi-synthetic and mixed-domain code representations as hard negatives.
arXiv Detail & Related papers (2022-10-22T00:43:14Z)
- JOIST: A Joint Speech and Text Streaming Model For ASR [63.15848310748753]
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E) model with both speech-text paired inputs and text-only unpaired inputs.
We find that the best text representation for JOIST improves WER across a variety of search and rare-word test sets by 4-14% relative to a model not trained with text.
arXiv Detail & Related papers (2022-10-13T20:59:22Z)
- M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation [66.92823764664206]
We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text.
While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation.
Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU.
arXiv Detail & Related papers (2022-07-03T04:26:53Z)
- Text Compression-aided Transformer Encoding [77.16960983003271]
We propose explicit and implicit text compression approaches to enhance Transformer encoding.
The motivation is that standard Transformer encoding does not specifically focus on backbone information, meaning the gist of the input text.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
arXiv Detail & Related papers (2021-02-11T11:28:39Z)