On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for
Multimodal Sentiment Analysis
- URL: http://arxiv.org/abs/2210.15937v1
- Date: Fri, 28 Oct 2022 06:48:35 GMT
- Title: On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for
Multimodal Sentiment Analysis
- Authors: Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki
Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
- Abstract summary: Methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios.
We also find that using the outputs of the encoders' intermediate layers works better than using those of the output layer.
- Score: 27.497457891521538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the effectiveness and implementation of
modality-specific large-scale pre-trained encoders for multimodal sentiment
analysis (MSA). Although the effectiveness of pre-trained encoders has been reported in various fields, conventional MSA methods employ them only for the linguistic modality, and their application to the visual and acoustic modalities has not been investigated. This
paper compares the features yielded by large-scale pre-trained encoders with
conventional heuristic features. For each modality, one of the largest publicly available pre-trained encoders is used: CLIP-ViT, WavLM, and BERT for the
visual, acoustic, and linguistic modalities, respectively. Experiments on two
datasets reveal that methods with domain-specific pre-trained encoders attain
better performance than those with conventional features in both unimodal and
multimodal scenarios. We also find that using the outputs of the encoders' intermediate layers works better than using those of the output layer. The code is available at https://github.com/ando-hub/MSA_Pretrain.
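The snippet below is a minimal sketch of the intermediate-layer feature extraction described in the abstract, not the authors' released MSA_Pretrain code; the checkpoint name and the layer index are illustrative assumptions. WavLMModel and CLIPVisionModel in the same library expose hidden_states in the same way, so acoustic and visual features can be extracted analogously.
```python
# Minimal sketch (assumptions, not the released code): take features from an
# intermediate encoder layer instead of the output layer, as the paper suggests.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
encoder = BertModel.from_pretrained("bert-large-uncased", output_hidden_states=True)
encoder.eval()

@torch.no_grad()
def intermediate_features(text: str, layer: int = 18) -> torch.Tensor:
    """Return token-level features from one intermediate layer.

    `hidden_states` holds (num_layers + 1) tensors: index 0 is the embedding
    output and the last entry is the output layer, which the paper finds to be
    a weaker feature source than the intermediate layers.
    """
    inputs = tokenizer(text, return_tensors="pt")
    outputs = encoder(**inputs)
    return outputs.hidden_states[layer]  # shape: (1, seq_len, hidden_size)

features = intermediate_features("the movie was surprisingly good")
```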
Related papers
- Advancing Multi-talker ASR Performance with Large Language Models [48.52252970956368]
Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR).
In this paper, we propose an LLM-based SOT approach for multi-talker ASR, leveraging pre-trained speech encoder and LLM.
Our approach surpasses traditional AED-based methods on the simulated dataset LibriMix and achieves state-of-the-art performance on the evaluation set of the real-world dataset AMI.
arXiv Detail & Related papers (2024-08-30T17:29:25Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies prevalent in previous endeavors aiming at stealing pre-trained encoders.
It is accomplished via a sample-wise prototype, which consolidates the target encoder's representations for a given sample's various perspectives.
For more potent efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- An Exploration of Encoder-Decoder Approaches to Multi-Label Classification for Legal and Biomedical Text [20.100081284294973]
We compare four methods for multi-label classification, two based on an encoder only, and two based on an encoder-decoder.
Our results show that encoder-decoder methods outperform encoder-only methods, with a growing advantage on more complex datasets.
arXiv Detail & Related papers (2023-05-09T17:13:53Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing; a generic layer-freezing sketch appears after this list.
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs).
We propose a simple, yet effective idea to improve the performance of VAE for the task.
In our experiments, the proposed VAE model particularly performs well for generating a sample from out-of-domain distribution.
arXiv Detail & Related papers (2022-08-23T03:56:30Z)
- A Comparative Study of Pre-trained Encoders for Low-Resource Named Entity Recognition [10.0731894715001]
We introduce an encoder evaluation framework, and use it to compare the performance of state-of-the-art pre-trained representations on the task of low-resource NER.
We analyze a wide range of encoders pre-trained with different strategies, model architectures, intermediate-task fine-tuning, and contrastive learning.
arXiv Detail & Related papers (2022-04-11T09:48:26Z)
- Variational Autoencoders for Studying the Manifold of Precoding Matrices with High Spectral Efficiency [47.187609203210705]
We look at how to use a variational autoencoder to find a precoding matrix with a high Spectral Efficiency (SE).
Our objective is to create a less time-consuming algorithm with minimum quality degradation.
arXiv Detail & Related papers (2021-11-23T11:45:45Z)
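The following is a minimal sketch of the layer-freezing idea mentioned in the Telly entry above; the checkpoint name and the number of frozen layers are illustrative assumptions, not the layer-selection strategy proposed in that paper.
```python
# Layer-freezing sketch (illustrative assumptions throughout): keep the
# embeddings and the lower Transformer layers fixed so that only the upper
# layers are updated during fine-tuning, reducing the computational cost.
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("microsoft/codebert-base")  # assumed checkpoint
FREEZE_UP_TO = 8  # arbitrary choice: freeze the first 8 of 12 layers

for param in encoder.embeddings.parameters():
    param.requires_grad = False
for layer in encoder.encoder.layer[:FREEZE_UP_TO]:
    for param in layer.parameters():
        param.requires_grad = False

# Hand only the remaining trainable parameters to the optimizer.
trainable = [p for p in encoder.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)
```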