MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
- URL: http://arxiv.org/abs/2401.15935v4
- Date: Wed, 3 Jul 2024 09:28:50 GMT
- Title: MLEM: Generative and Contrastive Learning as Distinct Modalities for Event Sequences
- Authors: Viktor Moskvoretskii, Dmitry Osin, Egor Shvetsov, Igor Udovichenko, Maxim Zhelnin, Andrey Dukhovny, Anna Zhimerikina, Evgeny Burnaev
- Abstract summary: This study explores the application of self-supervised learning techniques to event sequences.
Event sequences are a key modality in applications such as banking, e-commerce, and healthcare.
We develop a novel method called the Multimodal-Learning Event Model (MLEM).
- Score: 14.885714704999799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the application of self-supervised learning techniques to event sequences, a key modality in applications such as banking, e-commerce, and healthcare. However, there is limited research on self-supervised learning for event sequences, and methods from other domains like images, texts, and speech may not easily transfer. To determine the most suitable approach, we conduct a detailed comparative analysis of previously identified best-performing methods. We find that neither the contrastive nor generative method is superior. Our assessment includes classifying event sequences, predicting the next event, and evaluating embedding quality. These results further highlight the potential benefits of combining both methods. Given the lack of research on hybrid models in this domain, we initially adapt the baseline model from another domain. However, upon observing its underperformance, we develop a novel method called the Multimodal-Learning Event Model (MLEM). MLEM treats contrastive learning and generative modeling as distinct yet complementary modalities, aligning their embeddings. The results of our study demonstrate that combining contrastive and generative approaches into one procedure with MLEM achieves superior performance across multiple metrics.
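The idea of aligning embeddings from a contrastive encoder and a generative encoder can be illustrated with a minimal sketch. The abstract does not specify the alignment objective, so the example below assumes a symmetric InfoNCE-style loss over a batch in which row i of each matrix comes from the same event sequence. The function names, the temperature value, and the choice of loss are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def alignment_loss(contrastive_emb, generative_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss between two embedding 'modalities'.

    Assumption (hypothetical setup): row i of both inputs is an embedding
    of the same event sequence, produced by two different encoders.
    Matching rows are pulled together; mismatched rows are pushed apart.
    """
    a = l2_normalize(np.asarray(contrastive_emb, dtype=float))
    b = l2_normalize(np.asarray(generative_emb, dtype=float))
    logits = a @ b.T / temperature          # pairwise cosine similarities
    idx = np.arange(len(a))                 # positives sit on the diagonal
    loss_ab = -log_softmax(logits)[idx, idx].mean()
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)
```

Under these assumptions, the loss is small when the two encoders place the same sequence close together and large when matching rows disagree, which is the behavior an alignment term would be expected to reward.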
Related papers
- Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference [19.909302863724758]
Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling.
We propose a variational inference framework for learning parallel generation orders for MDMs.
Our method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% accuracy achieved by standard competitor methods in the same number of steps.
arXiv Detail & Related papers (2026-02-27T12:26:19Z) - Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning [7.412307614007383]
Multimodal learning models are designed to bridge different modalities, such as images and text, by learning a shared representation space.
These models often exhibit a modality gap, where different modalities occupy distinct regions within the shared representation space.
We identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training.
arXiv Detail & Related papers (2024-12-10T20:36:49Z) - Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL has superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z) - Active Learning Principles for In-Context Learning with Large Language Models [65.09970281795769]
This paper investigates how Active Learning algorithms can serve as effective demonstration selection methods for in-context learning.
We show that in-context example selection through AL prioritizes high-quality examples that exhibit low uncertainty and bear similarity to the test examples.
arXiv Detail & Related papers (2023-05-23T17:16:04Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling [34.88897402357158]
We show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training.
We adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
Our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2022-09-29T04:36:23Z) - Model-Agnostic Few-Shot Open-Set Recognition [36.97433312193586]
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem.
We focus on developing model-agnostic inference methods that can be plugged into any existing model.
We introduce an Open Set Transductive Information Maximization method, OSTIM.
arXiv Detail & Related papers (2022-06-18T16:27:59Z) - Towards Domain-Agnostic Contrastive Learning [103.40783553846751]
We propose a novel domain-agnostic approach to contrastive learning, named DACL.
Key to our approach is the use of Mixup noise to create similar and dissimilar examples by mixing data samples differently either at the input or hidden-state levels.
Our results show that DACL not only outperforms other domain-agnostic noising methods, such as Gaussian-noise, but also combines well with domain-specific methods, such as SimCLR.
arXiv Detail & Related papers (2020-11-09T13:41:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.