Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models
- URL: http://arxiv.org/abs/2309.16750v2
- Date: Tue, 28 May 2024 11:46:33 GMT
- Title: Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models
- Authors: Benjamin Hoover, Hendrik Strobelt, Dmitry Krotov, Judy Hoffman, Zsolt Kira, Duen Horng Chau
- Abstract summary: The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks.
We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs)
We present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
- Score: 65.08133391009838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks. Though the generative process is traditionally understood as an "iterative denoiser", there is no universally accepted language to describe it. We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs), making efforts to keep our presentation approachable to newcomers to both of these fields. Unifying these two fields provides insight that DMs can be seen as a particular kind of AM where Lyapunov stability guarantees are bypassed by intelligently engineering the dynamics (i.e., the noise and step size schedules) of the denoising process. Finally, we present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
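The correspondence the abstract draws can be sketched numerically. Below is a minimal, hypothetical illustration (not code from the paper) using the log-sum-exp energy of dense Associative Memories: retrieval is gradient descent on an energy function, and a decreasing step-size schedule stands in for the engineered noise and step-size schedules of the denoising process.

```python
import numpy as np

# Hypothetical illustration only -- not code from the surveyed paper.
rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))  # rows are stored patterns
beta = 4.0                           # inverse temperature

def energy(x):
    # Dense AM energy: E(x) = 0.5*||x||^2 - (1/beta) * logsumexp(beta * memories @ x)
    return 0.5 * x @ x - np.log(np.sum(np.exp(beta * memories @ x))) / beta

def score(x):
    # Negative energy gradient: a softmax-weighted pull toward the memories
    w = np.exp(beta * memories @ x)
    w /= w.sum()
    return w @ memories - x

# "Denoising" as memory retrieval: start from a corrupted memory and
# follow the score with an engineered (decreasing) step-size schedule.
x = memories[0] + 0.5 * rng.normal(size=16)
for step_size in np.linspace(0.5, 0.05, 50):
    x = x + step_size * score(x)
# x now lies near the stored pattern memories[0]
```

In this toy setting the energy has a Lyapunov-style descent structure by construction; the survey's point is that DMs achieve successful retrieval without such guarantees by engineering the dynamics instead.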
Related papers
- Dynamic Traceback Learning for Medical Report Generation [12.746275623663289]
This study proposes a novel multi-modal dynamic traceback learning framework (DTrace) for medical report generation.
We introduce a traceback mechanism to supervise the semantic validity of generated content and a dynamic learning strategy to adapt to various proportions of image and text input.
The proposed DTrace framework outperforms state-of-the-art methods for medical report generation.
arXiv Detail & Related papers (2024-01-24T07:13:06Z)
- I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z)
- Diffusion Model as Representation Learner [86.09969334071478]
Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive results on various generative tasks.
We propose a novel knowledge transfer method that leverages the knowledge acquired by DPMs for recognition tasks.
arXiv Detail & Related papers (2023-08-21T00:38:39Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- A Comprehensive Survey on Knowledge Distillation of Diffusion Models [0.0]
Diffusion Models (DMs) utilize neural networks to specify score functions.
Our tutorial is intended for individuals with a basic understanding of generative models who wish to apply distillation of DMs or embark on a research project in this field.
arXiv Detail & Related papers (2023-04-09T15:49:28Z)
- A Biologically-Inspired Dual Stream World Model [0.456877715768796]
The medial temporal lobe (MTL) is hypothesized to be an experience-construction system in mammals.
We propose a novel variant, the Dual Stream World Model (DSWM), which learns from high-dimensional observations and dissociates them into context and content streams.
We show that this representation is useful as a reinforcement learning basis function, and that the generative model can be used to aid the policy learning process using Dyna-like updates.
arXiv Detail & Related papers (2022-09-16T16:27:48Z)
- MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition [118.73025093045652]
We propose a pre-training model, MEmoBERT, for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z)
- Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
- Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-Identification [208.1227090864602]
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem.
Existing VI-ReID methods tend to learn global representations, which have limited discriminability and weak robustness to noisy images.
We propose a novel dynamic dual-attentive aggregation (DDAG) learning method by mining both intra-modality part-level and cross-modality graph-level contextual cues for VI-ReID.
arXiv Detail & Related papers (2020-07-18T03:08:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.