Learning Interpretable Low-dimensional Representation via Physical
Symmetry
- URL: http://arxiv.org/abs/2302.10890v4
- Date: Fri, 9 Feb 2024 06:02:19 GMT
- Title: Learning Interpretable Low-dimensional Representation via Physical
Symmetry
- Authors: Xuanjie Liu, Daniel Chin, Yichen Huang, Gus Xia
- Abstract summary: We take inspiration from modern physics and use physical symmetry as a self-consistency constraint for the latent space of time-series data.
We show that physical symmetry leads the model to learn a linear pitch factor from unlabelled monophonic music audio in a self-supervised fashion.
The same methodology can be applied to computer vision, learning a 3D Cartesian space from videos of a simple moving object without labels.
- Score: 8.606028974758479
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We have recently seen great progress in learning interpretable music
representations, ranging from basic factors, such as pitch and timbre, to
high-level concepts, such as chord and texture. However, most methods rely
heavily on music domain knowledge. It remains an open question what general
computational principles give rise to interpretable representations, especially
low-dim factors that agree with human perception. In this study, we take
inspiration from modern physics and use physical symmetry as a self-consistency
constraint for the latent space of time-series data. Specifically, it requires
the prior model that characterises the dynamics of the latent states to be
equivariant with respect to certain group transformations. We show that
physical symmetry leads the model to learn a linear pitch factor from
unlabelled monophonic music audio in a self-supervised fashion. In addition,
the same methodology can be applied to computer vision, learning a 3D Cartesian
space from videos of a simple moving object without labels. Furthermore,
physical symmetry naturally leads to counterfactual representation
augmentation, a new technique which improves sample efficiency.
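The equivariance constraint and the counterfactual augmentation described in the abstract can be sketched as a toy training objective. In this minimal NumPy sketch, the linear prior, the 2-D latent space, and the pitch-transposition action are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step prior over latent states: a fixed linear map.
# (In the paper the prior is learned jointly with an encoder/decoder;
# here it is a toy matrix just to illustrate the constraint.)
A = rng.normal(size=(2, 2))

def prior(z):
    """Predict the next latent state from the current one."""
    return z @ A.T

def transpose(z, delta):
    """Group action: shift the (assumed linear) pitch dimension by delta."""
    return z + np.array([delta, 0.0])

def equivariance_loss(z, delta):
    """Self-consistency: transposing then predicting should match
    predicting then transposing, i.e. prior(g(z)) == g(prior(z))."""
    return np.mean((prior(transpose(z, delta)) - transpose(prior(z), delta)) ** 2)

z = rng.normal(size=(8, 2))              # a batch of latent states
loss = equivariance_loss(z, delta=3.0)   # penalizes violations of symmetry

# Counterfactual representation augmentation: transposed copies of the
# latents serve as extra, label-free training samples for the prior.
z_aug = np.concatenate([transpose(z, d) for d in (-2.0, 0.0, 2.0)])
```

Minimizing such a loss pushes the prior toward equivariance with respect to the group action, which is what lets the pitch factor emerge as a linear dimension without labels.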
Related papers
- Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery [98.58830663687911]
VIPERR-aq1 is a multimodal model that performs Visual Induction for Equation Reasoning.
It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process.
It consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability.
arXiv Detail & Related papers (2025-08-24T14:34:21Z) - Inverse Dynamics Pretraining Learns Good Representations for Multitask
Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z) - Towards fully covariant machine learning [0.0]
In machine learning, the most visible passive symmetry is the relabeling or permutation symmetry of graphs.
We discuss dos and don'ts for machine learning practice if passive symmetries are to be respected.
arXiv Detail & Related papers (2023-01-31T16:01:12Z) - Learning Physical Dynamics with Subequivariant Graph Neural Networks [99.41677381754678]
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws abide by symmetry, which is a vital inductive bias accounting for model generalization.
Our model achieves, on average, over 3% improvement in contact prediction accuracy across 8 scenarios on Physion and 2X lower rollout MSE on RigidFall.
arXiv Detail & Related papers (2022-10-13T10:00:30Z) - Learning Motion-Dependent Appearance for High-Fidelity Rendering of
Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z) - Contrastive Learning with Positive-Negative Frame Mask for Music
Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both the self-augmented positives and the negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training [52.93808218720784]
Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks.
Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models.
We observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data.
arXiv Detail & Related papers (2021-08-25T02:29:28Z) - Tracing Back Music Emotion Predictions to Sound Sources and Intuitive
Perceptual Qualities [6.832341432995627]
Music emotion recognition is an important task in MIR (Music Information Retrieval) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
arXiv Detail & Related papers (2021-06-14T22:49:19Z) - Learning to dance: A graph convolutional adversarial network to generate
realistic dance motions from audio [7.612064511889756]
Learning to move naturally from music, i.e., to dance, is one of the more complex tasks that humans often perform effortlessly.
In this paper, we design a novel method based on graph convolutional networks to tackle the problem of automatic dance generation from audio information.
Our method uses an adversarial learning scheme conditioned on the input music audios to create natural motions preserving the key movements of different music styles.
arXiv Detail & Related papers (2020-11-25T19:53:53Z) - NiLBS: Neural Inverse Linear Blend Skinning [59.22647012489496]
We introduce a method to invert the deformations produced by traditional skinning techniques, using a neural network parameterized by pose.
The ability to invert these deformations allows values (e.g., distance function, signed distance function, occupancy) to be pre-computed at rest pose, and then efficiently queried when the character is deformed.
arXiv Detail & Related papers (2020-04-06T20:46:37Z) - Learning Style-Aware Symbolic Music Representations by Adversarial
Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE)
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.