Improving AI-generated music with user-guided training
- URL: http://arxiv.org/abs/2506.04852v1
- Date: Thu, 05 Jun 2025 10:22:54 GMT
- Title: Improving AI-generated music with user-guided training
- Authors: Vishwa Mohan Singh, Sai Anirudh Aryasomayajula, Ahan Chatterjee, Beste Aydemir, Rifat Mehreen Amin,
- Abstract summary: Image-generation algorithms can be applied to generate novel music. These algorithms are typically trained on fixed datasets. We propose a human-computation approach to gradually improve the performance of these algorithms based on user interactions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI music generation has advanced rapidly, with models like diffusion and autoregressive algorithms enabling high-fidelity outputs. These tools can alter styles, mix instruments, or isolate them. Since sound can be visualized as spectrograms, image-generation algorithms can be applied to generate novel music. However, these algorithms are typically trained on fixed datasets, which makes it challenging for them to interpret and respond to user input accurately. This is especially problematic because music is highly subjective and requires a level of personalization that image generation does not provide. In this work, we propose a human-computation approach to gradually improve the performance of these algorithms based on user interactions. The human-computation element involves aggregating and selecting user ratings to use as the loss function for fine-tuning the model. We employ a genetic algorithm that incorporates user feedback to enhance the baseline performance of a model initially trained on a fixed dataset. The effectiveness of this approach is measured by the average increase in user ratings with each iteration. In the pilot test, the first iteration showed an average rating increase of 0.2 compared to the baseline. The second iteration further improved upon this, achieving an additional increase of 0.39 over the first iteration.
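The feedback loop the abstract describes (aggregate user ratings, select the best-rated candidates, then breed and mutate them) could be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: `mutate`, `crossover`, and `evolve` are hypothetical names, and representing each candidate as a flat parameter vector is an assumption.

```python
import random

def mutate(params, scale=0.1):
    """Perturb each parameter slightly (hypothetical mutation step)."""
    return [p + random.gauss(0, scale) for p in params]

def crossover(a, b):
    """Mix two parent parameter vectors element-wise."""
    return [random.choice(pair) for pair in zip(a, b)]

def evolve(population, user_ratings, n_survivors=2):
    """One generation: keep the top-rated candidates, then breed and
    mutate them to refill the population. The aggregated `user_ratings`
    play the role of the fitness signal described in the abstract."""
    ranked = sorted(zip(user_ratings, population), key=lambda x: -x[0])
    survivors = [params for _, params in ranked[:n_survivors]]
    next_gen = list(survivors)
    while len(next_gen) < len(population):
        a, b = random.sample(survivors, 2)
        next_gen.append(mutate(crossover(a, b)))
    return next_gen
```

In this reading, each new generation of music candidates is rendered, rated by users, and fed back into `evolve`, so the average rating per iteration is the natural progress metric.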
Related papers
- POET: Prompt Offset Tuning for Continual Human Action Adaptation [61.63831623094721]
We aim to provide users and developers with the capability to personalize their experience by adding new action classes to their device models continually. We formalize this as privacy-aware few-shot continual action recognition. We propose a novel spatio-temporal learnable prompt tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks.
arXiv Detail & Related papers (2025-04-25T04:11:24Z) - Outlier-Robust Training of Machine Learning Models [21.352210662488112]
We propose an Adaptive Alternation Algorithm for training machine learning models with outliers. The algorithm iteratively trains the model by using a weighted version of the non-robust loss, while updating the weights at each iteration. Considering arbitrary outliers (i.e., with no distributional assumption on the outliers), we show that the use of robust loss kernels sigma increases the region of convergence.
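An alternation scheme of this flavor, fit the model under the current weights, then recompute the weights from the residuals through a robust kernel, can be illustrated with an IRLS-style line fit. The Huber kernel and all names below are illustrative choices for the sketch, not necessarily the kernels or algorithm analyzed in the paper.

```python
import numpy as np

def huber_weights(residuals, delta=1.0):
    """Robust kernel: weight 1 for small residuals, decaying as delta/|r|
    for large ones, so outliers contribute little to the weighted loss."""
    r = np.abs(residuals)
    return np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))

def irls_line_fit(x, y, iters=20, delta=1.0):
    """Alternate between (a) a weighted least-squares fit of y ~ a*x + b
    under the current weights and (b) recomputing the weights from the
    residuals -- the alternation pattern sketched in the abstract."""
    w = np.ones_like(y)
    a, b = 0.0, 0.0
    for _ in range(iters):
        A = np.stack([x, np.ones_like(x)], axis=1)
        W = np.diag(w)
        a, b = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        w = huber_weights(y - (a * x + b), delta)
    return a, b
```

With a single gross outlier in otherwise clean data, the recovered slope and intercept stay close to the true values, whereas plain least squares would be pulled toward the outlier.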
arXiv Detail & Related papers (2024-12-31T04:19:53Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Detection-Driven Object Count Optimization for Text-to-Image Diffusion Models [54.641726517633025]
We propose a new framework that uses pre-trained object counting techniques and object detectors to guide generation. First, we optimize a counting token using an outer-loop loss computed on fully generated images. Second, we introduce a detection-driven scaling term that corrects errors caused by viewpoint and proportion shifts.
arXiv Detail & Related papers (2024-08-21T15:51:46Z) - Enhancing Cross-Dataset Performance of Distracted Driving Detection With Score Softmax Classifier And Dynamic Gaussian Smoothing Supervision [6.891556476231427]
Deep neural networks enable real-time monitoring of in-vehicle drivers, facilitating the timely prediction of distractions, fatigue, and potential hazards. Recent research has exposed unreliable cross-dataset driver behavior recognition due to a limited number of data samples and background noise. We propose a Score-Softmax classifier, which reduces model overconfidence by enhancing category independence.
arXiv Detail & Related papers (2023-10-08T15:28:01Z) - Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
In our study on datasets with over twenty million targets, our approach cuts error in half relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z) - Image reconstruction algorithms in radio interferometry: from
handcrafted to learned denoisers [7.1439425093981574]
We introduce a new class of iterative image reconstruction algorithms for radio interferometry, inspired by plug-and-play methods.
The approach consists in learning a prior image model by training a deep neural network (DNN) as a denoiser.
We plug the learned denoiser into the forward-backward optimization algorithm, resulting in a simple iterative structure alternating a denoising step with a gradient-descent data-fidelity step.
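The iteration described above, a gradient step on the data-fidelity term followed by a denoising step, can be sketched in a few lines. For illustration, a toy soft-thresholding denoiser stands in for the learned DNN, and the measurement operator is a plain matrix `A`; both are assumptions of this sketch, not the paper's setup.

```python
import numpy as np

def soft_threshold(x, tau=0.05):
    """Toy hand-crafted denoiser (sparsity prior), used here as a
    placeholder for the trained DNN denoiser."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def forward_backward_pnp(y, A, denoise, step, iters=50):
    """Plug-and-play forward-backward iteration: a gradient-descent step
    on the data-fidelity term 0.5 * ||A x - y||^2, alternating with a
    denoising step, as described in the abstract."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)      # gradient of the data-fidelity term
        x = denoise(x - step * grad)  # plug-in denoising step
    return x
```

Swapping `soft_threshold` for a trained network is exactly the "plug-and-play" idea: the optimization skeleton stays fixed while the prior is learned.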
arXiv Detail & Related papers (2022-02-25T20:26:33Z) - Accurate, Interpretable, and Fast Animation: An Iterative, Sparse, and Nonconvex Approach [0.9176056742068814]
A face rig must be accurate and, at the same time, fast to compute. A common ingredient of such animation models is a sparsity regularization on the parameters. To reduce the complexity, a Majorization-Minimization (MM) paradigm is applied.
arXiv Detail & Related papers (2021-09-17T05:42:07Z) - TAdam: A Robust Stochastic Gradient Optimizer [6.973803123972298]
Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in robotics domain.
To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed.
We propose a new gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea.
arXiv Detail & Related papers (2020-02-29T04:32:36Z) - Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm.
When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as 'least realistic'.
We show that this 'top-k update' procedure is a generally applicable improvement.
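The modification can be illustrated as a masked generator loss: only the k batch elements with the highest critic scores contribute, so gradients from the rest are zeroed. The function names below and the use of raw critic scores as the generator objective are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def topk_mask(critic_scores, k):
    """Boolean mask selecting the k batch elements the critic scores as
    most realistic; the loss terms for the rest are dropped, so their
    gradient contribution is zero."""
    idx = np.argsort(critic_scores)[-k:]
    mask = np.zeros_like(critic_scores, dtype=bool)
    mask[idx] = True
    return mask

def topk_generator_loss(critic_scores, k):
    """Negated mean critic score over only the top-k samples (the
    generator maximizes the critic's score on its best samples)."""
    m = topk_mask(critic_scores, k)
    return -critic_scores[m].mean()
```

In an autodiff framework, the same effect is achieved by applying the mask before the mean, which is the "one line of code" character of the change.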
arXiv Detail & Related papers (2020-02-14T19:27:50Z) - RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.