Related papers: Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives

Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives

URL: http://arxiv.org/abs/2504.14418v1
Date: Sat, 19 Apr 2025 23:00:24 GMT
Title: Unlearning Works Better Than You Think: Local Reinforcement-Based Selection of Auxiliary Objectives
Authors: Abderrahim Bendahi, Adrien Fradin, Matthieu Lerasle,
Abstract summary: Local Reinforcement-Based Selection of Auxiliary Objectives (LRSAO) is a novel approach that selects auxiliary objectives using reinforcement learning (RL)<n>We analyze and evaluate LRSAO on the black-box complexity version of the non-monotonic Jump function.<n>Our approach improves over this result to achieve a complexity of $Theta(n2 / ell2 + n log(n))$ resulting in a significant improvement.
Score: 1.1743167854433303
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Local Reinforcement-Based Selection of Auxiliary Objectives (LRSAO), a novel approach that selects auxiliary objectives using reinforcement learning (RL) to support the optimization process of an evolutionary algorithm (EA) as in EA+RL framework and furthermore incorporates the ability to unlearn previously used objectives. By modifying the reward mechanism to penalize moves that do no increase the fitness value and relying on the local auxiliary objectives, LRSAO dynamically adapts its selection strategy to optimize performance according to the landscape and unlearn previous objectives when necessary. We analyze and evaluate LRSAO on the black-box complexity version of the non-monotonic Jump function, with gap parameter $\ell$, where each auxiliary objective is beneficial at specific stages of optimization. The Jump function is hard to optimize for evolutionary-based algorithms and the best-known complexity for reinforcement-based selection on Jump was $O(n^2 \log(n) / \ell)$. Our approach improves over this result to achieve a complexity of $\Theta(n^2 / \ell^2 + n \log(n))$ resulting in a significant improvement, which demonstrates the efficiency and adaptability of LRSAO, highlighting its potential to outperform traditional methods in complex optimization scenarios.

Related papers

Recursive Reward Aggregation [51.552609126905885]
We propose an alternative approach for flexible behavior alignment that eliminates the need to modify the reward function.<n>By introducing an algebraic perspective on Markov decision processes (MDPs), we show that the Bellman equations naturally emerge from the generation and aggregation of rewards.<n>Our approach applies to both deterministic and deterministic settings and seamlessly integrates with value-based and actor-critic algorithms.
arXiv Detail & Related papers (2025-07-11T12:37:20Z)
Make Optimization Once and for All with Fine-grained Guidance [78.14885351827232]
Learning to Optimize (L2O) enhances optimization efficiency with integrated neural networks.<n>L2O paradigms achieve great outcomes, e.g., refitting, generating unseen solutions iteratively or directly.<n>Our analyses explore general framework for learning optimization, called Diff-L2O, focusing on augmenting solutions from a wider view.
arXiv Detail & Related papers (2025-03-14T14:48:12Z)
BOPO: Neural Combinatorial Optimization via Best-anchored and Objective-guided Preference Optimization [17.694852175354555]
Preference Optimization for Combinatorial Optimization (POCO) is a training paradigm that leverages solution preferences via objective values.<n>POCO is architecture-agnostic, enabling integration with existing NCO models, and establishes preference optimization as a principled framework for optimization.
arXiv Detail & Related papers (2025-03-10T17:45:30Z)
Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins. We employ inverse RL (IRL) to automatically learn reward functions without manual tuning. We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z)
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning [2.7504809152812695]
This work explores the integration of metaheuristic algorithms -- Particle Swarm Optimization, Ant Colony Optimization, Tabu Search, Genetic Algorithm, Simulated Annealing, and Harmony Search -- into Quantum Reinforcement Learning. Evaluations in $5times5$ MiniGrid Reinforcement Learning environments show that, all algorithms yield near-optimal results.
arXiv Detail & Related papers (2024-08-02T11:14:41Z)
Towards Explainable Evolution Strategies with Large Language Models [0.0]
This paper introduces an approach that integrates self-adaptive Evolution Strategies (ES) with Large Language Models (LLMs) By employing a self-adaptive ES equipped with a restart mechanism, we effectively navigate the challenging landscapes of benchmark functions. An LLM is then utilized to process these logs, generating concise, user-friendly summaries.
arXiv Detail & Related papers (2024-07-11T09:28:27Z)
Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention. Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
We show that gradient-based and high-level LLMs can effectively collaborate a combined optimization framework. In this paper, we show that these complementary to each other and can effectively collaborate a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z)
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts.<n>We identify two pivotal factors in model parameter learning: update direction and update method.<n>We develop a capable Gradient-inspired Prompt-based GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization [42.92248233465095]
We propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various Deep reinforcement learning (DRL) methods. Our method leverages high-reward samples to encourage exploration of symmetric regions without additional online interactions - free. Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks.
arXiv Detail & Related papers (2023-06-02T05:34:01Z)
Evolutionary Solution Adaption for Multi-Objective Metal Cutting Process Optimization [59.45414406974091]
We introduce a framework for system flexibility that allows us to study the ability of an algorithm to transfer solutions from previous optimization tasks. We study the flexibility of NSGA-II, which we extend by two variants: 1) varying goals, that optimize solutions for two tasks simultaneously to obtain in-between source solutions expected to be more adaptable, and 2) active-inactive genotype, that accommodates different possibilities that can be activated or deactivated. Results show that adaption with standard NSGA-II greatly reduces the number of evaluations required for optimization to a target goal, while the proposed variants further improve the adaption costs.
arXiv Detail & Related papers (2023-05-31T12:07:50Z)
NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-End Learning and Control [22.120942106939122]
We propose the use of adaptive search as a building block for general, non- neural optimization operations. We benchmark it against two existing alternatives on a synthetic energy-based structured task, and showcase its use in optimal control applications.
arXiv Detail & Related papers (2020-06-22T03:40:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.