Common Benchmarks Undervalue the Generalization Power of Programmatic Policies
- URL: http://arxiv.org/abs/2506.14162v1
- Date: Tue, 17 Jun 2025 03:53:18 GMT
- Title: Common Benchmarks Undervalue the Generalization Power of Programmatic Policies
- Authors: Amirhossein Rajabpour, Kiarash Aghakasiri, Sandra Zilles, Levi H. S. Lelis
- Abstract summary: We argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. Neural policies can generalize as effectively on out-of-distribution problems once simple changes are made to their training pipeline.
- Score: 11.938597183669117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithms for learning programmatic representations for sequential decision-making problems are often evaluated on out-of-distribution (OOD) problems, with the common conclusion that programmatic policies generalize better than neural policies on OOD problems. In this position paper, we argue that commonly used benchmarks undervalue the generalization capabilities of programmatic representations. We analyze the experiments of four papers from the literature and show that neural policies, which were previously shown not to generalize, can generalize as effectively as programmatic policies on OOD problems. This is achieved with simple changes to the training pipeline of neural policies. Namely, we show that simpler neural architectures with the same type of sparse observations used with programmatic policies can help attain OOD generalization. Another modification shown to be effective is the use of reward functions that allow for safer policies (e.g., agents that drive slowly can generalize better). Finally, we argue for creating benchmark problems that highlight concepts needed for OOD generalization, such as tasks requiring algorithmic constructs like stacks, which may challenge neural policies but align with programmatic representations.
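To make the abstract's two levers concrete, here is a minimal, hypothetical Python sketch (not from the paper; the environment and feature names are invented): a programmatic policy as interpretable code over a few sparse observations, plus a task whose natural solution needs a stack.

```python
# Toy sketch: a programmatic policy is interpretable code over a few
# sparse, hand-picked observation features (names here are hypothetical).

def driving_policy(obs: dict) -> str:
    """If-then-else program over sparse features; slow driving aids OOD safety."""
    if obs["distance_to_car_ahead"] < 10.0:
        return "brake"
    if obs["speed"] < 20.0:
        return "accelerate"
    return "coast"

def retrace_policy(action_log: list) -> list:
    """A task needing an algorithmic construct: retrace a path with a stack."""
    inverse = {"up": "down", "down": "up", "left": "right", "right": "left"}
    stack = list(action_log)  # push every action taken so far
    return [inverse[stack.pop()] for _ in range(len(stack))]  # pop to reverse

print(driving_policy({"distance_to_car_ahead": 5.0, "speed": 30.0}))  # brake
print(retrace_policy(["up", "up", "right"]))  # ['left', 'down', 'down']
```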
Related papers
- Generalization Guarantees for Learning Branch-and-Cut Policies in Integer Programming [1.1510009152620668]
Mixed-integer programming (MIP) provides a powerful framework for optimization problems.
Branch-and-Cut (B&C) is the predominant algorithm in state-of-the-art solvers.
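As background for this entry, here is a minimal branch-and-bound sketch for a toy 0/1 knapsack; Branch-and-Cut extends this skeleton with cutting planes, which are omitted here, and the instance is invented for illustration.

```python
# Minimal branch-and-bound sketch for a 0/1 knapsack, the backbone that
# Branch-and-Cut extends with cutting planes (cut generation omitted).
values, weights, capacity = [60, 100, 120], [10, 20, 30], 50

def bound(i: int, value: float, room: float) -> float:
    """Optimistic bound: fill remaining room fractionally (an LP-style relaxation)."""
    for v, w in sorted(zip(values[i:], weights[i:]), key=lambda t: -t[0] / t[1]):
        if room <= 0:
            break
        take = min(1.0, room / w)
        value += take * v
        room -= take * w
    return value

best = [0.0]  # incumbent: best integer-feasible value found so far

def branch(i: int, value: float, room: float) -> None:
    best[0] = max(best[0], value)
    if i == len(values) or bound(i, value, room) <= best[0]:
        return  # prune: the relaxation bound cannot beat the incumbent
    if weights[i] <= room:
        branch(i + 1, value + values[i], room - weights[i])  # branch x_i = 1
    branch(i + 1, value, room)                               # branch x_i = 0

branch(0, 0.0, capacity)
print(best[0])  # 220.0 (take items 2 and 3)
```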
arXiv Detail & Related papers (2025-05-16T19:00:02Z)
- Neural Time-Reversed Generalized Riccati Equation [60.92253836775246]
Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates.
This paper introduces a novel neural-based approach to optimal control, with the aim of working forward-in-time.
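For context, the classical finite-horizon LQR value $V(x,t) = x^\top P(t)\,x$ is obtained by integrating a Riccati differential equation backward from the terminal condition, which is what a forward-in-time treatment seeks to avoid. The standard form (general background, not necessarily the paper's generalized formulation) is:

```latex
% Standard continuous-time Riccati ODE for LQR (background only, not the
% paper's exact generalized form): integrate backward from P(T) = Q_T.
\begin{aligned}
-\dot{P}(t) &= A^{\top} P(t) + P(t) A
              - P(t) B R^{-1} B^{\top} P(t) + Q, \\
P(T) &= Q_T, \qquad u^{*}(t) = -R^{-1} B^{\top} P(t)\, x(t).
\end{aligned}
```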
arXiv Detail & Related papers (2023-12-14T19:29:37Z)
- Generalisation Through Negation and Predicate Invention [25.944127431156627]
We introduce an inductive logic programming (ILP) approach that combines negation and predicate invention.
We implement our idea in NOPI, which can learn normal logic programs with predicate invention.
Our experimental results on multiple domains show that our approach can improve predictive accuracies and learning times.
arXiv Detail & Related papers (2023-01-18T16:12:27Z)
- Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve lexicographic multi-objective problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
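A minimal sketch of the lexicographic idea, with invented Q-values rather than the paper's algorithms: optimize the first objective, then break near-ties with the second.

```python
import numpy as np

# Toy Q-values for 4 actions under two objectives (e.g., safety, then reward).
q_safety = np.array([0.9, 0.88, 0.5, 0.91])
q_reward = np.array([0.2, 0.95, 0.99, 0.1])
slack = 0.05  # tolerance: how much objective-1 value we may trade away

# Lexicographic choice: keep actions near-optimal on the first objective,
# then pick the best of those on the second objective.
admissible = np.flatnonzero(q_safety >= q_safety.max() - slack)
action = admissible[np.argmax(q_reward[admissible])]
print(action)  # 1: safe enough (0.88 >= 0.91 - 0.05) and highest reward
```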
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
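A toy sketch of the pessimism principle, using one common empirical-Bernstein lower bound rather than the paper's generalized inequality; the policies and returns are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_bernstein_lcb(x: np.ndarray, delta: float = 0.05, b: float = 1.0):
    """Lower confidence bound on the mean of samples in [0, b] (one common
    empirical-Bernstein form; the paper's generalized inequality differs)."""
    n = len(x)
    var = x.var(ddof=1)
    return x.mean() - np.sqrt(2 * var * np.log(2 / delta) / n) \
                    - 7 * b * np.log(2 / delta) / (3 * (n - 1))

# Pessimism: rank candidate policies by the LCB of their estimated value,
# not by the point estimate, so poorly covered policies are penalized.
returns = {"policy_a": rng.uniform(0.4, 0.6, size=500),   # well covered
           "policy_b": rng.uniform(0.0, 1.0, size=10)}    # barely covered
best = max(returns, key=lambda k: empirical_bernstein_lcb(returns[k]))
print(best)  # likely "policy_a": similar mean, far tighter confidence bound
```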
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Preliminary Results on Using Abstract AND-OR Graphs for Generalized Solving of Stochastic Shortest Path Problems [25.152899734616298]
Stochastic Shortest Path Problems (SSPs) are goal-oriented problems in the real world.
A key difficulty in computing solutions for SSPs is that finding solutions to even moderately sized problems is intractable.
We show that our approach can be embedded in any SSP solver to compute hierarchically optimal policies.
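A toy illustration of evaluating costs on an abstract AND-OR graph, where OR nodes pick the cheapest option and AND nodes require all sub-tasks; the graph and costs are invented and the representation is not the paper's.

```python
# Toy AND-OR graph: OR nodes choose the cheapest option; AND nodes must
# complete every sub-task. A hierarchical solution is a choice at each OR node.
graph = {
    "deliver":       ("OR",  ["drive_route_a", "drive_route_b"]),
    "drive_route_a": ("AND", ["load", "leg_a"]),
    "drive_route_b": ("AND", ["load", "leg_b"]),
}
leaf_cost = {"load": 2.0, "leg_a": 7.0, "leg_b": 4.0}

def cost(node: str) -> float:
    if node in leaf_cost:  # primitive sub-task
        return leaf_cost[node]
    kind, children = graph[node]
    child_costs = [cost(c) for c in children]
    return min(child_costs) if kind == "OR" else sum(child_costs)

print(cost("deliver"))  # 6.0: route_b (load 2.0 + leg_b 4.0) beats route_a
```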
arXiv Detail & Related papers (2022-04-08T21:30:47Z)
- Programmatic Policy Extraction by Iterative Local Search [0.15229257192293197]
We present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy.
Whether trained using a hand-crafted expert policy or a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.
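A minimal sketch of the extraction loop, with an invented one-parameter program space and a stand-in "neural" policy: perturb the candidate program locally and keep the perturbation if agreement with the original policy does not drop.

```python
import random

random.seed(0)

# Stand-in for a pretrained neural policy we want to distill (hypothetical).
def neural_policy(obs: float) -> int:
    return 1 if obs > 0.37 else 0

# Program space: one-rule programs of the form "1 if obs > theta else 0".
def agreement(theta: float, data) -> float:
    return sum((1 if o > theta else 0) == neural_policy(o) for o in data) / len(data)

data = [random.random() for _ in range(200)]
theta = 0.9  # initial candidate program

# Iterative local search: propose a small perturbation, keep it if the
# extracted program agrees with the neural policy at least as often.
for _ in range(500):
    candidate = theta + random.gauss(0.0, 0.05)
    if agreement(candidate, data) >= agreement(theta, data):
        theta = candidate

print(round(theta, 2))  # drifts toward the neural policy's threshold, ~0.37
```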
arXiv Detail & Related papers (2022-01-18T10:39:40Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
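A minimal sketch of generalized policy improvement, the mechanism behind such transfer: in each state, act greedily with respect to the maximum over the stored policies' action values (the Q-values here are invented).

```python
import numpy as np

# Toy Q-values on a new task: 3 previously learned policies x 4 actions.
# Generalized policy improvement (GPI): in each state, act greedily with
# respect to the *maximum* over the stored policies' action values.
q_per_policy = np.array([[0.1, 0.7, 0.3, 0.2],   # policy 1's Q(s, .)
                         [0.6, 0.2, 0.1, 0.4],   # policy 2's Q(s, .)
                         [0.3, 0.3, 0.8, 0.1]])  # policy 3's Q(s, .)

q_gpi = q_per_policy.max(axis=0)  # best any known policy can promise
action = int(q_gpi.argmax())      # GPI action: at least as good as each policy
print(q_gpi, action)              # [0.6 0.7 0.8 0.4] 2
```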
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
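A sketch of the conditioning idea with an invented, untrained linear "network": the policy receives the parameters of a goal distribution (mean and diagonal covariance) alongside the observation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of distribution-conditioned RL: the policy is conditioned on the
# *parameters* of a goal distribution (mean + diagonal covariance of a
# Gaussian over goal states), not on a single goal. The linear map below
# is a hypothetical stand-in for a trained policy network.
obs_dim, goal_dim, act_dim = 4, 2, 3
W = rng.normal(size=(act_dim, obs_dim + 2 * goal_dim))  # untrained weights

def disco_policy(obs, goal_mean, goal_diag_cov):
    context = np.concatenate([obs, goal_mean, goal_diag_cov])
    return W @ context  # action scores conditioned on the goal distribution

obs = rng.normal(size=obs_dim)
scores = disco_policy(obs, goal_mean=np.array([1.0, -1.0]),
                      goal_diag_cov=np.array([0.1, 0.5]))
print(scores.shape)  # (3,): one score per action for this goal distribution
```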
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.