Greedy Policy Search: A Simple Baseline for Learnable Test-Time
Augmentation
- URL: http://arxiv.org/abs/2002.09103v2
- Date: Sat, 20 Jun 2020 13:10:23 GMT
- Title: Greedy Policy Search: A Simple Baseline for Learnable Test-Time
Augmentation
- Authors: Dmitry Molchanov, Alexander Lyzhov, Yuliya Molchanova, Arsenii
Ashukha, Dmitry Vetrov
- Abstract summary: We introduce greedy policy search (GPS) as a simple but high-performing method for learning a policy of test-time augmentation.
We demonstrate that augmentation policies learned with GPS achieve superior predictive performance on image classification problems.
- Score: 65.92151529708036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time data augmentation, i.e., averaging the predictions of a machine learning
model across multiple augmented samples of data, is a widely used technique
that improves predictive performance. While many advanced learnable data
augmentation techniques have emerged in recent years, they are focused on the
training phase. Such techniques are not necessarily optimal for test-time
augmentation and can be outperformed by a policy consisting of simple crops and
flips. The primary goal of this paper is to demonstrate that test-time
augmentation policies can be successfully learned too. We introduce greedy
policy search (GPS), a simple but high-performing method for learning a policy
of test-time augmentation. We demonstrate that augmentation policies learned
with GPS achieve superior predictive performance on image classification
problems, provide better in-domain uncertainty estimation, and improve the
robustness to domain shift.
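As a rough illustration of the two ideas named in the abstract (not the authors' reference implementation), the sketch below averages predictions over augmented copies of a test input and greedily grows a test-time augmentation policy by repeatedly adding whichever candidate transform most improves a validation objective; a plain validation log-likelihood stands in for the paper's selection criterion, and `model.predict_proba`, the transform signature `t(x, rng)`, and the candidate pool are all assumptions.

```python
import numpy as np

def tta_predict(model, x, policy, rng):
    """Test-time augmentation: average predicted class probabilities
    over one augmented copy of x per transform in the policy."""
    probs = [model.predict_proba(t(x, rng)) for t in policy]
    return np.mean(probs, axis=0)

def greedy_policy_search(model, x_val, y_val, candidates, n_steps, rng):
    """Greedily grow a test-time augmentation policy: at each step, append
    the candidate transform whose addition gives the best averaged
    prediction on a validation set (transforms may be selected repeatedly)."""
    policy = []
    for _ in range(n_steps):
        best_t, best_score = None, -np.inf
        for t in candidates:
            trial = policy + [t]
            probs = np.stack([tta_predict(model, x, trial, rng) for x in x_val])
            # Validation log-likelihood of the true labels under the averaged prediction
            # (used here as a stand-in for the paper's selection objective).
            score = np.mean(np.log(probs[np.arange(len(y_val)), y_val] + 1e-12))
            if score > best_score:
                best_t, best_score = t, score
        policy.append(best_t)
    return policy
```

At test time the learned policy is simply passed back to `tta_predict`; the greedy loop is what keeps the search simple, since earlier choices are never revisited.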
Related papers
- Enabling Efficient, Reliable Real-World Reinforcement Learning with
Approximate Physics-Based Models [10.472792899267365]
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
In this paper we introduce a novel policy gradient-based policy optimization framework.
We show that our approach can learn precise control strategies reliably and with only minutes of real-world data.
arXiv Detail & Related papers (2023-07-16T22:36:36Z)
- Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
- Imbalanced Classification In Faulty Turbine Data: New Proximal Policy Optimization [0.5735035463793008]
We propose a framework for fault detection based on reinforcement learning and the Proximal Policy Optimization algorithm.
Using a modified Proximal Policy Optimization, we can increase performance, overcome data imbalance, and better predict future faults.
arXiv Detail & Related papers (2023-01-10T16:03:25Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach [95.74102207187545]
We show that a prior-free autonomous data augmentation's objective can be derived from a representation learning principle.
We then propose a practical surrogate to the objective that can be efficiently optimized and integrated seamlessly into existing methods.
arXiv Detail & Related papers (2022-11-02T02:02:51Z)
- Augmentation Learning for Semi-Supervised Classification [13.519613713213277]
We propose a Semi-Supervised Learning method that automatically selects the most effective data augmentation policy for a particular dataset.
We show how policy learning can be used to adapt augmentations to datasets beyond ImageNet.
arXiv Detail & Related papers (2022-08-03T10:06:51Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
We show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.