Reward Finetuning for Faster and More Accurate Unsupervised Object
Discovery
- URL: http://arxiv.org/abs/2310.19080v2
- Date: Sun, 5 Nov 2023 18:57:59 GMT
- Title: Reward Finetuning for Faster and More Accurate Unsupervised Object
Discovery
- Authors: Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim,
Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q.
Weinberger
- Abstract summary: Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences.
We propose to adapt similar RL-based methods to unsupervised object discovery.
We demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train.
- Score: 64.41455104593304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in machine learning have shown that Reinforcement Learning
from Human Feedback (RLHF) can improve machine learning models and align them
with human preferences. Although very successful for Large Language Models
(LLMs), these advancements have not had a comparable impact in research for
autonomous vehicles -- where alignment with human expectations can be
imperative. In this paper, we propose to adapt similar RL-based methods to
unsupervised object discovery, i.e. learning to detect objects from LiDAR
points without any training labels. Instead of labels, we use simple heuristics
to mimic human feedback. More explicitly, we combine multiple heuristics into a
simple reward function that positively correlates its score with bounding box
accuracy, i.e., boxes containing objects are scored higher than those without.
We start from the detector's own predictions to explore the space and reinforce
boxes with high rewards through gradient updates. Empirically, we demonstrate
that our approach is not only more accurate, but also orders of magnitude
faster to train compared to prior works on object discovery.
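As a rough illustration of the approach described above, here is a minimal sketch (not the authors' code) of combining label-free heuristics into a box-level reward and reinforcing high-reward boxes sampled from the detector's own predictions. The helpers `points_in_box` and `box_volume`, the `sample_boxes` method, and the specific heuristics are hypothetical placeholders, assumed only for illustration.
```python
import torch

def heuristic_reward(points, box):
    """Label-free score for one candidate box: boxes that tightly enclose a
    dense cluster of LiDAR points score higher than empty or oversized boxes.
    `points_in_box` and `box_volume` are hypothetical helpers."""
    inside = points_in_box(points, box)                      # boolean mask over points
    density = inside.sum().float() / max(box_volume(box), 1e-6)
    occupancy = inside.float().mean()
    return float(0.5 * density + 0.5 * occupancy)            # simple positive combination

def reward_finetune_step(detector, optimizer, lidar_points):
    """One RL-style update: sample boxes from the detector's own predictions,
    score them with the heuristic reward, and reinforce high-reward boxes."""
    boxes, log_probs = detector.sample_boxes(lidar_points)   # hypothetical detector API
    rewards = torch.tensor([heuristic_reward(lidar_points, b) for b in boxes])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-6)  # normalize as a simple baseline
    loss = -(rewards * log_probs).mean()                     # policy-gradient-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
The key design choice mirrored here is that no labels are needed: the reward only has to rank boxes containing objects above boxes that do not, which is enough signal to reinforce good predictions through gradient updates.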
Related papers
- Learning 3D Perception from Others' Predictions [64.09115694891679]
We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector.
For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area.
arXiv Detail & Related papers (2024-10-03T16:31:28Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on uncertainty in the learned reward (a minimal sketch of such a bonus appears after this list).
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Learning Oriented Remote Sensing Object Detection via Naive Geometric Computing [38.508709334835316]
We propose a mechanism that learns the regression of horizontal proposals, oriented proposals, and rotation angles of objects in a consistent manner.
Our proposed idea is simple and intuitive, and it can be readily implemented.
arXiv Detail & Related papers (2021-12-01T13:58:42Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
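The reward-uncertainty entry above mentions an exploration bonus derived from uncertainty in a learned reward. Below is a minimal, hedged sketch of one common realization (an assumption, not the cited paper's code): the bonus is taken as the disagreement (standard deviation) across a small ensemble of learned reward models; all class and variable names are illustrative.
```python
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    """Small ensemble of reward predictors; their disagreement acts as a novelty signal."""
    def __init__(self, obs_dim, n_members=3, hidden=64):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_members)
        ])

    def forward(self, obs):
        preds = torch.stack([m(obs) for m in self.members])   # (members, batch, 1)
        mean_reward = preds.mean(dim=0).squeeze(-1)            # learned reward estimate
        bonus = preds.std(dim=0).squeeze(-1)                    # uncertainty used as exploration bonus
        return mean_reward, bonus

# Usage sketch: augment the learned reward with the uncertainty bonus.
ensemble = RewardEnsemble(obs_dim=8)
obs = torch.randn(32, 8)
r_hat, bonus = ensemble(obs)
total_reward = r_hat + 0.1 * bonus   # 0.1 is an arbitrary bonus weight
```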