Reward Finetuning for Faster and More Accurate Unsupervised Object
Discovery
- URL: http://arxiv.org/abs/2310.19080v2
- Date: Sun, 5 Nov 2023 18:57:59 GMT
- Title: Reward Finetuning for Faster and More Accurate Unsupervised Object
Discovery
- Authors: Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim,
Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q.
Weinberger
- Abstract summary: Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences.
We propose to adapt similar RL-based methods to unsupervised object discovery.
We demonstrate that our approach is not only more accurate, but also orders of magnitude faster to train.
- Score: 64.41455104593304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in machine learning have shown that Reinforcement Learning
from Human Feedback (RLHF) can improve machine learning models and align them
with human preferences. Although very successful for Large Language Models
(LLMs), these advancements have not had a comparable impact in research for
autonomous vehicles -- where alignment with human expectations can be
imperative. In this paper, we propose to adapt similar RL-based methods to
unsupervised object discovery, i.e. learning to detect objects from LiDAR
points without any training labels. Instead of labels, we use simple heuristics
to mimic human feedback. More explicitly, we combine multiple heuristics into a
simple reward function that positively correlates its score with bounding box
accuracy, i.e., boxes containing objects are scored higher than those without.
We start from the detector's own predictions to explore the space and reinforce
boxes with high rewards through gradient updates. Empirically, we demonstrate
that our approach is not only more accurate, but also orders of magnitude
faster to train compared to prior works on object discovery.
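As a rough illustration of the approach described above, here is a minimal sketch (not the authors' code) of combining label-free heuristics into a box-level reward and reinforcing high-reward boxes sampled from the detector's own predictions. The helpers `points_in_box` and `box_volume`, the `sample_boxes` method, and the specific heuristics are hypothetical placeholders, assumed only for illustration.
```python
import torch

def heuristic_reward(points, box):
    """Label-free score for one candidate box: boxes that tightly enclose a
    dense cluster of LiDAR points score higher than empty or oversized boxes.
    `points_in_box` and `box_volume` are hypothetical helpers."""
    inside = points_in_box(points, box)                      # boolean mask over points
    density = inside.sum().float() / max(box_volume(box), 1e-6)
    occupancy = inside.float().mean()
    return float(0.5 * density + 0.5 * occupancy)            # simple positive combination

def reward_finetune_step(detector, optimizer, lidar_points):
    """One RL-style update: sample boxes from the detector's own predictions,
    score them with the heuristic reward, and reinforce high-reward boxes."""
    boxes, log_probs = detector.sample_boxes(lidar_points)   # hypothetical detector API
    rewards = torch.tensor([heuristic_reward(lidar_points, b) for b in boxes])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-6)  # normalize as a simple baseline
    loss = -(rewards * log_probs).mean()                     # policy-gradient-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
The key design choice mirrored here is that no labels are needed: the reward only has to rank boxes containing objects above boxes that do not, which is enough signal to reinforce good predictions through gradient updates.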
Related papers
- Learning 3D Perception from Others' Predictions [64.09115694891679]
We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector.
For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area.
arXiv Detail & Related papers (2024-10-03T16:31:28Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on uncertainty in the learned reward (a minimal sketch of such a bonus appears after this list).
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Learning Oriented Remote Sensing Object Detection via Naive Geometric Computing [38.508709334835316]
We propose a mechanism that learns the regression of horizontal proposals, oriented proposals, and rotation angles of objects in a consistent manner.
Our proposed idea is simple and intuitive, and it can be readily implemented.
arXiv Detail & Related papers (2021-12-01T13:58:42Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
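The reward-uncertainty entry above mentions an exploration bonus derived from uncertainty in a learned reward. Below is a minimal, hedged sketch of one common realization (an assumption, not the cited paper's code): the bonus is taken as the disagreement (standard deviation) across a small ensemble of learned reward models; all class and variable names are illustrative.
```python
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    """Small ensemble of reward predictors; their disagreement acts as a novelty signal."""
    def __init__(self, obs_dim, n_members=3, hidden=64):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_members)
        ])

    def forward(self, obs):
        preds = torch.stack([m(obs) for m in self.members])   # (members, batch, 1)
        mean_reward = preds.mean(dim=0).squeeze(-1)            # learned reward estimate
        bonus = preds.std(dim=0).squeeze(-1)                    # uncertainty used as exploration bonus
        return mean_reward, bonus

# Usage sketch: augment the learned reward with the uncertainty bonus.
ensemble = RewardEnsemble(obs_dim=8)
obs = torch.randn(32, 8)
r_hat, bonus = ensemble(obs)
total_reward = r_hat + 0.1 * bonus   # 0.1 is an arbitrary bonus weight
```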