Tree-Structured Policy based Progressive Reinforcement Learning for
Temporally Language Grounding in Video
- URL: http://arxiv.org/abs/2001.06680v1
- Date: Sat, 18 Jan 2020 15:08:04 GMT
- Title: Tree-Structured Policy based Progressive Reinforcement Learning for
Temporally Language Grounding in Video
- Authors: Jie Wu, Guanbin Li, Si Liu, Liang Lin
- Abstract summary: Temporally language grounding in untrimmed videos is a newly-raised task in video understanding.
Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
- Score: 128.08590291947544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporally language grounding in untrimmed videos is a newly-raised task in
video understanding. Most of the existing methods suffer from inferior
efficiency, lacking interpretability, and deviating from the human perception
mechanism. Inspired by human's coarse-to-fine decision-making paradigm, we
formulate a novel Tree-Structured Policy based Progressive Reinforcement
Learning (TSP-PRL) framework to sequentially regulate the temporal boundary by
an iterative refinement process. The semantic concepts are explicitly
represented as the branches in the policy, which contributes to efficiently
decomposing complex policies into an interpretable primitive action.
Progressive reinforcement learning provides correct credit assignment via two
task-oriented rewards that encourage mutual promotion within the
tree-structured policy. We extensively evaluate TSP-PRL on the Charades-STA and
ActivityNet datasets, and experimental results show that TSP-PRL achieves
competitive performance over existing state-of-the-art methods.
Related papers
- Multi-Agent Transfer Learning via Temporal Contrastive Learning [8.487274986507922]
This paper introduces a novel transfer learning framework for deep multi-agent reinforcement learning.
The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals.
arXiv Detail & Related papers (2024-06-03T14:42:14Z) - Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding [70.31050639330603]
Video paragraph grounding aims at localizing multiple sentences with semantic relations and temporal order from an untrimmed video.
Existing VPG approaches are heavily reliant on a considerable number of temporal labels that are laborious and time-consuming to acquire.
We introduce and explore Weakly-Supervised Video paragraph Grounding (WSVPG) to eliminate the need of temporal annotations.
arXiv Detail & Related papers (2024-03-18T04:30:31Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Time-Efficient Reinforcement Learning with Stochastic Stateful Policies [20.545058017790428]
We present a novel approach for training stateful policies by decomposing the latter into a gradient internal state kernel and a stateless policy.
We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning algorithms.
arXiv Detail & Related papers (2023-11-07T15:48:07Z) - SCD-Net: Spatiotemporal Clues Disentanglement Network for
Self-supervised Skeleton-based Action Recognition [39.99711066167837]
This paper introduces a contrastive learning framework, namely Stemporal Clues Disentanglement Network (SCD-Net)
Specifically, we integrate the sequences with a feature extractor to derive explicit clues from spatial and temporal domains respectively.
We conduct evaluations on the NTU-+D (60&120) PKU-MMDI (&I) datasets, covering various downstream tasks such as action recognition, action retrieval, transfer learning.
arXiv Detail & Related papers (2023-09-11T21:32:13Z) - Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Continuous Action Reinforcement Learning from a Mixture of Interpretable
Experts [35.80418547105711]
We propose a policy scheme that retains a complex function approxor for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure.
The main technical contribution of the paper is to address the challenges introduced by this non-differentiable state selection procedure.
arXiv Detail & Related papers (2020-06-10T16:02:08Z) - Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP)
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.