Weakly Supervised Video Summarization by Hierarchical Reinforcement
Learning
- URL: http://arxiv.org/abs/2001.05864v2
- Date: Sat, 29 Feb 2020 15:31:24 GMT
- Title: Weakly Supervised Video Summarization by Hierarchical Reinforcement
Learning
- Authors: Yiyan Chen, Li Tao, Xueting Wang and Toshihiko Yamasaki
- Abstract summary: We propose a weakly supervised hierarchical reinforcement learning framework, which decomposes the whole task into several subtasks to enhance the summarization quality.
Experiments on two benchmark datasets show that our proposal has achieved the best performance, even better than supervised approaches.
- Score: 38.261971839012176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional video summarization approaches based on reinforcement learning
have the problem that the reward can only be received after the whole summary
is generated. Such kind of reward is sparse and it makes reinforcement learning
hard to converge. Another problem is that labelling each frame is tedious and
costly, which usually prohibits the construction of large-scale datasets. To
solve these problems, we propose a weakly supervised hierarchical reinforcement
learning framework, which decomposes the whole task into several subtasks to
enhance the summarization quality. This framework consists of a manager network
and a worker network. For each subtask, the manager is trained to set a subgoal
only by a task-level binary label, which requires much fewer labels than
conventional approaches. With the guide of the subgoal, the worker predicts the
importance scores for video frames in the subtask by policy gradient according
to both global reward and innovative defined sub-rewards to overcome the sparse
problem. Experiments on two benchmark datasets show that our proposal has
achieved the best performance, even better than supervised approaches.
Related papers
- Creating Hierarchical Dispositions of Needs in an Agent [0.0]
We present a novel method for learning hierarchical abstractions that prioritize competing objectives.
We derive an equation that orders these scalar values and the global reward by priority, inducing a hierarchy of needs that informs goal formation.
arXiv Detail & Related papers (2024-11-23T06:41:54Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA)
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Contrastive Losses Are Natural Criteria for Unsupervised Video
Summarization [27.312423653997087]
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing.
We propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness.
We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved.
arXiv Detail & Related papers (2022-11-18T07:01:28Z) - Generalization with Lossy Affordances: Leveraging Broad Offline Data for
Learning Visuomotor Tasks [65.23947618404046]
We introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data.
When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems.
We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.
arXiv Detail & Related papers (2022-10-12T21:46:38Z) - Use All The Labels: A Hierarchical Multi-Label Contrastive Learning
Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z) - Hierarchical Modeling for Task Recognition and Action Segmentation in
Weakly-Labeled Instructional Videos [6.187780920448871]
This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos.
We propose a two-stream framework, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos.
We present a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences.
arXiv Detail & Related papers (2021-10-12T02:32:15Z) - Unsupervised Video Summarization with a Convolutional Attentive
Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method against other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-24T07:24:39Z) - Temporally-Weighted Hierarchical Clustering for Unsupervised Action
Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z) - Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.