Global2Local: Efficient Structure Search for Video Action Segmentation
- URL: http://arxiv.org/abs/2101.00910v1
- Date: Mon, 4 Jan 2021 12:06:03 GMT
- Title: Global2Local: Efficient Structure Search for Video Action Segmentation
- Authors: Shang-Hua Gao, Qi Han, Zhong-Yu Li, Pai Peng, Liang Wang, Ming-Ming
Cheng
- Abstract summary: We propose to find better receptive field combinations through a global-to-local search scheme.
Our scheme exploits both global search to find the coarse combinations and local search to get the refined receptive field combination patterns.
Our global-to-local search can be plugged into existing action segmentation methods to achieve state-of-the-art performance.
- Score: 64.99046987598075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal receptive fields of models play an important role in action
segmentation. Large receptive fields facilitate the long-term relations among
video clips while small receptive fields help capture the local details.
Existing methods construct models with hand-designed receptive fields in
layers. Can we effectively search for receptive field combinations to replace
hand-designed patterns? To answer this question, we propose to find better
receptive field combinations through a global-to-local search scheme. Our
search scheme exploits both global search to find the coarse combinations and
local search to get the refined receptive field combination patterns further.
The global search finds possible coarse combinations other than human-designed
patterns. On top of the global search, we propose an expectation guided
iterative local search scheme to refine combinations effectively. Our
global-to-local search can be plugged into existing action segmentation methods
to achieve state-of-the-art performance.
Related papers
- Leveraging Large Language Models for Multimodal Search [0.6249768559720121]
This paper introduces a novel multimodal search model that achieves a new performance milestone on the Fashion200K dataset.
We also propose a novel search interface integrating Large Language Models (LLMs) to facilitate natural language interaction.
arXiv Detail & Related papers (2024-04-24T10:30:42Z) - Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM)
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z) - A Visual Active Search Framework for Geospatial Exploration [36.31732056074638]
Many problems can be viewed as forms of geospatial search aided by aerial imagery.
We model this class of problems in a visual active search (VAS) framework, which has three key inputs.
We propose a reinforcement learning approach for VAS that learns a meta-search policy from a collection of fully annotated search tasks.
arXiv Detail & Related papers (2022-11-28T21:53:05Z) - RF-Next: Efficient Receptive Field Search for Convolutional Neural
Networks [86.6139619721343]
We propose to find better receptive field combinations through a global-to-local search scheme.
Our search scheme exploits both global search to find the coarse combinations and local search to get the refined receptive field combinations.
Our RF-Next models, plugging receptive field search to various models, boost the performance on many tasks.
arXiv Detail & Related papers (2022-06-14T06:56:26Z) - CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
arXiv Detail & Related papers (2022-03-20T04:41:05Z) - Exploring Complicated Search Spaces with Interleaving-Free Sampling [127.07551427957362]
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named textbfIF-NAS, where we perform a periodic sampling strategy to construct different sub-networks.
In the proposed search space, IF-NAS outperform both random sampling and previous weight-sharing search algorithms by a significant margin.
arXiv Detail & Related papers (2021-12-05T06:42:48Z) - Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach allows us to build new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, Camvid and COCO-stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z) - NASE: Learning Knowledge Graph Embedding for Link Prediction via Neural
Architecture Search [9.634626241415916]
Link prediction is the task of predicting missing connections between entities in the knowledge graph (KG)
Previous work has tried to use Automated Machine Learning (AutoML) to search for the best model for a given dataset.
We propose a novel Neural Architecture Search (NAS) framework for the link prediction task.
arXiv Detail & Related papers (2020-08-18T03:34:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.