LSM: Learning Subspace Minimization for Low-level Vision
- URL: http://arxiv.org/abs/2004.09197v1
- Date: Mon, 20 Apr 2020 10:49:38 GMT
- Title: LSM: Learning Subspace Minimization for Low-level Vision
- Authors: Chengzhou Tang, Lu Yuan and Ping Tan
- Abstract summary: We replace the regularization term with a learnable subspace constraint, and preserve the data term to exploit domain knowledge.
This learning subspace minimization (LSM) framework unifies the network structures and the parameters for many low-level vision tasks.
We demonstrate our LSM framework on four low-level tasks including interactive image segmentation, video segmentation, stereo matching, and optical flow, and validate the network on various datasets.
- Score: 78.27774638569218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the energy minimization problem in low-level vision tasks from a
novel perspective. We replace the heuristic regularization term with a
learnable subspace constraint, and preserve the data term to exploit domain
knowledge derived from the first principle of a task. This learning subspace
minimization (LSM) framework unifies the network structures and the parameters
for many low-level vision tasks, which allows us to train a single network for
multiple tasks simultaneously with completely shared parameters, and even
generalizes the trained network to an unseen task as long as its data term can
be formulated. We demonstrate our LSM framework on four low-level tasks
including interactive image segmentation, video segmentation, stereo matching,
and optical flow, and validate the network on various datasets. The experiments
show that the proposed LSM generates state-of-the-art results with smaller
model size, faster training convergence, and real-time inference.
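For concreteness, a minimal sketch of the formulation behind the abstract, written in LaTeX; the symbols D, R, V, and c are illustrative notation assumed here, not copied from the paper. A classical low-level vision pipeline solves an energy of the form

\min_{x} \; E(x) = D(x) + \lambda R(x),

where D(x) is the data term derived from the task's first principles (e.g., photometric consistency for stereo or optical flow) and R(x) is a hand-crafted regularizer. LSM keeps the data term but replaces the regularizer with the constraint that the solution lies in a low-dimensional subspace predicted by a network:

\min_{c} \; D(Vc), \qquad x = Vc,

where the columns of V are learned basis vectors and c are the subspace coefficients, so the learned subspace takes over the role of the heuristic regularization while the data term is preserved exactly.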
Related papers
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z) - Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners [8.707819647492467]
We explore capturing task-specific information via meticulous refinement of entire Vision-Language Models (VLMs).
To mitigate these issues, we propose a framework named CLIP-CITE, which designs a discriminative visual-text task.
arXiv Detail & Related papers (2024-07-04T15:22:54Z) - VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs [102.36953558562436]
Vision language models (VLMs) are an exciting emerging class of language models (LMs).
One understudied capability in VLMs is visual spatial planning.
Our study introduces a benchmark that evaluates the spatial planning capability of these models in general.
arXiv Detail & Related papers (2024-07-02T00:24:01Z) - Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models [87.47400128150032]
We propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.
Lumen first promotes fine-grained vision-language concept alignment.
Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders.
arXiv Detail & Related papers (2024-03-12T04:13:45Z) - Negotiated Representations to Prevent Forgetting in Machine Learning
Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z) - Dynamic Neural Network for Multi-Task Learning Searching across Diverse
Network Topologies [14.574399133024594]
We present a new MTL framework that searches for optimized structures for multiple tasks with diverse graph topologies.
We design a restricted DAG-based central network with read-in/read-out layers to build topologically diverse task-adaptive structures.
arXiv Detail & Related papers (2023-03-13T05:01:50Z) - mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal
Skip-connections [104.14624185375897]
mPLUG is a new vision-language foundation model for both cross-modal understanding and generation.
It achieves state-of-the-art results on a wide range of vision-language downstream tasks, such as image captioning, image-text retrieval, visual grounding and visual question answering.
arXiv Detail & Related papers (2022-05-24T11:52:06Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new suite of benchmarks aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z) - UniNet: A Unified Scene Understanding Network and Exploring Multi-Task
Relationships through the Lens of Adversarial Attacks [1.1470070927586016]
Single-task vision networks extract information based only on some aspects of the scene.
In multi-task learning (MTL), single tasks are jointly learned, thereby providing an opportunity for tasks to share information.
We develop UniNet, a unified scene understanding network that accurately and efficiently infers vital vision tasks.
arXiv Detail & Related papers (2021-08-10T11:00:56Z) - Deep Active Shape Model for Face Alignment and Pose Estimation [0.2148535041822524]
The Active Shape Model (ASM) is a statistical model of object shapes that represents a target structure.
This paper presents a lightweight Convolutional Neural Network (CNN) architecture with a loss function regularized by ASM for face alignment and estimating head pose in the wild.
arXiv Detail & Related papers (2021-02-27T03:46:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.