Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- URL: http://arxiv.org/abs/2012.15391v1
- Date: Thu, 31 Dec 2020 01:37:56 GMT
- Title: Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- Authors: Shen Chen, Mingwei Zhang, Jiamin Cui, Wei Yao
- Abstract summary: We present a generalized operating procedure for deep learning (DL) for real use cases.
We build a multi-stream end-to-end speaker verification system, in which the input speech utterance is processed by multiple parallel streams.
Trained on the VoxCeleb dataset, our results verify the effectiveness of the proposed operating procedure.
- Score: 4.570823264643028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) has brought about remarkable breakthroughs in processing
images, video, and speech, owing to its efficacy in extracting highly abstract
representations and learning very complex functions. However, few operating
procedures have been reported on how to put it to work in real use cases. In this
paper, we address this problem by presenting a generalized operating procedure
for DL from the perspective of unconstrained optimal design, motivated by a
simple intention: to remove the barriers to using DL, especially for scientists
and engineers who are new to it but eager to use it. Our proposed procedure
contains seven steps: project/problem statement, data collection, architecture
design, initialization of parameters, definition of the loss function,
computation of optimal parameters, and inference. Following this procedure, we
build a multi-stream end-to-end speaker verification system in which the input
speech utterance is processed by multiple parallel streams over different
frequency ranges, so that the acoustic modeling is made more robust by the
diversity of features. Trained on the VoxCeleb dataset, our experimental results
verify the effectiveness of the proposed operating procedure and also show that
our multi-stream framework outperforms the single-stream baseline with a 20%
relative reduction in minimum decision cost function (minDCF).
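The abstract specifies neither the internals of the parallel streams nor the evaluation cost parameters, so the following is only a minimal illustrative sketch, not the authors' implementation: a hypothetical multi-stream speaker-embedding network whose streams each see a different mel-frequency sub-band, plus the standard normalized minimum detection cost used to report minDCF. The `StreamEncoder` layers, the band edges, and the cost parameters (P_target = 0.01, C_miss = C_fa = 1) are all assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Hypothetical encoder for one frequency sub-band (the paper does not specify layers)."""
    def __init__(self, n_bins, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_bins, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # mean pooling over time
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, x):                       # x: (batch, n_bins, frames)
        return self.proj(self.net(x).squeeze(-1))

class MultiStreamSpeakerNet(nn.Module):
    """Each stream sees a different mel-frequency range; stream embeddings are concatenated."""
    def __init__(self, band_edges=(0, 20, 40, 80), emb_dim=128):
        super().__init__()
        self.bands = list(zip(band_edges[:-1], band_edges[1:]))
        self.streams = nn.ModuleList(StreamEncoder(hi - lo, emb_dim) for lo, hi in self.bands)

    def forward(self, spec):                    # spec: (batch, mel_bins, frames)
        return torch.cat([enc(spec[:, lo:hi, :])
                          for enc, (lo, hi) in zip(self.streams, self.bands)], dim=-1)

def min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost, swept over all decision thresholds."""
    order = np.argsort(scores)
    labels = np.asarray(labels, dtype=float)[order]
    n_tgt, n_non = labels.sum(), len(labels) - labels.sum()
    p_miss = np.cumsum(labels) / n_tgt                # targets scored below the threshold
    p_fa = 1.0 - np.cumsum(1.0 - labels) / n_non      # non-targets scored above it
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))

# Example: embed a batch of 2-second log-mel spectrograms (80 bins, 200 frames).
model = MultiStreamSpeakerNet()
emb = model(torch.randn(4, 80, 200))            # -> (4, 3 * 128)
```

Under this metric, the reported 20% relative reduction would correspond, for instance, to a multi-stream minDCF of 0.40 against a single-stream baseline of 0.50.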
Related papers
- Learning Free Token Reduction for Multi-Modal LLM [3.4026156483879517]
Vision-Language Models (VLMs) have achieved remarkable success across a range of multimodal tasks.
However, their practical deployment is often constrained by high computational costs and prolonged inference times.
We propose a token compression paradigm that operates on both spatial and temporal dimensions.
arXiv Detail & Related papers (2025-01-29T02:52:32Z) - Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences the discretization error of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z) - PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models [32.33892531885448]
Multimodal large language models (MLLMs) demonstrate strong performance across visual tasks.
However, their efficiency is hindered by the significant computational and memory demands of processing long contexts in multimodal inputs.
We introduce PAR (Prompt-Aware Token Reduction), a novel and plug-and-play approach that reduces visual tokens efficiently without compromising model performance (a generic toy sketch of this token-reduction idea appears after this list).
arXiv Detail & Related papers (2024-10-09T07:13:22Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - InterroGate: Learning to Share, Specialize, and Prune Representations
for Multi-task Learning [17.66308231838553]
We propose a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency.
We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks.
arXiv Detail & Related papers (2024-02-26T18:59:52Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator, the parameterized projected Bellman operator (PBO).
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - SparCA: Sparse Compressed Agglomeration for Feature Extraction and
Dimensionality Reduction [0.0]
We propose sparse compressed agglomeration (SparCA) as a novel dimensionality reduction procedure.
SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream supervised learning tasks.
arXiv Detail & Related papers (2023-01-26T13:59:15Z)
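None of the related-paper summaries above include implementation details. Purely as a generic toy illustration of the prompt-aware visual-token-reduction idea referenced in the PAR entry (not a reproduction of that or any other paper's method), the sketch below keeps only the visual tokens most similar to a pooled prompt embedding; the function name, the cosine-similarity criterion, and the keep ratio are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def prompt_aware_token_reduction(visual_tokens, prompt_embedding, keep_ratio=0.25):
    """Toy illustration: retain only the visual tokens most similar to the prompt.

    visual_tokens:     (num_tokens, dim) visual token embeddings
    prompt_embedding:  (dim,) pooled text-prompt embedding
    keep_ratio:        fraction of visual tokens to keep (hypothetical default)
    """
    # Cosine similarity between every visual token and the pooled prompt.
    sims = F.cosine_similarity(visual_tokens, prompt_embedding.unsqueeze(0), dim=-1)
    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    keep_idx = sims.topk(k).indices.sort().values   # preserve original token order
    return visual_tokens[keep_idx], keep_idx

# Example: reduce 576 visual tokens (a common ViT patch count) down to 25%.
tokens = torch.randn(576, 1024)
prompt = torch.randn(1024)
reduced, kept = prompt_aware_token_reduction(tokens, prompt)
print(reduced.shape)   # torch.Size([144, 1024])
```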
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.