Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- URL: http://arxiv.org/abs/2012.15391v1
- Date: Thu, 31 Dec 2020 01:37:56 GMT
- Title: Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- Authors: Shen Chen, Mingwei Zhang, Jiamin Cui, Wei Yao
- Abstract summary: We present a generalized operating procedure for deep learning (DL) for real use cases.
We build a multi-stream end-to-end speaker verification system, in which the input speech utterance is processed by multiple parallel streams.
Trained on the VoxCeleb dataset, our results verify the effectiveness of the proposed operating procedure.
- Score: 4.570823264643028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) has brought about remarkable breakthroughs in
processing images, video and speech, owing to its efficacy in extracting highly
abstract representations and learning very complex functions. However,
operating procedures for applying DL to real use cases are seldom reported. In
this paper, we address this problem by presenting a generalized operating
procedure for DL from the perspective of unconstrained optimal design,
motivated by a simple intention: to remove the barrier to using DL, especially
for scientists and engineers who are new to it but eager to use it. Our
proposed procedure contains seven steps: project/problem statement, data
collection, architecture design, initialization of parameters, definition of
the loss function, computation of optimal parameters, and inference. Following
this procedure, we build a multi-stream end-to-end speaker verification system
in which the input speech utterance is processed by multiple parallel streams
over different frequency ranges, so that acoustic modeling is made more robust
by the diversity of features. Trained on the VoxCeleb dataset, our experimental
results verify the effectiveness of the proposed operating procedure and show
that the multi-stream framework outperforms the single-stream baseline with a
20% relative reduction in minimum decision cost function (minDCF).
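
To make the multi-stream idea concrete, here is a minimal PyTorch sketch of a front-end whose parallel streams each process a different frequency band of the input filterbank features before their embeddings are fused; the class name, layer sizes, and band split are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiStreamEncoder(nn.Module):
    """Toy multi-stream front-end: each stream sees one frequency band of the
    input filterbank features, and the per-stream embeddings are concatenated
    into a single speaker embedding (all sizes are illustrative)."""
    def __init__(self, n_mels=80, n_streams=4, stream_dim=128, emb_dim=256):
        super().__init__()
        assert n_mels % n_streams == 0
        self.band = n_mels // n_streams
        self.streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(self.band, stream_dim, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(stream_dim, stream_dim, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),          # temporal pooling per stream
            )
            for _ in range(n_streams)
        )
        self.proj = nn.Linear(n_streams * stream_dim, emb_dim)

    def forward(self, x):                          # x: (batch, n_mels, frames)
        bands = torch.split(x, self.band, dim=1)   # one sub-band per stream
        pooled = [s(b).squeeze(-1) for s, b in zip(self.streams, bands)]
        return self.proj(torch.cat(pooled, dim=1)) # (batch, emb_dim)

emb = MultiStreamEncoder()(torch.randn(8, 80, 300))  # 8 utterances, 300 frames
print(emb.shape)                                     # torch.Size([8, 256])
```

For reference, the decision cost behind the reported metric is the standard speaker-verification cost DCF(theta) = C_miss * P_miss(theta) * P_target + C_fa * P_fa(theta) * (1 - P_target), and minDCF is its minimum over decision thresholds theta.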
Related papers
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework for large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
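
As background on what post-training quantization does at the tensor level, here is a minimal, generic sketch of symmetric uniform weight quantization; the function names are hypothetical, and this is not Q-VLM's cross-layer method.

```python
import torch

def quantize_weight(w: torch.Tensor, n_bits: int = 4):
    """Generic symmetric uniform post-training quantization of one weight
    tensor: pick a scale from the max magnitude, round to integers, and
    return the int representation together with its scale."""
    qmax = 2 ** (n_bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return w_int.to(torch.int8), scale

def dequantize(w_int: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_int.float() * scale

w = torch.randn(256, 256)
w_int, scale = quantize_weight(w, n_bits=4)
err = (w - dequantize(w_int, scale)).abs().mean()
print(f"mean discretization error: {err:.4f}")
```

Methods like the one summarized above differ from this per-tensor baseline mainly in how the quantization parameters are chosen, e.g. by modeling how discretization errors propagate across layers rather than treating each layer in isolation.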
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
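
A rank-1 expert can be read as a LoRA-style update whose low-rank factors have rank one. The sketch below shows one generic way to route an input over such experts; the router, zero-initialized factors, and class name are assumptions for illustration, not necessarily Intuition-MoR1E's formulation.

```python
import torch
import torch.nn as nn

class Rank1MoELinear(nn.Module):
    """Frozen base linear layer plus a mixture of rank-1 updates: expert i
    contributes g_i(x) * (u_i v_i^T) x, where the gate g(x) is a softmax
    over a small router (illustrative only)."""
    def __init__(self, d_in, d_out, n_experts=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)       # keep backbone frozen
        self.u = nn.Parameter(torch.zeros(n_experts, d_out))
        self.v = nn.Parameter(torch.randn(n_experts, d_in) * 0.02)
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x):                            # x: (batch, d_in)
        gate = torch.softmax(self.router(x), dim=-1)  # (batch, n_experts)
        vx = torch.einsum("bd,ed->be", x, self.v)     # v_i^T x per expert
        delta = torch.einsum("be,be,eo->bo", gate, vx, self.u)
        return self.base(x) + delta

y = Rank1MoELinear(64, 32)(torch.randn(4, 64))
print(y.shape)    # torch.Size([4, 32])
```

Initializing the u factors at zero keeps the adapted layer identical to the frozen base at the start of finetuning, a common choice for low-rank adapters.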
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning [17.66308231838553]
We propose a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency.
We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks.
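
One common way to realize such a learnable gate is a per-unit sigmoid that mixes a shared representation with a task-specific one; the sketch below is a generic illustration, and the module and parameter names are hypothetical rather than InterroGate's exact design.

```python
import torch
import torch.nn as nn

class GatedTaskBranch(nn.Module):
    """Mixes a shared feature with a task-specific feature through a
    learnable sigmoid gate; gates saturating toward 0 or 1 indicate which
    units could later be specialized or pruned for that task (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.task_layer = nn.Linear(dim, dim)      # task-specific transform
        self.gate_logits = nn.Parameter(torch.zeros(dim))

    def forward(self, shared):                     # shared: (batch, dim)
        task_specific = torch.relu(self.task_layer(shared))
        g = torch.sigmoid(self.gate_logits)        # per-unit mixing in [0, 1]
        return g * shared + (1.0 - g) * task_specific

shared_feat = torch.randn(4, 128)                  # from a shared backbone
outputs = {t: GatedTaskBranch(128)(shared_feat) for t in ("seg", "depth")}
print({t: o.shape for t, o in outputs.items()})
```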
arXiv Detail & Related papers (2024-02-26T18:59:52Z)
- Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator, the projected Bellman operator (PBO).
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
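
For readers unfamiliar with the object being approximated: the optimal Bellman operator maps a Q-function to an improved one, and AVI applies it repeatedly. The tabular sketch below shows the standard operator on a toy random MDP; it is not the parameterized operator the paper learns.

```python
import numpy as np

def bellman_operator(Q, P, R, gamma=0.99):
    """Optimal Bellman operator on a tabular Q-function.
    Q: (S, A) action values, P: (S, A, S) transition probabilities,
    R: (S, A) expected rewards.
    Returns (T Q)(s, a) = R(s, a) + gamma * E_{s'}[max_a' Q(s', a')]."""
    return R + gamma * P @ Q.max(axis=1)

# Tiny random MDP; approximate value iteration = repeated application of T.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))   # valid transition kernel
R = rng.standard_normal((S, A))
Q = np.zeros((S, A))
for _ in range(200):
    Q = bellman_operator(Q, P, R)
print(Q.round(2))
```

Per the summary above, the paper instead learns an approximate, parameterized version of this operator for generic sequential decision-making problems.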
arXiv Detail & Related papers (2023-12-20T09:33:16Z)
- When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
- Efficient Cross-Task Prompt Tuning for Few-Shot Conversational Emotion Recognition [6.988000604392974]
Emotion Recognition in Conversation (ERC) has been widely studied due to its importance in developing emotion-aware empathetic machines.
We propose a derivative-free optimization method called Cross-Task Prompt Tuning (CTPT) for few-shot conversational emotion recognition.
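
As background, "derivative-free" here means the prompt parameters are updated from black-box evaluations only, with no gradients through the language model. The sketch below is a generic (1+1)-style random-search illustration on a toy objective; it is not CTPT's actual algorithm and omits its cross-task transfer.

```python
import numpy as np

def derivative_free_prompt_search(loss_fn, dim=16, iters=200, sigma=0.1, seed=0):
    """Minimal (1+1)-style random search over a continuous prompt vector:
    propose a Gaussian perturbation and keep it if the black-box loss
    improves. loss_fn stands in for a frozen LLM scored on a few-shot task."""
    rng = np.random.default_rng(seed)
    prompt = rng.standard_normal(dim) * 0.01
    best = loss_fn(prompt)
    for _ in range(iters):
        candidate = prompt + sigma * rng.standard_normal(dim)
        score = loss_fn(candidate)
        if score < best:
            prompt, best = candidate, score
    return prompt, best

# Toy stand-in objective; in a CTPT-like setting this would be the model's
# validation loss with the candidate soft prompt prepended to the input.
target = np.linspace(-1.0, 1.0, 16)
prompt, loss = derivative_free_prompt_search(lambda p: float(np.sum((p - target) ** 2)))
print(round(loss, 4))
```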
arXiv Detail & Related papers (2023-10-23T06:46:03Z)
- Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks [3.0468934705223774]
We propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors.
We show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces.
We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs.
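
The randomized-prior construction referenced here is commonly implemented as an ensemble in which each member is the sum of a trainable network and a frozen, randomly initialized prior network, and member disagreement serves as the uncertainty estimate. The sketch below shows that generic construction in PyTorch; sizes, the prior scale, and class names are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=64):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.Tanh(), nn.Linear(hidden, d_out))

class RandomizedPriorMember(nn.Module):
    """One ensemble member: f(x) = trainable(x) + beta * prior(x),
    where the prior network stays frozen at its random initialization."""
    def __init__(self, d_in, d_out, beta=1.0):
        super().__init__()
        self.trainable = mlp(d_in, d_out)
        self.prior = mlp(d_in, d_out)
        for p in self.prior.parameters():
            p.requires_grad_(False)
        self.beta = beta

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)

# Bootstrapped ensemble: disagreement across members provides the
# uncertainty signal that drives acquisition in Bayesian optimization.
ensemble = [RandomizedPriorMember(3, 1) for _ in range(8)]
x = torch.randn(16, 3)
preds = torch.stack([m(x) for m in ensemble])      # (members, batch, 1)
mean, std = preds.mean(dim=0), preds.std(dim=0)
print(mean.shape, std.shape)
```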
arXiv Detail & Related papers (2023-02-14T18:55:21Z)
- SparCA: Sparse Compressed Agglomeration for Feature Extraction and Dimensionality Reduction [0.0]
We propose sparse compressed agglomeration (SparCA) as a novel dimensionality reduction procedure.
SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream supervised learning tasks.
arXiv Detail & Related papers (2023-01-26T13:59:15Z)