Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- URL: http://arxiv.org/abs/2012.15391v1
- Date: Thu, 31 Dec 2020 01:37:56 GMT
- Title: Generalized Operating Procedure for Deep Learning: an Unconstrained
Optimal Design Perspective
- Authors: Shen Chen, Mingwei Zhang, Jiamin Cui, Wei Yao
- Abstract summary: We present a generalized operating procedure for deep learning (DL) for real use cases.
We build a multi-stream end-to-end speaker verification system, in which the input speech utterance is processed by multiple parallel streams.
Trained on the VoxCeleb dataset, our results verify the effectiveness of the proposed operating procedure.
- Score: 4.570823264643028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning (DL) has brought about remarkable breakthroughs in processing
images, video, and speech, owing to its efficacy in extracting highly abstract
representations and learning very complex functions. However, few operating
procedures have been reported on how to put it to work in real use cases. In this
paper, we address this problem by presenting a generalized operating procedure
for DL from the perspective of unconstrained optimal design, motivated by a
simple intention: to remove the barriers to using DL, especially for scientists
and engineers who are new to it but eager to use it. Our proposed procedure
contains seven steps: project/problem statement, data collection, architecture
design, initialization of parameters, definition of the loss function,
computation of optimal parameters, and inference. Following this procedure, we
build a multi-stream end-to-end speaker verification system in which the input
speech utterance is processed by multiple parallel streams over different
frequency ranges, so that the acoustic modeling is made more robust by the
diversity of features. Trained on the VoxCeleb dataset, our experimental results
verify the effectiveness of the proposed operating procedure and also show that
our multi-stream framework outperforms the single-stream baseline with a 20%
relative reduction in minimum decision cost function (minDCF).
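The abstract specifies neither the internals of the parallel streams nor the evaluation cost parameters, so the following is only a minimal illustrative sketch, not the authors' implementation: a hypothetical multi-stream speaker-embedding network whose streams each see a different mel-frequency sub-band, plus the standard normalized minimum detection cost used to report minDCF. The `StreamEncoder` layers, the band edges, and the cost parameters (P_target = 0.01, C_miss = C_fa = 1) are all assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Hypothetical encoder for one frequency sub-band (the paper does not specify layers)."""
    def __init__(self, n_bins, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_bins, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # mean pooling over time
        )
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, x):                       # x: (batch, n_bins, frames)
        return self.proj(self.net(x).squeeze(-1))

class MultiStreamSpeakerNet(nn.Module):
    """Each stream sees a different mel-frequency range; stream embeddings are concatenated."""
    def __init__(self, band_edges=(0, 20, 40, 80), emb_dim=128):
        super().__init__()
        self.bands = list(zip(band_edges[:-1], band_edges[1:]))
        self.streams = nn.ModuleList(StreamEncoder(hi - lo, emb_dim) for lo, hi in self.bands)

    def forward(self, spec):                    # spec: (batch, mel_bins, frames)
        return torch.cat([enc(spec[:, lo:hi, :])
                          for enc, (lo, hi) in zip(self.streams, self.bands)], dim=-1)

def min_dcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost, swept over all decision thresholds."""
    order = np.argsort(scores)
    labels = np.asarray(labels, dtype=float)[order]
    n_tgt, n_non = labels.sum(), len(labels) - labels.sum()
    p_miss = np.cumsum(labels) / n_tgt                # targets scored below the threshold
    p_fa = 1.0 - np.cumsum(1.0 - labels) / n_non      # non-targets scored above it
    dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))

# Example: embed a batch of 2-second log-mel spectrograms (80 bins, 200 frames).
model = MultiStreamSpeakerNet()
emb = model(torch.randn(4, 80, 200))            # -> (4, 3 * 128)
```

Under this metric, the reported 20% relative reduction would correspond, for instance, to a multi-stream minDCF of 0.40 against a single-stream baseline of 0.50.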
Related papers
- Learning Free Token Reduction for Multi-Modal LLM [3.4026156483879517]
Vision-Language Models (VLMs) have achieved remarkable success across a range of multimodal tasks.
However, their practical deployment is often constrained by high computational costs and prolonged inference times.
We propose a token compression paradigm that operates on both spatial and temporal dimensions.
arXiv Detail & Related papers (2025-01-29T02:52:32Z) - Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences the discretization error of the entire vision-language model, and embed this dependency into the optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x on the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z) - PAR: Prompt-Aware Token Reduction Method for Efficient Large Multimodal Models [32.33892531885448]
Multimodal large language models (MLLMs) demonstrate strong performance across visual tasks.
However, their efficiency is hindered by the significant computational and memory demands of processing long contexts in multimodal inputs.
We introduce PAR (Prompt-Aware Token Reduction), a novel and plug-and-play approach that reduces visual tokens efficiently without compromising model performance (a generic toy sketch of this token-reduction idea appears after this list).
arXiv Detail & Related papers (2024-10-09T07:13:22Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z) - Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - InterroGate: Learning to Share, Specialize, and Prune Representations
for Multi-task Learning [17.66308231838553]
We propose a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency.
We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks.
arXiv Detail & Related papers (2024-02-26T18:59:52Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator, the parameterized projected Bellman operator (PBO).
We formulate an optimization problem to learn the PBO for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - SparCA: Sparse Compressed Agglomeration for Feature Extraction and
Dimensionality Reduction [0.0]
We propose sparse compressed agglomeration (SparCA) as a novel dimensionality reduction procedure.
SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream supervised learning tasks.
arXiv Detail & Related papers (2023-01-26T13:59:15Z)
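None of the related-paper summaries above include implementation details. Purely as a generic toy illustration of the prompt-aware visual-token-reduction idea referenced in the PAR entry (not a reproduction of that or any other paper's method), the sketch below keeps only the visual tokens most similar to a pooled prompt embedding; the function name, the cosine-similarity criterion, and the keep ratio are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def prompt_aware_token_reduction(visual_tokens, prompt_embedding, keep_ratio=0.25):
    """Toy illustration: retain only the visual tokens most similar to the prompt.

    visual_tokens:     (num_tokens, dim) visual token embeddings
    prompt_embedding:  (dim,) pooled text-prompt embedding
    keep_ratio:        fraction of visual tokens to keep (hypothetical default)
    """
    # Cosine similarity between every visual token and the pooled prompt.
    sims = F.cosine_similarity(visual_tokens, prompt_embedding.unsqueeze(0), dim=-1)
    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    keep_idx = sims.topk(k).indices.sort().values   # preserve original token order
    return visual_tokens[keep_idx], keep_idx

# Example: reduce 576 visual tokens (a common ViT patch count) down to 25%.
tokens = torch.randn(576, 1024)
prompt = torch.randn(1024)
reduced, kept = prompt_aware_token_reduction(tokens, prompt)
print(reduced.shape)   # torch.Size([144, 1024])
```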
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.