CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction
- URL: http://arxiv.org/abs/2106.08444v1
- Date: Tue, 15 Jun 2021 21:32:28 GMT
- Title: CODA: Constructivism Learning for Instance-Dependent Dropout Architecture Construction
- Authors: Xiaoli Li
- Abstract summary: We propose Constructivism learning for instance-dependent Dropout Architecture (CODA).
Based on the theory of constructivism learning, we design an improved dropout technique, Uniform Process Mixture Models.
We evaluate the proposed method on 5 real-world datasets and compare its performance with other state-of-the-art dropout techniques.
- Score: 3.2238887070637805
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dropout is attracting intensive research interest in deep learning as an
efficient approach to prevent overfitting. Recently, incorporating structural
information when deciding which units to drop out has produced promising results
compared to methods that ignore structural information. However, a major issue
with existing work is that it fails to differentiate among instances when
constructing the dropout architecture. This can be a significant deficiency for
many applications. To address this issue, we propose Constructivism learning for
instance-dependent Dropout Architecture (CODA), which is inspired by the
philosophical theory of constructivism learning. Specifically, based on this
theory we design an improved dropout technique, Uniform Process Mixture Models,
using a Bayesian nonparametric method, the uniform process. We evaluate the
proposed method on 5 real-world datasets and compare its performance with other
state-of-the-art dropout techniques. The experimental results demonstrate the
effectiveness of CODA.
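For readers unfamiliar with the idea, here is a minimal sketch of instance-dependent dropout: a small gate network predicts per-unit keep probabilities from each input, so the realized dropout architecture differs across instances. This only illustrates the general concept; it does not implement CODA's Uniform Process Mixture Models, and the module name, gate design, and layer sizes are assumptions made for the example.
```python
import torch
import torch.nn as nn

class InstanceDependentDropout(nn.Module):
    """Dropout whose keep probabilities are predicted per input instance.

    Illustration of the general idea only; this is NOT CODA's Uniform Process
    Mixture Model (a Bayesian nonparametric construction).
    """
    def __init__(self, num_features: int):
        super().__init__()
        # Small gate network: maps each instance to per-unit keep probabilities.
        self.gate = nn.Linear(num_features, num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # no stochastic masking at evaluation time
        keep_prob = torch.sigmoid(self.gate(x))      # (batch, num_features)
        mask = torch.bernoulli(keep_prob)            # instance-specific binary mask
        return x * mask / keep_prob.clamp(min=1e-6)  # inverted-dropout style rescaling

# Usage: units of the hidden layer are dropped differently for each instance.
layer = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
dropout = InstanceDependentDropout(32)
out = dropout(layer(torch.randn(4, 16)))
```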
Related papers
- Towards Robust Out-of-Distribution Generalization: Data Augmentation and Neural Architecture Search Approaches [4.577842191730992]
We study ways toward robust OoD generalization for deep learning.
We first propose a novel and effective approach to disentangle the spurious correlation between features that are not essential for recognition.
We then study the problem of strengthening neural architecture search in OoD scenarios.
arXiv Detail & Related papers (2024-10-25T20:50:32Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- Depth-agnostic Single Image Dehazing [12.51359372069387]
We propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated.
Experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS.
We revisit U-Net-based architectures for dehazing, into which specially designed blocks are incorporated.
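As a rough illustration of what decoupling haze density from scene depth can mean in synthesis (not the paper's DA-HAZE pipeline), the sketch below applies the standard atmospheric scattering model but samples the transmission independently of any depth map; the function name and parameter ranges are assumptions.
```python
import numpy as np

def synthesize_haze(clear_img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Add synthetic haze with the atmospheric scattering model I = J*t + A*(1 - t).

    Illustration only: the transmission t is sampled independently of any depth
    map, rather than via t = exp(-beta * depth), which is one way to break the
    coupling between haze density and scene depth. Parameter ranges are guesses.
    """
    A = rng.uniform(0.7, 1.0)   # global atmospheric light
    t = rng.uniform(0.3, 0.9)   # depth-agnostic transmission, uniform over the image
    hazy = clear_img * t + A * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)

# Usage on a float RGB image with values in [0, 1]:
rng = np.random.default_rng(0)
hazy = synthesize_haze(np.random.rand(64, 64, 3), rng)
```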
arXiv Detail & Related papers (2024-01-14T06:33:11Z)
- Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study [52.83643622795387]
The limited cross-architecture generalization of dataset distillation weakens its practical significance.
We propose a novel method, EvaLuation with distillation Feature (ELF).
Extensive experiments show that ELF effectively enhances the cross-architecture generalization of current dataset distillation (DD) methods.
arXiv Detail & Related papers (2023-12-09T15:41:42Z)
- Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search [55.41583104734349]
We propose to automatically remove structural redundancy in diffusion models with Diffusion Distillation-based Block-wise Neural Architecture Search (DiffNAS).
Given a larger pretrained teacher, we leverage DiffNAS to search for the smallest architecture that achieves performance on par with or even better than the teacher.
Different from previous block-wise NAS methods, DiffNAS contains a block-wise local search strategy and a retraining strategy with a joint dynamic loss.
arXiv Detail & Related papers (2023-11-08T12:56:59Z)
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
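For context on the teacher-student scheme the distillation entries build on, here is a minimal sketch of standard logit-based knowledge distillation with a temperature-scaled KL term; it does not reflect OFA-KD's specific mechanism for heterogeneous architectures, and the temperature and loss weight are assumed values.
```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.5):
    """Standard logit-based knowledge distillation loss (Hinton-style).

    Illustration only: OFA-KD additionally bridges heterogeneous teacher/student
    architectures, which is not shown here. T and alpha are assumed values.
    """
    # Soft targets: temperature-scaled KL divergence between teacher and student.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with dummy logits for a 10-class problem:
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
kd_loss(s, t, y).backward()
```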
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which consistently demonstrates robust performance with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework [21.494759678807686]
We propose a new weakly supervised learning framework for knowledge distillation in video classification.
Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages.
Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
arXiv Detail & Related papers (2023-07-11T12:10:42Z)
- A Survey on Dropout Methods and Experimental Verification in Recommendation [34.557554809126415]
Overfitting is a common problem in machine learning, in which a model fits the training data too closely while performing poorly on the test data.
Among various methods of coping with overfitting, dropout is one of the representative approaches.
From randomly dropping neurons to dropping neural structures, dropout has achieved great success in improving model performance.
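As a concrete reference point for the "randomly dropping neurons" baseline the survey starts from, below is a minimal sketch of standard inverted dropout (rescaling surviving units so the network needs no change at test time); this is textbook dropout, not any specific method from the survey.
```python
import torch

def inverted_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Standard inverted dropout: zero each unit with probability p during training
    and rescale the survivors by 1/(1-p), so nothing changes at test time."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    mask = torch.bernoulli(torch.full_like(x, keep))  # 1 = keep, 0 = drop
    return x * mask / keep

# Usage: apply to hidden activations during training only.
h = torch.randn(4, 32)
h_train = inverted_dropout(h, p=0.3, training=True)
h_eval = inverted_dropout(h, p=0.3, training=False)  # returned unchanged
```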
arXiv Detail & Related papers (2022-04-05T07:08:21Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
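A rough sketch of the local-matching idea described in the last entry, under the assumption that sub-structures are per-position label distributions in a sequence-labeling task; the actual paper targets structured prediction models more generally, and the tensor shapes here are illustrative.
```python
import torch
import torch.nn.functional as F

def substructure_kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Match teacher and student locally, per position, rather than over the
    exponentially large space of whole output sequences.

    Assumption for this sketch: logits have shape (batch, seq_len, num_labels),
    and each position's label distribution is treated as one sub-structure.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_prob = F.softmax(teacher_logits, dim=-1)
    # KL divergence per position, averaged over positions and batch.
    kl = (t_prob * (t_prob.clamp_min(1e-12).log() - s_logp)).sum(dim=-1)
    return kl.mean()

# Usage with dummy sequence-labeling logits (batch=2, length=5, labels=7):
s = torch.randn(2, 5, 7, requires_grad=True)
t = torch.randn(2, 5, 7)
substructure_kd_loss(s, t).backward()
```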
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.