The Two Dimensions of Worst-case Training and the Integrated Effect for
Out-of-domain Generalization
- URL: http://arxiv.org/abs/2204.04384v1
- Date: Sat, 9 Apr 2022 04:14:55 GMT
- Title: The Two Dimensions of Worst-case Training and the Integrated Effect for
Out-of-domain Generalization
- Authors: Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing
- Abstract summary: We propose a new, simple yet effective heuristic to train machine learning models.
We name our method W2D following the concept of "Worst-case along Two Dimensions".
- Score: 95.34898583368154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training with an emphasis on "hard-to-learn" components of the data has been
proven as an effective method to improve the generalization of machine learning
models, especially in the settings where robustness (e.g., generalization
across distributions) is valued. Existing literature discussing this
"hard-to-learn" concept is mainly expanded either along the dimension of the
samples or the dimension of the features. In this paper, we aim to introduce a
simple view merging these two dimensions, leading to a new, simple yet
effective, heuristic to train machine learning models by emphasizing the
worst-cases on both the sample and the feature dimensions. We name our method
W2D following the concept of "Worst-case along Two Dimensions". We validate the
idea and demonstrate its empirical strength over standard benchmarks.
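The abstract describes emphasizing worst cases along both the sample and the feature dimensions. The sketch below illustrates that general idea only and is not the authors' W2D implementation: the linear softmax model, the saliency rule (each feature's contribution to the true-class logit), the helper names, and the `keep_frac` and `drop_frac` parameters are all assumptions made for illustration.

```python
import numpy as np

def per_sample_cross_entropy(logits, labels):
    """Return one cross-entropy loss value per sample."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def select_worst_samples(losses, keep_frac):
    """Sample dimension: indices of the highest-loss samples."""
    k = max(1, int(round(keep_frac * len(losses))))
    return np.argsort(losses)[::-1][:k]

def mask_dominant_features(features, saliency, drop_frac):
    """Feature dimension: zero the most salient features of each sample."""
    n_drop = max(1, int(round(drop_frac * features.shape[1])))
    top = np.argsort(saliency, axis=1)[:, -n_drop:]
    masked = features.copy()
    np.put_along_axis(masked, top, 0.0, axis=1)
    return masked

def worst_case_batch(features, labels, weights, keep_frac=0.5, drop_frac=0.25):
    """Keep the hardest samples, then hide their most relied-upon features."""
    losses = per_sample_cross_entropy(features @ weights, labels)
    idx = select_worst_samples(losses, keep_frac)
    hard_x, hard_y = features[idx], labels[idx]
    # Assumed saliency: contribution of each feature to the true-class logit
    # of a linear model (hypothetical stand-in for a gradient-based score).
    saliency = hard_x * weights[:, hard_y].T
    return mask_dominant_features(hard_x, saliency, drop_frac), hard_y, idx

# Toy data: 8 samples, 4 features, 3 classes (all values are synthetic).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
labels = rng.integers(0, 3, size=8)
weights = rng.normal(size=(4, 3))

masked, hard_y, idx = worst_case_batch(features, labels, weights)
print(masked.shape, idx)
```

In a training loop, the loss would then be computed on `masked` and `hard_y` instead of the full batch, so each update concentrates on the samples and features the current model handles worst.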
Related papers
- 2D Matryoshka Training for Information Retrieval [32.44832240958393]
2D Matryoshka Training is an embedding representation training approach designed to train an encoder model simultaneously across various layer-dimension setups.
We implement and evaluate both versions of 2D Matryoshka Training on STS tasks and extend our analysis to retrieval tasks.
arXiv Detail & Related papers (2024-11-26T10:47:35Z)
- Starbucks: Improved Training for 2D Matryoshka Embeddings [32.44832240958393]
We propose Starbucks, a new training strategy for Matryoshka-like embedding models.
For the fine-tuning phase, we provide a fixed list of layer-dimension pairs, from small size to large sizes.
We also introduce a new pre-training strategy, which applies masked autoencoder language modelling to sub-layers and sub-dimensions.
arXiv Detail & Related papers (2024-10-17T05:33:50Z)
- Understanding the Double Descent Phenomenon in Deep Learning [49.1574468325115]
This tutorial sets the classical statistical learning framework and introduces the double descent phenomenon.
By looking at a number of examples, section 2 introduces inductive biases that appear to have a key role in double descent by selecting.
Section 3 explores the double descent with two linear models, and gives other points of view from recent related works.
arXiv Detail & Related papers (2024-03-15T16:51:24Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Full High-Dimensional Intelligible Learning In 2-D Lossless Visualization Space [7.005458308454871]
This study explores a new methodology for machine learning classification tasks in 2-D visualization space (2-D ML).
It is shown that this is a full machine learning approach that does not require processing n-dimensional data in an abstract n-dimensional space.
It enables discovering n-D patterns in 2-D space without loss of n-D information using graph representations of n-D data in 2-D.
arXiv Detail & Related papers (2023-05-29T00:21:56Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation [76.85274970052762]
This paper is motivated by a proliferation of options of alignment regularizations.
We evaluate the performances of several popular design choices along the dimensions of robustness and invariance.
We also formally analyze the behavior of alignment regularization to complement our empirical study under assumptions we consider realistic.
arXiv Detail & Related papers (2022-06-04T04:29:19Z)
- Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two kinds of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
- Manifold attack [0.22419496088582863]
In this paper, we enforce manifold preservation (manifold learning) from the original data into the latent representation.
We show that our approach of regularization provides improvements for the accuracy rate and for the robustness to adversarial examples.
arXiv Detail & Related papers (2020-09-13T09:39:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.