Analyzing and Mitigating Interference in Neural Architecture Search
- URL: http://arxiv.org/abs/2108.12821v1
- Date: Sun, 29 Aug 2021 11:07:46 GMT
- Title: Analyzing and Mitigating Interference in Neural Architecture Search
- Authors: Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin,
Tie-Yan Liu, Jian Li
- Abstract summary: We investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators.
Based on two observations from this analysis, we propose two approaches to mitigate the interference.
Our searched architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark.
- Score: 96.60805562853153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weight sharing has become the \textit{de facto} approach to reduce the
training cost of neural architecture search (NAS) by reusing the weights of
shared operators from previously trained child models. However, the estimated
accuracy of those child models has a low rank correlation with the ground truth
accuracy due to the interference among different child models caused by weight
sharing. In this paper, we investigate the interference issue by sampling
different child models and calculating the gradient similarity of shared
operators, and observe that: 1) the interference on a shared operator between
two child models is positively correlated to the number of different operators
between them; 2) the interference is smaller when the inputs and outputs of the
shared operator are more similar. Inspired by these two observations, we
propose two approaches to mitigate the interference: 1) rather than randomly
sampling child models for optimization, we propose a gradual modification
scheme by modifying one operator between adjacent optimization steps to
minimize the interference on the shared operators; 2) forcing the inputs and
outputs of the operator across all child models to be similar to reduce the
interference. Experiments on a BERT search space verify that mitigating
interference via each of our proposed methods improves the rank correlation of
the super-net, and that combining both methods achieves better results. Our searched
architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and
ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE
benchmark. Extensive results on the BERT compression task, SQuAD datasets and
other search spaces also demonstrate the effectiveness and generality of our
proposed methods.
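
The gradual modification scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the operator names, the number of layers, and all function names below are hypothetical, and a child model is simplified to a list holding one operator choice per layer. The sketch contrasts random sampling with changing exactly one operator between adjacent steps, and includes a cosine-similarity helper of the kind one might use to compare the gradients of a shared operator.

```python
import math
import random

# Hypothetical search space: each layer picks one operator from this list.
OPERATORS = ["attention", "conv3", "conv5", "ffn"]
NUM_LAYERS = 6


def random_child(rng):
    """Sample a child model as one operator choice per layer."""
    return [rng.choice(OPERATORS) for _ in range(NUM_LAYERS)]


def gradual_step(child, rng):
    """Return a copy of `child` with exactly one layer's operator changed."""
    layer = rng.randrange(len(child))
    alternatives = [op for op in OPERATORS if op != child[layer]]
    new_child = list(child)
    new_child[layer] = rng.choice(alternatives)
    return new_child


def num_different_ops(a, b):
    """Count the layers where two child models use different operators."""
    return sum(x != y for x, y in zip(a, b))


def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(g1, g2))
    norm = math.sqrt(sum(x * x for x in g1)) * math.sqrt(sum(y * y for y in g2))
    return dot / norm if norm else 0.0


rng = random.Random(0)

# Gradual modification: adjacent child models differ in one operator.
child = random_child(rng)
gradual_diffs = []
for _ in range(100):
    next_child = gradual_step(child, rng)
    gradual_diffs.append(num_different_ops(child, next_child))
    child = next_child

# Random sampling: adjacent child models are drawn independently.
random_diffs = [
    num_different_ops(random_child(rng), random_child(rng)) for _ in range(100)
]

print("mean differing operators (gradual):", sum(gradual_diffs) / 100)
print("mean differing operators (random): ", sum(random_diffs) / 100)
```

Under the abstract's first observation, fewer differing operators between consecutive child models should mean less interference on the shared operators, which is why the gradual sampler holds that count at one per step.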
Related papers
- Semisupervised score based matching algorithm to evaluate the effect of public health interventions [3.221788913179251]
In one-to-one matching algorithms, a large number of "pairs" to be matched could mean both the information from a large sample and a large number of tasks.
We propose a novel one-to-one matching algorithm based on a quadratic score function $S_\beta(x_i,x_j) = \beta^T (x_i-x_j)(x_i-x_j)^T \beta$.
(2024-03-19T02:24:16Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS.
(2023-11-07T08:27:14Z)
- Inferring effective couplings with Restricted Boltzmann Machines [3.150368120416908]
Generative models attempt to encode correlations observed in the data at the level of the Boltzmann weight associated with an energy function in the form of a neural network.
We propose a solution by implementing a direct mapping between the Restricted Boltzmann Machine and an effective Ising spin Hamiltonian.
(2023-09-05T14:55:09Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
(2023-07-16T16:32:14Z)
- Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer [4.137346786534721]
We introduce a Hausdorff distance-based cost for bipartite matching, which more accurately quantifies the discrepancy between predictions and ground truths.
We propose an adaptive query denoising method that employs bipartite matching to selectively eliminate noised queries that detract from model improvement.
(2023-05-12T16:42:54Z)
- Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field [39.055053720433435]
We introduce a joint IE framework (CRFIE) that formulates joint IE as a high-order Conditional Random Field.
Specifically, we design binary factors and ternary factors to directly model interactions between not only a pair of instances but also triplets.
We incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method.
(2022-12-17T18:45:23Z)
- Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning [92.05556163518999]
MARL exacerbates matters by imposing various constraints on communication and observability.
For value-based methods, it poses challenges in accurately representing the optimal value function.
For policy gradient methods, it makes training the critic difficult and exacerbates the problem of the lagging critic.
We show that from a learning theory perspective, both problems can be addressed by accurately representing the associated action-value function.
(2021-05-31T23:08:05Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets.
(2020-10-02T13:59:05Z)
- Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents [52.55119217982361]
We propose a joint extraction approach to handle noisy instances with a group of cooperative multiagents.
To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates the instance by calculating a continuous confidence score from its own perspective.
A confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels.
(2020-04-21T12:03:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.