On Continual Model Refinement in Out-of-Distribution Data Streams
- URL: http://arxiv.org/abs/2205.02014v1
- Date: Wed, 4 May 2022 11:54:44 GMT
- Title: On Continual Model Refinement in Out-of-Distribution Data Streams
- Authors: Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao,
Xiang Ren, Wen-tau Yih
- Abstract summary: Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams.
Existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario.
We propose a new CL problem formulation dubbed continual model refinement (CMR).
- Score: 64.62569873799096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world natural language processing (NLP) models need to be continually
updated to fix the prediction errors in out-of-distribution (OOD) data streams
while overcoming catastrophic forgetting. However, existing continual learning
(CL) problem setups cannot cover such a realistic and complex scenario. In
response to this, we propose a new CL problem formulation dubbed continual
model refinement (CMR). Compared to prior CL settings, CMR is more practical
and introduces unique challenges (boundary-agnostic and non-stationary
distribution shift, diverse mixtures of multiple OOD data clusters,
error-centric streams, etc.). We extend several existing CL approaches to the
CMR setting and evaluate them extensively. For benchmarking and analysis, we
propose a general sampling algorithm to obtain dynamic OOD data streams with
controllable non-stationarity, as well as a suite of metrics measuring various
aspects of online performance. Our experiments and detailed analysis reveal the
promise and challenges of the CMR problem, supporting that studying CMR in
dynamic OOD streams can benefit the longevity of deployed NLP models in
production.
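The abstract mentions a general sampling algorithm that produces dynamic OOD data streams with controllable non-stationarity by mixing multiple OOD data clusters. As a minimal sketch (not the paper's exact algorithm; the drift parameter `alpha`, the re-draw rule, and the helper `sample_ood_stream` are all hypothetical), such a stream can be generated by letting mixture weights over the clusters drift over time:

```python
import random

def sample_ood_stream(clusters, num_steps, alpha=0.9, seed=0):
    """Hypothetical sketch: emit a boundary-agnostic, non-stationary stream
    by sampling from a time-varying mixture over OOD data clusters.
    `alpha` controls non-stationarity: higher alpha means the mixture
    weights are re-drawn less often, so the stream drifts more slowly."""
    rng = random.Random(seed)
    weights = [1.0 / len(clusters)] * len(clusters)
    stream = []
    for _ in range(num_steps):
        # With probability (1 - alpha), re-draw the mixture weights,
        # shifting which clusters dominate the stream from here on.
        if rng.random() > alpha:
            raw = [rng.random() for _ in clusters]
            total = sum(raw)
            weights = [w / total for w in raw]
        # Sample one example from a cluster chosen by the current weights.
        idx = rng.choices(range(len(clusters)), weights=weights)[0]
        stream.append(rng.choice(clusters[idx]))
    return stream
```

Because the weight re-draws are not announced to the learner, the resulting stream has no explicit task boundaries, matching the boundary-agnostic setting the abstract describes.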
Related papers
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series.
Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data.
This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy.
We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - Monte Carlo Functional Regularisation for Continual Learning [2.2871867623460216]
We present a new functional regularisation CL framework, called MCFRCL, which approximates model prediction distributions by Monte Carlo sampling.
The proposed MCFRCL is evaluated against multiple benchmark methods on the MNIST and CIFAR datasets.
arXiv Detail & Related papers (2025-08-18T15:25:37Z) - FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming [52.52020895303244]
Mixed-Integer Linear Programming (MILP) is a foundational tool for complex decision-making problems.
We propose Joint Continuous-Integer Flow for Mixed-Integer Linear Programming (FMIP), the first generative framework that models the joint distribution of both integer and continuous variables for MILP solutions.
FMIP is fully compatible with arbitrary backbone networks and various downstream solvers, making it well-suited for a broad range of real-world MILP applications.
arXiv Detail & Related papers (2025-07-31T10:03:30Z) - Disentangling Uncertainties by Learning Compressed Data Representation [2.959687944707463]
We propose a framework that learns a neural network encoding of the data distribution and enables direct sampling from the output distribution.
Our approach incorporates a novel inference procedure based on Langevin dynamics sampling, allowing CDRM to predict arbitrary output distributions.
arXiv Detail & Related papers (2025-03-20T02:37:48Z) - Analyzing and Mitigating Model Collapse in Rectified Flow Models [23.568835948164065]
Recent studies have shown that repeatedly training on self-generated samples can lead to model collapse.
We provide both theoretical analysis and practical solutions for addressing MC in diffusion/flow models.
We propose a novel Real-data Augmented Reflow and a series of improved variants, which seamlessly integrate real data into Reflow training by leveraging reverse flow.
arXiv Detail & Related papers (2024-12-11T08:05:35Z) - Amortized Inference of Causal Models via Conditional Fixed-Point Iterations [17.427722515310606]
We propose amortized inference of Structural Causal Models (SCMs) by training a single model on multiple datasets sampled from different SCMs.
We first use a transformer-based architecture for amortized learning of dataset embeddings, and then extend the Fixed-Point Approach (FiP) to infer SCMs conditionally on their dataset embeddings.
As a byproduct, our method can generate observational and interventional data from novel SCMs at inference time, without updating parameters.
arXiv Detail & Related papers (2024-10-08T15:31:33Z) - CMamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting [18.50360049235537]
Mamba, a state space model, has emerged with robust sequence and feature mixing capabilities.
Capturing cross-channel dependencies is critical to enhancing the performance of time series prediction.
We introduce a refined Mamba variant tailored for time series forecasting.
arXiv Detail & Related papers (2024-06-08T01:32:44Z) - Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
arXiv Detail & Related papers (2023-04-11T13:15:33Z) - Federated Latent Class Regression for Hierarchical Data [5.110894308882439]
Federated Learning (FL) allows a number of agents to participate in training a global machine learning model without disclosing locally stored data.
We propose a novel probabilistic model, Hierarchical Latent Class Regression (HLCR), and its extension to Federated Learning, FEDHLCR.
Our inference algorithm, derived from Bayesian theory, provides strong convergence guarantees and good robustness to overfitting. Experimental results show that FEDHLCR offers fast convergence even on non-IID datasets.
arXiv Detail & Related papers (2022-06-22T00:33:04Z) - A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval [19.2650103482509]
Cross-Modal Retrieval (CMR) is an important research topic across multimodal computing and information retrieval.
We take CLIP as the current representative vision-language pre-trained model to conduct a comprehensive empirical study.
We propose a novel model CLIP4CMR that employs pre-trained CLIP as backbone network to perform supervised CMR.
arXiv Detail & Related papers (2022-01-08T06:00:22Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on AdaSO (the adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Distributionally Robust Multi-Output Regression Ranking [3.9318191265352196]
We introduce a new listwise learning-to-rank model called Distributionally Robust Multi-output Regression Ranking (DRMRR).
DRMRR uses a Distributionally Robust Optimization framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution.
Our experiments were conducted on two real-world applications, medical document retrieval, and drug response prediction.
arXiv Detail & Related papers (2021-09-27T05:19:27Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - MMCGAN: Generative Adversarial Network with Explicit Manifold Prior [78.58159882218378]
We propose to employ explicit manifold learning as prior to alleviate mode collapse and stabilize training of GAN.
Our experiments on both the toy data and real datasets show the effectiveness of MMCGAN in alleviating mode collapse, stabilizing training, and improving the quality of generated samples.
arXiv Detail & Related papers (2020-06-18T07:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.