On Continual Model Refinement in Out-of-Distribution Data Streams
- URL: http://arxiv.org/abs/2205.02014v1
- Date: Wed, 4 May 2022 11:54:44 GMT
- Title: On Continual Model Refinement in Out-of-Distribution Data Streams
- Authors: Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao,
Xiang Ren, Wen-tau Yih
- Abstract summary: Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams.
Existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario.
We propose a new CL problem formulation dubbed continual model refinement (CMR).
- Score: 64.62569873799096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world natural language processing (NLP) models need to be continually
updated to fix the prediction errors in out-of-distribution (OOD) data streams
while overcoming catastrophic forgetting. However, existing continual learning
(CL) problem setups cannot cover such a realistic and complex scenario. In
response to this, we propose a new CL problem formulation dubbed continual
model refinement (CMR). Compared to prior CL settings, CMR is more practical
and introduces unique challenges (boundary-agnostic and non-stationary
distribution shift, diverse mixtures of multiple OOD data clusters,
error-centric streams, etc.). We extend several existing CL approaches to the
CMR setting and evaluate them extensively. For benchmarking and analysis, we
propose a general sampling algorithm to obtain dynamic OOD data streams with
controllable non-stationarity, as well as a suite of metrics measuring various
aspects of online performance. Our experiments and detailed analysis reveal the
promise and challenges of the CMR problem, supporting that studying CMR in
dynamic OOD streams can benefit the longevity of deployed NLP models in
production.
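The abstract mentions a general sampling algorithm that produces dynamic OOD data streams with controllable non-stationarity by mixing multiple OOD data clusters. As a minimal sketch (not the paper's exact algorithm; the drift parameter `alpha`, the re-draw rule, and the helper `sample_ood_stream` are all hypothetical), such a stream can be generated by letting mixture weights over the clusters drift over time:

```python
import random

def sample_ood_stream(clusters, num_steps, alpha=0.9, seed=0):
    """Hypothetical sketch: emit a boundary-agnostic, non-stationary stream
    by sampling from a time-varying mixture over OOD data clusters.
    `alpha` controls non-stationarity: higher alpha means the mixture
    weights are re-drawn less often, so the stream drifts more slowly."""
    rng = random.Random(seed)
    weights = [1.0 / len(clusters)] * len(clusters)
    stream = []
    for _ in range(num_steps):
        # With probability (1 - alpha), re-draw the mixture weights,
        # shifting which clusters dominate the stream from here on.
        if rng.random() > alpha:
            raw = [rng.random() for _ in clusters]
            total = sum(raw)
            weights = [w / total for w in raw]
        # Sample one example from a cluster chosen by the current weights.
        idx = rng.choices(range(len(clusters)), weights=weights)[0]
        stream.append(rng.choice(clusters[idx]))
    return stream
```

Because the weight re-draws are not announced to the learner, the resulting stream has no explicit task boundaries, matching the boundary-agnostic setting the abstract describes.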
Related papers
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series.
Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data.
This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy.
We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - Monte Carlo Functional Regularisation for Continual Learning [2.2871867623460216]
We present a new functional regularisation CL framework, called MCFRCL, which approximates model prediction distributions by Monte Carlo sampling.
The proposed MCFRCL is evaluated against multiple benchmark methods on the MNIST and CIFAR datasets.
arXiv Detail & Related papers (2025-08-18T15:25:37Z) - FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming [52.52020895303244]
Mixed-Integer Linear Programming (MILP) is a foundational tool for complex decision-making problems.
We propose Joint Continuous-Integer Flow for Mixed-Integer Linear Programming (FMIP), the first generative framework that models the joint distribution of both integer and continuous variables for MILP solutions.
FMIP is fully compatible with arbitrary backbone networks and various downstream solvers, making it well-suited for a broad range of real-world MILP applications.
arXiv Detail & Related papers (2025-07-31T10:03:30Z) - Disentangling Uncertainties by Learning Compressed Data Representation [2.959687944707463]
We propose a framework that learns a neural network encoding of the data distribution and enables direct sampling from the output distribution.
Our approach incorporates a novel inference procedure based on Langevin dynamics sampling, allowing CDRM to predict arbitrary output distributions.
arXiv Detail & Related papers (2025-03-20T02:37:48Z) - Analyzing and Mitigating Model Collapse in Rectified Flow Models [23.568835948164065]
Recent studies have shown that repeatedly training on self-generated samples can lead to model collapse.
We provide both theoretical analysis and practical solutions for addressing MC in diffusion/flow models.
We propose a novel Real-data Augmented Reflow and a series of improved variants, which seamlessly integrate real data into Reflow training by leveraging reverse flow.
arXiv Detail & Related papers (2024-12-11T08:05:35Z) - Amortized Inference of Causal Models via Conditional Fixed-Point Iterations [17.427722515310606]
We propose amortized inference of Structural Causal Models (SCMs) by training a single model on multiple datasets sampled from different SCMs.
We first use a transformer-based architecture for amortized learning of dataset embeddings, and then extend the Fixed-Point Approach (FiP) to infer SCMs conditionally on their dataset embeddings.
As a byproduct, our method can generate observational and interventional data from novel SCMs at inference time, without updating parameters.
arXiv Detail & Related papers (2024-10-08T15:31:33Z) - CMamba: Channel Correlation Enhanced State Space Models for Multivariate Time Series Forecasting [18.50360049235537]
Mamba, a state space model, has emerged with robust sequence and feature mixing capabilities.
Capturing cross-channel dependencies is critical to enhancing the performance of time series prediction.
We introduce a refined Mamba variant tailored for time series forecasting.
arXiv Detail & Related papers (2024-06-08T01:32:44Z) - Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO).
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting [50.48888534815361]
We show that models trained with the Channel Independent (CI) strategy outperform those trained with the Channel Dependent (CD) strategy.
Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series.
We propose a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy.
arXiv Detail & Related papers (2023-04-11T13:15:33Z) - Federated Latent Class Regression for Hierarchical Data [5.110894308882439]
Federated Learning (FL) allows a number of agents to participate in training a global machine learning model without disclosing locally stored data.
We propose a novel probabilistic model, Hierarchical Latent Class Regression (HLCR), and its extension to Federated Learning, FEDHLCR.
Our inference algorithm, derived from Bayesian theory, provides strong convergence guarantees and good robustness to overfitting. Experimental results show that FEDHLCR offers fast convergence even on non-IID datasets.
arXiv Detail & Related papers (2022-06-22T00:33:04Z) - A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval [19.2650103482509]
Cross-Modal Retrieval (CMR) is an important research topic across multimodal computing and information retrieval.
We take CLIP as the current representative vision-language pre-trained model to conduct a comprehensive empirical study.
We propose a novel model CLIP4CMR that employs pre-trained CLIP as backbone network to perform supervised CMR.
arXiv Detail & Related papers (2022-01-08T06:00:22Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on AdaSO (the adaptive least absolute shrinkage and selection operator).
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Distributionally Robust Multi-Output Regression Ranking [3.9318191265352196]
We introduce a new listwise learning-to-rank model called Distributionally Robust Multi-output Regression Ranking (DRMRR).
DRMRR uses a Distributionally Robust Optimization framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution.
Our experiments were conducted on two real-world applications, medical document retrieval, and drug response prediction.
arXiv Detail & Related papers (2021-09-27T05:19:27Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - MMCGAN: Generative Adversarial Network with Explicit Manifold Prior [78.58159882218378]
We propose to employ explicit manifold learning as prior to alleviate mode collapse and stabilize training of GAN.
Our experiments on both the toy data and real datasets show the effectiveness of MMCGAN in alleviating mode collapse, stabilizing training, and improving the quality of generated samples.
arXiv Detail & Related papers (2020-06-18T07:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.