Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks
- URL: http://arxiv.org/abs/2305.17244v1
- Date: Fri, 26 May 2023 20:17:18 GMT
- Title: Mitigating Catastrophic Forgetting in Long Short-Term Memory Networks
- Authors: Ketaki Joshi, Raghavendra Pradyumna Pothukuchi, Andre Wibisono, Abhishek Bhattacharjee
- Abstract summary: Continual learning on sequential data is critical for many machine learning (ML) deployments.
LSTM networks suffer from catastrophic forgetting and are limited in their ability to learn multiple tasks continually.
We discover that catastrophic forgetting in LSTM networks can be overcome in two novel and readily-implementable ways.
- Score: 7.291687946822538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual learning on sequential data is critical for many machine learning
(ML) deployments. Unfortunately, LSTM networks, which are commonly used to
learn on sequential data, suffer from catastrophic forgetting and are limited
in their ability to learn multiple tasks continually. We discover that
catastrophic forgetting in LSTM networks can be overcome in two novel and
readily-implementable ways -- separating the LSTM memory either for each task
or for each target label. Our approach eschews the need for explicit
regularization, hypernetworks, and other complex methods. We quantify the
benefits of our approach on recently-proposed LSTM networks for computer memory
access prefetching, an important sequential learning problem in ML-based
computer system optimization. Compared to state-of-the-art weight
regularization methods to mitigate catastrophic forgetting, our approach is
simple, effective, and enables faster learning. We also show that our proposal
enables the use of small, non-regularized LSTM networks for complex natural
language processing in the offline learning scenario, which was previously
considered difficult.
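As a rough illustration of the memory-separation idea, the sketch below keeps one persistent hidden/cell state per task in a single shared LSTM (PyTorch). It is not the authors' implementation; the task-indexed state store and the batch-averaged state update are simplifications assumed for the example.

```python
# Hedged sketch of "separate LSTM memory per task": shared weights, but each task
# owns its own persistent (h, c) state so tasks do not overwrite each other's memory.
# Not the paper's code; the batch-averaged state update is a simplification.
import torch
import torch.nn as nn

class TaskSeparatedLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_tasks):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)
        # One (h, c) pair per task.
        self.register_buffer("h", torch.zeros(num_tasks, 1, 1, hidden_size))
        self.register_buffer("c", torch.zeros(num_tasks, 1, 1, hidden_size))

    def forward(self, x, task_id):
        batch = x.size(0)
        # Select the memory belonging to this task and broadcast it over the batch.
        h0 = self.h[task_id].expand(1, batch, -1).contiguous()
        c0 = self.c[task_id].expand(1, batch, -1).contiguous()
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Persist the updated memory for this task only (detached from autograd).
        self.h[task_id] = hn.detach().mean(dim=1, keepdim=True)
        self.c[task_id] = cn.detach().mean(dim=1, keepdim=True)
        return self.head(out[:, -1])
```

Separating memory per target label instead would key the same kind of state store by the label rather than by the task id.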
Related papers
- RED-CT: A Systems Design Methodology for Using LLM-labeled Data to Train and Deploy Edge Classifiers for Computational Social Science [0.46560775769914914]
Large language models (LLMs) have enhanced our ability to rapidly analyze and classify unstructured natural language data.
However, concerns regarding cost, network limitations, and security constraints have posed challenges for their integration into work processes.
In this study, we adopt a systems design approach to employing LLMs as imperfect data annotators for downstream supervised learning tasks.
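As a purely illustrative sketch of this workflow (the annotation function and model choices below are assumptions, not taken from the paper), an LLM labels the unlabeled corpus and a compact classifier is then trained for edge deployment:

```python
# Hypothetical illustration: use an LLM as an imperfect annotator, then distill its
# labels into a small classifier that can run at the edge without network access.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_edge_classifier(texts, label_with_llm):
    # label_with_llm is a stand-in for whatever LLM annotation pipeline is used.
    noisy_labels = [label_with_llm(t) for t in texts]
    clf = make_pipeline(TfidfVectorizer(max_features=20000),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, noisy_labels)  # supervised training on LLM-provided labels
    return clf
```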
arXiv Detail & Related papers (2024-08-15T15:28:37Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- netFound: Foundation Model for Network Security [11.38388749887112]
This paper introduces a novel transformer-based network foundation model, netFound.
We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training.
Our results demonstrate that netFound effectively captures the hidden networking context in production settings.
arXiv Detail & Related papers (2023-10-25T22:04:57Z)
- On the Soft-Subnetwork for Few-shot Class Incremental Learning [67.0373924836107]
We propose a few-shot class incremental learning (FSCIL) method referred to as Soft-SubNetworks (SoftNet).
Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones.
We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.
arXiv Detail & Related papers (2022-09-15T04:54:02Z)
- Learning towards Synchronous Network Memorizability and Generalizability for Continual Segmentation across Multiple Sites [52.84959869494459]
In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites.
Existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites.
This paper aims to tackle the problem of Synchronous Memorizability and Generalizability with a novel proposed SMG-learning framework.
arXiv Detail & Related papers (2022-06-14T13:04:36Z)
- LSTMSPLIT: Effective SPLIT Learning based LSTM on Sequential Time-Series Data [3.9011223632827385]
We propose a new approach, LSTMSPLIT, that uses a split learning (SL) architecture with an LSTM network to classify time-series data with multiple clients.
LSTMSPLIT achieves accuracy better than or comparable to the Split-1DCNN method on the electrocardiogram and human activity recognition datasets.
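For context, a minimal single-process sketch of the general split-learning idea with an LSTM is shown below (schematic only; the paper's exact protocol, loss handling, and multi-client coordination are not reproduced here):

```python
# Hedged sketch of split learning with an LSTM: the client encodes the raw time
# series locally and only the intermediate activation crosses the split point.
import torch
import torch.nn as nn

class ClientLSTM(nn.Module):          # runs on the client, sees the raw data
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return h[-1]                   # "smashed" activation sent to the server

class ServerHead(nn.Module):          # runs on the server, never sees raw data
    def __init__(self, hidden_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, smashed):
        return self.fc(smashed)

def split_step(client, server, x, y, opt_c, opt_s, loss_fn=nn.CrossEntropyLoss()):
    # One training step across the split; gradients flow back to the client
    # through the autograd graph of the smashed tensor.
    smashed = client(x)
    logits = server(smashed)
    loss = loss_fn(logits, y)
    opt_c.zero_grad(); opt_s.zero_grad()
    loss.backward()
    opt_c.step(); opt_s.step()
    return loss.item()
```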
arXiv Detail & Related papers (2022-03-08T11:44:12Z)
- La-MAML: Look-ahead Meta Learning for Continual Learning [14.405620521842621]
We propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online-continual learning, aided by a small episodic memory.
La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
arXiv Detail & Related papers (2020-07-27T23:07:01Z)
- Neuromodulated Neural Architectures with Local Error Signals for Memory-Constrained Online Continual Learning [4.2903672492917755]
We develop a biologically inspired, lightweight neural network architecture that incorporates local learning and neuromodulation.
We demonstrate the efficacy of our approach in both single-task and continual learning settings.
arXiv Detail & Related papers (2020-07-16T07:41:23Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
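For reference, the weight-importance penalty discussed here (elastic weight consolidation) is typically a quadratic anchor toward previous-task weights, scaled by a Fisher-information estimate. The sketch below shows that standard form, not anything specific to this paper:

```python
# Generic elastic weight consolidation (EWC) penalty, as commonly applied to any
# network including RNNs; shown for context only.
import torch

def ewc_penalty(model, fisher, old_params, lam):
    """fisher and old_params map parameter names to tensors saved after the
    previous task; lam scales how strongly old weights are anchored."""
    loss = torch.tensor(0.0, device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss

# Usage: total_loss = task_loss + ewc_penalty(rnn, fisher, old_params, lam=100.0)
```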
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
- LSM: Learning Subspace Minimization for Low-level Vision [78.27774638569218]
We replace the regularization term with a learnable subspace constraint, and preserve the data term to exploit domain knowledge.
This learning subspace minimization (LSM) framework unifies the network structures and the parameters for many low-level vision tasks.
We demonstrate our LSM framework on four low-level tasks including interactive image segmentation, video segmentation, stereo matching, and optical flow, and validate the network on various datasets.
arXiv Detail & Related papers (2020-04-20T10:49:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.