Memorisation Cartography: Mapping out the Memorisation-Generalisation
Continuum in Neural Machine Translation
- URL: http://arxiv.org/abs/2311.05379v1
- Date: Thu, 9 Nov 2023 14:03:51 GMT
- Title: Memorisation Cartography: Mapping out the Memorisation-Generalisation
Continuum in Neural Machine Translation
- Authors: Verna Dankers, Ivan Titov and Dieuwke Hupkes
- Abstract summary: We use the counterfactual memorisation metric to build a resource that places 5M NMT datapoints on a memorisation-generalisation map.
We also illustrate how the datapoints' surface-level characteristics and a model's per-datum training signals are predictive of memorisation in NMT.
- Score: 41.816534359921896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When training a neural network, some source-target mappings from
the dataset are memorised quickly while others are never learned. Yet,
memorisation is not easily expressed as a binary feature that is good or bad:
individual datapoints lie on a memorisation-generalisation continuum. What
determines a datapoint's position on that spectrum, and how does that spectrum
influence neural models' performance? We address these two questions for
neural machine translation (NMT) models. We use the counterfactual
memorisation metric to (1) build a resource that places 5M NMT datapoints on a
memorisation-generalisation map, (2) illustrate how the datapoints'
surface-level characteristics and a model's per-datum training signals are
predictive of memorisation in NMT, and (3) describe the influence that subsets
of that map have on NMT systems' performance.
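For context, counterfactual memorisation (due to Feldman and Zhang, 2020) scores a datapoint by how much models trained with it outperform models trained without it on that very datapoint. Below is a minimal sketch of that estimator; the array layout and the generic per-datum quality score are illustrative assumptions, not the authors' exact NMT setup.

```python
import numpy as np

def counterfactual_memorisation(scores, in_train):
    """Estimate counterfactual memorisation for each datapoint.

    scores   : (n_models, n_datapoints) array; scores[m, i] is model m's
               quality score on datapoint i (e.g. a translation metric).
    in_train : (n_models, n_datapoints) boolean array; True if datapoint i
               was in model m's training subset.

    Returns an (n_datapoints,) array: mean score of models trained WITH each
    datapoint minus mean score of models trained WITHOUT it. High values
    indicate memorisation; datapoints that score well in both populations
    sit at the generalisation end of the continuum.
    """
    in_mask = in_train.astype(float)
    out_mask = 1.0 - in_mask
    # Per-datum mean score over each of the two model populations.
    train_perf = (scores * in_mask).sum(0) / np.maximum(in_mask.sum(0), 1)
    test_perf = (scores * out_mask).sum(0) / np.maximum(out_mask.sum(0), 1)
    return train_perf - test_perf

# Toy usage: 10 models, 5 datapoints, random training subsets and scores.
rng = np.random.default_rng(0)
scores = rng.random((10, 5))
in_train = rng.random((10, 5)) < 0.5
print(counterfactual_memorisation(scores, in_train))
```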
Related papers
- Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks [33.1099258648462]
Memorisation is a natural part of learning from real-world data.
We show that memorisation is a gradual process rather than a localised one.
arXiv Detail & Related papers (2024-08-09T09:30:57Z)
- Measuring Feature Dependency of Neural Networks by Collapsing Feature Dimensions in the Data Manifold [18.64569268049846]
We introduce a new technique to measure the feature dependency of neural network models.
The motivation is to better understand a model by querying whether it is using information from human-understandable features.
We test our method on deep neural network models trained on synthetic image data with known ground truth.
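As a rough illustration of the idea (and not the paper's manifold-based procedure, which collapses dimensions along the data manifold rather than in ambient space): dependency on a feature can be probed by removing that feature's variation and measuring how much the model's outputs move. All names below are hypothetical.

```python
import numpy as np

def feature_dependency(predict, X, feature_idx):
    """Crude dependency score: how much do predictions change when one
    feature's variation is collapsed (here: replaced by its mean)?

    predict     : callable mapping (n, d) inputs to (n, k) outputs.
    X           : (n, d) data matrix.
    feature_idx : column whose variation is collapsed.
    """
    X_collapsed = X.copy()
    X_collapsed[:, feature_idx] = X[:, feature_idx].mean()
    # Mean output shift caused by removing the feature's variation.
    return np.abs(predict(X) - predict(X_collapsed)).mean()

# Toy model that depends only on feature 0.
predict = lambda X: X[:, [0]] ** 2
X = np.random.default_rng(1).normal(size=(100, 3))
print(feature_dependency(predict, X, 0))  # large
print(feature_dependency(predict, X, 1))  # ~0
```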
arXiv Detail & Related papers (2024-04-18T17:10:18Z)
- Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
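The summary only hints at the mechanism, so the following is a loose, hypothetical sketch of pointer-style memory access (explicit addresses, pointer stepping, dereferencing) rather than PANM's actual neural architecture.

```python
from dataclasses import dataclass, field

@dataclass
class PointerMemory:
    """Toy memory with explicit addresses and pointer manipulation, in the
    spirit of computer-style symbol processing (illustrative only)."""
    slots: list = field(default_factory=list)
    ptr: int = 0

    def write(self, item):
        self.slots.append(item)      # item receives address len(slots) - 1

    def deref(self):
        return self.slots[self.ptr]  # read the slot the pointer targets

    def advance(self, step=1):
        self.ptr = (self.ptr + step) % len(self.slots)

mem = PointerMemory()
for token in ["a", "b", "c"]:
    mem.write(token)
mem.advance(2)
print(mem.deref())  # "c" - located by address, not by content matching
```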
arXiv Detail & Related papers (2024-04-18T03:03:46Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
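A sketch of one plausible construction, assuming the standard diffusion-geometry recipe (Gaussian affinities, a row-stochastic diffusion operator, entropy over its eigenvalue spectrum); the paper's exact estimator may differ.

```python
import numpy as np

def diffusion_spectral_entropy(X, sigma=1.0, t=1):
    """Entropy of the eigenvalue spectrum of a diffusion operator on X.

    X     : (n, d) data matrix.
    sigma : Gaussian kernel bandwidth.
    t     : diffusion time (power of the operator).
    """
    # Pairwise squared distances and Gaussian affinities.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    P = K / K.sum(1, keepdims=True)      # row-stochastic diffusion matrix
    lam = np.abs(np.linalg.eigvals(np.linalg.matrix_power(P, t)))
    p = lam / lam.sum()                  # normalise spectrum to a distribution
    p = p[p > 1e-12]
    return -(p * np.log(p)).sum()

X = np.random.default_rng(2).normal(size=(50, 5))
print(diffusion_spectral_entropy(X, sigma=2.0, t=2))
```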
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without large computational overhead.
We evaluate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
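A minimal PyTorch sketch of the learnable-memory-token idea: trainable tokens are concatenated with the input so self-attention can read from and write to them. Layer sizes and the single attention layer are illustrative, not the paper's heterogeneous design.

```python
import torch
import torch.nn as nn

class MemoryAugmentedEncoder(nn.Module):
    """Prepends learnable memory tokens to the input sequence so that
    self-attention attends over memory and inputs jointly (sketch)."""

    def __init__(self, dim=64, n_mem=8, n_heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_mem, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq, dim)
        mem = self.memory.expand(x.size(0), -1, -1)
        h = torch.cat([mem, x], dim=1)          # attend over memory + inputs
        out, _ = self.attn(h, h, h)
        return out[:, self.memory.size(0):]     # drop the memory positions

x = torch.randn(2, 10, 64)
print(MemoryAugmentedEncoder()(x).shape)        # torch.Size([2, 10, 64])
```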
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Selective Memory Recursive Least Squares: Recast Forgetting into Memory in RBF Neural Network Based Real-Time Learning [2.31120983784623]
In radial basis function neural network (RBFNN) based real-time learning tasks, forgetting mechanisms are widely used.
This paper proposes a real-time training method named selective memory recursive least squares (SMRLS) in which the classical forgetting mechanisms are recast into a memory mechanism.
With SMRLS, the input space of the RBFNN is evenly divided into a finite number of partitions and a synthesized objective function is developed using synthesized samples from each partition.
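A rough sketch of the partitioning idea as summarised above, under stated assumptions: a fixed grid over the input space, the newest sample per cell retained as that cell's synthesized sample, and a least-squares refit of RBF weights over the retained set. This is not SMRLS itself; the grid resolution and all names are illustrative.

```python
import numpy as np

class PartitionMemory:
    """Keep one synthesized sample per grid cell of the input space, then
    fit RBF-network weights by least squares on the retained set
    (illustrative sketch of a selective-memory scheme, not SMRLS)."""

    def __init__(self, lo, hi, bins, centers, width=0.5):
        self.lo, self.hi, self.bins = np.asarray(lo), np.asarray(hi), bins
        self.centers, self.width = centers, width  # RBF centers: (k, d)
        self.cells = {}                            # cell index -> (x, y)

    def _cell(self, x):
        idx = ((x - self.lo) / (self.hi - self.lo) * self.bins).astype(int)
        return tuple(np.clip(idx, 0, self.bins - 1))

    def observe(self, x, y):
        self.cells[self._cell(x)] = (x, y)         # newest sample wins the cell

    def fit(self):
        X = np.array([x for x, _ in self.cells.values()])
        Y = np.array([y for _, y in self.cells.values()])
        Phi = np.exp(-((X[:, None, :] - self.centers[None]) ** 2).sum(-1)
                     / (2 * self.width**2))         # RBF feature matrix
        return np.linalg.lstsq(Phi, Y, rcond=None)[0]

rng = np.random.default_rng(3)
mem = PartitionMemory(lo=[-1, -1], hi=[1, 1], bins=5,
                      centers=rng.uniform(-1, 1, (10, 2)))
for _ in range(200):
    x = rng.uniform(-1, 1, 2)
    mem.observe(x, np.sin(x).sum())
w = mem.fit()   # weights over RBF features, fit on the remembered samples
```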
arXiv Detail & Related papers (2022-11-15T05:29:58Z)
- Tree Mover's Distance: Bridging Graph Metrics and Stability of Graph Neural Networks [54.225220638606814]
We propose a pseudometric for attributed graphs, the Tree Mover's Distance (TMD), and study its relation to generalization.
First, we show that TMD captures properties relevant to graph classification; a simple TMD-SVM performs competitively with standard GNNs.
Second, we relate TMD to generalization of GNNs under distribution shifts, and show that it correlates well with performance drop under such shifts.
arXiv Detail & Related papers (2022-10-04T21:03:52Z)
- Dendritic Self-Organizing Maps for Continual Learning [0.0]
We propose a novel algorithm inspired by biological neurons, termed the Dendritic Self-Organizing Map (DendSOM).
DendSOM consists of a single layer of SOMs, which extract patterns from specific regions of the input space.
It outperforms classical SOMs and several state-of-the-art continual learning algorithms on benchmark datasets.
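For background, a classical SOM update, which DendSOM builds on (this is the textbook algorithm, not the dendritic extension): find each input's best-matching unit and pull neighbouring units toward the input.

```python
import numpy as np

def train_som(X, grid=(8, 8), epochs=20, lr=0.5, radius=2.0, seed=0):
    """Classical self-organizing map: best-matching unit + neighbourhood pull."""
    rng = np.random.default_rng(seed)
    W = rng.random((grid[0], grid[1], X.shape[1]))         # unit prototypes
    coords = np.stack(np.meshgrid(*map(np.arange, grid), indexing="ij"), -1)
    for _ in range(epochs):
        for x in X:
            d = ((W - x) ** 2).sum(-1)
            bmu = np.unravel_index(d.argmin(), grid)       # best-matching unit
            g = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * radius**2))
            W += lr * g[..., None] * (x - W)               # neighbourhood update
    return W

X = np.random.default_rng(1).random((100, 3))
W = train_som(X)   # W[i, j] is the prototype learned by grid unit (i, j)
```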
arXiv Detail & Related papers (2021-10-18T14:47:19Z)
- Training Binary Neural Networks through Learning with Noisy Supervision [76.26677550127656]
This paper formalizes the binarization operations over neural networks from a learning perspective.
Experimental results on benchmark datasets indicate that the proposed binarization technique attains consistent improvements over baselines.
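The summary doesn't detail the technique, so as background only: binarized networks are commonly trained with a straight-through estimator (STE), using sign(w) in the forward pass while letting gradients flow to the underlying real-valued weights. A generic sketch, not this paper's noisy-supervision formulation:

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """sign() forward, identity ("straight-through") backward."""
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out              # pass gradients through the sign op

class BinaryLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, x):
        return x @ BinarizeSTE.apply(self.weight).t()

layer = BinaryLinear(4, 2)
out = layer(torch.randn(3, 4))
out.sum().backward()                 # real-valued weights still get gradients
print(layer.weight.grad.shape)       # torch.Size([2, 4])
```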
arXiv Detail & Related papers (2020-10-10T01:59:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.