Neural String Edit Distance
- URL: http://arxiv.org/abs/2104.08388v1
- Date: Fri, 16 Apr 2021 22:16:47 GMT
- Title: Neural String Edit Distance
- Authors: Jindřich Libovický, Alexander Fraser
- Abstract summary: We propose the neural string edit distance model for string-pair classification and sequence generation.
We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function.
We show that we can trade off between performance and interpretability in a single framework.
- Score: 77.72325513792981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the neural string edit distance model for string-pair
classification and sequence generation based on learned string edit distance.
We modify the original expectation-maximization learned edit distance algorithm
into a differentiable loss function, allowing us to integrate it into a neural
network that provides a contextual representation of the input. We test the method
on cognate detection, transliteration, and grapheme-to-phoneme conversion. We
show that we can trade off between performance and interpretability in a single
framework. Using contextual representations, which are difficult to interpret,
we can match the performance of state-of-the-art string-pair classification
models. Using static embeddings and a minor modification of the loss function,
we can force interpretability, at the expense of an accuracy drop.
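To make the core idea concrete, below is a minimal sketch of such a differentiable edit-distance forward pass in PyTorch. The score tensors stand in for log-scores of edit operations produced by some learned (possibly contextual) scorer; this illustrates the general logsumexp relaxation, not the authors' exact implementation.
```python
import torch

def soft_edit_distance(ins_scores, del_scores, sub_scores):
    """Differentiable forward algorithm over edit operations.

    ins_scores: (n,) log-scores for inserting target symbol j
    del_scores: (m,) log-scores for deleting source symbol i
    sub_scores: (m, n) log-scores for substituting i -> j
    All are hypothetical outputs of a learned scorer.
    """
    m, n = sub_scores.shape
    neg_inf = torch.tensor(float("-inf"))
    # alpha[i][j] = log total weight of edit paths aligning prefixes (i, j)
    alpha = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = torch.tensor(0.0)
    for i in range(m + 1):
        for j in range(n + 1):
            cands = []
            if j > 0:
                cands.append(alpha[i][j - 1] + ins_scores[j - 1])
            if i > 0:
                cands.append(alpha[i - 1][j] + del_scores[i - 1])
            if i > 0 and j > 0:
                cands.append(alpha[i - 1][j - 1] + sub_scores[i - 1, j - 1])
            if cands:
                # logsumexp replaces the hard min of classic edit distance,
                # keeping the recursion differentiable end to end
                alpha[i][j] = torch.logsumexp(torch.stack(cands), dim=0)
    return alpha[m][n]
```
Training on matching string pairs by maximizing this quantity (i.e., minimizing its negation) recovers an EM-like objective by plain gradient descent.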
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input [17.017127559393398]
We propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation.
This enables the network to learn from subtle input perturbations.
We further refine the training strategy to ensure convergence while simulating quantization errors.
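As an illustration of the soft-quantizer idea, one common way to give the round function a usable gradient is sketched below in PyTorch; the tanh surrogate and the sharpness parameter alpha are generic assumptions, not necessarily QGait's formulation.
```python
import math
import torch

def soft_round(x: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Smooth surrogate for round(x); larger alpha -> closer to hard rounding."""
    floor = torch.floor(x)
    r = x - floor  # fractional part in [0, 1)
    # smooth step from 0 to 1 centered at r = 0.5
    step = 0.5 + 0.5 * torch.tanh(alpha * (r - 0.5)) / math.tanh(alpha / 2)
    return floor + step

def quantize(x: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Hard rounding in the forward pass, soft gradient in the backward pass."""
    soft = soft_round(x, alpha)
    return soft + (torch.round(x) - soft).detach()
```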
arXiv Detail & Related papers (2024-05-22T17:34:18Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbation methods such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
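A toy version of such perturbations might look like the following sketch; the specific operations (one word swap, adjacent-character typos) and rates are illustrative assumptions rather than the paper's exact recipe.
```python
import random

def perturb(sentence: str, p_typo: float = 0.1) -> str:
    """Apply a word-order shuffle and random typos to a sentence."""
    words = sentence.split()
    if len(words) > 1:  # swap one random pair of words
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    out = []
    for w in words:
        if len(w) > 3 and random.random() < p_typo:
            k = random.randrange(len(w) - 1)
            w = w[:k] + w[k + 1] + w[k] + w[k + 2:]  # swap adjacent chars
        out.append(w)
    return " ".join(out)

print(perturb("pixel representations are robust to noise"))
```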
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Transparency at the Source: Evaluating and Interpreting Language Models With Access to the True Distribution [4.01799362940916]
We present a setup for training, evaluating, and interpreting neural language models that uses artificial, language-like data.
The data is generated using a massive probabilistic grammar that is itself derived from a large natural language corpus.
With access to the underlying true source distribution, our results reveal striking differences in learning dynamics between different classes of words.
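The following toy grammar illustrates the data-generation mechanism; the paper's grammar is massive and corpus-derived, whereas this five-nonterminal example is purely hypothetical.
```python
import random

# toy PCFG: nonterminal -> list of (right-hand side, probability)
GRAMMAR = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 1.0)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
    "N":  [(("cat",), 0.5), (("dog",), 0.5)],
    "V":  [(("sees",), 0.5), (("chases",), 0.5)],
}

def sample(symbol: str = "S") -> list:
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    rules, probs = zip(*GRAMMAR[symbol])
    rhs = random.choices(rules, weights=probs)[0]
    return [tok for s in rhs for tok in sample(s)]

print(" ".join(sample()))  # e.g. "the dog chases the cat"
```
Because the generating distribution is known exactly, model predictions can be scored against true token probabilities rather than noisy corpus estimates.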
arXiv Detail & Related papers (2023-10-23T12:03:01Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
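A sketch of the idea for 2D point sets under rotation: a small network predicts a pose angle, the input is rotated back by that angle, and an unconstrained predictor runs on the canonicalized result. The shapes and the angle-predicting canonicalizer are illustrative assumptions; exact equivariance further requires the canonicalizer itself to transform predictably under rotation, which this sketch glosses over.
```python
import torch
import torch.nn as nn

class Canonicalized(nn.Module):
    """Predict on g(x)^{-1} . x, where g(x) is a learned canonicalization."""

    def __init__(self, canonicalizer: nn.Module, predictor: nn.Module):
        super().__init__()
        self.canonicalizer = canonicalizer  # maps points -> one angle
        self.predictor = predictor          # unconstrained network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_points, 2)
        theta = self.canonicalizer(x).squeeze(-1)          # (batch,)
        c, s = torch.cos(theta), torch.sin(theta)
        # rotation matrix for -theta, undoing the predicted pose
        rot = torch.stack([torch.stack([c, s], -1),
                           torch.stack([-s, c], -1)], -2)  # (batch, 2, 2)
        return self.predictor(x @ rot.transpose(-1, -2))
```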
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition [16.206467862132012]
We compare and analyze CNN, RNN, Transformer, and Conformer models on phoneme recognition.
Our analyses show that Transformer and Conformer models benefit from self-attention's ability to access long-range context across input frames.
arXiv Detail & Related papers (2022-10-01T20:47:25Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields small approximation error even for complex ground-truth function classes.
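A minimal sketch of the setting: a reconstruction-style pretext task trains an encoder without labels, after which the downstream task fits only a linear layer on the frozen representation. Architectures and dimensions are placeholders, not the paper's construction.
```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Linear(32, 64)

def pretext_step(x1, x2, opt):
    """Predict one part of the input (x2) from another (x1); no labels."""
    loss = ((decoder(encoder(x1)) - x2) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Downstream: freeze the encoder and fit only a linear probe; the claimed
# guarantee is that this probe already achieves small approximation error.
for p in encoder.parameters():
    p.requires_grad_(False)
probe = nn.Linear(32, 10)
```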
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- Logic Constrained Pointer Networks for Interpretable Textual Similarity [11.142649867439406]
We introduce a novel pointer-network-based model with a sentinel gating function to align constituent chunks.
We improve this base model with a loss function that penalizes misalignments in both sentences equally, ensuring the alignments are bidirectional.
The model achieves F1 scores of 97.73 and 96.32 on the benchmark SemEval datasets for the chunk alignment task.
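One way to realize such a symmetric loss is sketched below; the sentinel slot and the cross-entropy formulation describe the general pattern and are not claimed to be the paper's exact objective.
```python
import torch.nn.functional as F

def bidirectional_alignment_loss(logits_ab, logits_ba, gold_ab, gold_ba):
    """Penalize misalignments in both directions equally.

    logits_ab: (n_a, n_b + 1) pointer scores from chunks of sentence A to
    chunks of sentence B plus one sentinel "unaligned" slot; logits_ba is
    the reverse direction. gold_* hold target chunk indices.
    """
    loss_ab = F.cross_entropy(logits_ab, gold_ab)
    loss_ba = F.cross_entropy(logits_ba, gold_ba)
    return 0.5 * (loss_ab + loss_ba)  # neither direction dominates
```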
arXiv Detail & Related papers (2020-07-15T13:01:44Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
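A sketch of how such gradient supervision can be wired up: the input-gradient of the task loss is encouraged to point from each example toward its counterfactual twin. The cosine-alignment term and its weighting are illustrative choices, not the paper's exact objective.
```python
import torch
import torch.nn.functional as F

def counterfactual_step(model, x, y, x_cf):
    """Task loss plus an auxiliary term aligning input gradients with the
    direction of the minimal label-flipping edit (x_cf - x)."""
    x = x.clone().requires_grad_(True)
    task_loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(task_loss, x, create_graph=True)[0]
    direction = (x_cf - x).detach()
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    return task_loss + (1.0 - cos).mean()  # encourage alignment
```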
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.