Language Through a Prism: A Spectral Approach for Multiscale Language Representations
- URL: http://arxiv.org/abs/2011.04823v1
- Date: Mon, 9 Nov 2020 23:17:43 GMT
- Title: Language Through a Prism: A Spectral Approach for Multiscale Language Representations
- Authors: Alex Tamkin, Dan Jurafsky, Noah Goodman
- Abstract summary: We show that signal processing provides a natural framework for separating structure across scales.
We apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part of speech tagging.
We also present a prism layer for training models, which uses spectral filters to constrain different neurons to model structure at different scales.
- Score: 30.224517199646993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language exhibits structure at different scales, ranging from subwords to
words, sentences, paragraphs, and documents. To what extent do deep models
capture information at these scales, and can we force them to better capture
structure across this hierarchy? We approach this question by focusing on
individual neurons, analyzing the behavior of their activations at different
timescales. We show that signal processing provides a natural framework for
separating structure across scales, enabling us to 1) disentangle
scale-specific information in existing embeddings and 2) train models to learn
more about particular scales. Concretely, we apply spectral filters to the
activations of a neuron across an input, producing filtered embeddings that
perform well on part of speech tagging (word-level), dialog speech acts
classification (utterance-level), or topic classification (document-level),
while performing poorly on the other tasks. We also present a prism layer for
training models, which uses spectral filters to constrain different neurons to
model structure at different scales. Our proposed BERT + Prism model can better
predict masked tokens using long-range context and produces multiscale
representations that perform better at utterance- and document-level tasks. Our
methods are general and readily applicable to other domains besides language,
such as images, audio, and video.
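The band-pass operation at the heart of this method is easy to prototype. The sketch below is a minimal illustration, assuming contextual activations arrive as a (seq_len, hidden_dim) NumPy array from any encoder; the DCT-based filter and the band edges are illustrative assumptions, not necessarily the paper's exact choices.
```python
# Minimal sketch: band-pass filter neuron activations over a token sequence.
# Each hidden dimension is treated as a 1-D signal across the input tokens;
# low-frequency components capture slowly varying (document-scale) structure,
# high-frequency components capture rapidly varying (word-scale) structure.
import numpy as np
from scipy.fft import dct, idct

def band_pass(activations, lo, hi):
    """Keep DCT components with indices in [lo, hi) along the sequence axis."""
    coeffs = dct(activations, type=2, axis=0, norm="ortho")
    coeffs[:lo] = 0.0   # drop components below the band
    coeffs[hi:] = 0.0   # drop components above the band
    return idct(coeffs, type=2, axis=0, norm="ortho")

# Toy usage: 128 tokens of 768-dim activations from some encoder.
acts = np.random.randn(128, 768)
word_scale = band_pass(acts, lo=32, hi=128)  # high-frequency band
doc_scale = band_pass(acts, lo=0, hi=4)      # low-frequency band
```
The prism layer can then be read as a bank of such filters applied inside the network, with each band routed to a different group of neurons so that training pushes each group to model one scale.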
Related papers
- SyllableLM: Learning Coarse Semantic Units for Speech Language Models [21.762112843104028]
We introduce a controllable self-supervised technique to merge speech representations into coarser syllable-like units.
Our method produces controllable-rate semantic units at as low as 5Hz and 60bps, and achieves state-of-the-art segmentation and clustering.
SyllableLM achieves significant improvements in efficiency with a 30x reduction in training compute and a 4x wall-clock inference speedup.
arXiv Detail & Related papers (2024-10-05T04:29:55Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding [53.03978356918377]
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features at either the word level or the region level, but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
arXiv Detail & Related papers (2022-11-27T22:47:37Z)
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model that encodes speech into rich bidirectional encodings.
The approach uses a masked language modelling objective to learn the representations (a minimal sketch of this objective follows this entry).
We show that the resulting encodings outperform comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
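The masked language modelling objective mentioned above is straightforward to sketch. The corruption step below is a minimal illustration; the 15% rate and the mask_id value are common conventions assumed here, not details taken from that paper.
```python
# Minimal sketch of masked language modelling corruption. The 15% rate and
# the mask_id value are conventional assumptions, not taken from the paper.
import numpy as np

def mask_tokens(token_ids, mask_id, rate=0.15, seed=0):
    """Randomly replace positions with mask_id.

    Returns the corrupted sequence and the boolean mask of positions the
    model must reconstruct; training minimizes cross-entropy there only.
    """
    rng = np.random.default_rng(seed)
    masked = rng.random(token_ids.shape) < rate
    return np.where(masked, mask_id, token_ids), masked

ids = np.array([7, 42, 3, 19, 5, 11, 8, 23])
corrupted, positions = mask_tokens(ids, mask_id=0)
```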
- A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification.
The key idea is a mutual-information-based measure that quantitatively explains how each layer of a model maintains the information of the input words in a sample.
A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Direct speech-to-speech translation with discrete units [64.19830539866072]
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.
We propose to predict the self-supervised discrete representations learned from an unlabeled speech corpus instead.
When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual mode output (speech and text) simultaneously in the same inference pass.
arXiv Detail & Related papers (2021-07-12T17:40:43Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain latent linguistic embeddings.
Our experiments show that the voice cloning system built with vector quantization suffers only a small degradation in perceptual evaluations (a sketch of the quantization step follows this entry).
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
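The quantization step referenced above reduces to a nearest-neighbour lookup in a learned codebook. A minimal sketch, assuming Euclidean distance and arbitrary sizes; in practice the codebook is trained jointly with the rest of the model.
```python
# Minimal sketch of vector quantization: map each continuous latent vector
# to its nearest codebook entry. Euclidean distance and the sizes are
# illustrative assumptions.
import numpy as np

def quantize(latents, codebook):
    """latents: (n, d); codebook: (k, d).

    Returns the index of the nearest code for each latent and the
    quantized vectors themselves.
    """
    # Squared distance between every latent and every code: shape (n, k).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
latents = rng.normal(size=(10, 64))    # continuous linguistic latents
codebook = rng.normal(size=(256, 64))  # learned codes (held fixed here)
idx, quantized = quantize(latents, codebook)
```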
- Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels [15.453888735879525]
In natural languages, words combine to construct sentences.
We design a deep neural network architecture that explicitly wires together lower and higher linguistic components.
We show that our model, MHAL, learns to solve these tasks simultaneously at different levels of granularity.
arXiv Detail & Related papers (2020-11-01T10:44:46Z)
- Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech [24.187382590960254]
Children do not build their lexicon by segmenting spoken input into phonemes and then building up words from them.
This suggests that the ideal way of learning a language is by starting from full semantic units.
We present a simple way to introduce such information into an RNN-based model and investigate which type of boundary is the most efficient.
arXiv Detail & Related papers (2020-06-15T13:20:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.