Asking without Telling: Exploring Latent Ontologies in Contextual
Representations
- URL: http://arxiv.org/abs/2004.14513v2
- Date: Fri, 9 Oct 2020 00:28:21 GMT
- Title: Asking without Telling: Exploring Latent Ontologies in Contextual
Representations
- Authors: Julian Michael, Jan A. Botha, Ian Tenney
- Abstract summary: We show that pretrained contextual encoders learn to encode meaningful notions of linguistic structure without explicit supervision.
Our results provide new evidence of emergent structure in pretrained encoders, including departures from existing annotations.
- Score: 12.69022456384102
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of pretrained contextual encoders, such as ELMo and BERT, has
brought a great deal of interest in what these models learn: do they, without
explicit supervision, learn to encode meaningful notions of linguistic
structure? If so, how is this structure encoded? To investigate this, we
introduce latent subclass learning (LSL): a modification to existing
classifier-based probing methods that induces a latent categorization (or
ontology) of the probe's inputs. Without access to fine-grained gold labels,
LSL extracts emergent structure from input representations in an interpretable
and quantifiable form. In experiments, we find strong evidence of familiar
categories, such as a notion of personhood in ELMo, as well as novel
ontological distinctions, such as a preference for fine-grained semantic roles
on core arguments. Our results provide unique new evidence of emergent
structure in pretrained encoders, including departures from existing
annotations which are inaccessible to earlier methods.
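The core LSL idea — a classifier-based probe that scores K latent subclasses per coarse gold label and marginalizes over them during training, so the subclasses that emerge form an induced ontology — can be sketched as follows. This is a toy NumPy illustration under stated assumptions, not the authors' implementation; the names `LSLProbe`, `label_log_probs`, and `induced_subclass` are hypothetical.

```python
import numpy as np

def logsumexp(x, axis):
    # numerically stable log-sum-exp along one axis
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

class LSLProbe:
    """Toy latent-subclass probe: a linear layer scores K latent
    subclasses per coarse gold label over frozen encoder features."""

    def __init__(self, dim, n_labels, k, seed=0):
        rng = np.random.default_rng(seed)
        self.n_labels, self.k = n_labels, k
        self.W = rng.normal(scale=0.01, size=(dim, n_labels * k))

    def label_log_probs(self, reps):
        # reps: (batch, dim) frozen contextual representations
        cells = reps @ self.W                        # (batch, n_labels * k)
        cells = cells.reshape(-1, self.n_labels, self.k)
        # coarse-label score = logsumexp over that label's subclasses,
        # so training with gold labels never tells the probe which
        # subclass to use -- the partition is induced, not supervised
        logits = logsumexp(cells, axis=-1)           # (batch, n_labels)
        return logits - logsumexp(logits, axis=-1)[:, None]

    def induced_subclass(self, reps):
        # hard assignment to a (label, subclass) cell: the latent ontology
        return (reps @ self.W).argmax(axis=-1)
```

Training would maximize `label_log_probs` at the gold coarse labels; afterwards, `induced_subclass` partitions the inputs into finer categories that can be inspected and compared against existing annotation schemes.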
Related papers
- Unsupervised Morphological Tree Tokenizer [36.584680344291556]
We introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words.
Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named "Overriding" to ensure the indecomposability of morphemes.
Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner.
arXiv Detail & Related papers (2024-06-21T15:35:49Z) - Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbation methods such as typos and word-order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing [25.516304052884397]
Fine-grained entity typing (FET) assigns entities in text with context-sensitive, fine-grained semantic types.
Following a type ontology from coarse to fine, OntoType ensembles multiple PLM prompting results to generate a set of type candidates.
Our experiments on the Ontonotes, FIGER, and NYT datasets demonstrate that our method outperforms the state-of-the-art zero-shot fine-grained entity typing methods.
arXiv Detail & Related papers (2023-05-21T00:32:37Z) - Lattice-preserving $\mathcal{ALC}$ ontology embeddings [50.05281461410368]
We propose an order-preserving embedding method to generate embeddings of $\mathcal{ALC}$ ontologies, the semantics of which are expressed in Description Logics (DLs).
We show that our method outperforms state-of-the-art embedding methods in several knowledge base completion tasks.
arXiv Detail & Related papers (2023-05-11T22:27:51Z) - Semantic Role Labeling Meets Definition Modeling: Using Natural Language
to Describe Predicate-Argument Structures [104.32063681736349]
We present an approach to describe predicate-argument structures using natural language definitions instead of discrete labels.
Our experiments and analyses on PropBank-style and FrameNet-style, dependency-based and span-based SRL also demonstrate that a flexible model with an interpretable output does not necessarily come at the expense of performance.
arXiv Detail & Related papers (2022-12-02T11:19:16Z) - Prompting Language Models for Linguistic Structure [73.11488464916668]
We present a structured prompting approach for linguistic structured prediction tasks.
We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking.
We find that while pretrained language models (PLMs) contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels.
arXiv Detail & Related papers (2022-11-15T01:13:39Z) - Learning Primitive-aware Discriminative Representations for Few-shot
Learning [28.17404445820028]
Few-shot learning aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples.
We propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations.
Our method achieves state-of-the-art results on six standard benchmarks.
arXiv Detail & Related papers (2022-08-20T16:22:22Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Exploring the Role of BERT Token Representations to Explain Sentence
Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Refining Implicit Argument Annotation for UCCA [6.873471412788333]
This paper proposes a typology for fine-grained implicit argument annotation on top of the foundational layer of Universal Conceptual Cognitive Annotation (UCCA).
The proposed implicit argument categorisation is driven by theories of implicit role interpretation and consists of six types: Deictic, Generic, Genre-based, Type-identifiable, Non-specific, and Iterated-set.
arXiv Detail & Related papers (2020-05-26T17:24:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.