Related papers: Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?

URL: http://arxiv.org/abs/2404.04169v1
Date: Fri, 5 Apr 2024 15:22:02 GMT
Title: Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?
Authors: Ilya Ilyankou, Aldo Lipani, Stefano Cavazzi, Xiaowei Gao, James Haworth
Abstract summary: Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences.
Score: 7.060398061192044
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.

Related papers

Enhancing Transformers for Generalizable First-Order Logical Entailment [51.04944136538266]
This paper investigates the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge. The first-order reasoning capability of transformers is assessed through their ability to perform first-order logical entailment. We propose a more sophisticated, logic-aware architecture, TEGA, to enhance the capability for generalizable first-order logical entailment in transformers.
arXiv Detail & Related papers (2025-01-01T07:05:32Z)
Extracting Finite State Machines from Transformers [0.3069335774032178]
We investigate the trainability of transformers trained on regular languages from a mechanistic interpretability perspective. We empirically find tighter lower bounds on the trainability of transformers when a finite number of symbols determine the state. Our mechanistic insight allows us to characterise the regular languages a one-layer transformer can learn with good length generalisation.
arXiv Detail & Related papers (2024-10-08T13:43:50Z)
In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference. This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z)
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization [22.033370572209744]
We study whether transformers can learn to implicitly reason over parametric knowledge. We focus on two representative reasoning types, composition and comparison. We find that transformers can learn implicit reasoning, but only through grokking.
arXiv Detail & Related papers (2024-05-23T21:42:19Z)
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically [74.96551626420188]
Transformers trained on natural language data have been shown to learn its hierarchical structure and generalize to sentences with unseen syntactic structures. We investigate sources of inductive bias in transformer models and their training that could cause such generalization behavior to emerge.
arXiv Detail & Related papers (2024-04-25T07:10:29Z)
Grokking of Hierarchical Structure in Vanilla Transformers [72.45375959893218]
We show that transformer language models can learn to generalize hierarchically after training for extremely long periods. Intermediate-depth models generalize better than both very deep and very shallow transformers.
arXiv Detail & Related papers (2023-05-30T04:34:13Z)
Measuring Cross-Lingual Transferability of Multilingual Transformers on Sentence Classification [49.8111760092473]
We propose IGap, a cross-lingual transferability metric for multilingual Transformers on sentence classification tasks. Experimental results show that IGap outperforms baseline metrics for transferability measuring and transfer direction ranking. Our results reveal three findings about cross-lingual transfer, which help us to better understand multilingual Transformers.
arXiv Detail & Related papers (2023-05-15T17:05:45Z)
An Introduction to Transformers [23.915718146956355]
The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. In this note we aim for a mathematically precise, intuitive, and clean description of the transformer architecture.
arXiv Detail & Related papers (2023-04-20T14:54:19Z)
Characterizing Intrinsic Compositionality in Transformers with Tree Projections [72.45375959893218]
Neural models like transformers can route information arbitrarily between different parts of their input. We show that transformers for three different tasks become more tree-like over the course of training. These trees are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.
arXiv Detail & Related papers (2022-11-02T17:10:07Z)
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks [6.525090891505941]
We show how a causal transformer can perform a set of algorithmic tasks, including copying, sorting, and hierarchical compositions. We show that two-layer transformers learn generalizable solutions to multi-level problems and develop signs of systematic task decomposition. These results provide key insights into how transformer models may be capable of decomposing complex decisions into reusable, multi-level policies.
arXiv Detail & Related papers (2022-10-02T00:46:36Z)
Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language. We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer. We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors [15.348047288817478]
We propose to use dictionary learning to open up "black boxes" as linear superpositions of transformer factors. Through visualization, we demonstrate the hierarchical semantic structures captured by the transformer factors. We hope this visualization tool can bring further knowledge and a better understanding of how transformer networks work.
arXiv Detail & Related papers (2021-03-29T20:51:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.