Similarity between Units of Natural Language: The Transition from Coarse
to Fine Estimation
- URL: http://arxiv.org/abs/2210.14275v1
- Date: Tue, 25 Oct 2022 18:54:32 GMT
- Title: Similarity between Units of Natural Language: The Transition from Coarse
to Fine Estimation
- Authors: Wenchuan Mu
- Abstract summary: Capturing the similarities between human language units is crucial for explaining how humans associate different objects.
My research goal in this thesis is to develop regression models that account for similarities between language units in a more refined way.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing the similarities between human language units is crucial for
explaining how humans associate different objects, and therefore its
computation has received extensive attention, research, and applications. With
the ever-increasing amount of information around us, calculating similarity
becomes increasingly complex. In many settings, such as legal or medical
affairs, measuring similarity requires extra care and precision, because small
changes within a language unit can have significant real-world effects. My
research goal in this thesis is to develop regression models that account for
similarities between language units in a more refined way.
Computation of similarity has come a long way, but approaches to debugging
the measures are often based on continually fitting human judgment values. To
this end, my goal is to develop an algorithm that precisely catches loopholes
in a similarity calculation. Furthermore, most methods have vague definitions
of the similarities they compute and are often difficult to interpret. The
proposed framework addresses both shortcomings. It continually improves the
model by catching different loopholes. In addition, every refinement of
the model provides a reasonable explanation. The regression model introduced in
this thesis is called progressively refined similarity computation, which
combines attack testing with adversarial training. The similarity regression
model of this thesis achieves state-of-the-art performance in handling edge
cases.
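The abstract describes the framework only at a high level. As a rough illustration of the attack-testing-plus-adversarial-training loop it mentions, the following minimal Python sketch (not the thesis's actual algorithm; `featurize`, `attack`, and `oracle` are invented stand-ins) probes a simple similarity regressor with small perturbations, treats unstable scores as loopholes, and folds the corrected edge cases back into training.

```python
# Minimal sketch (not the thesis's actual algorithm) of "attack testing +
# adversarial training": a similarity regressor is probed with perturbed
# inputs, and pairs whose scores move too much ("loopholes") are added back
# to the training set with corrected targets.
import numpy as np

rng = np.random.default_rng(0)

def featurize(a, b):
    """Toy pair features: elementwise |difference| of two vectors."""
    return np.abs(a - b)

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def attack(w, a, b, eps=0.05, trials=50):
    """Attack test: search small perturbations of b that shift the score most."""
    base = featurize(a, b) @ w
    worst, worst_shift = b, 0.0
    for _ in range(trials):
        cand = b + rng.normal(scale=eps, size=b.shape)
        shift = abs(featurize(a, cand) @ w - base)
        if shift > worst_shift:
            worst, worst_shift = cand, shift
    return worst, worst_shift

def oracle(a, b):
    """Stand-in ground-truth similarity (cosine) used to label edge cases."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic "language unit" embeddings and initial training set.
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(200)]
X = np.array([featurize(a, b) for a, b in pairs])
y = np.array([oracle(a, b) for a, b in pairs])

w = fit_ridge(X, y)
for round_ in range(3):                       # progressive refinement rounds
    new_X, new_y = [], []
    for a, b in pairs[:50]:
        b_adv, shift = attack(w, a, b)
        if shift > 0.1:                       # a loophole: the score is unstable
            new_X.append(featurize(a, b_adv))
            new_y.append(oracle(a, b_adv))    # corrected target for the edge case
    if not new_X:
        break
    X = np.vstack([X, new_X])
    y = np.concatenate([y, new_y])
    w = fit_ridge(X, y)                       # adversarial (re)training
    print(f"round {round_}: added {len(new_y)} edge-case pairs")
```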
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of using data similarity and clients sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
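As a hedged illustration of two ingredients named in this summary, client sampling and added noise, here is a toy Python sketch of distributed gradient averaging; it is not the Accelerated Stochastic ExtraGradient method itself, and all objectives and constants are made up.

```python
# Toy sketch of client sampling plus additive noise on the aggregated gradient.
# This is NOT the Accelerated Stochastic ExtraGradient method from the paper.
import numpy as np

rng = np.random.default_rng(1)
n_clients, dim = 20, 5
# Each client holds a local quadratic objective 0.5 * ||x - c_i||^2.
centers = rng.normal(size=(n_clients, dim))

def local_grad(i, x):
    return x - centers[i]

x = np.zeros(dim)
lr, noise_std, sample_frac = 0.2, 0.05, 0.5
for step in range(100):
    # Client sampling: only a random subset communicates this round.
    chosen = rng.choice(n_clients, size=int(sample_frac * n_clients), replace=False)
    grads = np.stack([local_grad(i, x) for i in chosen])
    # Additive Gaussian noise on the aggregated gradient (privacy-motivated).
    agg = grads.mean(axis=0) + rng.normal(scale=noise_std, size=dim)
    x -= lr * agg

print("distance to optimum:", np.linalg.norm(x - centers.mean(axis=0)))
```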
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
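The difference-in-differences design mentioned above has a simple closed form: (change for trained-on instances) minus (change for control instances). The sketch below illustrates it on synthetic log-likelihoods; the helper function and numbers are assumptions, not the paper's data or code.

```python
# Hedged illustration of a difference-in-differences (DiD) estimate of
# memorisation: the effect of training on an instance is the change in that
# instance's log-likelihood, minus the change observed for untrained (control)
# instances over the same training interval. All numbers are synthetic.
import numpy as np

def did_memorisation(treated_before, treated_after, control_before, control_after):
    """DiD estimator: (change for treated) - (change for controls), averaged."""
    delta_treated = np.mean(treated_after) - np.mean(treated_before)
    delta_control = np.mean(control_after) - np.mean(control_before)
    return delta_treated - delta_control

# Synthetic per-instance log-likelihoods before/after a training interval.
treated_before = np.array([-4.1, -3.9, -4.3])
treated_after  = np.array([-1.2, -1.5, -1.0])   # trained-on instances improve a lot
control_before = np.array([-4.0, -4.2, -3.8])
control_after  = np.array([-3.5, -3.6, -3.4])   # general improvement only

print("estimated memorisation effect:",
      did_memorisation(treated_before, treated_after, control_before, control_after))
```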
- Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z)
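The regress-then-integrate step described above can be shown in a few lines: the predicted count is simply the sum over the density map. The sketch below uses a synthetic density map rather than a real model's output.

```python
# Minimal illustration of the density-map pipeline: a crowd counting model
# outputs a per-pixel density map, and the predicted count is the integral
# (sum) of that map. The "predicted" map below is synthetic.
import numpy as np

rng = np.random.default_rng(2)

def count_from_density(density_map):
    """Integrate a density map to get a scalar count."""
    return float(density_map.sum())

# Build a synthetic density map: 12 "people", each a unit-mass Gaussian blob.
h, w, sigma = 64, 64, 2.0
yy, xx = np.mgrid[0:h, 0:w]
density = np.zeros((h, w))
for cy, cx in rng.integers(5, 59, size=(12, 2)):
    blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    density += blob / blob.sum()          # normalize so each blob integrates to 1

print("predicted count:", count_from_density(density))  # ~12.0
```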
- Evaluation of taxonomic and neural embedding methods for calculating semantic similarity [0.0]
We study the mechanisms underlying taxonomic and distributional similarity measures.
We find that taxonomic similarity measures can rely on shortest path length as the primary factor in predicting semantic similarity.
The synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend of leveraging knowledge bases in transfer learning.
arXiv Detail & Related papers (2022-09-30T02:54:21Z)
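The shortest-path factor can be made concrete with WordNet's path similarity, which NLTK defines as 1 / (1 + shortest path length between two synsets). The snippet below assumes `nltk` is installed and the WordNet corpus is available; it illustrates the measure itself, not the paper's evaluation setup.

```python
# The shortest-path factor discussed above, made concrete with WordNet via NLTK
# (requires `pip install nltk` plus the wordnet corpus). path_similarity is
# 1 / (1 + length of the shortest is-a path between two synsets).
import nltk
nltk.download("wordnet", quiet=True)   # no-op if already downloaded
from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")
car = wn.synset("car.n.01")

print("dog-cat path similarity:", dog.path_similarity(cat))
print("dog-car path similarity:", dog.path_similarity(car))
# Expect dog-cat > dog-car: a shorter taxonomic path means higher similarity.
```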
- A Correlation-Ratio Transfer Learning and Variational Stein's Paradox [7.652701739127332]
This paper introduces a new strategy, linear correlation-ratio, to build an accurate relationship between the models.
On the practical side, the new framework is applied to some application scenarios, especially the areas of data streams and medical studies.
arXiv Detail & Related papers (2022-06-10T01:59:16Z)
- Predicting Human Similarity Judgments Using Large Language Models [13.33450619901885]
We propose an efficient procedure for predicting similarity judgments based on text descriptions.
The number of descriptions required grows only linearly with the number of stimuli, drastically reducing the amount of data required.
We test this procedure on six datasets of naturalistic images and show that our models outperform previous approaches based on visual information.
arXiv Detail & Related papers (2022-02-09T21:09:25Z)
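A hedged sketch of the scaling point above: with one description per stimulus, all pairwise similarities can be derived from n embeddings instead of O(n^2) human pair judgments. The `embed` function below is an invented stand-in, not the paper's language model.

```python
# Hedged sketch of the scaling argument: collect one text description per
# stimulus (n descriptions total), embed them, and derive every pairwise
# similarity from the embeddings. `embed` is a toy stand-in model.
import numpy as np
from itertools import combinations

def embed(text, dim=64):
    """Toy bag-of-words embedding: bucket each token by its character codes."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-12)

def cosine(u, v):
    return float(u @ v)

descriptions = {
    "img_1": "a small brown dog running on grass",
    "img_2": "a large brown dog sleeping on grass",
    "img_3": "a red sports car parked on a street",
}

emb = {name: embed(text) for name, text in descriptions.items()}  # n embeddings
for a, b in combinations(descriptions, 2):                        # n*(n-1)/2 pairs
    print(f"predicted similarity({a}, {b}) = {cosine(emb[a], emb[b]):.2f}")
```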
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is the problem of eliciting useful representations by predicting the relationship between a pair of patterns.
We show that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.
arXiv Detail & Related papers (2020-06-11T05:35:16Z)
- Building and Interpreting Deep Similarity Models [0.0]
We propose to make similarities interpretable by augmenting them with an explanation in terms of input features.
We develop BiLRP, a scalable and theoretically founded method to systematically decompose similarity scores on pairs of input features.
arXiv Detail & Related papers (2020-03-11T17:46:55Z)
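The pairwise decomposition described in the BiLRP entry above can be illustrated in the purely linear case, where the dot-product similarity of embeddings W @ x and W @ xp splits exactly into per-feature-pair contributions R[i, j] = x[i] * (W.T @ W)[i, j] * xp[j]. The sketch below shows only this simplified case; it is not the BiLRP method, which propagates relevance through the layers of a deep model.

```python
# Simplified, linear-case illustration of decomposing a similarity score into
# contributions of pairs of input features. For a linear embedding phi(x) = W @ x,
# the similarity phi(x) . phi(xp) equals sum_ij R[i, j] with
# R[i, j] = x[i] * (W.T @ W)[i, j] * xp[j]. Not the actual BiLRP implementation.
import numpy as np

rng = np.random.default_rng(3)
d_in, d_emb = 6, 4
W = rng.normal(size=(d_emb, d_in))
x, xp = rng.normal(size=d_in), rng.normal(size=d_in)

similarity = float((W @ x) @ (W @ xp))

# Feature-pair decomposition of the same score.
R = np.outer(x, xp) * (W.T @ W)
assert np.isclose(R.sum(), similarity)

top = np.unravel_index(np.abs(R).argmax(), R.shape)
print(f"similarity = {similarity:.3f}, most influential feature pair = {top}")
```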
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.