Wasserstein Distance Regularized Sequence Representation for Text
Matching in Asymmetrical Domains
- URL: http://arxiv.org/abs/2010.07717v2
- Date: Sat, 17 Oct 2020 01:32:08 GMT
- Title: Wasserstein Distance Regularized Sequence Representation for Text
Matching in Asymmetrical Domains
- Authors: Weijie Yu, Chen Xu, Jun Xu, Liang Pang, Xiaopeng Gao, Xiaozhao Wang
and Ji-Rong Wen
- Abstract summary: We propose a novel matching method tailored for text matching in asymmetrical domains, called WD-Match.
In WD-Match, a Wasserstein distance-based regularizer is defined to regularize the feature vectors projected from different domains.
The training process of WD-Match amounts to a game that minimizes the matching loss regularized by the Wasserstein distance.
- Score: 51.91456788949489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One approach to matching texts from asymmetrical domains is projecting the
input sequences into a common semantic space as feature vectors upon which the
matching function can be readily defined and learned. In real-world matching
practice, it is often observed that, as training goes on, the feature
vectors projected from different domains tend to be indistinguishable. The
phenomenon, however, is often overlooked in existing matching models. As a
result, the feature vectors are constructed without any regularization, which
inevitably increases the difficulty of learning the downstream matching
functions. In this paper, we propose a novel matching method tailored for text
matching in asymmetrical domains, called WD-Match. In WD-Match, a Wasserstein
distance-based regularizer is defined to regularize the feature vectors
projected from different domains. As a result, the method forces the feature
projection function to generate vectors such that those corresponding to different
domains cannot be easily discriminated. The training process of WD-Match
amounts to a game that minimizes the matching loss regularized by the
Wasserstein distance. WD-Match can be used to improve different text matching
methods by using each of them as its underlying matching model. Four popular
text matching methods were used in this way in the paper. Experimental results
based on four publicly available benchmarks showed that WD-Match consistently
outperformed the underlying methods and the baselines.
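The regularized objective described in the abstract can be illustrated with a small sketch. The abstract does not say how the Wasserstein distance is estimated, and its "game" wording suggests a learned critic; the code below instead swaps in a simpler sliced estimate (exact 1-D distances averaged over random projections) so that it stays self-contained. The function names, the weight `lam`, and the equal-batch-size assumption are all hypothetical and are not taken from the paper.

```python
import math
import random


def wasserstein_1d(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # For equal-size empirical samples, the optimal coupling pairs sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein(feats_a, feats_b, n_projections=64, seed=0):
    """Average 1-D Wasserstein distance over random unit directions.

    This is a stand-in estimator (assumption), not the one used in the paper.
    """
    rng = random.Random(seed)
    dim = len(feats_a[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalize it to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project both feature sets onto the direction and compare in 1-D.
        proj_a = [sum(v * d for v, d in zip(vec, direction)) for vec in feats_a]
        proj_b = [sum(v * d for v, d in zip(vec, direction)) for vec in feats_b]
        total += wasserstein_1d(proj_a, proj_b)
    return total / n_projections


def regularized_loss(matching_loss, feats_a, feats_b, lam=0.1):
    """Matching loss plus a weighted distance between the two domains' features."""
    return matching_loss + lam * sliced_wasserstein(feats_a, feats_b)


# Hypothetical feature vectors projected from two domains (e.g. queries, documents).
queries = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
documents = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
loss = regularized_loss(0.5, queries, documents)
```

When the two feature sets have the same distribution the penalty vanishes and the loss reduces to the matching loss; the further apart the domains sit in feature space, the larger the added term.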
Related papers
- Partial Scene Text Retrieval [56.14891109413448]
The task of partial scene text retrieval involves localizing and searching for text instances that are the same or similar to a given query text from an image gallery.
Existing methods can only handle text-line instances, leaving the problem of searching for partial patches unsolved.
We propose a network that can simultaneously retrieve both text-line instances and their partial patches.
arXiv Detail & Related papers (2024-11-15T15:08:04Z)
- On Unsupervised Partial Shape Correspondence [9.175560202201819]
We argue that functional maps introduce errors in the estimated match when partiality is invoked.
We propose a novel approach for partial shape matching.
The proposed approach shows superior performance on the SHREC'16 dataset.
arXiv Detail & Related papers (2023-10-23T08:32:50Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- RE-Matching: A Fine-Grained Semantic Matching Method for Zero-Shot Relation Extraction [40.90544879476107]
We propose a fine-grained semantic matching method tailored for zero-shot relation extraction.
We decompose the sentence-level similarity score into entity and context matching scores.
Due to the lack of explicit annotations of the redundant components, we design a feature distillation module to adaptively identify the relation-irrelevant features.
arXiv Detail & Related papers (2023-06-08T06:02:34Z)
- Adaptive Assignment for Geometry Aware Local Feature Matching [22.818457285745733]
Detector-free feature matching approaches are currently attracting great attention thanks to their excellent performance.
We introduce AdaMatcher, which accomplishes the feature correlation and co-visible area estimation through an elaborate feature interaction module.
AdaMatcher then performs adaptive assignment on patch-level matching while estimating the scales between images, and finally refines the co-visible matches through scale alignment and a sub-pixel regression module.
arXiv Detail & Related papers (2022-07-18T08:22:18Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on both cross-device and cross-environment ASR.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem.
The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information.
Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes [0.0]
We show new methods for learning how related words are positioned relative to each other in word embedding spaces.
Our model, SVMCos, is robust to a range of experimental choices when training word embeddings.
arXiv Detail & Related papers (2020-03-07T20:21:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.