Pruned Wasserstein Index Generation Model and wigpy Package
- URL: http://arxiv.org/abs/2004.00999v3
- Date: Thu, 9 Jul 2020 14:42:59 GMT
- Title: Pruned Wasserstein Index Generation Model and wigpy Package
- Authors: Fangzhou Xie
- Abstract summary: I propose a Lasso-based shrinkage method to reduce the dimensionality of the vocabulary as a pre-processing step prior to fitting the WIG model.
I also provide a \textit{wigpy} module in Python to carry out the computation in both flavors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed Wasserstein Index Generation model (WIG) has shown
a new direction for automatically generating indices. However, it is challenging in
practice to fit it on large datasets, for two reasons. First, the Sinkhorn distance is
notoriously expensive to compute and suffers severely from high dimensionality.
Second, it requires computing a full $N\times N$ matrix, where $N$ is the size of the
vocabulary, and this matrix must fit into memory; when the vocabulary is too large,
the computation becomes infeasible altogether. I hereby propose a Lasso-based
shrinkage method to reduce the dimensionality of the vocabulary as a
pre-processing step prior to fitting the WIG model. After obtaining word
embeddings from a Word2Vec model, we cluster these high-dimensional vectors
by $k$-means and pick the most frequent token within each cluster to
form the "base vocabulary". Non-base tokens are then regressed on the vectors
of the base tokens to obtain transformation weights, so that the whole
vocabulary can be represented by the base tokens alone. This variant, called
pruned WIG (pWIG), enables us to shrink the vocabulary dimension at will while
still achieving high accuracy. I also provide a \textit{wigpy} module in Python
that carries out the computation in both flavors. An application to the Economic
Policy Uncertainty (EPU) index is showcased as a comparison with existing methods
of generating time-series sentiment indices.
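As a concrete illustration of the pruning step described above, the sketch below clusters pre-trained Word2Vec vectors with $k$-means, keeps the most frequent token in each cluster as a base token, and expresses every remaining token as a Lasso-weighted combination of the base tokens. This is a minimal sketch using scikit-learn, with hypothetical names and hyper-parameters (vectors, counts, n_base, alpha); it is not the wigpy implementation itself.

```python
# Minimal sketch of the pWIG vocabulary-pruning idea (not the wigpy code).
# Assumes `vectors` is a (V, d) array of Word2Vec embeddings and `counts`
# is a (V,) array of corpus frequencies; all names here are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def prune_vocabulary(vectors, counts, n_base=500, alpha=0.01, seed=0):
    """Pick a base vocabulary and Lasso weights that express every
    non-base token as a sparse combination of base tokens."""
    V, d = vectors.shape

    # 1. Cluster the embeddings, one cluster per desired base token.
    labels = KMeans(n_clusters=n_base, random_state=seed).fit_predict(vectors)

    # 2. Within each cluster, keep the most frequent token as the base token.
    base_idx = np.array([
        np.where(labels == c)[0][np.argmax(counts[labels == c])]
        for c in range(n_base)
    ])
    base_vecs = vectors[base_idx]                  # (n_base, d)

    # 3. Regress each non-base vector on the base vectors (Lasso shrinkage),
    #    so the whole vocabulary is represented by base tokens only.
    weights = np.zeros((V, n_base))
    weights[base_idx, np.arange(n_base)] = 1.0     # base tokens map to themselves
    for i in np.setdiff1d(np.arange(V), base_idx):
        reg = Lasso(alpha=alpha, fit_intercept=False)
        weights[i] = reg.fit(base_vecs.T, vectors[i]).coef_
    return base_idx, weights
```

Document-term counts over the full vocabulary can then be projected onto the base vocabulary through these weights, so the $N\times N$ cost matrix needed for the Sinkhorn computation only has to be built over the much smaller base vocabulary.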
Related papers
- Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}\,d + \log(Nd))\big)$.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python [11.566791864440262]
$\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch.
$\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently.
arXiv Detail & Related papers (2023-04-04T16:00:48Z)
- Convergence of alternating minimisation algorithms for dictionary learning [4.5687771576879594]
We derive sufficient conditions for the convergence of two popular alternating minimisation algorithms for dictionary learning.
We show that given a well-behaved initialisation that is either within distance at most $1/\log(K)$ to the generating dictionary or has a special structure ensuring that each element of the initialisation only points to one generating element, both algorithms will converge with geometric convergence rate to the generating dictionary.
arXiv Detail & Related papers (2023-04-04T12:58:47Z)
- N-Gram Nearest Neighbor Machine Translation [101.25243884801183]
We propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both Autoregressive Translation (AT) and Non-Autoregressive Translation (NAT) models.
We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as on domain adaptation translation tasks.
arXiv Detail & Related papers (2023-01-30T13:19:19Z)
- Modeling Label Correlations for Second-Order Semantic Dependency Parsing with Mean-Field Inference [34.75002236767817]
Second-order semantic parsing with end-to-end mean-field inference has been shown to achieve good performance.
In this work we aim to improve this method by modeling label correlations between adjacent arcs.
To tackle this computational challenge, we leverage tensor decomposition techniques.
arXiv Detail & Related papers (2022-04-07T17:40:08Z)
- Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(\min(m^2 n, m n^2)\log(1/\epsilon))$ for first-order optimization methods.
arXiv Detail & Related papers (2021-11-30T14:16:48Z)
- Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices [65.4388266814055]
We replace 1x1-convolutions in 1D time-channel separable convolutions with constant, sparse random ternary matrices with weights in $\{-1, 0, +1\}$.
For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from $97.21\%$ to $97.41\%$ at the same network size.
For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about $1\%$ of the floating-point baseline's word error rate.
arXiv Detail & Related papers (2021-03-31T15:09:20Z)
- Learning Universal Shape Dictionary for Realtime Instance Segmentation [40.27913339054021]
We present a novel explicit shape representation for instance segmentation.
Based on how to model the object shape, current instance segmentation systems can be divided into two categories, implicit and explicit models.
The proposed USD-Seg adopts a linear model, sparse coding with a dictionary, for object shapes.
arXiv Detail & Related papers (2020-12-02T09:44:49Z)
- Deep Learning Meets Projective Clustering [66.726500395069]
A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A \in \mathbb{R}^{n \times d}$.
Inspired by \emph{projective clustering} from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces.
arXiv Detail & Related papers (2020-10-08T22:47:48Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
arXiv Detail & Related papers (2020-03-18T13:07:51Z)
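The last entry above (Anchor & Transform) is close in spirit to pWIG's base vocabulary: store only a small set of anchor vectors plus a sparse map from the full vocabulary onto them. The snippet below is a rough, purely illustrative estimate of the storage saving; the sizes, density, and random matrices are made up and are not that paper's settings or algorithm.

```python
# Back-of-the-envelope comparison: dense V x d embedding table versus
# k anchor vectors plus a sparse V x k transformation (illustrative sizes only).
import numpy as np
import scipy.sparse as sp

V, d, k = 100_000, 300, 1_000          # vocabulary size, embedding dim, anchors
density = 0.005                        # fraction of non-zeros in the sparse map

A = np.random.randn(k, d)              # anchor embeddings
T = sp.random(V, k, density=density, format="csr")   # sparse transformation

dense_params = V * d                   # full embedding table
anchored_params = A.size + T.nnz       # anchors + non-zero transform weights
print(f"dense table : {dense_params:,} parameters")
print(f"anchored    : {anchored_params:,} parameters "
      f"({anchored_params / dense_params:.1%} of dense)")

# Any token's embedding is recovered by a sparse row-times-dense product.
row_42 = T[42] @ A                     # approximate embedding of token 42
```

With these illustrative numbers, the anchored representation stores under 3% of the dense table's parameters.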
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences.