Pruned Wasserstein Index Generation Model and wigpy Package
- URL: http://arxiv.org/abs/2004.00999v3
- Date: Thu, 9 Jul 2020 14:42:59 GMT
- Title: Pruned Wasserstein Index Generation Model and wigpy Package
- Authors: Fangzhou Xie
- Abstract summary: I propose a Lasso-based shrinkage method to reduce the dimensionality of the vocabulary as a pre-processing step prior to fitting the WIG model.
I also provide a \textit{wigpy} module in Python to carry out the computation in both flavors.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed Wasserstein Index Generation model (WIG) has shown
a new direction for automatically generating indices. However, it is challenging in
practice to fit it on large datasets, for two reasons. First, the Sinkhorn distance is
notoriously expensive to compute and suffers severely from high dimensionality.
Second, it requires computing a full $N\times N$ matrix, where $N$ is the size of the
vocabulary, and this matrix must fit into memory; when the vocabulary is too large,
the computation becomes infeasible altogether. I hereby propose a Lasso-based
shrinkage method to reduce the dimensionality of the vocabulary as a
pre-processing step prior to fitting the WIG model. After obtaining word
embeddings from a Word2Vec model, we cluster these high-dimensional vectors
by $k$-means and pick the most frequent token within each cluster to
form the "base vocabulary". Non-base tokens are then regressed on the vectors
of the base tokens to obtain transformation weights, so that the whole
vocabulary can be represented by the base tokens alone. This variant, called
pruned WIG (pWIG), enables us to shrink the vocabulary dimension at will while
still achieving high accuracy. I also provide a \textit{wigpy} module in Python
that carries out the computation in both flavors. An application to the Economic
Policy Uncertainty (EPU) index is showcased as a comparison with existing methods
of generating time-series sentiment indices.
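As a concrete illustration of the pruning step described above, the sketch below clusters pre-trained Word2Vec vectors with $k$-means, keeps the most frequent token in each cluster as a base token, and expresses every remaining token as a Lasso-weighted combination of the base tokens. This is a minimal sketch using scikit-learn, with hypothetical names and hyper-parameters (vectors, counts, n_base, alpha); it is not the wigpy implementation itself.

```python
# Minimal sketch of the pWIG vocabulary-pruning idea (not the wigpy code).
# Assumes `vectors` is a (V, d) array of Word2Vec embeddings and `counts`
# is a (V,) array of corpus frequencies; all names here are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def prune_vocabulary(vectors, counts, n_base=500, alpha=0.01, seed=0):
    """Pick a base vocabulary and Lasso weights that express every
    non-base token as a sparse combination of base tokens."""
    V, d = vectors.shape

    # 1. Cluster the embeddings, one cluster per desired base token.
    labels = KMeans(n_clusters=n_base, random_state=seed).fit_predict(vectors)

    # 2. Within each cluster, keep the most frequent token as the base token.
    base_idx = np.array([
        np.where(labels == c)[0][np.argmax(counts[labels == c])]
        for c in range(n_base)
    ])
    base_vecs = vectors[base_idx]                  # (n_base, d)

    # 3. Regress each non-base vector on the base vectors (Lasso shrinkage),
    #    so the whole vocabulary is represented by base tokens only.
    weights = np.zeros((V, n_base))
    weights[base_idx, np.arange(n_base)] = 1.0     # base tokens map to themselves
    for i in np.setdiff1d(np.arange(V), base_idx):
        reg = Lasso(alpha=alpha, fit_intercept=False)
        weights[i] = reg.fit(base_vecs.T, vectors[i]).coef_
    return base_idx, weights
```

Document-term counts over the full vocabulary can then be projected onto the base vocabulary through these weights, so the $N\times N$ cost matrix needed for the Sinkhorn computation only has to be built over the much smaller base vocabulary.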
Related papers
- Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets.
We present an improved version of the "$q$-means" algorithm for clustering.
We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}\,d + \log(Nd))\big)$.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Torch-Choice: A PyTorch Package for Large-Scale Choice Modelling with Python [11.566791864440262]
$\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch.
$\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently.
arXiv Detail & Related papers (2023-04-04T16:00:48Z)
- Convergence of alternating minimisation algorithms for dictionary learning [4.5687771576879594]
We derive sufficient conditions for the convergence of two popular alternating minimisation algorithms for dictionary learning.
We show that given a well-behaved initialisation that is either within distance at most $1/\log(K)$ to the generating dictionary or has a special structure ensuring that each element of the initialisation only points to one generating element, both algorithms will converge with geometric convergence rate to the generating dictionary.
arXiv Detail & Related papers (2023-04-04T12:58:47Z)
- N-Gram Nearest Neighbor Machine Translation [101.25243884801183]
We propose a novel $n$-gram nearest neighbor retrieval method that is model agnostic and applicable to both Autoregressive Translation (AT) and Non-Autoregressive Translation (NAT) models.
We demonstrate that the proposed method consistently outperforms the token-level method on both AT and NAT models, on general as well as on domain adaptation translation tasks.
arXiv Detail & Related papers (2023-01-30T13:19:19Z)
- Modeling Label Correlations for Second-Order Semantic Dependency Parsing with Mean-Field Inference [34.75002236767817]
Second-order semantic parsing with end-to-end mean-field inference has been shown to achieve good performance.
In this work we aim to improve this method by modeling label correlations between adjacent arcs.
To tackle this computational challenge, we leverage tensor decomposition techniques.
arXiv Detail & Related papers (2022-04-07T17:40:08Z)
- Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms [0.0]
We propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty.
This result improves on the known complexity bound of $O(\min(m^2 n, m n^2)\log(1/\epsilon))$ for first-order optimization methods.
arXiv Detail & Related papers (2021-11-30T14:16:48Z)
- Compressing 1D Time-Channel Separable Convolutions using Sparse Random Ternary Matrices [65.4388266814055]
We replace 1x1-convolutions in 1D time-channel separable convolutions with constant, sparse random ternary matrices with weights in $\{-1, 0, +1\}$.
For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from $97.21\%$ to $97.41\%$ at the same network size.
For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about $1\%$ of the floating-point baseline's word error rate.
arXiv Detail & Related papers (2021-03-31T15:09:20Z)
- Learning Universal Shape Dictionary for Realtime Instance Segmentation [40.27913339054021]
We present a novel explicit shape representation for instance segmentation.
Based on how to model the object shape, current instance segmentation systems can be divided into two categories, implicit and explicit models.
The proposed USD-Seg adopts a linear model, sparse coding with a dictionary, for object shapes.
arXiv Detail & Related papers (2020-12-02T09:44:49Z)
- Deep Learning Meets Projective Clustering [66.726500395069]
A common approach for compressing NLP networks is to encode the embedding layer as a matrix $A \in \mathbb{R}^{n \times d}$.
Inspired by \emph{projective clustering} from computational geometry, we suggest replacing this subspace by a set of $k$ subspaces.
arXiv Detail & Related papers (2020-10-08T22:47:48Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
arXiv Detail & Related papers (2020-03-18T13:07:51Z)
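The last entry above (Anchor & Transform) is close in spirit to pWIG's base vocabulary: store only a small set of anchor vectors plus a sparse map from the full vocabulary onto them. The snippet below is a rough, purely illustrative estimate of the storage saving; the sizes, density, and random matrices are made up and are not that paper's settings or algorithm.

```python
# Back-of-the-envelope comparison: dense V x d embedding table versus
# k anchor vectors plus a sparse V x k transformation (illustrative sizes only).
import numpy as np
import scipy.sparse as sp

V, d, k = 100_000, 300, 1_000          # vocabulary size, embedding dim, anchors
density = 0.005                        # fraction of non-zeros in the sparse map

A = np.random.randn(k, d)              # anchor embeddings
T = sp.random(V, k, density=density, format="csr")   # sparse transformation

dense_params = V * d                   # full embedding table
anchored_params = A.size + T.nnz       # anchors + non-zero transform weights
print(f"dense table : {dense_params:,} parameters")
print(f"anchored    : {anchored_params:,} parameters "
      f"({anchored_params / dense_params:.1%} of dense)")

# Any token's embedding is recovered by a sparse row-times-dense product.
row_42 = T[42] @ A                     # approximate embedding of token 42
```

With these illustrative numbers, the anchored representation stores under 3% of the dense table's parameters.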
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences.