On a Guided Nonnegative Matrix Factorization
- URL: http://arxiv.org/abs/2010.11365v2
- Date: Fri, 5 Feb 2021 16:56:22 GMT
- Title: On a Guided Nonnegative Matrix Factorization
- Authors: Joshua Vendrow, Jamie Haddock, Elizaveta Rebrova, Deanna Needell
- Abstract summary: We propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates user-designed seed word supervision.
Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.
- Score: 9.813862201223973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully unsupervised topic models have found fantastic success in document
clustering and classification. However, these models often suffer from the
tendency to learn less-than-meaningful or even redundant topics when the data
is biased towards a set of features. For this reason, we propose an approach
based upon the nonnegative matrix factorization (NMF) model, deemed
\textit{Guided NMF}, that incorporates user-designed seed word supervision. Our
experimental results demonstrate the promise of this model and illustrate that
it is competitive with other methods of this ilk with only very little
supervision information.
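The seed-word supervision described in the abstract can be illustrated with a small sketch. This is a hypothetical formulation, not the paper's exact algorithm: standard NMF minimizes ||X - WH||_F^2, and here a penalty λ||W[:, 0] - b||^2 nudges one topic column toward a binary indicator b over user-chosen seed-word rows. All names and the penalty form are assumptions for illustration; the updates are standard multiplicative NMF updates with the penalty's gradient split into its positive and negative parts.

```python
import numpy as np

def guided_nmf(X, k, seed_rows, lam=1.0, iters=100, eps=1e-9, seed=0):
    """Illustrative guided-NMF sketch (hypothetical, not the paper's exact
    updates). Objective: ||X - W H||_F^2 + lam * ||W[:, 0] - b||^2, where b
    is a binary indicator over seed-word rows, so topic 0 is nudged toward
    the user's seed words. Solved with multiplicative updates."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    b = np.zeros(m)
    b[seed_rows] = 1.0
    for _ in range(iters):
        # Multiplicative update for W; the seed penalty only touches column 0.
        num = X @ H.T
        den = W @ (H @ H.T) + eps
        num[:, 0] += lam * b
        den[:, 0] += lam * W[:, 0]
        W *= num / den
        # Standard multiplicative update for H.
        H *= (W.T @ X) / (W.T @ (W @ H) + eps)
    return W, H
```

With λ = 0 this reduces to plain Lee-Seung NMF; increasing λ trades reconstruction quality for topic-0 alignment with the seed words.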
Related papers
- Testing Hypotheses of Covariate Effects on Topics of Discourse [0.0]
We introduce an approach to topic modelling that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model. We argue that the simple, non-parametric approach advocated here is faster, more interpretable, and enjoys better inferential justification than said generative models.
arXiv Detail & Related papers (2025-06-05T20:28:49Z) - Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling [6.189440665620872]
Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern.
A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail.
We propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling Training strategy.
arXiv Detail & Related papers (2024-12-08T13:47:57Z) - Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort [31.992947353231564]
Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts.
We propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations.
We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.
arXiv Detail & Related papers (2024-07-12T03:07:28Z) - Unified Multi-View Orthonormal Non-Negative Graph Based Clustering Framework [74.25493157757943]
We formulate a novel clustering model, which exploits the non-negative feature property and incorporates the multi-view information into a unified joint learning framework.
We also explore, for the first time, the multi-model non-negative graph-based approach to clustering data based on deep features.
arXiv Detail & Related papers (2022-11-03T08:18:27Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - Probing of Quantitative Values in Abstractive Summarization Models [0.0]
We evaluate the efficacy of abstract summarization models' modeling of quantitative values found in the input text.
Our results show that in most cases, the encoders of recent SOTA-performing models struggle to provide embeddings that adequately represent quantitative values.
arXiv Detail & Related papers (2022-10-03T00:59:50Z) - MRCLens: an MRC Dataset Bias Detection Toolkit [82.44296974850639]
We introduce MRCLens, a toolkit that detects whether biases exist before users train the full model.
For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.
arXiv Detail & Related papers (2022-07-18T21:05:39Z) - Flexible and Hierarchical Prior for Bayesian Nonnegative Matrix Factorization [4.913248451323163]
We introduce a probabilistic model for learning nonnegative matrix factorization (NMF)
We evaluate the model on several real-world datasets including MovieLens 100K and MovieLens 1M with different sizes and dimensions.
arXiv Detail & Related papers (2022-05-23T03:51:55Z) - Semi-supervised Nonnegative Matrix Factorization for Document Classification [6.577559557980527]
We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification.
We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification.
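The SSNMF entry above describes joint factorization with multiplicative updates; a minimal sketch of that setup follows. It assumes the common formulation minimizing ||X - AS||_F^2 + λ||Y - BS||_F^2, where X is the word-document matrix, Y a label matrix, and S a shared representation; the specific update rules below are standard multiplicative updates for this joint objective, illustrative rather than the paper's exact derivation.

```python
import numpy as np

def ssnmf(X, Y, k, lam=1.0, iters=150, eps=1e-9, seed=0):
    """Illustrative SSNMF sketch: minimize
    ||X - A S||_F^2 + lam * ||Y - B S||_F^2
    with all factors nonnegative. X: (words x docs), Y: (classes x docs),
    S: (k x docs) shared representation used for classification."""
    m, n = X.shape
    c = Y.shape[0]
    rng = np.random.default_rng(seed)
    A = rng.random((m, k)) + eps
    B = rng.random((c, k)) + eps
    S = rng.random((k, n)) + eps
    for _ in range(iters):
        # Each factor gets a Lee-Seung-style multiplicative update.
        A *= (X @ S.T) / (A @ (S @ S.T) + eps)
        B *= (Y @ S.T) / (B @ (S @ S.T) + eps)
        # S is shared, so both data and label terms enter its update.
        S *= (A.T @ X + lam * (B.T @ Y)) / (
            A.T @ (A @ S) + lam * (B.T @ (B @ S)) + eps)
    return A, B, S
```

At prediction time, a new document is encoded against the fixed dictionary A and classified via B times its encoding; λ balances reconstruction against label fit.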
arXiv Detail & Related papers (2022-02-28T19:00:49Z) - Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner.
We perform clustering on the feature embedding space and identify pseudoattributes by taking advantage of the clustering results.
We then employ a novel cluster-based reweighting scheme for learning debiased representation.
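The reweighting step above can be sketched in a few lines. This is an illustrative stand-in, not the paper's exact scheme: after clustering the feature embeddings into pseudo-attributes, each sample is weighted inversely to its cluster's size, so presumed bias-conflicting minority groups contribute more to the training loss.

```python
import numpy as np

def cluster_reweight(cluster_ids):
    """Cluster-size-based sample weights (illustrative sketch): samples in
    small clusters, taken as pseudo-attribute minority groups, receive
    larger weights. Weights are normalized to have mean 1."""
    ids, counts = np.unique(cluster_ids, return_counts=True)
    inv = {i: 1.0 / c for i, c in zip(ids, counts)}
    w = np.array([inv[i] for i in cluster_ids], dtype=float)
    return w * len(w) / w.sum()
```

These weights would multiply the per-sample loss during training, e.g. `(w * losses).mean()`, so the optimizer cannot minimize the objective by fitting only the majority cluster.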
arXiv Detail & Related papers (2021-08-06T05:20:46Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Towards Debiasing NLU Models from Unknown Biases [70.31427277842239]
NLU models often exploit biases to achieve high dataset-specific performance without properly learning the intended task.
We present a self-debiasing framework that prevents models from mainly utilizing biases without knowing them in advance.
arXiv Detail & Related papers (2020-09-25T15:49:39Z) - Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rules predicates.
ExMatrix applicability is confirmed via different examples, showing how it can be used in practice to promote RF models interpretability.
arXiv Detail & Related papers (2020-05-08T21:03:48Z) - Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.