Related papers: Matrix Factorization for Inferring Associations and Missing Links

Matrix Factorization for Inferring Associations and Missing Links

URL: http://arxiv.org/abs/2503.04680v1
Date: Thu, 06 Mar 2025 18:22:46 GMT
Title: Matrix Factorization for Inferring Associations and Missing Links
Authors: Ryan Barron, Maksim E. Eren, Duc P. Truong, Cynthia Matuszek, James Wendelberger, Mary F. Dorn, Boian Alexandrov,
Abstract summary: Missing link prediction identifies unseen but potentially existing connections in a network.<n>In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons.<n>We introduce novel weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction.
Score: 5.700773330654261
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. Missing link prediction identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology - a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selection of the matrix rank parameter, that is, of the number of hidden features, k, to avoid over/under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation by evaluating stability and accuracy using a modified bootstrap methodology and uncertainty quantification (UQ), assessing prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization, comparing them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of rank k selection, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showcasing an improved prediction performance.

Related papers

STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning.<n>We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z)
MambaITD: An Efficient Cross-Modal Mamba Network for Insider Threat Detection [9.049925971684837]
This paper proposes a new insider threat detection framework MambaITD based on the Mamba state space model and cross-modal adaptive fusion.<n>Compared with traditional methods, MambaITD shows significant advantages in modeling efficiency and feature fusion capabilities.
arXiv Detail & Related papers (2025-08-06T18:45:00Z)
Similarity-Distance-Magnitude Universal Verification [0.0]
We solve the neural network robustness problem by adding Similarity (i.e., correctly predicted depth-matches into training)-awareness.<n>We use this novel behavior to address the complementary problem of mapping the output to human-interpretable summary statistics.
arXiv Detail & Related papers (2025-02-27T15:05:00Z)
Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.<n>We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.<n>We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.<n>By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI [47.64301863399763]
We present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process. We quantify uncertainty of Large Language Models (LLMs) on a given query by calculating entropy of the generated semantic clusters. We propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within Conformal Prediction framework.
arXiv Detail & Related papers (2024-11-04T18:49:46Z)
Bayesian Estimation and Tuning-Free Rank Detection for Probability Mass Function Tensors [17.640500920466984]
This paper presents a novel framework for estimating the joint PMF and automatically inferring its rank from observed data. We derive a deterministic solution based on variational inference (VI) to approximate the posterior distributions of various model parameters. Additionally, we develop a scalable version of the VI-based approach by leveraging variational inference (SVI) Experiments involving both synthetic data and real movie recommendation data illustrate the advantages of our VI and SVI-based methods in terms of estimation accuracy, automatic rank detection, and computational efficiency.
arXiv Detail & Related papers (2024-10-08T20:07:49Z)
The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others [1.654278807602897]
This study introduces Misclassification Likelihood Matrix (MLM) as a novel tool for quantifying the reliability of neural network predictions under distribution shifts. The implications of this work extend beyond image classification, with ongoing applications in autonomous systems, such as self-driving cars.
arXiv Detail & Related papers (2024-07-10T16:43:14Z)
LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.<n>By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.<n>Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
Weakly supervised covariance matrices alignment through Stiefel matrices estimation for MEG applications [64.20396555814513]
This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA) We exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains. MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
arXiv Detail & Related papers (2024-01-24T19:04:49Z)
Supervised Class-pairwise NMF for Data Representation and Classification [2.7320863258816512]
Non-negative Matrix factorization (NMF) based methods add new terms to the cost function to adapt the model to specific tasks. NMF method adopts unsupervised approaches to estimate the factorizing matrices.
arXiv Detail & Related papers (2022-09-28T04:33:03Z)
Algorithmic Fairness Verification with Graphical Models [24.8005399877574]
We propose an efficient fairness verifier, called FVGM, that encodes correlations among features as a Bayesian network. We show that FVGM leads to an accurate and scalable assessment for more diverse families of fairness-enhancing algorithms.
arXiv Detail & Related papers (2021-09-20T12:05:14Z)
Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models. This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics. We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
Providing reliability in Recommender Systems through Bernoulli Matrix Factorization [63.732639864601914]
This paper proposes Bernoulli Matrix Factorization (BeMF) to provide both prediction values and reliability values. BeMF acts on model-based collaborative filtering rather than on memory-based filtering. The more reliable a prediction is, the less liable it is to be wrong.
arXiv Detail & Related papers (2020-06-05T14:24:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.