Tensor Methods in High Dimensional Data Analysis: Opportunities and Challenges
- URL: http://arxiv.org/abs/2405.18412v1
- Date: Tue, 28 May 2024 17:54:03 GMT
- Title: Tensor Methods in High Dimensional Data Analysis: Opportunities and Challenges
- Authors: Arnab Auddy, Dong Xia, Ming Yuan
- Abstract summary: Multiway arrays or tensors are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing.
Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization and numerical linear algebra among other fields.
This review seeks to examine some of the key advancements and identify common threads among them, under eight different statistical settings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large amounts of multidimensional data, represented by multiway arrays or tensors, are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing. The structural complexity of such data provides vast new opportunities for modeling and analysis, but efficiently extracting information content from them, both statistically and computationally, presents unique and fundamental challenges. Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization, and numerical linear algebra, among other fields. Despite these hurdles, significant progress has been made in the last decade. This review seeks to examine some of the key advancements and identify common threads among them, under eight different statistical settings.
Related papers
- Shannon invariants: A scalable approach to information decomposition [41.60443091960594]
"Shannon invariants" are quantities that capture essential properties of high-order information processing.
Our theoretical results demonstrate how Shannon invariants can be used to resolve long-standing ambiguities.
Our results reveal distinctive information-processing signatures of various deep learning architectures.
arXiv Detail & Related papers (2025-04-22T10:41:38Z)
- Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems.
Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs).
Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints.
This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z)
- Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.
We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
- Variable Selection Methods for Multivariate, Functional, and Complex Biomedical Data in the AI Age [0.0]
This work proposes new optimization-based variable selection methods for multivariate, functional, and even more general outcomes in metric spaces based on best-subset selection.
Our framework applies to several types of regression models, including linear, quantile, or nonparametric additive models, and to a broad range of random responses.
Our analysis demonstrates that our proposed methodology outperforms state-of-the-art methods in accuracy and, especially, in speed, achieving several orders of magnitude improvement over competitors.
arXiv Detail & Related papers (2025-01-12T16:33:06Z)
- Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets [0.0]
We focus on the sciences of dimensionality reduction as a means to obtain low-dimensional descriptions from high-dimensional data.
We address the challenges posed by real-world data that defy conventional assumptions, such as complex interactions within systems or high-dimensional dynamical systems.
arXiv Detail & Related papers (2024-10-23T21:27:40Z)
- Transforming Multidimensional Time Series into Interpretable Event Sequences for Advanced Data Mining [5.2863523790908955]
This paper introduces a novel representation model designed to address the limitations of traditional methods in multidimensional time series (MTS) analysis.
The proposed framework has significant potential for applications across various fields, including services for monitoring and optimizing IT infrastructure, medical diagnosis through continuous patient monitoring, trend analysis, and internet businesses for tracking user behavior and forecasting.
arXiv Detail & Related papers (2024-09-22T06:27:07Z)
- High-dimensional learning of narrow neural networks [1.7094064195431147]
This manuscript reviews the tools and ideas underlying recent progress in machine learning.
We introduce a generic model -- the sequence multi-index model -- which encompasses numerous previously studied models as special instances.
We explicate in full detail the analysis of the learning of sequence multi-index models, using statistical physics techniques such as the replica method and approximate message-passing algorithms.
arXiv Detail & Related papers (2024-09-20T21:20:04Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
- Bayesian Nonparametrics: An Alternative to Deep Learning [0.5801621787540265]
This survey aims to delve into the significance of Bayesian nonparametrics, particularly in addressing complex challenges across various domains such as statistics, computer science, and electrical engineering.
We uncover the versatility and efficacy of Bayesian nonparametric methodologies, paving the way for innovative solutions to intricate challenges across diverse disciplines.
arXiv Detail & Related papers (2024-03-29T17:32:42Z)
- Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework [0.0]
This paper explores the steps involved in multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches.
A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python.
Results indicate that multiway analysis is more expressive.
arXiv Detail & Related papers (2023-09-05T17:56:22Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
- Graph signal processing for machine learning: A review and new perspectives [57.285378618394624]
We review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms.
We discuss exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability.
We provide new perspectives on future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other.
arXiv Detail & Related papers (2020-07-31T13:21:33Z)
- Bayesian Sparse Factor Analysis with Kernelized Observations [67.60224656603823]
Multi-view problems can be faced with latent variable models.
High-dimensionality and non-linear issues are traditionally handled by kernel methods.
We propose merging both approaches into a single model.
arXiv Detail & Related papers (2020-06-01T14:25:38Z)