Complex Coordinate-Based Meta-Analysis with Probabilistic Programming
- URL: http://arxiv.org/abs/2012.01303v2
- Date: Fri, 22 Jan 2021 13:36:50 GMT
- Title: Complex Coordinate-Based Meta-Analysis with Probabilistic Programming
- Authors: Valentin Iovene (NEUROSPIN, PARIETAL), Gaston Zanitti (NEUROSPIN,
PARIETAL), Demian Wassermann (NEUROSPIN, PARIETAL)
- Abstract summary: Coordinate-based meta-analysis (CBMA) databases are built by automatically extracting both coordinates of reported peak activations and term associations.
We show how recent lifted query processing algorithms make it possible to scale to the size of large neuroimaging data.
We demonstrate results for two-term conjunctive queries, both on simulated meta-analysis databases and on the widely-used Neurosynth database.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing number of published functional magnetic resonance imaging
(fMRI) studies, meta-analysis databases and models have become an integral part
of brain mapping research. Coordinate-based meta-analysis (CBMA) databases are
built by automatically extracting both coordinates of reported peak activations
and term associations using natural language processing (NLP) techniques.
Solving term-based queries on these databases makes it possible to obtain
statistical maps of the brain related to specific cognitive processes. However,
with tools like Neurosynth, only single-term queries lead to statistically
reliable results. When solving richer queries, too few studies from the
database contribute to the statistical estimations. We design a probabilistic
domain-specific language (DSL) built on Datalog and one of its probabilistic
extensions, CP-Logic, for expressing and solving rich logic-based queries. We
encode a CBMA database into a probabilistic program. Using the joint
distribution of its Bayesian network translation, we show that solutions of
queries on this program compute the right probability distributions of voxel
activations. We explain how recent lifted query processing algorithms make it
possible to scale to the size of large neuroimaging data, where
state-of-the-art knowledge compilation (KC) techniques fail to solve queries fast enough for
practical applications. Finally, we introduce a method for relating studies to
terms probabilistically, leading to better solutions for conjunctive queries on
smaller databases. We demonstrate results for two-term conjunctive queries,
both on simulated meta-analysis databases and on the widely-used Neurosynth
database.
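As a rough illustration of what such a query computes, here is a minimal Python sketch assuming a toy generative model in which a study is drawn uniformly at random and its term associations and reported peak voxels are read off the database; the names study_terms, study_voxels, and conjunctive_query are hypothetical, and this does not implement the paper's DSL, CP-Logic encoding, or lifted query processing.

```python
# Toy sketch of a two-term conjunctive CBMA query under a simplified model:
# draw a study uniformly at random, then read off which terms it mentions
# and which voxels it reports as peak activations.
# This is NOT the paper's DSL, CP-Logic encoding, or lifted algorithm.

# Hypothetical miniature database (real databases such as Neurosynth are far larger).
study_terms = {
    "s1": {"emotion", "fear"},
    "s2": {"emotion", "memory"},
    "s3": {"fear"},
    "s4": {"emotion", "fear"},
}
study_voxels = {
    "s1": {101, 102},
    "s2": {102},
    "s3": {101, 103},
    "s4": {101},
}

def conjunctive_query(voxel, terms):
    """Estimate P(voxel reported | study associated with every queried term).

    Conditioning on the conjunction keeps only the studies that mention
    every queried term, then marginalises the study variable out.
    """
    matching = [s for s, ts in study_terms.items() if terms <= ts]
    if not matching:
        return None  # the conjunction is supported by no study
    hits = sum(voxel in study_voxels[s] for s in matching)
    return hits / len(matching)

# "emotion AND fear" is supported by s1 and s4; voxel 102 is reported in s1 only.
print(conjunctive_query(102, {"emotion", "fear"}))  # 0.5
</antml>```

Note how few studies survive the conjunction even in this toy case; this sparsity is the problem that the paper's probabilistic study-term association is meant to mitigate on smaller databases.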
Related papers
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- In-Database Data Imputation [0.6157028677798809]
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making.
Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates, are computationally efficient but may introduce bias and disrupt variable relationships.
Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time.
This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method (a generic MICE-style sketch appears after this list).
arXiv Detail & Related papers (2024-01-07T01:57:41Z)
- Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose a method called Chat2Brain that combines LLMs with a basic text-to-image model, known as Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex tasks of text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors [58.340159346749964]
We propose a new neural-symbolic method to support end-to-end learning using complex queries with provable reasoning capability.
We develop a new dataset containing ten new types of queries with features that have never been considered.
Our method significantly outperforms previous methods on the new dataset and also surpasses them on the existing dataset.
arXiv Detail & Related papers (2023-04-14T11:35:35Z)
- A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries [3.0938904602244346]
We introduce a new co-occurrence based interpretability approach to capture relationships between relational entities.
Our approach provides both query-agnostic (global) and query-specific (local) interpretabilities.
arXiv Detail & Related papers (2023-02-23T17:18:40Z)
- Hard and Soft EM in Bayesian Network Learning from Incomplete Data [1.5484595752241122]
We investigate the impact of using imputation instead of belief propagation on the quality of the resulting BNs.
We find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data.
arXiv Detail & Related papers (2020-12-09T19:13:32Z)
- Symbolic Querying of Vector Spaces: Probabilistic Databases Meets Relational Embeddings [35.877591735510734]
We formalize a probabilistic database model with respect to which all queries are evaluated.
The lack of a well-defined joint probability distribution causes simple query problems to become provably hard.
We introduce TO, a relational embedding model designed to be a tractable probabilistic database.
arXiv Detail & Related papers (2020-02-24T01:17:25Z)
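The In-Database Data Imputation entry above refers to the MICE (multiple imputation by chained equations) method. As a hedged, generic sketch of that idea outside a database, scikit-learn's IterativeImputer performs chained-equation imputation; this is only an illustration under that assumption, not the paper's in-database implementation, and the toy matrix X is hypothetical.

```python
# Generic MICE-style imputation sketch using scikit-learn's IterativeImputer
# (a chained-equations imputer). Illustrates the model-based approach
# discussed above; it is not the paper's in-database implementation.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (activates the estimator)
from sklearn.impute import IterativeImputer

# Hypothetical table with missing entries.
X = np.array([
    [7.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [10.0, 5.0, np.nan],
    [8.0, 3.0, 4.0],
])

# Each column with missing values is modelled as a regression on the other
# columns, and the imputations are refined over several rounds.
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
</antml>```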