Multi-layer Optimizations for End-to-End Data Analytics
- URL: http://arxiv.org/abs/2001.03541v1
- Date: Fri, 10 Jan 2020 16:14:44 GMT
- Title: Multi-layer Optimizations for End-to-End Data Analytics
- Authors: Amir Shaikhha, Maximilian Schleich, Alexandru Ghita, Dan Olteanu
- Abstract summary: We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.
- Score: 71.05611866288196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of training machine learning models over
multi-relational data. The mainstream approach is to first construct the
training dataset using a feature extraction query over the input database and then
use a statistical software package of choice to train the model. In this paper
we introduce Iterative Functional Aggregate Queries (IFAQ), a framework that
realizes an alternative approach. IFAQ treats the feature extraction query and
the learning task as one program given in IFAQ's domain-specific language,
which captures a subset of Python commonly used in Jupyter notebooks for rapid
prototyping of machine learning applications. The program is subject to several
layers of IFAQ optimizations, such as algebraic transformations, loop
transformations, schema specialization, data layout optimizations, and finally
compilation into efficient low-level C++ code specialized for the given
workload and data.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit,
and TensorFlow by several orders of magnitude for linear regression and
regression tree models over several relational datasets.
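To make the approach concrete, below is a minimal sketch, in the Jupyter-style Python subset the abstract describes, of the kind of program IFAQ takes as input: the feature extraction query (a join) and the learning task (gradient descent for linear regression) written as one program. The relations, column names, and hyperparameters are illustrative assumptions, not taken from the paper.
```python
# Hypothetical sketch of an IFAQ-style input program: the feature extraction
# query and the learning task are fused into ONE Python-style program.
# All relation and column names below are illustrative.

# Toy relations: Sales(item, units, label) and Items(item, price).
sales = [("a", 3.0, 33.0), ("b", 5.0, 105.0), ("a", 2.0, 22.0)]
items = {"a": 10.0, "b": 20.0}

# Feature extraction query: join Sales with Items on `item`;
# features are (units, price), the label comes from Sales.
dataset = [((units, items[item]), label) for (item, units, label) in sales]

# Learning task: batch gradient descent for linear regression,
# written directly over the joined data.
w = [0.0, 0.0]
lr = 0.001
for _ in range(100):
    grad = [0.0, 0.0]
    for (x, y) in dataset:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j in range(2):
            grad[j] += err * x[j]
    for j in range(2):
        w[j] -= lr * grad[j] / len(dataset)

print(w)
```
IFAQ's optimization layers would then rewrite such a program; for instance, algebraic and loop transformations can push the per-feature aggregates past the join so that the gradient is computed from relation-level aggregates rather than over the materialized join, before schema specialization, data layout optimization, and compilation to specialized C++.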
Related papers
- Enhancing Question Answering Precision with Optimized Vector Retrieval and Instructions [1.2425910171551517]
Question-answering (QA) is an important application of Information Retrieval (IR) and language models.
We propose an innovative approach to improve QA task performance by integrating optimized vector retrieval and instruction methodologies.
arXiv Detail & Related papers (2024-11-01T21:14:04Z)
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
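A heavily simplified, hypothetical sketch of the two steps named above, assuming k-means over the gradient-based sample vectors and a toy pool of two scoring-function selectors; the per-cluster selection proxy is an illustrative stand-in for the paper's performance-based choice.
```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for per-sample gradient-based vectors.
grad_vectors = rng.normal(size=(1000, 32))

# (1) Pseudo-skill clusters by grouping the gradient vectors.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(grad_vectors)

# (2) A toy pool of selector experts, each scoring samples differently.
selectors = {
    "norm":      lambda g: np.linalg.norm(g, axis=1),    # prefer large gradients
    "diversity": lambda g: -np.abs(g @ g.mean(axis=0)),  # prefer atypical samples
}

budget_per_cluster = 20
selected = []
for c in range(8):
    idx = np.where(labels == c)[0]
    # Hypothetical proxy for "best-performing": the real system would compare
    # selectors by downstream task performance, not by mean score.
    best = max(selectors, key=lambda name: selectors[name](grad_vectors[idx]).mean())
    scores = selectors[best](grad_vectors[idx])
    selected.extend(idx[np.argsort(scores)[-budget_per_cluster:]])
```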
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
- Scalable Inference for Bayesian Multinomial Logistic-Normal Dynamic Linear Models [0.5735035463793009]
This article develops an efficient and accurate approach to posterior state estimation, called $\textit{Fenrir}$.
Our experiments suggest that Fenrir can be three orders of magnitude more efficient than Stan.
Our methods are made available to the community as a user-friendly software library written in C++ with an R interface.
arXiv Detail & Related papers (2024-10-07T23:20:14Z)
- Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning [1.6570772838074355]
Multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA).
Recent efforts primarily focus on scaling up training datasets through data collection and synthesis.
We propose a visualization-referenced instruction tuning approach to guide the training dataset enhancement and model development.
arXiv Detail & Related papers (2024-07-29T17:04:34Z)
- Learning to Retrieve Iteratively for In-Context Learning [56.40100968649039]
Iterative retrieval is a novel framework that empowers retrievers to make iterative decisions through policy optimization.
We instantiate an iterative retriever for composing in-context learning exemplars and apply it to various semantic parsing tasks.
By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever.
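A minimal, hypothetical sketch of the stateful loop this suggests, assuming a frozen dense retriever that supplies embeddings and a small state-update matrix as the added parameters; the tanh update is an illustrative stand-in for the paper's learned, policy-optimized state encoder.
```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
exemplars = rng.normal(size=(500, d))  # embeddings from a frozen dense retriever
W = rng.normal(size=(d, d)) * 0.01     # the "additional parameters" for state encoding

def retrieve_iteratively(query_vec, k=4):
    state = query_vec.copy()
    chosen = []
    for _ in range(k):
        scores = exemplars @ state         # ordinary dense retrieval step
        scores[chosen] = -np.inf           # do not pick the same exemplar twice
        pick = int(np.argmax(scores))
        chosen.append(pick)
        # Fold the picked exemplar into the query state so the next
        # retrieval decision depends on what was already selected.
        state = np.tanh(state + W @ exemplars[pick])
    return chosen

print(retrieve_iteratively(rng.normal(size=d)))
```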
arXiv Detail & Related papers (2024-06-20T21:07:55Z)
- Selecting Walk Schemes for Database Embedding [6.7609045625714925]
We study the embedding of components of a relational database.
We focus on the recent FoRWaRD algorithm that is designed for dynamic databases.
We show that by focusing on a few informative walk schemes, we can obtain embeddings significantly faster while retaining quality.
arXiv Detail & Related papers (2024-01-20T11:39:32Z)
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
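A toy sketch of one pass through such a closed loop, with the benchmark evaluation and GPT-4 generation steps replaced by placeholders; the data types, budget, and error-proportional ratio rule are illustrative assumptions, not the paper's exact design.
```python
import random

data_types = ["chart", "ocr", "counting", "spatial"]

def evaluate(model):
    # Placeholder for real benchmark evaluation: per-type accuracy in [0, 1].
    return {t: random.uniform(0.4, 0.9) for t in data_types}

def generate_examples(data_type, n):
    # Placeholder for "resort to GPT-4 to generate high-quality data".
    return [f"synthetic {data_type} example {i}" for i in range(n)]

model, train_pool = object(), []
for _ in range(3):
    acc = evaluate(model)
    # Adaptive bad-case idea: allocate the generation budget in proportion
    # to each type's error rate, so weak types get more new data.
    errors = {t: 1.0 - a for t, a in acc.items()}
    total, budget = sum(errors.values()), 100
    for t in data_types:
        n = round(budget * errors[t] / total)
        train_pool.extend(generate_examples(t, n))
    # model = finetune(model, train_pool)  # training step omitted in this sketch
```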
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
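A simplified sketch of the core idea, iterative column-wise imputation with automatic model selection, assuming a toy candidate pool of two scikit-learn regressors chosen by cross-validation; HyperImpute's actual search space and interfaces are richer than this.
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

def iterative_impute(X, n_rounds=3):
    X = X.copy()
    missing = np.isnan(X)
    # Initial fill: column means.
    X[missing] = np.nanmean(X, axis=0)[np.where(missing)[1]]
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(X, j, axis=1)
            # Automatic per-column model selection: pick the candidate with
            # the best cross-validated score on the observed rows.
            candidates = [Ridge(), KNeighborsRegressor(n_neighbors=5)]
            best = max(candidates, key=lambda m: cross_val_score(
                m, others[~rows], X[~rows, j], cv=3).mean())
            best.fit(others[~rows], X[~rows, j])
            X[rows, j] = best.predict(others[rows])
    return X
```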
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory-intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottleneck layers between transformer layers.
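A minimal sketch of such a bottleneck adapter, assuming PyTorch and illustrative dimensions: a down-projection, nonlinearity, and up-projection with a residual connection, initialized so the frozen model's behavior is unchanged at the start of training.
```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, d_model=768, d_bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)  # d_model -> small bottleneck
        self.up = nn.Linear(d_bottleneck, d_model)    # back up to d_model
        nn.init.zeros_(self.up.weight)                # start as identity (residual only)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden):
        # Residual connection keeps the frozen model's output at initialization.
        return hidden + self.up(torch.relu(self.down(hidden)))

# Usage: freeze the pre-trained model, train only the adapter's few parameters.
adapter = BottleneckAdapter()
x = torch.randn(2, 16, 768)   # (batch, tokens, d_model)
print(adapter(x).shape)       # torch.Size([2, 16, 768])
```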
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics [4.237343083490243]
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches.
StackGenVis is a visual analytics system for stacked generalization.
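For reference, a compact sketch of stacked generalization itself, the technique StackGenVis visualizes: out-of-fold predictions from base learners become the meta-learner's features. The learners and dataset here are illustrative choices, not those of the paper.
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, random_state=0)
bases = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=4)]

# Out-of-fold predictions avoid leaking training labels into the meta-features.
meta_X = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1] for m in bases
])
meta = LogisticRegression().fit(meta_X, y)

for m in bases:  # refit base learners on all data for inference
    m.fit(X, y)
```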
arXiv Detail & Related papers (2020-05-04T15:43:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.