The Relational Data Borg is Learning
- URL: http://arxiv.org/abs/2008.07864v1
- Date: Tue, 18 Aug 2020 11:25:45 GMT
- Title: The Relational Data Borg is Learning
- Authors: Dan Olteanu
- Abstract summary: This paper overviews an approach that addresses machine learning over relational data as a database problem.
This approach has been already investigated for a number of supervised and unsupervised learning tasks.
- Score: 3.228602524766158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper overviews an approach that addresses machine learning over
relational data as a database problem. This is justified by two observations.
First, the input to the learning task is commonly the result of a feature
extraction query over the relational data. Second, the learning task requires
the computation of group-by aggregates.
This approach has been already investigated for a number of supervised and
unsupervised learning tasks, including: ridge linear regression, factorisation
machines, support vector machines, decision trees, principal component
analysis, and k-means; and also for linear algebra over data matrices.
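The two observations above can be made concrete with a small sketch. For ridge linear regression, the sufficient statistics are the entries of X^T X and X^T y, and each entry is a SUM aggregate that can be evaluated directly inside the database, over the feature extraction (join) query, without materialising the joined matrix. The schema, table names, and columns below are hypothetical, chosen only for illustration:

```python
# Minimal sketch (hypothetical schema): push the aggregates needed by
# ridge regression into the database instead of exporting a design matrix.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales(store INTEGER, units REAL, price REAL)")
cur.execute("CREATE TABLE stores(store INTEGER, sqft REAL)")
cur.executemany("INSERT INTO sales VALUES (?,?,?)",
                [(1, 10.0, 2.0), (1, 12.0, 1.5), (2, 7.0, 3.0)])
cur.executemany("INSERT INTO stores VALUES (?,?)",
                [(1, 500.0), (2, 300.0)])

# Feature extraction query: the join of sales with store attributes,
# with features X = [price, sqft] and label y = units. The entries of
# X^T X and X^T y are plain SUM aggregates over that join.
cur.execute("""
    SELECT SUM(price*price), SUM(price*sqft), SUM(sqft*sqft),
           SUM(price*units), SUM(sqft*units)
    FROM sales NATURAL JOIN stores
""")
pp, ps, ss, pu, su = cur.fetchone()
print("X^T X entries:", pp, ps, ss)
print("X^T y entries:", pu, su)
```

The toolbox surveyed in the paper goes much further (factorised evaluation, shared computation, specialised code), but this illustrates the basic reduction of a learning task to group-by aggregates over relational data.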
The main message of this work is that the runtime performance of machine
learning can be dramatically boosted by a toolbox of techniques that exploit
the knowledge of the underlying data. This includes theoretical development on
the algebraic, combinatorial, and statistical structure of relational data
processing and systems development on code specialisation, low-level
computation sharing, and parallelisation. These techniques aim at lowering both
the complexity and the constant factors of the learning time.
This work is the outcome of extensive collaboration of the author with
colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref,
Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular
Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, Jakub Zavodny,
and Haozhe Zhang. The author would also like to thank the members of the FDB
project for the figures and examples used in this paper.
The author is grateful for support from industry: Amazon Web Services,
Google, Infor, LogicBlox, Microsoft Azure, RelationalAI; and from the funding
agencies EPSRC and ERC. This project has received funding from the European
Union's Horizon 2020 research and innovation programme under grant agreement No
682588.
Related papers
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [89.2410799619405]
We introduce the Quantitative Reasoning with Data benchmark to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data.
The benchmark comprises a dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers.
To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText.
arXiv Detail & Related papers (2024-02-27T16:15:03Z)
- MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256]
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
arXiv Detail & Related papers (2024-01-16T08:08:01Z)
- Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation [51.43576926422795]
Kun is a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations.
We leverage unlabelled data from diverse sources such as Wudao, Wanjuan, and SkyPile to generate a substantial dataset of over a million Chinese instructional data points.
arXiv Detail & Related papers (2024-01-12T09:56:57Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [58.93352076927003]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models [1.9559144041082446]
This paper describes our solution to relation extraction on one such dataset REFinD.
In this paper, we employed OpenAI models under the framework of in-context learning (ICL)
We were able to achieve 3rd rank overall. Our best F1-score is 0.718.
arXiv Detail & Related papers (2023-06-30T10:12:30Z)
- The Tensor Data Platform: Towards an AI-centric Database System [6.519203713828565]
We make the case that it is time to do the same for AI -- but with a twist!
We claim that achieving a truly AI-centric database requires moving the engine, at its core, from a relational to a tensor abstraction.
This allows us to: (1) support multi-modal data processing such as images, videos, audio, text as well as relational; (2) leverage the wellspring of innovation in HW and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task.
arXiv Detail & Related papers (2022-11-04T21:26:16Z)
- Semantic Parsing to Manipulate Relational Database For a Management System [0.0]
This work proposes a simple algorithm, a model which can be implemented in different fields, each with its own scope.
The proposed model converts human-language text into understandable SQL queries.
This paper compares the runtime on the two datasets and also compares the accuracy of both.
arXiv Detail & Related papers (2021-02-18T15:08:23Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop, such as partial uncertainty sampling and larger query sizes.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.