The Relational Data Borg is Learning
- URL: http://arxiv.org/abs/2008.07864v1
- Date: Tue, 18 Aug 2020 11:25:45 GMT
- Title: The Relational Data Borg is Learning
- Authors: Dan Olteanu
- Abstract summary: This paper overviews an approach that addresses machine learning over relational data as a database problem.
This approach has already been investigated for a number of supervised and unsupervised learning tasks.
- Score: 3.228602524766158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper overviews an approach that addresses machine learning over
relational data as a database problem. This is justified by two observations.
First, the input to the learning task is commonly the result of a feature
extraction query over the relational data. Second, the learning task requires
the computation of group-by aggregates.
This approach has already been investigated for a number of supervised and
unsupervised learning tasks, including: ridge linear regression, factorisation
machines, support vector machines, decision trees, principal component
analysis, and k-means; and also for linear algebra over data matrices.
The main message of this work is that the runtime performance of machine
learning can be dramatically boosted by a toolbox of techniques that exploit
the knowledge of the underlying data. This includes theoretical development on
the algebraic, combinatorial, and statistical structure of relational data
processing and systems development on code specialisation, low-level
computation sharing, and parallelisation. These techniques aim at lowering both
the complexity and the constant factors of the learning time.
This work is the outcome of extensive collaboration of the author with
colleagues from RelationalAI, in particular Mahmoud Abo Khamis, Molham Aref,
Hung Ngo, and XuanLong Nguyen, and from the FDB research project, in particular
Ahmet Kara, Milos Nikolic, Maximilian Schleich, Amir Shaikhha, Jakub Zavodny,
and Haozhe Zhang. The author would also like to thank the members of the FDB
project for the figures and examples used in this paper.
The author is grateful for support from industry: Amazon Web Services,
Google, Infor, LogicBlox, Microsoft Azure, RelationalAI; and from the funding
agencies EPSRC and ERC. This project has received funding from the European
Union's Horizon 2020 research and innovation programme under grant agreement No
682588.
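The abstract's second observation, that learning tasks reduce to group-by aggregates, can be illustrated with a toy example. The sketch below is not the paper's system; the relations `R` and `S`, their contents, and both function names are hypothetical. It computes SUM(x * y) over a join in two ways: by enumerating the join result, and by combining per-key partial sums so that the join is never materialised. The second version relies only on the distributivity of multiplication over addition.

```python
from collections import defaultdict

# Hypothetical toy relations R(key, x) and S(key, y), joined on key.
R = [(1, 2.0), (1, 3.0), (2, 5.0)]
S = [(1, 10.0), (1, 20.0), (2, 7.0), (3, 1.0)]


def sum_xy_materialised(r, s):
    """SUM(x * y) computed by enumerating every tuple of the join."""
    return sum(x * y for (kr, x) in r for (ks, y) in s if kr == ks)


def sum_xy_pushed_down(r, s):
    """Same aggregate, computed from per-key partial sums of each relation."""
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    for k, x in r:
        sum_x[k] += x
    for k, y in s:
        sum_y[k] += y
    # Per key, the sum of x * y over all joined pairs equals (sum of x) * (sum of y).
    return sum(sum_x[k] * sum_y[k] for k in sum_x.keys() & sum_y.keys())


naive = sum_xy_materialised(R, S)
pushed = sum_xy_pushed_down(R, S)
print(naive, pushed)
```

Aggregates of this sum-of-products shape are what a least-squares model such as ridge linear regression needs from its training data, so a learner can consume the per-key sums instead of the joined table. The first function does work proportional to the number of joined pairs, while the second makes one pass over each input relation.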
Related papers
- GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration [46.663380413396226]
GraphTeam consists of five LLM-based agents from three modules, and the agents with different specialities can collaborate to address complex problems.
Experiments on six graph analysis benchmarks demonstrate that GraphTeam achieves state-of-the-art performance with an average 25.85% improvement over the best baseline in terms of accuracy.
arXiv Detail & Related papers (2024-10-23T17:02:59Z)
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving tasks over databases with graph neural networks.
We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL).
RDL learns better whilst reducing human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z)
- Relational Deep Learning: Graph Representation Learning on Relational Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models [1.9559144041082446]
This paper describes our solution to relation extraction on one such dataset, REFinD.
In this paper, we employed OpenAI models under the framework of in-context learning (ICL).
We were able to achieve 3rd rank overall. Our best F1-score is 0.718.
arXiv Detail & Related papers (2023-06-30T10:12:30Z)
- The Tensor Data Platform: Towards an AI-centric Database System [6.519203713828565]
We make the case that it is time to do the same for AI -- but with a twist!
We claim that achieving a truly AI-centric database requires moving the engine, at its core, from a relational to a tensor abstraction.
This allows us to: (1) support multi-modal data processing such as images, videos, audio, text as well as relational; (2) leverage the wellspring of innovation in HW and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task.
arXiv Detail & Related papers (2022-11-04T21:26:16Z)
- Semantic Parsing to Manipulate Relational Database For a Management System [0.0]
This work proposes a simple algorithm, a model that can be implemented in different fields, each with its own scope of work.
The proposed model converts human language text to computer-understandable SQL queries.
This paper compares the time across the two datasets and also compares the accuracy on both.
arXiv Detail & Related papers (2021-02-18T15:08:23Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)