SALT: Sales Autocompletion Linked Business Tables Dataset
- URL: http://arxiv.org/abs/2501.03413v1
- Date: Mon, 06 Jan 2025 22:20:02 GMT
- Title: SALT: Sales Autocompletion Linked Business Tables Dataset
- Authors: Tassilo Klein, Clemens Biehl, Margarida Costa, Andre Sres, Jonas Kolk, Johannes Hoffart,
- Abstract summary: We introduce a curated dataset sourced from an Enterprise Resource Planning (ERP) system, featuring extensive linked tables.<n>Our goal is to potentially enhance the effectiveness and applicability of models for real-world business contexts.
- Score: 7.036380633387952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models, particularly those that incorporate Transformer architectures, have demonstrated exceptional performance in domains such as natural language processing and image processing. Adapting these models to structured data, like tables, however, introduces significant challenges. These difficulties are even more pronounced when addressing multi-table data linked via foreign key, which is prevalent in the enterprise realm and crucial for empowering business use cases. Despite its substantial impact, research focusing on such linked business tables within enterprise settings remains a significantly important yet underexplored domain. To address this, we introduce a curated dataset sourced from an Enterprise Resource Planning (ERP) system, featuring extensive linked tables. This dataset is specifically designed to support research endeavors in table representation learning. By providing access to authentic enterprise data, our goal is to potentially enhance the effectiveness and applicability of models for real-world business contexts.
Related papers
- RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL [1.3654846342364308]
We introduce a component-based retrieval architecture that decomposes database schemas and metadata into discrete semantic units.<n>Our solution enables practical text-to- interfaces across diverse enterprise settings without specialized fine-tuning.
arXiv Detail & Related papers (2025-07-30T21:09:47Z) - A Pre-training Framework for Relational Data with Information-theoretic Principles [57.93973948947743]
We introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs supervisory signals via set-based aggregation over relational graphs.<n>TVE consistently outperforms traditional pre-training baselines.<n>Our findings advocate for pre-training objectives that encode task heterogeneity and temporal structure as design principles for predictive modeling on relational databases.
arXiv Detail & Related papers (2025-07-14T00:17:21Z) - Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures [50.46688111973999]
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data.<n>We present a new blueprint that enables end-to-end representation of'relational entity graphs' without traditional engineering feature.<n>We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data.
arXiv Detail & Related papers (2025-06-19T23:51:38Z) - Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (sc Turbo)<n>sc Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1.<n>sc Turbo achieves state-of-the-art performance ($+7.2%$ vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z) - Foundation Models for Tabular Data within Systemic Contexts Need Grounding [9.820997523824676]
We introduce the concept of Semantically Linked Tables (SLT), recognizing that tables are inherently connected to both declarative and procedural operational knowledge.<n>We propose Foundation Models for Semantically Linked Tables (FMSLT), which integrate these components to ground data within its true operational context.
arXiv Detail & Related papers (2025-05-26T11:02:51Z) - Bridging Queries and Tables through Entities in Table Retrieval [70.13748256886288]
Entities are well-studied in the context of text retrieval, but there is a noticeable lack of research on their applications in table retrieval.
We propose an entity-enhanced training framework and design an interaction paradigm based on entity representations.
Our proposed framework is plug-and-play and flexible, making it easy to integrate into existing table retriever training processes.
arXiv Detail & Related papers (2025-04-09T03:16:33Z) - Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks.
We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - Synthesizing Realistic Data for Table Recognition [4.500373384879752]
We propose a novel method for synthesizing annotation data specifically designed for table recognition.
By leveraging the structure and content of tables from Chinese financial announcements, we have developed the first extensive table annotation dataset.
We have established the inaugural benchmark for real-world complex tables in the Chinese financial announcement domain, using it to assess the performance of models trained on our synthetic data.
arXiv Detail & Related papers (2024-04-17T06:36:17Z) - Wiki-TabNER:Advancing Table Interpretation Through Named Entity
Recognition [19.423556742293762]
We analyse a widely used benchmark dataset for evaluation of TI tasks.
To overcome this drawback, we construct and annotate a new more challenging dataset.
We propose a prompting framework for evaluating the newly developed large language models.
arXiv Detail & Related papers (2024-03-07T15:22:07Z) - Large Language Model for Table Processing: A Survey [18.32332372134988]
This survey provides a comprehensive overview of table-related tasks.
It covers traditional tasks like table question answering as well as emerging fields such as spreadsheet manipulation and table data analysis.
arXiv Detail & Related papers (2024-02-04T00:47:53Z) - TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z) - Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS)
arXiv Detail & Related papers (2023-10-07T06:01:35Z) - Towards Foundation Models for Relational Databases [Vision Paper] [15.800326697562841]
We introduce our vision of relational representation learning, that can not only learn from the full relational structure, but also can scale to larger database sizes.
Overall, we argue that this direction can lead to foundation models for relational databases that are today only available for text and images.
arXiv Detail & Related papers (2023-05-24T16:37:35Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - Embeddings for Tabular Data: A Survey [8.010589283146222]
Tabular data comprises rows (samples) with the same set of columns (attributes)
Tables are becoming the natural way of storing data among various industries and academia.
New line of research work applies various learning techniques to support various database tasks.
arXiv Detail & Related papers (2023-02-23T04:37:49Z) - A Case for Business Process-Specific Foundation Models [6.25118865553438]
We argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models.
These models should tackle the unique challenges of applying AI to business processes which include data scarcity, multi-modal representations, domain specific terminology, and privacy concerns.
arXiv Detail & Related papers (2022-10-26T14:17:47Z) - CateCom: a practical data-centric approach to categorization of
computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.