Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations
- URL: http://arxiv.org/abs/2404.12608v1
- Date: Fri, 19 Apr 2024 03:28:18 GMT
- Title: Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations
- Authors: Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri,
- Abstract summary: We develop an Auto-Formula system that can accurately predict formulas that users want to author in a target spreadsheet cell.
We use contrastive-learning techniques inspired by "similar-face recognition" from compute vision.
- Score: 36.2969566996675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation, with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challenging, as non-technical users need to look up and understand non-trivial formula syntax. To address this pain point, we leverage the observation that there is often an abundance of similar-looking spreadsheets in the same organization, which not only have similar data, but also share similar computation logic encoded as formulas. We develop an Auto-Formula system that can accurately predict formulas that users want to author in a target spreadsheet cell, by learning and adapting formulas that already exist in similar spreadsheets, using contrastive-learning techniques inspired by "similar-face recognition" from compute vision. Extensive evaluations on over 2K test formulas extracted from real enterprise spreadsheets show the effectiveness of Auto-Formula over alternatives. Our benchmark data is available at https://github.com/microsoft/Auto-Formula to facilitate future research.
Related papers
- SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [34.8332394229927]
SpreadsheetBench is designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.
Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums.
Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance.
arXiv Detail & Related papers (2024-06-21T09:06:45Z) - TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [52.73289223176475]
TableLLM is a robust large language model (LLM) with 13 billion parameters.
TableLLM is purpose-built for proficiently handling data manipulation tasks.
We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z) - NL2Formula: Generating Spreadsheet Formulas from Natural Language
Queries [29.33149993368329]
This paper introduces a novel benchmark task called NL2Formula.
The aim is to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input.
We construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions.
arXiv Detail & Related papers (2024-02-20T05:58:05Z) - STUNT: Few-shot Tabular Learning with Self-generated Tasks from
Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT)
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label.
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
arXiv Detail & Related papers (2023-03-02T02:37:54Z) - FLAME: A small language model for spreadsheet formulas [25.667479554632735]
We present FLAME, a transformer-based model trained exclusively on Excel formulas.
We use sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction.
We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval.
arXiv Detail & Related papers (2023-01-31T17:29:43Z) - FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining [23.747119682226675]
FORTAP is the first method for numerical-reasoning-aware table pretraining by leveraging large corpus of spreadsheet formulae.
FORTAP achieves results on two representative downstream tasks, cell type classification and formula prediction.
arXiv Detail & Related papers (2021-09-15T14:31:17Z) - SpreadsheetCoder: Formula Prediction from Semi-structured Context [70.41579328458116]
We propose a BERT-based model architecture to represent the tabular context in both row-based and column-based formats.
We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%.
Compared to the rule-based system, SpreadsheetCoder 82% assists more users in composing formulas on Google Sheets.
arXiv Detail & Related papers (2021-06-26T11:26:27Z) - TGRNet: A Table Graph Reconstruction Network for Table Structure
Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z) - TCN: Table Convolutional Network for Web Table Interpretation [52.32515851633981]
We propose a novel table representation learning approach considering both the intra- and inter-table contextual information.
Our method can outperform competitive baselines by +4.8% of F1 for column type prediction and by +4.1% of F1 for column pairwise relation prediction.
arXiv Detail & Related papers (2021-02-17T02:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.