Tabular Incremental Inference
- URL: http://arxiv.org/abs/2601.15751v2
- Date: Tue, 27 Jan 2026 05:31:04 GMT
- Title: Tabular Incremental Inference
- Authors: Xinda Chen, Zhen Xing, Hanyu Zhang, Weimin Tan, Bo Yan
- Abstract summary: Tabular Incremental Inference (TabII) aims to enable trained models to incorporate new columns during the inference stage. Experimental results across eight public datasets show that TabII effectively utilizes incremental attributes.
- Score: 32.65122826292422
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data is a fundamental form of data structure. The evolution of table analysis tools reflects humanity's continuous progress in data acquisition, management, and processing. The dynamic changes in table columns arise from technological advancements, changing needs, data integration, etc. However, the standard process of training AI models on tables with fixed columns and then performing inference is not suitable for handling dynamically changed tables. Therefore, new methods are needed for efficiently handling such tables in an unsupervised manner. In this paper, we introduce a new task, Tabular Incremental Inference (TabII), which aims to enable trained models to incorporate new columns during the inference stage, enhancing the practicality of AI models in scenarios where tables are dynamically changed. Furthermore, we demonstrate that this new task can be framed as an optimization problem based on the information bottleneck theory, which emphasizes that the key to an ideal tabular incremental inference approach lies in minimizing mutual information between tabular data and representation while maximizing between representation and task labels. Under this guidance, we design a TabII method with Large Language Model placeholders and Pretrained TabAdapter to provide external knowledge and Incremental Sample Condensation blocks to condense the task-relevant information given by incremental column attributes. Experimental results across eight public datasets show that TabII effectively utilizes incremental attributes, achieving state-of-the-art performance.
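For reference, the trade-off described in the abstract has the shape of the classical information bottleneck objective. The block below is a sketch of that standard form, not the paper's exact formulation: the symbols $X$ (tabular input, including incremental columns), $Z$ (learned representation), $Y$ (task label), and the trade-off weight $\beta$ are assumptions introduced here for illustration.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
```

In this form, minimizing $I(X;Z)$ compresses the representation, while the $-\beta\, I(Z;Y)$ term rewards keeping information that is predictive of the label; a larger $\beta$ favors prediction over compression.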
Related papers
- ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement [58.957050610762565]
ShowTable is a pipeline that synergizes MLLMs with diffusion models via a progressive self-correcting process. MLLM acts as the central orchestrator for reasoning the visual plan and judging visual errors. We introduce TableVisBench, a new benchmark with 800 challenging instances across 5 evaluation dimensions.
arXiv Detail & Related papers (2025-12-15T13:21:50Z)
- TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding [52.59372043981724]
TableDART is a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models. In addition, we propose a novel agent for cross-modal knowledge integration that analyzes outputs from text- and image-based models.
arXiv Detail & Related papers (2025-09-18T07:00:13Z)
- TableZoomer: A Collaborative Agent Framework for Large-scale Table Question Answering [26.00027389659854]
TableZoomer is a programming-based agent framework for the table question answering (TQA) task. It introduces three key innovations: (1) replacing the original fully verbalized table with structured table schema to bridge the semantic gap and reduce computational complexity; (2) a query-aware table zooming mechanism that dynamically generates sub-table schema through column selection and entity linking; and (3) a Program-of-Thoughts (PoT) strategy that transforms queries into executable code to mitigate numerical hallucination.
arXiv Detail & Related papers (2025-09-01T09:53:01Z)
- Improving Table Understanding with LLMs and Entity-Oriented Search [24.3302301035859]
We introduce an entity-oriented search method to improve table understanding with large language models (LLMs). This approach effectively leverages the semantic similarities between questions and table data, as well as the implicit relationships between table cells. It focuses on table entities, ensuring that table cells are semantically tightly bound, thereby enhancing contextual clarity.
arXiv Detail & Related papers (2025-08-23T14:02:45Z)
- Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z)
- Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models [62.47618742274461]
We fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs. We decouple the contributions of training data and the base model, providing insight into their individual impacts.
arXiv Detail & Related papers (2025-01-24T18:50:26Z)
- TabDPT: Scaling Tabular Foundation Models on Real Data [20.00390825519329]
We propose an approach to combine ICL-based retrieval with self-supervised learning to train foundation models. We show that incorporating real data during the pre-training phase can lead to significantly faster training and better generalization to unseen data. Our resulting model, TabDPT, achieves top performance on both regression (CTR23) and classification (CC18) benchmarks.
arXiv Detail & Related papers (2024-10-23T18:00:00Z)
- TableRAG: Million-Token Table Understanding with Language Models [53.039560091592215]
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs. Our results demonstrate that TableRAG achieves the highest retrieval quality, leading to the new state-of-the-art performance on large-scale table understanding.
arXiv Detail & Related papers (2024-10-07T04:15:02Z)
- TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively.
It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
- PixT3: Pixel-based Table-To-Text Generation [66.96636025277536]
We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations.
Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive and superior to generators that operate solely on text.
arXiv Detail & Related papers (2023-11-16T11:32:47Z)
- UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science [16.384705926693073]
This study seeks to extend the power of pretraining methodologies to facilitate the prediction over tables in data science.
We introduce UniTabE, a method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures.
In order to implement the pretraining phase, we curated an expansive dataset comprising approximately 13B samples, meticulously gathered from the Kaggle platform.
arXiv Detail & Related papers (2023-07-18T13:28:31Z)
- Retrieval-Based Transformer for Table Augmentation [14.460363647772745]
We introduce a novel approach toward automatic data wrangling.
We aim to address table augmentation tasks, including row/column population and data imputation.
Our model consistently and substantially outperforms both supervised statistical methods and the current state-of-the-art transformer-based models.
arXiv Detail & Related papers (2023-06-20T18:51:21Z)
- Understanding tables with intermediate pre-training [11.96734018295146]
We adapt TAPAS, a table-based BERT model, to recognize entailment.
We evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency.
arXiv Detail & Related papers (2020-10-01T17:43:27Z)