FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
- URL: http://arxiv.org/abs/2505.17330v1
- Date: Thu, 22 May 2025 22:53:58 GMT
- Title: FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
- Authors: Amit Agarwal, Srikant Panda, Kulbhushan Pachauri
- Abstract summary: Few Shot Domain Adapting Graph (FS-DAG) is a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG is highly performant with fewer than 90M parameters, making it well-suited to complex real-world Information Extraction (IE) applications. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance compared to state-of-the-art methods.
- Score: 0.9843385481559191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose Few Shot Domain Adapting Graph (FS-DAG), a scalable and efficient model architecture for visually rich document understanding (VRDU) in few-shot settings. FS-DAG leverages domain-specific and language/vision-specific backbones within a modular framework to adapt to diverse document types with minimal data. The model is robust to practical challenges such as OCR errors, misspellings, and domain shifts, which are critical in real-world deployments. FS-DAG is highly performant with fewer than 90M parameters, making it well-suited to complex real-world Information Extraction (IE) tasks where computational resources are limited. We demonstrate FS-DAG's capability through extensive experiments on information extraction tasks, showing significant improvements in convergence speed and performance compared to state-of-the-art methods. Additionally, this work highlights the ongoing progress in developing smaller, more efficient models that do not compromise on performance. Code: https://github.com/oracle-samples/fs-dag
Related papers
- AFRDA: Attentive Feature Refinement for Domain Adaptive Semantic Segmentation [8.541106387148872]
In Unsupervised Domain Adaptive Semantic Segmentation (UDA-SS), a model is trained on labeled source domain data and adapted to an unlabeled target domain. Existing UDA-SS methods often struggle to balance fine-grained local details with global contextual information. We introduce the Adaptive Feature Refinement (AFR) module, which enhances segmentation accuracy by refining high-resolution features.
arXiv Detail & Related papers (2025-07-23T22:02:17Z)
- Topology-Aware CLIP Few-Shot Learning [0.0]
We introduce a topology-aware tuning approach integrating Representation Topology Divergence into the Task Residual framework. By explicitly aligning the topological structures of visual and text representations using a combined RTD and Cross-Entropy loss, our method enhances few-shot performance.
arXiv Detail & Related papers (2025-05-03T04:58:29Z)
- QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder. We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
- RGL: A Graph-Centric, Modular Framework for Efficient Retrieval-Augmented Generation on Graphs [58.10503898336799]
We introduce the RAG-on-Graphs Library (RGL), a modular framework that seamlessly integrates the complete RAG pipeline. RGL addresses key challenges by supporting a variety of graph formats and integrating optimized implementations for essential components. Our evaluations demonstrate that RGL not only accelerates the prototyping process but also enhances the performance and applicability of graph-based RAG systems.
arXiv Detail & Related papers (2025-03-25T03:21:48Z)
- LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding [4.759109475818876]
Implicit Neural Representations (INRs) are proving to be a powerful paradigm in unifying task modeling across diverse data domains. We introduce LIFT, a novel, high-performance framework that captures multiscale information through meta-learning. We also introduce ReLIFT, an enhanced variant of LIFT that incorporates residual connections and expressive frequency encodings.
arXiv Detail & Related papers (2025-03-19T17:00:58Z)
- Resource-Efficient Affordance Grounding with Complementary Depth and Semantic Prompts [21.435113588059924]
Affordance refers to the functional properties that an agent perceives and utilizes from its environment. Existing multimodal affordance methods face limitations in extracting useful information. This paper proposes the BiT-Align image-depth-text affordance mapping framework.
arXiv Detail & Related papers (2025-03-04T13:20:42Z)
- ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
- DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights [8.139817615390147]
This paper introduces the Domain Adaptive Visually-rich Document Understanding (DAViD) framework.
DAViD integrates fine-grained and coarse-grained document representation learning and employs synthetic annotations to reduce the need for costly manual labelling.
arXiv Detail & Related papers (2024-10-02T14:47:55Z)
- DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models [56.584561770857306]
We propose a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG.
Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them.
We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS.
When integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset.
arXiv Detail & Related papers (2023-05-26T13:58:18Z)
- Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains [46.26074225989355]
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments.
In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos.
We propose a new FSDA-AR benchmark using five established datasets, covering adaptation on more diverse and challenging domains.
arXiv Detail & Related papers (2023-05-15T08:01:05Z)
- GenURL: A General Framework for Unsupervised Representation Learning [58.59752389815001]
Unsupervised representation learning (URL) learns compact embeddings of high-dimensional data without supervision.
We propose a unified similarity-based URL framework, GenURL, which can smoothly adapt to various URL tasks.
Experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GE), and dimension reduction.
arXiv Detail & Related papers (2021-10-27T16:24:39Z)
- Disentangled Feature Representation for Few-shot Image Classification [64.40410801469106]
We propose a novel Disentangled Feature Representation framework, dubbed DFR, for few-shot learning applications.
DFR can adaptively decouple the discriminative features that are modeled by the classification branch, from the class-irrelevant component of the variation branch.
In general, most of the popular deep few-shot learning methods can be plugged in as the classification branch, thus DFR can boost their performance on various few-shot tasks.
arXiv Detail & Related papers (2021-09-26T09:53:11Z)
- Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.