DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical
Representations
- URL: http://arxiv.org/abs/2211.06648v1
- Date: Sat, 12 Nov 2022 12:14:16 GMT
- Title: DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical
Representations
- Authors: Hyebin Kwon, Joungbin An, Dongwoo Lee, Won-Yong Shin
- Abstract summary: We present a novel Domain Adaptation-aided deep Table detection method called DATa.
It guarantees satisfactory performance in a specific target domain where few trusted labels are available.
Experiments show that DATa substantially outperforms competing methods that only utilize visual representations in the target domain.
- Score: 2.542864854772221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Considerable research attention has been paid to table detection by
developing not only rule-based approaches reliant on hand-crafted heuristics
but also deep learning approaches. Although recent studies successfully perform
table detection with enhanced results, they often experience performance
degradation when they are used for transferred domains whose table layout
features might differ from the source domain in which the underlying model has
been trained. To overcome this problem, we present DATa, a novel Domain
Adaptation-aided deep Table detection method that guarantees satisfactory
performance in a specific target domain where few trusted labels are available.
To this end, we newly design lexical features and an augmented model used for
re-training. More specifically, after pre-training one of the state-of-the-art
vision-based models as our backbone network, we re-train our augmented model,
consisting of the vision-based model and a multilayer perceptron (MLP)
architecture. Using the new confidence scores produced by the trained MLP
architecture, together with the initial bounding-box predictions and their
confidence scores, we compute each final confidence score more accurately. To
validate the superiority of DATa, we perform experimental evaluations by
adopting a real-world benchmark dataset in a source domain and another dataset
in our target domain consisting of materials science articles. Experimental
results demonstrate that the proposed DATa method substantially outperforms
competing methods that only utilize visual representations in the target
domain. Such gains stem from the method's ability to eliminate many false
positives or false negatives, depending on how the confidence score threshold
is set.
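The confidence-fusion step described in the abstract can be illustrated with a minimal sketch: a small MLP maps hand-designed lexical features of a candidate region to a lexical confidence, which is then mixed with the backbone's visual confidence before thresholding. All function names, feature choices, and the mixing weight `alpha` below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_confidence(lexical_features, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping lexical features of a candidate
    region (e.g. word density, numeric-token ratio) to a score in [0, 1]."""
    h = np.maximum(lexical_features @ w1 + b1, 0.0)   # ReLU hidden layer
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))              # sigmoid output

def fuse_confidences(visual_conf, lexical_conf, alpha=0.5):
    """Mix the backbone's visual confidence with the MLP's lexical
    confidence; `alpha` is an assumed weight, not taken from the paper."""
    return alpha * visual_conf + (1.0 - alpha) * lexical_conf

# Toy example: 3 candidate boxes, 4 lexical features each.
feats = rng.random((3, 4))
w1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
w2, b2 = rng.standard_normal((8,)), 0.0
visual_conf = np.array([0.9, 0.4, 0.7])   # from the vision-based backbone

final_conf = fuse_confidences(visual_conf, mlp_confidence(feats, w1, b1, w2, b2))
keep = final_conf >= 0.5                  # confidence-score threshold
```

Because both inputs lie in [0, 1], the fused score does too, so a single threshold can be tuned to trade off false positives against false negatives as the abstract describes.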
Related papers
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z)
- Open-Set Domain Adaptation with Visual-Language Foundation Models [51.49854335102149]
Unsupervised domain adaptation (UDA) has proven to be very effective in transferring knowledge from a source domain to a target domain with unlabeled data.
Open-set domain adaptation (ODA) has emerged as a potential solution to identify these classes during the training phase.
arXiv Detail & Related papers (2023-07-30T11:38:46Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA).
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
arXiv Detail & Related papers (2022-08-26T19:50:46Z)
- Low-confidence Samples Matter for Domain Adaptation [47.552605279925736]
Domain adaptation (DA) aims to transfer knowledge from a label-rich source domain to a related but label-scarce target domain.
We propose a novel contrastive learning method by processing low-confidence samples.
We evaluate the proposed method in both unsupervised and semi-supervised DA settings.
arXiv Detail & Related papers (2022-02-06T15:45:45Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
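The ATC rule above can be sketched in a few lines: pick the confidence threshold on labeled source data so that the fraction of source examples above it matches the observed source accuracy, then report the fraction of unlabeled target examples above that threshold as the estimated target accuracy. This is a minimal sketch of the idea as stated in the summary; the data, names, and use of raw max-confidence as the score are illustrative assumptions.

```python
import numpy as np

def learn_threshold(source_conf, source_correct):
    """Pick threshold t so the fraction of source examples with
    confidence above t matches the observed source accuracy."""
    acc = source_correct.mean()
    # The (1 - acc)-quantile of the confidences leaves a fraction
    # of roughly acc above it.
    return np.quantile(source_conf, 1.0 - acc)

def predict_target_accuracy(target_conf, threshold):
    """Estimated accuracy = fraction of unlabeled target examples
    whose confidence exceeds the learned threshold."""
    return (target_conf > threshold).mean()

# Toy data: per-example confidences and correctness on the source domain.
source_conf = np.array([0.95, 0.9, 0.8, 0.6, 0.4, 0.3])
source_correct = np.array([1, 1, 1, 1, 0, 0])
t = learn_threshold(source_conf, source_correct)

# Unlabeled target confidences: 2 of 4 exceed the threshold.
target_conf = np.array([0.9, 0.7, 0.5, 0.2])
est_acc = predict_target_accuracy(target_conf, t)   # 0.5 on this toy data
```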
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
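The transductive prototype update in the last entry above can be sketched as a confidence-weighted mean: each class prototype is refined by folding in unlabeled query embeddings, each weighted by a per-query confidence (meta-learned in that paper; hand-fixed here). Function names and the toy embeddings are illustrative.

```python
import numpy as np

def update_prototype(prototype, query_embeddings, confidences):
    """Refine a class prototype with unlabeled query embeddings, each
    weighted by a per-query confidence in [0, 1]. The original prototype
    keeps weight 1 so zero-confidence queries leave it unchanged."""
    num = prototype + (confidences[:, None] * query_embeddings).sum(axis=0)
    den = 1.0 + confidences.sum()
    return num / den

proto = np.array([0.0, 0.0])
queries = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
conf = np.array([1.0, 0.0])          # second query contributes nothing
new_proto = update_prototype(proto, queries, conf)   # -> [0.5, 0.0]
```

With all confidences at zero the prototype is returned unchanged, which is why assigning low weights to unreliable queries makes the update robust.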
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.