Hierarchical Document Parsing via Large Margin Feature Matching and Heuristics
- URL: http://arxiv.org/abs/2502.07442v1
- Date: Tue, 11 Feb 2025 10:37:01 GMT
- Title: Hierarchical Document Parsing via Large Margin Feature Matching and Heuristics
- Authors: Duong Anh Kiet
- Abstract summary: We present our solution to the AAAI-25 VRD-IU challenge.
By combining a deep learning-based matching strategy with greedy algorithms, we achieve a significant boost in accuracy.
Our method attains an accuracy of 0.98904 on the private leaderboard, demonstrating its effectiveness in document structure parsing.
- Abstract: We present our solution to the AAAI-25 VRD-IU challenge, achieving first place in the competition. Our approach integrates a large margin loss for improved feature discrimination and employs heuristic rules to refine hierarchical relationships. By combining a deep learning-based matching strategy with greedy algorithms, we achieve a significant boost in accuracy while maintaining computational efficiency. Our method attains an accuracy of 0.98904 on the private leaderboard, demonstrating its effectiveness in document structure parsing. The source code is publicly available at https://github.com/ffyyytt/VRUID-AAAI-DAKiet
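The abstract does not say which large margin loss or which greedy rule is used, so the following is only an illustrative sketch under stated assumptions: an additive cosine margin (CosFace-style) softmax loss for matching a child element to its parent, and a per-child argmax as a stand-in for greedy assignment. The function names, the `margin` and `scale` values, and the toy vectors are all hypothetical and are not taken from the paper or its repository.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def large_margin_loss(child, candidates, true_idx, margin=0.35, scale=10.0):
    """Softmax cross-entropy over scaled cosine similarities.

    The margin is subtracted from the true parent's similarity only
    (an assumed CosFace-style formulation), so the model has to separate
    the correct parent from the others by at least `margin`.
    """
    logits = []
    for i, cand in enumerate(candidates):
        cos = cosine(child, cand)
        if i == true_idx:
            cos -= margin
        logits.append(scale * cos)
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[true_idx]

def greedy_parents(children, candidates):
    """For each child, pick the index of the most similar candidate parent."""
    return [
        max(range(len(candidates)), key=lambda j: cosine(c, candidates[j]))
        for c in children
    ]

# Toy 2-D features (hypothetical values).
candidates = [[1.0, 0.0], [0.0, 1.0]]
children = [[0.9, 0.1], [0.2, 0.8]]
assigned = greedy_parents(children, candidates)
loss_with_margin = large_margin_loss(children[0], candidates, 0)
loss_no_margin = large_margin_loss(children[0], candidates, 0, margin=0.0)
```

In this sketch the margin only affects training: the loss for a correctly matched pair is larger with the margin than without it, which pushes matched and unmatched features further apart, while inference remains a plain similarity lookup.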
Related papers
- Attention with Dependency Parsing Augmentation for Fine-Grained Attribution [26.603281615221505]
We develop a fine-grained attribution mechanism that provides supporting evidence from retrieved documents for every answer span.
Existing attribution methods rely on model-internal similarity metrics between responses and documents, such as saliency scores and hidden state similarity.
We propose two techniques applicable to all model-internals-based methods. First, we aggregate token-wise evidence through set union operations, preserving the granularity of representations.
Second, we enhance the attributor by integrating dependency parsing to enrich the semantic completeness of target spans.
arXiv Detail & Related papers (2024-12-16T03:12:13Z)
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175]
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
arXiv Detail & Related papers (2024-03-08T15:32:18Z)
- Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification [55.1738496692892]
Prior DG ReID methods employ limited labeled data for training due to the high cost of annotation.
We propose an Identity-seeking Self-supervised Representation learning (ISR) method.
ISR constructs positive pairs from inter-frame images by modeling the instance association as a maximum-weight bipartite matching problem.
ISR achieves 87.0% Rank-1 on Market-1501 and 56.4% Rank-1 on MSMT17, outperforming the best supervised domain-generalizable method by 5.0% and 19.5%, respectively.
arXiv Detail & Related papers (2023-08-17T09:46:27Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy [6.170898159041277]
We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.
Our algorithm guarantees convergence at a rate matching that of gradient descent on a $\delta$-smooth objective.
Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
arXiv Detail & Related papers (2023-02-07T15:50:49Z)
- Okapi: Generalising Better by Making Statistical Matches Match [7.392460712829188]
Okapi is a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching.
Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss.
We show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation.
arXiv Detail & Related papers (2022-11-07T12:41:17Z)
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
- Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that, for any $k$, our algorithm achieves an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
- Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.