Hypertext Entity Extraction in Webpage
- URL: http://arxiv.org/abs/2403.01698v1
- Date: Mon, 4 Mar 2024 03:21:40 GMT
- Title: Hypertext Entity Extraction in Webpage
- Authors: Yifei Yang, Tianqiao Liu, Bo Shao, Hai Zhao, Linjun Shou, Ming Gong,
Daxin Jiang
- Abstract summary: We introduce a MoE-based Entity Extraction Framework (MoEEF), which integrates multiple features to enhance model performance.
We also analyze the effectiveness of hypertext features in HEED and several model components in MoEEF.
- Score: 112.56734676713721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Webpage entity extraction is a fundamental natural language processing task
in both research and applications. Nowadays, the majority of webpage entity
extraction models are trained on structured datasets that strive to retain
textual content and its structural information. However, existing datasets all
overlook the rich hypertext features (e.g., font color, font size) whose
effectiveness has been shown in previous works. To this end, we first collect a
Hypertext Entity Extraction Dataset (HEED) from the e-commerce domain, scraping
both the text and the corresponding explicit hypertext features with
high-quality manual entity annotations. Furthermore, we present the MoE-based
Entity Extraction Framework (MoEEF), which efficiently integrates multiple
features to enhance model performance via Mixture of Experts and outperforms
strong baselines, including state-of-the-art small-scale models and
GPT-3.5-turbo. Moreover, the effectiveness of the hypertext features in HEED
and of several model components in MoEEF is analyzed.
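No code accompanies this listing, so the following is only a minimal sketch of the idea described in the abstract: a Mixture-of-Experts style gate that fuses token-level text representations with explicit hypertext features (e.g., font size, font color) before entity tagging. The module names, feature set, and dimensions are illustrative assumptions, not the authors' MoEEF implementation.

```python
# Minimal sketch (not the authors' code): fusing a text "expert" and a
# hypertext-feature "expert" with a per-token softmax gate, in the spirit of
# MoEEF. All names, dimensions, and the feature set are assumptions.
import torch
import torch.nn as nn


class HypertextMoE(nn.Module):
    def __init__(self, hidden=768, n_labels=9, n_hypertext_feats=4):
        super().__init__()
        # One expert per feature view: encoded text plus explicit hypertext
        # signals such as font size, font color bucket, bold flag, DOM depth.
        self.text_expert = nn.Linear(hidden, hidden)
        self.hypertext_expert = nn.Linear(n_hypertext_feats, hidden)
        # Gating network assigns per-token weights to the two experts.
        self.gate = nn.Linear(hidden + n_hypertext_feats, 2)
        self.classifier = nn.Linear(hidden, n_labels)  # BIO-style entity tags

    def forward(self, text_hidden, hypertext_feats):
        # text_hidden: (batch, seq, hidden) from any pretrained encoder
        # hypertext_feats: (batch, seq, n_hypertext_feats), e.g. normalized
        # font size, color bucket, bold flag, DOM depth.
        experts = torch.stack(
            [self.text_expert(text_hidden), self.hypertext_expert(hypertext_feats)],
            dim=-2,
        )  # (batch, seq, 2, hidden)
        weights = torch.softmax(
            self.gate(torch.cat([text_hidden, hypertext_feats], dim=-1)), dim=-1
        )  # (batch, seq, 2)
        fused = (weights.unsqueeze(-1) * experts).sum(dim=-2)  # (batch, seq, hidden)
        return self.classifier(fused)  # per-token entity logits


# Toy usage with random tensors standing in for encoder outputs and HEED-style features.
logits = HypertextMoE()(torch.randn(2, 16, 768), torch.rand(2, 16, 4))
print(logits.shape)  # torch.Size([2, 16, 9])
```

A per-token gate of this kind lets the model rely on textual context for most tokens while up-weighting hypertext cues (such as an enlarged, colored price string) where they are more informative.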
Related papers
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build HGA, a novel hypergraph-attention framework for document semantic entity recognition that focuses on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, XFUNDIE show that our method can effectively improve the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- 5W1H Extraction With Large Language Models [27.409473072672277]
The extraction of essential news elements through the 5W1H framework is critical for event extraction and text summarization.
ChatGPT has encountered challenges in processing longer news texts and analyzing specific attributes in context.
We design several strategies, from zero-shot/few-shot prompting to efficient fine-tuning, to extract the 5W1H aspects from the original news documents.
arXiv Detail & Related papers (2024-05-25T09:42:58Z)
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training [119.03392147066093]
Recent autoregressive vision-language models have excelled in few-shot text generation tasks but face challenges in alignment tasks.
We introduce a contrastive loss into text generation models, partitioning the language model into dedicated components for unimodal text processing and multimodal data handling.
To bridge this gap, this work introduces VideoDatasetName, an inaugural interleaved video-text dataset featuring comprehensive captions.
arXiv Detail & Related papers (2024-01-01T18:58:42Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale dataset of text images with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end-to-end, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks.
We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)