A Web Scale Entity Extraction System
- URL: http://arxiv.org/abs/2110.00423v1
- Date: Fri, 27 Aug 2021 16:37:37 GMT
- Title: A Web Scale Entity Extraction System
- Authors: Xuanting Cai, Quanbin Ma, Pan Li, Jianyu Liu, Qi Zeng, Zhengkan Yang,
Pushkar Tripathi
- Abstract summary: We present learnings from our efforts in building an entity extraction system for multiple document types at large scale.
We empirically demonstrate the effectiveness of multi-lingual, multi-task and cross-document type learning.
We also discuss the label collection schemes that help to minimize the amount of noise in the collected data.
- Score: 9.300916856534007
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Understanding the semantic meaning of content on the web through the lens of
entities and concepts has many practical advantages. However, when building
large-scale entity extraction systems, practitioners are facing unique
challenges involving finding the best ways to leverage the scale and variety of
data available on internet platforms. We present learnings from our efforts in
building an entity extraction system for multiple document types at large scale
using multi-modal Transformers. We empirically demonstrate the effectiveness of
multi-lingual, multi-task and cross-document type learning. We also discuss the
label collection schemes that help to minimize the amount of noise in the
collected data.
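The shared-encoder, per-task-head design implied by the abstract's multi-task, cross-document-type learning can be sketched as below. Everything here is an illustrative assumption, not the paper's actual architecture: the encoder is a toy embedding table with mean pooling standing in for the multi-modal Transformer, and the task names ("webpage", "ad") and dimensions are invented for the example.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

class SharedEncoder:
    """Stand-in for the shared multi-lingual, multi-modal Transformer.

    Here it is just a token-embedding table with mean pooling; in the
    paper's setting this would be a pretrained Transformer whose weights
    are shared across languages, tasks, and document types.
    """
    def __init__(self, vocab_size, hidden):
        self.emb = rand_matrix(vocab_size, hidden)
        self.hidden = hidden

    def encode(self, token_ids):
        # Mean-pool token embeddings into one document representation.
        vec = [0.0] * self.hidden
        for t in token_ids:
            for j in range(self.hidden):
                vec[j] += self.emb[t][j]
        return [v / len(token_ids) for v in vec]

class TaskHead:
    """Lightweight per-task classifier on top of the shared encoder."""
    def __init__(self, hidden, n_labels):
        self.w = rand_matrix(hidden, n_labels)

    def predict(self, doc_vec):
        logits = [sum(doc_vec[i] * self.w[i][k] for i in range(len(doc_vec)))
                  for k in range(len(self.w[0]))]
        m = max(logits)
        exp = [math.exp(x - m) for x in logits]
        s = sum(exp)
        return [e / s for e in exp]  # softmax over entity labels

# One shared encoder; one head per document type / extraction task.
encoder = SharedEncoder(vocab_size=1000, hidden=16)
heads = {"webpage": TaskHead(16, 5), "ad": TaskHead(16, 3)}

doc = [random.randrange(1000) for _ in range(20)]  # toy token ids
vec = encoder.encode(doc)
probs = {task: head.predict(vec) for task, head in heads.items()}
```

The point of the shared encoder is that gradient updates from any one task or language improve the representation used by all the others, which is the mechanism behind the cross-document-type gains the abstract reports.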
Related papers
- Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences.
We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries.
We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularity.
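The idea of compressing visual tokens at different granularities can be illustrated with a toy mean-pooling sketch. This is only a guess at the flavor of the technique: the actual MME is a learned embedder, and the chunked average pooling below is an assumption made for illustration.

```python
def compress_visual_tokens(tokens, granularities=(1, 4, 16)):
    """Pool a sequence of visual-token vectors into nested coarser summaries.

    tokens: list of equal-length vectors (e.g. patch embeddings). For each
    granularity g, the sequence is split into g chunks and mean-pooled, so
    coarser levels are cheap to compare and finer levels retain more detail.
    """
    out = {}
    for g in granularities:
        chunk = max(1, len(tokens) // g)
        pooled = []
        for start in range(0, len(tokens), chunk):
            group = tokens[start:start + chunk]
            pooled.append([sum(col) / len(group) for col in zip(*group)])
        out[g] = pooled[:g]  # keep exactly g pooled tokens (extras dropped)
    return out

# 64 toy "patch embeddings" of dimension 8.
patches = [[float(i + j) for j in range(8)] for i in range(64)]
levels = compress_visual_tokens(patches)
```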
arXiv Detail & Related papers (2025-02-18T12:00:47Z)
- Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models [55.25892137362187]
We introduce a new dataset featuring Multimodal Multi-Grained Concept annotations (MMGiC) for MLLMs.
Our analyses reveal that multi-grained concept annotations integrate and complement each other, under our structured template and a general MLLM framework.
We further validate our hypothesis by investigating the fair comparison and effective collaboration between MMGiC and image-caption data on 12 multimodal comprehension and generation benchmarks.
arXiv Detail & Related papers (2024-12-08T13:45:44Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
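An entity-centric overlap metric of the kind named here can be sketched as a greedy soft matching between predicted and gold entity sets. This is an illustrative stand-in, not the paper's AESOP metric: the string-similarity matcher and the normalization by the larger set size are assumptions.

```python
from difflib import SequenceMatcher

def approx_entity_overlap(predicted, gold):
    """Toy approximate set-overlap score between two entity sets.

    Greedily matches each predicted entity to its most similar unused gold
    entity (case-insensitive string similarity), sums the match scores, and
    normalizes by the larger set size so spurious or missing entities are
    penalized. Returns a value in [0, 1].
    """
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    remaining = list(gold)
    total = 0.0
    for p in predicted:
        best_i, best_s = None, 0.0
        for i, g in enumerate(remaining):
            s = SequenceMatcher(None, p.lower(), g.lower()).ratio()
            if s > best_s:
                best_i, best_s = i, s
        if best_i is not None:
            remaining.pop(best_i)  # each gold entity matches at most once
            total += best_s
    return total / max(len(predicted), len(gold))

score = approx_entity_overlap(["Eiffel Tower", "Paris"], ["paris", "eiffel tower"])
```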
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed even when the classification tasks have few annotations or no annotation overlap.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
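Knowledge exchange via distribution matching can be sketched as a symmetric KL penalty between two tasks' predicted label distributions on the same input. The formulation below is an assumption made for illustration, not the paper's actual loss; the idea is that a task with missing labels is pulled toward what a related task predicts.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distribution_matching_loss(task_a_probs, task_b_probs):
    """Symmetric matching penalty between two tasks' output distributions.

    Zero when the distributions agree; grows as they diverge, so minimizing
    it transfers knowledge between tasks even without overlapping labels.
    """
    return 0.5 * (kl_divergence(task_a_probs, task_b_probs)
                  + kl_divergence(task_b_probs, task_a_probs))

loss_same = distribution_matching_loss([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
loss_diff = distribution_matching_loss([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```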
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification [68.19713459228369]
We compare transfer learning, meta-learning and contrastive learning against reference Machine Learning (ML) tree-based and monolithic DL models.
We show that (i) larger datasets yield more general representations, and (ii) contrastive learning is the best-performing methodology.
While tree-based ML models cannot handle large tasks but fit small tasks well, DL methods, by reusing learned representations, approach tree-based performance on small tasks as well.
arXiv Detail & Related papers (2023-05-21T11:20:49Z)
- DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification [20.892833511657166]
Multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation.
We propose a deep instance-level contrastive network, namely DICNet, to deal with the double incomplete multi-view multi-label classification problem.
Our DICNet is adept in capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels.
arXiv Detail & Related papers (2023-03-15T04:24:01Z)
- ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding [31.227481709446746]
Existing approaches mainly focus on fine-grained elements such as words and document image, making it hard for them to learn from coarse-grained elements.
In this paper, we attach more importance to coarse-grained elements containing high-density information and consistent semantics.
Our method can improve the performance of multimodal Transformers based on fine-grained elements and achieve better performance with fewer parameters.
arXiv Detail & Related papers (2022-09-18T13:46:56Z)
- Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension [21.000045864213327]
Referring expression comprehension (REC) generally requires a large amount of multi-grained information from the visual and linguistic modalities to realize accurate reasoning.
How to aggregate multi-grained information from different modalities and extract abundant knowledge from hard examples is crucial in the REC task.
We propose a Self-paced Multi-grained Cross-modal Interaction Modeling framework, which improves the language-to-vision localization ability.
arXiv Detail & Related papers (2022-04-21T08:32:47Z)
- Modeling Endorsement for Multi-Document Abstractive Summarization [10.166639983949887]
A crucial difference between single- and multi-document summarization is how salient content manifests itself in the document(s).
In this paper, we model the cross-document endorsement effect and its utilization in multiple document summarization.
Our method generates a synopsis from each document, which serves as an endorser to identify salient content from other documents.
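The endorsement mechanism can be sketched as scoring each sentence by how strongly the synopses of the *other* documents support it. The plain word-overlap scoring below is an assumption for illustration; the paper builds synopses and endorsement with learned abstractive models.

```python
def word_set(text):
    """Lowercased bag-of-words set for a sentence or synopsis."""
    return set(text.lower().split())

def endorsement_scores(documents, synopses):
    """Score each sentence by how strongly other documents' synopses endorse it.

    documents: list of documents, each a list of sentences.
    synopses: one synopsis string per document.
    Sentences echoed by several other documents score higher, marking them
    as salient content for the final multi-document summary.
    """
    scores = []
    for i, doc in enumerate(documents):
        # Synopses of all *other* documents act as endorsers for document i.
        endorsers = [word_set(s) for j, s in enumerate(synopses) if j != i]
        doc_scores = []
        for sentence in doc:
            words = word_set(sentence)
            doc_scores.append(sum(len(words & e) for e in endorsers))
        scores.append(doc_scores)
    return scores

docs = [["the storm hit the coast", "a cat slept"],
        ["coastal storm damage was severe"]]
syns = ["storm hit coast", "storm damage on the coast"]
scores = endorsement_scores(docs, syns)
```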
arXiv Detail & Related papers (2021-10-15T03:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.