Towards Open-World Product Attribute Mining: A Lightly-Supervised
Approach
- URL: http://arxiv.org/abs/2305.18350v1
- Date: Fri, 26 May 2023 11:51:31 GMT
- Title: Towards Open-World Product Attribute Mining: A Lightly-Supervised
Approach
- Authors: Liyan Xu, Chenwei Zhang, Xian Li, Jingbo Shang, Jinho D. Choi
- Abstract summary: We present a new task setting for attribute mining on e-commerce products.
We aim to expand the attribute vocabulary of existing seed types, and also to discover any new attribute types automatically.
Our approach surpasses various baselines by 12 F1, expanding attributes of existing types significantly by up to 12 times, and discovering values from 39% new types.
- Score: 60.52087154731358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new task setting for attribute mining on e-commerce products,
serving as a practical solution to extract open-world attributes without
extensive human intervention. Our supervision comes from a high-quality seed
attribute set bootstrapped from existing resources, and we aim to expand the
attribute vocabulary of existing seed types, and also to discover any new
attribute types automatically. A new dataset is created to support our setting,
and our approach Amacer is proposed specifically to tackle the limited
supervision. In particular, given that no direct supervision is available for
those unseen new attributes, our novel formulation exploits self-supervised
heuristics and unsupervised latent attributes, which attain implicit semantic
signals as additional supervision by leveraging product context. Experiments
suggest that our approach surpasses various baselines by 12 F1, expanding
attributes of existing types significantly by up to 12 times, and discovering
values from 39% new types.
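The abstract reports its headline result in F1. As a minimal sketch of how such a score can be computed for attribute mining, the snippet below compares predicted and gold (attribute type, attribute value) pairs by exact set match. The function name `precision_recall_f1`, the pair representation, and the toy data are illustrative assumptions; the paper's actual evaluation protocol is not described in this listing and may differ. Reading "12 F1" as 12 points on a 0-100 scale is likewise an assumption.

```python
from typing import Set, Tuple

# Hypothetical representation: one extracted item is (attribute type, attribute value).
Pair = Tuple[str, str]


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of (type, value) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
gold = {
    ("flavor", "vanilla"),
    ("flavor", "mint"),
    ("scent", "lavender"),
    ("material", "cotton"),
}
predicted = {
    ("flavor", "vanilla"),
    ("scent", "lavender"),
    ("scent", "rose"),
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the three predictions are correct and two of the four gold pairs are recovered, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.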
Related papers
- EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM [52.016009472409166]
EIVEN is a data- and parameter-efficient generative framework for implicit attribute value extraction.
We introduce a novel Learning-by-Comparison technique to reduce model confusion.
Our experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values.
arXiv Detail & Related papers (2024-04-13T03:15:56Z)
- SAGE: Structured Attribute Value Generation for Billion-Scale Product Catalogs [1.1184789007828977]
SAGE is a Generative LLM for inferring attribute values for products across world-wide e-Commerce catalogs.
We introduce a novel formulation of the attribute-value prediction problem as a Seq2Seq summarization task.
SAGE is the first method able to tackle all aspects of the attribute-value-prediction task as they arise in practical settings in e-Commerce catalogs.
arXiv Detail & Related papers (2023-09-12T02:24:16Z)
- OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision [93.26737878221073]
We study the attribute mining problem in an open-world setting to extract novel attributes and their values.
We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes.
Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
arXiv Detail & Related papers (2022-04-29T04:16:04Z)
- Spread Spurious Attribute: Improving Worst-group Accuracy with Spurious Attribute Estimation [72.92329724600631]
We propose a pseudo-attribute-based algorithm, coined Spread Spurious Attribute, for improving the worst-group accuracy.
Our experiments on various benchmark datasets show that our algorithm consistently outperforms the baseline methods.
We also demonstrate that the proposed SSA can achieve performance comparable to methods using full (100%) spurious attribute supervision.
arXiv Detail & Related papers (2022-04-05T09:08:30Z)
- Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis [65.74825840440504]
We propose Zero Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge.
Our proposed method is able to synthesize the detectors of novel attributes in a zero-shot learning manner.
Using only 32 seen attributes on the Caltech-UCSD Birds-200-2011 dataset, our proposed method is able to synthesize 207 other novel attributes.
arXiv Detail & Related papers (2021-11-28T15:45:54Z)
- Creating Training Sets via Weak Indirect Supervision [66.77795318313372]
Weak Supervision (WS) frameworks synthesize training labels from multiple potentially noisy supervision sources.
We formulate Weak Indirect Supervision (WIS), a new research problem for automatically synthesizing training labels.
We develop a probabilistic modeling approach, PLRM, which uses user-provided label relations to model and leverage indirect supervision sources.
arXiv Detail & Related papers (2021-10-07T14:09:35Z)
- Disentangled Face Attribute Editing via Instance-Aware Latent Space Search [30.17338705964925]
A rich set of semantic directions exists in the latent space of Generative Adversarial Networks (GANs).
Existing methods may suffer poor attribute variation disentanglement, leading to unwanted change of other attributes when altering the desired one.
We propose a novel framework (IALS) that performs Instance-Aware Latent-Space Search to find semantic directions for disentangled attribute editing.
arXiv Detail & Related papers (2021-05-26T16:19:08Z)
- Self-Supervised Features Improve Open-World Learning [13.880789191591088]
We present a unifying open-world framework combining Incremental Learning, Out-of-Distribution detection and Open-World learning.
Under an unsupervised feature representation, we categorize the problem of detecting unknowns as either Out-of-Label-space or Out-of-Distribution detection.
The incremental learning component of our pipeline is a zero-exemplar online model which performs comparably to the state of the art on the ImageNet-100 protocol.
arXiv Detail & Related papers (2021-02-15T21:03:05Z)