Fugu-MT Paper Translation (Abstract): Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes

Paper overview: Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes

arxiv url: http://arxiv.org/abs/2306.00379v1
Date: Thu, 1 Jun 2023 06:21:45 GMT
Status: Translation complete
Last updated in system: 2023-06-02 18:02:21.706001
Title: Large Scale Generative Multimodal Attribute Extraction for E-commerce Attributes
Title (reference translation): Large-Scale Generative Multimodal Attribute Extraction for E-commerce Attributes
Authors: Anant Khandelwal, Happy Mittal, Shreyas Sunil Kulkarni, Deepak Gupta
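Abstract summary: E-commerce websites (e.g. Amazon) carry a large amount of structured and unstructured information (text and images) on their product pages. Sellers often either do not label or mislabel the attribute values (e.g. color, size) of their products. The authors propose a scalable solution to this problem using MXT, which consists of three key components.
Reference score (independently computed attention score): 23.105116746332506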
License: http://creativecommons.org/licenses/by/4.0/
Abstract: E-commerce websites (e.g. Amazon) have a plethora of structured and unstructured information (text and images) on their product pages. Sellers often either do not label or mislabel the values of attributes (e.g. color, size) for their products. Automatically identifying these attribute values from an e-commerce product page that contains both text and images is a challenging task, especially when the attribute value is not explicitly mentioned in the catalog. In this paper, we present a scalable solution to this problem in which we pose attribute extraction as a question-answering task, which we solve using MXT, consisting of three key components: (i) MAG (Multimodal Adaptation Gate), (ii) the Xception network, and (iii) the T5 encoder-decoder. Our system consists of a generative model that generates attribute values for a given product by using both the textual and the visual characteristics (e.g. images) of the product. We show that our system is capable of handling zero-shot attribute prediction (when the attribute value is not seen in the training data) and value-absent prediction (when the attribute value is not mentioned in the text), which are missing in traditional classification-based and NER-based models respectively. We have trained our models using distant supervision, removing the dependency on human labeling and thus making them practical for real-world applications. With this framework, we are able to train a single model for thousands of (product-type, attribute) pairs, thus reducing the overhead of training and maintaining separate models. Extensive experiments on two real-world datasets show that our framework improves the absolute recall@90P by 10.16% and 6.9% over the existing state-of-the-art models. In a popular e-commerce store, we have deployed our models for thousands of (product-type, attribute) pairs.
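The abstract reports results as recall@90P without defining the metric. The sketch below assumes a common reading: the highest recall reachable at any confidence threshold whose precision is at least 90%. The function name, the input layout, and the sample numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def recall_at_precision(
    scored: List[Tuple[float, bool]],
    total_positives: int,
    min_precision: float = 0.90,
) -> float:
    """Highest recall at any confidence cutoff with precision >= min_precision.

    scored: one (confidence, is_correct) pair per emitted prediction.
    total_positives: number of items that have a ground-truth value
        (assumed layout; the paper's exact protocol is not given here).
    """
    if total_positives <= 0 or not scored:
        return 0.0

    # Rank predictions from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct += 1
        # A threshold cannot split predictions that share a confidence,
        # so only evaluate at the end of each run of tied scores.
        is_last_of_tie = i == len(ranked) or ranked[i][0] != confidence
        if not is_last_of_tie:
            continue
        precision = correct / i
        if precision >= min_precision:
            best_recall = max(best_recall, correct / total_positives)
    return best_recall


# Made-up predictions: (model confidence, whether the value was correct).
sample = [(0.99, True), (0.95, True), (0.90, True),
          (0.80, False), (0.70, True), (0.60, False)]
result = recall_at_precision(sample, total_positives=5)
```

Under this reading, lowering the threshold raises recall only as long as the accepted predictions stay at least 90% correct, which is why the metric suits catalog settings where wrong attribute values are costly.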
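Abstract (reference translation): E-commerce websites (for example, Amazon) hold a large amount of structured and unstructured information (text and images) on their product pages. Sellers frequently leave product attribute values (color, size, and so on) unlabeled or label them incorrectly. Automatically identifying attribute values from an e-commerce product page that contains both text and images is difficult, particularly when the value is not explicitly stated in the catalog. This paper proposes a scalable solution that treats attribute extraction as a question-answering task and solves it with MXT, which has three key components: (i) MAG (Multimodal Adaptation Gate), (ii) the Xception network, and (iii) the T5 encoder-decoder. The system consists of a generative model that generates attribute values for a given product from its textual and visual characteristics (for example, images). The authors show that the system can handle zero-shot attribute prediction (the attribute value does not appear in the training data) and value-absent prediction (the attribute value is not mentioned in the text), which conventional classification-based and NER-based models, respectively, cannot. The models are trained with distant supervision, which removes the dependency on human labeling and makes them practical for real-world applications. The framework makes it possible to train a single model for thousands of (product-type, attribute) pairs. Extensive experiments on two real-world datasets show that the framework improves absolute recall@90P by 10.16% and 6.9% over existing state-of-the-art models. The models have been deployed in a popular e-commerce store for thousands of (product-type, attribute) pairs.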
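The abstract states that attribute extraction is posed as a question-answering task, but it does not give the input format. The sketch below shows one way a (product-type, attribute) pair and the product text might be turned into a single text input for an encoder-decoder model. The question template, the "question: ... context: ..." layout, and all function names are assumptions for illustration, and the image branch (Xception features fused through MAG) is omitted.

```python
def build_question(product_type: str, attribute: str) -> str:
    """Turn a (product-type, attribute) pair into a natural-language question.

    The template is hypothetical; the paper's wording is not given here.
    """
    return f"What is the {attribute} of this {product_type}?"


def build_model_input(product_type: str, attribute: str, product_text: str) -> str:
    """Combine the question with the product text as the model's context.

    The "question: ... context: ..." layout is an assumed convention,
    not confirmed as the format used by the authors.
    """
    question = build_question(product_type, attribute)
    # Collapse runs of whitespace so the input stays on one line.
    context = " ".join(product_text.split())
    return f"question: {question} context: {context}"


example_input = build_model_input(
    product_type="shirt",
    attribute="color",
    product_text="Men's  slim-fit cotton shirt,\nlong sleeve",
)
```

Because the pair is encoded in the question rather than in the model's output layer, one model can serve many (product-type, attribute) pairs, and a generative decoder is free to produce a value that never appears in the product text.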

Related papers