Fugu-MT Paper Translation (Summary): EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce

Paper summary: EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce

arxiv url: http://arxiv.org/abs/2604.23993v1
Date: Mon, 27 Apr 2026 03:18:00 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.716019
Title: EPM-RL: Reinforcement Learning for On-Premise Product Mapping in E-Commerce
Authors: Minhyeong Yu, Wonduk Seo
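Abstract summary: EPM-RL is a reinforcement-learning-based framework for building an efficient on-premise e-commerce product mapping model. Its central idea is to distill high-cost agentic reasoning into a trainable in-house model. Preliminary results show that EPM-RL consistently improves over PEFT-only training and offers a stronger quality-cost trade-off than commercial API-based baselines.
Reference score (attention score computed independently by this site): 0.0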
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Product mapping, the task of deciding whether two e-commerce listings refer to the same product, is a core problem for price monitoring and channel visibility. In real marketplaces, however, sellers frequently inject promotional keywords, platform-specific tags, and bundle descriptions into titles, causing the same product to appear under many different names. Recent LLM-based and multi-agent frameworks improve robustness and interpretability on such hard cases, but they often rely on expensive external APIs, repeated retrieval, and complex inference-time orchestration, making large-scale deployment costly and difficult in privacy-sensitive enterprise settings. To address these issues, we present EPM-RL, a reinforcement-learning-based framework for building an accurate and efficient on-premise e-commerce product mapping model. Our central idea is to distill high-cost agentic reasoning into a trainable in-house model. Starting from a curated set of product pairs with LLM-generated rationales and human verification, we first perform parameter-efficient fine-tuning (PEFT) on a small student model using structured reasoning outputs. We then further optimize the model with reinforcement learning (RL) using an agent-based reward that jointly evaluates output-format compliance, label correctness, and reasoning-preference scores from specially designed judge models. Preliminary results show that EPM-RL consistently improves over PEFT-only training and offers a stronger quality-cost trade-off than commercial API-based baselines, while enabling private deployment and lower operational cost. These findings suggest that reinforcement learning can turn product mapping from a high-latency agentic pipeline into a scalable, inspectable, and production-ready in-house system.
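The abstract describes a reward that jointly evaluates output-format compliance, label correctness, and reasoning-preference scores from judge models, but it does not say how these terms are combined. The following is a minimal sketch of one possible combination, not the authors' implementation. The JSON output schema, the label names `match` and `mismatch`, the `WEIGHTS` values, the zero reward for malformed output, and the `judge` callable are all assumptions made for illustration.

```python
import json

# Hypothetical weights: the abstract does not state how the terms are combined.
WEIGHTS = {"format": 0.2, "label": 0.5, "reasoning": 0.3}

# Hypothetical label vocabulary for the student model's structured output.
VALID_LABELS = ("match", "mismatch")


def parse_output(text):
    """Return the parsed dict if `text` follows the assumed JSON schema, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if obj.get("label") not in VALID_LABELS:
        return None
    if not isinstance(obj.get("rationale"), str):
        return None
    return obj


def composite_reward(text, gold_label, judge):
    """Weighted sum of format, label, and reasoning terms, each in [0, 1].

    `judge` stands in for a judge model: any callable that maps a rationale
    string to a preference score. Scores are clamped to [0, 1].
    """
    parsed = parse_output(text)
    if parsed is None:
        # Assumption: malformed output earns no reward at all.
        return 0.0
    format_score = 1.0
    label_score = 1.0 if parsed["label"] == gold_label else 0.0
    reasoning_score = min(1.0, max(0.0, float(judge(parsed["rationale"]))))
    return (
        WEIGHTS["format"] * format_score
        + WEIGHTS["label"] * label_score
        + WEIGHTS["reasoning"] * reasoning_score
    )
```

In an RL loop, a scalar of this kind would be computed per sampled completion and passed to the policy-gradient update. Gating the other terms on format validity is one design choice among several; an additive format penalty would be an equally plausible reading of the abstract.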


Related papers