Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce
- URL: http://arxiv.org/abs/2408.00346v1
- Date: Thu, 1 Aug 2024 07:31:23 GMT
- Title: Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce
- Authors: Houye Ji, Ye Tang, Zhaoxin Chen, Lixi Deng, Jun Hu, Lei Su,
- Abstract summary: Video-driven e-commerce has shown huge potential in stimulating consumer confidence and promoting sales.
We propose a novel bi-level Graph Matching Network (GMN), which mainly consists of node- and preference-level graph matching.
Comprehensive experiments show the superiority of the proposed GMN with significant improvements over state-of-the-art approaches.
- Score: 5.534002182451785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefiting from the dynamic and visualized introduction of items, video-driven e-commerce has shown huge potential in stimulating consumer confidence and promoting sales. In this paper, we focus on the video retrieval task, which faces the following challenges: (1) How to handle the heterogeneities among users, items, and videos? (2) How to mine the complementarity between items and videos for better user understanding? We first leverage a dual graph to model the co-existence of user-video and user-item interactions in video-driven e-commerce and innovatively reduce user preference understanding to a graph matching problem. To solve it, we further propose a novel bi-level Graph Matching Network (GMN), which mainly consists of node- and preference-level graph matching. Given a user, node-level graph matching aims to match videos and items, while preference-level graph matching aims to match multiple user preferences extracted from both videos and items. The proposed GMN can then generate and improve the user embedding by aggregating matched nodes or preferences from the dual graph in a bi-level manner. Comprehensive experiments show the superiority of the proposed GMN, with significant improvements over state-of-the-art approaches (e.g., AUC +1.9% and CTR +7.15%). We have deployed it on a well-known video-driven e-commerce platform, serving hundreds of millions of users every day.
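To make the bi-level matching idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released code). It assumes a user's interacted videos and items have already been encoded into node embeddings by an upstream GNN over the dual graph; node-level matching cross-attends videos to items, preference-level matching aligns a fixed number of preference vectors pooled from each graph, and the results are aggregated into a user embedding. All class names, tensor shapes, and the attention-based pooling are illustrative assumptions.

```python
# Hypothetical sketch of bi-level graph matching (not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLevelGraphMatcher(nn.Module):
    def __init__(self, dim: int, num_prefs: int = 4):
        super().__init__()
        # Learnable "preference queries" that pool node embeddings
        # into a fixed number of preference vectors per graph (assumption).
        self.pref_queries = nn.Parameter(torch.randn(num_prefs, dim))
        self.out = nn.Linear(4 * dim, dim)

    @staticmethod
    def cross_match(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        """Node-level matching: each node in `a` attends over nodes in `b`
        and returns its attended counterpart (shape: [len(a), dim])."""
        scores = a @ b.t() / a.size(-1) ** 0.5      # [Na, Nb] similarities
        attn = F.softmax(scores, dim=-1)
        return attn @ b                             # matched counterparts

    def pool_preferences(self, nodes: torch.Tensor) -> torch.Tensor:
        """Extract K preference vectors from a set of node embeddings."""
        scores = self.pref_queries @ nodes.t()      # [K, N]
        attn = F.softmax(scores, dim=-1)
        return attn @ nodes                         # [K, dim]

    def forward(self, video_nodes: torch.Tensor, item_nodes: torch.Tensor) -> torch.Tensor:
        # Node-level graph matching across the dual graph.
        v2i = self.cross_match(video_nodes, item_nodes).mean(dim=0)
        i2v = self.cross_match(item_nodes, video_nodes).mean(dim=0)
        # Preference-level graph matching between pooled preferences.
        v_pref = self.pool_preferences(video_nodes)
        i_pref = self.pool_preferences(item_nodes)
        matched_pref = self.cross_match(v_pref, i_pref).mean(dim=0)
        pref = v_pref.mean(dim=0)
        # Aggregate matched nodes and preferences into a user embedding.
        return self.out(torch.cat([v2i, i2v, matched_pref, pref], dim=-1))


# Usage: 12 videos and 8 items a user interacted with, 64-d node embeddings.
matcher = BiLevelGraphMatcher(dim=64)
user_embedding = matcher(torch.randn(12, 64), torch.randn(8, 64))
print(user_embedding.shape)  # torch.Size([64])
```

In the retrieval setting described above, such a user embedding would then be scored against candidate video embeddings; the scoring head is omitted from this sketch.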
Related papers
- CoActionGraphRec: Sequential Multi-Interest Recommendations Using Co-Action Graphs [4.031699584957737]
eBay's data sparsity exceeds other e-commerce sites by an order of magnitude.
We propose a text based two-tower deep learning model (Item Tower and User Tower) utilizing co-action graph layers.
For the Item Tower, we represent each item using its co-action items to capture collaborative signals in a co-action graph that is fully leveraged by the graph neural network component.
arXiv Detail & Related papers (2024-10-15T10:11:18Z)
- LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee's Advertisement Recommendation [2.1165011830664677]
We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness.
We construct high-quality item graphs by combining strong-signal user behaviors with a high-precision collaborative filtering (CF) algorithm.
We then develop a new GNN architecture named LightSAGE to produce high-quality item embeddings for vector search.
arXiv Detail & Related papers (2023-10-30T09:57:06Z)
- A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset [47.805378137676605]
We propose a heterogeneous dataset that contains multi-modal video entities and rich common sense relations.
Experiments indicate that combining video understanding embedding with factual knowledge benefits the content-based video retrieval performance.
It also helps the model generate better knowledge graph embeddings, which outperform traditional KGE-based methods on VRT and VRV tasks.
arXiv Detail & Related papers (2022-11-19T09:00:45Z)
- GIFT: Graph-guIded Feature Transfer for Cold-Start Video Click-Through Rate Prediction [47.06479882277151]
Short video has witnessed rapid growth in China and shows a promising market for promoting the sales of products in e-commerce platforms like Taobao.
To ensure the freshness of the content, the platform needs to release a large number of new videos every day.
We propose GIFT, an efficient Graph-guIded Feature Transfer system, to take advantage of the rich information of warmed-up videos that are related to the cold-start video.
arXiv Detail & Related papers (2022-02-21T09:31:35Z)
- Concept-Aware Denoising Graph Neural Network for Micro-Video Recommendation [30.67251766249372]
We propose a novel concept-aware denoising graph neural network (named CONDE) for micro-video recommendation.
The proposed CONDE achieves significantly better recommendation performance than the existing state-of-the-art solutions.
arXiv Detail & Related papers (2021-09-28T07:02:52Z)
- Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
The proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)
- Graph Convolution Machine for Context-aware Recommender System [59.50474932860843]
We extend the advantages of graph convolutions to context-aware recommender systems (CARS).
We propose Graph Convolution Machine (GCM), an end-to-end framework that consists of three components: an encoder, graph convolution layers, and a decoder.
We conduct experiments on three real-world datasets from Yelp and Amazon, validating the effectiveness of GCM and the benefits of performing graph convolutions for CARS.
arXiv Detail & Related papers (2020-01-30T15:32:08Z)
- Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
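As an aside, the fully connected frame-graph idea behind AGNN can be illustrated with a small, hypothetical sketch (not the authors' implementation): frames are nodes, every frame pair forms an edge weighted by learned attention, and node states are updated from the attended messages. The frame-level feature vectors, dimensions, and GRU-based update below are assumptions made for illustration.

```python
# Hypothetical sketch of attentive message passing over a fully connected
# frame graph, in the spirit of AGNN (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameGraphLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)  # assumed node-update rule

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: [T, dim] frame-level features, one node per frame.
        scores = self.query(frames) @ self.key(frames).t()       # [T, T] edge weights
        attn = F.softmax(scores / frames.size(-1) ** 0.5, dim=-1)
        messages = attn @ frames                                  # attended neighbor summary
        return self.update(messages, frames)                      # updated node states


layer = FrameGraphLayer(dim=256)
video = torch.randn(20, 256)   # 20 frames with 256-d features
refined = layer(video)         # refined per-frame node states
print(refined.shape)           # torch.Size([20, 256])
```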