Related papers: Unveiling the Power of Self-Attention for Shipping Cost Prediction: The Rate Card Transformer

URL: http://arxiv.org/abs/2311.11694v1
Date: Mon, 20 Nov 2023 11:48:50 GMT
Title: Unveiling the Power of Self-Attention for Shipping Cost Prediction: The Rate Card Transformer
Authors: P Aditya Sreekar, Sahil Verma, Varun Madhavan, Abhishek Persad
Abstract summary: Current solutions for estimating shipping costs on day 0 rely on tree-based models that require extensive manual engineering efforts. In this study, we propose a novel architecture called the Rate Card Transformer (RCT) that uses self-attention to encode all package shipping information. Our results demonstrate that cost predictions made by the RCT have 28.82% less error than those of a tree-based GBDT model.
Score: 2.5398014196797614
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Amazon ships billions of packages to its customers annually within the United States. The shipping costs of these packages are used on the day of shipping (day 0) to estimate the profitability of sales. Downstream systems use these day 0 profitability estimates to make financial decisions, such as setting pricing strategies and delisting loss-making products. However, obtaining accurate shipping cost estimates on day 0 is complex for reasons such as delays in carrier invoicing or fixed cost components being recorded at a monthly cadence. Inaccurate shipping cost estimates can lead to bad decisions, such as pricing items too low or too high, or promoting the wrong product to customers. Current solutions for estimating shipping costs on day 0 rely on tree-based models that require extensive manual engineering efforts. In this study, we propose a novel architecture called the Rate Card Transformer (RCT) that uses self-attention to encode all package shipping information, such as package attributes, carrier information and route plan. Unlike other transformer-based tabular models, RCT can encode a variable-length list of one-to-many relations of a shipment, allowing it to capture more information about the shipment. For example, RCT can encode properties of all products in a package. Our results demonstrate that cost predictions made by the RCT have 28.82% less error than those of a tree-based GBDT model. Moreover, the RCT outperforms the state-of-the-art transformer-based tabular model, FTTransformer, by 6.08%. We also illustrate that the RCT learns a generalized manifold of the rate card that can improve the performance of tree-based models.

Related papers

Finding the Muses: Identifying Coresets through Loss Trajectories [7.293244528299574]
Loss Trajectory Correlation (LTC) is a novel metric for coreset selection that identifies critical training samples driving generalization. LTC consistently achieves accuracy on par with or surpassing state-of-the-art coreset selection methods. It also offers insights into training dynamics, such as identifying aligned and conflicting sample behaviors.
arXiv Detail & Related papers (2025-03-12T18:11:16Z)
An Instrumental Value for Data Production and its Application to Data Pricing [107.98697414652479]
This paper develops an approach for capturing the instrumental value of data production processes. We show how they connect to classic notions of information design and signals in information economics.
arXiv Detail & Related papers (2024-12-24T03:53:57Z)
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms. We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths. We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget [53.311109531586844]
We demonstrate very low-cost training of large-scale T2I diffusion transformer models. We train a 1.16 billion parameter sparse transformer at a cost of only $1,890 and achieve a 12.7 FID in zero-shot generation. We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.
arXiv Detail & Related papers (2024-07-22T17:23:28Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization [15.73877955614998]
This paper presents a novel communication algorithm, DeComFL, which reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients. Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead.
arXiv Detail & Related papers (2024-05-24T18:07:05Z)
VST++: Efficient and Stronger Visual Saliency Transformer [74.26078624363274]
We develop an efficient and stronger VST++ model to explore global long-range dependencies. We evaluate our model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets.
arXiv Detail & Related papers (2023-10-18T05:44:49Z)
MatFormer: Nested Transformer for Elastic Inference [91.45687988953435]
MatFormer is a novel Transformer architecture designed to provide elastic inference across diverse deployment constraints. MatFormer achieves this by incorporating a nested Feed Forward Network (FFN) block structure within a standard Transformer model. We show that an 850M decoder-only MatFormer language model (MatLM) allows us to extract multiple smaller models spanning from 582M to 850M parameters.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
Tangent Model Composition for Ensembling and Continual Fine-tuning [69.92177580782929]
Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point. TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models.
arXiv Detail & Related papers (2023-07-16T17:45:33Z)
Incremental Profit per Conversion: a Response Transformation for Uplift Modeling in E-Commerce Promotions [1.7640556247739623]
This paper focuses on promotions with response-dependent costs, where expenses are incurred only when a purchase is made. Existing uplift model approaches often necessitate training multiple models, like meta-learners, or encounter complications when estimating profit. We introduce Incremental Profit per Conversion (IPC), a novel uplift measure of promotional campaigns' efficiency in unit economics.
arXiv Detail & Related papers (2023-06-23T19:46:02Z)
Inductive Graph Transformer for Delivery Time Estimation [19.024006381947416]
We propose an inductive graph transformer (IGT) that leverages raw feature information and structural graph data to estimate package delivery time. Experiments on real-world logistics datasets show that our proposed model can significantly outperform the state-of-the-art methods on estimation of delivery time.
arXiv Detail & Related papers (2022-11-05T09:51:15Z)
Neural Optimal Transport with General Cost Functionals [66.41953045707172]
We introduce a novel neural network-based algorithm to compute optimal transport plans for general cost functionals. As an application, we construct a cost functional to map data distributions while preserving the class-wise structure.
arXiv Detail & Related papers (2022-05-30T20:00:19Z)
Optimal Cost Design for Model Predictive Control [30.86835688868485]
Many robotics domains use model predictive control (MPC) for planning, which sets a reduced time horizon, performs optimization, and replans at every step. In this work, we challenge the common assumption that the cost we optimize using MPC should be the same as the ground truth cost for the task (plus a terminal cost). We propose a zeroth-order trajectory-based approach that enables us to design optimal costs for an MPC planning robot in continuous MDPs.
arXiv Detail & Related papers (2021-04-23T00:00:58Z)
Think out of the package: Recommending package types for e-commerce shipments [2.741530713365541]
Multiple product attributes determine the package type used by e-commerce companies to ship products. Sub-optimal package types lead to damaged shipments, incurring huge damage-related costs. We propose a multi-stage approach that trades off shipment and damage costs for each product.
arXiv Detail & Related papers (2020-06-05T05:27:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.