Unveiling the Power of Self-Attention for Shipping Cost Prediction: The
Rate Card Transformer
- URL: http://arxiv.org/abs/2311.11694v1
- Date: Mon, 20 Nov 2023 11:48:50 GMT
- Authors: P Aditya Sreekar, Sahil Verma, Varun Madhavan, Abhishek Persad
- Abstract summary: Current solutions for estimating shipping costs on day 0 rely on tree-based models that require extensive manual engineering efforts.
In this study, we propose a novel architecture called the Rate Card Transformer (RCT) that uses self-attention to encode all package shipping information.
Our results demonstrate that cost predictions made by the RCT have 28.82% less error compared to a tree-based GBDT model.
- Score: 2.5398014196797614
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Amazon ships billions of packages to its customers annually within the United
States. The shipping costs of these packages are used on the day of shipping (day 0)
to estimate the profitability of sales. Downstream systems use these day 0
profitability estimates to make financial decisions, such as pricing strategies
and delisting loss-making products. However, obtaining accurate shipping cost
estimates on day 0 is complex for reasons like delay in carrier invoicing or
fixed cost components getting recorded at monthly cadence. Inaccurate shipping
cost estimates can lead to bad decisions, such as pricing items too low or too high,
or promoting the wrong product to the customers. Current solutions for
estimating shipping costs on day 0 rely on tree-based models that require
extensive manual engineering efforts. In this study, we propose a novel
architecture called the Rate Card Transformer (RCT) that uses self-attention to
encode all package shipping information such as package attributes, carrier
information and route plan. Unlike other transformer-based tabular models, RCT
has the ability to encode a variable list of one-to-many relations of a
shipment, allowing it to capture more information about a shipment. For
example, RCT can encode properties of all products in a package. Our results
demonstrate that cost predictions made by the RCT have 28.82% less error
compared to a tree-based GBDT model. Moreover, the RCT outperforms the
state-of-the-art transformer-based tabular model, FTTransformer, by 6.08%. We
also illustrate that the RCT learns a generalized manifold of the rate card
that can improve the performance of tree-based models.
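The idea of encoding a variable-length list of one-to-many relations (for example, all products in a package) with self-attention can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the function names `masked_self_attention` and `encode_shipment`, the `max_items` padding length, the random weight matrices, and the mean pooling step are all assumptions made for illustration. It only shows how a padding mask lets one attention layer handle shipments with different numbers of items.

```python
import numpy as np


def masked_self_attention(tokens, mask, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a padding mask.

    tokens: (n, d) array of token embeddings.
    mask:   (n,) boolean array, True for real tokens, False for padding.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Padded keys receive a very negative score, so their weight becomes zero.
    scores = np.where(mask[None, :], scores, -1e9)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def encode_shipment(package_tokens, item_tokens, max_items, w_q, w_k, w_v):
    """Encode fixed package tokens plus a variable-length list of item tokens.

    Hypothetical helper: pads the item list to max_items, attends over all
    tokens, then mean-pools over the real (unpadded) tokens only.
    """
    d = package_tokens.shape[1]
    n_items = item_tokens.shape[0]
    padded_items = np.zeros((max_items, d))
    padded_items[:n_items] = item_tokens
    tokens = np.vstack([package_tokens, padded_items])
    mask = np.concatenate(
        [np.ones(len(package_tokens), dtype=bool), np.arange(max_items) < n_items]
    )
    attended = masked_self_attention(tokens, mask, w_q, w_k, w_v)
    return attended[mask].mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8  # embedding size, chosen arbitrarily for the demo
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    package = rng.normal(size=(3, d))  # e.g. embedded package attributes
    items = rng.normal(size=(2, d))    # two products in this package
    vector = encode_shipment(package, items, 5, w_q, w_k, w_v)
    print(vector.shape)
```

Because padded positions get zero attention weight and are excluded from pooling, the resulting vector does not depend on how much padding is added, which is the property that makes variable-length item lists workable in this sketch.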
Related papers
- Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
arXiv Detail & Related papers (2024-10-17T11:47:56Z)
- Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget [53.311109531586844]
We demonstrate very low-cost training of large-scale T2I diffusion transformer models.
We train a 1.16 billion parameter sparse transformer at a cost of only $1,890 and achieve a 12.7 FID in zero-shot generation.
We aim to release our end-to-end training pipeline to further democratize the training of large-scale diffusion models on micro-budgets.
arXiv Detail & Related papers (2024-07-22T17:23:28Z)
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
- Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization [15.73877955614998]
This paper presents a novel communication algorithm, DeComFL, which reduces the communication cost from $\mathscr{O}(d)$ to $\mathscr{O}(1)$ by transmitting only a constant number of scalar values between clients.
Empirical evaluations, encompassing both classic deep learning training and large language model fine-tuning, demonstrate significant reductions in communication overhead.
arXiv Detail & Related papers (2024-05-24T18:07:05Z)
- VST++: Efficient and Stronger Visual Saliency Transformer [74.26078624363274]
We develop an efficient and stronger VST++ model to explore global long-range dependencies.
We evaluate our model across various transformer-based backbones on RGB, RGB-D, and RGB-T SOD benchmark datasets.
arXiv Detail & Related papers (2023-10-18T05:44:49Z)
- Tangent Model Composition for Ensembling and Continual Fine-tuning [69.92177580782929]
Tangent Model Composition (TMC) is a method to combine component models independently fine-tuned around a pre-trained point.
TMC improves accuracy by 4.2% compared to ensembling non-linearly fine-tuned models.
arXiv Detail & Related papers (2023-07-16T17:45:33Z)
- Incremental Profit per Conversion: a Response Transformation for Uplift Modeling in E-Commerce Promotions [1.7640556247739623]
This paper focuses on promotions with response-dependent costs, where expenses are incurred only when a purchase is made.
Existing uplift model approaches often necessitate training multiple models, like meta-learners, or encounter complications when estimating profit.
We introduce Incremental Profit per Conversion (IPC), a novel uplift measure of promotional campaigns' efficiency in unit economics.
arXiv Detail & Related papers (2023-06-23T19:46:02Z)
- Inductive Graph Transformer for Delivery Time Estimation [19.024006381947416]
We propose an inductive graph transformer (IGT) that leverages raw feature information and structural graph data to estimate package delivery time.
Experiments on real-world logistics datasets show that our proposed model can significantly outperform state-of-the-art methods on delivery time estimation.
arXiv Detail & Related papers (2022-11-05T09:51:15Z)
- Neural Optimal Transport with General Cost Functionals [66.41953045707172]
We introduce a novel neural network-based algorithm to compute optimal transport plans for general cost functionals.
As an application, we construct a cost functional to map data distributions while preserving the class-wise structure.
arXiv Detail & Related papers (2022-05-30T20:00:19Z)
- Optimal Cost Design for Model Predictive Control [30.86835688868485]
Many robotics domains use model predictive control (MPC) for planning, which sets a reduced time horizon, performs optimization, and replans at every step.
In this work, we challenge the common assumption that the cost we optimize using MPC should be the same as the ground truth cost for the task (plus a terminal cost).
We propose a zeroth-order trajectory-based approach that enables us to design optimal costs for an MPC planning robot in continuous MDPs.
arXiv Detail & Related papers (2021-04-23T00:00:58Z)
- Think out of the package: Recommending package types for e-commerce shipments [2.741530713365541]
Multiple product attributes determine the package type used by e-commerce companies to ship products.
Sub-optimal package types lead to damaged shipments, incurring large damage-related costs.
We propose a multi-stage approach that trades off shipment and damage costs for each product.
arXiv Detail & Related papers (2020-06-05T05:27:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.