KDSM: An uplift modeling framework based on knowledge distillation and
sample matching
- URL: http://arxiv.org/abs/2303.02980v1
- Date: Mon, 6 Mar 2023 09:15:28 GMT
- Title: KDSM: An uplift modeling framework based on knowledge distillation and
sample matching
- Authors: Chang Sun, Qianying Li, Guanxiang Wang, Sihao Xu, Yitong Liu
- Abstract summary: Uplift modeling aims to estimate the treatment effect on individuals.
Tree-based methods are adept at fitting the increment and at generalization, while neural-network-based models excel at predicting absolute values with precision.
In this paper, we propose an uplift modeling framework based on Knowledge Distillation and Sample Matching (KDSM).
- Score: 2.036924568983982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Uplift modeling aims to estimate the treatment effect on individuals
and is widely applied on e-commerce platforms to target persuadable customers
and maximize the return of marketing activities. Among existing uplift modeling
methods, tree-based methods are adept at fitting the increment and at
generalization, while neural-network-based models excel at predicting absolute
values with precision; these complementary advantages have not been fully
explored and combined. Moreover, the lack of counterfactual sample pairs is the
root challenge in uplift modeling. In this paper, we propose an uplift modeling
framework based on Knowledge Distillation and Sample Matching (KDSM). The
teacher model is an uplift decision tree (UpliftDT), whose structure is
exploited to construct counterfactual sample pairs, and the pairwise incremental
prediction is treated as an additional objective for the student model. Under
the idea of multitask learning, the student model can achieve better
generalization and even surpass the teacher. Extensive offline experiments
validate the universality of different teacher-student combinations and the
superiority of KDSM measured against the baselines. In online A/B testing, the
cost of each incremental room night is reduced by 6.5%.
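The abstract outlines two mechanisms: leaf-level sample matching on the teacher tree to build approximate counterfactual pairs, and a multitask student objective that adds pairwise incremental prediction to the usual response prediction. The sketch below illustrates that reading; it is not the authors' implementation. A plain regression tree on a transformed outcome stands in for the UpliftDT teacher, the student is a small MLP, and the matching strategy, the task weight `alpha`, and the toy data are all assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy data: features X, binary treatment t, binary outcome y (all synthetic).
n, d = 4000, 10
X = rng.normal(size=(n, d)).astype(np.float32)
t = rng.integers(0, 2, size=n)
base = 1.0 / (1.0 + np.exp(-X[:, 0]))
uplift = 0.2 * (X[:, 1] > 0)                  # ground-truth incremental effect
y = (rng.random(n) < base + t * uplift).astype(np.float32)

# Teacher: a regression tree on a transformed outcome, standing in for UpliftDT.
p_t = t.mean()
z = y * (t / p_t - (1 - t) / (1 - p_t))       # transformed-outcome proxy for uplift
teacher = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200).fit(X, z)
leaf = teacher.apply(X)                       # leaf index of every sample
leaf_uplift = teacher.predict(X).astype(np.float32)

# Sample matching: pair treated and control samples that share a teacher leaf.
pairs = []
for leaf_id in np.unique(leaf):
    idx = np.where(leaf == leaf_id)[0]
    treated, control = idx[t[idx] == 1], idx[t[idx] == 0]
    for i, j in zip(treated, rng.permutation(control)):
        pairs.append((i, j))                  # (treated index, matched control index)
pairs = np.asarray(pairs)

# Student: an MLP trained on the response task plus a pairwise-increment task.
class Student(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x, treat):
        return torch.sigmoid(self.net(torch.cat([x, treat[:, None]], dim=1))).squeeze(1)

Xt = torch.from_numpy(X)
tt = torch.from_numpy(t.astype(np.float32))
yt = torch.from_numpy(y)
inc_target = torch.from_numpy(leaf_uplift)
i, j = torch.from_numpy(pairs[:, 0]), torch.from_numpy(pairs[:, 1])

student = Student(d)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
bce, mse = nn.BCELoss(), nn.MSELoss()
alpha = 0.5                                   # weight of the increment task (assumed)

for _ in range(200):
    opt.zero_grad()
    loss_resp = bce(student(Xt, tt), yt)      # absolute-value (response) task
    # Predicted increment on matched pairs vs. the teacher's leaf-level increment.
    pred_inc = student(Xt[i], torch.ones(len(i))) - student(Xt[j], torch.zeros(len(j)))
    loss_inc = mse(pred_inc, inc_target[i])
    (loss_resp + alpha * loss_inc).backward()
    opt.step()
```

In this sketch the response loss preserves the neural model's strength on absolute values, while the pairwise term distills the tree's leaf-level increment estimates through matched treated/control pairs, mirroring the multitask idea in the abstract.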
Related papers
- Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Enhancing One-Shot Federated Learning Through Data and Ensemble
Co-Boosting [76.64235084279292]
One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round.
We introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively.
arXiv Detail & Related papers (2024-02-23T03:15:10Z) - DMT: Comprehensive Distillation with Multiple Self-supervised Teachers [27.037140667247208]
We introduce Comprehensive Distillation with Multiple Self-supervised Teachers (DMT) for pretrained model compression.
Our experimental results on prominent benchmark datasets exhibit that the proposed method significantly surpasses state-of-the-art competitors.
arXiv Detail & Related papers (2023-12-19T08:31:30Z) - Uplift Modeling based on Graph Neural Network Combined with Causal
Knowledge [9.005051998738134]
We propose a framework based on graph neural networks that combines causal knowledge with an estimate of the uplift value.
Our findings demonstrate that this method works effectively for predicting uplift values, with small errors in typical simulated data.
arXiv Detail & Related papers (2023-11-14T07:21:00Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language
Models [37.88287077119201]
We propose a novel model reuse paradigm, Knowledge Amalgamation (KA), for PLMs.
Without human annotations available, KA aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model.
Experimental results demonstrate that MUKA achieves substantial improvements over baselines on benchmark datasets.
arXiv Detail & Related papers (2021-12-14T12:26:24Z) - Sparse MoEs meet Efficient Ensembles [49.313497379189315]
We study the interplay of two popular classes of such models: ensembles of neural networks and sparse mixtures of experts (sparse MoEs).
We present Efficient Ensemble of Experts (E³), a scalable and simple ensemble of sparse MoEs that takes the best of both classes of models, while using up to 45% fewer FLOPs than a deep ensemble.
arXiv Detail & Related papers (2021-10-07T11:58:35Z) - Ensemble Knowledge Distillation for CTR Prediction [46.92149090885551]
We propose a new model training strategy based on knowledge distillation (KD).
KD is a teacher-student learning framework that transfers knowledge learned by a teacher model to a student model; a generic sketch of this teacher-student loss appears after this list.
We propose several novel techniques to facilitate ensembled CTR prediction, including teacher gating and early stopping by distillation loss.
arXiv Detail & Related papers (2020-11-08T23:37:58Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
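Several of the entries above (e.g. the OKD and ensemble-distillation papers) rest on the same teacher-student transfer: the student is trained on hard labels plus the teacher's softened output distribution. The sketch below shows that generic Hinton-style distillation loss; it is illustrative only, not taken from any paper in this list, and the temperature `T` and mixing weight `lam` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Hard-label cross-entropy blended with softened teacher targets."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1.0 - lam) * hard

# Toy usage: a batch of 8 examples with 5 classes.
s = torch.randn(8, 5, requires_grad=True)
t = torch.randn(8, 5)
y = torch.randint(0, 5, (8,))
distillation_loss(s, t, y).backward()
```

KDSM departs from this recipe by distilling pairwise increments on tree-matched samples rather than softened class probabilities.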