Related papers: Structural Entropy Guided Probabilistic Coding

URL: http://arxiv.org/abs/2412.08841v2
Date: Fri, 13 Dec 2024 12:23:58 GMT
Title: Structural Entropy Guided Probabilistic Coding
Authors: Xiang Huang, Hao Peng, Li Sun, Hui Lin, Chunyang Liu, Jiang Cao, Philip S. Yu
Abstract summary: We propose a novel structural entropy-guided probabilistic coding model, named SEPC. We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
Score: 52.01765333755793
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Probabilistic embeddings have several advantages over deterministic embeddings as they map each data point to a distribution, which better describes the uncertainty and complexity of data. Many works focus on adjusting the distribution constraint under the Information Bottleneck (IB) principle to enhance representation learning. However, these proposed regularization terms only consider the constraint of each latent variable, omitting the structural information between latent variables. In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. Specifically, we incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Besides, as traditional structural information theory is not well-suited for regression tasks, we propose a probabilistic encoding tree, transferring regression tasks to classification tasks while diminishing the influence of the transformation. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise. The codes and datasets are available at https://github.com/SELGroup/SEPC.
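As a rough illustration of the probabilistic-embedding idea the abstract refers to, the sketch below maps a data point to a diagonal Gaussian, draws a sample from it, and computes a KL term of the kind often used as a distribution constraint in IB-style objectives. It is not the SEPC implementation and does not include the structural entropy regularizer; the function names, the standard normal prior, and the example values are assumptions made for illustration only.

```python
import math
import random


def sample_embedding(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one data point: a mean and a log-variance
# per latent dimension (in practice these would come from a neural encoder).
rng = random.Random(0)
mu = [0.5, -1.0]
log_var = [0.0, math.log(0.25)]

z = sample_embedding(mu, log_var, rng)    # a stochastic embedding of the point
kl = kl_to_standard_normal(mu, log_var)   # per-point distribution constraint
```

The KL term is zero only when the predicted distribution equals the prior, so it penalizes each latent variable independently. That per-variable view is the limitation the abstract points to when it motivates adding structural information between latent variables.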

Related papers

Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
Dynamic Logistic Ensembles with Recursive Probability and Automatic Subset Splitting for Enhanced Binary Classification [2.7396014165932923]
This paper presents a novel approach to binary classification using dynamic logistic ensemble models. We develop an algorithm that automatically partitions the dataset into multiple subsets, constructing an ensemble of logistic models to enhance classification accuracy. This work balances computational efficiency with theoretical rigor, providing a robust and interpretable solution for complex classification tasks.
arXiv Detail & Related papers (2024-11-27T00:22:55Z)
DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification [14.96980804513399]
Graph Neural Networks (GNNs) are susceptible to distribution shifts, creating vulnerability and security issues in critical domains. Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process. We introduce a more realistic graph data generation model using Structural Causal Models (SCMs). We propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings.
arXiv Detail & Related papers (2024-10-27T00:22:18Z)
Structured Probabilistic Coding [28.46046583495838]
This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC). SPC is an encoder-only probabilistic coding technology with a structured regularization from the target space. It can enhance the generalization ability of pre-trained language models for better language understanding.
arXiv Detail & Related papers (2023-12-21T15:28:02Z)
Variable Importance in High-Dimensional Settings Requires Grouping [19.095605415846187]
Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases. Grouping variables statistically via clustering or some prior knowledge gains some power back. We show that the approach extended with stacking controls the type-I error even with highly-correlated groups.
arXiv Detail & Related papers (2023-12-18T00:21:47Z)
Probabilistic Forecasting with Coherent Aggregation [42.215158938066054]
We augment an MQForecaster neural network architecture with a novel deep Gaussian factor forecasting model that achieves coherence by construction. In a comparison to state-of-the-art coherent forecasting methods, DeepCoFactor achieves significant improvements in scaled CRPS forecast accuracy, with average gains of 15%.
arXiv Detail & Related papers (2023-07-19T07:31:37Z)
iSCAN: Identifying Causal Mechanism Shifts among Nonlinear Additive Noise Models [48.33685559041322]
This paper focuses on identifying the causal mechanism shifts in two or more related datasets over the same set of variables. Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/iSCAN.
arXiv Detail & Related papers (2023-06-30T01:48:11Z)
SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference. We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
Structural Learning of Probabilistic Sentential Decision Diagrams under Partial Closed-World Assumption [127.439030701253]
Probabilistic sentential decision diagrams are a class of structured-decomposable circuits. We propose a new scheme based on a partial closed-world assumption: data implicitly provide the logical base of the circuit. Preliminary experiments show that the proposed approach might properly fit training data, and generalize well to test data, provided that these remain consistent with the underlying logical base.
arXiv Detail & Related papers (2021-07-26T12:01:56Z)
Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information [39.87273353895564]
We propose learning discrete structured representations from unlabeled data by maximizing the mutual information between a structured latent variable and a target variable. Our key technical contribution is an adversarial objective that can be used to tractably estimate mutual information assuming only the feasibility of cross entropy calculation. We apply our model on document hashing and show that it outperforms current best baselines based on discrete and vector quantized variational autoencoders.
arXiv Detail & Related papers (2020-04-08T13:31:53Z)
Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks. To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor. For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.