Maximum Spanning Trees Are Invariant to Temperature Scaling in
Graph-based Dependency Parsing
- URL: http://arxiv.org/abs/2106.08159v1
- Date: Tue, 15 Jun 2021 13:57:24 GMT
- Title: Maximum Spanning Trees Are Invariant to Temperature Scaling in
Graph-based Dependency Parsing
- Authors: Stefan Grünewald
- Abstract summary: Modern graph-based syntactic dependency parsers operate by predicting, for each token within a sentence, a probability distribution over its possible syntactic heads.
We prove that temperature scaling, a popular technique for post-hoc calibration of neural networks, cannot change the output of the parsing procedure.
We conclude that other techniques are needed to tackle miscalibration in graph-based dependency parsers in a way that improves accuracy.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern graph-based syntactic dependency parsers operate by predicting, for
each token within a sentence, a probability distribution over its possible
syntactic heads (i.e., all other tokens) and then extracting a maximum spanning
tree from the resulting log-probabilities. Nowadays, virtually all such parsers
utilize deep neural networks and may thus be susceptible to miscalibration (in
particular, overconfident predictions). In this paper, we prove that
temperature scaling, a popular technique for post-hoc calibration of neural
networks, cannot change the output of the aforementioned procedure. We conclude
that other techniques are needed to tackle miscalibration in graph-based
dependency parsers in a way that improves parsing accuracy.
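The invariance result can be checked empirically on a toy example: temperature scaling divides each token's head logits by a constant T > 0 before the softmax, yet the highest-scoring tree is unchanged. The sketch below is illustrative only (the logit values are arbitrary, and the single-root constraint of full dependency parsing is omitted for brevity); it brute-forces all head assignments for a tiny sentence instead of running Chu-Liu/Edmonds.

```python
# Toy check: scaling head logits by any temperature T > 0 does not change
# which spanning tree maximizes the summed log-probabilities.
import itertools
import math

def log_softmax(scores, T):
    """Log-probabilities of temperature-scaled scores."""
    z = [s / T for s in scores]
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def is_tree(heads):
    """heads[i] is the head of token i+1 (0 = artificial root); valid iff
    every token reaches the root without entering a cycle."""
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(logits, T):
    """Brute-force maximum spanning tree over log-probabilities."""
    n = len(logits)
    best, best_score = None, -math.inf
    for heads in itertools.product(range(n + 1), repeat=n):
        if not is_tree(list(heads)):
            continue
        score = sum(log_softmax(logits[i], T)[heads[i]] for i in range(n))
        if score > best_score:
            best, best_score = heads, score
    return best

# Arbitrary logits: logits[i][j] scores head j for token i+1 (j = 0 is root).
logits = [[1.0, -0.5, 2.0, 0.3],
          [0.2, 3.0, -1.0, 0.8],
          [-0.7, 1.5, 0.4, 0.1]]
assert best_tree(logits, 1.0) == best_tree(logits, 2.5) == best_tree(logits, 0.3)
```

The reason is visible in `log_softmax`: for each token, the log-partition term is a per-token constant, and every spanning tree selects exactly one head per token, so all tree scores shift by the same constant and are scaled by the same 1/T, preserving their ordering.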
Related papers
- Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores.
We flexibly estimate the joint cumulative distribution function (CDF) of the scores.
Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical
Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z) - Neural-Symbolic Inference for Robust Autoregressive Graph Parsing via
Compositional Uncertainty Quantification [28.084115398817016]
We study a compositionality-aware approach to neural-symbolic inference informed by model confidence.
We empirically investigate the approach on the English Resource Grammar (ERG) parsing problem, using a diverse suite of standard in-domain and seven OOD datasets.
Our approach leads to 35.26% and 35.60% error reduction in aggregated Smatch score over neural and symbolic approaches respectively, and a 14% absolute accuracy gain in key tail linguistic categories over the neural model.
arXiv Detail & Related papers (2023-01-26T23:11:03Z) - Online Centralized Non-parametric Change-point Detection via Graph-based
Likelihood-ratio Estimation [77.81487285123147]
Consider each node of a graph as generating a data stream that is synchronized and observed in near real-time.
At a change-point $\tau$, a change occurs at a subset of nodes $C$, which affects the probability distribution of their associated node streams.
We propose a novel kernel-based method to both detect $\tau$ and localize $C$, based on the direct estimation of the likelihood-ratio between the post-change and the pre-change distributions of the node streams.
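The detect-and-localize setup can be illustrated with a minimal sketch. This is not the paper's kernel-based likelihood-ratio estimator: it substitutes a simple standardized mean-difference statistic between pre- and post-window samples, and the stream sizes, shift magnitude, and threshold are all arbitrary choices for the illustration.

```python
# Illustrative sketch of centralized change-point detection on node streams:
# score each candidate time t by comparing pre-t and post-t samples of every
# node stream, detect tau as the maximizing t, and localize the changed
# subset C as the nodes whose per-stream statistic exceeds a threshold.
import random
import statistics

random.seed(0)
T, tau, changed = 200, 120, {0, 2}   # 4 synchronized streams, shift at t=120
streams = [[random.gauss(1.0 if (i in changed and t >= tau) else 0.0, 1.0)
            for t in range(T)] for i in range(4)]

def shift_stat(xs, t):
    """Standardized mean difference between post-t and pre-t samples
    (a crude stand-in for a likelihood-ratio statistic)."""
    pre, post = xs[:t], xs[t:]
    pooled = statistics.pstdev(xs) or 1.0
    return abs(statistics.fmean(post) - statistics.fmean(pre)) / pooled

candidates = range(20, T - 20)       # keep both windows non-trivial
t_hat = max(candidates, key=lambda t: sum(shift_stat(s, t) for s in streams))
C_hat = {i for i, s in enumerate(streams) if shift_stat(s, t_hat) > 0.5}
```

With a 1-standard-deviation mean shift in two of the four streams, the summed statistic peaks near the true change-point and the per-stream threshold recovers the affected nodes.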
arXiv Detail & Related papers (2023-01-08T10:15:24Z) - Distributional Gradient Boosting Machines [77.34726150561087]
Our framework is based on XGBoost and LightGBM.
We show that our framework achieves state-of-the-art forecast accuracy.
arXiv Detail & Related papers (2022-04-02T06:32:19Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z) - Scaling Up Graph Neural Networks Via Graph Coarsening [18.176326897605225]
Scalability of graph neural networks (GNNs) is one of the major challenges in machine learning.
In this paper, we propose to use graph coarsening for scalable training of GNNs.
We show that, by simply applying off-the-shelf coarsening methods, we can reduce the number of nodes by up to a factor of ten without causing a noticeable downgrade in classification accuracy.
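The core contraction step of graph coarsening can be sketched as follows. This is not the paper's specific coarsening algorithm: the partition of nodes into clusters is assumed to be given (here hand-written for a toy path graph), and the sketch only shows how edges are merged once a partition exists.

```python
# Minimal sketch of the graph-contraction step in coarsening: map each node
# to a cluster, drop intra-cluster edges, and sum the weights of edges that
# cross clusters to obtain a smaller weighted graph.
from collections import defaultdict

def coarsen(edges, partition):
    """edges: iterable of (u, v, w); partition: node -> cluster id.
    Returns the coarse graph as {(cu, cv): total_weight} with cu < cv."""
    coarse = defaultdict(float)
    for u, v, w in edges:
        cu, cv = partition[u], partition[v]
        if cu != cv:                       # intra-cluster edges disappear
            coarse[(min(cu, cv), max(cu, cv))] += w
    return dict(coarse)

# 6-node path graph coarsened into 3 clusters of 2 nodes each.
edges = [(i, i + 1, 1.0) for i in range(5)]
partition = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
coarse = coarsen(edges, partition)
# coarse == {(0, 1): 1.0, (1, 2): 1.0}
```

A GNN is then trained on the coarse graph, which is how the node count can drop by an order of magnitude while the connectivity structure relevant to classification is largely preserved.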
arXiv Detail & Related papers (2021-06-09T15:46:17Z) - Multivariate Boosted Trees and Applications to Forecasting and Control [0.0]
Gradient boosted trees are non-parametric regressors that exploit sequential model fitting and gradient descent to minimize a specific loss function.
In this paper, we present a computationally efficient algorithm for fitting multivariate boosted trees.
arXiv Detail & Related papers (2020-03-08T19:26:59Z) - Residual Correlation in Graph Neural Network Regression [39.54530450932135]
We show that the conditional independence assumption severely limits predictive power.
We address this problem with an interpretable and efficient framework.
Our framework achieves substantially higher accuracy than competing baselines.
arXiv Detail & Related papers (2020-02-19T16:32:54Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework for such non-trivial ERGs that results in dyadic independence (i.e., edge-independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.