Related papers: Sequence graphs realizations and ambiguity in language models

Sequence graphs realizations and ambiguity in language models

URL: http://arxiv.org/abs/2402.08830v1
Date: Tue, 13 Feb 2024 22:22:51 GMT
Title: Sequence graphs realizations and ambiguity in language models
Authors: Sammy Khalife, Yann Ponty, Laurent Bulteau
Abstract summary: We study the realizability and ambiguity of sequence graphs from a computational point of view. For a window of size at least 3, we prove hardness of all variants, even when w is considered as a constant. We conclude with an integer program to solve the realizability problem, and with dynamic programming to solve the enumeration problem.
Score: 1.4732811715354455
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Several popular language models represent local contexts in an input text as bags of words. Such representations are naturally encoded by a sequence graph whose vertices are the distinct words occurring in x, with edges representing the (ordered) co-occurrence of two words within a sliding window of size w. However, this compressed representation is not generally bijective, and may introduce some degree of ambiguity. Some sequence graphs may admit several realizations as a sequence, while others may not admit any realization. In this paper, we study the realizability and ambiguity of sequence graphs from a combinatorial and computational point of view. We consider the existence and enumeration of realizations of a sequence graph under multiple settings: window size w, presence/absence of graph orientation, and presence/absence of weights (multiplicities). When w = 2, we provide polynomial time algorithms for realizability and enumeration in all cases except the undirected/weighted setting, where we show the #P-hardness of enumeration. For a window of size at least 3, we prove hardness of all variants, even when w is considered as a constant, with the notable exception of the undirected/unweighted case for which we propose an XP algorithms for both (realizability and enumeration) problems, tight due to a corresponding W[1]-hardness result. We conclude with an integer program formulation to solve the realizability problem, and with dynamic programming to solve the enumeration problem. This work leaves open the membership to NP for both problems, a non-trivial question due to the existence of minimum realizations having exponential size on the instance encoding.

Related papers

SAT-Based Algorithms for Regular Graph Pattern Matching [40.86962847131912]
We propose a generalization of graph isomorphism that allows one to check complex structural properties. This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph inspired by regular expressions. We propose a SAT-based algorithm for checking if a target graph matches a given ReGaP.
arXiv Detail & Related papers (2023-12-15T18:12:44Z)
The Complexity of Envy-Free Graph Cutting [44.58084909019557]
We consider the problem of fairly dividing a set of heterogeneous divisible resources among agents with different preferences. We focus on the setting where the resources correspond to the edges of a connected graph, and every agent must be assigned a connected piece of this graph. The problem is NP-complete, and we analyze its complexity with respect to two natural complexity measures: the number of agents and the number of edges in the graph.
arXiv Detail & Related papers (2023-12-12T07:54:30Z)
Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE) We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
Compositional Generalization without Trees using Multiset Tagging and Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens. Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations. Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z)
Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language. We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string. Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z)
Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures. We propose NLGraph, a benchmark of graph-based problem solving simulating in natural language.
arXiv Detail & Related papers (2023-05-17T08:29:21Z)
Casting graph isomorphism as a point set registration problem using a simplex embedding and sampling [0.0]
A graph can be represented as a point set in enough dimensions using a simplex embedding and sampling. The isomorphism of them corresponds to the existence of a perfect registration between the point set forms of the graphs. The related idea of equivalence classes suggests that graph canonization may be an important tool in tackling graph isomorphism problem.
arXiv Detail & Related papers (2021-11-15T12:16:21Z)
Evolving test instances of the Hamiltonian completion problem [0.7734726150561086]
We propose a new methodology to generate a diverse set of instances by using an evolutionary algorithm. We analyze the resulting graphs and get key insights into which attributes are most related to algorithm performance.
arXiv Detail & Related papers (2020-10-05T20:04:58Z)
Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery. We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability. We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z)
Can Graph Neural Networks Count Substructures? [53.256112515435355]
We study the power of graph neural networks (GNNs) via their ability to count attributed graph substructures. We distinguish between two types of substructure counting: inducedsubgraph-count and subgraphcount-count, and both positive and negative answers for popular GNN architectures.
arXiv Detail & Related papers (2020-02-10T18:53:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.