Sequence graphs realizations and ambiguity in language models
- URL: http://arxiv.org/abs/2402.08830v1
- Date: Tue, 13 Feb 2024 22:22:51 GMT
- Title: Sequence graphs realizations and ambiguity in language models
- Authors: Sammy Khalife, Yann Ponty, Laurent Bulteau
- Abstract summary: We study the realizability and ambiguity of sequence graphs from a computational point of view.
For a window of size at least 3, we prove hardness of all variants, even when w is considered as a constant.
We conclude with an integer program to solve the realizability problem, and with dynamic programming to solve the enumeration problem.
- Score: 1.4732811715354455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several popular language models represent local contexts in an input text as
bags of words. Such representations are naturally encoded by a sequence graph
whose vertices are the distinct words occurring in x, with edges representing
the (ordered) co-occurrence of two words within a sliding window of size w.
However, this compressed representation is not generally bijective, and may
introduce some degree of ambiguity. Some sequence graphs may admit several
realizations as a sequence, while others may not admit any realization. In this
paper, we study the realizability and ambiguity of sequence graphs from a
combinatorial and computational point of view. We consider the existence and
enumeration of realizations of a sequence graph under multiple settings: window
size w, presence/absence of graph orientation, and presence/absence of weights
(multiplicities). When w = 2, we provide polynomial time algorithms for
realizability and enumeration in all cases except the undirected/weighted
setting, where we show the #P-hardness of enumeration. For a window of size at
least 3, we prove hardness of all variants, even when w is considered as a
constant, with the notable exception of the undirected/unweighted case for
which we propose an XP algorithms for both (realizability and enumeration)
problems, tight due to a corresponding W[1]-hardness result. We conclude with
an integer program formulation to solve the realizability problem, and with
dynamic programming to solve the enumeration problem. This work leaves open the
membership to NP for both problems, a non-trivial question due to the existence
of minimum realizations having exponential size on the instance encoding.
Related papers
- SAT-Based Algorithms for Regular Graph Pattern Matching [40.86962847131912]
We propose a generalization of graph isomorphism that allows one to check complex structural properties.
This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph inspired by regular expressions.
We propose a SAT-based algorithm for checking if a target graph matches a given ReGaP.
arXiv Detail & Related papers (2023-12-15T18:12:44Z) - The Complexity of Envy-Free Graph Cutting [44.58084909019557]
We consider the problem of fairly dividing a set of heterogeneous divisible resources among agents with different preferences.
We focus on the setting where the resources correspond to the edges of a connected graph, and every agent must be assigned a connected piece of this graph.
The problem is NP-complete, and we analyze its complexity with respect to two natural complexity measures: the number of agents and the number of edges in the graph.
arXiv Detail & Related papers (2023-12-12T07:54:30Z) - Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE)
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z) - Compositional Generalization without Trees using Multiset Tagging and
Latent Permutations [121.37328648951993]
We phrase semantic parsing as a two-step process: we first tag each input token with a multiset of output tokens.
Then we arrange the tokens into an output sequence using a new way of parameterizing and predicting permutations.
Our model outperforms pretrained seq2seq models and prior work on realistic semantic parsing tasks.
arXiv Detail & Related papers (2023-05-26T14:09:35Z) - Linear-Time Modeling of Linguistic Structure: An Order-Theoretic
Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that these exhaustive comparisons can be avoided, and, moreover, the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z) - Can Language Models Solve Graph Problems in Natural Language? [51.28850846990929]
Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures.
We propose NLGraph, a benchmark of graph-based problem solving simulating in natural language.
arXiv Detail & Related papers (2023-05-17T08:29:21Z) - Casting graph isomorphism as a point set registration problem using a
simplex embedding and sampling [0.0]
A graph can be represented as a point set in enough dimensions using a simplex embedding and sampling.
The isomorphism of them corresponds to the existence of a perfect registration between the point set forms of the graphs.
The related idea of equivalence classes suggests that graph canonization may be an important tool in tackling graph isomorphism problem.
arXiv Detail & Related papers (2021-11-15T12:16:21Z) - Evolving test instances of the Hamiltonian completion problem [0.7734726150561086]
We propose a new methodology to generate a diverse set of instances by using an evolutionary algorithm.
We analyze the resulting graphs and get key insights into which attributes are most related to algorithm performance.
arXiv Detail & Related papers (2020-10-05T20:04:58Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z) - Can Graph Neural Networks Count Substructures? [53.256112515435355]
We study the power of graph neural networks (GNNs) via their ability to count attributed graph substructures.
We distinguish between two types of substructure counting: inducedsubgraph-count and subgraphcount-count, and both positive and negative answers for popular GNN architectures.
arXiv Detail & Related papers (2020-02-10T18:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.