StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
- URL: http://arxiv.org/abs/2507.13377v1
- Date: Tue, 15 Jul 2025 01:01:29 GMT
- Title: StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
- Authors: Zhenglin Pan, Haoran Xie
- Abstract summary: We propose StructInbet, an inbetweening system designed to generate controllable transitions over explicit structural guidance. We adopt a temporal attention mechanism that incorporates visual identity from both the preceding and succeeding keyframes, ensuring consistency in character appearance.
- Score: 3.528466385159056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose StructInbet, an inbetweening system designed to generate controllable transitions over explicit structural guidance. StructInbet introduces two key contributions. First, we propose explicit structural guidance to the inbetweening problem to reduce the ambiguity inherent in pixel trajectories. Second, we adopt a temporal attention mechanism that incorporates visual identity from both the preceding and succeeding keyframes, ensuring consistency in character appearance.
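The abstract does not include code, but the temporal attention idea it describes can be sketched: the inbetween frame's features attend jointly over both keyframes so appearance cues are pooled from either side. This is a minimal illustrative sketch in plain numpy; all function and variable names are ours, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(query, key_prev, key_next):
    """Attend from the inbetween frame's tokens (query) to tokens of
    the preceding and succeeding keyframes, drawing visual identity
    from both. Shapes: (tokens, dim). Illustrative only."""
    dim = query.shape[-1]
    # Stack both keyframes along the token axis so a single attention
    # pass can weigh appearance cues from either side.
    keys = np.concatenate([key_prev, key_next], axis=0)
    scores = query @ keys.T / np.sqrt(dim)   # (q_tokens, 2 * k_tokens)
    weights = softmax(scores, axis=-1)
    return weights @ keys                    # (q_tokens, dim)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k_prev = rng.normal(size=(4, 8))
k_next = rng.normal(size=(4, 8))
out = temporal_attention(q, k_prev, k_next)
print(out.shape)  # (4, 8)
```

In a real system the keys and values would be learned projections of diffusion or encoder features rather than raw tokens; the point here is only the joint attention over both keyframes.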
Related papers
- ffstruc2vec: Flat, Flexible and Scalable Learning of Node Representations from Structural Identities [0.0]
This paper introduces ffstruc2vec, a scalable deep-learning framework for learning node embedding vectors that preserve structural identities. Its flat, efficient architecture allows high flexibility in capturing diverse types of structural patterns, enabling broad adaptability to various downstream application tasks. The proposed framework significantly outperforms existing approaches across diverse unsupervised and supervised tasks in practical applications.
arXiv Detail & Related papers (2025-04-01T18:47:16Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- Composing or Not Composing? Towards Distributional Construction Grammars [47.636049672406145]
Building the meaning of a linguistic utterance is incremental, step-by-step, based on a compositional process. It is therefore necessary to propose a framework bringing together both approaches. We present an approach based on Construction Grammars and completing this framework in order to account for these different mechanisms.
arXiv Detail & Related papers (2024-12-10T11:17:02Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
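The two-step recipe in this summary, sparsify the neighborhood, then contrast within it, can be sketched in a few lines. This is a rough illustration in numpy with our own simplifications (feature-similarity pruning and a global-mean negative), not the paper's learned sparsification or sampling scheme.

```python
import numpy as np

def sparsify(adj, feats, keep_ratio=0.5):
    """Structural sparsification: for each node, keep only its most
    feature-similar neighbors. A crude stand-in for the learned edge
    pruning the paper describes; keep_ratio is our own knob."""
    sim = feats @ feats.T
    out = np.zeros_like(adj)
    for i in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size == 0:
            continue
        k = max(1, int(round(keep_ratio * nbrs.size)))
        keep = nbrs[np.argsort(sim[i, nbrs])[-k:]]  # top-k similar neighbors
        out[i, keep] = 1
    return out

def contrast_loss(feats, adj):
    """Structural self-contrasting, minimal form: each node embedding
    should agree with the mean of its sparsified neighbors and disagree
    with the global mean (a simplified negative)."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    pos = (adj @ feats) / deg                 # mean neighbor embedding
    neg = feats.mean(axis=0, keepdims=True)   # crude negative sample
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos_term = np.log(sig((feats * pos).sum(axis=1)))
    neg_term = np.log(sig(-(feats * neg).sum(axis=1)))
    return -(pos_term + neg_term).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
adj = (rng.random((6, 6)) < 0.5).astype(float)
np.fill_diagonal(adj, 0)
loss = contrast_loss(feats, sparsify(adj, feats))
print(float(loss))
```

In the paper the embeddings come from an MLP trained with this kind of objective, so structure is only a training signal and message passing is avoided at inference.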
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z)
- Learning Correlation Structures for Vision Transformers [93.22434535223587]
We introduce a new attention mechanism, dubbed structural self-attention (StructSA).
We generate attention maps by recognizing space-time structures of key-query correlations via convolution.
This effectively leverages rich structural patterns in images and videos such as scene layouts, object motion, and inter-object relations.
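The idea of generating attention maps by convolving key-query correlations can be sketched simply: instead of using raw correlation scores as attention logits, each query's correlation map is passed through a small convolution so that local correlation patterns, rather than single peaks, drive the weights. Below is a 1D numpy simplification of the space-time idea; the kernel and all names are ours, not StructSA's actual parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def structural_attention(q, k, v, kernel):
    """Structure-aware attention sketch: convolve each query's
    correlation map with a small kernel before the softmax, so
    neighboring correlations jointly shape the attention weights."""
    corr = q @ k.T / np.sqrt(q.shape[-1])        # (n_q, n_k) correlation maps
    logits = np.stack([np.convolve(row, kernel, mode="same") for row in corr])
    return softmax(logits) @ v                   # (n_q, dim)

rng = np.random.default_rng(1)
q = rng.normal(size=(5, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
kernel = np.array([0.25, 0.5, 0.25])  # smooths correlations over neighbors
out = structural_attention(q, k, v, kernel)
print(out.shape)  # (5, 8)
```

In images and videos the convolution would be 2D or 3D over the spatial(-temporal) correlation volume, which is what lets the mechanism pick up scene layouts and motion patterns.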
arXiv Detail & Related papers (2024-04-05T07:13:28Z)
- ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation [82.88378582161717]
State-of-the-art vision-language models (VLMs) still have limited performance in structural knowledge extraction.
We present ViStruct, a training framework to learn VLMs for effective visual structural knowledge extraction.
arXiv Detail & Related papers (2023-11-22T09:23:34Z)
- StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure [5.2869308707704255]
StrAE is a Structured Autoencoder framework that, through strict adherence to explicit structure, enables effective learning of multi-level representations.
We show that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models.
We then extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm.
arXiv Detail & Related papers (2023-05-09T16:20:48Z)
- Learning to Incorporate Structure Knowledge for Image Inpainting [20.93448933499842]
This paper develops a multi-task learning framework that attempts to incorporate the image structure knowledge to assist image inpainting.
The primary idea is to train a shared generator to simultaneously complete the corrupted image and corresponding structures.
In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process.
arXiv Detail & Related papers (2020-02-11T02:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.