AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
- URL: http://arxiv.org/abs/2407.19410v1
- Date: Sun, 28 Jul 2024 06:23:06 GMT
- Title: AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
- Authors: Mahiro Ukai, Shuhei Kurita, Atsushi Hashimoto, Yoshitaka Ushiku, Nakamasa Inoue
- Abstract summary: We propose AdaCoder, an adaptive prompt compression framework for visual programmatic models (VPMs) in visual question answering.
AdaCoder operates in two phases: a compression phase and an inference phase.
We demonstrate that it reduces token length by 71.1%, while maintaining or even improving the performance of visual question answering.
- Score: 23.169961738978614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual question answering aims to provide responses to natural language questions given visual input. Recently, visual programmatic models (VPMs), which generate executable programs to answer questions through large language models (LLMs), have attracted research interest. However, they often require long input prompts to provide the LLM with sufficient API usage details to generate relevant code. To address this limitation, we propose AdaCoder, an adaptive prompt compression framework for VPMs. AdaCoder operates in two phases: a compression phase and an inference phase. In the compression phase, given a preprompt that describes all API definitions in the Python language with example snippets of code, a set of compressed preprompts is generated, each depending on a specific question type. In the inference phase, given an input question, AdaCoder predicts the question type and chooses the appropriate corresponding compressed preprompt to generate code to answer the question. Notably, AdaCoder employs a single frozen LLM and pre-defined prompts, negating the necessity of additional training and maintaining adaptability across different powerful black-box LLMs such as GPT and Claude. In experiments, we apply AdaCoder to ViperGPT and demonstrate that it reduces token length by 71.1%, while maintaining or even improving the performance of visual question answering.
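The two-phase pipeline described in the abstract can be sketched as follows. This is an illustrative sketch only: the question taxonomy, preprompt contents, and the keyword-based type predictor below are placeholders, not the paper's actual prompts (AdaCoder predicts the question type with the frozen LLM itself).

```python
# Hypothetical output of the compression phase: one compressed preprompt per
# question type, each a short API description tailored to that type.
COMPRESSED_PREPROMPTS = {
    "counting": "API subset: find(), count() ...",
    "attribute": "API subset: find(), query_attribute() ...",
    "spatial": "API subset: find(), compute_relation() ...",
}

def predict_question_type(question: str) -> str:
    """Stand-in for the LLM-based question-type prediction step."""
    q = question.lower()
    if q.startswith("how many"):
        return "counting"
    if any(w in q for w in ("left", "right", "above", "below")):
        return "spatial"
    return "attribute"

def build_prompt(question: str) -> str:
    """Inference phase: pick the compressed preprompt for the predicted
    question type and append the question, instead of sending the full
    preprompt with all API definitions."""
    qtype = predict_question_type(question)
    return COMPRESSED_PREPROMPTS[qtype] + "\n\n# Question: " + question

prompt = build_prompt("How many dogs are in the image?")
```

The key design point is that only the compression phase sees the full API preprompt; at inference time each question pays only for the (much shorter) type-specific preprompt, which is where the reported 71.1% token reduction comes from.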
Related papers
- Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering [12.399738382728653]
Visual question answering (VQA) is the task of providing accurate answers to natural language questions based on visual input.
This paper introduces PyramidCoder, a novel prompting framework for programmatic VQA (PVQA) models.
Compared to the state-of-the-art PVQA model, our approach improves accuracy by at least 0.5% on the GQA dataset, 1.4% on the VQAv2 dataset, and 2.9% on the NLVR2 dataset.
arXiv Detail & Related papers (2024-07-30T05:36:43Z)
- Learning to Compress Prompt in Natural Language Formats [54.06967020905763]
Large language models (LLMs) excel at a wide range of natural language processing tasks.
However, they are constrained by degraded performance on long contexts, slow inference speed, and high computational cost.
This work aims to compress lengthy prompts in the form of natural language with LLM transferability.
arXiv Detail & Related papers (2024-02-28T20:41:21Z)
- Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression [39.233017243612025]
Large language models (LLMs) require lengthy prompts as the input context to produce output aligned with user intentions.
We propose Gist-COCO, a novel method for compressing prompts that can also assist prompt interpretation and engineering.
Gist-COCO employs an encoder-decoder language model and incorporates an additional encoder as a plugin module to compress prompts and their inputs using gist tokens.
arXiv Detail & Related papers (2024-02-25T11:07:08Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- AskIt: Unified Programming Interface for Programming with Large Language Models [0.0]
Large Language Models (LLMs) exhibit a unique phenomenon known as emergent abilities, demonstrating adeptness across numerous tasks.
This paper introduces AskIt, a domain-specific language specifically designed for LLMs.
Across 50 tasks, AskIt generated concise prompts, achieving a 16.14% reduction in prompt length compared to benchmarks.
arXiv Detail & Related papers (2023-08-29T21:44:27Z)
- Large Language Models Should Ask Clarifying Questions to Increase Confidence in Generated Code [0.7252027234425334]
Large language models (LLMs) have significantly improved the ability to perform tasks in the field of code generation.
There is still a gap between LLMs being capable coders and being top-tier software engineers.
I propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code.
arXiv Detail & Related papers (2023-08-25T17:33:05Z)
- LongCoder: A Long-Range Pre-trained Language Model for Code Completion [56.813974784131624]
LongCoder employs a sliding window mechanism for self-attention and introduces two types of globally accessible tokens.
Bridge tokens are inserted throughout the input sequence to aggregate local information and facilitate global interaction, while memory tokens highlight important statements that may be invoked later and need to be memorized.
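The attention pattern such globally accessible tokens induce can be sketched as a boolean mask combining a sliding window with a few global positions. This is a toy illustration under assumed names, not LongCoder's actual implementation:

```python
# Illustrative sparse attention mask: local sliding window plus globally
# accessible tokens (in the spirit of LongCoder's bridge/memory tokens).

def build_mask(seq_len: int, window: int, global_positions: set) -> list:
    """mask[i][j] is True if token i may attend to token j."""
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if abs(i - j) <= window:  # local sliding-window attention
                mask[i][j] = True
            if i in global_positions or j in global_positions:
                mask[i][j] = True  # global tokens attend everywhere and are attended by all
    return mask

# Token 0 plays the role of a globally accessible (bridge/memory-like) token.
mask = build_mask(8, 1, {0})
```

The point of the design is that the mask has O(n·window) local entries plus O(n) per global token, instead of the dense O(n²) of full self-attention, while the global tokens keep distant positions reachable in one hop.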
arXiv Detail & Related papers (2023-06-26T17:59:24Z)
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models [74.95486528482327]
We explore code prompting, a neural symbolic prompting method with both zero-shot and few-shot versions which triggers code as intermediate steps.
We conduct experiments on 7 widely-used benchmarks involving symbolic reasoning and arithmetic reasoning.
arXiv Detail & Related papers (2023-05-29T15:14:09Z)
- AceCoder: Utilizing Existing Code to Enhance Code Generation [45.034292331340524]
Existing prompting techniques are designed for natural language generation and have low accuracy in code generation.
AceCoder contains two novel mechanisms (i.e., guided code generation and example retrieval) to solve these challenges.
Results show that AceCoder can significantly improve the performance of LLMs on code generation.
arXiv Detail & Related papers (2023-03-31T02:57:15Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered by the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.