CodeV: Issue Resolving with Visual Data
- URL: http://arxiv.org/abs/2412.17315v1
- Date: Mon, 23 Dec 2024 06:17:11 GMT
- Title: CodeV: Issue Resolving with Visual Data
- Authors: Linhao Zhang, Daoguang Zan, Quanshun Yang, Zhirong Huang, Dong Chen, Bo Shen, Tianyu Liu, Yongshun Gong, Pengjie Huang, Xudong Lu, Guangtai Liang, Lizhen Cui, Qianxiang Wang
- Abstract summary: We propose CodeV, the first approach to leveraging visual data to enhance the issue-resolving capabilities of Large Language Models (LLMs).
CodeV resolves each issue by following a two-phase process: data processing and patch generation.
We demonstrate the effectiveness of CodeV, as well as provide valuable insights into leveraging visual data to resolve GitHub issues.
- Score: 32.05873957588477
- License:
- Abstract: Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues, neglecting visual data. However, this visual data is crucial for resolving issues as it conveys additional knowledge that text alone cannot. We propose CodeV, the first approach to leveraging visual data to enhance the issue-resolving capabilities of LLMs. CodeV resolves each issue by following a two-phase process: data processing and patch generation. To evaluate CodeV, we construct a benchmark for visual issue resolving, namely Visual SWE-bench. Through extensive experiments, we demonstrate the effectiveness of CodeV, as well as provide valuable insights into leveraging visual data to resolve GitHub issues.
Related papers
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images [7.823336661261962]
Large Vision-Language Models (VLMs) tend to neglect image content and over-rely on language-model priors.
We propose S-VCO (Symmetrical Visual Contrastive Optimization), a novel finetuning objective that steers the model toward capturing important visual details.
arXiv Detail & Related papers (2025-02-19T18:05:42Z)
- Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption [65.06388526722186]
Infrared-visible image fusion is a critical task in computer vision.
There is a lack of recent comprehensive surveys that address this rapidly expanding domain.
We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods.
arXiv Detail & Related papers (2025-01-18T13:17:34Z)
- One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering [31.025439143093585]
Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets.
These models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks.
We propose the first data-free method that leverages the language generation capability of a VLM, instead of relying on external models.
arXiv Detail & Related papers (2024-11-04T16:04:59Z)
- Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques [6.783762650831429]
We review the fundamental theories related to visual language models (VLMs) and the datasets constructed for them in remote sensing.
We categorize the improvement methods into three main parts according to the core components of VLMs and provide a detailed introduction and comparison of these methods.
arXiv Detail & Related papers (2024-10-15T13:28:55Z)
- Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS [68.47681139026666]
Video object segmentation (VOS) is a crucial task in computer vision.
Current VOS methods struggle with complex scenes and prolonged object motions.
This report introduces a discriminative spatial-temporal VOS model.
arXiv Detail & Related papers (2024-08-29T10:47:17Z)
- Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines.
It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
- Learning without Forgetting for Vision-Language Models [86.53237963364754]
Class-Incremental Learning (CIL) or continual learning is a desired capability in the real world.
Recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations.
We propose PROjectiOn Fusion (PROOF) that enables VLMs to learn without forgetting.
arXiv Detail & Related papers (2023-05-30T17:59:32Z)
- Visual Named Entity Linking: A New Dataset and A Baseline [61.38231023490981]
We consider a purely Visual-based Named Entity Linking (VNEL) task, where the input only consists of an image.
We propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).
We present a high-quality human-annotated visual person linking dataset, named WIKIPerson.
arXiv Detail & Related papers (2022-11-09T13:27:50Z)
- Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets [5.45761450227064]
We propose a new Few-Shot Visual Question Generation (FS-VQG) task and provide a comprehensive benchmark for it.
We evaluate various existing VQG approaches as well as popular few-shot solutions based on meta-learning and self-supervised strategies for the FS-VQG task.
Several important findings emerge from our experiments, that shed light on the limits of current models in few-shot vision and language generation tasks.
arXiv Detail & Related papers (2022-10-13T15:01:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.