A Transfer Learning Approach for Dialogue Act Classification of GitHub Issue Comments
- URL: http://arxiv.org/abs/2011.04867v1
- Date: Tue, 10 Nov 2020 02:56:18 GMT
- Authors: Ayesha Enayet and Gita Sukthankar
- Abstract summary: This paper presents a transfer learning approach for performing dialogue act classification on issue comments on GitHub.
Since no large labeled corpus of GitHub issue comments exists, employing transfer learning enables us to leverage standard dialogue act datasets.
Being able to map the issue comments to dialogue acts is a useful stepping stone towards understanding cognitive team processes.
- Score: 1.370633147306388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social coding platforms, such as GitHub, serve as laboratories for studying
collaborative problem solving in open source software development; a key
feature is their ability to support issue reporting, which is used by teams to
discuss tasks and ideas. Analyzing the dialogue between team members, as
expressed in issue comments, can yield important insights about the performance
of virtual teams. This paper presents a transfer learning approach for
performing dialogue act classification on issue comments. Since no large
labeled corpus of GitHub issue comments exists, employing transfer learning
enables us to leverage standard dialogue act datasets in combination with our
own GitHub comment dataset. We compare the performance of several word- and
sentence-level encoding models including Global Vectors for Word
Representations (GloVe), Universal Sentence Encoder (USE), and Bidirectional
Encoder Representations from Transformers (BERT). Being able to map the issue
comments to dialogue acts is a useful stepping stone towards understanding
cognitive team processes.
Related papers
- Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines.
It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z)
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.
We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment.
Our framework is language and platform agnostic and uses self-contained Docker environments to provide safe and reproducible execution.
We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
- SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues.
We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues.
We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- Looking for related discussions on GitHub Discussions [18.688096673390586]
GitHub Discussions is a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform.
As GitHub Discussions resembles PCQA forums, it faces challenges similar to those faced by such environments.
While duplicate posts have the same content - and may be exact copies - near-duplicates share similar topics and information.
We propose an approach based on a Sentence-BERT pre-trained model: the RD-Detector.
arXiv Detail & Related papers (2022-06-23T20:41:33Z)
- CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities [92.79451009324268]
We present CoAuthor, a dataset designed for revealing GPT-3's capabilities in assisting creative and argumentative writing.
We demonstrate that CoAuthor can address questions about GPT-3's language, ideation, and collaboration capabilities.
We discuss how this work may facilitate a more principled discussion around LMs' promises and pitfalls in relation to interaction design.
arXiv Detail & Related papers (2022-01-18T07:51:57Z)
- Predicting Issue Types on GitHub [8.791809365994682]
Ticket Tagger is a GitHub app analyzing the issue title and description through machine learning techniques.
We empirically evaluated the tool's prediction performance on about 30,000 GitHub issues.
arXiv Detail & Related papers (2021-07-21T08:14:48Z)
- Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension [49.92173751203827]
In multi-turn dialog, utterances do not always take the full form of sentences.
We propose to improve the response generation performance by examining the model's ability to answer a reading comprehension question.
arXiv Detail & Related papers (2020-12-14T10:58:01Z)
- Multi-turn Response Selection using Dialogue Dependency Relations [39.99448321736736]
Multi-turn response selection is a task designed for developing dialogue agents.
We propose a dialogue extraction algorithm to transform a dialogue history into threads based on their dependency relations.
Our model outperforms the state-of-the-art baselines on both DSTC7 and DSTC8*, with competitive results on Ubuntu.
arXiv Detail & Related papers (2020-10-04T08:00:19Z)