Related papers: Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach

Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach

URL: http://arxiv.org/abs/2003.10715v2
Date: Fri, 27 Aug 2021 06:38:05 GMT
Title: Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach
Authors: David Schindler, Benjamin Zapilko, Frank Kr\"uger
Abstract summary: We present SoftwareKG, a knowledge graph that contains information about software mentions from more than 51,000 scientific articles from the social sciences. A neural network was used to train an LSTM based neural network to identify software mentions in scientific articles. We show how SoftwareKG can be used to assess the role of software in the social sciences.
Score: 0.483420384410068
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge about the software used in scientific investigations is necessary for different reasons, including provenance of the results, measuring software impact to attribute developers, and bibliometric software citation analysis in general. Additionally, providing information about whether and how the software and the source code are available allows an assessment about the state and role of open source software in science in general. While such analyses can be done manually, large scale analyses require the application of automated methods of information extraction and linking. In this paper, we present SoftwareKG - a knowledge graph that contains information about software mentions from more than 51,000 scientific articles from the social sciences. A silver standard corpus, created by a distant and weak supervision approach, and a gold standard corpus, created by manual annotation, were used to train an LSTM based neural network to identify software mentions in scientific articles. The model achieves a recognition rate of .82 F-score in exact matches. As a result, we identified more than 133,000 software mentions. For entity disambiguation, we used the public domain knowledge base DBpedia. Furthermore, we linked the entities of the knowledge graph to other knowledge bases such as the Microsoft Academic Knowledge Graph, the Software Ontology, and Wikidata. Finally, we illustrate, how SoftwareKG can be used to assess the role of software in the social sciences.

Related papers

The Software Observatory: aggregating and analysing software metadata for trend computation and FAIR assessment [0.0]
The Software Observatory at OpenEBench is a novel web portal that consolidates software metadata from various sources.<n>Our platform enables users to analyse trends, identify patterns and advancements within the Life Sciences research software ecosystem.
arXiv Detail & Related papers (2025-10-07T09:15:02Z)
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery [61.02102713094486]
Good interpretation is important in scientific reasoning, as it allows for better decision-making. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks. We propose DiSciPLE an evolutionary algorithm that leverages common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data.
arXiv Detail & Related papers (2025-02-14T10:26:14Z)
A Systematic Literature Review on the Use of Machine Learning in Software Engineering [0.0]
The study was carried out following the objective and the research questions to explore the current state of the art in applying machine learning techniques in software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation.
arXiv Detail & Related papers (2024-06-19T23:04:27Z)
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
SciCat: A Curated Dataset of Scientific Software Repositories [4.77982299447395]
We introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS) projects. Our approach involves selecting projects from a pool of 131 million deforked repositories from the World of Code data source. Our classification focuses on software designed for scientific purposes, research-related projects, and research support software.
arXiv Detail & Related papers (2023-12-11T13:46:33Z)
Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications. Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Evaluation of software impact designed for biomedical research: Are we measuring what's meaningful? [17.645303073710732]
Analysis of usage and impact metrics can help developers determine user and community engagement. There are challenges associated with these analyses including distorted or misleading metrics. Some tools may be especially beneficial to a small audience, yet may not have compelling typical usage metrics.
arXiv Detail & Related papers (2023-06-05T21:15:05Z)
The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
SoMeSci- A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles [1.335443972283229]
SoMeSci is a knowledge graph of software mentions in scientific articles. It contains high quality annotations (IRR: $kappa=.82$) of 3756 software mentions in 1367 PubMed Central articles.
arXiv Detail & Related papers (2021-08-20T08:53:03Z)
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers. We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis [33.190021245507445]
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process. We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments. We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplieds and outperforms a suite of baselines.
arXiv Detail & Related papers (2020-08-28T19:57:49Z)
Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.