IGGA: A Dataset of Industrial Guidelines and Policy Statements for Generative AIs
- URL: http://arxiv.org/abs/2501.00959v1
- Date: Wed, 01 Jan 2025 21:31:47 GMT
- Title: IGGA: A Dataset of Industrial Guidelines and Policy Statements for Generative AIs
- Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar,
- Abstract summary: This paper introduces IGGA, a dataset of 160 industry guidelines and policy statements for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in industry and workplace settings. The dataset contains 104,565 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering.
- Score: 8.420666056013685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces IGGA, a dataset of 160 industry guidelines and policy statements for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in industry and workplace settings, collected from official company websites and trustworthy news sources. The dataset contains 104,565 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, IGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of reputable and influential companies that represent a diverse range of global institutions across six continents. The dataset captures perspectives from fourteen industry sectors, including technology, finance, and both public and private institutions, offering a broad spectrum of insights into the integration of GAIs and LLMs in industry.
Related papers
- Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation [52.8352968531863]
Large language models (LLMs) have made significant progress in general-purpose natural language processing tasks.
This paper presents a novel framework that combines knowledge graph (KG) and retrieval-augmented generation (RAG) techniques to enhance LLM performance in the telecom domain.
arXiv Detail & Related papers (2025-03-31T15:58:08Z) - OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models [58.45517851437422]
Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding.
Existing solutions often rely on task-specific architectures and objectives for individual tasks.
In this paper, we introduce OmniParser V2, a universal model that unifies typical VsTP tasks, including text spotting, key information extraction, table recognition, and layout analysis.
arXiv Detail & Related papers (2025-02-22T09:32:01Z) - MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark [20.642661835794975]
We introduce MME-Industry, a novel benchmark designed specifically for evaluating MLLMs in industrial settings.
The benchmark encompasses 21 distinct domains, comprising 1050 question-answer pairs with 50 questions per domain.
We provide both Chinese and English versions of the benchmark, enabling comparative analysis of MLLMs' capabilities across these languages.
arXiv Detail & Related papers (2025-01-28T03:56:17Z) - AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models [8.420666056013685]
This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings.
The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering.
arXiv Detail & Related papers (2025-01-03T19:16:36Z) - Generative AI and LLMs in Industry: A text-mining Analysis and Critical Evaluation of Guidelines and Policy Statements Across Fourteen Industrial Sectors [8.420666056013685]
The rise of Generative AI (GAI) and Large Language Models (LLMs) has transformed industrial landscapes. This study conducts a text-based analysis of 160 guidelines and policy statements across fourteen industrial sectors.
arXiv Detail & Related papers (2025-01-01T21:23:22Z) - Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets.
Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries.
We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z) - Enterprise Benchmarks for Large Language Model Evaluation [10.233863135015797]
This work presents a systematic exploration of benchmarking strategies tailored to large language models (LLMs) evaluation.
The proposed evaluation framework encompasses 25 publicly available datasets from diverse enterprise domains like financial services, legal, cyber security, and climate and sustainability.
The diverse performance of 13 models across different enterprise tasks highlights the importance of selecting the right model based on the specific requirements of each task.
arXiv Detail & Related papers (2024-10-11T18:19:05Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis [20.038120319271773]
This study introduces an innovative methodology to transform ESG reports into structured, analyzable formats.
Our approach offers high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports.
This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment.
arXiv Detail & Related papers (2024-01-04T06:26:59Z) - Universal Segmentation at Arbitrary Granularity with Language Instruction [56.39902660380342]
We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions.
For training UniLSeg, we reorganize a group of tasks from original diverse distributions into a unified data format, in which images paired with texts describing segmentation targets are the input and the corresponding masks are the output.
arXiv Detail & Related papers (2023-12-04T04:47:48Z) - Glitter or Gold? Deriving Structured Insights from Sustainability Reports via Large Language Models [16.231171704561714]
This study uses Information Extraction (IE) methods to extract structured insights related to ESG aspects from companies' sustainability reports.
We then leverage graph-based representations to conduct statistical analyses concerning the extracted insights.
arXiv Detail & Related papers (2023-10-09T11:34:41Z) - Development of the ChatGPT, Generative Artificial Intelligence and Natural Large Language Models for Accountable Reporting and Use (CANGARU) Guidelines [0.33249867230903685]
CANGARU aims to foster a cross-disciplinary global consensus on the ethical use, disclosure, and proper reporting of GAI/GPT/LLM technologies in academia.
The present protocol consists of an ongoing systematic review of GAI/GPT/LLM applications to understand the linked ideas, findings, and reporting standards in scholarly research, and to formulate guidelines for its use and disclosure.
arXiv Detail & Related papers (2023-07-18T05:12:52Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.