IGGA: A Dataset of Industrial Guidelines and Policy Statements for Generative AIs
- URL: http://arxiv.org/abs/2501.00959v2
- Date: Fri, 03 Jan 2025 19:17:56 GMT
- Title: IGGA: A Dataset of Industrial Guidelines and Policy Statements for Generative AIs
- Authors: Junfeng Jiao, Saleh Afroogh, Kevin Chen, David Atkinson, Amit Dhurandhar,
- Abstract summary: This paper introduces IGGA, a dataset of 160 industry guidelines and policy statements for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in industry and workplace settings.
The dataset contains 104,565 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering.
- Abstract: This paper introduces IGGA, a dataset of 160 industry guidelines and policy statements for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in industry and workplace settings, collected from official company websites and trustworthy news sources. The dataset contains 104,565 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering, such as model synthesis, abstraction identification, and document structure assessment. Additionally, IGGA can be further annotated to function as a benchmark for various tasks, including ambiguity detection, requirements categorization, and the identification of equivalent requirements. Our methodologically rigorous approach ensured a thorough examination, with a selection of reputable and influential companies that represent a diverse range of global institutions across six continents. The dataset captures perspectives from fourteen industry sectors, including technology, finance, and both public and private institutions, offering a broad spectrum of insights into the integration of GAIs and LLMs in industry.
Related papers
- MME-Industry: A Cross-Industry Multimodal Evaluation Benchmark [20.642661835794975]
We introduce MME-Industry, a novel benchmark designed specifically for evaluating MLLMs in industrial settings.
The benchmark encompasses 21 distinct domains, comprising 1,050 question-answer pairs with 50 questions per domain.
We provide both Chinese and English versions of the benchmark, enabling comparative analysis of MLLMs' capabilities across these languages.
arXiv Detail & Related papers (2025-01-28T03:56:17Z) - AGGA: A Dataset of Academic Guidelines for Generative AI and Large Language Models [8.420666056013685]
This study introduces AGGA, a dataset comprising 80 academic guidelines for the use of Generative AIs (GAIs) and Large Language Models (LLMs) in academic settings.
The dataset contains 188,674 words and serves as a valuable resource for natural language processing tasks commonly applied in requirements engineering.
arXiv Detail & Related papers (2025-01-03T19:16:36Z) - Generative AI and LLMs in Industry: A text-mining Analysis and Critical Evaluation of Guidelines and Policy Statements Across Fourteen Industrial Sectors [8.420666056013685]
The rise of Generative AI (GAI) and Large Language Models (LLMs) has transformed industrial landscapes.
This study conducts a text-based analysis of 160 guidelines and policy statements across fourteen industrial sectors.
arXiv Detail & Related papers (2025-01-01T21:23:22Z) - Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets.
Our manual analysis covers nearly 4,000 public datasets published between 1990 and 2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries.
We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z) - Enterprise Benchmarks for Large Language Model Evaluation [10.233863135015797]
This work presents a systematic exploration of benchmarking strategies tailored to large language models (LLMs) evaluation.
The proposed evaluation framework encompasses 25 publicly available datasets from diverse enterprise domains like financial services, legal, cyber security, and climate and sustainability.
The diverse performance of 13 models across different enterprise tasks highlights the importance of selecting the right model based on the specific requirements of each task.
arXiv Detail & Related papers (2024-10-11T18:19:05Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets.
To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature.
Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z) - LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z) - Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced Analysis [20.038120319271773]
This study introduces an innovative methodology to transform ESG reports into structured, analyzable formats.
Our approach offers high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports.
This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment.
arXiv Detail & Related papers (2024-01-04T06:26:59Z) - Universal Segmentation at Arbitrary Granularity with Language Instruction [56.39902660380342]
We present UniLSeg, a universal segmentation model that can perform segmentation at any semantic level with the guidance of language instructions.
For training UniLSeg, we reorganize a group of tasks from original diverse distributions into a unified data format, where images with texts describing segmentation targets as input and corresponding masks are output.
arXiv Detail & Related papers (2023-12-04T04:47:48Z) - Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.