Related papers: \textit{StateCensusLaws.org}: A Web Application for Consuming and Annotating Legal Discourse Learning

URL: http://arxiv.org/abs/2104.10263v1
Date: Tue, 20 Apr 2021 22:00:54 GMT
Title: \textit{StateCensusLaws.org}: A Web Application for Consuming and Annotating Legal Discourse Learning
Authors: Alexander Spangher and Jonathan May
Abstract summary: We create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text. We focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government.
Score: 89.77347919191774
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this work, we create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text. Our system is built primarily with journalists and legal interpreters in mind, and we focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government. Our system exposes a corpus we collect of 6,000 state-level laws that pertain to the U.S. census, using 25 scrapers we built to crawl state law websites, which we release. We also build a novel, flexible annotation framework that can handle span-tagging and relation tagging on an arbitrary input text document and be embedded simply into any webpage. This framework allows journalists and researchers to add to our annotation database by correcting and tagging new data.

Related papers

Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs [4.721309965816974]
We propose to make scholarly data more accessible sustainably by leveraging Wikidata's infrastructure. Our study focuses on data from 105 Semantic Web-related conferences and extends/adds more than 6000 entities in Wikidata.
arXiv Detail & Related papers (2024-11-13T15:34:52Z)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language. Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method. We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding. There is no publicly available NLI corpus for the Romanian language. We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts. Data is thus a bottleneck for training effective sign language translation models. This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
Named Entity Recognition and Linking Augmented with Large-Scale Structured Data [3.211619859724085]
We describe our submissions to the 2nd and 3rd SlavNER Shared Tasks held at BSNLP 2019 and BSNLP 2021. The tasks focused on the analysis of Named Entities in multilingual Web documents in Slavic languages with rich inflection. Our solution takes advantage of large collections of both unstructured and structured documents.
arXiv Detail & Related papers (2021-04-27T20:10:18Z)
Documenting the English Colossal Clean Crawled Corpus [28.008953329187648]
This work provides the first documentation for the Colossal Clean Crawled Corpus (C4; Raffel et al., 2020), a dataset created by applying a set of filters to a single snapshot of Common Crawl. We begin with a high-level summary of the data, including distributions of where the text came from and when it was written. We then give more detailed analysis on salient parts of this data, including the most frequent sources of text.
arXiv Detail & Related papers (2021-04-18T07:42:52Z)
Text-guided Legal Knowledge Graph Reasoning [11.089663225933412]
We propose a novel legal application of legal provision prediction (LPP), which aims to predict the related legal provisions of affairs. We collect a large amount of real-world legal provision data from the Guangdong government service website and construct a legal dataset called LegalLPP.
arXiv Detail & Related papers (2021-04-06T04:42:56Z)
Global Attention for Name Tagging [56.62059996864408]
We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information. We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions. Experiments on benchmark datasets show the effectiveness of our approach.
arXiv Detail & Related papers (2020-10-19T07:27:15Z)
AMALGUM -- A Free, Balanced, Multilayer English Web Corpus [14.073494095236027]
We present a genre-balanced English web corpus totaling 4M tokens. By tapping open online data sources, the corpus is meant to offer a more sizable alternative to smaller, manually created annotated data sets.
arXiv Detail & Related papers (2020-06-18T17:05:45Z)
