StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning
- URL: http://arxiv.org/abs/2104.10263v1
- Date: Tue, 20 Apr 2021 22:00:54 GMT
- Title: StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning
- Authors: Alexander Spangher and Jonathan May
- Abstract summary: We create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text.
We focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government.
- Score: 89.77347919191774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we create a web application to highlight the output of NLP
models trained to parse and label discourse segments in law text. Our system is
built primarily with journalists and legal interpreters in mind, and we focus
on state-level law that uses U.S. Census population numbers to allocate
resources and organize government.
Our system exposes a corpus of 6,000 state-level laws that pertain
to the U.S. census, collected using 25 scrapers we built to crawl state law websites,
which we release. We also build a novel, flexible annotation framework that can
handle span tagging and relation tagging on an arbitrary input text document
and can be simply embedded into any webpage. This framework allows journalists and
researchers to add to our annotation database by correcting and tagging new
data.
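The abstract does not say how the annotation framework stores its tags. As a rough sketch only, assuming a standoff representation (character offsets into the unmodified text, with relations pointing at span IDs), span tags and relation tags on an arbitrary document could be modeled as below. The class names, the labels `CONDITION`, `ACTION` and `TRIGGERS`, and the sample sentence are all hypothetical illustrations, not the StateCensusLaws.org implementation.

```python
# Hypothetical standoff data model for span and relation tags.
# Not the StateCensusLaws.org code; names and labels are illustrative only.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A labeled character range [start, end) within the document text."""
    span_id: str
    start: int
    end: int
    label: str


@dataclass(frozen=True)
class Relation:
    """A directed, labeled link from one span (head) to another (tail)."""
    head: str  # span_id of the source span
    tail: str  # span_id of the target span
    label: str


@dataclass
class AnnotatedDocument:
    text: str
    spans: dict[str, Span] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_span(self, span_id: str, start: int, end: int, label: str) -> None:
        # Offsets index into the original text, so the text itself is never modified.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError(f"span [{start}, {end}) is outside the document")
        if span_id in self.spans:
            raise ValueError(f"duplicate span id {span_id!r}")
        self.spans[span_id] = Span(span_id, start, end, label)

    def add_relation(self, head: str, tail: str, label: str) -> None:
        # A relation is only valid if both endpoint spans already exist.
        missing = [s for s in (head, tail) if s not in self.spans]
        if missing:
            raise KeyError(f"unknown span ids: {missing}")
        self.relations.append(Relation(head, tail, label))

    def span_text(self, span_id: str) -> str:
        s = self.spans[span_id]
        return self.text[s.start:s.end]


# Usage with a made-up sentence in the style of a census-dependent law.
doc = AnnotatedDocument(
    "If the population exceeds 50,000 according to the latest federal census, "
    "the county shall elect seven commissioners."
)
for sid, phrase, label in [
    ("s1", "the population exceeds 50,000", "CONDITION"),
    ("s2", "the county shall elect seven commissioners", "ACTION"),
]:
    start = doc.text.index(phrase)
    doc.add_span(sid, start, start + len(phrase), label)
doc.add_relation("s1", "s2", "TRIGGERS")
print(doc.span_text("s2"))  # prints "the county shall elect seven commissioners"
```

Keeping annotations as offsets rather than inline markup leaves the source text untouched, which is one way a single tagging structure could apply to any input document; validating relation endpoints keeps corrections from leaving dangling links.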
Related papers
- Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs [4.721309965816974]
We propose to make scholarly data more accessible sustainably by leveraging Wikidata's infrastructure.
Our study focuses on data from 105 Semantic Web-related conferences and extends/adds more than 6000 entities in Wikidata.
(2024-11-13T15:34:52Z)
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
(2024-06-11T10:06:53Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
(2024-05-20T08:41:15Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
(2023-07-05T20:16:20Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
(2022-03-08T18:59:56Z)
- Named Entity Recognition and Linking Augmented with Large-Scale Structured Data [3.211619859724085]
We describe our submissions to the 2nd and 3rd SlavNER Shared Tasks held at BSNLP 2019 and BSNLP 2021.
The tasks focused on the analysis of Named Entities in multilingual Web documents in Slavic languages with rich inflection.
Our solution takes advantage of large collections of both unstructured and structured documents.
(2021-04-27T20:10:18Z)
- Documenting the English Colossal Clean Crawled Corpus [28.008953329187648]
This work provides the first documentation for the Colossal Clean Crawled Corpus (C4; Raffel et al., 2020), a dataset created by applying a set of filters to a single snapshot of Common Crawl.
We begin with a high-level summary of the data, including distributions of where the text came from and when it was written.
We then give more detailed analysis on salient parts of this data, including the most frequent sources of text.
(2021-04-18T07:42:52Z)
- Text-guided Legal Knowledge Graph Reasoning [11.089663225933412]
We propose a novel legal application, legal provision prediction (LPP), which aims to predict the legal provisions related to given affairs.
We collect a large amount of real-world legal provision data from the Guangdong government service website and construct a legal dataset called LegalLPP.
(2021-04-06T04:42:56Z)
- Global Attention for Name Tagging [56.62059996864408]
We present a new framework to improve name tagging by utilizing local, document-level, and corpus-level contextual information.
We propose a model that learns to incorporate document-level and corpus-level contextual information alongside local contextual information via global attentions.
Experiments on benchmark datasets show the effectiveness of our approach.
(2020-10-19T07:27:15Z)
- AMALGUM -- A Free, Balanced, Multilayer English Web Corpus [14.073494095236027]
We present a genre-balanced English web corpus totaling 4M tokens.
By tapping open online data sources the corpus is meant to offer a more sizable alternative to smaller manually created annotated data sets.
(2020-06-18T17:05:45Z)