Related papers: The Liabilities of Robots.txt

URL: http://arxiv.org/abs/2503.06035v1
Date: Sat, 08 Mar 2025 03:16:17 GMT
Title: The Liabilities of Robots.txt
Authors: Chien-yi Chang, Xin He
Abstract summary: The robots.txt file, introduced as part of the Robots Exclusion Protocol in 1994, provides webmasters with a mechanism to communicate access permissions to automated bots. While broadly adopted as a community standard, the legal liabilities associated with violating robots.txt remain ambiguous. This paper clarifies the liabilities associated with robots.txt within the contexts of contract, copyright, and tort law.
Score: 19.970962071144722
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The robots.txt file, introduced as part of the Robots Exclusion Protocol in 1994, provides webmasters with a mechanism to communicate access permissions to automated bots. While broadly adopted as a community standard, the legal liabilities associated with violating robots.txt remain ambiguous. The rapid rise of large language models, which depend on extensive datasets for training, has amplified these challenges, prompting webmasters to increasingly use robots.txt to restrict the activities of bots engaged in large-scale data collection. This paper clarifies the liabilities associated with robots.txt within the contexts of contract, copyright, and tort law. Drawing on key cases, legal principles, and scholarly discourse, it proposes a legal framework for web scraping disputes. It also addresses the growing fragmentation of the internet, as restrictive practices by webmasters threaten the principles of openness and collaboration. Through balancing innovation with accountability, this paper offers insights to ensure that robots.txt remains an equitable protocol for the internet and thus contributes to digital governance in the age of AI.

Related papers

A roadmap for AI in robotics [55.87087746398059]
We are witnessing growing excitement in robotics at the prospect of leveraging the potential of AI to tackle some of the outstanding barriers to the full deployment of robots in our daily lives. This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises.
arXiv Detail & Related papers (2025-07-26T15:18:28Z)
Scrapers selectively respect robots.txt directives: evidence from a large-scale empirical study [4.68008217188575]
We conduct the first large-scale study of web scraper compliance with robots.txt directives using anonymized web logs from our institution. We find that bots are less likely to comply with stricter robots.txt directives, and that certain categories of bots, including AI search crawlers, rarely check robots.txt at all. These findings suggest that relying on robots.txt to prevent unwanted scraping is risky and highlight the need for alternative approaches.
arXiv Detail & Related papers (2025-05-27T20:22:45Z)
ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet [44.29685364907017]
We introduce ai.txt, a domain-specific language designed to regulate interactions between AI models, agents, and web content. Our approach aims to aid the governance of AI-Internet interactions, promoting responsible AI use in digital ecosystems.
arXiv Detail & Related papers (2025-05-02T00:33:00Z)
Generating Robot Constitutions & Benchmarks for Semantic Safety [22.889717765617394]
We release the ASIMOV Benchmark for evaluating semantic safety of robot brains. We develop a framework to automatically generate robot constitutions from real-world data. We propose a novel auto-amending process that is able to introduce nuances in written rules of behavior.
arXiv Detail & Related papers (2025-03-11T17:50:47Z)
$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
Consent in Crisis: The Rapid Decline of the AI Data Commons [74.68176012363253]
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data. We conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora.
arXiv Detail & Related papers (2024-07-20T16:50:18Z)
Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data. Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability. We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
LLM Granularity for On-the-Fly Robot Control [3.5015824313818578]
In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots? This work takes the initial steps to answer this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly.
arXiv Detail & Related papers (2024-06-20T18:17:48Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a code generation benchmark for robot manipulation tasks in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology. Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail due to optimal action and/or state distribution being mismatched in different robots. We propose a novel method named $REvolveR$ of using continuous evolutionary models for robotic policy transfer implemented in a physics simulator.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
Federated Continual Learning for Socially Aware Robotics [4.224305864052757]
Social robots do not adapt their behavior to new users, and they do not provide sufficient privacy protections. We propose a decentralized learning alternative that improves the privacy and personalization of social robots. We show that decentralized learning is a viable alternative to centralized learning in a proof-of-concept Socially-Aware Navigation domain.
arXiv Detail & Related papers (2022-01-14T15:54:51Z)
A New Paradigm of Threats in Robotics Behaviors [4.873362301533825]
We identify a new paradigm of security threats in the next generation of robots. These threats fall beyond the known hardware or network-based ones. We provide a taxonomy of attacks that exploit these vulnerabilities with realistic examples.
arXiv Detail & Related papers (2021-03-24T15:33:49Z)
Fault-Aware Robust Control via Adversarial Reinforcement Learning [35.16413579212691]
We propose an adversarial reinforcement learning framework, which significantly increases robot robustness over joint damage cases. We validate our algorithm on a three-fingered robot hand and a quadruped robot. Our algorithm can be trained only in simulation and directly deployed on a real robot without any fine-tuning.
arXiv Detail & Related papers (2020-11-17T16:01:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.