Skill Guide

International IP law for AI-generated works and training data

The body of legal frameworks, treaties, and case law governing the ownership, protection, and licensing of intellectual property (copyright, patent, trade secret) in the outputs of artificial intelligence systems and the datasets used to train them, across multiple national jurisdictions.

This expertise reduces legal and financial risk for technology and content companies by clarifying where the boundaries of fair use, infringement, and ownership lie in a legally unsettled domain. Mastery supports the monetization of AI assets and the defense of core business models against litigation.

1 Careers

1 Categories

9.2 Avg Demand

20% Avg AI Risk

How to Learn International IP law for AI-generated works and training data

Focus 1: Core Copyright Fundamentals - Understand originality and the fair use and fair dealing doctrines (US: 17 U.S.C. § 107; EU: the enumerated exceptions of the InfoSoc Directive 2001/29/EC, which take the place of an open-ended fair use defense).
Focus 2: The Training Data Problem - Learn derivative works, transformative use, and database rights (the EU sui generis database right under Directive 96/9/EC).
Focus 3: Jurisdictional Splits - Contrast the US Copyright Office's human authorship requirement with approaches in the UK (computer-generated works) and China.

Move from theory to practice by analyzing real-world litigation (e.g., Getty Images v. Stability AI, NYT v. OpenAI). Common mistake: Assuming 'publicly available' means 'freely usable for AI training'. Intermediate method: Develop a risk matrix for dataset provenance, mapping source types (e.g., licensed stock, scraped web data, synthetic data) to legal exposure in target markets.
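A risk matrix of this kind can be kept as a small lookup table. The sketch below is illustrative only: the source types come from the examples above, while the market codes, the `Exposure` scale, and every rating in `RISK_MATRIX` are hypothetical placeholders, not legal conclusions.

```python
from enum import IntEnum


class Exposure(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical placeholder ratings; a real matrix would be filled in by counsel
# for each target market.
RISK_MATRIX = {
    "licensed_stock": {"US": Exposure.LOW, "EU": Exposure.LOW},
    "scraped_web": {"US": Exposure.MEDIUM, "EU": Exposure.HIGH},
    "synthetic": {"US": Exposure.LOW, "EU": Exposure.LOW},
}


def worst_exposure(source_types, markets):
    """Return the highest exposure across the given sources and markets.

    Unknown source/market pairs default to HIGH so that gaps in the matrix
    surface for review instead of passing silently.
    Both arguments must be non-empty.
    """
    return max(
        RISK_MATRIX.get(source, {}).get(market, Exposure.HIGH)
        for source in source_types
        for market in markets
    )
```

Calling `worst_exposure(["licensed_stock", "scraped_web"], ["EU"])` returns the rating of the riskiest combination, which is the figure a launch decision would normally hinge on.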

Mastery involves architecting global compliance frameworks for AI development pipelines and advising on IP portfolio strategy for AI-generated outputs. This includes designing contractual safeguards (e.g., model output IP assignment in SaaS terms), navigating emerging 'opt-out' regimes for text and data mining (TDM), and leading internal policy development for responsible data sourcing.

Practice Projects

Beginner

Case Study/Exercise

Mapping a Dataset's Legal Exposure

Scenario

Your startup has scraped 1 million images from various websites to train a new image generator. Assess the initial IP risk profile.

How to Execute

1. Categorize the scraped sources (e.g., social media, news sites, artist portfolios, stock photo previews).
2. For each category, identify the applicable license terms (e.g., Terms of Service, robots.txt) and key legal concepts (TDM exceptions, fair use).
3. Draft a 1-page preliminary risk assessment highlighting the top 3 sources of legal vulnerability and a recommendation for next steps (e.g., cease scraping, seek licenses, implement filtering).
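For the robots.txt part of step 2, a first pass can be automated with Python's standard `urllib.robotparser`. This is a minimal sketch: the crawler name `ExampleTrainingBot`, the sample file, and the URLs are hypothetical, and whether a robots.txt rule amounts to a legally effective reservation of rights is a legal question the code does not answer.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks one named crawler entirely and
# blocks every other crawler from /private/ only.
SAMPLE_ROBOTS = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    url = "https://example.com/gallery/1.jpg"
    print(allowed_by_robots(SAMPLE_ROBOTS, "ExampleTrainingBot", url))
    print(allowed_by_robots(SAMPLE_ROBOTS, "OtherBot", url))
```

Recording the result per source category gives the risk assessment in step 3 a concrete, dated signal to cite alongside the Terms of Service review.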

Intermediate

Case Study/Exercise

Drafting an AI Data License Agreement

Scenario

You are licensing a proprietary medical image dataset to a pharmaceutical company for AI model training. The company wants broad rights.

How to Execute

1. Define the licensed rights precisely: scope of TDM, purpose limitation (research vs. commercial), and jurisdiction.
2. Address key warranty and indemnity clauses regarding data provenance and third-party IP claims.
3. Include provisions for model output IP ownership and attribution.
4. Simulate negotiation with the counterparty's legal team, defending your risk-averse clauses.

Advanced

Case Study/Exercise

Global Compliance Architecture for a Generative AI Product Launch

Scenario

As Chief IP Counsel, you must prepare for the global launch of a text-to-video AI service trained on licensed and open web data.

How to Execute

1. Conduct a multi-jurisdictional legal analysis of training data rights (US, EU, China, Japan).
2. Develop a tiered data sourcing policy: 'Approved' (licensed, public domain), 'Restricted' (requiring C&R or opt-out verification), 'Prohibited' (personal data, known copyrighted works).
3. Design the product's Terms of Service to allocate IP risk for user-generated prompts and AI outputs.
4. Create an internal 'IP Red Team' protocol to audit model outputs for potential regurgitation of training data.
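The audit in step 4 can be prototyped with a simple overlap check. The sketch below is one possible approach, not a prescribed protocol: it works on text (for example, generated captions or transcripts), uses word n-gram overlap as the similarity measure, and the function names, `n=5`, and the `0.5` threshold are all assumptions chosen for illustration. Video outputs in the scenario would need a different similarity measure; only the audit loop carries over.

```python
def word_ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, reference: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also appear in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)


def flag_regurgitation(output, references, n=5, threshold=0.5):
    """Return indices of references whose overlap with the output meets the threshold."""
    return [
        i for i, ref in enumerate(references)
        if overlap_ratio(output, ref, n) >= threshold
    ]
```

Flagged items are candidates for human review, not findings of infringement; the threshold trades missed copies against reviewer workload and would need tuning on real data.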

Tools & Frameworks

Legal & Compliance Platforms

Thomson Reuters Practical Law, LexisNexis IPlytics, World Intellectual Property Organization (WIPO) Lex Database

Use these for primary source research (case law, statutes), patent landscape analysis related to AI methods, and tracking international treaty developments. Essential for due diligence and building defensible legal arguments.

Mental Models & Methodologies

The Four-Factor Fair Use Test, The TDM Exception Analysis Framework (EU), The 'Human Authorship' Threshold Analysis

Apply these analytical frameworks to specific facts. The Four-Factor Test is for US copyright defense. The TDM framework assesses compliance with EU Directive 2019/790. The 'Human Authorship' framework evaluates the registrability of AI outputs in key jurisdictions.

Interview Questions

Answer Strategy

Structure the answer around the three core IP risks: (1) Copyright in individual images, (2) Database rights (in the EU), (3) Personality/publicity rights in identifiable persons. A strong answer will reference the 'opt-out' mechanism under Article 4 of the EU DSM Directive, the pending Getty v. Stability AI case, and propose a technical mitigation like filtering for known copyrighted works and implementing a robust takedown process. Sample: 'I'd assess risk across three vectors: copyright, database rights, and personal rights. While LAION claims compliance with EU TDM exceptions, the Getty lawsuit highlights that this is contested. My plan would be to implement a multi-layered approach: first, use automated filters to exclude likely copyrighted content; second, establish a clear and responsive takedown procedure; third, consider supplementing with fully licensed datasets for commercial robustness.'

Answer Strategy

This tests business partnering and influence skills. The core competency is translating legal risk into business terms and offering solutions, not just saying 'no'. Sample: 'The team wanted to train on user-uploaded content without explicit license grants. I framed it not as a legal barrier, but as a risk to their roadmap and brand reputation, citing specific litigation examples. I then proposed a solution: modifying the user agreement to include a clear, opt-in license for AI training, coupled with a transparency dashboard. This achieved the core business goal while de-risking the IP. The key was positioning legal counsel as a strategic partner, not a gatekeeper.'