Training AI on Canadian Content: The Legal Vacuum That’s Costing Us Everything
Prof. Barry Appleton, Clause & Effect Substack Blog | May 14, 2025
📌 TL;DR: Canada’s copyright laws are not equipped for the age of generative AI. While the U.S., EU, and others move toward enforceable licensing regimes, Canadian cultural and scientific content is being scraped without consent or compensation. This piece argues for urgent legislative and trade-based action to protect our digital sovereignty.
🧱 1. The Ground Has Shifted
In May 2025, the U.S. Copyright Office responded to mounting legal pressure, spurred by the New York Times and OpenAI litigation, by affirming that the commercial use of copyrighted content for AI training likely exceeds fair use.[1]
"Using massive quantities of copyrighted works to train commercial AI models likely falls outside the boundaries of fair use." — U.S. Copyright Office
This follows Warhol v. Goldsmith (2023),[2] where the U.S. Supreme Court ruled that a transformative purpose alone is not enough when commercial markets are harmed. That decision now casts a long shadow over AI training.
📜 2. Canada’s Copyright Law Is Stuck in the Analog Era
Canada’s fair dealing regime, set by CCH v. Law Society of Upper Canada,[3] permits only specific uses—like research, education, criticism, and news reporting. AI training by for-profit multinationals is not included.[4]
Yet Canadian content is being harvested to train foreign models—without reciprocity, attribution, or payment.
This is not innovation. It is extraction.
⚖️ 3. Fair Use, Fair Dealing—and Unfair Advantage
Canada's fair dealing framework, unlike the more commercially flexible fair use standard in the U.S., is limited to specific public-interest purposes such as research, education, and journalism. So even as the new U.S. report tightens the rules for commercial AI training, American law still leaves broader pathways for data scraping and content use than Canadian law formally permits. That asymmetry lets U.S. firms outmaneuver their Canadian counterparts, extracting value from our creators without reciprocal protections.
Canadian fair dealing is narrower on paper than U.S. fair use, yet it offers no effective enforcement against digital infringement: AI firms scrape Canadian works because no statute clearly stops them.
Canadian content creators, from musicians to filmmakers, face a significant risk here.
Case Study: CBC & NFB
CBC’s digitized news and broadcast archive, a taxpayer-funded treasure trove, is a prime example of what is at stake. Without legal protection, this content is being used to train foreign AI systems, which in turn produce derivative outputs that displace the original value chain. Canadians are subsidizing the erosion of our own media ecosystem.
The National Film Board (NFB) faces a similar, existential risk without modern AI copyright safeguards. Its 13,000-title archive, including Oscar-winning documentaries, culturally defining works, and Indigenous stories, enjoys no licensing protection against AI ingestion.[5] Under current law, no legal clarity or compensation framework governs the use of these works to train generative AI models, exposing Canada’s public media heritage to uncompensated extraction and undermining both cultural sovereignty and long-standing licensing ecosystems.
🧪 Research Exploited
Even Canada’s publicly funded research, such as the digitized outputs of the Canadian Institutes of Health Research (CIHR) and the repositories of U15 universities, is vulnerable to uncompensated AI training. Freely accessible does not mean license-free.[6] Foreign AI developers scrape these materials to improve medical LLMs, drug discovery agents, and decision-support tools, with no attribution, consent, or economic return to the Canadian academic ecosystem.
Without enforceable copyright rules, Canada’s knowledge economy becomes a data donor with no dividend, feeding global innovation pipelines while receiving neither royalties nor strategic visibility.
Our narrower fair dealing doctrine does not stretch to cover AI training unless Parliament expressly permits it. As things stand, AI firms using Canadian works without permission are skating on thin ice, legally and ethically. The economic and epistemic costs of regulatory inaction are clear: AI scraping is not merely a copyright issue but a structural threat to Canadian research sovereignty and innovation competitiveness.
But here’s the catch: Canadian courts can’t stop the scraping without modernized rules. And foreign firms won’t stop until we make them.
🚫 4. Why Expanding “Research” to Cover AI Would Be a Legal Disaster
Some suggest that AI scraping counts as “research” under CCH. But that would:
Authorize industrial-scale ingestion without consent.
Ignore the commercial nature of AI outputs.
Undermine creators’ rights and licensing frameworks.
If AI training on CBC, Indigenous knowledge, or Canadian research qualifies as fair dealing, then copyright is effectively dead in Canada.
🇬🇧 5. Lessons from the UK: Smart, Specific Reform
In 2014, the UK introduced a narrow text-and-data mining exception for non-commercial research, and nothing more.[7] When the government proposed expanding it to commercial uses in 2022, creators pushed back, and the plan was dropped in early 2023.
Canada? Still stuck in ambiguity. No statutes. No guidance. No backbone.
🌏 6. What Other Countries Are Getting Right
South Korea requires transparency and provenance in AI datasets.[8]
Australia mandates explicit licensing for copyrighted content used in AI training.[9]
These are not barriers to innovation. They are strategic frameworks for sustainable digital economies.
Canada is falling behind not because we regulate, but because we refuse to.
📉 7. This Is a Trade Issue, Not Just Copyright
Canada enters 2025 USMCA revision negotiations with no AI content strategy.
The risks:
No domestic licensing rules
No leverage in trade talks
Meanwhile, the U.S. is locking down content licensing. If we do not act, Canadian firms will compete on asymmetrical terms.
⚠️ 8. Digital Colonialism in Real Time
Canada’s content has global cultural value. But in a world where training data is the fuel of AI, it is also a strategic economic asset. The U.S. Copyright Office has just moved to put American content behind a licensing wall. Meanwhile, Canada leaves its content lying unguarded on the open web.
Canadian creators are subsidizing American AI through legal silence, not strategy.
Our archives, our stories, our science: scraped and monetized elsewhere, while our creators receive nothing.
We must act before courts fill the policy void—or before it is too late to negotiate fair rules.
🤝 Balancing Innovation with Integrity
Yes, some worry that licensing requirements will chill innovation. But we must distinguish:
Public-interest research vs.
Commercial exploitation
Countries like South Korea and Australia prove you can support AI without gutting creator rights.
📣 What Needs to Happen Next
1. Amend the Copyright Act
Require licensing for commercial AI training.
Clarify that fair dealing excludes AI ingestion unless explicitly permitted.
Mandate transparency from AI firms on dataset composition.
2. Launch a Collective Licensing Agency for AI
Modeled on SOCAN and Access Copyright.
Self-funded via royalties, with opt-in rights for creators. Companies already pay for curated datasets; this is a shift in market norms, not a new burden. AI developers must pay to play, whether through collective licensing schemes or compulsory licenses that respect Canadian rights and reflect Canadian values.
Within the Agency, introduce a text-and-data mining exception with limits. Following the UK model, non-commercial data mining should be allowed for legitimate academic or public-interest research, but licensing should be required for all commercial uses.
3. Create a Transparency Register
Mandatory disclosure of what datasets were used in AI training.
4. Leverage Trade Negotiations
Demand parity under USMCA. Use digital IP as a bargaining chip.
5. Fund a Canadian AI Training Corpus
Build a national corpus of licensed, rights-respecting Canadian works.
🌍 International Alignment—Or Leadership?
The U.S. Copyright Office has acted.
The EU AI Act mandates transparency.
Australia, South Korea, and Japan are closing the “scraping loophole.”
📢 Canada should not lag. Canada should lead.
A Canadian AI Future Built on Rights and Results
This is not just a legal debate. It concerns who controls the future of Canadian culture, science, and innovation.
Canada must:
Legislate, not litigate.
Protect, not plead.
Lead, not lag.
AI without licensing is not fair dealing. It is digital dispossession.
🗣️ Final Word: We Either Set the Rules—Or Get Scraped
Canada’s lawmakers face a choice. They can keep hoping fair dealing will stretch to cover AI. Or they can write new laws that reflect the realities of a machine-trained future.
The U.S. has played its hand. The EU is codifying control. If Canada wants to shape AI—rather than be shaped by it—we must legislate now.
Prof. Barry Appleton, Center for International Law, New York Law School; Fellow, Balsillie School of International Affairs; Managing Partner, Appleton & Associates International Lawyers LP, Toronto. May 2025.
[1] U.S. Copyright Office, Copyright and Artificial Intelligence: Policy Recommendations and Implications for Creative Industries (May 2025), online: https://www.copyright.gov/ai/. This was the third and final report issued by the Copyright Office on AI issues.
[2] Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 143 S. Ct. 1258 (2023). The Court held that a transformative 'purpose and character' is not determinative of fair use on its own, particularly where the use is commercial and affects the market for the original.
[3] CCH Canadian Ltd. v. Law Society of Upper Canada, [2004] 1 S.C.R. 339. The Supreme Court of Canada established that fair dealing is a 'user's right' in copyright law, to be interpreted broadly but within the specific enumerated purposes. The Court held that 'research' should be interpreted broadly and 'includes many activities that do not lead to the creation of new work or the generation of new ideas' (see paras. 48–53).
[4] In 2024, the Toronto Star Newspapers Limited, Metroland Media Group Ltd., Postmedia Network Inc., PNI Maritimes LP, The Globe and Mail Inc., The Canadian Press Enterprises Inc., and the Canadian Broadcasting Corporation sued OpenAI, Inc. and its affiliated companies. OpenAI has been facing similar claims south of the border. Most notably, in December of 2023, the New York Times commenced a lawsuit against OpenAI and Microsoft. The New York Times Company v. Microsoft Corporation, 1:23-cv-11195 (S.D.N.Y. Dec 27, 2023).
[5] National Film Board of Canada, Our Collection (accessed 14 May 2025), online: https://www.nfb.ca/collection/. The NFB holds over 13,000 titles, including documentaries, animations, and Indigenous stories, many of which are publicly funded and digitized, making them especially vulnerable to uncompensated AI training.
[6] While much attention has been paid to artistic and cultural content, Canada’s scientific output is also vulnerable. Publicly funded research articles and datasets—particularly those made available through open-access mandates—are frequently used by foreign AI firms to train biomedical and technical models. These uses often violate licensing terms or operate in legal grey zones, raising serious concerns about sovereignty over Canadian intellectual infrastructure. See also CIHR, “Open Access Policy” (2022), online: https://cihr-irsc.gc.ca.
[7] Section 29A of the UK Copyright, Designs and Patents Act 1988, added by the Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014, and the abandoned 2022 proposal to expand this exception.
[8] South Korea's Framework Act on Intelligent Information Society, as amended in 2023. Article 27-2 mandates transparency of training data, Article 27-3 establishes a dataset registry, and Article 3(3) asserts national ownership over culturally significant datasets.
[9] Australia's Copyright Amendment (Artificial Intelligence Protections) Act, 2024, introduced consent requirements for using copyrighted works in AI training, mandated dataset disclosures in public procurement, and added statutory penalties under s.116AA.