AI, COPYRIGHT AND THE US LEGAL SYSTEM
-Rishiraj Chandan
LLMs AND GENERATIVE AI
Hundreds of millions of individuals use ChatGPT and other generative systems,
demonstrating the widespread enthusiasm for generative AI. Businesses are
attempting to determine how to use generative AI, and some claim that GPT-4 or
ChatGPT shows signs of artificial general intelligence. But not everyone is as
enthusiastic or upbeat about generative AI. One case challenges the legality of
Codex[1], a code-generating large language model, and Copilot, a tool that
recommends code in response to programmer requests. The complaint is filed
against GitHub[2], Microsoft[3], and OpenAI.
Stability AI is facing two lawsuits in the US that contest the legality of
Stable Diffusion[4]. The plaintiffs in the Copilot lawsuit are seeking
injunctions to shut down these programs in addition to $9 billion in damages.
The two main claims are that ingesting copyrighted content from the internet or
other sources infringes the copyright in the original works, and that the
outputs, whether they be source code, text, photos, or music, infringe the
derivative work right. The U.S. Copyright Office is seeking public comment on
this matter. Some respond, "Oh, it's fair use and no problem," while others
respond, "Oh, massive piracy and we got to shut this stuff down."
Unfortunately, more people share the latter response to generative AI than you
may think.
COPYRIGHT IN A NUTSHELL
The Future of Life Institute's open letter has called for a six-month pause on
training the most powerful AI systems, reflecting a larger moral panic around
the technology. It discusses the grave dangers facing mankind and civilization
and urges preparation and the establishment of rules. Law and policy feature
heavily in this debate. One statement on generative AI referred to it as a
"Marxist nightmare" because it benefits capitalist owners while those who
contributed millions of labour hours get no compensation.
Meanwhile, conferences concerning the best course of action for generative AI
and AI systems in general, as well as the nature of appropriate regulations,
are taking place around Europe. The three main questions addressed are:
a. does it infringe copyright to use works as training data for generative AI
systems?
b. when do AI-generated outputs infringe the derivative work right? and
c. who is the rightful owner of the copyright in computer program outputs that
include copyrighted material?
The debate over copyright as it relates to artificial intelligence (AI) dates
back to the mid-1960s. Works of authorship are protected by copyright from the
moment they are first fixed in a tangible medium, for the duration of the
author's life plus an additional 70 years, or 95 years in the case of corporate
authors. Authors are granted the exclusive rights to control reproduction,
distribution, the creation of derivative works, public performance, and public
display.
Authors' ideas, facts, and techniques are not protected by copyright; only the
creative expression in their works of authorship is. The ingestion question
turns on the limitations that fair use and other doctrines place on copyright's
exclusive rights. In the United States, fair uses of copyrighted works are not
regarded as copyright infringements. When determining whether a use is fair,
courts take into account four factors. The first is the purpose and character
of the challenged use: uses such as criticism, commentary, news reporting,
teaching, research, and scholarship are favoured, and courts distinguish
between non-commercial and commercial use.

The second factor is the nature of the copyrighted work. Factual and
utilitarian works receive a narrower scope of copyright protection and so leave
more room for fair use, while artistic and imaginative works receive a greater
scope of protection[5]. The remaining two factors are the amount and
substantiality of the taking and the effect of the use on the value of the work
or the market for it.
As the rulings in Field v. Google[6] and Authors Guild v. Google[7] show, there
are precedents suggesting that crawling the internet for works to use as
training data may be considered fair use. In Authors Guild, Google's
digitization of millions of in-copyright volumes from research library holdings
was deemed fair use because Google was not exploiting the expression in the
works. On the other hand, others argue that those rulings protect only uses
that, like Google's, facilitate the discovery of the copyright owners' works,
and so they oppose the ingestion of works as training data.
It is therefore crucial to understand how fair use and copyright law apply to
AI, especially when it comes to the ingestion of works as training data.
INGESTING WORKS AS TRAINING DATA
In the context of generative AI, the fair use controversy centres on ingestion.
It has been suggested that generative AI produces better results when it
consumes high-quality content, and that the authors of those carefully chosen
works ought to be compensated. According to a survey conducted by the Authors
Guild, 90% of the writers surveyed think that AI developers should compensate
them for including their works in the training data for large language
models[8].
If there is a market for licensing training data, unlicensed ingestion may harm
that market, and fair use analysis takes this harm into account. There are
countervailing considerations, however, such as the constitutional purpose of
copyright, which is to promote the progress of knowledge. One may argue that
generative AI systems further this purpose, and fair use leaves some room for
innovation.
The derivative work right debate is crucial because the statute defines what it
means for an author to have the exclusive right to make derivative works. For a
second work to infringe the derivative work right in a first work, the second
work must have taken a substantial amount of creative expression from the
first, so that the two are substantially similar. The majority of generative AI
outputs will not resemble any particular work in the training data set in a
meaningful way. If so, it is doubtful that the outputs infringe that right.
One issue with generative AI is that if a picture appears often in the ingested
works, the model can memorize it and reproduce it closely, which may give rise
to an infringement claim[9]. To prevent infringement, some developers are
exploring ways to remove duplicates from the training data, which makes
memorization less likely to occur. Others are attempting to use output filters
to stop the generation of outputs that infringe intellectual property rights.
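The two mitigations described above, removing duplicates from the training data
and filtering outputs, can be illustrated with a minimal Python sketch. It
assumes plain-text records and exact matching after simple normalization; the
function names (deduplicate, overlaps_training_data) and the five-word window
are hypothetical choices made for illustration, not a description of how any
particular developer works. Real systems would more likely rely on
near-duplicate detection, and a filter of this kind does not by itself answer
the legal question of substantial similarity.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized record, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for record in records:
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences found in the normalized text."""
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlaps_training_data(output: str, training: list[str], n: int = 5) -> bool:
    """Flag an output that shares any n-word sequence with a training record.

    The window size n is an assumed threshold, not a legal standard.
    """
    out = shingles(output, n)
    return any(out & shingles(record, n) for record in training)
```

In this sketch, deduplicate would run once before training, while
overlaps_training_data would run on each generated output, which is then
suppressed or regenerated if it is flagged.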
SUITS FOR COPYRIGHT INFRINGEMENT
There are now two lawsuits in the U.S. against Stability AI, the most
compelling of which is Getty's. Getty alleges that Stability[10] copied 12
million images from Getty Images, along with their captions, in violation of
copyright. It also alleges trademark infringement. The complaint includes
screenshots of various Stable Diffusion outputs that Getty contends infringe
its derivative work right. While Getty is open to licensing photos from its
database for training purposes, it objects to what it regards as Stability's
egregious infringement.
Copyright attorneys have long argued about who owns the copyright in
computer-generated outputs. Stephen Thaler, who owns a computer system built
with generative AI software, submitted a picture produced by his system to the
Copyright Office and asked for a registration certificate. The Copyright
Office, however, refused his application on the ground that the picture had no
human author. He then filed a lawsuit against the Register of Copyrights,
requesting an order compelling the Office to issue him a registration
certificate. This lawsuit is still pending in federal court in Washington, D.C.
In addition, Kris Kashtanova, who creates images using Midjourney[11], filed a
registration application with the Copyright Office. Kashtanova initially
received a registration certificate, but the Office subsequently discovered
that the images in the work were AI-generated. The Office cancelled the
original registration and issued a narrower one covering only the text and the
selection and arrangement of the images. Because U.S. copyright law does not
protect the images themselves, the registration was limited accordingly.
AI-generated works, according to a policy statement released by the Copyright
Office, lack human authorship, belong to the public domain, and may be freely
copied. You cannot prevent someone else from using AI-generated text, images,
or other content just because you have applied to register copyright in
material that contains it. Once material is in the public domain, it cannot be
taken back. To sum up, the United States is struggling with the question of who
owns the copyright in computer-generated outputs and with the possibility of
infringement lawsuits. It will be essential to address these concerns and make
sure that copyright rules are respected as these cases develop.
The Copyright Office refuses to register AI-generated works because they lack
human authorship, not because the outputs are infringing derivative works,
which is good news for Stability AI. Large language models and training data
sets, on the other hand, might present the same authorship issue, since they
are the result of automated processes rather than creative works.

There are still unresolved issues surrounding generative AI, such as the need
to adopt regulations specifically tailored to this field. The European Union
has finalized the AI Act[12], which will place significant obligations on AI
developers and on businesses that use such systems; those obligations will vary
in severity according to the risk involved in the deployment. The AI Act was
created with certain kinds of AI systems in mind, including general-purpose AI
and healthcare systems.
US LEGAL SYSTEM AND AI
Generative AI throws another wrench in the works, and at the moment the United
States has few rules specific to it. Although there is an AI Bill of Rights and
a framework for risk management of AI systems released by NIST, they are only
collections of broadly applicable guidelines. The only body of law that has the
power to destroy generative AI systems is copyright law. If courts hold that
ingestion is infringing, the whole enterprise may essentially be destroyed, so
copyright law may pose an existential threat to progress in this area.
There are worries about the use of generative AI in movies, since the Copyright
Office is not protecting anything created with AI. The reason the Office
refuses to register AI-generated works isn't that the outputs infringe the
derivative work right; rather, it's that the works don't have human
authorship[13]. In summary, it will be years before courts definitively resolve
the issues raised by generative AI. Discussions about AI governance are taking
place in major cities around the world, and it is important to follow this
developing topic in order to develop arguments that generative AI will
strengthen copyright rather than undermine it.
The use of AI across a variety of industries, and the Copyright Office's
treatment of it, is a matter of debate. The fact that humans decide to a
considerable extent how these technologies are used raises questions about the
ramifications for businesses like Disney. Disney is reluctant to allow third
parties to use its computer-generated material outside of its films because it
regards that prospect as dangerous. Although the Copyright Office has been
addressing this matter for years, it is not an expert in this technology.
Through some of its ill-judged decisions, the present Copyright Office has in
effect been dictating American industrial policy. This is troubling because the
Office does not take into account what its decisions mean for companies like
Disney.
One of the difficulties in untangling human and machine contributions to
AI-generated work is distinguishing between textual and graphical inputs in
prompts. A related question concerns "Copilot"-style systems, where AI acts as
a co-pilot throughout the development process and the result is a collaboration
between the user and the AI. Copyright in such works may be sustained, given
that Europe is also heading in this direction.
CONCLUSION
Having looked at the web-crawling cases, let us consider whether web crawling
itself may be restricted by copyright. Is it possible to argue that some of the
images used in the Stability AI case are private even though only public data
is being indexed? Questions like this show how important it is to understand
the subtleties of AI-generated work, to pay closer attention to the Copyright
Office's concerns about AI-generated material, and to weigh the possible legal
ramifications.
Copyright also affects how search engines operate. It has been proposed that
new rules be adopted to shield users' material from unwanted access. Breaking
through a paywall, which would likely defeat a fair use defence, is one
possible problem. In Authors Guild v. Google, however, copies of 27,000 volumes
from a private research library collection of millions of books were made.
Google has an advantage over Bing because of its enormous volume of data,
having scanned millions of books to form the Google Books corpus.
Whether the human authorship requirement that courts now apply should be
codified in the statute also deserves discussion. Although codification seems
logical, the law does not always work that way. Despite being more than 300
pages long, the Copyright Act itself lacks precise rules for some situations.
The section outlining exclusive rights is probably only about twenty-five
sentences long, and it is hardly a reliable guide to future events.
Another question is whether using these tools for internal work exposes one to
copyright infringement liability if something is done with the output, such as
writing or copying it into an advertisement. If the outputs infringe the
derivative work right, the person who generated the outputs is as accountable
for copyright infringement as the person who designed the program that
generated them. This is because damages may be assessed regardless of whether
the output is used as an advertisement or as art.
I conclude by emphasizing how crucial it is to understand the copyright system
and how it affects consumers. It is important to recognize that copyright is a
strict liability regime, so liability does not depend on intent, and yet using
someone else's work for profit does not always put users at risk of
infringement.
[1] Case No. 3:16-cv-00826-WHO (N.D. Cal. Dec. 4, 2017)
[2] Class action against GitHub Copilot [LWN.net]. (n.d.). https://lwn.net/Articles/914150/
[3] OpenAI and Microsoft face fresh lawsuit from US news organisation. (2024, June 28). Artificial Intelligence | World IP Review. https://www.worldipreview.com/artificial-intelligence/openai-and-microsoft-face-fresh-lawsuit-from-us-news-organisation
[4] Hillemann, D. (2023, January 23). AI-Related Lawsuits: How The Stable Diffusion Case Could Set a Legal Precedent. Fieldfisher. https://www.fieldfisher.com/en/insights/ai-related-lawsuits-how-the-stable-diffusion-case
[5] Module 3: The Scope of Copyright Law - Copyright for Librarians. (n.d.). https://cyber.harvard.edu/copyrightforlibrarians/Module_3:_The_Scope_of_Copyright_Law
[6] 412 F. Supp. 2d 1106 (D. Nev. 2006)
[7] Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015). (2015, October 16). Justia Law. https://law.justia.com/cases/federal/appellate-courts/ca2/13-4829/13-4829-2015-10-16.html
[8] Survey Reveals 90 Percent of Writers Believe Authors Should Be Compensated for the Use of Their Books in Training Generative AI - The Authors Guild. (2023, May 15). The Authors Guild. https://authorsguild.org/news/ai-survey-90-percent-of-writers-believe-authors-should-be-compensated-for-ai-training-use/
[10] U.S. District Court for the District of Delaware. (2024, April 17). Getty Images (US), Inc. v. Stability AI, Inc. Justia Dockets & Filings. https://dockets.justia.com/docket/delaware/dedce/1:2023cv00135/81407
[11] Lawler, R. (2023, February 23). The US Copyright Office says you can’t copyright Midjourney AI-generated images. The Verge. https://www.theverge.com/2023/2/22/23611278/midjourney-ai-copyright-office-kristina-kashtanova
[12] High-level summary of the AI Act | EU Artificial Intelligence Act. (n.d.). https://artificialintelligenceact.eu/high-level-summary/
[13] Wang, Runhua (2024) "The Copyright Requirement of Human Authorship for Works Containing Artificial Intelligence-Generated Content," IP Theory: Vol. 13: Iss. 2, Article 2.