The Legal Challenges of Generative AI—Part 1
Skynet and HAL Walk Into a Courtroom
July/August 2023
This is the first in a multi-article series discussing the legal implications of using computer programs that mimic human creativity. This article describes how the current generative AI technology works, examines potential legal challenges under the Copyright Act, and introduces questions to consider as this technology develops.
Before lawyers or businesses attempt to reliably make use of generative AI, they should know how it works, at least at a high level, and have some foundation in the law that addresses how generative AI operates. This article provides a high-level overview of the current technology, walks through how the current law might apply to the way generative AI is built, and poses questions about whether and how the law may adapt to accommodate this rapidly changing technology. Future articles in this series will delve into the potential limitations, risks, and legal issues specific to end users, including attorneys. These issues are being presented in multiple segments partly due to the size of the topic, but also because technology is changing so rapidly that it is likely new developments will require further comment.
Introduction to Generative Artificial Intelligence
Intelligent robots in popular culture have been portrayed as rigid and emotionless. Although computers showed superhuman skills in mathematical tasks, authors blithely proclaimed that humans still had the edge in creativity and that emotion and creative invention were beyond the reach of mere computer code. Yet, humans have been using the circuit board as a canvas for decades. We have created digital art, used word processors to create novels and legal briefs, used computer graphics for our movies, and used software to touch up our photos. We have even created entirely new forms of art, like video games, that depend entirely on computers for their existence. Our species has been using complicated arrangements of silicon for creative purposes practically since the moment we first shot lightning through them. In retrospect, then, it should not be all that surprising that computers are more capable of creative expression than supposed.
Programmers have not determined the precise algorithm for creative thought. To date, no one has been able to program the explicit instructions for a computer program that generates Expressionist paintings on command or writes a legal brief at the level of a human. Strangely, we may have created it anyway. Largely out of the public eye, researchers have sidestepped the design problem by inventing ways to train software to solve problems without needing to figure out the solution in advance. Prior thoughts about the limits of artificial intelligence may have really been about a human being’s inability to articulate instructions and not actual limits on what computers can really do.
The public only saw glimmers of the progress being made in the last few decades when a program was able to play chess or win at Jeopardy.1 As of late 2022, however, artificial intelligence that can generate creative works that appear to have been created by a human (generative AI) has clearly seized the public’s attention. Stable Diffusion, an open-source generative AI program that can turn text inputs into art or photographs, was released in August.2 ChatGPT, a type of AI that can respond to text prompts in an uncannily human manner, was released to the public in November.3 Both have been the subject of intense interest.
One jaw after another has dropped as people realize how far the technology has come.4 Despite the limitations of the current software and warnings from major tech CEOs, politicians, and researchers alike about how disruptive and potentially dangerous artificial intelligence is becoming,5 the business world is now in a race to develop a technology that promises to change how white-collar work is done.6 Surveys report that some businesses are already replacing workers with AI despite warnings that ChatGPT in particular is unreliable and shouldn’t be trusted for “anything important.”7 Perhaps unlike other recently hyped technologies like virtual reality, NFTs, or blockchain, generative AI appears to be game-changing.
Any task that requires producing written or other creative work is potentially affected by generative AI. Google and Microsoft are now racing to implement the technology in their office suites.8 New companies are selling or using AI in tools such as virtual personal assistants, writing guides, candidate screening, or customer service.9 But, as with any disruptive new technology or automation, some will be harmed by the resulting changes in supply and demand.10 Some artists, programmers, writers, and yes, lawyers, may be worried by the availability of technology that can do at least a superficially good job of replicating their work product at a lower price.
The law now must grapple with the questions raised by widespread use of software that can produce human-seeming creative works on demand. Copyright law, the major area of law that protects creative works in the United States, does not currently have clear answers for how generative AI may be trained or used, whether works created using generative AI have copyright protection, or many other questions. Copyright is a flexible area of law and can evolve with new technology. Still, it remains to be seen whether and how the law can adapt to the challenges of generative AI and what legal framework will best integrate this new technology into society.
Lawyers have a special interest in generative AI because it seems capable of performing or assisting with many of the mechanical aspects of law practice, such as document review, legal research, legal writing, and blogging.
LexisNexis, one of the leading legal research services, has announced generative AI functionality allowing lawyers to ask for simple legal briefs, letters, and other written material with citations.11 According to Chief Product Officer Jeff Pfeifer, Lexis anticipates a commercial release of its AI-powered search, writing, and research capabilities within LexisNexis in late summer 2023.12 In the meantime, he noted that the “[t]hing that scares me most is the amount of experimentation I’m seeing without understanding the technical infrastructure of the model setup.”13 Even if the tools are not powerful enough to replace associate attorneys just yet, some law firms are already experimenting with incorporating generative AI into their practices.14
Overview of the Technical Details of Generative AI
Understanding the capabilities and limitations of generative AI starts with knowing how the technology works, at least at a very high level. Generative AI programs are generally trained by presenting a neural network15 with a large body of preexisting data and engaging in some form of repetitive machine learning to encourage the network to develop relationships between text input and particular output that fits the training data. There are different techniques within the umbrella of machine learning, but in general this entails seeing how well the existing model does and then adjusting the neural network in a direction that would have produced a better result. This process is then iterated a mind-bogglingly large number of times until the network gradually produces results that closely match the training data.16 The resulting trained network is called a “model.”
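The adjust-and-repeat loop described above can be illustrated with a deliberately tiny sketch. It is not how any production model is trained. It assumes a "network" consisting of a single adjustable weight, a made-up training set in which every output is three times the input, and an arbitrary learning rate. The function name `train` and all of the numbers are illustrative assumptions.

```python
# Toy illustration of iterative training: one weight, adjusted repeatedly
# in the direction that would have reduced the error on the training data.
# All names and numbers here are illustrative assumptions.

def train(examples, steps=1000, learning_rate=0.01):
    weight = 0.0  # the untrained "model" starts out knowing nothing
    for _ in range(steps):
        for x, target in examples:
            prediction = weight * x      # see how well the current model does
            error = prediction - target  # positive if the guess was too high
            # The gradient of the squared error with respect to the weight
            # is 2 * error * x; take a small step in the opposite direction.
            weight -= learning_rate * 2 * error * x
    return weight

# Made-up training data in which every output is three times the input.
training_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
learned_weight = train(training_data)
print(round(learned_weight, 3))
```

No one tells the program that the answer is "multiply by three." The weight drifts toward that value only because each small adjustment reduces the mismatch with the training data. Real models apply the same idea to billions of weights at once.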
Current generative AI models require massive dumps of training data, largely collected from public sources on the Internet, in order to use this training process. The kind of data used depends on the purpose of the model being trained. Generative AI models that read and write human language are called large language models (LLMs) and are generally trained on large text databases curated to ensure the program gets a good sample of the kind of writing it is intended to simulate. ChatGPT, an LLM that seeks to emulate a virtual assistant, was trained on 570 gigabytes of text obtained from “books, webtexts, Wikipedia, articles, and other pieces of writing on the internet.”17 GitHub Copilot, a computer code model, was trained on “natural language text and source code from publicly available sources, including code in public repositories on GitHub.”18 The training data used for other LLMs are being kept secret by their creators, but probably also consist of enormous text dumps of similar writings.19 Generative AI programs dealing with images, video, or music are being trained on large datasets that are paired with text descriptions. Image-generating programs are trained using large sets of images together with text descriptions, such as LAION-5B or CIFAR-10, whose images were scraped from the World Wide Web.20 Video-generating programs are trained by using sets of narrated videos, again matching video data with text.21 Music-generating programs can be trained on large datasets of music files.22
After initial training, models are often put through a period of fine-tuning that uses adversarial training or other techniques to warp the model in the direction of the desired final behavior. An LLM, for example, is first trained to accurately predict the next word23 by testing it against its training data. Then, it may be taken through a period of human-assisted reinforcement learning, in which a human subject ranks their satisfaction with the output of the LLM,24 or through further training on a specific kind of writing to encourage the model to generate text of that kind.25 The LLM ends up with a remarkable ability to predict the next word based on what it saw in its training data and what the human-based feedback preferred.26 Similar kinds of fine-tuning can be done for other kinds of models.
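To make "predicting the next word" concrete, the following is a minimal sketch that simply counts which word most often followed another in a small sample of text. It is offered only as an analogy: actual LLMs use neural networks operating on far larger bodies of text, not word counts. The function names `build_bigram_model` and `predict_next` and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict

def build_bigram_model(text):
    """Count, for each word, which words followed it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(model, word):
    """Return the most frequent follower seen in training, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# A made-up, one-sentence "training set" used purely for illustration.
corpus = "the court held that the use was fair and the court affirmed"
model = build_bigram_model(corpus)
print(predict_next(model, "the"))      # the word that most often followed "the"
print(predict_next(model, "verdict"))  # a word never seen in training
```

Even this toy version shows why the output depends entirely on the training text: the program has no notion of what a court is, only a record of which words tended to appear together.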
Because the final form of the model was not designed in advance, but just emerged from guided, gradual modification based on the training data, no one initially understands the internal structure of the final model. No one knows what kind of algorithms or programming tricks emerged inside the model, in a pure mathematical form, that ended up being successful at the task for which the model was trained.27 This means that the output of the model cannot be perfectly predicted. Indeed, there may be fundamental reasons to think that a perfectly predictable and reliable model is technically impossible.28 Even after fine-tuning, the behavior will be unpredictable. For example, LLMs sometimes produce sensible-sounding but factually inaccurate information called a “hallucination.”29
The US Copyright Act
Even though the data used to train generative AI programs is publicly available online, that does not mean all the data is in the public domain. Some of it appears to have been protected intellectual property. Thus, the fundamental question about the legality of current generative AI programs is whether programmers should be allowed to train models on protected works of the kind they seek to replace. Three lawsuits, filed in California and Delaware, are seeking to resolve this question: Anderson v. Stability AI, LTD,30 Doe v. Github, Inc.,31 and Getty Images v. Stability AI.32 While each case has slightly different issues and arguments, each of them primarily focuses on the US Copyright Act (Act).
Their focus on the Copyright Act is appropriate. The Act33 protects various kinds of creative works, including the data used to train generative AI. Literary works, musical works, dramatic works, pictorial and graphic works, and sound recordings are all examples of the kind of work subject to protection.34 The Act also protects extensions of these ideas into new media, such as computer programs.35 Even a program that arguably has only a mechanical or utilitarian role, such as an operating system, may be entitled to copyright protection.36
The Copyright Act is also important because it likely will preempt any state law claims that creative lawyers might rely on. Copyright is an express power of the United States in the Constitution.37 Under the Supremacy Clause, the Act preempts “legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright . . . and come within the subject matter of copyright . . . .”38 Claims that involve the “subject matter” of copyright may be preempted even if no infringement is found.39 So, state law theories like breach of contract, unjust enrichment, unfair competition, and similar claims may be preempted.40
Holders of copyrights have exclusive rights to their work.41 These include the right to reproduce or copy; create derivative works; distribute copies by sale, lease, or lending; and publicly perform or display the work.42 As relevant to computers, unlawfully downloading copyrighted material may violate the reproduction right.43 Even storing copyrighted material temporarily in random-access memory (RAM) without downloading it may do so, at least where it is stored in RAM long enough to be “perceived, reproduced, or otherwise communicated for a period of more than transitory duration.”44 Thus, generative AI’s collection and use of copyrighted works for training is the most common starting point for those who argue the whole design is fundamentally infringing.
Generative AI Probably Does Not Copy Protected Elements of the Original Work
Copyright law is limited by statute and by the First Amendment to the Constitution. It distinguishes between ideas and expression and makes only the latter eligible for protection.45 Determining whether generative AI infringes on copyright first requires identifying precisely what the software is “copying.” To sustain a claim of infringement, a plaintiff must show that the defendant copied some original element of the plaintiff’s work.46 The plaintiffs in Anderson argue that by training a computer to produce new works in the same style as existing art, generative AI steals the collection of techniques and choices that constitute the artists’ personal, unique style.47 Perhaps so, but an artist’s personal, unique style is probably not protected. The artist’s resulting works may be,48 but the style likely is not because “[i]n no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.”49 As another example, a singer’s voice is not copyrightable.50 In the context of generative AI, if the aspect of the work being embedded in the model is limited to the styles or ideas behind the creation of the work, there probably is no infringement.51
So, whether the work in question is computer code, text, visual art, or something else, the initial difficulty in declaring infringement seems to be that the generative AI model stores only relationships between text prompt input and creative output. It maps human descriptions of works to the elements of works that are meaningful and named by humans. This mapping may be more similar to the methods, concepts, and principles behind the works than actual elements of the works themselves.
Even if the mapping could be said to be copying some tangible element of the original work, if that element is “inextricably bound” to or “inseparable” from a non-copyrightable idea, then the copying may be permitted.52 In a recent case concerning software, Oracle America, Inc. argued that the code that labels and organizes certain computer tasks was protected by its copyright.53 The Supreme Court acknowledged that computer code could be protected by copyright in general and that Oracle did have copyright protection for aspects of its code.54 However, the particular declaring code at issue was mostly organizational in nature, like labeling a filing cabinet, and existed for the purpose of encouraging third parties to use the code to accomplish other non-infringing purposes.55 Thus, the code was “inherently bound together with uncopyrightable ideas” and was “further than are most computer programs . . . from the core of copyright.”56
The same may be true of other artistic works. Even though an artist’s style may be embodied in the form and structure of their finished works, the rules that define that form and structure may be something copyright cannot protect.
None of this suggests that protecting an individual’s unique style or a programmer’s creativity from automation is a bad idea, just a new one. Until now, copying another artist’s style was the kind of thing that only another human could do, and then only after intensive study and practice.57 Today, a cluster of processors can dissect the elements of every artist’s style simultaneously, in a matter of months, and then allow users to easily generate new works in those styles. Even if the steps involved are roughly the same, there is a difference in scope and speed between a human learning the craft of another and a machine learning all crafts of all humans all at once.
Examining AI Under Fair Use Factors
Even assuming generative AI is copying protected work, that would not end the analysis. Some otherwise infringing activities are protected as “fair use” under the Act.58 This is an affirmative defense to allegations of copyright infringement.59 The fair use exclusion specifically mentions exemptions for “criticism, comment, news reporting, teaching . . . , scholarship, or research . . . .”60 These are sometimes treated as the archetypal fair use cases and are favored by the courts.61 They are not absolute, however.62 Nor are they intended to be an exhaustive list of what can qualify as fair use. Instead, fair use is determined on a case-by-case basis by examining four factors: (1) the purpose and character of the use, including whether such use is of a commercial nature or is for a nonprofit educational purpose; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.63
The factors may sound vague. That is by design. The Act tells judges to consider the “purpose and character” and “nature” of works without defining those terms.64 Rather than providing black-and-white rules, Congress intended to give the courts great latitude in developing the doctrine of fair use through the common law process.65 Fair use is an “equitable rule of reason,” and outcomes may shift as changes in technological and cultural norms modify society’s view of what is acceptable. Congress believed that the judiciary was uniquely situated to determine the rules society will implement to govern new technologies that use copyrighted works in new ways. The four fair use factors and their applicability to generative AI are discussed below.
Purpose and Character of the Use
Courts applying the first factor consider whether the use is commercial and whether it is “transformative.” Theoretically, commercial use weighs against a finding of fair use, because the user stands to profit financially from unfair exploitation of the copyrighted material.66 In many modern cases, commercial use is given little weight in the overall analysis.67 Since “most secondary uses . . . including nearly all of the uses listed in the statutory preamble, are commercial[,]” a finding of commercial motivation does not have presumptive force.68 Courts tended to focus instead on whether the use was transformative. The more transformative a work is, the less significant all of the other statutory factors become.69
In May 2023, the Supreme Court issued its opinion in Warhol v. Goldsmith, which may represent a shift in the importance of the commercial use factor.70 Justice Sotomayor joined the conservative justices in writing an opinion that narrowly targeted how a particular commercial use affects the analysis of the first fair use factor in a case between the ghosts of pop culture icons.71 In Warhol, the copyright holder of a photograph of the late musician Prince brought an infringement claim against the late artist Andy Warhol’s foundation over a Warhol silkscreen version of the original photograph that was used in a magazine story about Prince.72 The Warhol Foundation argued that the silkscreen was transformative because, for example, Warhol “conveys the dehumanizing nature of celebrity.”73 The majority disagreed because the precise commercial use at issue was using an image of Prince for a magazine article, and this was the exact use of the original photograph.74 The opinion suggests that the first fair use factor may not be answerable for an allegedly infringing work in general but must be understood in the context in which that second work is used. Where Warhol’s depictions of copyrighted works are displayed in an art gallery for a purpose wholly unrelated to the original, like his iconic images of soup cans, they are transformative.75 Thus, we may be at the beginning of a shift in the jurisprudence in this area of law in which the type of commercial transaction or other use becomes more important in fair use analysis.
As noted in Warhol, courts considering whether a work is transformative examine whether it merely supersedes or supplants the original or whether it adds something new, with a further purpose or different character, and alters the first with a new expression, meaning, or message.76 Courts have emphasized that whether a work is transformative depends on whether it is reasonably perceived as having a new meaning and new message, rather than merely counting up the number of changes made.77 If the meaning and message are not changed, but the work is simply translated into a different medium, then it is merely a derivative work and is still infringing. For example, converting audio files into MP3 format is not transformative.78 There is tension between the exemption for transformative work and the creator’s exclusive right to derivative works,79 and the line between the two is not always clear.80 A film adaptation of a book would probably be a derivative work,81 but creating a digital corpus of scanned copies of books to enable a search engine to examine them is transformative.82
Given that the entire point of generative AI is to allow users to generate new creative works, whether the software transforms the training data is a key question. On the one hand, because the neural network was derived from the training data, one could say that its output is “algorithmically derivative” of that data.83 The plaintiffs in Anderson assert this in a different way, claiming that the models should be seen as essentially compression algorithms that still contain the original works in some sense.84 It is unclear whether this is correct from a technical perspective.85 Compression is the process of encoding information in fewer bits than the original representation.86 The goal of a compression algorithm is to reliably reconstruct a particular original.87 Unless the starting data is totally random, there is likely a way to represent it in fewer bits than the raw data itself.88
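The two technical points about compression made above, that patterned data can be stored in fewer bits while random data generally cannot and that a conventional compressor aims to reconstruct the original exactly, can be checked with a short sketch using Python's standard `zlib` module. The sample sentence and the repetition count are arbitrary assumptions chosen for illustration, and `zlib` is a conventional lossless compressor, not a generative model.

```python
import os
import zlib

# Highly patterned input: the same sentence repeated many times.
patterned = b"The quick brown fox jumps over the lazy dog. " * 200
# Input with no pattern to exploit: random bytes of the same length.
random_bytes = os.urandom(len(patterned))

compressed_patterned = zlib.compress(patterned)
compressed_random = zlib.compress(random_bytes)

# Lossless compression reconstructs the original exactly.
restored = zlib.decompress(compressed_patterned)

# Original size, compressed size of patterned data, compressed size of random data.
print(len(patterned), len(compressed_patterned), len(compressed_random))
print(restored == patterned)
```

The patterned text shrinks to a small fraction of its original size and is restored byte for byte, while the random data does not meaningfully shrink. That exact, reliable reconstruction is the benchmark against which the "compression" characterization of generative AI models has to be measured.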
But, as discussed above, generative AI software does not have the inherent ability to reliably reproduce its training data. It may come close right after machine learning has completed work on the original training data. After that, though, fine-tuning will begin to distort the patterns in the neural network to improve the output.89 In the hands of the end user, a generative AI model does not seek to faithfully reproduce one particular original, but rather to create new works that may share certain elements with the prior works. Even when asking generative AI to generate simple computer code, it replicates code from its training data less than 1% of the time.90 Some users have been able to accomplish a form of “compression” using generative AI for images with some additional technical work by the end user.91 This is a novel form of compression and results in distortions not of the kind that would be expected from normal lossy (lost information) compression techniques.92 Instead of a grainy or blurry picture, the “loss” takes the form of new or missing elements in the composition, changes in shape, or other abstract differences in the work.93 This makes sense because the model is not trying to encode the exact pixels making up an image, but instead encode the relationships between certain elements in an image and human language.94 The model’s goal is not to encode the training data, but to encode the relationships between text prompts and recognizable elements of the training data.
It is unclear whether those relationships are themselves part of the original work, such that copying them would undermine a finding of transformative use. A “wholesale taking” of the prior work in its final form would not be transformative.95 But, reusing visually identical copies of parts of a preexisting work might be. In Cariou v. Prince, the Second Circuit wrestled with how much preexisting photographs needed to be modified before they had been transformed in a fair use manner and focused on the “use” of the exact prior work “in whole or substantial part.”96 If exact copies of a portion of the prior work can be repurposed and still support a finding of fair use, then mapping the relationships between prompts and styles, patterns, or objects would also seem to qualify. The generative AI does not seek to cut pictures out of a magazine and rearrange them, but rather to use the original works as “grist for the mill”97 to tease out those underlying relationships.
By making these relationships accessible to users, generative AI arguably provides otherwise unavailable information about the original works. A similar argument was upheld as transformative in Authors Guild v. Google.98 In that case, Google was sued by authors for using their copyrighted works to create a software program that allowed users to search the content of books online.99 Google’s digital database also allowed “new forms of research, known as text mining and data mining,” whereby statistical information on the text of books became available for the first time.100 The Second Circuit had “no doubt that the purpose of this copying is the sort of transformative purpose” permitted under fair use.101 It explained that the kind of transformations that may still infringe tend to be transformations of form only, such as translation into a different language, dramatization, performing, photocopying, abridgment, rebroadcasting, or similar actions.102 Aggregating data that had not been previously aggregated for the purpose of enabling a new tool, such as one that searches or analyzes the data, was “a highly transformative purpose . . . .”103 Creating a searchable database was, in the Second Circuit’s view, “a quintessentially transformative [fair] use.”104
It seems difficult to argue that generative AI is not transformative under this analysis. This software aggregates data about the relationships between text and artistic elements or styles, or other text. If using a search engine to query or summarize existing works is transformative, it seems that using a text query to construct a new work would be, too.
Under Warhol, moreover, the fact that generative AI models have a different purpose than the original art suggests that the first factor will weigh in favor of fair use. Whatever the use of a particular copyrighted piece of art may be, it is not to function as a printing press that generates different pieces of art. Warhol thus suggests that this factor would not weigh in favor of the original copyright holder at the time a model is created. Instead, the infringement analysis might be applied only when someone uses generative AI to create a work that is independently infringing and then uses that new work in an infringing way.
Nature of the Copyrighted Work
In considering the second factor, courts ask (1) whether the nature of the copyrighted work is expressive and creative or more factual, and (2) whether the work is published or unpublished.105 The second question likely weighs in favor of fair use in the context of generative AI since all of the information used for training was publicly available online. The former question may apply differently depending on the particular model in question.
Copyright law offers stronger protection to expressive, original, and creative content than it does to work containing purely factual information.106 The original works described by the plaintiffs in Anderson, Doe, and Getty Images are probably all creative in nature, which would weigh against fair use. But the information used to train LLMs may be a mixture of creative and non-creative works. Thus, this element may be evaluated differently depending on the particular case and particular plaintiff.
Amount and Substantiality of the Portion Used
Courts consider both the quantity and the quality of the amount taken from the original work when assessing fair use. Generally, the more that is taken from the original work, the lower the chance that the new work is a fair use. The Seventh Circuit has found that where so much of the original image was removed that, “as with the Cheshire Cat,” only a smile was taken from a prior work along with the outline of a man’s face, this factor weighed in favor of fair use.107 Where a work is copied wholesale, the copying may not be fair use, at least where such wholesale work was not necessary for purposes of the new work.108 On the other hand, if the small element taken from the original work constitutes the “heart of the work,” that may still weigh against fair use.109
The question of how much of the original work is copied by generative AI is not straightforward because the plaintiffs in the current trio of lawsuits are challenging the software in general and not the sale or display of a particular infringing new work.110 The generative AI is not a printing press locked into producing a particular infringing work, and even its opponents seem to agree it almost never exactly replicates a copyrighted work in its training data.111 Until the end user uses generative AI to create an allegedly infringing work, is there any new work or copy on which to base the infringement analysis?
Perhaps. The machine learning process probably makes copies of the data, at least temporarily, for the purpose of training. But, even if an entire work is used for this purpose, that may not weigh against fair use if the copying was reasonable and necessary to an otherwise proper fair use.112 One example of this is creating thumbnails for a search engine.113 Another example is the process of reverse engineering, by which copying is done “solely in order to discover . . . the aspects of . . . programs that are not protectable by copyright.”114 This form of “intermediate copying” can be fair use if the copying was “necessary” to gain access to the functional elements of the software being questioned.115 The process of “reverse engineering” is therefore a fair use and not prohibited by the Copyright Act.116 Reverse engineering may end up being a good analogy for how generative AI functions. Like copying computer code to determine the ideas in the underlying system, models during training copy creative work to determine the underlying ideas in the work. The difference may only be what hardware is being reverse engineered. Instead of reverse engineering computer code, generative AI seeks to mimic the operations of a human brain that describe art, text, code, or other creative works in human language. The models are impressive precisely to the extent that they can mimic the output of biological intelligence.117 If it is acceptable to make intermediate copies to uncover the hidden algorithms in a computer chip, there is no obvious reason why the existing law would treat an attempt to uncover the hidden workings of the human mind any differently.
Unless, of course, the law evolves to strike a different societal balance. Just as a lack of copyright in style may be based on how difficult it used to be to replicate an artist’s style or singer’s voice, decisions favoring reverse engineering computer hardware may not have anticipated how that would apply to human biological hardware. Since the 1950s, computers have increasingly encroached on abilities of which only we were capable, with one intellectual task after another being automated. The further we go, the more some of the underlying assumptions about the prior law might be subject to new scrutiny.
Effect on the Market for the Original Work
The final factor has been described by the courts as “undoubtedly the single most important element of fair use.”118 In examining the effect on the market, courts consider (1) the market for the original work; (2) any impact on traditional, reasonable, or likely to be developed potential markets; and (3) the market for derivative works, if any.119 If the new use serves as a substitute for or usurps the market for the original, the use is less likely to be fair; conversely, uses that serve a different purpose or audience are more likely to be fair.120
Widespread adoption of generative AI has a practical impact on the market for the kind of work product it creates.121 For example, the widespread use of ChatGPT’s ability to offer intelligent-sounding answers to questions appears to have caused a drop in demand for online tutoring.122 The same should be true of any other industry where the customer’s needs are met or perceived to be met by generative AI.123 Content farm companies have sprung up using LLMs to produce written content en masse.124 The most immediately vulnerable markets may be those where the customer requires content but is tolerant of less specific or more generic results. This may include SEO content creators, bloggers, advertisers, or those who use decorative art such as clip art, book covers, games, and similar applications.
For now, it appears that generative AI is not able to produce output that is better than humans who are experts or masters in their fields. That does not mean that those experts will escape economic impact, though. Although current generative AI still requires human supervision, industry commentators suggest it could make some creative work “four orders of magnitude cheaper,”125 and recent studies have shown a 14% increase in customer support productivity126 and a 37% increase in productivity for professional writing tasks.127 An increase in supply or productivity tends to lead to a drop in prices.128 Whether through increased volume of mediocre work produced by generative AI or through increased volume from more productive creators, it seems inevitable that the volume of available creative work will increase. Even if it turns out that there is discovered or induced demand for the work, that would still entail a general drop in price, and thus an economic impact.
An increased volume of creative work has indirect effects on markets, too, since a flood of generated works can impact everyone. For example, since LLMs have been unleashed, at least one publisher has stopped taking submissions because it is drowning in AI-generated stories.129 Others may still be accepting submissions but are struggling with a large influx of generated work.130 Spotify, a music streaming service, recently purged tens of thousands of AI-generated songs from its service, with about 10 times that amount remaining.131 As artificially generated content threatens to flood its search algorithms, Google has announced that it will treat the use of AI to manipulate search rankings as a violation of its spam policies.132 From the legal perspective, while providing pro se litigants with an LLM that can produce intelligible prose and help people understand the law is laudable, if this results in a significant increase in cases filed it may strain an already overworked judiciary.
All that said, if the allegedly infringing use is transformative, as it seems to be, then merely showing a negative economic impact may not be enough to show that the infringing work is aiming for the same market as the original.133 Copyright holders are not guaranteed a market for their work.134 Changing technology and automation often displace workers in industries affected.135 If artists, programmers, and lawyers begin to feel the same pinch that previously impacted assembly line workers, this pain alone may not be enough to muster a copyright challenge to their robot competitors. When a new use “amounts to mere duplication of the entirety of an original, it clearly . . . serves as a market replacement for it . . . .”136 But, where the second use “is transformative, market substitution is at least less certain . . . .”137 A copyright holder “cannot prevent others from . . . developing or licensing a market for . . . transformative uses of its own creative work.”138
This factor, like the second, may also vary based on context. The artists in Anderson will have to wrestle with whether the market for the unique style of art each one offers is comparable to the market for artwork in any style that generative AI can produce. The programmers in GitHub will similarly have to show that the market for their programming skills is equivalent to the automation CoPilot provides. It is debatable whether either the artists or the programmers function in the same way that generative AI does, quickly producing a simulated creative result of any kind based on a short text prompt.
Getty Images presents a closer question. Getty Images maintains an online database of stock photos and has made those photos searchable by text descriptions.139 Customers pay Getty Images for access and licenses to use those stock photos for their own purposes.140 In terms of input and output, Getty Images’ business model is remarkably similar to generative AI except that, instead of creating the image upon request, Getty Images predicts in advance what kind of images might be wanted and catalogs them for later retrieval.141 Thus, the Getty Images lawsuit may present the clearest case of market impact and may forecast how courts are likely to address this factor.
Other Related Intellectual Property Laws
Copyright is probably the major pillar of intellectual property that is implicated by generative AI since it is, by its nature, seeking to copy human creativity. But some other federal laws are raised by the trio of pending lawsuits and might be additional areas of interest in generative AI.
Trademark law can become implicated in generative AI when the training data or output produces a protected trademark. The Lanham Act, trademark’s governing statute, prohibits the use of registered trademarks to counterfeit, cause confusion of goods or services, or deceive.142 It was intended to make “actionable the deceptive and misleading use of marks,” and “to protect persons engaged in . . . commerce against unfair competition . . . .”143 Trademark has a fundamentally different purpose from copyright and “is concerned with protection of the symbols, elements or devices used to identify a product in the marketplace and to prevent confusion as to its source.”144 It protects marks, not any other aspect of the work. Trademark infringement does not require a showing of willful action,145 only that the mark merits protection and that the allegedly infringing result is likely to result in consumer confusion.146 The Tenth Circuit, however, does note that the intent of the alleged infringer is one “nonexhaustive factor” in determining whether there is a likelihood of consumer confusion.147
Getty Images, in its lawsuit, argues that sometimes generative AI will reproduce something similar to the watermark it embeds in all of its images,148 and that this is a violation of the Lanham Act.149 Because the model does not differentiate the pixels making up a watermark from any other portion of the image, it may inadvertently associate certain text prompts with a pattern that resembles a signature or watermark. Since the rest of the photographs fall under copyright and not trademark,150 this argument focuses on reproduction of the actual mark. But, to violate the Lanham Act, the infringing use of the mark must also be likely to cause confusion, mistake, or deception.151 It does not appear that Stable Diffusion uses Getty Images’ watermarks or trademark in its own marketing materials, so it is unclear whether the fact that it sometimes emerges from user input is enough to show that customers are confused. It remains to be seen whether any customer who is ordering stock photos would actually be confused by the warped versions of the watermark that sometimes emerge from the model. Even if generative AI and Getty Images both court the same customers, that alone probably does not show confusion. It is possible that those customers know the two products are unaffiliated and are merely shopping between competitors.152
Additionally, as with state law, a Lanham Act claim that overlaps with a copyright claim will be preempted by the Copyright Act.153
Another related federal law implicated by generative AI is the Digital Millennium Copyright Act (DMCA).154 The DMCA prohibits removing, altering, or providing false copyright information.155 In particular, a third party is not allowed to distribute works after removing copyright information such as the author, terms and conditions for use of the work, identifying numbers, or symbols.156 Technology that acts to circumvent a technological measure designed to protect a copyright may violate the DMCA.157 So, with respect to images where the watermark and other information about the stock photos are removed, Getty Images suggests that generative AI causes a violation of the DMCA instead of the Lanham Act.158 This argument may face difficulty for many of the same reasons as those under the Copyright Act, namely, that generative AI does not reproduce the underlying identical copyrighted work stripped of the copyright management or other information. Rather, it extracts the relationships between text descriptions and objects or patterns in the training data.
How Can Stakeholders Navigate Legal Changes?
Perhaps the only thing that can be said for certain about law and generative AI is that it presents a major change in the assumptions underlying existing law. Existing cases, especially Authors Guild v. Google and Oracle, seem to suggest that the technology is likely to be found non-infringing as the law is currently constructed. But the Copyright Act and its interpreting case law were not developed in a world where the human creative process could be reverse-engineered and where passable creative works could be created by automation. The law can and arguably should adapt. When technological change renders the literal terms of the federal Copyright Act ambiguous, the Act must be construed in light of its basic purpose.159
It is less clear exactly what the new legal framework should look like. Putting the genie back in the bottle is probably not a realistic option. Even if the Ninth or Second Circuit finds problems with the way in which the current models are trained, that is unlikely to stop the technology. The pace of change is such that any decision could be irrelevant when issued. Different models already exist, operated by private individuals160 or companies not currently being sued. The plaintiffs in the trio of cases pending before the Ninth and Second Circuits were only able to file their lawsuits because the creators of some of the initial generative AI models discussed their training data sets in public. Other creators have not been so open about where they got their training data, and upon seeing those who did get sued, they have no incentive to be. Even if details concerning how other generative AI models are trained are uncovered, or general rules are determined by the courts, the same technology can be repurposed around whatever limitations the courts may impose. Reputable companies could apply the same machine learning techniques to datasets that are curated to remove copyrighted material, or perhaps pay third parties for the right to use their works in training.
Companies might also seek to use the output of the current generative AI systems to train the next generation. There is a lot of it. The number of works produced artificially in the last year may be larger than the original training datasets. While good data on the total sum of material being produced by generative AI is hard to find, Midjourney, a popular image-generating AI, has a current user base of 14.5 million161 and may be producing about 275,000 images per day.162 Dall-E, another image generator, may be producing about 2 million per day.163 Even without considering other commercial image generators or those generating images privately using open-source versions of Stable Diffusion at home, these figures suggest that within the last year or so the volume of AI-generated art may have eclipsed large stock image companies’ portfolios164 and the original LAION image database used for machine learning.165 On the LLM side, ChatGPT has 1.16 billion users and manages 10 million queries per day, each of which is an example of a human conversing with the generative AI.166 Whatever the actual numbers are, the companies operating generative AI already have massive new datasets. Those datasets have already been populated with text prompts describing images, text, or something else. There is plenty of data in the world already perfectly organized to train the next generation of models. If the original models are found to have infringed, would each and every output image necessarily also be infringing, and if so, how could this be policed?
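As a rough check on the scale described above, the daily figures quoted in this section can be annualized. The following is only back-of-the-envelope arithmetic, not a measurement: it assumes the quoted per-day rates hold steady for a full 365-day year, which the cited sources do not establish, and the variable and function names are purely illustrative.

```python
# Daily output estimates as quoted in the text (approximate figures).
MIDJOURNEY_IMAGES_PER_DAY = 275_000
DALLE_IMAGES_PER_DAY = 2_000_000

# Assumption: the daily rate stays constant over a full year.
DAYS_PER_YEAR = 365


def annual_total(per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Scale a constant daily rate up to a yearly total."""
    return per_day * days


midjourney_per_year = annual_total(MIDJOURNEY_IMAGES_PER_DAY)
dalle_per_year = annual_total(DALLE_IMAGES_PER_DAY)
combined_per_year = midjourney_per_year + dalle_per_year

print(f"Midjourney: {midjourney_per_year:,} images per year")
print(f"Dall-E:     {dalle_per_year:,} images per year")
print(f"Combined:   {combined_per_year:,} images per year")
```

On these assumptions, the two services alone would account for roughly 830 million images in a year, before counting other commercial generators or private use of open-source models.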
The task of deciding how this new technology fits into society may well fall to the US Congress, not the courts.167 And, perhaps it should. There is a constituency that is affected by generative AI that is not necessarily represented in court battles between the companies profiting from the technology and the creators whose markets they are disrupting: the end users of generative AI. The purpose of the Copyright Act is to enrich the general public through access to creative works.168 The Act strives for a balance between two competing goals: encouraging and rewarding authors’ creations while also enabling others to build on that work.169 As explained by the CEO of Midjourney, part of their corporate purpose is to “unlock the creativity of ordinary people by giving them tools to make beautiful pictures just by describing them.”170 This view may be self-serving, but that does not necessarily make it incorrect. The existence of generative AI allows those without training, practice, or skill in creative fields to create at least workmanlike artwork, music, or writing. Additionally, there is some evidence that generative AI most benefits low-skilled workers, meaning it might most benefit “those who were left behind in the previous technological era.”171 Is society best served by protecting the value in the work of human creators, or by allowing more humans to generate creative work more quickly using generative AI programs in place of long years of study and practice in their craft?
Where technology causes such rapid societal change, courts are sometimes reluctant to play policymaker.172 When urged by litigants to establish a national policy concerning open access to the Internet, the Ninth Circuit explained:
[T]hat is not our task, and in our quicksilver technological environment it doubtless would be an idle exercise. The history of the Internet is a chronicle of innovation by improvisation, from its genesis as a national defense research network, to a medium of academic exchange, to a hacker cyber-subculture, to the commercial engine for the so-called “New Economy.” Like Heraclitus at the river, we address the Internet aware that courts are ill-suited to fix its flow; instead, we draw our bearings from the legal landscape, and chart a course by the law’s words.173
So too here. Perhaps society’s choices about how humans handle creative control over our work product should be made not by our courts but collectively through our legislatures.
Articles concerning the state of the law are always vulnerable to being rendered irrelevant by new court decisions or legislation, and this one is no exception. It seems inevitable that there will be court decisions or, more likely, legislation that will change how the law interacts with this new technology. It is the author’s hope that this and subsequent articles will help legal practitioners understand the current state of the law and what changes might be needed. Placing faith in the existing copyright laws is likely to be misguided. Generative AI should be recognized as disruptive in a way that few technologies have been. We, as practitioners, should each consider what role we can play in helping society decide how to handle the disruption.
1. Greenemeier, “20 Years after Deep Blue: How AI Has Advanced Since Conquering Chess,” Sci. Am. (June 2, 2017), https://www.scientificamerican.com/article/20-years-after-deep-blue-how-ai-has-advanced-since-conquering-chess; Markoff, “Computer Wins on ‘Jeopardy!’: Trivial, It’s Not,” N.Y. Times (Feb. 16, 2011), https://www.nytimes.com/2011/02/17/science/17jeopardy-watson.html.
2. Stable Diffusion Public Release, https://stability.ai/blog/stable-diffusion-public-release.
3. Stringer and Wiggers, “ChatGPT: Everything you need to know about the AI-powered Chatbot,” TechCrunch (Apr. 25, 2023), https://techcrunch.com/2023/03/30/chatgpt-everything-you-need-to-know-about-the-ai-powered-chatbot.
4. There are also philosophical questions that are beyond the scope of this article concerning whether LLM technology is likely to result in something close to true general artificial intelligence. Some view any statistical methods of predicting text as mere “stochastic parrots” that lack any true understanding of the concepts embodied in language. See Hofstadter, “Artificial Neural Networks Today Are Not Conscious, According to Douglas Hofstadter,” Economist (June 9, 2022), https://www.economist.com/by-invitation/2022/06/09/artificial-neural-networks-today-are-not-conscious-according-to-douglas-hofstadter. Perhaps, but such arguments could make unstated assumptions about where our own human understanding of concepts actually comes from. Extremely complex behavior can emerge from extremely simple interactions. Wolfram, “What is ChatGPT Doing…and Why Does It Work?,” Stephen Wolfram Writings (Feb. 14, 2023), https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work. How far current generative AI will go and its implications for the nature of human intelligence remain to be seen.
5. “Pause Giant AI Experiments: An Open Letter,” Future of Life Institute (Mar. 22, 2023), https://futureoflife.org/open-letter/pause-giant-ai-experiments (signatories include Yoshua Bengio, Turing Award winner; Stuart Russell, one of the founders of modern artificial intelligence research; Elon Musk, the CEO of SpaceX, Tesla, and Twitter; and Andrew Yang, former presidential candidate, among many esteemed others).
6. See, e.g., Davenport and Mittal, “How Generative AI Is Changing Creative Work,” Harv. Bus. Rev. (Nov. 14, 2022), https://hbr.org/2022/11/how-generative-ai-is-changing-creative-work.
7. Williams, “Some companies are already replacing workers with ChatGPT, despite warnings it shouldn’t be relied on for ‘anything important,’” Fortune (Feb. 25, 2023), https://fortune.com/2023/02/25/companies-replacing-workers-chatgpt-ai.
8. Kahn, “GPT-4 debuts and Google beats Microsoft in race to add generative A.I. to consumer office tools,” Fortune (Mar. 14, 2023); Spataro, “Introducing Microsoft 365 Copilot—your copilot for work,” Off. Microsoft Blog (Mar. 16, 2023), https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work.
9. Rosalsky, “This Company Adopted AI. Here’s What Happened To Its Workers,” NPR (May 2, 2023), https://www.npr.org/sections/money/2023/05/02/1172791281/this-company-adopted-ai-heres-what-happened-to-its-human-workers?utm_source=tldrnewsletter; Akash, “Top 10 Start-Ups Powered by GPT-3 That You Should Know in 2023,” Analytics Insight (Feb. 3, 2023), https://www.analyticsinsight.net/top-10-start-ups-powered-by-gpt-3-that-you-should-know-in-2023.
10. Heaven, “Generative AI is changing everything. But what’s left when the hype is gone?,” MIT Tech. Rev. (Dec. 16, 2022), https://www.technologyreview.com/2022/12/16/1065005/generative-ai-revolution-art.
11. Lexis+ AI Product Announcement (May 8, 2023), https://www.lexisnexis.com/en-us/products/lexis-plus-ai.page.
12. The author interviewed Jeff Pfeifer on May 12, 2023, concerning new Lexis products and schedules.
13. Id. For example, firms that are experimenting with using public LLMs might be running serious risks of breaching confidential client data as their prompts are ingested into public models. This issue will be addressed in more detail in subsequent articles focusing on the user side risks of generative AI.
14. “Generative AI Captures Imaginations of Lawyers, Law Students, Consumers Alike,” LexisNexis (Mar. 20, 2023), https://www.lexisnexis.com/community/pressroom/b/news/posts/generative-ai-captures-imagination-of-lawyers-law-students-consumers-alike. In a survey including 1,176 US lawyers, LexisNexis found that 51% had already used it in their work or were planning on doing so. Id.
15. A neural network, basically, is a data structure that organizes information into a series of nodes on a graph. The graph connects the nodes in different ways and with different weights. Input is provided into the first level of the network, the nodes then react to each other, and output comes out depending on those interactions. Machine learning refers to techniques by which the network is tested to see how well it performs and then modified to improve its performance in the next test.
16. For the mathematically savvy reader, a simple way to understand what machine learning is doing is to think of it as extremely complicated curve-fitting. The goal is to define the equation of a curve that best fits the training data. Techniques like gradient descent, moving along the derivative of the error function from the last guess, help this process along. For the non-savvy reader, just think of this process as similar to biological evolution, albeit streamlined, directed, and carried out entirely within the universe of the training data.
17. Hughes, “ChatGPT: Everything you need to know about OpenAI’s GPT-4 tool,” BBC Science Focus (May 5, 2023), https://www.sciencefocus.com/future-technology/gpt-3.
18. “Your AI Pair programmer,” Github CoPilot, https://github.com/features/copilot.
19. Barr, “GPT-4 Is a Giant Black Box and Its Training Data Remains a Mystery,” Gizmodo (Mar. 16, 2023), https://gizmodo.com/chatbot-gpt4-open-ai-ai-bing-microsoft-1850229989.
20. See Carlini et al., “Extracting Training Data from Diffusion Models,” arXiv 3 (Jan. 30, 2023), https://arxiv.org/abs/2301.13188; Baio, “Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator,” Waxy (Aug. 30, 2022), https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator; Searchable Laion Database, https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/images.
21. Bain et al., “Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval,” arXiv (May 13, 2022), https://arxiv.org/abs/2104.00650; HowTo100M, https://www.di.ens.fr/willow/research/howto100m.
22. Tham, “Generating Music Using Deep Learning,” Towards Data Science (Aug. 25, 2021), https://towardsdatascience.com/generating-music-using-deep-learning-cb5843a9d55e.
23. LLMs do not break language into words but into “tokens,” which may be a word, words, or part of a word.
24. See Christiano et al., “Deep Reinforcement Learning from Human Preferences,” arXiv (Feb. 17, 2023), https://arxiv.org/abs/1706.03741 (describing improving models using human feedback).
25. Dilmegani, “LLM Fine Tuning Guide for Enterprises in 2023,” AI Multiple (May 23, 2023), https://research.aimultiple.com/llm-fine-tuning.
26. Note that this is not quite the same thing as producing truthful or accurate information, an issue which will be discussed more deeply in future articles focusing on the risks to end users or businesses who employ LLMs.
27. Russell, “Beyond ChatGPT: Stuart Russell on the Risks and Rewards of A.I.,” Commonwealth Club of Cal., YouTube (Apr. 4, 2023), https://www.youtube.com/watch?v=ow3XrwTmFA8; Xu, “AI Makes Decisions We Don’t Understand. That’s a Problem,” Built In (Jul. 19, 2021), https://builtin.com/artificial-intelligence/ai-right-explanation. With respect to LLMs, it is unknown at this time how much or how little of the underlying thought process behind language has been captured in the form of algorithms hidden in the model. Compare Kosinski, “Theory of Mind May Have Spontaneously Emerged in Large Language Models,” arXiv (Mar. 14, 2023), https://arxiv.org/abs/2302.02083, with Hofstadter, supra note 4, and Bender et al., “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” Ass’n for Computing Mach., conference on fairness, accountability, and transparency, N.Y.C., N.Y. (Mar. 2021). Must the model have constructed some structures similar to the intelligence that produces language in order to properly predict that language, or is it enough to be calculating simple statistics?
28. See Cotra, “Why AI Alignment Could Be Hard With Modern Deep Learning,” Effective Altruism Forum (Sept. 21, 2021), https://forum.effectivealtruism.org/posts/hCsxvMAGpkEuLCE4E/why-ai-alignment-could-be-hard-with-modern-deep-learning; Amodei et al., “Concrete Problems in AI Safety,” arXiv (Jul. 25, 2016), https://arxiv.org/pdf/1606.06565.pdf. Some abstract ways that a model can perform well in training but fail in practice are related to outer alignment problems—where the function used to train the model is encouraging behavior that is not quite the same as what we really want the model to do, and inner alignment problems—where the model spontaneously develops internal algorithms that appear to be doing what the designer wants but are actually doing something else entirely. Hubinger, “Risks from Learned Optimization in Advanced Machine Learning Systems,” arXiv (Dec 2021), https://arxiv.org/pdf/1906.01820.pdf.
29. Ji et al., “Survey of Hallucination in Natural Language Generation,” arXiv (Nov. 7, 2022), https://arxiv.org/pdf/2202.03629.pdf.
30. Anderson v. Stability AI, LTD, No. 3:23-cv-00201 (N.D.Cal. filed Jan. 13, 2023) (artists allege that the use of their copyrighted work in training without their consent constitutes infringement).
31. Doe v. Github, Inc., No. 3:22-cv-06823 (N.D.Cal. filed Nov. 3, 2022) (programmers allege the code they submitted to Github under creative commons or similar open licenses being used to train a code-generating LLM is protected).
32. Getty Images v. Stability AI, No. 1:23-cv-00135 (D.Del. filed Feb. 3, 2023) (a stock photo company alleges that training on its copyrighted photos and the searchable text descriptions that it added is infringement).
33. 17 USC §§ 101 et seq.
34. 17 USC § 102(a).
35. Id. at House Rep. No. 94-1476, https://uscode.house.gov/view.xhtml?path=/prelim@title17/chapter1&edition=prelim. See also Tandy Corp. v. Pers. Micro Comput., Inc., 524 F.Supp. 171, 173 (N.D.Cal. 1981).
36. Apple Comput., Inc. v. Franklin Comput. Corp., 714 F.2d 1240, 1249–55 (3d Cir. 1983), cert. dismissed, 464 U.S. 1033 (1984).
37. US Const. art. I, § 8, cl. 8.
38. 17 USC § 301(a).
39. See Montz v. Pilgrim Films & TV, Inc., 649 F.3d 975, 979 (9th Cir. 2011) (en banc).
40. Genius Media Grp. v. Google LLC & Lyricfind, No. 19-CV-7279, 2020 U.S. Dist. LEXIS 173196, at *13–14 (E.D.N.Y. Aug. 10, 2020). See also Briarpatch Ltd., L.P. v. Phoenix Pictures, Inc., 373 F.3d 296, 305–06 (2d Cir. 2004); Kennedy v. LaCasse, No. 17-CV-2970, 2017 WL 3098107, at *15 (S.D.N.Y. July 20, 2017).
41. 17 USC § 106.
42. 17 USC § 106(1)–(6).
43. See, e.g., Columbia Pictures Indus., Inc. v. Gary Fung, 710 F.3d 1020, 1034 (9th Cir. 2013). Cf. IMAPizza, LLC v. At Pizza, Ltd., 965 F.3d 871, 877 (D.C.Cir. 2020).
44. Rimini St. v. Oracle Int’l Corp., 473 F.Supp.3d 1158, 1203 (D.Nev. 2020) (citing 17 USC § 101). See also MDY Indus., LLC v. Blizzard Entm’t Inc., 629 F.3d 928, 938 (9th Cir. 2010).
45. Eldred v. Ashcroft, 537 U.S. 186, 219 (2003); Harper & Row, Publishers, Inc. v. Nation Enters., 471 U.S. 539, 562 (1985).
46. Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, No. 21-869, 2023 U.S. LEXIS 2061 (May 18, 2023), 598 U.S. __, aff’g 11 F.4th 26 (2d Cir. 2021), slip op (Gorsuch, J. concurring); Feist Publ’ns, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340, 361 (1991).
47. See, e.g., Complaint ¶¶ 210, 221, Anderson, No. 3:23-cv-00201 (N.D.Cal. filed Jan. 13, 2023), doc. no. 1.
48. 17 USC § 102(a).
49. 17 USC § 102(b). See also Baker v. Selden, 101 U.S. 99 (1880) (recently cited by Google LLC v. Oracle Am. Inc., 141 S.Ct. 1183, 1213 (U.S. 2021)); Green v. Luby, 177 F. 287 (Cir.Ct.S.D.N.Y. 1909); Bloom & Hamlin v. Nixon, 125 F. 977 (E.D.Pa. 1903).
50. Midler v. Ford Motor Co., 849 F.2d 460 (9th Cir. 1988), applied in Lewis v. Activision Blizzard, Inc., No. C12-1096, 2012 U.S.Dist. LEXIS 151739 (N.D.Cal. Oct. 22, 2012).
51. Bear in mind, though, that if a creator uses the unique style of an artist and attempts to fraudulently pass off the work as made by the original artist, claims other than copyright may apply, such as fraud or deceptive trade practice. Many acts that might otherwise be legal become actionable when lying is added to the mix. This idea will be developed further in the follow-up article about individual user risks and liabilities.
52. Oracle Am., 141 S.Ct. at 1198.
53. Id. at 1198–1201.
55. Id. at 1202.
57. Franklin Mint Corp. v. Nat’l Wildlife Art Exch., Inc., 575 F.2d 62, 65 (3d Cir. 1978) (the more unique an artist’s style, the more difficult it would be to infringe on it by copying).
58. 17 USC § 107.
59. Latimer v. Roaring Toyz, Inc., 601 F.3d 1224 (11th Cir. 2010).
60. 17 USC § 107.
61. See, e.g., Authors Guild v. Google, Inc., 804 F.3d 202, 215 (2d Cir. 2015).
62. Murphy v. Millennium Radio Grp. LLC, 650 F.3d 295, 307 (3d Cir. 2011) (news reporting is not a blanket exception).
63. 17 USC § 107.
64. Authors Guild v. Google, 804 F.3d at 213.
65. Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 577 (1994) (quoting H.R. Rep. No. 94-1476, at 66 (1976) and S. Rep. No. 94-473, at 62 (1975), US Code Cong. & Admin. News 5659, 5679 (1976)).
66. Harper & Row, Publishers, 471 U.S. at 562.
67. See Infinity Broad. Corp. v. Kirkwood, 150 F.3d 104, 109 (2d Cir. 1998); Religious Tech. Ctr. v. Lerma, 897 F.Supp. 260, 264 (E.D.Va. 1995).
68. Infinity Broad. Corp., 150 F.3d at 109 (citing Campbell, 510 U.S. at 584). See also Castle Rock Ent., Inc. v. Carol Publ’g Grp., Inc., 150 F.3d 132 (2d Cir. 1998) (court will “not make too much of” commercial use because “no man but a blockhead ever wrote, except for money”).
69. Campbell, 510 U.S. at 579.
70. Warhol, No. 21-869, 2023 U.S. LEXIS 2061.
71. Id. at *2.
72. Id. at *5.
73. Id. at *28 (quoting Brief for Petitioner at 44).
74. Id. at *33, 38.
75. Id. at *26. See also id. at *6 (Gorsuch, J. concurring).
76. Campbell, 510 U.S. at 579; Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 11 F.4th 26, 37 (2d Cir. 2021), aff’d, 598 U.S. ___ (2023); Seltzer v. Green Day, Inc., 725 F.3d 1170 (9th Cir. 2013).
77. See Warhol, 11 F.4th at 40; Leibovitz v. Paramount Pictures Corp., 137 F.3d 109, 113–14 (2d Cir. 1998); Cariou v. Prince, 714 F.3d 694, 708 (2d Cir. 2013). See also Brownmark Films, LLC v. Comedy Partners, 682 F.3d 687 (7th Cir. 2012).
78. A&M Records v. Napster, Inc., 239 F.3d 1004, 1015 (9th Cir. 2001).
79. Compare Cariou, 714 F.3d at 708 with Kienitz v. Sconnie Nation, LLC, 766 F.3d 756, 758 (7th Cir. 2014).
80. Apple Inc. v. Corellium, Inc., No. 21-12835, 2023 U.S.App. LEXIS 11225 (11th Cir. May 8, 2023), not released for publication.
81. Warhol, 11 F.4th at 39–40.
82. Authors Guild v. Google, 804 F.3d 202.
83. Kasdan and Pattengale, “A Look At Future AI Questions for the US Copyright Office,” Law 360 (Nov. 10, 2022), https://www.law360.com/appellate/articles/1547912. The authors here also discuss a point that will be addressed in future articles, namely, whether and how works created using generative AI will themselves be protected by copyright.
84. Complaint ¶ 75, Anderson, No. 3:23-cv-00201 (N.D.Cal. filed Jan. 13, 2023). The Anderson plaintiffs are not alone in this suggestion, but other commentators who advance the theory make it clear they are doing so only as an analogy, not as a technical truism. Chiang, “ChatGPT Is a Blurry JPEG of the Web,” New Yorker (Feb. 9, 2023), https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web.
85. See Levendowski, “How Copyright Law Can Fix Artificial Intelligence’s Implicit Bias Problem,” 93 Wash. L. Rev. 579, 624 n.221 (2018) (training AI may be different from an overt, primary use of the work).
86. Mahdi et al., “Implementing a Novel Approach an[d] Convert[ing] Auto Compression to Text Coding via Hybrid Technique,” 9 Int’l J. of Comput. Sci. 53, 53–59 (Nov. 2012), http://ijcsi.org/papers/IJCSI-9-6-3-53-59.pdf.
87. See generally Nelson and Gailly, The Data Compression Book (2d ed. Wiley 1995).
88. See generally Chaitin, MetaMath! The Quest for Omega (Pantheon 2005).
89. See, e.g., Edwards, “‘Too Easy’—Midjourney Tests Dramatic New Version of its AI Image Generator,” Ars Technica (Nov. 9, 2022), https://arstechnica.com/information-technology/2022/11/midjourney-turns-heads-with-quality-leap-in-new-ai-image-generator-version.
90. Complaint ¶ 90, Doe v. Github, No. 3:22-cv-06823 (N.D.Cal. filed Nov. 10, 2022), doc. no. 1 (citing Github Twitter post describing their internal metrics, https://twitter.com/ChrisGr93091552/status/1539731632931803137). See also Complaint ¶ 93, Anderson, No. 3:23-cv-00201 (noting that no output image is likely to match any specific image in the training data).
91. See Bühlmann, “Stable Diffusion Based Image Compression,” Towards AI (Sept. 19, 2022), https://pub.towardsai.net/stable-diffusion-based-image-compresssion-6f1f0a399202.
94. See Sohl-Dickstein et al., “Deep Unsupervised Learning Using Nonequilibrium Thermodynamics,” arXiv (Mar. 12, 2015), https://arxiv.org/abs/1503.03585 (describing the process of associating prompts with outputs).
95. See Authors Guild v. Google, 804 F.3d at 215.
96. Cariou, 714 F.3d at 710. It is also interesting to note that the Anderson plaintiffs call image-based generative AI merely a “collage” tool, but the court in Cariou reasoned that some collages were fair use and some were not, depending on how much of the original was copied. Id.
97. Sag, “Copyright and Copy-Reliant Technology,” 103 NW. Univ.L.Rev. 1607, 1608 (2009). This sentiment is echoed by Midjourney, one of the defendants in the Anderson case, which contends that “any one image comprises an infinitesimal fragment of a model’s training, just as each visual (every face, sunset, painting) an artist has ever perceived and every text a writer has ever read comprises a tiny fraction of the content and imagery that inform[s] their imagination.” See Defendant Midjourney, Inc.’s Notice of Motion and Motion to Dismiss [. . .] 1, Anderson, No. 3:23-cv-00201 (N.D.Cal. filed Apr. 18, 2023), doc. no. 52.
98. Authors Guild v. Google, 804 F.3d at 217.
99. Id. at 209.
101. Id. at 217.
102. Id. at 216.
103. Id. at 216–17.
104. Id. (citing Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014)).
105. Warhol, 11 F.4th at 45 (scope of fair use is narrower with respect to unpublished works). See Harper & Row, Publishers, 471 U.S. at 564; Religious Tech. Ctr. v. Lerma, 897 F.Supp. 260, 264 (E.D.Va. 1995).
106. See, e.g., Peter Letterese & Assocs. v. World Inst. of Scientology Enters., 533 F.3d 1287, 1312 (11th Cir. 2008). See also Worldwide Church of God v. Phila. Church of God, Inc., 227 F.3d 1110, 1118 (9th Cir. 2000).
107. Kienitz, 766 F.3d at 759. In this case, the defendant modified a low-resolution photo of the mayor of Madison, Wisconsin and printed it on a t-shirt to protest the mayor’s attempt to shut down a yearly block party. Judge Easterbrook explained that “defendants started with a low-resolution version posted on the City’s website, so much of the original’s detail never had a chance to reach the copy; the original’s background is gone; its colors and shading are gone; the expression in [the mayor’s] eyes can no longer be read; after the posterization (and reproduction by silk-screening), the effect of the lighting in the original is almost extinguished. What is left, besides a hint of [his] smile, is the outline of his face, which can’t be copyrighted.” Id.
108. TCA TV Corp. v. McCollum, 839 F.3d 168, 182 (2d Cir. 2016).
109. Campbell, 510 U.S. at 587. See also Harper & Row, Publishers, 471 U.S. at 564–66, 568.
110. This point has been raised by the defendants in Anderson in dispositive motions, so court guidance may be available sooner than on other issues in that case. See Defendant Midjourney, Inc.’s Notice of Motion and Motion to Dismiss [. . .], Anderson, No. 3:23-cv-00201.
111. See Doe v. Github, No. 3:22-cv-06823; Anderson, No. 3:23-cv-00201, supra note 90.
112. Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146, 1164–65 (9th Cir. 2007); Kelly v. Arriba Soft Corp., 336 F.3d 811, 820–21 (9th Cir. 2003). The recent Warhol decision seems to agree with these cases, noting that if the final commercial or other use is transformative and the copying is done “only insofar as needed” to achieve the new purpose, it may not be infringing. Warhol, No. 21-869, 2023 U.S. LEXIS 2061, at *42 n.18.
113. Perfect 10, Inc., 508 F.3d at 1164–65.
114. Sega Enters. v. Accolade, Inc., 977 F.2d 1510, 1522 (9th Cir. 1992).
115. Sony Computer Ent., Inc. v. Connectix Corp., 203 F.3d 596, 602–03 (9th Cir. 2000).
116. See id. See also Atari Games v. Nintendo of Am., Inc., 975 F.2d 832, 843 (Fed.Cir. 1992). Note however, that contracts or license agreements can still independently prohibit reverse engineering even though it is otherwise fair use. Bowers v. Bayside Techs., Inc., 320 F.3d 1317, 1325 (Fed.Cir. 2003).
117. This analogy seems even more accurate in light of some studies in which generative AI was trained to produce images not based on text prompts, but on MRI information from brains of human beings thinking about images. See Parshall, “AI Can Re-create What You See From a Brain Scan,” Sci. Am. (Mar. 17, 2023), https://www.scientificamerican.com/article/ai-can-re-create-what-you-see-from-a-brain-scan.
118. Harper & Row, Publishers, 471 U.S. at 566 (citation omitted). See also, e.g., Fox News Network, LLC v. TVEyes, Inc., 883 F.3d 169, 179 (2d Cir. 2018).
119. See, e.g., Campbell, 510 U.S. at 590–92; Ringgold v. Black Entm’t TV, Inc., 126 F.3d 70, 81 (2d Cir. 1997); Castle Rock Ent., Inc. v. Carol Publ’g Grp., Inc., 150 F.3d 132, 145 (2d Cir. 1998).
120. See Campbell, 510 U.S. at 591.
121. Sunray, “Train in Vain: A Theoretical Assessment of Intermediate Copying and Fair Use in Machine AI Music Generator Training,” 13 Am.U.Intell.Prop. Brief 1 (Dec. 2021).
122. Al-Sibai, “Online Tutoring Company Stock Crashes as ChatGPT Steamrolls Its Business,” Yahoo News (May 3, 2023), https://news.yahoo.com/online-tutoring-company-stock-crashes-150308879.html.
123. De Cremer et al., “How Generative AI Could Disrupt Creative Work,” Harv. Bus. Rev. (Apr. 13, 2023), https://hbr.org/2023/04/how-generative-ai-could-disrupt-creative-work.
124. Sadeghi and Arvanitis, “Rise of the Newsbots: AI-Generated News Websites Proliferating Online,” NewsGuard (May 1, 2023), https://www.newsguardtech.com/special-reports/newsbots-ai-generated-news-websites-proliferating.
125. Pennington, “Why Is Generative AI Considered an Economic Gamechanger?” NAB Amplify (Nov. 29, 2022), https://amplify.nabshow.com/articles/ic-how-is-generative-ai-a-gamechanger-for-creatives.
126. Brynjolfsson and Raymond, “Generative AI At Work,” Nat’l Bureau of Econ. Rsch. (Apr. 2023), https://www.nber.org/papers/w31161.
127. Noy and Zhang, “Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence” (working paper, MIT, Mar. 2, 2023), https://economics.mit.edu/sites/default/files/inline-files/Noy_Zhang_1.pdf.
128. See De Cremer, supra note 123.
129. Acovino and Abdullah, “Sci-Fi Magazine Stops Submissions After Flood of AI Generated Stories,” NPR (Feb. 23, 2023), https://www.npr.org/2023/02/23/1159118948/sci-fi-magazine-stops-submissions-after-flood-of-ai-generated-stories.
130. Ivanenko, “AI-Generated Fiction is Flooding Literary Magazines,” Mezha (Feb. 28, 2023), https://mezha.media/en/2023/02/28/ai-generated-fiction-is-flooding-literary-magazines; Oremus, “He Wrote a Book on a Rare Subject. Then a ChatGPT Replica Appeared on Amazon,” Wash. Post (May 5, 2023), https://www.washingtonpost.com/technology/2023/05/05/ai-spam-websites-books-chatgpt.
131. Johnson, “Spotify Removes ‘Tens of Thousands’ of AI-Generated Songs: Here’s Why,” Forbes (May 9, 2023), https://www.forbes.com/sites/ariannajohnson/2023/05/09/spotify-removes-tens-of-thousands-of-ai-generated-songs-heres-why/?sh=59a0afb94f4a. Johnson reports that Spotify removed 7% of songs created with AI and this alone represented tens of thousands of works.
132. “Google’s Search Guidance About AI-Generated Content,” Google Search Central Blog (Feb. 8, 2023), https://developers.google.com/search/blog/2023/02/google-search-and-ai-content.
133. Campbell, 510 U.S. at 591.
134. See Abbott Labs. v. Brennan, 952 F.2d 1346, 1355 (Fed.Cir. 1991) (there is no “absolute presumption of market power for copyright . . . product”).
135. Acemoglu and Restrepo, “Automation and New Tasks: How Technology Displaces and Reinstates Labor,” IZA Inst. of Lab. Econ. 3 (Apr. 2019), https://docs.iza.org/dp12293.pdf.
136. Campbell, 510 U.S. at 591.
138. Bill Graham Archives v. Dorling Kindersley, Ltd., 448 F.3d 605, 614–15 (2d Cir. 2006).
139. Complaint ¶ 4, Getty Images, No. 1:23-cv-00135 (D.Del. filed Feb. 3, 2023).
140. Id. ¶¶ 18–22.
141. Id. ¶ 49.
142. 15 USC § 1114(1). Future articles will discuss in more detail the potential for fraud in connection with generative AI.
143. 15 USC § 1127; Dastar Corp. v. Twentieth Century Fox Film Corp., 539 U.S. 23, 28 (2003). Some commentators suggest that courts are increasingly willing to allow trademark and copyright to protect the same thing, however. Calboli, “Overlapping Copyright and Trademark Protection: A Call for Concern and Action,” 2014 Ill. L. Rev. Slip Ops. 25, 26 (2014).
144. EMI Catalogue P’ship v. Hill, Colliday, Connors, Cosmopulos, Inc., 228 F.3d 56, 63 (2d Cir. 2000), amended opinion reported at 2000 U.S.App.LEXIS 30761, at *15.
145. Romag Fasteners, Inc. v. Fossil Grp., Inc., 140 S.Ct. 1492 (2020).
146. Motus, LLC v. Cardata Consultants, Inc., 23 F.4th 115 (1st Cir. 2022).
147. 1-800 Contacts, Inc. v. Lens.Com, Inc., 722 F.3d 1229, 1243 (10th Cir. 2013).
148. Complaint ¶ 31, Getty Images, No. 1:23-cv-00135. Trademark is, of course, another form of protected intellectual property. 17 USC §§ 1201 et seq.
149. Complaint ¶ 52, Getty Images, No. 1:23-cv-00135.
150. See Dastar Corp., 539 U.S. at 35 (rejecting overly broad interpretation of “origin” under Lanham Act to include substance of a copyrighted work).
151. 15 USC § 1114(1). See also Two Pesos, Inc. v. Taco Cabana, Inc., 505 U.S. 763, 769 (1992).
152. 1-800 Contacts, Inc., 722 F.3d at 1244.
153. Dastar, 539 U.S. 23.
154. 17 USC §§ 1201 et seq.
155. 17 USC § 1202.
156. 17 USC § 1202(b); Stevens v. CoreLogic, Inc., 899 F.3d 666, 673 (9th Cir. 2018).
157. 17 USC § 1201(a)(1)(A); Dish Network, L.L.C. v. SatFTA, No. 5:08-cv-01561, 2011 U.S.Dist. LEXIS 25038 (N.D.Cal. Mar. 9, 2011).
158. Complaint ¶ 58, Getty Images, No. 1:23-cv-00135.
159. Twentieth Century Music Corp. v. Aiken, 422 U.S. 151, 156 (1975), superseded by statute as stated in Crabshaw Music v. K-Bob’s of El Paso, Inc., 744 F.Supp. 763, 766 (W.D.Tex. 1990).
160. Individuals can download custom LLM models trained using ChatGPT or the Stable Diffusion model. See, e.g., Sha, “How to Run a ChatGPT-Like LLM On Your PC Offline,” Beebom (Mar. 29, 2023), https://beebom.com/how-run-chatgpt-like-language-model-pc-offline; Pocock, “How to Install Stable Diffusion On Windows,” PC Guide (May 10, 2023), https://www.pcguide.com/apps/how-to/how-to-install-stable-diffusion-on-windows.
161. Wilson, “Midjourney Statistics: Users, Polls & Growth,” Approachable AI (Apr. 28, 2023), https://approachableai.com/midjourney-statistics.
162. Heidorn, “Mind-Boggling Midjourney Statistics In 2023,” Tokenized (May 10, 2023), https://tokenizedhq.com/midjourney-statistics.
163. Dorrier, “OpenAI Says DALL-E Is Generating Over 2 Million Images a Day—and That’s Just Table Stakes,” Singularity Hub (Oct. 3, 2022), https://singularityhub.com/2022/10/03/openai-says-dall-e-is-generating-over-2-million-images-a-day-and-thats-just-table-stakes.
164. Getty Images has a catalog of about 80 million images, and Shutterstock, a rival of Getty Images, has about 415 million. Id.
165. Schuhmann, “Laion-400-Million Open Dataset,” LAION (Aug. 20, 2021), https://laion.ai/blog/laion-400-open-dataset.
166. Ruby, “57+ ChatGPT Statistics 2023,” DemandSage (May 18, 2023), https://www.demandsage.com/chatgpt-statistics.
167. Congress has done this before, adopting the Digital Millennium Copyright Act to impose new guidelines and protections for creators that became important due to the forward march of technology. 17 USC § 1201. Colorado Senator Michael Bennet recently announced a plan to introduce legislation seeking to establish a task force to investigate whether the federal government’s AI tools and policies respect civil rights, civil liberties, privacy, and due process. Bennet, “Bennet Introduces Legislation to Stand Up An AI Task Force to Ensure Responsible Use of The Technology By the Federal Government,” https://www.bennet.senate.gov/public/index.cfm/2023/4/bennet-introduces-legislation-to-stand-up-an-ai-task-force-to-ensure-responsible-use-of-the-technology-by-the-federal-government; ASSESS AI Act, S. 1356, 118th Cong. (2023), https://www.bennet.senate.gov/public/_cache/files/5/3/5331567f-e0a0-4fe5-8ddc-58878b2780c2/2A233A46E0482BF9807A25FB0B4B8B65.assess-ai-text.pdf. Something similar could well be done to handle the intellectual property challenges posed by generative AI.
168. Kirtsaeng v. John Wiley & Sons, Inc., 579 U.S. 197, 204 (2016) (citing US Const., art. I, § 8, cl. 8 (“To promote the progress of science and the useful arts . . . .”)).
169. Kirtsaeng (citing Fogerty v. Fantasy, Inc., 510 U.S. 517, 526 (1994)).
170. Salkowitz, “Midjourney Founder David Holz On The Impact of AI On Art, Imagination and the Creative Economy,” Forbes (Sept. 16, 2022), https://www.forbes.com/sites/robsalkowitz/2022/09/16/midjourney-founder-david-holz-on-the-impact-of-ai-on-art-imagination-and-the-creative-economy/?sh=4e0d67352d2b.
171. Rosalsky, supra note 9.
172. AT&T Corp. v. City of Portland, 216 F.3d 871, 876 (9th Cir. 2000), overruled in part by Nat’l Cable & Telecomms. Ass’n v. Brand X Internet Servs., 545 U.S. 967 (2005).
173. Id. Heraclitus, a Greek philosopher, lived in Ephesus of Asia Minor (present-day Turkey) in the 6th century BCE. One of Heraclitus’ most famous sayings is that “no one ever steps in the same river twice.” See Chaliakopoulos, “Heraclitus of Ephesus: The Philosopher of Change (Bio & Quotes),” The Collector (Mar. 5, 2023), https://www.thecollector.com/greek-philosopher-heraclitus-ephesus-quotes.