As someone who isn't invested in either side, from what I can tell at least some portion of the pro-side are of the opinion that it doesn't count as theft- that the learning algorithm is just doing what humans do when they study art to learn, or that it's not copying the data in a way that is plagiarism or theft to begin with, or something. I haven't the faintest idea whether that's actually accurate or not- I do not have a very good understanding (or really any) of how the tech works- but at the very least some of the arguments for AI seem to be coming from a place of 'this isn't actually wrong', not 'we don't care if it's wrong'.
I suppose I can address this.
There are three distinct questions. Does it count as theft (legally)? Does it count as theft (ethically)? And just what does the AI actually do, anyway?
Let's start with the third. As I previously said, any claim that AI works like any specific thing other than itself is false. As you might guess, it does not learn the way humans do... but it also isn't clip-art, or any other thing you can put in one sentence.
Simplifying, the procedure goes something like this (there's a rough code sketch after the list):
- First, you need a huge library of pictures. It doesn't necessarily need to be labelled, but you do need, at a minimum, millions of them. Tens of millions, if you want a result close to the state of the art.
- You train the AI. The way this works is:
- Go through the pictures one by one.
- For each picture, add a small amount of noise to it.
- Ask the AI to remove the noise. Its weights start out random, so it will fail.
- Treat the AI as one giant linear-algebra expression. Using calculus (this is backpropagation), compute the derivative of the error with respect to its weights. Use that to very slightly adjust the weights, such that if you feed it the same noised picture again you'll get something very slightly closer to the original.
- This is around ten bits of adjustment for each input picture...
- Repeat with varying levels of noise, up to and including "The entire picture is gone."
- Repeat until you've gone through every picture.
- Find a dataset of labelled pictures, which pair each picture with an English-language description.
- Repeat the whole procedure, this time using the labels as guidance to help the AI learn to generate images based on descriptions.
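If it helps to see the shape of that loop, here's a toy version in PyTorch. To be clear, this is a sketch under my own assumptions, not Stable Diffusion: TinyDenoiser, the fake dataset and the noise levels are stand-ins I made up, and real systems add noise schedules, latent spaces, text encoders and a great deal more. The point is just the loop itself: noise a picture, ask the network to undo it, nudge the weights a tiny bit, repeat.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the real denoising network (a large UNet in practice)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, noisy_image):
        # Predict the noise that was added (a common parameterisation).
        return self.net(noisy_image)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Pretend dataset: random tensors standing in for millions of real pictures.
fake_dataset = [torch.rand(1, 3, 64, 64) for _ in range(8)]

for image in fake_dataset:                   # go through the pictures one by one
    for noise_level in (0.1, 0.5, 1.0):      # varying levels of noise
        noise = torch.randn_like(image)
        noisy = image + noise_level * noise  # add noise to the picture

        predicted_noise = model(noisy)       # ask the AI to remove the noise
        loss = nn.functional.mse_loss(predicted_noise, noise)

        optimizer.zero_grad()
        loss.backward()                      # compute the derivative
        optimizer.step()                     # very slightly adjust the weights
```

The captioned second stage is the same loop with a text embedding fed into the network alongside the noisy picture; nothing about the basic structure changes.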
Where are the pictures, once this is done? ...well, nowhere. AI can be described as compression, but what this is doing isn't really a compression algorithm except by accident. Most of the pretraining ends up teaching the AI how gravity, light, textures and so on work. The labelled fine-tuning is done on what's often a smaller part of the network, and sometimes with a lower learning rate, and isn't supposed to add any capabilities besides the language association. (It does anyway, because we're not great at this. But it doesn't need to, for the process to work.)
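To put rough numbers on that 'nowhere' (and on the 'ten bits' above): the figures below are order-of-magnitude guesses for a Stable-Diffusion-class model, roughly a billion parameters trained on a couple of billion images, so treat them as a back-of-envelope check rather than gospel.

```python
# Back-of-envelope: how much model capacity exists per training image?
parameters = 1e9          # assumed: ~a billion weights in the denoising network
images = 2e9              # assumed: ~two billion training pictures
bits_per_parameter = 16   # half-precision storage

bits_per_image = parameters * bits_per_parameter / images
print(f"~{bits_per_image:.0f} bits of model capacity per training image")  # ~8

typical_photo_bits = 500_000 * 8  # a ~500 kB JPEG
print(f"versus ~{typical_photo_bits:,} bits for one ordinary photo")
```

Even if every single bit of the model were spent memorising, there simply isn't room for the training set to be 'in there'.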
- Can you get the training data back out from the AI, by giving it the right prompt?
Sometimes. For a few pictures that were repeated often enough in the training data, the tiny per-image adjustments add up across the repetitions until the AI learns to generate that specific picture. The smaller your training set is (relative to the AI), the more likely that is to happen.
- Do you need any actual art to do this?
Well, not really. Most of the training process is aimed at teaching the AI basic concepts about physics and reality, not styles or anything. A television stream would do fine, so long as you're okay with something that's only capable of photorealistic outputs. Though by the same token, 'style' is the only thing it really learns from art -- if you include tons of art in the unlabelled training data (though, um, I haven't checked whether Stable Diffusion actually used that stage), then the only thing it'll learn from the art is how to draw, in general.
- Does this match how humans learn?
Soooooort of. To the degree it does, it matches how newborn babies learn. Tons of unlabelled or poorly labelled training data -- because sure, we talk to babies, but they don't understand us -- which is used for the first-stage pretraining. The second stage, captioning (that is, conditional generation), doesn't need nearly as much input, but still needs quite a bit; and if that matches any stage of human maturation at all, it'd be toddlers.
Once you get to Dreambooth, LoRAs, textual inversion and such, you're at a stage where huge quantities of pictures are no longer needed. Ten, twenty pictures are enough to make a decent character LoRA. Though because the AI itself is missing an LLM module (and a long-term memory or reference-image input -- no, img2img doesn't count), it isn't really capable of consistently creating the same characters, or of listening when you tell it to fix the hands.
Actual human artists are typically teenagers or adults, who are far past this stage of learning. It's hard to say how far away we are from AGI, but... this isn't it.
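Since LoRAs came up just above, the trick behind them is small enough to show. You freeze the whole pretrained network and bolt a tiny low-rank correction onto some of its weight matrices; only that correction gets trained, which is why ten or twenty pictures can be enough. Below is a generic sketch of the idea in PyTorch -- not any particular library's implementation; LoRALinear, the rank and the layer sizes are all mine.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small trainable low-rank correction."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the original weights stay frozen

        in_f, out_f = base.in_features, base.out_features
        # Two skinny matrices; their product is the low-rank nudge to the weights.
        self.down = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.up = nn.Parameter(torch.zeros(out_f, rank))  # zero: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.down.T @ self.up.T)

# Wrap one layer of a frozen model; only the adapter's ~6k numbers get trained,
# versus ~590k in the frozen layer underneath.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters in this layer: {trainable}")
```

Because the trainable part is so small, a handful of example pictures is enough to steer it -- but everything it 'knows' still comes from the frozen base model underneath.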
- Does this match clipart? Collages? Anything I have experience with?
Uh, no. No, the only thing it really matches is "A giant pile of linear algebra". Aaaaand possibly newborns, but really, it'd be newborn... mice, maybe? Human babies have enormously larger brains, even if you buy the idea that these are comparable. (There's a bunch of ways in which they're not.)
- Is it legal?
Yes, probably. Court cases are in progress, but style has never been copyrightable, and the created AI models do not in fact contain copies. Arguably the ability to output exact copies of training data does count as copyright infringement- but since you have to try very hard in order to do so, chances are the court will decide that's on the person doing it, not the person making the AI.
The ability for LoRAs et al. (or NovelAI-diffusion...) to create highly recognisable pictures of copyrighted anime characters is more concerning, but falls in the same bucket as fan-art in general. This is definitely not the right site to advocate for making that illegal, to the degree it isn't already.
- Is it ethical?
For what purpose? I'm generally against making a technology illegal. There are (and will be) huge issues with corporations abusing it, which was predictable centuries in advance, but- nobody in this thread is talking to a corporation. This isn't a simple question, and discussions like that are never helped by letting emotions run high. But also-
- Is it stoppable?
¯\_(ツ)_/¯
If I knew how to do that, I would have done it already. I'm more than willing to give up my ability to illustrate stories, if it means we stop and think about this. I'm less willing to do so if my giving it up is the only thing that would actually change.