As someone who isn't invested in either side, from what I can tell at least some portion of the pro-side are of the opinion that it doesn't count as theft- that the learning algorithm is just doing what humans do when they study art to learn, or that it's not copying the data in a way that is plagiarism or theft to begin with, or something. I haven't the faintest idea whether that's actually accurate or not- I do not have a very good understanding (or really any) of how the tech works- but at the very least some of the arguments for AI seem to be coming from a place of 'this isn't actually wrong', not 'we don't care if it's wrong'.
I suppose I can address this.
There are three distinct questions. Does it count as theft (legally)? Does it count as theft (ethically)? And just what does the AI actually do, anyway?
Let's start with the third. As I previously said, any claim that AI works like any specific thing other than itself is false. As you might guess, it does not learn the way humans do... but it also isn't clip-art, or any other thing you can put in one sentence.
Simplifying, the procedure goes something like this (there's a rough code sketch after the list):
- First, you need a huge library of pictures. It doesn't necessarily need to be labelled, but you do need, at a minimum, millions of them. Tens of millions, if you want a result close to the state of the art.
- You train the AI. The way this works is:
- Go through the pictures one by one.
- For each picture, add a small amount of noise to it.
- Ask the AI to remove the noise. Its weights start out random, so it will fail.
- Treat the AI as one giant linear-algebra expression. Using calculus (this is backpropagation), compute the derivative of the error with respect to its weights. Use that to very slightly adjust the weights, such that if you feed it the same noised picture again you'll get something very slightly closer to the original.
- This is around ten bits of adjustment for each input picture...
- Repeat with varying levels of noise, up to and including "The entire picture is gone."
- Repeat until you've gone through every picture.
- Find a dataset of labelled pictures, which pair each picture with an English-language description.
- Repeat the whole procedure, this time using the labels as guidance to help the AI learn to generate images based on descriptions.
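If it helps to see the shape of that loop, here's a toy version in PyTorch. To be clear, this is a sketch under my own assumptions, not Stable Diffusion: TinyDenoiser, the fake dataset and the noise levels are stand-ins I made up, and real systems add noise schedules, latent spaces, text encoders and a great deal more. The point is just the loop itself: noise a picture, ask the network to undo it, nudge the weights a tiny bit, repeat.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for the real denoising network (a large UNet in practice)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, noisy_image):
        # Predict the noise that was added (a common parameterisation).
        return self.net(noisy_image)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Pretend dataset: random tensors standing in for millions of real pictures.
fake_dataset = [torch.rand(1, 3, 64, 64) for _ in range(8)]

for image in fake_dataset:                   # go through the pictures one by one
    for noise_level in (0.1, 0.5, 1.0):      # varying levels of noise
        noise = torch.randn_like(image)
        noisy = image + noise_level * noise  # add noise to the picture

        predicted_noise = model(noisy)       # ask the AI to remove the noise
        loss = nn.functional.mse_loss(predicted_noise, noise)

        optimizer.zero_grad()
        loss.backward()                      # compute the derivative
        optimizer.step()                     # very slightly adjust the weights
```

The captioned second stage is the same loop with a text embedding fed into the network alongside the noisy picture; nothing about the basic structure changes.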
Where are the pictures, once this is done? ...well, nowhere. AI can be described as compression, but what this is doing isn't really a compression algorithm except by accident. Most of the pretraining ends up teaching the AI how gravity, light, textures and so on work. The labelled fine-tuning is done on what's often a smaller part of the network, and sometimes with a lower learning rate, and isn't supposed to add any capabilities besides the language association. (It does anyway, because we're not great at this. But it doesn't need to, for the process to work.)
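To put rough numbers on that 'nowhere' (and on the 'ten bits' above): the figures below are order-of-magnitude guesses for a Stable-Diffusion-class model, roughly a billion parameters trained on a couple of billion images, so treat them as a back-of-envelope check rather than gospel.

```python
# Back-of-envelope: how much model capacity exists per training image?
parameters = 1e9          # assumed: ~a billion weights in the denoising network
images = 2e9              # assumed: ~two billion training pictures
bits_per_parameter = 16   # half-precision storage

bits_per_image = parameters * bits_per_parameter / images
print(f"~{bits_per_image:.0f} bits of model capacity per training image")  # ~8

typical_photo_bits = 500_000 * 8  # a ~500 kB JPEG
print(f"versus ~{typical_photo_bits:,} bits for one ordinary photo")
```

Even if every single bit of the model were spent memorising, there simply isn't room for the training set to be 'in there'.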
- Can you get the training data back out from the AI, by giving it the right prompt?
Sometimes. For a few pictures that were repeated often enough in the training data, the tiny per-image adjustments add up across the repetitions until the AI learns to generate that specific picture. The smaller your training set is (relative to the AI), the more likely that is to happen.
- Do you need any actual art to do this?
Well, not really. Most of the training process is aimed at teaching the AI basic concepts about physics and reality, not styles or anything. A television stream would do fine, so long as you're okay with something that's only capable of photorealistic outputs. Though by the same token, 'style' is the only thing it really learns from art -- if you include tons of art in the unlabelled training data (though, um, I haven't checked whether Stable Diffusion actually used that stage), then the only thing it'll learn from the art is how to draw, in general.
- Does this match how humans learn?
Soooooort of. To the degree it does, it matches how newborn babies learn. Tons of unlabelled or poorly labelled training data -- because sure, we talk to babies, but they don't understand us -- which is used for the first-stage pretraining. The second stage, captioning (that is, conditional generation), doesn't need nearly as much input, but still needs quite a bit; and if that matches any stage of human maturation at all, it'd be toddlers.
Once you get to Dreambooth, LoRAs, textual inversion and such, you're at a stage where huge quantities of pictures are no longer needed. Ten, twenty pictures are enough to make a decent character LoRA. Though because the AI itself is missing an LLM module (and a long-term memory or reference-image input -- no, img2img doesn't count), it isn't really capable of consistently creating the same characters, or of listening when you tell it to fix the hands.
Actual human artists are typically teenagers or adults, who are far past this stage of learning. It's hard to say how far away we are from AGI, but... this isn't it.
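Since LoRAs came up just above, the trick behind them is small enough to show. You freeze the whole pretrained network and bolt a tiny low-rank correction onto some of its weight matrices; only that correction gets trained, which is why ten or twenty pictures can be enough. Below is a generic sketch of the idea in PyTorch -- not any particular library's implementation; LoRALinear, the rank and the layer sizes are all mine.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a small trainable low-rank correction."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the original weights stay frozen

        in_f, out_f = base.in_features, base.out_features
        # Two skinny matrices; their product is the low-rank nudge to the weights.
        self.down = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.up = nn.Parameter(torch.zeros(out_f, rank))  # zero: starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.down.T @ self.up.T)

# Wrap one layer of a frozen model; only the adapter's ~6k numbers get trained,
# versus ~590k in the frozen layer underneath.
layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters in this layer: {trainable}")
```

Because the trainable part is so small, a handful of example pictures is enough to steer it -- but everything it 'knows' still comes from the frozen base model underneath.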
- Does this match clipart? Collages? Anything I have experience with?
Uh, no. No, the only thing it really matches is "A giant pile of linear algebra". Aaaaand possibly newborns, but really, it'd be newborn... mice, maybe? Human babies have enormously larger brains, even if you buy the idea that these are comparable. (There's a bunch of ways in which they're not.)
- Is it legal?
Yes, probably. Court cases are in progress, but style has never been copyrightable, and the created AI models do not in fact contain copies. Arguably the ability to output exact copies of training data does count as copyright infringement- but since you have to try very hard in order to do so, chances are the court will decide that's on the person doing it, not the person making the AI.
The ability for LoRAs et al. (or NovelAI-diffusion...) to create highly recognisable pictures of copyrighted anime characters is more concerning, but falls in the same bucket as fan-art in general. This is definitely not the right site to advocate for making that illegal, to the degree it isn't already.
- Is it ethical?
For what purpose? I'm generally against making a technology illegal. There are (and will be) huge issues with corporations abusing it, which was predictable centuries in advance, but- nobody in this thread is talking to a corporation. This isn't a simple question, and discussions like that are never helped by letting emotions run high. But also-
- Is it stoppable?
¯\_(ツ)_/¯
If I knew how to do that, I would have done it already. I'm more than willing to give up my ability to illustrate stories, if it means we stop and think about this. I'm less willing to do so if my giving it up is the only thing that would actually change.