Ted Chiang: ChatGPT Is a Blurry JPEG of the Web

That honestly sounds like something that could be done more easily and cheaply with existing technologies. With a large enough set of lines, some flags, and a good RNG algorithm you could do something similar that the average person would never be able to tell apart.
That's never worked, and it seems obvious why it can't. You are not capable of writing more lines than players are capable of mapping, let alone enough lines (and logic) to cover the emergent states a flexible model produces.
 
That's never worked, and it seems obvious why it can't. You are not capable of writing more lines than players are capable of mapping, let alone enough lines (and logic) to cover the emergent states a flexible model produces.
But a statistical model that spits out the most likely reply is going to what, pick some less likely reply?
 
I feel like people are comparing two massively different things when it comes to LLM NPCs vs. human-crafted NPCs.

When thinking about LLM NPCs, I don't expect nor want something like Disco Elysium. What I want is something like Dwarf Fortress or Crusader Kings; emergent gameplay. I want NPCs that have meaningful interactions with each other and me, and by meaningful, I don't mean literary, I mean having actual impact. I want to be able to go into a town, have a minor interaction with an NPC that results in knock-on effects that drastically reshape the entire community... and then reload a quick save, do the exact same thing, and have a completely different, yet plausible and engaging set of events occur.

I don't expect nor am I interested in LLMs that attempt to emulate great human writers.
Yeah, I was mostly assuming the LLM would be for generating interactions with the player, as a replacement or addition to prewritten dialogue. Basically a different front-end presented instead of a dialogue menu. Though even if you disregard dialogue quality, there are still a lot of other hurdles I didn't mention there. For example, what if you convince an NPC to join your quest, but the game doesn't have a follower system? If you threaten them, can you follow up on it? What if you try to ERP with your ingame wife?

What you're suggesting is more like NPCs as agents. That's mostly orthogonal to LLMs, because LLMs are not agents. There have been a ton of attempts to build LLMs into agents with varying levels of success, but that's mostly meant to (A) give them a smoother UI or (B) help them solve problems too novel or complex to be pre-coded. For NPCs operating within the confines of a game system, that's mostly not an issue. They already have their state, the finite list of actions they can take, etc. And traditional NPC AI scales far better. Something like Dwarf Fortress or Rimworld can process its entire game state in the time it takes an LLM to kick back a single token.
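To make the contrast concrete, here's a rough sketch of the kind of traditional NPC logic I mean (Python, with made-up state fields and scoring, purely illustrative): the NPC's state and its finite action list already exist, and picking an action is a handful of arithmetic operations rather than a round of model inference.

```python
import random

# Hypothetical, simplified game state; the fields are made up for illustration.
game_state = {
    "player_reputation": 12,   # how the town feels about the player
    "npc_hunger": 0.7,         # 0.0 = just ate, 1.0 = starving
    "shop_open": True,
}

# The NPC's finite action list; each action scores itself against the state.
ACTIONS = {
    "greet_player": lambda s: max(s["player_reputation"], 0) * 0.1,
    "find_food":    lambda s: s["npc_hunger"] * 2.0,
    "tend_shop":    lambda s: 1.0 if s["shop_open"] else 0.0,
    "wander":       lambda s: 0.2,   # cheap fallback so the NPC always does something
}

def choose_action(state):
    # Utility-style selection: score every action and take the best,
    # with a tiny random nudge so ties don't always resolve the same way.
    scored = [(score(state) + random.uniform(0.0, 0.05), name)
              for name, score in ACTIONS.items()]
    return max(scored)[1]

print(choose_action(game_state))  # e.g. "find_food"
```

Run that for hundreds of NPCs every tick and it's still trivially cheap, which is the point about Dwarf Fortress and Rimworld.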
 
That's never worked, and it seems obvious why it can't. You are not capable of writing more lines than players are capable of mapping, let alone enough lines (and logic) to cover the emergent states a flexible model produces.
That's the thing though, the average player isn't mapping dialog trees. The kind of player who does, and who would see this as a key feature, rather than someone who just likes being able to manipulate those trees or use them as an indicator that a speedrun is on track, is going to be a pretty small audience. And hooking an LLM up is either going to massively increase the file size and memory requirements if it runs on the machine, or require a constant internet connection if it doesn't.
 
That's the thing though, the average player isn't mapping dialog trees. The kind of player who does, and who would see this as a key feature, rather than someone who just likes being able to manipulate those trees or use them as an indicator that a speedrun is on track, is going to be a pretty small audience. And hooking an LLM up is either going to massively increase the file size and memory requirements if it runs on the machine, or require a constant internet connection if it doesn't.
Not in a one-pass situation maybe, but we're talking iterated here. It doesn't take long to notice the exact same line repeating.
 
I decided to test the new GPT-4o with a random photo from my house. Results:


Mistake/hallucination regarding the cat, but still impressive.
 
It seems to me that even a perfect LLM (which currently does not exist) still cannot be wholly trusted without human oversight,
because questions like
"what year or country was this written in?" (the meaning and connotation of words drift over time, especially once dialects come into play) or
"who is this translation targeted at?" (so the localisation effort actually lands with its audience)
simply can't be handled reliably, since an LLM can only give one output for a given input.
Or is there a way to get around this by adding a metadata window or something to translation LLMs?

(I'm not saying that LLMs are useless for translation; they can be very useful with oversight, or even without it in low-stakes cases. It's just that 100% trust seems impossible to me even in a spherical-cow scenario because of these factors.
Or did I make a basic research failure regarding probabilistic output while thinking about this topic?)
 
Ask it to include multiple options in that output.
Hm... the closest equivalent I can think of would be footnotes, which as far as I'm aware aren't a thing in LLM translations.

Maybe I didn't frame things carefully enough earlier:
compared to a professional translator (human oversight), who I assume will try to ask the questions I listed above, even a perfect LLM would have problems if blindly trusted in certain scenarios,
and that is something that needs to be considered when companies try to use them in more critical fields like health and medicine?
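To be concrete about the kind of "metadata window" I'm imagining (purely hypothetical; as far as I know no translation product exposes anything like this), it would just mean sending the answers to those questions along with the text, and asking for footnoted alternatives wherever ambiguity remains:

```python
# Hypothetical sketch of a "metadata window": the answers a human translator
# would ask for, sent along with the text instead of left for the model to guess.
translation_request = {
    "source_text": "Ça ne casse pas trois pattes à un canard.",
    "metadata": {
        "source_language": "French",
        "written_in": "France, 1970s",                 # era/country: word meanings drift
        "target_audience": "casual UK readers, 2024",  # localisation target
        "register": "informal conversation",
    },
}

def build_prompt(req):
    meta = "\n".join(f"- {key}: {value}" for key, value in req["metadata"].items())
    return (
        "Translate the text below for the target audience described.\n"
        f"Context:\n{meta}\n\n"
        "Where the context still leaves a word or phrase ambiguous, give your "
        "main rendering and list the plausible alternatives as numbered footnotes.\n\n"
        f"Text:\n{req['source_text']}"
    )

print(build_prompt(translation_request))
```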
 
Hm... the closest equivalent I can think of would be footnotes, which as far as I'm aware aren't a thing in LLM translations.

Maybe I didn't frame things carefully enough earlier:
compared to a professional translator (human oversight), who I assume will try to ask the questions I listed above, even a perfect LLM would have problems if blindly trusted in certain scenarios,
and that is something that needs to be considered when companies try to use them in more critical fields like health and medicine?
Oh, true. Blindly trusting anyone isn't a great idea, but a (good) human translator will know to ask questions. This gets back to humans being agents, and LLMs not.

It's pretty easy to work around, though. You can ask it to ask clarifying questions, which is something I have in my default prompt for ChatGPT--moderates its yes-man tendencies somewhat. Anyone providing LLM-translation-as-a-service ought to at least take the ten minutes to do that.
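For the translation-as-a-service case, the ten-minute version could be as simple as a system prompt. A sketch using the OpenAI Python client; the model name and the exact wording are my own assumptions, not anyone's production setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a translation assistant. Before translating, ask clarifying "
    "questions whenever the era, dialect, region, or target audience of the "
    "text is unclear. Only produce the translation once those questions are "
    "answered, and flag any term whose rendering depends on the answers."
)

def translate(text: str, target_language: str) -> str:
    # Assumption: "gpt-4o" as the model; any chat-capable model would do here.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Translate into {target_language}:\n\n{text}"},
        ],
    )
    return response.choices[0].message.content
```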
 
default prompt for ChatGPT
Cool! I read an article (I don't have the link rn) that said some students apparently got better results out of it by asking it to point out mistakes in their essays than by having it edit them directly.

I wonder what people use for default prompts or prompt series for ChatGPT (for the free one or the paid one)? What do you mean by clarifying questions?

Personally, I try to "both sides" things I use it for or to keep regenerating, and in extreme scenarios to delete the chat and memory and try again. Absolutely no clue how effective that is though, and trying to use it to generate sources for further reference is usually less than worthless.
 
Cool! I read an article (I don't have the link rn) that said some students apparently got better results out of it by asking it to point out mistakes in their essays than by having it edit them directly.

I wonder what people use for default prompts or prompt series for ChatGPT (for the free one or the paid one)? What do you mean by clarifying questions?

Personally, I try to "both sides" things I use it for or to keep regenerating, and in extreme scenarios to delete the chat and memory and try again. Absolutely no clue how effective that is though, and trying to use it to generate sources for further reference is usually less than worthless.
Since GPT-4o is barely out, it's in flux. But currently I've got this in 'customization':

Ask clarifying questions if you're unsure about any element of my statements. I won't be offended if you include a little snark or pushback. If there's no obvious question, then you should treat it as a debate and push back and/or provide nuance. Provide multiple answers if there's doubt. Try to keep the personality of a German engineer.
 
I've been having fun with listing meal ingredients and then asking AIs what kind of society they came from. They've been very on point, including being able to identify 1950s dishes.
 
My impression is that getting citations pretty much depends on using a system that's coupled to search rather than working off its gut as it were?
Which is also how it works for humans. You wouldn't trust someone's immediate, off-the-cuff claim to a reference unless perhaps it's their own work, and even then you'd want to check.

You should do the same with GPT, but for some reason a lot of people seem to think it's unnecessary. The only solution I can think of is to make them fact-check themselves, but that's asking for more than you'd be asking from this hypothetical human.
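The crude version of making it fact-check itself is just a second pass over its own output. A sketch (again the OpenAI client, with an assumed model name); note that it can only flag references the model won't vouch for, it can't actually verify anything without being coupled to search:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    # Assumption: "gpt-4o"; the technique doesn't depend on the specific model.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

draft = ask("Summarise the current evidence on topic X, with references.")

# Second pass: the model reviews its own citations and marks the ones it
# cannot stand behind. It's still the same gut doing the checking, so treat
# UNVERIFIED as "go look it up", not as a guarantee for the rest.
review = ask(
    "Here is a summary you wrote:\n\n" + draft + "\n\n"
    "List every reference it cites. For each one, state whether you are "
    "confident it is a real, correctly attributed work; mark anything you "
    "cannot vouch for as UNVERIFIED."
)
print(review)
```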
 
Also: Thanks, Hollywood. GPT-4 looks superficially similar to fictional AIs like Cortana, but isn't. It's funny how many of its limitations sum up as "treat it like a human, not an AI".

Others, of course, don't. The ideal would be to treat it like itself, and not like either of those things.
 
Which is also how it works for humans. You wouldn't trust someone's immediate, off-the-cuff claim to a reference unless perhaps it's their own work, and even then you'd want to check.

You should do the same with GPT, but for some reason a lot of people seem to think it's unnecessary. The only solution I can think of is to make them fact-check themselves, but that's asking for more than you'd be asking from this hypothetical human.
A domain subject-matter expert might be able to reference a few highly relevant items that aren't their own work (but are heavily cited in their own work) off the top of their head, though I certainly would expect them to not have the full bibliographic citation at their fingertips.

Of course, general GPT isn't a domain subject matter expert like that.
 
Well, yeah, that's what I meant by marketing. Even the use of the term "AI" is meant to evoke those science-fictional representations.
No, trust me, it just isn't. The 'AI' wording comes from the technical side -- I'd know, I'm there -- and it has never referred to science-fiction AIs. Perhaps marketing should have known better than to reuse the word, even when all the engineers were saying 'yup, this is AI', but that's a different sort of mistake.

GPT-4 is definitely AI. It's extremely good AI. It isn't a Hollywood AI -- that is, it isn't AGI -- but AGI != AI.
 
...yes, that's what I'm saying. The insistence on the marketing side on promoting these as Actual Artificial Intelligence Guys, knowing full well what the general public will think of when hearing those words, plus those same claims being relayed unthinkingly by the generalist press, is what led people to treat ChatGPT the way they do.
 
"AI coworkers in 1-2 years"
There's lots of interest from investors in this tech right now, and thus money to be made. I'll be sceptical until it begins rolling out in the workplace.

Certainly stuff like GPT4o is a big step up from 3.5, and current image or sound generation bots are a big improvement over the technology back in 2019. I just dunno if it'll pan out into the robots we see in SciFi media.
 
I suspect that LLMs are going to hit practical limits whenever and wherever the material they create bumps into the walls of, for lack of a better term, accountability.

Because it looks like the courts aren't accepting the notion of "the LLMs we use for integral parts of our business model are not actually part of the company and we are not accountable for what they do, so sorry, there's no one for you to sue over this mistake." If they were, everyone would be all over LLMs even more so than they already are, because there's almost no downside to using them and considerable upside. But if you can plausibly lose real money over something dumb your LLM says, it's going to be hard to really get the maximum benefit from what the LLM does. You still need people poring over every line of text for most applications, and that's often only slightly easier than having to write it from scratch.

Even if the LLM is approximately as reliable as a human being in creating content, when a human being creates content there is someone to shout at if it fails. And the decision-making process the content-creator uses isn't opaque, so you can realistically figure out WHY the product is messed up.
 