Ted Chiang: ChatGPT Is a Blurry JPEG of the Web

I can imagine an AI that has to use a separate Wolfram Alpha-style module to access certain types of calculation capabilities, which means, in effect, it's not qualitatively different than a human having access to Wolfram Alpha. Why might that be the case? Well, for one, digital computers use much simpler representations of numbers. A number represented as a string of bits is easy to do arithmetic on, whereas a number represented as a 'concept' for the purpose of more sophisticated reasoning would be much, much heftier and more cumbersome to work with. There's an inherent tradeoff.
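As a side note on the quoted claim that a number stored as a string of bits is easy to do arithmetic on: a minimal sketch (toy Python standing in for hardware; the function name is my own) of addition built from nothing but bitwise operations.

```python
def add_bits(a: int, b: int) -> int:
    """Ripple-carry addition using only bitwise ops, to show how little
    machinery arithmetic needs once a number is just a string of bits.
    Works for non-negative integers."""
    while b:
        carry = a & b   # positions where both operands have a 1-bit
        a = a ^ b       # sum without carries
        b = carry << 1  # carries shift into the next bit position
    return a

print(add_bits(314, 159))  # 473
```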
Cherrypick: the first claim here is, as far as I know, currently accurate, and may well stay that way. But the reason given doesn't hold up, frankly. The representation of a number is just that: a representation. A signifier. It doesn't, and has no reason to, contain everything you understand about the number. It's the index, not the entry. Humans and computers agree on this. A hypothetical general but math-friendly AI hopefully has a lot of associations with pi, potentially taking up a sizable tract of its informational complexity. But it can still represent pi with one or two of its smallest representational units, just as I can here.
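The index-versus-entry distinction can be sketched in a few lines of Python (a toy illustration, not any real model's internals; the token id and the stored facts are made up): the symbol for pi stays tiny no matter how much associated knowledge accumulates elsewhere.

```python
import sys

# The signifier: a short string, or a single token id in some vocabulary.
symbol = "pi"
token_id = 2646  # hypothetical index; real ids vary by tokenizer

# The "entry": associations stored elsewhere, which can grow without
# making the symbol itself any bigger.
associations = {
    "pi": {
        "value": 3.14159265358979,
        "definition": "ratio of a circle's circumference to its diameter",
        "appears_in": ["Euler's identity", "Gaussian integral", "Fourier analysis"],
    }
}

# The symbol is a few dozen bytes regardless of how many facts the entry holds.
print(sys.getsizeof(symbol))
print(len(associations["pi"]))
```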
 

You seem to be confusing tokens with embeddings here? Internally, the token "pi" is represented by a giant tensor, which, yes, represents quite a lot of the associations the AI has around pi. As a neural network, you inherently have to manipulate a finite set of pretty chunky representational units. For humans that's around seven (the classic working-memory estimate). If you want to do fancy stuff with those seven chunks of thought, you need to spend a while doing pathwise modification of them so that you have the right subset of associations. And the rest of your brain has to be set up to know how those associations fit together.

Once your brain has the right library of concept-chunks and knows how those chunks affect each other, it can do really complex operations really fast. It's basically the world's biggest CISC computer.
 
The tensor is the same size for all tokens, is it not? It represents the associations around pi by how it engages with the behaviors determined by the weights and architecture, not by itself being especially large.
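The fixed-size point can be checked directly with a toy embedding table (plain NumPy standing in for a model's learned matrix; the vocabulary and dimension are made up): every token's vector has the same shape, however rich or plain the token's meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["pi", "the", "antidisestablishmentarianism", "3"]
d_model = 8  # embedding dimension; real models use hundreds or thousands

# One row per token, every row the same length: the table's shape is
# (vocab size, d_model), regardless of any token's "richness".
embedding_table = rng.normal(size=(len(vocab), d_model))

for i, tok in enumerate(vocab):
    vec = embedding_table[i]
    print(f"{tok!r}: vector of shape {vec.shape}")  # always (8,)
```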

The 'embeddings' I'm familiar with are conversions of other, generally non-text, data into tensors suitable to feed into a large (or not-so-large) model so that it inherits a hopefully-useful perspective on what that data means. They might have other uses?

I'm not sure what confusion you're proposing, since you went off on 'chunky representational units' without ever tying that back to your first two sentences.
 