Would a 12 GB RTX 3060 be good for their needs? Might be a way to save some money?
My personal experience is with Stable Diffusion rather than LLMs - but yeah, I consider the RTX 3060 12GB the cheapest sensible entry. Anything below 12GB runs out of vram too fast once you start increasing resolution or adding extensions (controlnet).

Basically, my tier list for AI workloads right now would be roughly:
- 3060 12GB
- 4060Ti 16GB
- 3090 24GB
- 4090 24GB

A 3090 is nominally slower than a 4080, but the minute Stable Diffusion runs out of vram and has to start caching into system RAM it bogs things down so much that a 3090 that hasn't yet run out of space will definitely get done first.

I don't bother recommending AMD cards for this because basically anything AI is programmed for CUDA, which is so much easier to run on nvidia cards.

An A100 80GB would be great but even pre-owned it's still way outside of my price range (like $20K).
I know your pain...
Unfortunately I doubt that nvidia will release a high-VRAM consumer GPU anytime soon. They have a financial incentive to drive any AI-related sales towards their high-margin enterprise cards, after all.

The possibility of a new RTX titan with 48GB VRAM (or something like that) seems nil.

I got myself an M2 Max macbook pro to run that sort of thing. It's expensive, but at 96GB of RAM, it's got... 80-85GB of effective VRAM
I've thought about that, but as you say: still too slow for Stable Diffusion
I'll just stick to my 4090 and hope that someone does something clever with vram management.
 
My personal experience is with Stable Diffusion rather than LLMs - but yeah, I consider the RTX 3060 12GB the cheapest sensible entry. Anything below 12GB runs out of vram too fast once you start increasing resolution or adding extensions (controlnet).

The RTX 2080 Ti has 11 GB of VRAM, but it may just be too old for what you want to do, and used cards are crazy expensive.
You could likely get an RTX 3060 12 GB cheaper; looks like it from a few quick checks.
Too bad AMD is not a good option.
 
I got myself an M2 Max macbook pro to run that sort of thing. It's expensive, but at 96GB of RAM, it's got... 80-85GB of effective VRAM, even before you start considering quantization. And sure, it's a lot slower than an A100—or even my 4090—but it's a lot cheaper than said A100, as well.
I have thought about this, but I worry about the architecture causing just as much complexity. Plus a 192GB Mac Pro is still like $9K and much slower.

Vexing really.
 
I have thought about this, but I worry about the architecture causing just as much complexity. Plus a 192GB Mac Pro is still like $9K and much slower.

Vexing really.
You might also consider renting time through vast.ai. Sure, it won't be your own hardware, but you can get quite a bit of time for 2k.

Depends on how concrete your plans are, really, and whether or not they get scuttled.
 
I have a friend who bought an RTX 3070 at scalper prices a few years ago (during the pandemic). They don't have a lot of money, but suggested that they might want a new card. Ultimately it's up to them, but I'm trying to tell them that the RTX 3070 is still not a bad card in 2023. Cards that give significantly better performance are either very expensive, consume power like crazy, or both. Mostly both. They game with it; no workstation tasks such as AI.

Any good resources I could point to?
 
I linked them to a few discussion groups.

Something I don't understand: why does this website rate the RTX 3070 slightly higher than the RX 6800?
Granted, the RTX 3070 is better at raytracing and AI jobs, but I think that's still a minority of users.
The RX 6800 does better in DX11 and slightly better in DX12, which is what most games use these days, and has twice the VRAM.
I can't see DX10 having enough weight to move the overall rating.
 
You might also consider renting time through vast.ai. Sure, it won't be your own hardware, but you can get quite a bit of time for 2k.

Depends on how concrete your plans are, really, and whether or not they get scuttled.
I've thought about it.

Basically, the primary use case (I'm experimenting with several) is to create a writing assistant for myself. I want an agent that I can bounce writing ideas off - i.e., something trained on my own writing that I can use to generate ideas, scenes, and dialogue that I can aggressively edit down. I'm mostly playing with a combination of LoRA + context windows right now, but short context windows and relatively shallow models mean that the outputs are pretty bad. I'd like to fix those to make it work better - but of course those are the same things that would make it pricey to run elsewhere.

Unfortunately ~7B and ~13B parameter models are kind of too stupid, and 4-8K context windows are somewhat too short. I think realistically, I'm looking at a ~30B parameter model with a 16K or 32K context window as the practical maximum for something I could run locally (in terms of compute), but even then I feel like finding a good model would be tricky.
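
(For anyone curious, the LoRA half of that looks roughly like the sketch below, using Hugging Face's transformers and peft. The base model and hyperparameters are placeholders I picked for illustration, not what I'm actually running.)

```python
# Rough sketch of attaching a LoRA adapter to a causal LM for fine-tuning
# on my own writing. Model name and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"           # any causal LM that fits locally
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_cfg = LoraConfig(
    r=16,                                    # adapter rank: more capacity, more VRAM
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # the usual attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()           # only the adapter weights train
# ...then feed it your own writing with the usual Trainer / SFT loop.
```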
 
I've thought about it.

Basically, the primary use case (I'm experimenting with several) is to create a writing assistant for myself. I want an agent that I can bounce writing ideas off - i.e., something trained on my own writing that I can use to generate ideas, scenes, and dialogue that I can aggressively edit down. I'm mostly playing with a combination of LoRA + context windows right now, but short context windows and relatively shallow models mean that the outputs are pretty bad. I'd like to fix those to make it work better - but of course those are the same things that would make it pricey to run elsewhere.

Unfortunately ~7B and ~13B parameter models are kind of too stupid, and 4-8K context windows are somewhat too short. I think realistically, I'm looking at a ~30B parameter model with a 16K or 32K context window as the practical maximum for something I could run locally (in terms of compute), but even then I feel like finding a good model would be tricky.
I think you should look at 64B models, to be honest. I've played around a fair bit in this space—not to mention I'm one of the Bard SREs, so there's that—and my feeling is that that's about the minimum level that displays any amount of intelligence.

Even then, you're going to have trouble. To say nothing of the training costs... I wish you the best of luck with this; it's an area of AI space that I gave up on doing much with personally.
 
I think you should look at 64B models, to be honest. I've played around a fair bit in this space—not to mention I'm one of the Bard SREs, so there's that—and my feeling is that that's about the minimum level that displays any amount of intelligence.

Even then, you're going to have trouble. To say nothing of the training costs... I wish you the best of luck with this; it's an area of AI space that I gave up on doing much with personally.
I would also prefer something in the ~60-70B space, but realistically that means going for a 3x4090/3090 configuration, which is particularly expensive because I don't have space or power for that many cards, so it means redoing the entire system.

There was that recent research paper that claimed 3.5-Turbo is actually 20B and Mistral seems to do a very good job at 7B parameters. If so, I think that implies there's potentially real value in the ~20-30B space. Not, perhaps, top of the line, but likely very good.

Training is another area where I have my doubts. Training any kind of model from scratch seems implausible with the resources to hand. Context is very useful and relatively cheap in terms of data formatting, but the window is very limited. I'm trying to train some LoRAs, but I'm actually having a lot of trouble getting anything useful out of them, and I'm not quite sure why.

I think this is all to say that I'm probably convinced more 3090s are the way to go, at least today. Maybe the 5090 or Titan 4000 comes out with 48GB of VRAM, but I doubt it. My only caveat is that I've heard that Ada's tensor cores have FP8 and Ampere's do not, and FP8 makes a difference for some operations - I might have to investigate that more thoroughly.
 
I would also prefer something in the ~60-70B space, but realistically that means going for a 3x4090/3090 configuration, which is particularly expensive because I don't have space or power for that many cards, so it means redoing the entire system.
You sound like you're already familiar, but just in case—and for the benefit of the peanut gallery:

A 64B model is 64 billion parameters, not 64 billion bytes. Parameters are usually 16 bits each (32-bit floats are still sometimes used for training), which puts the requirement at ~128 GB of VRAM. Six 3090s, not three. Of course, that's only if you run it at full precision. Most layers of an LLM can be quantized down to 8-bit with trivial loss of quality (along with a near-doubling of performance), and 4- or 5-bit quantization is common enough, especially for anything that runs on the CPU. At which point you can fit it in three 3090s, yes.
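
To put rough numbers on that (weights only, ignoring KV cache, activations, and framework overhead):

```python
# Back-of-envelope VRAM needed for the model weights alone.
# Ignores KV cache, activations, and framework overhead.
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * bits_per_param / 8   # params are in billions, so this is GB

for bits in (16, 8, 4):
    print(f"64B params @ {bits}-bit: ~{weight_gb(64, bits):.0f} GB")
# 16-bit: ~128 GB -> six 24 GB cards
#  8-bit:  ~64 GB -> three 24 GB cards
#  4-bit:  ~32 GB -> two 24 GB cards, before leaving room for context
```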

However.

Well, okay. The first 'however' is: you need a Threadripper or EPYC to actually drive that configuration at anywhere near its design performance. Standard desktops don't have the PCIe lanes; even if the board has the slots, all but one will be wired as x1 or x4, regardless of how the physical slot looks.

The more serious 'however' is this: Quantization works much better for inference than training. While the quality drop from 8-bit floats is generally minor, it's a much worse problem when you're trying to fine-tune it. LoRAs do help somewhat, in that they don't require fine-tuning the entire model at once; but that of course comes with its own downsides. Given that I've never done this, I can't tell you how much VRAM you'd need to make a good one.
 
You sound like you're already familiar, but just in case—and for the benefit of the peanut gallery:

A 64B model is 64 billion parameters, not 64 billion bytes. Parameters are usually 16 bits each (32-bit floats are still sometimes used for training), which puts the requirement at ~128 GB of VRAM. Six 3090s, not three. Of course, that's only if you run it at full precision. Most layers of an LLM can be quantized down to 8-bit with trivial loss of quality (along with a near-doubling of performance), and 4- or 5-bit quantization is common enough, especially for anything that runs on the CPU. At which point you can fit it in three 3090s, yes.

However.

Well, okay. The first 'however' is: you need a Threadripper or EPYC to actually drive that configuration at anywhere near its design performance. Standard desktops don't have the PCIe lanes; even if the board has the slots, all but one will be wired as x1 or x4, regardless of how the physical slot looks.

The more serious 'however' is this: Quantization works much better for inference than training. While the quality drop from 8-bit floats is generally minor, it's a much worse problem when you're trying to fine-tune it. LoRAs do help somewhat, in that they don't require fine-tuning the entire model at once; but that of course comes with its own downsides. Given that I've never done this, I can't tell you how much VRAM you'd need to make a good one.
It's not realistic to run anything large unquantized on consumer hardware, and frankly quantized big models are better than unquantized small ones - so yeah, it'd probably be Q4 or Q5 rather than an unquantized 30B model.
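
(For what it's worth, running a Q4/Q5 GGUF build locally is about this involved with llama-cpp-python these days; the model path and settings below are made-up placeholders, not my actual setup.)

```python
# Minimal sketch of running a Q4-quantized GGUF model with llama-cpp-python.
# The path, context size, and prompt are illustrative only.
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-30b.Q4_K_M.gguf",   # placeholder quantized model file
    n_ctx=16384,                                # context window; VRAM use grows with this
    n_gpu_layers=-1,                            # offload every layer that fits to the GPU
)

out = llm(
    "Draft a short scene where two rivals meet at a train station.",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```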

But yeah it would have to be a 1500W Threadripper-based box, not just slotting another GPU into my existing system.

LoRAs can be trained much more quickly and efficiently than full models… I just can't get them to work well. Very frustrating.
 
I have what some might consider a dumb question... Why are used RTX 3060 12 GB cards almost the same price as new?
 
Hm... $220-240 used, $290+ new, according to recently sold eBay listings.

Why are used RTX 3060 12 GB cards almost the same price as new?
Because neither nvidia nor AMD has a true successor... the 7600XT comes closest, but it's around 20% more expensive.

Also - yeah, they used to cost more than twice as much two years ago. The price gouging was harsh, and people don't want to accept that their card has lost more than 2/3 of what they paid.

Also also - there are enough people willing to pay that kind of money. Supply and demand. This isn't the diamond monopoly - used card prices would go down if none were selling.
 
What I am seeing is used cards on eBay going for around $230 without bidding.
At the same time I see one on Amazon for $270.
There is always a certain amount of price bouncing, so you might not see the exact same prices.
 
Used graphics card markets are a huge ripoff and usually completely absurd in pricing. Anything that isn't sky high in price is usually a worn cryptomining card that can die at any moment.
Well, if you're lucky you can get a damaged card where the damage isn't something you care about. All the ports being broken, for example. But that's still rather a gamble; it's just normally a cheaper one.
 
Can confirm. I got a Sapphire Pulse card (I think... 5700xt? I don't remember the specific one) at a 50% discount, but now I'm having GPU problems (like green-screen system crashes).

I needed the upgrade, but it really wasn't worth it given that I'm thinking of replacing it under a year after purchase - and I've been putting up with these crashes since about a month after I got it.

Unrelatedly: Does anyone have any pointers for evaluating laptops?

I'm looking for a used laptop to replace this Lenovo IdeaPad 310-15ABR 80ST. The performance is really rough, even on some light games. I'm mostly plugging in CPU and GPU benchmarks but I feel like I'm missing something.

(My budget is extremely tight, so I'm mostly checking second-hand models from ebay)
 
Eh, you can get unlucky with new GPUs as well as used.

While I've mostly bought my GPUs new, I did buy a used GTX 680 in 2012 that lasted me about four years, into 2016, before it admittedly broke.
Though I did buy it through a PC forum from a fellow enthusiast. I don't remember why he was selling a card that had only been on the market for half a year at that point.
 
Is the Crucial T500 series of M.2 drives any good? Their price seems reasonable and Tom's Hardware seems to think they're decent.

Drive C on my gaming laptop has only 170 GB left on it. The drive currently installed is a WDC PC SN530 M.2, and I'm thinking of replacing it with a 2 TB Crucial T500. I don't like deleting games I have.
 
Seems like a fast and efficient SSD, which should be suitable for laptop use.

There's plenty of very fast SSDs that get far too hot to put in a laptop, but the T500 seems fine in that regard.
 
It's worth checking the power draw. Heat is one thing, and obviously related, but even a one-watt SSD would be a significant drain on the battery if it runs at that performance level a lot.
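
Back-of-envelope, with made-up but plausible numbers:

```python
# Hypothetical figures: how much battery a constantly-busy SSD would eat.
battery_wh = 60.0        # a typical mid-size laptop battery
ssd_watts = 1.0          # sustained active draw in question
hours_unplugged = 6.0    # a long session away from the charger

used_wh = ssd_watts * hours_unplugged
print(f"{used_wh:.0f} Wh used = {used_wh / battery_wh:.0%} of the battery")  # 6 Wh = 10%
```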

Unfortunately, while I know how to get these numbers from an installed SSD, I'm not aware of any database you can look them up in. The manufacturer's datasheet I suppose?

High performance tends to work against you here; the 990 in my desktop runs at up to something like thirty watts. Which isn't a problem in a desktop, but yeah.
 
Hey PC builders. I'm looking to upgrade my graphics card and saw that I've got a PCIe 3 motherboard socket and the new cards are all PCIe 4. Any idea how much this affects things at the sharp end, or is this fairly minor?
 