AI at home, round 2

dogememe · May 27, 2026

Sometime last year I first started exploring local AI models and even made a thread about it somewhere on here with my initial attempts. Started with two 5060ti 16gb cards and Ollama and never really got anywhere useful with it. The biggest problem is that while Ollama is easy it's quite restrictive and inefficient... it's not really a good path forward if you're serious about local AI. Anyway, the AI landscape has changed significantly plus my workflow has also changed. In the meantime I sold one of the 5060tis for more than I paid for it and the other one is in my main workstation/gaming PC.

In the meantime, what I used the most for my web development and other projects is GitHub Copilot BUT at the end of this month they're going from including a massive amount of usage with a fixed price subscription (for $40/month I could do almost anything and everything using some really good, expensive models) to per token pricing. Based on my past usage I would be spending $500-1000/mo for what I just used this month using the new GHCP pricing. So I downgraded to the $10/mo plan so I have access to the most advanced frontier models as needed but so far I've been playing with my local AI models and they can do 80% of the work. We shall see how my usage goes.

Anyway, I made (mostly vibe-coded) a custom piece of software that uses Docker and Llama-cpp that gives me a nice web UI that I can use to manage devices and models. I also spent a few bucks on GPUs. Haha. More on that in a bit.

I can chat with it using the web interface but more useful than that it gives me an OpenAI-compatible API that I can integrate with stuff. Right now I mostly just use it in OpenCode or occasionally in the Continue VSCode plugin but I'm also working on integrating it with more things. But the main thing about my custom bit of software is that it supports multiple GPU vendors, mixed GPUs, and pooling (within the same vendor). Yes I could have done it with vLLM which is more powerful and performant, but that's more work to configure and this way it does what I want, the way I want. Plus I'm lazy and a nice web UI I can click stuff in is less work than configuring and managing vLLM.

My biggest challenge right now is properly implementing real time lookups and web searches. I mainly used Grok for this ($30/mo plan) but I've since gone down to the free plan and just spread my free usage across Grok, Gemini, ChatGPT, and Claude. $30 is $30! Once I have implemented web search and real time data access into my app I will use a lot less of the cloud services. Because I prefer to keep my data under my control. I have also been doing just fine without Claude Code, their usage is just too restrictive. Although from my understanding now that they rent a bunch of compute from xAI they loosened this. But I think if I do get another AI subscription it will be Cursor. We'll see.

Software aside, I have two "AI servers" now:

1. 1x AMD AI Pro R9700 32GB GPU, Intel Core i7-14700F CPU, 64GB DDR5, 1TB NVMe SSD. I'm also going to add an Arc A380 6GB card to this just as a cheap low power way to run small models concurrently with the larger models without powering on the other system. Currently I just use the CPU for this but it's more power-efficient to use a small GPU instead of the CPU and our power costs are pretty high here. Ultimately if local AI really does alleviate all my GHCP usage I will probably get a second R9700 but I need to get a better platform/motherboard first because the existing motherboard only runs the second PCI-E slot at x4 which will bottleneck the GPU. This is my primary AI server.

Originally I had two Arc B60s instead of the single AMD R9700 but they were just too unstable. I tried them in various computers but it was a mess. So I returned them and exchanged them for the AMD card. I'm much happier with it. Although 48GB of VRAM would have been great!

2. 3x NVIDIA RTX3050 8GB GPU, Intel Core i7-9800X, 64GB DDR4, 512GB NVMe SSD. This one was a hodgepodge of cheap leftover parts combined with a few other things I got a good deal, otherwise it's really not efficient and not the ideal route. It's a secondary server I use for testing and various smaller models but sometimes the NVIDIA CUDA stack just works better than the Vulkan stack I'm using on the other server for AMD. Initially I was using ROCm for AMD but that thing is so trash and so broken in so many ways AMD should be ashamed of themselves...

Yes a 64GB or 128GB Mac Mini or Studio would be more efficient but I love being able to tinker with stuff and my custom thing runs on Ubuntu so that wouldn't really do what I want.

For the models, there are so many to list that I'm playing with. Qwen3.6 really is insanely good for a local, not huge model!

Oh, and it's warm in here! I think my bedroom looks more like a datacenter (albeit a very sloppy one with a hodgepodge pile of desktop PCs) than a bedroom.

As the project gets more stable, secure, and reliable, I might post a link to my open source project, but for now, just wanted to share and discuss the hardware and local AI in general. Anyone else doing local AI at home? And if so, on what hardware, with what software, what models, and what workflow?

redhat · May 27, 2026

Very nice. Those are a couple of nice setups for that.

I have just started getting into running LM Studio and local models for coding a project I’ve been working on — essentially a comprehensive fleet maintenance software all hosted on a Docker container. PostgreSQL, .NET backend and Node.js/React front end.

I used some different AI models to assist but does it get stupid expensive. Plus it get darn annoying wasting thru credits with unintended results.

I am still a novice and don’t know a ton.

dogememe · May 27, 2026

redhat said:
Very nice. Those are a couple of nice setups for that.

I have just started getting into running LM Studio and local models for coding a project I’ve been working on — essentially a comprehensive fleet maintenance software all hosted on a Docker container. PostgreSQL, .NET backend and Node.js/React front end.

I used some different AI models to assist but does it get stupid expensive. Plus it get darn annoying wasting thru credits with unintended results.

I am still a novice and don’t know a ton.

LM Studio is neat and def an easy way to explore it. What is your workflow?

redhat · May 27, 2026

Qwen Coder is good? I’m trying to figure out what would fit me best. I want to run all on my 48GB M4 Max MBP.

I have a Ryzen 7 5700X system with 32GB of RAM and a 8GB (I think) 3060ti, but I doubt that’s better than the Mac.

Zee09 · May 27, 2026

Really great...I will never use my computers for AI like Pablo does...
Of course I say that now...

redhat · May 27, 2026

dogememe said:
LM Studio is neat and def an easy way to explore it. What is your workflow?

I would say I don’t have a workflow. I was using Perplexity Computer (meh) and tried Codex giving them full repo copies and sending them off to work the files and I’d refresh my Docker Desktop instance, refresh NPM and the .NET backend to verify.

In Perplexity it’d work the files then relaunch the entire app within its self contained sandbox.

I briefly tried VScode plugins but could not figure out how to link them to the models running locally.

Ideally I’d love to be able to hand a model a repo copy and continually work on the product.

dogememe · May 27, 2026

redhat said:
Qwen Coder is good? I’m trying to figure out what would fit me best. I want to run all on my 48GB M4 Max MBP.

I have a Ryzen 7 5700X system with 32GB of RAM and a 8GB (I think) 3060ti, but I doubt that’s better than the Mac.

How are you interacting with it? I'd suggest OpenCode Desktop is an easy way to upgrade from just going back and forth with it in LM Studio.

Have you tried Qwen3.6? It's pretty awesome really.

Edit: just saw your prior response. Try VSCode Insiders which apparently lets you use custom/local providers using the GHCP harness (although I haven't tried it yet) or try OpenCode Desktop

Edit: yeah the Mac is probably your better bet. 8GB isn't enough GPU memory on your other system for anything important. Although you can always run a small model on that one for quick stuff so you don't waste your better system's time on it.

redhat · May 27, 2026

dogememe said:
How are you interacting with it? I'd suggest OpenCode Desktop is an easy way to upgrade from just going back and forth with it in LM Studio.

Have you tried Qwen3.6? It's pretty awesome really.

Edit: just saw your prior response. Try VSCode Insiders which apparently lets you use custom/local providers using the GHCP harness (although I haven't tried it yet) or try OpenCode Desktop

Edit: yeah the Mac is probably your better bet. 8GB isn't enough GPU memory on your other system for anything important. Although you can always run a small model on that one for quick stuff so you don't waste your better system's time on it.

Thank you, I will check those out. I might as well use this Mac and let that Apple Silicon do more than idle for once. FWIW, here's a few screenshots of the self-hosted app that I'm working on. The main goal is to be able to track any vehicle, car/truck, trailer, generator, ATV... whatever for its maintenance schedules, repairs, and also keep track of the "on-hand" inventory one might have from filters, wipers to lubricants. Along with reminders to "re-order". I think this fills a unique niche. Especially when some of us see deals and either need the "you have enough on hand" reminder or "hmm.... I do only have 7 quarts left". Also have VIN decoding against the free NHTSA API to show if your ride has recalls as a potential "recall alerter" and to fill in different details of the vehicle.

When it's ready for the lime light, I'd be happy to provide Github info for folks on here to use. More to come. Needs a ton of work, some menu consistency, but the base idea is there.

Screenshot 2026-05-27 at 6.15.19 PM.webp

Screenshot 2026-05-27 at 6.16.30 PM.webp

Plumb Bob · May 27, 2026

Nice setup @dogememe

Yes, it gets expensive to run LMs. So far I have resisted the temptation...see how long I last.

dogememe · May 27, 2026

Plumb Bob said:
Nice setup @dogememe

Yes, it gets expensive to run LMs. So far I have resisted the temptation...see how long I last.

Yeah, but it's slowly getting far more expensive to use LLMs online, if you want to use good models fairly quickly and frequently :/

DogLover · May 28, 2026

What the drawback to using the free versions of AI? Other than lack of privacy.

dogememe · May 28, 2026

DogLover said:
What the drawback to using the free versions of AI? Other than lack of privacy.

Free self hosted AI is often not as fast for complex tasks and depending on what you’re doing it’s not as “smart” as the frontier cloud models (think the latest versions of Claude, Grok, Gemini, ChatGPT). But the gap is closing as we progress.

My biggest annoyance is real time web search. You either need to combine multiple additional projects to make this work or pay for an API. web fetching specific URLs is easy but looking up stuff is not as simply.

dogememe · May 28, 2026

DogLover said:
What the drawback to using the free versions of AI? Other than lack of privacy.

Well this morning I got annoyed and implemented web search through Serper and Brave search API and it only took about 15 minutes. So maybe my complaint was pointless I just hadn’t tried hard enough haha.

dogememe · Jun 3, 2026

Ended up replacing my two server approach with one combined server after getting a second R9700. My R9700s are a little neutered due to the older motherboard only having PCI-E 3, but it works fine and I'll get a newer board later.

Plumb Bob · Jun 6, 2026

dogememe said:
Ended up replacing my two server approach with one combined server after getting a second R9700. My R9700s are a little neutered due to the older motherboard only having PCI-E 3, but it works fine and I'll get a newer board later.

View attachment 340921

What models are you running on those?

dogememe · Jun 6, 2026

Plumb Bob said:
What models are you running on those?

I mostly use Qwen3.6 and Gemma4. But I tinker with others. And have a third R9700 coming next week

.

My model management system has price estimation based on token usage and in less than a week since the latest iteration I’ve used about $400 worth of inference. So it is working out well! And that’s based on standard OpenAI GPT5.4 API cost. Using GPT5.5 as a comparison point would be much pricier.

Plumb Bob · Jun 6, 2026

I might go the Radeon route. Nvidia prices are crazy...

Thanks

dogememe · Jun 6, 2026

Plumb Bob said:
I might go the Radeon route. Nvidia prices are crazy...

Thanks

Yeah I mean even going the AMD route I spent more than I did on the entire Chevy Tahoe for the three of them!

I'm sad when I tried the Arc cards last month it was a disaster and I ended up returning them but if you are more patient than me and have the right platform that they happen to work reliably with because they are a better value per RAM than even AMD you might be able to save some money.

dogememe · Jun 11, 2026

And no, that’s not 100W, haha…

I’m averaging about $300/day worth of AI usage based on GPT-5.4 API pricing using maybe $5-10 worth of electricity. So happy!

Although it gets very toasty in my room. I repurposed the AC tube to try to keep most of the hot air going out the window but not sure it’s actually helping.

ripcord · Jul 6, 2026

All you AI coders need to cool your jets, LOL! You're ruining the internet for the rest of us.

I was searching for a recipe to use up some perishable ingredients and today the majority of search results I get are complete AI slop. Enough with the vibe coded websites that were published in the past couple of weeks. I'm betting if I went searching on github I would find a new recipe site generation tool built on using some huggingface project.

I think I wanna to vibe code a browser plugin that limits search engine results to things from prior to 2025.

AI at home, round 2

Similar threads