AI at home, round 2

Joined
Apr 15, 2017
Messages
6,489
Location
California
Sometime last year I first started exploring local AI models and even made a thread about it somewhere on here with my initial attempts. Started with two 5060ti 16gb cards and Ollama and never really got anywhere useful with it. The biggest problem is that while Ollama is easy it's quite restrictive and inefficient... it's not really a good path forward if you're serious about local AI. Anyway, the AI landscape has changed significantly plus my workflow has also changed. In the meantime I sold one of the 5060tis for more than I paid for it and the other one is in my main workstation/gaming PC.

In the meantime, what I used the most for my web development and other projects is GitHub Copilot BUT at the end of this month they're going from including a massive amount of usage with a fixed price subscription (for $40/month I could do almost anything and everything using some really good, expensive models) to per token pricing. Based on my past usage I would be spending $500-1000/mo for what I just used this month using the new GHCP pricing. So I downgraded to the $10/mo plan so I have access to the most advanced frontier models as needed but so far I've been playing with my local AI models and they can do 80% of the work. We shall see how my usage goes.

Anyway, I made (mostly vibe-coded) a custom piece of software that uses Docker and Llama-cpp that gives me a nice web UI that I can use to manage devices and models. I also spent a few bucks on GPUs. Haha. More on that in a bit.

I can chat with it using the web interface but more useful than that it gives me an OpenAI-compatible API that I can integrate with stuff. Right now I mostly just use it in OpenCode or occasionally in the Continue VSCode plugin but I'm also working on integrating it with more things. But the main thing about my custom bit of software is that it supports multiple GPU vendors, mixed GPUs, and pooling (within the same vendor). Yes I could have done it with vLLM which is more powerful and performant, but that's more work to configure and this way it does what I want, the way I want. Plus I'm lazy and a nice web UI I can click stuff in is less work than configuring and managing vLLM.

My biggest challenge right now is properly implementing real time lookups and web searches. I mainly used Grok for this ($30/mo plan) but I've since gone down to the free plan and just spread my free usage across Grok, Gemini, ChatGPT, and Claude. $30 is $30! Once I have implemented web search and real time data access into my app I will use a lot less of the cloud services. Because I prefer to keep my data under my control. I have also been doing just fine without Claude Code, their usage is just too restrictive. Although from my understanding now that they rent a bunch of compute from xAI they loosened this. But I think if I do get another AI subscription it will be Cursor. We'll see.

Software aside, I have two "AI servers" now:

1. 1x AMD AI Pro R9700 32GB GPU, Intel Core i7-14700F CPU, 64GB DDR5, 1TB NVMe SSD. I'm also going to add an Arc A380 6GB card to this just as a cheap low power way to run small models concurrently with the larger models without powering on the other system. Currently I just use the CPU for this but it's more power-efficient to use a small GPU instead of the CPU and our power costs are pretty high here. Ultimately if local AI really does alleviate all my GHCP usage I will probably get a second R9700 but I need to get a better platform/motherboard first because the existing motherboard only runs the second PCI-E slot at x4 which will bottleneck the GPU. This is my primary AI server.

Originally I had two Arc B60s instead of the single AMD R9700 but they were just too unstable. I tried them in various computers but it was a mess. So I returned them and exchanged them for the AMD card. I'm much happier with it. Although 48GB of VRAM would have been great!

2. 3x NVIDIA RTX3050 8GB GPU, Intel Core i7-9800X, 64GB DDR4, 512GB NVMe SSD. This one was a hodgepodge of cheap leftover parts combined with a few other things I got a good deal, otherwise it's really not efficient and not the ideal route. It's a secondary server I use for testing and various smaller models but sometimes the NVIDIA CUDA stack just works better than the Vulkan stack I'm using on the other server for AMD. Initially I was using ROCm for AMD but that thing is so trash and so broken in so many ways AMD should be ashamed of themselves...

Yes a 64GB or 128GB Mac Mini or Studio would be more efficient but I love being able to tinker with stuff and my custom thing runs on Ubuntu so that wouldn't really do what I want.

For the models, there are so many to list that I'm playing with. Qwen3.6 really is insanely good for a local, not huge model!

Oh, and it's warm in here! I think my bedroom looks more like a datacenter (albeit a very sloppy one with a hodgepodge pile of desktop PCs) than a bedroom.

As the project gets more stable, secure, and reliable, I might post a link to my open source project, but for now, just wanted to share and discuss the hardware and local AI in general. Anyone else doing local AI at home? And if so, on what hardware, with what software, what models, and what workflow?

IMG_8196.webp


IMG_8199.webp
 
Very nice. Those are a couple of nice setups for that.

I have just started getting into running LM Studio and local models for coding a project I’ve been working on — essentially a comprehensive fleet maintenance software all hosted on a Docker container. PostgreSQL, .NET backend and Node.js/React front end.

I used some different AI models to assist but does it get stupid expensive. Plus it get darn annoying wasting thru credits with unintended results.

I am still a novice and don’t know a ton.
 
Very nice. Those are a couple of nice setups for that.

I have just started getting into running LM Studio and local models for coding a project I’ve been working on — essentially a comprehensive fleet maintenance software all hosted on a Docker container. PostgreSQL, .NET backend and Node.js/React front end.

I used some different AI models to assist but does it get stupid expensive. Plus it get darn annoying wasting thru credits with unintended results.

I am still a novice and don’t know a ton.
LM Studio is neat and def an easy way to explore it. What is your workflow?
 
Qwen Coder is good? I’m trying to figure out what would fit me best. I want to run all on my 48GB M4 Max MBP.

I have a Ryzen 7 5700X system with 32GB of RAM and a 8GB (I think) 3060ti, but I doubt that’s better than the Mac.
 
LM Studio is neat and def an easy way to explore it. What is your workflow?
I would say I don’t have a workflow. I was using Perplexity Computer (meh) and tried Codex giving them full repo copies and sending them off to work the files and I’d refresh my Docker Desktop instance, refresh NPM and the .NET backend to verify.

In Perplexity it’d work the files then relaunch the entire app within its self contained sandbox.

I briefly tried VScode plugins but could not figure out how to link them to the models running locally.

Ideally I’d love to be able to hand a model a repo copy and continually work on the product.
 
Qwen Coder is good? I’m trying to figure out what would fit me best. I want to run all on my 48GB M4 Max MBP.

I have a Ryzen 7 5700X system with 32GB of RAM and a 8GB (I think) 3060ti, but I doubt that’s better than the Mac.

How are you interacting with it? I'd suggest OpenCode Desktop is an easy way to upgrade from just going back and forth with it in LM Studio.

Have you tried Qwen3.6? It's pretty awesome really.

Edit: just saw your prior response. Try VSCode Insiders which apparently lets you use custom/local providers using the GHCP harness (although I haven't tried it yet) or try OpenCode Desktop :)

Edit: yeah the Mac is probably your better bet. 8GB isn't enough GPU memory on your other system for anything important. Although you can always run a small model on that one for quick stuff so you don't waste your better system's time on it.
 
How are you interacting with it? I'd suggest OpenCode Desktop is an easy way to upgrade from just going back and forth with it in LM Studio.

Have you tried Qwen3.6? It's pretty awesome really.

Edit: just saw your prior response. Try VSCode Insiders which apparently lets you use custom/local providers using the GHCP harness (although I haven't tried it yet) or try OpenCode Desktop :)

Edit: yeah the Mac is probably your better bet. 8GB isn't enough GPU memory on your other system for anything important. Although you can always run a small model on that one for quick stuff so you don't waste your better system's time on it.
Thank you, I will check those out. I might as well use this Mac and let that Apple Silicon do more than idle for once. FWIW, here's a few screenshots of the self-hosted app that I'm working on. The main goal is to be able to track any vehicle, car/truck, trailer, generator, ATV... whatever for its maintenance schedules, repairs, and also keep track of the "on-hand" inventory one might have from filters, wipers to lubricants. Along with reminders to "re-order". I think this fills a unique niche. Especially when some of us see deals and either need the "you have enough on hand" reminder or "hmm.... I do only have 7 quarts left". Also have VIN decoding against the free NHTSA API to show if your ride has recalls as a potential "recall alerter" and to fill in different details of the vehicle.

When it's ready for the lime light, I'd be happy to provide Github info for folks on here to use. More to come. Needs a ton of work, some menu consistency, but the base idea is there.

Screenshot 2026-05-27 at 6.15.19 PM.webp


1779920276196.webp

1779920358461.webp

Screenshot 2026-05-27 at 6.16.30 PM.webp

1779920380135.webp


1779920520241.webp
 
What the drawback to using the free versions of AI? Other than lack of privacy.

Free self hosted AI is often not as fast for complex tasks and depending on what you’re doing it’s not as “smart” as the frontier cloud models (think the latest versions of Claude, Grok, Gemini, ChatGPT). But the gap is closing as we progress.

My biggest annoyance is real time web search. You either need to combine multiple additional projects to make this work or pay for an API. web fetching specific URLs is easy but looking up stuff is not as simply.
 
What the drawback to using the free versions of AI? Other than lack of privacy.
Well this morning I got annoyed and implemented web search through Serper and Brave search API and it only took about 15 minutes. So maybe my complaint was pointless I just hadn’t tried hard enough haha.

image.webp
 
Ended up replacing my two server approach with one combined server after getting a second R9700. My R9700s are a little neutered due to the older motherboard only having PCI-E 3, but it works fine and I'll get a newer board later.

IMG_8257.webp
 
Back
Top Bottom