Project BedroomGPT

Joined
Apr 15, 2017
Messages
6,492
Location
California
I've been using AI for general tasks, as well as to assist with my web design and development work, for a couple of years now. I mainly use SuperGrok for basic research and simple tasks, and Copilot Pro+ with Claude Sonnet 4 in agent mode integrated with VSCode for more complex work. But I want to take this to the next level and try to run my own AI stuff locally, instead of paying for endless subscriptions to cloud services. So, Project BedroomGPT was born. If it's all a flop, well, I got a new gaming PC haha.

I usually build my own systems. There are four custom-built computers and servers in our home that I put together. But with retail component pricing being what it is, and after playing Lenovo's bizarre coupon games, I ended up buying a ThinkStation P3 Tower Gen (Intel). I got that for $850. It gets me an Intel Core Ultra 5 235, 1x16GB DDR5, a 512GB NVMe SSD, and Windows 11 Pro. I didn't select any upgrades from Lenovo because they are a total rip-off. And honestly, selling anything but the most basic computer with a single RAM stick is a joke... yet so many prebuilts, even "gaming PCs", come gimped like that. Lame!

I'm going to pull out the factory SSD, and set it aside. When I outgrow this server or get tired of the project, I'll just put it back in and use it as a gaming computer or sell it.

I'm putting in a 4TB Samsung 990 Pro I already have. Hopefully the fans don't go nuts, as I've had issues with multiple Lenovo laptops and desktops that use some sort of custom firmware on their OEM SSDs for thermal/power management... and putting in non-Lenovo SSDs causes increased fan speeds. Worst case, I'll put the factory SSD back in and install the 4TB drive using an NVMe PCIe adapter. Hopefully that works. We'll see.

There is an interesting tidbit in Lenovo documentation for this workstation. The base configuration may or may not include a heatsink on the VRM on the motherboard, but in the documentation they recommend it not just for the higher end CPUs, but also if you're using the base CPU with a high end GPU. So, if mine doesn't come with a VRM heatsink, I will use the $50 of Lenovo rewards I earned from the purchase to buy one.

For the RAM I picked up 128GB (2x64GB) of DDR5 for about $300. I can always add another set of that down the road if it's beneficial, but I'm constrained on budget at this time.

The most important decision was the GPU, as for AI, this is the heart and soul of things. I can run stuff on the CPU using system RAM, but you get far better performance on a GPU. I may explore hybrid setups, where some of a model's layers run on the GPU and its memory, and the rest run on the CPU and system memory. I'm starting with an Nvidia RTX 5060 Ti 16GB, as Nvidia is the most popular and best supported option. I wish I'd got an Intel Arc Pro B60 24GB the other day when Central Computers (accidentally?) listed them for $599, but I can always upgrade my GPU later, and Intel Arc isn't officially supported by many tools yet, so maybe by the time I find another opportunity to pick one up at that price, software and driver integration will be better.
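To make the hybrid idea a bit more concrete, here is a rough back-of-the-envelope sketch of how a layer split could be estimated. It assumes every layer is the same size and that a couple of GB of VRAM should be held back for context and overhead; both are simplifications, and the function name and the numbers are made up for illustration, not taken from any tool.

```python
def split_layers(model_gb: float, n_layers: int, vram_gb: float,
                 reserve_gb: float = 2.0) -> tuple[int, int]:
    """Estimate (gpu_layers, cpu_layers) for a model split across GPU and CPU.

    Assumption: all layers are equally sized, and reserve_gb of VRAM is
    kept free for context and runtime overhead. Real models will differ.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Whole layers only; never more than the model actually has.
    gpu_layers = min(n_layers, int(usable_gb / per_layer_gb))
    return gpu_layers, n_layers - gpu_layers


if __name__ == "__main__":
    # Hypothetical 40GB model with 80 layers on a single 16GB card.
    gpu, cpu = split_layers(model_gb=40.0, n_layers=80, vram_gb=16.0)
    print(f"{gpu} layers on GPU, {cpu} layers on CPU")
```

The point is only that the share of the model left on the CPU grows quickly once it no longer fits in VRAM, which is where the slowdown comes from.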

The Lenovo workstation in my config comes with a 750W PSU with 8-pin PCIe power, not 12VHPWR, but they officially support bigger cards like the RTX Pro 5000 (Blackwell) with an adapter, so I'm not too worried.

For OS, I plan to start with Ubuntu Server. I'm very familiar with this.

The software is really the interesting part. I think I'm going to start with Ollama and Open WebUI. If you're curious, check it out: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
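Beyond the web UI, Ollama also serves a local HTTP API, which is what other tools talk to. A minimal sketch of calling it from Python with only the standard library might look like this. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name `llama3` is just a placeholder.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=300) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running Ollama server and a pulled model):
# print(ask("llama3", "Say hello in five words."))
```

With `stream` set to false the server sends back one JSON object instead of a stream of chunks, which keeps the example short.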

Then, to integrate with VSCode, I'll use the Continue plugin.

The most complicated part of all this? What models to use. There are so many options: https://ollama.com/search

I'll update this thread as my project progresses. In the meantime, while I wait for all my stuff to arrive, does anyone have any suggestions on which models to try? Is anyone else running this stuff locally or on their own server?
 
Pardon my ignorance as I don't use AI a whole lot, but when you say you integrate it with VSCode, do you mean in the sense to feed it prompts then generate actual code for applications, etc.? Is it pretty correct in what it generates for you?

I briefly loaded Ollama LL-3 (I think) on my M4 Max MacBook Pro. I fed it some questions but nothing too demanding. I have yet to dive more into this topic.

I would imagine just about any language model would make my Ivy-Bridge era T320 choke, haha. Probably the MBP it is.
 
Yes, so I can tell it to write me code or edit code based on the context I give it. Usually I feed in the entire project directory so it can best understand the project. Of course the more context you give it the slower it goes and more tokens/resources it uses up, but it matters to get a good result consistent with the structure and style of the current project. If it’s a new project, I give it a prompt, and then if I need changes, I can make them manually or prompt again.
 
It’s a huge time saver. The other day I was working on migrating a database from an old web app to a completely new web app. I spent a good 20 minutes on a prompt explaining what I needed and then gave it the tables and columns of both the new and old database, and asked it to make a migration script. It mostly worked, but had some errors. So I pasted the errors in. Repeated that cycle for about another 20 minutes. Eventually it worked and I migrated the entire 30,000+ work orders from the old system to the new one perfectly.

Could I have done it by hand? Sure, but it probably would have taken more than 40 minutes.
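For anyone wondering what a script like that looks like in general, here is a toy sketch using Python's built-in sqlite3 module. This is not the actual migration script; the table names, column names, and the column mapping are all hypothetical and only show the basic shape of mapping an old schema onto a new one.

```python
import sqlite3

# Hypothetical mapping: old column name -> new column name.
COLUMN_MAP = {"wo_id": "id", "cust_name": "customer", "descr": "description"}


def migrate(old: sqlite3.Connection, new: sqlite3.Connection) -> int:
    """Copy rows from old_work_orders into work_orders; return the row count."""
    old_cols = ", ".join(COLUMN_MAP.keys())
    new_cols = ", ".join(COLUMN_MAP.values())
    placeholders = ", ".join("?" for _ in COLUMN_MAP)
    rows = old.execute(f"SELECT {old_cols} FROM old_work_orders").fetchall()
    with new:  # commits on success, rolls back if an insert fails
        new.executemany(
            f"INSERT INTO work_orders ({new_cols}) VALUES ({placeholders})", rows
        )
    return len(rows)


if __name__ == "__main__":
    old = sqlite3.connect(":memory:")
    new = sqlite3.connect(":memory:")
    old.execute(
        "CREATE TABLE old_work_orders (wo_id INTEGER, cust_name TEXT, descr TEXT)"
    )
    old.executemany(
        "INSERT INTO old_work_orders VALUES (?, ?, ?)",
        [(1, "Acme", "Replace fan"), (2, "Globex", "Install SSD")],
    )
    new.execute(
        "CREATE TABLE work_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, description TEXT)"
    )
    print(migrate(old, new), "rows migrated")
```

Doing all the inserts inside one transaction means a failed run leaves the new database untouched, which makes the paste-the-error-and-retry loop a lot safer.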
 
So, made some changes... a single GPU wasn't good enough for some of the models I wanted to run so I got a second one. Ollama supports dual GPUs natively and automatically now. So now I can use models up to 32GB in size. I also ended up just using 64GB of RAM instead of 128GB, to save some money.
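As a rough way to check which models fit in the combined VRAM, here is a small sketch that filters a model list by size. It assumes the input has the shape of the response from Ollama's `/api/tags` endpoint (a `models` list whose entries have a `name` and a `size` in bytes), and it uses file size as a stand-in for memory use, with a couple of GB of headroom for context. The headroom figure and the sample model names are made up.

```python
GPU_VRAM_GB = [16, 16]  # two 16GB cards, as described above


def models_that_fit(tags: dict, vram_gb: list[float],
                    headroom_gb: float = 2.0) -> list[str]:
    """Return names of models whose size fits in total VRAM minus headroom.

    Assumption: on-disk model size roughly approximates memory needed;
    long contexts will need more than this estimate allows for.
    """
    budget_bytes = (sum(vram_gb) - headroom_gb) * 1024**3
    return [m["name"] for m in tags.get("models", []) if m["size"] <= budget_bytes]


if __name__ == "__main__":
    # Hypothetical sample shaped like an /api/tags response.
    sample = {
        "models": [
            {"name": "small-model:latest", "size": 5 * 1024**3},
            {"name": "mid-model:latest", "size": 29 * 1024**3},
            {"name": "huge-model:latest", "size": 40 * 1024**3},
        ]
    }
    print(models_that_fit(sample, GPU_VRAM_GB))
```

Anything that doesn't make the cut can still run, it just spills over to system RAM and the CPU.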

By default, the Lenovo workstation had the side panel fans as exhaust. I didn't really like that, so I swapped them around so now they are intake, blowing fresh air on the GPUs. I got really lucky, the power cables for the GPUs line up exactly in the space between the two side panel fans. I didn't measure that, I just guessed it would work out.

IMG_6543.webp


Before:

IMG_6544.webp


After:

IMG_6568.webp


Here's a video of it in action... obviously this isn't a very useful example, but just something to show how it works and is able to use both GPUs :)

 
Interesting thread. We do agentic AI to monitor feed data like syslog, but we haven't really tried to do compute in-house.
 