I'm using Mistral 7B and Llama3:8b on my work computer, a Dell Precision 3570 laptop with a Core i7-1255U, Nvidia T550 discrete graphics, and 64GB RAM. I run ollama as a front-end for the models. I have the OS set for ollama.exe to run using the T550, which is something you need to do manually in Windows 11 if the OS doesn't detect that it should use discrete graphics instead of the built-in Intel UHD display adapter.
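If you want to confirm that a loaded model really ended up on the discrete GPU, here is a rough sketch that asks the local ollama server which models are running and how much of each sits in GPU memory. It assumes ollama is listening on its default address (localhost:11434) and relies on the `size` and `size_vram` fields of the `/api/ps` endpoint; `gpu_fraction` is just a helper name I made up for this example.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
PS_URL = "http://localhost:11434/api/ps"


def gpu_fraction(model_info: dict) -> float:
    """Share of a loaded model held in GPU memory (0.0 = CPU only, 1.0 = all GPU)."""
    size = model_info.get("size", 0)
    return model_info.get("size_vram", 0) / size if size else 0.0


if __name__ == "__main__":
    try:
        with urllib.request.urlopen(PS_URL, timeout=10) as resp:
            running = json.load(resp).get("models", [])
    except OSError as err:
        running = []
        print(f"Could not reach ollama: {err}")
    for m in running:
        print(f"{m['name']}: {gpu_fraction(m):.0%} in GPU memory")
```

A model only shows up in that list while it is loaded, so send it a prompt first. A value well below 100% would suggest the model is partly or wholly running on the CPU.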
I experimented with a lot of different models before settling on these two. Mistral:7B is a little faster but Llama3:8b is more accurate. Generally speaking, both run well on this hardware, close to what you'd get from a hosted AI chat site. Mistral is also kind of frenchy, which is annoying. It's always nagging me about things like copyright law and various and sundry other topics it doesn't need to opine on. Liberté, égalité, fraternité, I guess.
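For the speed comparison between Mistral and Llama3, a minimal sketch of how you could put a number on "a little faster": send the same prompt to each model through ollama's local `/api/generate` endpoint and divide the generated token count (`eval_count`) by the generation time (`eval_duration`, reported in nanoseconds). The prompt, the helper names, and the default localhost:11434 address are assumptions for illustration, not something from my actual setup.

```python
import json
import urllib.request

# Assumption: ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(result: dict) -> float:
    """Generation speed from the timing fields of a non-streaming response."""
    # eval_count = tokens generated, eval_duration = nanoseconds spent generating
    return result["eval_count"] / result["eval_duration"] * 1e9


def generate(model: str, prompt: str) -> dict:
    """Send one prompt to a model and return the full JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    prompt = "Explain what a hash table is in three sentences."  # hypothetical test prompt
    for model in ("mistral:7b", "llama3:8b"):
        try:
            result = generate(model, prompt)
        except OSError as err:
            print(f"{model}: request failed ({err})")
            continue
        print(f"{model}: {tokens_per_second(result):.1f} tokens/s")
```

The first request to a model includes load time, so it is worth running each one twice and comparing the second numbers.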
Regarding model selection, the big models are slow because they are too big for this hardware. But what surprised me in the beginning was that the smallest models are not always faster, because with their limited training they seem to have to work harder before they come up with a usable answer. You really need a model and size that are optimized for your hardware. Tinyllama is one of the better small models and it works ok, not great, on my home machine, a Core i3 with 20GB of RAM. It takes some experimentation, and I ask every model I've tried what the best model for my hardware is, both for my work and home machines.
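As a back-of-the-envelope check on "too big for this hardware", here is a rough sketch that estimates a model's memory footprint from its parameter count and quantization. Everything in it is an assumption for illustration: the nominal parameter counts, 4-bit weights, a 1.2x fudge factor for context and runtime overhead, and 4 GB of VRAM on the T550 (which I have not stated above, so check your own card).

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rough footprint in GB: weight storage times a fudge factor for cache/runtime."""
    weight_gb = params_billion * bits_per_weight / 8  # billions of params * bytes per param
    return weight_gb * overhead


def fits(params_billion: float, bits_per_weight: float, memory_gb: float,
         overhead: float = 1.2) -> bool:
    """True if the estimated footprint fits in the given amount of memory."""
    return estimate_model_gb(params_billion, bits_per_weight, overhead) <= memory_gb


if __name__ == "__main__":
    VRAM_GB = 4.0   # assumption: VRAM on the discrete GPU
    RAM_GB = 64.0   # system RAM on the work laptop
    # Nominal sizes; real parameter counts differ slightly.
    for name, params in [("tinyllama", 1.1), ("mistral:7b", 7.0),
                         ("llama3:8b", 8.0), ("a 70B model", 70.0)]:
        need = estimate_model_gb(params, bits_per_weight=4)
        where = ("GPU" if fits(params, 4, VRAM_GB)
                 else "RAM" if fits(params, 4, RAM_GB) else "nowhere")
        print(f"{name}: ~{need:.1f} GB, fits in {where}")
```

When the estimate is larger than the VRAM, part or all of the model runs from system RAM on the CPU, which is where the big slowdowns come from.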
I'm willing to try others, especially if they are newer. Older models like Google Gemma, even if they run well on your hardware, are not very accurate because the tech has moved on; the original Gemma was published in early 2024.