You’re paying $20 a month for ChatGPT Plus. Every prompt you type, every document you paste, every half-formed idea you test - it all goes to OpenAI’s servers, gets logged, and could end up in training data. Meanwhile, open-weight models running on your own hardware have gotten good enough that many people can’t tell the difference.
Here’s the thing: setting up your own private ChatGPT alternative used to be a weekend project. Now it takes about 15 minutes.
This guide walks you through installing Ollama and Open WebUI - two free, open-source tools that together give you a ChatGPT-style interface running entirely on your machine. No API keys. No subscriptions. No data leaving your computer.
What You’re Building
By the end of this guide, you’ll have:
- A browser-based chat interface that looks and feels like ChatGPT
- One or more AI models running locally on your hardware
- The ability to upload documents and ask questions about them (RAG)
- Multi-user support if you want to share it with your household
- Zero ongoing costs after setup
The stack is simple: Ollama handles running the AI models (think of it as Docker for LLMs), and Open WebUI provides the web interface you interact with. Ollama has over 100,000 stars on GitHub and has become the standard way to run local models. Open WebUI is the most popular frontend for it, with support for document uploads, web search integration, and model management.
What You’ll Need
Minimum Hardware
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Storage | 10 GB free | 50 GB+ free |
| GPU | Not required | Any with 8 GB+ VRAM |
| OS | macOS, Linux, or Windows (WSL2) | Any of these |
A few things worth knowing about hardware:
- No GPU? No problem. Ollama runs on CPU just fine. You’ll get 3-6 tokens per second on a modern processor - slower than cloud services, but perfectly usable for most tasks.
- 8 GB RAM lets you run 3B parameter models (surprisingly capable for simple tasks).
- 16 GB RAM opens up 7-8B parameter models, which is where quality gets genuinely good.
- A GPU with 8 GB+ VRAM (like an RTX 3060 or M1 Mac) dramatically speeds things up - 20-40+ tokens per second depending on the model.
If your computer was made in the last five years, you can probably run this.
Step 1: Install Ollama
Ollama is a single binary that manages downloading, running, and serving AI models. Installation takes one command.
macOS or Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows:
Download the installer from ollama.com/download and run it. Recent versions of Ollama run natively on Windows - no WSL2 needed for Ollama itself - though Docker Desktop in Step 3 does require WSL2 (Windows will prompt you if it's missing).
Verify it’s working:
ollama --version
You should see a version number. That’s it - Ollama is installed and running as a background service.
Step 2: Pull Your First Model
Now you need an actual AI model. Ollama’s model library has dozens of options, but here’s what to start with based on your hardware:
8 GB RAM (no GPU):
ollama pull qwen3:4b
Qwen3 4B is small but punches above its weight - good for general conversation, basic coding questions, and summarization.
16 GB RAM or 8 GB VRAM:
ollama pull qwen3:8b
This is the sweet spot for most people. Qwen3 8B outperforms models twice its size on reasoning and coding benchmarks, supports a 32K context window, and runs at around 25 tokens per second on a laptop. It includes a “thinking mode” for complex problems and a faster mode for simple chat.
32 GB RAM or 16+ GB VRAM:
ollama pull qwen3:30b-a3b
The 30B parameter version uses a mixture-of-experts architecture that keeps only 3B parameters active at any time, so it runs faster than you’d expect while delivering noticeably better quality.
The download will take a few minutes depending on your connection. Models range from about 2.5 GB (4B) to 18 GB (30B).
Test it immediately:
ollama run qwen3:8b
This drops you into a chat. Type something, see it respond. Press Ctrl+D to exit. If this works, your model backend is ready.
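The same backend is scriptable: Ollama exposes a REST endpoint at /api/generate that takes a model name and a prompt. A sketch of a non-streaming call (error handling omitted; the function names are ours):

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "qwen3:8b") -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "qwen3:8b",
        base_url: str = "http://localhost:11434") -> str:
    data = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(ask("Explain what a context window is in one sentence."))
```

Anything that can make an HTTP request - a shell script, an editor plugin, a cron job - can use your local model this way.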
Step 3: Install Open WebUI
Open WebUI gives you the browser interface. The fastest way to install it is with Docker.
If you don’t have Docker, install it first:
- macOS: Docker Desktop for Mac
- Linux:
curl -fsSL https://get.docker.com | sh
- Windows: Docker Desktop for Windows
Run Open WebUI (CPU setup, Ollama already installed separately):
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Or, if you want everything in one container (Ollama + Open WebUI bundled):
# CPU only
docker run -d \
-p 3000:8080 \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollama
# With GPU support (NVIDIA)
docker run -d \
-p 3000:8080 \
--gpus=all \
-v ollama:/root/.ollama \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:ollama
Wait 30-60 seconds for the container to start, then open your browser to http://localhost:3000.
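If you prefer Docker Compose over a long docker run command, the separate-Ollama setup above maps to roughly this docker-compose.yml (a sketch of the same flags; start it with docker compose up -d):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"                              # host port 3000 -> container 8080
    extra_hosts:
      - "host.docker.internal:host-gateway"      # lets the container reach host Ollama
    volumes:
      - open-webui:/app/backend/data             # persists users, chats, settings
    restart: always

volumes:
  open-webui:
```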
Step 4: First-Time Setup
When you open http://localhost:3000 for the first time:
- Create an admin account. The first user to sign up becomes the administrator. Enter a name, email, and password - these credentials are stored locally, not sent anywhere.
- Select a model. Click the model dropdown at the top of the chat interface. If you installed Ollama separately, your pulled models should appear automatically. If you used the bundled container, pull a model from Admin Panel > Settings > Models.
- Start chatting. Type a message and hit send. The model runs on your hardware, and responses stay on your machine.
That’s the basic setup. You now have a working ChatGPT alternative. But there’s more to configure if you want to get the most out of it.
Step 5: Upload Documents (Local RAG)
One of Open WebUI’s best features is built-in Retrieval-Augmented Generation - the ability to upload documents and ask questions about them. No cloud processing. Your files never leave your machine.
Quick Method: Drop Files Into Chat
Click the + button in the chat input or just drag and drop a file into the chat window. Open WebUI will process the document and make it available for that conversation. Then ask questions about it naturally.
This works with PDFs, text files, Word documents, and more.
Better Method: Create a Knowledge Base
For documents you want to reference across multiple conversations:
- Go to Workspace > Knowledge
- Click + Create a Knowledge Base
- Give it a name (e.g., “Tax Documents 2025” or “Project Specs”)
- Upload your files - drag and drop works here too
- Go to Workspace > Models, select your model, and link the knowledge base
Now in any chat, type # followed by the name of your knowledge base to pull it into the conversation. The model will search through your documents to find relevant information before answering.
RAG Tips
- Well-formatted documents with clear headings produce better results
- Upload files one at a time if they’re large (prevents timeout issues)
- For image-heavy PDFs, check Admin Settings > Documents and consider switching from the default pypdf extractor to Tika or Docling for better text extraction
- You can also sync an entire local directory for automatic updates
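Under the hood, RAG boils down to "retrieve the most relevant chunks, then prepend them to the prompt." Open WebUI's real pipeline uses embeddings and a vector database, but a toy keyword-overlap version shows the shape of the idea (all function names here are ours, not Open WebUI's API):

```python
def chunk(text: str, size: int = 200) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def top_chunk(question: str, chunks: list[str]) -> str:
    """Rank chunks by crude keyword overlap with the question."""
    q = set(question.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

def rag_prompt(question: str, document: str) -> str:
    """Build a prompt that grounds the model in the retrieved context."""
    context = top_chunk(question, chunk(document))
    return f"Use this context to answer.\n\nContext: {context}\n\nQuestion: {question}"
```

Real systems replace the keyword overlap with embedding similarity, which is why well-structured source documents retrieve so much better.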
Picking the Right Model
You’re not locked into one model. Ollama lets you pull as many as your storage allows, and you can switch between them in Open WebUI with a click.
Here’s a cheat sheet for what’s worth running right now:
| Model | Size | Good For | Pull Command |
|---|---|---|---|
| Qwen3 4B | ~2.5 GB | Quick answers, light hardware | ollama pull qwen3:4b |
| Qwen3 8B | ~5 GB | General use, coding, reasoning | ollama pull qwen3:8b |
| Gemma 3 12B | ~8 GB | Multilingual, long context | ollama pull gemma3:12b |
| Qwen3 30B-A3B | ~18 GB | High quality, efficient MoE | ollama pull qwen3:30b-a3b |
| Llama 3.3 70B | ~40 GB | Best open quality (needs 48 GB+ RAM) | ollama pull llama3.3:70b |
| Qwen3-Coder | ~5 GB | Code generation, debugging | ollama pull qwen3-coder |
The general recommendation for most people: Start with Qwen3 8B. If your hardware handles it well and you want better quality, try the 30B variant. If you primarily write code, add Qwen3-Coder as a second model.
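To see what's already on disk, `ollama list` prints names and sizes; the same data is available as JSON from the local API's /api/tags endpoint. A small sketch (the formatting helper is ours):

```python
import json
import urllib.request

def summarize_models(tags: dict) -> list[str]:
    """Format Ollama's /api/tags response as 'name: size in GB' lines."""
    return [f"{m['name']}: {m.get('size', 0) / 1e9:.1f} GB"
            for m in tags.get("models", [])]

def installed_models(base_url: str = "http://localhost:11434") -> list[str]:
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))

if __name__ == "__main__":
    for line in installed_models():
        print(line)
```

When storage gets tight, `ollama rm <model>` frees the space.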
Multi-User Setup
If other people in your household or team want to use it:
- Go to Admin Panel > Users
- Set the default role for new signups (user, pending approval, or admin)
- Share the URL (http://your-computer-ip:3000 on the local network)
Each user gets their own chat history, settings, and permissions. The admin can control which models are available and set usage limits.
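The one piece that trips people up is finding "your-computer-ip". You can use `ip addr` (Linux), `ipconfig` (Windows), or a small Python trick: connecting a UDP socket selects the machine's outbound interface without sending any packets (a sketch; the fallback is ours):

```python
import socket

def lan_ip() -> str:
    """Best-effort guess at this machine's LAN address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket picks a source interface; nothing is transmitted.
        s.connect(("192.0.2.1", 80))  # reserved TEST-NET-1 address
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route; fall back to localhost
    finally:
        s.close()

print(f"Share this on your network: http://{lan_ip()}:3000")
```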
What This Means
The gap between cloud AI services and what you can run at home has narrowed to the point where, for many everyday tasks, there’s no practical difference. ChatGPT Plus costs $20/month - that’s $240 a year to rent access to a model while sending your data to someone else’s servers.
Running Qwen3 8B locally gives you comparable conversational quality for most tasks, with two guarantees no cloud service can match: your data never leaves your machine, and the service never shuts down or changes its terms.
This isn’t a niche setup for hobbyists anymore. Open WebUI has built-in document search, web browsing integration, multi-user support, and an admin panel. Ollama handles model updates, GPU memory management, and compatibility across hardware. The tooling has matured to the point where the installation is simpler than setting up many commercial software products.
The tradeoff is real: local models won’t match GPT-4o or Claude on the hardest reasoning tasks, and they run slower on modest hardware. But for drafting emails, summarizing documents, brainstorming, basic coding help, and having a private AI assistant available at all times - including offline - this setup handles it.
What You Can Do
Right now:
- Follow the steps above - total time from start to chatting is about 15 minutes
- Pull a second model to compare quality (try both Qwen3 8B and Gemma 3 12B)
- Upload a PDF you’d normally paste into ChatGPT and test the local RAG
This week:
- Set up the Docker container to auto-start with your computer (the --restart always flag, which the commands above already include)
- Create knowledge bases for documents you reference frequently
- Share the URL with family members or coworkers on your network
Going further:
- Add web search to Open WebUI (Admin Settings > Web Search, supports DuckDuckGo, SearXNG, Brave, and others)
- Try the ollama pull command with new models as they release - the Ollama library updates regularly
- If you have a spare machine, run this as a dedicated home AI server that anyone on your network can use
The models keep getting better. Six months ago, running something this capable locally wasn’t realistic for most hardware. Now, a computer with 16 GB of RAM and no GPU can run a model that handles most daily tasks. That bar will only keep dropping.