Self-Host AnythingLLM: Your Private Document Chat That Replaces ChatGPT Plus

Build a completely private AI assistant that can chat with your documents. No cloud uploads, no subscriptions, no data leaks.


You have documents you want to ask questions about. Medical records. Financial statements. Legal contracts. Client files. Your personal journal. Proprietary business data.

ChatGPT Plus costs $20/month and uploads everything you share to OpenAI’s servers. Notion AI now requires the $20/user/month Business plan for full AI access. Both services retain your data, and depending on your account settings, may use it to train future models.

There’s another way. AnythingLLM is a free, open-source application with 56,000+ GitHub stars that lets you chat with your documents using AI — running entirely on your own computer. Nothing leaves your machine. No subscriptions. No data leaks.

What You’re Building

By the end of this guide, you’ll have:

  • A private ChatGPT-style interface
  • Document upload and chat (PDFs, Word docs, text files, and more)
  • RAG (Retrieval-Augmented Generation) that grounds answers in your documents
  • Optional AI agents that can browse the web, search files, and run queries
  • Everything running locally with no cloud dependencies

Why AnythingLLM Over the Alternatives

Three main contenders exist for local document chat: AnythingLLM, Open WebUI, and PrivateGPT. Here’s why this guide uses AnythingLLM:

AnythingLLM is the easiest entry point. It bundles Ollama (for local models), a vector database (for document search), and a polished interface into one install. Zero configuration required. Open the app, upload documents, start chatting.

Open WebUI has more GitHub stars (128,000+) and a larger plugin ecosystem, but requires you to set up Ollama separately and configure RAG manually. It’s better for power users who want fine-grained control.

PrivateGPT offers complete privacy through its LlamaIndex backend but demands Python deployment experience. The trade-off for total control is setup complexity.

If you’re new to self-hosted AI, start with AnythingLLM. You can always migrate later.

Hardware Requirements

AnythingLLM itself is lightweight:

  • RAM: 2GB minimum for the app
  • Storage: 10GB+ (depends on how many documents you store)
  • CPU: 2 cores minimum

The heavy lifting happens in the local LLM. To run models locally:

| Hardware | What You Can Run |
| --- | --- |
| 8GB RAM/VRAM | 7B parameter models (good enough for most document chat) |
| 16GB RAM/VRAM | 13B–22B models (better reasoning, longer context) |
| 24GB+ VRAM | 32B+ models (near-GPT-4 quality) |
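As a rough sanity check for the table above (a rule of thumb, not an official sizing guide), a quantized model needs roughly its parameter count times the bits per weight, plus an allowance for the context window:

```python
def estimated_ram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead_gb: float = 1.5) -> float:
    """Rough RAM/VRAM estimate for a quantized local model.

    params_billion: model size in billions of parameters
    bits_per_weight: quantization level (4-bit is a common default in Ollama)
    overhead_gb: rough allowance for context window / KV cache
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

# A 4-bit 7B model: ~3.5 GB of weights plus overhead, comfortably inside 8 GB
print(round(estimated_ram_gb(7), 1))   # 5.0
print(round(estimated_ram_gb(32), 1))  # 17.5
```

This is why a 7B model fits an 8GB machine and a 32B model wants a 24GB GPU: the weights alone dominate the footprint.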

No dedicated GPU? AnythingLLM can use cloud APIs (OpenAI, Anthropic, etc.) as a backend while keeping your documents local. Your files never leave your machine — only the queries and document snippets get sent to the API.

Installation

Option 1: Desktop App (Easiest)

  1. Download from anythingllm.com/download
  2. Install like any other app
  3. Launch and follow the setup wizard

The desktop app embeds Ollama, so you don’t need to install anything else. It works on Windows, macOS, and Linux.

Option 2: Docker (Multi-User, Server Deployment)

For running on a home server or sharing with family/team:

```shell
docker pull mintplexlabs/anythingllm

docker run -d -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v anythingllm_storage:/app/server/storage \
  -v anythingllm_hotdir:/app/collector/hotdir \
  mintplexlabs/anythingllm
```

Access at http://localhost:3001. The Docker version supports multiple users and can connect to an external Ollama instance running on your GPU server.
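If you prefer Docker Compose for long-running services, the same run command translates to a `docker-compose.yml` along these lines (a sketch mirroring the flags above; adjust names and ports to taste):

```yaml
services:
  anythingllm:
    image: mintplexlabs/anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN
    volumes:
      - anythingllm_storage:/app/server/storage
      - anythingllm_hotdir:/app/collector/hotdir
    restart: unless-stopped

volumes:
  anythingllm_storage:
  anythingllm_hotdir:
```

Named volumes keep your documents and vector database intact across container upgrades.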

First-Time Setup

Step 1: Choose Your LLM Provider

AnythingLLM asks which LLM to use. Your options:

Local (Maximum Privacy)

  • Ollama (built into desktop app): Select “Ollama” and pick a model
  • LM Studio: If you prefer LM Studio’s interface

Cloud (Easier, Less Private)

  • OpenAI: Use your API key
  • Anthropic: Use Claude
  • Azure OpenAI: Enterprise deployments

For this guide, we’ll use the built-in Ollama with llama3.2:3b — small enough to run on most machines, smart enough for document Q&A.

Step 2: Download a Model

If using the embedded Ollama:

  1. Click the gear icon (Settings)
  2. Go to “LLM Preferences”
  3. Select “Ollama”
  4. Click “Download Model”
  5. Choose llama3.2:3b for testing, or a larger model such as llama3.1:8b if you have 16GB+ RAM (Llama 3.2 ships only in 1B and 3B sizes; the 8B model is Llama 3.1)

The model downloads once and runs entirely offline.

Step 3: Create a Workspace

Workspaces organize your documents. Create one for each project or topic:

  1. Click “New Workspace”
  2. Name it (e.g., “Tax Documents 2025” or “Project Alpha”)
  3. The workspace is now ready for documents

Uploading and Chatting With Documents

Supported Formats

AnythingLLM handles:

  • PDF files
  • Word documents (.docx)
  • Plain text (.txt)
  • Markdown (.md)
  • HTML files
  • YouTube links (transcribes automatically)
  • Web pages (scrapes content)
  • GitHub repositories

Upload Process

  1. Open your workspace
  2. Click the paperclip icon or drag files into the chat
  3. Wait for processing (AnythingLLM chunks and embeds the document)
  4. Start asking questions

When you ask a question, AnythingLLM:

  1. Converts your question into an embedding (a numerical representation)
  2. Searches the vector database for relevant document chunks
  3. Feeds those chunks to the LLM as context
  4. Generates an answer grounded in your documents

This is RAG — Retrieval-Augmented Generation. Because the model answers from retrieved sections of your documents rather than from memory alone, it is far less likely to hallucinate: the relevant text is in front of it before it writes a word.
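The retrieval step can be sketched in a few lines of Python. This is an illustration of the idea, not AnythingLLM’s actual code: it uses a toy bag-of-words “embedding,” where the real app uses a neural embedding model (e.g., nomic-embed-text) and a vector database like LanceDB. The shape of the pipeline is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words vector.
    Real systems use a neural embedding model; the retrieval logic is identical."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Tenant must give 60 days written notice to terminate the lease early.",
    "Rent is due on the first of each month.",
    "Pets are permitted with a $300 deposit.",
]
context = retrieve("What notice is required to end the lease early?", chunks, top_k=1)
# The retrieved chunk becomes the grounding context fed to the LLM:
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: ..."
```

The Top K setting discussed later in this guide is exactly the `top_k` parameter here: how many chunks get stuffed into the prompt.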

Example: Querying a Contract

Let’s say you upload a lease agreement. You can ask:

“What are my notice requirements for ending the lease early?”

AnythingLLM retrieves the relevant paragraphs about termination, then answers with specific references to your contract. Click “Show Sources” to see exactly which sections it used.

Compare this to uploading that contract to ChatGPT, where:

  • OpenAI stores your file on their servers
  • The content may train future models
  • Third parties might access it through partnerships
  • You’re paying $20/month for the privilege

Enabling AI Agents

AnythingLLM includes optional AI agents that can:

  • Browse the web: Research topics and bring back results
  • Search documents: Query across all your workspaces
  • Generate charts: Visualize data from your documents
  • Run SQL queries: Connect to databases
  • Save files: Export summaries and reports

To enable agents:

  1. Go to Settings → Agent Configuration
  2. Toggle on the tools you want
  3. Return to your workspace and use @agent to invoke them

Example: @agent search the web for recent changes to California tenant law and compare with my lease

Privacy Configuration

For Maximum Privacy

  1. Use Ollama (built-in or external)
  2. Choose a local embedding model (e.g., nomic-embed-text)
  3. Use the built-in LanceDB vector database
  4. Disable telemetry in Settings → Privacy

With this configuration, everything runs locally. Your documents, queries, and model weights never leave your computer.

Hybrid Approach

If your hardware can’t run good local models:

  1. Use a cloud LLM (OpenAI, Anthropic, etc.)
  2. Keep documents stored locally
  3. Only query snippets get sent to the cloud

Your full documents stay on your machine. The LLM only sees the chunks relevant to each question.

Optimizing Performance

Chunk Size

Default: 1500 tokens. Adjust based on document type:

  • Technical docs: Keep at 1500
  • Dense legal/financial PDFs: Drop to 500-750
  • Long narratives: Increase to 2000
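Chunking itself is simple in principle. Here is a hedged sketch of the idea (AnythingLLM’s real splitter is token-aware and configurable; this one approximates tokens with words and adds a small overlap so sentences aren’t cut off at chunk boundaries):

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into overlapping word-based chunks.

    chunk_size/overlap are in words here as a stand-in for tokens;
    real splitters count model tokens, which run roughly 0.75 words each.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = "word " * 4000  # stand-in for a 4000-word document
parts = chunk_text(doc, chunk_size=1500, overlap=100)
print(len(parts))  # 3
```

Smaller chunks give the retriever finer-grained targets (good for dense legal text); larger chunks preserve narrative flow at the cost of precision.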

Top K (Retrieval Count)

Default: 5 chunks. More chunks = more context but slower responses. Start at 5, increase if answers miss relevant information.

Embedding Model

The default nomic-embed-text works well for most cases. For better multilingual support, try bge-m3.

What You’re Saving

ChatGPT Plus: $20/month = $240/year
Notion AI (Business): $20/user/month = $240/year per person

AnythingLLM: $0

The only ongoing cost is electricity on hardware you already own. In exchange, your most sensitive documents stay private for as long as you want them to.

Troubleshooting

Model downloads fail: Check your internet connection. Ollama pulls models from its own registry (ollama.com), which can occasionally be slow or rate-limited; retrying usually resumes the download.

Slow responses: Try a smaller model. llama3.2:3b runs noticeably faster than an 8B model like llama3.1:8b, with modest quality loss for document Q&A.

Document parsing fails: Some PDFs are image-only (scanned documents). Recent AnythingLLM versions include OCR, but complex layouts may still need preprocessing with a dedicated OCR tool before upload.

Memory errors: Close other applications. Consider using a cloud LLM backend if your hardware is limited.

Next Steps

Once you’re comfortable with basic document chat:

  1. Connect external Ollama: Run a bigger model on a GPU server, connect from your laptop
  2. Set up multi-user Docker deployment: Share with your family or small team
  3. Enable the browser extension: Clip web pages directly into workspaces
  4. Explore MCP compatibility: AnythingLLM workspaces can be exposed as MCP tools for other AI systems

Your documents are your business. Keep them that way.