Self-host Vane (formerly Perplexica) to replace Perplexity with a private AI search engine

After the Perplexity class-action over leaked chats to Meta and Google, here's how to run a citation-grounded AI answer engine on your own hardware with Ollama and SearXNG.


If you liked the idea of Perplexity — type a question, get a synthesised answer with citations — but would rather not ship your chats to third parties, there is now a drop-in open-source replacement you can run on your own machine. It is called Vane, it used to be called Perplexica, and it connects to Ollama for the LLM side and SearXNG for the search side.

This matters right now because of what has been happening to the paid alternative. On April 1, 2026, a class-action lawsuit was filed against Perplexity AI in federal court in San Francisco, alleging that tracking tools embedded in Perplexity’s code sent user chat information directly to Google and Meta before it was even seen by Perplexity itself. The complaint covers all free-tier users between December 7, 2022 and February 4, 2026. If you want an answer engine without that category of risk, local is the obvious answer.

What Vane actually is

Vane is an open-source AI answer engine that ships as a Docker container. It takes your question, runs a real web search through SearXNG, scrapes the relevant pages, and hands them to a language model of your choice to write a cited answer. The project is on GitHub under ItzCrazyKns with around 33,800 stars, MIT license, latest tagged release v1.12.2 on April 10, 2026. The rename from Perplexica to Vane landed in the master branch on March 9, 2026.

The feature list is close to parity with paid products: three search modes (Speed, Balanced, Quality), web / discussions / academic paper sources, image and video search, file upload for documents and PDFs, domain filtering, and search history stored locally instead of on someone else’s server.

Crucially, it does not require cloud models. You can configure it with Ollama for fully local inference, or point it at OpenAI, Anthropic Claude, Google Gemini, Groq, or any OpenAI-compatible endpoint, and you can switch between providers per request.

What you need before you start

  • Docker installed and running. This is the path of least resistance.
  • Ollama running on the host machine. Download it from ollama.com and pull at least one chat model and one embedding model.
  • Enough RAM/VRAM for whichever model you pick. For a Llama 3.1 8B or Mistral Nemo variant, ~8 GB of VRAM (or unified memory on Apple Silicon) is a reasonable floor. A 70B model wants a high-end GPU or a Mac with 64 GB+ unified memory.
  • Some comfort with the terminal.

If you already run SearXNG, Vane can use that. If you don’t, the default image bundles SearXNG for you.
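
Before going further, confirm that Ollama is actually up and has your models pulled. Ollama exposes a small HTTP API on port 11434, and listing the installed models is a single request (this is Ollama’s standard /api/tags endpoint, nothing Vane-specific):

# Ollama listens on port 11434 by default; this lists every model you have pulled
curl http://localhost:11434/api/tags

If that returns a JSON list containing your chat and embedding models, the Ollama side is ready.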

Install Vane

The fast path, straight from the README, is one command:

docker run -d -p 3000:3000 \
  -v vane-data:/home/vane/data \
  --name vane \
  itzcrazykns1337/vane:latest

Then open http://localhost:3000 in a browser. That’s it — SearXNG is already inside the image.
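
If you prefer Compose over a raw docker run, the equivalent service definition is short. Treat it as a sketch translated from the command above, not a file shipped by the project:

services:
  vane:
    image: itzcrazykns1337/vane:latest
    container_name: vane
    ports:
      - "3000:3000"
    volumes:
      - vane-data:/home/vane/data
    restart: unless-stopped

volumes:
  vane-data:

Save it as docker-compose.yml and bring it up with docker compose up -d.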

If you have your own SearXNG instance, grab the slim image instead:

docker run -d -p 3000:3000 \
  -e SEARXNG_API_URL=http://your-searxng-url:8080 \
  -v vane-data:/home/vane/data \
  --name vane \
  itzcrazykns1337/vane:slim-latest

Note that the README specifies two requirements for your own SearXNG instance: the JSON output format and the Wolfram Alpha engine must both be enabled. Each is a one-line change in SearXNG’s settings.yml.
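
For reference, those two changes look roughly like this in settings.yml. Exact keys and defaults vary between SearXNG versions, so treat this as a sketch and check your own file:

search:
  formats:
    - html
    - json           # Vane talks to SearXNG over JSON

engines:
  - name: wolframalpha
    disabled: false  # off by default in stock SearXNG

Restart your SearXNG container afterwards so the changes take effect.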

Point Vane at Ollama

This is the step people get wrong, so read slowly. Inside a Docker container, localhost means the container itself, not the host machine. For Vane to reach Ollama you have to give it the right URL for your platform:

  • macOS or Windows (Docker Desktop): http://host.docker.internal:11434
  • Linux: http://<your-host-private-ip>:11434 (something like http://192.168.1.50:11434)

On Linux there is a further catch: Ollama binds to 127.0.0.1 by default, which means Docker containers cannot reach it. The fix is to set OLLAMA_HOST=0.0.0.0:11434 in Ollama’s systemd service file and restart, as documented in the Dev.to setup guide. After that, Vane’s setup screen will enumerate your local models and let you pick one.
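
On a systemd-based distro, that fix looks like the following. systemctl edit opens a drop-in override file, so you are not editing the packaged unit directly:

sudo systemctl edit ollama
# in the editor that opens, add these two lines, then save and exit:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama

This makes Ollama listen on every interface, so firewall port 11434 if the machine sits on an untrusted network.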

The setup screen also asks you to pick an embedding model separately. A common pairing is a chat model like llama3.1:8b or mistral-nemo plus an embedding model like nomic-embed-text. The chat model handles the synthesis and citation writing; the embedding model handles semantic search over scraped pages and any files you upload.
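
To confirm the embedding model is pulled and serving before you point Vane at it, ask Ollama’s embeddings endpoint for a vector (again a stock Ollama API call, not a Vane feature):

curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "a test sentence"}'

A JSON response with an embedding array of floats means that half of the pairing is working.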

Model choices that don’t waste your time

Which local model is “best” keeps changing, but these are practical starting points as of April 2026:

  • Light hardware (8 GB VRAM / 16 GB unified memory): llama3.1:8b or mistral:7b for the chat model. Fast enough to use interactively.
  • Mid hardware (16–24 GB VRAM): qwen2.5:14b or gemma2:27b quantised. Noticeably better at following citation instructions.
  • High-end (48 GB+): llama3.1:70b or qwen2.5:72b. Comparable in quality to the cloud Perplexity experience on most queries, with the caveat that you pay in electricity rather than in subscription fees.
  • Embedding: nomic-embed-text is the boring, correct choice.

The v1.10.0 and later Vane releases added a keep_alive parameter so you can keep the model warm in VRAM and avoid cold-start latency between queries — worth setting if you plan to actually use this daily.
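
Where exactly Vane surfaces its keep_alive setting may vary by release, but you can enforce the same behaviour on the Ollama side with the documented OLLAMA_KEEP_ALIVE variable, which controls how long models stay resident after a request:

# keep models in memory for an hour after the last request;
# -1 keeps them loaded indefinitely, 0 unloads them immediately
export OLLAMA_KEEP_ALIVE=1h

If Ollama runs under systemd, put the variable in the same Environment= override you used for OLLAMA_HOST above rather than exporting it in a shell.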

What it feels like once it’s running

A query in Quality mode with a mid-size local model takes noticeably longer than Perplexity’s cloud — think ~15 to 40 seconds end to end instead of ~5 — because your GPU is doing the generation that Perplexity does on an H100. For most research questions that delta doesn’t matter. For rapid-fire follow-ups it does.

The answer quality is very close to the paid product because the bottleneck is the search, not the LLM, and SearXNG aggregates across hundreds of engines. Citations come out as clickable footnotes. You can upload a PDF and ask questions about it. You can filter to academic papers only. The UI is recognisably the same shape as Perplexity’s.

What you don’t get: no tracking scripts, no terms-of-service that let your questions become training data, no surprise data-sharing with Google and Meta, no account required, no bill.

What this means

The class-action against Perplexity is not a one-off. It is the second wave of what has already happened to ad-tech and social media: a product that felt “free” turned out to have a second customer, and users were the product being sold. The legal exposure will keep finding other answer engines that embedded the same third-party pixels. A local stack sidesteps this entire class of problem because there is no telemetry pipe to compromise — the chat never leaves your LAN.

There is also a subtler payoff. When the model is local, you are not subject to silent system-prompt changes by the provider (“make the tone more corporate,” “avoid these topics”), and you are not at the mercy of a subscription that might disappear or double in price. The same answer engine will work in five years on the same hardware.

What you can do

  1. Install Docker and Ollama on a machine with a reasonable GPU or an Apple Silicon Mac with 16 GB+ unified memory.
  2. Pull one chat model and one embedding model: ollama pull llama3.1:8b and ollama pull nomic-embed-text.
  3. Run the Vane docker command above.
  4. Open http://localhost:3000, finish the setup screen, and run a test query against something you already know the answer to. Verify the citations actually match the claims — this is the same sanity check you should run on any answer engine, cloud or local.
  5. If you use it daily, expose it through Tailscale or a Cloudflare Tunnel so you can query it from your phone (see the example after this list).
  6. If you were a paying Perplexity user, keep an eye on the class-action docket. Free-tier users between December 2022 and February 2026 are potentially in the covered class.
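
For step 5, the Tailscale route can be a single command on the machine running Vane. The serve syntax has shifted between Tailscale releases, so verify against tailscale serve --help:

# expose local port 3000 over HTTPS to devices on your tailnet
tailscale serve --bg 3000

Once your phone is on the same tailnet, it reaches Vane at the machine’s Tailscale hostname, with no ports forwarded and nothing exposed to the public internet.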

The point of self-hosting is not that it is always easier. It isn’t. The point is that when the next AI privacy lawsuit lands — and it will — you are reading about it, not in it.