Self-Host Paperless-ngx: Replace Google Drive With Private AI-Powered Document Management

Set up Paperless-ngx with local AI to automatically OCR, tag, and organize all your documents without sending a byte to the cloud


Every document you upload to Google Drive, OneDrive, or Dropbox gets scanned and indexed on the provider's servers, and may be analyzed by AI systems under terms you agreed to without reading. Your tax returns, medical records, contracts: all of it lives on someone else's computer, subject to their terms and their data practices.

Paperless-ngx changes that equation. It’s an open-source document management system that runs entirely on your hardware, using OCR to make your scans searchable and machine learning to automatically tag and categorize incoming documents. Add a local AI layer with Ollama, and you get intelligent document processing without any data leaving your network.

What You Get

Paperless-ngx turns your chaotic pile of PDFs and scans into a searchable, organized archive:

  • OCR for everything: Drop in a scan, get searchable text extracted automatically via Tesseract
  • Automatic organization: Machine learning suggests tags, correspondents, and document types
  • Full-text search: Find any document by searching its contents, not just filenames
  • Email ingestion: Point Paperless at an IMAP mailbox and it automatically imports attachments
  • Mobile scanning: Scan documents on your phone and upload directly to your instance

Adding local AI (via Paperless-GPT or Paperless-AI) enhances this further:

  • Vision-based OCR: LLMs with vision capabilities read low-quality scans that trip up traditional OCR
  • Intelligent metadata: AI extracts titles, dates, and document types from content
  • Natural language search: Ask questions like “show me all utility bills from last quarter”

Requirements

The base Paperless-ngx stack runs on modest hardware:

  • 2GB RAM minimum (4GB recommended)
  • 2+ CPU cores (OCR is CPU-intensive)
  • 10GB storage (plus space for your documents)
  • Docker and Docker Compose

For the AI enhancement layer:

  • 8GB+ RAM (Ollama with a 7B parameter model)
  • NVIDIA GPU optional (speeds up AI inference significantly)

Paperless runs fine on a Raspberry Pi 4, though OCR processing will be slower than on dedicated hardware.

Step 1: Create Your Directory Structure

First, set up folders for Paperless to use:

mkdir -p ~/paperless/{data,media,export,consume}
cd ~/paperless

The consume folder is where you'll drop documents for automatic import. The media folder stores processed documents, data holds the search index and application state, and export is for backups.

Step 2: Create the Docker Compose File

Create docker-compose.yml with the core services:

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - "8000:8000"
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless
      PAPERLESS_DBNAME: paperless
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_SECRET_KEY: change-this-to-a-long-random-string
      PAPERLESS_TIME_ZONE: America/New_York
      PAPERLESS_URL: http://localhost:8000

volumes:
  redisdata:
  pgdata:

Change PAPERLESS_SECRET_KEY to something random. Adjust PAPERLESS_OCR_LANGUAGE if you need other languages (e.g., deu for German, fra for French, or eng+deu for both).
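One quick way to generate a suitable secret, assuming openssl is installed (it is on nearly every Linux distribution):

```shell
# Generate a 64-character random string to use as PAPERLESS_SECRET_KEY
openssl rand -base64 48
```

Paste the output into the PAPERLESS_SECRET_KEY line of your docker-compose.yml.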

Step 3: Start the Stack

docker compose up -d

Wait a minute for everything to initialize. Check the logs if something seems off:

docker compose logs webserver

Step 4: Create Your Admin Account

docker compose exec webserver python manage.py createsuperuser

Follow the prompts to set up your username, email, and password.
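Alternatively, Paperless-ngx can create the admin account for you on first startup via the PAPERLESS_ADMIN_USER and PAPERLESS_ADMIN_PASSWORD environment variables. A sketch of the addition to the webserver's environment block (the values are placeholders; pick your own):

```yaml
    environment:
      # Superuser is created automatically on first startup if none exists
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: choose-a-strong-password
```

This is handy for scripted deployments where an interactive prompt isn't available.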

Step 5: Access the Web Interface

Open http://localhost:8000 (or your server’s IP) and log in. You’ll see an empty dashboard ready for documents.

Step 6: Add Your First Document

Drop a PDF or image into the consume folder:

cp ~/Downloads/some-document.pdf ~/paperless/consume/

Paperless detects new files automatically, runs OCR, and adds them to your archive. Watch the process in the web UI under “Logs” or check the tasks page.
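No PDF handy? Paperless also consumes plain-text files, which makes for a quick smoke test of the consume pipeline:

```shell
# Create the consume folder if needed (matches Step 1) and drop in a text note
mkdir -p ~/paperless/consume
echo "Hello Paperless" > ~/paperless/consume/test-note.txt
```

Within a few seconds the file should disappear from consume and show up as a searchable document in the web UI.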

Adding Local AI for Smarter Processing

The base setup uses traditional machine learning for tagging suggestions. To add LLM-powered intelligence, you have two main options.

Option A: Paperless-AI (Simpler Setup)

Paperless-AI provides automated document analysis with a clean web interface.

Add to your docker-compose.yml:

  paperless-ai:
    image: ghcr.io/clusterzx/paperless-ai:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      PAPERLESS_API_URL: http://webserver:8000
      PAPERLESS_API_TOKEN: your-api-token-here
      AI_PROVIDER: ollama
      OLLAMA_API_URL: http://ollama:11434
      OLLAMA_MODEL: llama3.1:8b
    depends_on:
      - webserver
      - ollama

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama-data:/root/.ollama
    # Uncomment for NVIDIA GPU support:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

volumes:
  ollama-data:   # merge into the existing top-level volumes section, alongside redisdata and pgdata

After starting with docker compose up -d, pull a model:

docker compose exec ollama ollama pull llama3.1:8b

Get your Paperless API token from the web interface: Settings → Account → Auth Tokens → Create Token.
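The Paperless REST API expects that token in a Token authorization header, which is also an easy way to sanity-check it. The token below is a placeholder; substitute the one you created in the web UI:

```shell
# Placeholder token: replace with the value from Settings -> Account -> Auth Tokens
TOKEN="0123456789abcdef"
AUTH_HEADER="Authorization: Token ${TOKEN}"
# With the stack running, this lists your documents as JSON:
# curl -s -H "$AUTH_HEADER" http://localhost:8000/api/documents/
echo "$AUTH_HEADER"
```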

Access the Paperless-AI dashboard at http://localhost:3000 to configure processing rules and watch documents get analyzed in real time.

Option B: Paperless-GPT (Vision OCR)

Paperless-GPT excels at extracting text from poor-quality scans using vision models. If you have an NVIDIA GPU, this option can dramatically improve OCR accuracy on faded receipts, handwritten notes, and badly scanned documents.

Add to your docker-compose.yml:

  paperless-gpt:
    image: ghcr.io/icereed/paperless-gpt:latest
    restart: unless-stopped
    ports:
      - "3002:8080"
    volumes:
      - ./paperless-gpt-prompts:/app/prompts
    environment:
      PAPERLESS_BASE_URL: http://webserver:8000
      PAPERLESS_API_TOKEN: your-api-token-here
      LLM_PROVIDER: ollama
      LLM_MODEL: llama3.1:8b
      OLLAMA_HOST: http://ollama:11434
      VISION_LLM_PROVIDER: ollama
      VISION_LLM_MODEL: minicpm-v
    depends_on:
      - webserver
      - ollama

Pull the vision model:

docker compose exec ollama ollama pull minicpm-v

Paperless-GPT runs vision OCR on tagged documents, extracting text that traditional Tesseract misses.

Workflow Tips

Once everything is running, optimize your workflow:

Automatic email import: Configure an IMAP email account in Paperless settings. Forward receipts and documents to this address and they appear in your archive automatically.

Mobile scanning: Use any scanner app that supports WebDAV or just email scans to your Paperless inbox.

Tagging strategy: Create tags for document types (invoice, receipt, contract, medical) and correspondents (IRS, employer, landlord). The AI layer learns your patterns and gets better at auto-tagging over time.

Regular backups: The export folder contains your full archive. Add it to your backup rotation:

docker compose exec webserver document_exporter /usr/src/paperless/export
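A date-stamped archive of that folder is easy to script. This sketch assumes the ~/paperless layout from Step 1 and writes the archive to /tmp; adjust both paths to suit your backup target:

```shell
# Archive the export folder with today's date in the filename
PAPERLESS_DIR="$HOME/paperless"
mkdir -p "$PAPERLESS_DIR/export"   # no-op if Step 1 was followed
tar -czf "/tmp/paperless-backup-$(date +%F).tar.gz" -C "$PAPERLESS_DIR" export
```

Run it from cron after the exporter command above and copy the resulting tarball off-machine.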

Privacy Compared to Cloud Services

| Feature | Paperless-ngx | Google Drive | Dropbox |
|---|---|---|---|
| Data location | Your hardware | Google servers | Amazon AWS |
| AI processing | Local LLM | Cloud AI | Cloud AI |
| Training data | Never leaves device | Fed to AI models | "Anonymized" analytics |
| Search indexing | Private | Google indexes content | Dropbox indexes content |
| Account required | No | Yes (Google account) | Yes |
| Offline access | Full | Limited | Limited |

Your documents stay on your hardware. OCR runs locally. AI inference happens on your machine. The only network traffic is what you explicitly send.

Troubleshooting

Documents stuck in consume folder: Check file permissions. Paperless runs as UID 1000 by default, so make sure that user owns the consume folder rather than opening it up to everyone:

sudo chown -R 1000:1000 ~/paperless/consume

OCR not working on certain languages: Add the language to PAPERLESS_OCR_LANGUAGE. Multiple languages: eng+deu+fra.

Ollama models slow: If you don’t have a GPU, use smaller models like llama3.2:3b or phi3:mini. Or just skip the AI layer—base Paperless-ngx works great without it.

Database errors after upgrade: Back up, then:

docker compose down
docker compose pull
docker compose up -d
docker compose exec webserver python manage.py migrate

What You Can Do Now

Paperless-ngx replaces the document management parts of cloud storage services while keeping everything private. Combined with local AI, you get intelligent organization that actually improves over time—learning your categories, recognizing your correspondents, and extracting text from even the messiest scans.

Start with the base setup. Get your documents flowing in. Add the AI layer when you’re comfortable. Within a week, you’ll wonder why you ever trusted your tax returns to Google’s servers.