Paperless-ngx — Self-hosted document management that actually makes sense

⏳ 5 min read

Introduction

Welcome to another week of self-hosting various services in my homelab. This week, we'll tackle a problem many of us face: managing the endless stream of bills, receipts, and other documents we have to keep for our records.

For years, I struggled with organizing my documents. I tried various methods, from simple folder structures to cloud storage like Google Drive, but none of them made documents searchable by their text content or organized them automatically based on that content.

That's when I discovered Paperless-ngx, and it has completely transformed how I handle my documents. What makes it special isn't just its OCR capabilities, but how it automatically organizes documents based on their content and lets me search through their full text.

What is Paperless-ngx?

Paperless-ngx is a document management system that helps you archive, index, and search your scanned paper documents. It's the community-maintained successor to the original Paperless and Paperless-ng projects, with many additional features and improvements.

What really drew me to Paperless-ngx was its comprehensive feature set:

  • Automatic OCR (Optical Character Recognition).
  • Full text search through all your documents.
  • Automatic document classification.
  • Tag and correspondent system.
  • Mobile-friendly web interface.
  • Email document importing.
  • Multi-user support with permissions.
  • REST API for automation.
  • Barcode detection and document splitting.
  • Custom workflows with pre/post consume scripts.
  • Multiple language support.

And many more! The project is actively maintained with regular updates and improvements. You can check out their GitHub repository to see what's coming next.

💡

Note

While Paperless-ngx is stable and reliable, the OCR process can be resource-intensive. If you're running on a low-powered device, you might want to tune the OCR and task-worker settings described in the Paperless-ngx documentation.

Setup Paperless-ngx

Once again, I'm using Docker to run Paperless-ngx in my homelab, so let's go through the steps. First, create a new directory for the project:

Terminal window
mkdir paperless-ngx && cd paperless-ngx

Here's my docker-compose configuration, which uses PostgreSQL for the database and Redis as the message broker. Save it as docker-compose.yml in the new directory.

services:
  broker:
    image: docker.io/library/redis:7
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - redisdata:/data
  db:
    image: docker.io/library/postgres:15
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless-ngx
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - '8000:8000'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8000']
      interval: 30s
      timeout: 10s
      retries: 3
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless
      PAPERLESS_DBNAME: paperless
      PAPERLESS_URL: https://docs.mydomain.com
      PAPERLESS_SECRET_KEY: change-me
volumes:
  pgdata:
  redisdata:
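
The change-me value for PAPERLESS_SECRET_KEY above is only a placeholder and should be replaced with a long random string before the instance is reachable from anywhere else. One way to generate such a string is with Python's standard library. This is a minimal sketch: the function name and the 64-byte default are illustrative choices, not something Paperless-ngx prescribes.

```python
import secrets


def generate_secret_key(num_bytes: int = 64) -> str:
    """Return a random URL-safe string to use as PAPERLESS_SECRET_KEY."""
    # token_urlsafe reads from the operating system's secure random source
    # and encodes the bytes as URL-safe Base64 (letters, digits, '-', '_').
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into the compose file in place of "change-me".
    print(generate_secret_key())
```

If the generated value happens to start with a dash, wrapping it in quotes in the compose file keeps the YAML unambiguous.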

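You can also use the default database backend, SQLite. For that, remove the db service, the pgdata volume, the db entry under depends_on, and the PAPERLESS_DB* environment variables; Paperless-ngx falls back to SQLite when PAPERLESS_DBHOST is not set.

Apart from that, you can add the Tika and Gotenberg services to parse and convert Office documents (such as .doc, .xlsx and .odt) as well as .eml files.

For that, add the following services to your docker-compose file. Paperless-ngx also needs to be told where to find them, so set PAPERLESS_TIKA_ENABLED: 1, PAPERLESS_TIKA_ENDPOINT: http://tika:9998 and PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000 in the webserver's environment: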

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped

Create the necessary directories:

Terminal window
mkdir -p {export,consume,data,media}

Now you can start Paperless-ngx:

Terminal window
docker compose up -d

The web interface will be available at http://localhost:8000. Create your admin account and you’re ready to go!
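
On the first start, the container may need a little while before the web interface answers, so a script that talks to it right after docker compose up -d can fail. A small polling helper is one way around that. The sketch below is illustrative only: wait_until_up and its parameters are hypothetical names, and the retry count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url, attempts=30, delay=2.0,
                  opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll url until the server answers; return True if it did, else False."""
    for _ in range(attempts):
        try:
            # urlopen only returns normally for successful responses
            # (redirects, such as the one to the login page, are followed).
            with opener(url, timeout=5):
                return True
        except urllib.error.HTTPError as err:
            # The server answered with an error status. Anything below 500
            # still means the application itself is up and responding.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the service is not ready yet.
            pass
        sleep(delay)
    return False
```

Calling wait_until_up("http://localhost:8000") before any other automation gives the stack up to a minute to come online with the defaults above.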

My Setup and Usage

Here’s how I’ve set up Paperless-ngx in my homelab:

  1. Document Input Methods:

    • A dedicated scanner folder that’s monitored for new documents. I use Syncthing to sync my scanner folder to my homelab.
    • 3rd-party mobile apps for quick document scanning and uploading.
  2. Reverse Proxy: I use Caddy as a reverse proxy so the instance is reachable securely under its own domain:

    docs.mydomain.com {
        reverse_proxy localhost:8000
    }
  3. Authentication: I use Authelia for SSO across all my services, including Paperless-ngx.

  4. Backup Strategy: As I've mentioned earlier, all the data I store in my homelab is backed up with custom scripts that encrypt it with GPG and then upload it to an off-site location.


Features I Love

After using Paperless-ngx for a couple of weeks, here are some features that I have found very useful:

  1. Powerful Search: The full-text search is incredibly useful. I can find any document by searching for any text within it, even in scanned documents.

  2. Mobile Access: While the web interface is quite good and mobile-friendly, there are a few 3rd-party mobile apps that I use to scan documents and upload them to Paperless-ngx.

    These apps are mentioned in the Paperless-ngx documentation as well.
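
The same full-text search is also reachable through the REST API, via the query parameter on the /api/documents/ endpoint with token authentication. Here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the base URL and API token are assumptions you would replace with your own (a token can be created from your user profile in the web interface).

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query):
    """Build an authenticated full-text search request for Paperless-ngx."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    # Paperless-ngx accepts "Authorization: Token <token>" for API access.
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def search_documents(base_url, token, query):
    """Return (id, title) pairs for the first page of matching documents."""
    request = build_search_request(base_url, token, query)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    # The endpoint is paginated; matching documents are listed under "results".
    return [(doc["id"], doc["title"]) for doc in payload["results"]]
```

For example, search_documents("http://localhost:8000", "<your-token>", "electricity bill") would list the first page of documents whose content matches that phrase.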

Where I Use Paperless-ngx

Here are some ways I use Paperless-ngx in my daily life:

  1. Bill Management: All my utility bills are scanned and uploaded to Paperless-ngx. They’re automatically tagged and organized by date and provider.

  2. Receipt Tracking: I scan receipts using my phone, and Paperless-ngx automatically extracts the date, amount, and vendor information.

  3. Document Archive: Important documents like contracts and certificates are scanned and tagged for easy retrieval. The OCR makes every word searchable.

  4. Tax Documents: I have a specific workflow for tax-related documents that automatically tags them with the relevant tax year and category.

What I want to explore further

  • Email Integration: I want to explore how I can use email integration to automatically upload documents to Paperless-ngx. I know Paperless-ngx can fetch mail over IMAP, but I haven't tried it yet.

    My goal is to read documents from my email inbox and choose exactly which ones get uploaded to Paperless-ngx.

    These would include invoices, Demat statements, bank statements, etc.

  • More Automation: I want to explore more automation possibilities with Paperless-ngx. I've already set up a few workflows, but I want to see how much more of the repetitive work I can hand over.

Conclusion

Paperless-ngx is a relatively new addition to my homelab but it has already become an essential part of my digital life. The ability to quickly find any document, combined with the automatic organization and OCR capabilities, has made managing documents actually enjoyable.

I've processed over 600 documents through my Paperless-ngx instance, and the system has saved me countless hours of manual filing and searching.

Have you tried Paperless-ngx or similar document management systems? How do you handle your digital documents? Share your experiences in the comments below, or reach out to me on Twitter or Reddit.

Happy document organizing!
