Introduction
Welcome to another week of self-hosting various services in my homelab. This week, we’ll be tackling a problem that many of us face: managing the endless stream of bills, receipts, and other documents that we have to keep for our own records.
For years, I struggled with organizing my documents. I tried various methods, from simple folder structures to cloud storage solutions like Google Drive, but none of them made documents searchable by their text content or organized them automatically based on that content.
That’s when I discovered Paperless-ngx, and it has completely transformed how I handle my documents. What makes it special isn’t just its OCR capabilities, but how it combines them with automatic organization and full-text search across everything I’ve stored.
What is Paperless-ngx?
Paperless-ngx is a document management system that helps you archive, index, and search your scanned paper documents. It’s a community-maintained successor to the original Paperless and Paperless-ng projects, with many additional features and improvements.
What really drew me to Paperless-ngx was its comprehensive feature set:
- Automatic OCR (Optical Character Recognition).
- Full text search through all your documents.
- Automatic document classification.
- Tag and correspondent system.
- Mobile-friendly web interface.
- Email document importing.
- Multi-user support with permissions.
- REST API for automation.
- Barcode detection and document splitting.
- Custom workflows with pre/post consume scripts.
- Multiple language support.
And many more! The project is actively maintained with regular updates and improvements. You can check out their GitHub repository to see what’s coming next.
Note
While Paperless-ngx is stable and reliable, it’s worth noting that the OCR process can be resource-intensive. If you’re running on a low-powered device, you might want to adjust some settings; the Paperless-ngx documentation covers the relevant options.
Setup Paperless-ngx
Once again, I’m using Docker to set up Paperless-ngx in my homelab, so let’s go through the steps. First, create a new directory for the project:
mkdir paperless-ngx && cd paperless-ngx
Here’s my docker-compose configuration, which uses Postgres for the database and Redis for the broker.
services:
  broker:
    image: docker.io/library/redis:7
    container_name: paperless-redis
    restart: unless-stopped
    volumes:
      - redisdata:/data

  db:
    image: docker.io/library/postgres:15
    container_name: paperless-db
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless-ngx
    restart: unless-stopped
    depends_on:
      - db
      - broker
    ports:
      - '8000:8000'
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:8000']
      interval: 30s
      timeout: 10s
      retries: 3
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: paperless
      PAPERLESS_DBNAME: paperless
      PAPERLESS_URL: https://docs.mydomain.com
      PAPERLESS_SECRET_KEY: change-me

volumes:
  pgdata:
  redisdata:
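The compose file leaves PAPERLESS_SECRET_KEY set to the placeholder change-me, which should be swapped for a random value before the instance is exposed. One possible way to produce such a value is sketched below; the helper name generate_secret_key is made up for this example, and any sufficiently long random string would work just as well.

```shell
#!/bin/sh
# Hypothetical helper: print 64 random hex characters (32 random bytes)
# that can be pasted in as PAPERLESS_SECRET_KEY.
generate_secret_key() {
  # Read 32 bytes from the kernel's random source and render them as hex,
  # stripping the spaces and newlines that od adds.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

generate_secret_key
```

Paste the printed value into the compose file in place of change-me, and keep it out of any public repository.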
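You can also use the default storage engine, which is SQLite. For that, remove the db service, the pgdata volume, the db entry under depends_on, and the PAPERLESS_DB* variables from the webserver environment; Paperless-ngx falls back to SQLite when PAPERLESS_DBHOST is not set.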
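Apart from that, you can also use the tika and gotenberg services for parsing and converting Office documents (such as .doc, .xlsx and .odt) as well as .eml files.
To do that, add the following services to your docker-compose file, then set PAPERLESS_TIKA_ENABLED: 1, PAPERLESS_TIKA_ENDPOINT: http://tika:9998 and PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000 in the webserver environment so Paperless-ngx knows where to find them: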
  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped
Create the necessary directories:
mkdir -p {export,consume,data,media}
Now you can start Paperless-ngx:
docker compose up -d
The web interface will be available at http://localhost:8000. Create your admin account and you’re ready to go!
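With the stack running, a quick way to check that ingestion works is to drop a file into the ./consume directory that the compose file mounts into the container. The helper below is only a sketch: the function name queue_document and the example file name are placeholders, and it assumes you run it from the paperless-ngx project directory.

```shell
#!/bin/sh
# Hypothetical helper: queue a file for ingestion by copying it into the
# consume directory that is mounted into the Paperless-ngx container.
queue_document() {
  src="$1"
  consume_dir="${2:-./consume}"   # assumed: the ./consume folder created earlier

  # Refuse anything that is not a regular file.
  if [ ! -f "$src" ]; then
    echo "not a file: $src" >&2
    return 1
  fi

  cp -- "$src" "$consume_dir/"
}

# Example (the file name is a placeholder):
#   queue_document ~/Downloads/electricity-bill.pdf
```

Once the document has been imported, Paperless-ngx removes it from the consume folder, so an empty folder after a short wait is a good sign.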
My Setup and Usage
Here’s how I’ve set up Paperless-ngx in my homelab:
- Document Input Methods:
  - A dedicated scanner folder that’s monitored for new documents. I use Syncthing to sync my scanner folder to my homelab.
  - Third-party mobile apps for quick document scanning and uploading.
- Reverse Proxy: I use Caddy as a reverse proxy for secure access:
  docs.mydomain.com {
      reverse_proxy localhost:8000
  }
- Authentication: I use Authelia for SSO across all my services, including Paperless-ngx.
- Backup Strategy: As I’ve mentioned earlier, all the data I store in my homelab is backed up with custom scripts that encrypt it with GPG and then upload it to an off-site location.
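As a rough sketch of this kind of backup (not the exact scripts used here), the steps could look like the following. It assumes the compose service is named webserver as in the compose file, that ./export is the mounted export folder, and that GPG_RECIPIENT names a key already in your keyring; the function names are invented for the example, and the upload step is left open.

```shell
#!/bin/sh
# Sketch of an encrypted backup. Assumptions: the compose service is called
# "webserver", ./export is the mounted export folder, and GPG_RECIPIENT is
# a key that already exists in your GPG keyring.

# Ask Paperless-ngx to write documents and metadata into the export folder.
run_export() {
  docker compose exec -T webserver document_exporter ../export
}

# Pack a directory into a dated tarball and print the tarball's path.
archive_dir() {
  src_dir="$1"
  out_dir="$2"
  archive="$out_dir/paperless-$(date +%Y-%m-%d).tar.gz"
  tar -czf "$archive" -C "$src_dir" . && printf '%s\n' "$archive"
}

# Encrypt a file for the given recipient; writes <file>.gpg next to it.
encrypt_file() {
  gpg --batch --yes --encrypt --recipient "$2" --output "$1.gpg" "$1"
}

# Example run (paths and recipient are placeholders):
#   run_export
#   archive="$(archive_dir ./export /tmp)"
#   encrypt_file "$archive" "$GPG_RECIPIENT"
#   # ...then upload "$archive.gpg" with the tool of your choice.
```

Using the built-in exporter rather than copying the data and media folders directly means the backup can be restored with the matching importer, independent of the database engine.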
Features I Love
After using Paperless-ngx for a couple of weeks, here are some features that I have found very useful:
- Powerful Search: The full-text search is incredibly useful. I can find any document by searching for any text within it, even in scanned documents.
- Mobile Access: While the web interface is quite good and mobile-friendly, there are a few third-party mobile apps that I use to scan documents and upload them to Paperless-ngx. These apps are mentioned in the Paperless-ngx documentation as well.
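The same full-text search is also reachable through the REST API, which is handy for scripts. The sketch below assumes you have an API token for your user and that the instance is reachable at the URL you pass in; PAPERLESS_BASE_URL and PAPERLESS_TOKEN are environment variable names chosen for this example, not settings that Paperless-ngx itself reads.

```shell
#!/bin/sh
# Sketch: run a full-text search against the Paperless-ngx REST API.
# Assumptions: PAPERLESS_BASE_URL and PAPERLESS_TOKEN are variables you
# set yourself; the token belongs to a user allowed to view documents.
search_documents() {
  query="$1"
  base_url="${PAPERLESS_BASE_URL:-http://localhost:8000}"

  # -G turns the --data-urlencode pair into a URL query string,
  # so spaces and special characters in the search text are escaped.
  curl --silent --show-error --fail -G \
    --header "Authorization: Token ${PAPERLESS_TOKEN}" \
    --header "Accept: application/json" \
    --data-urlencode "query=${query}" \
    "${base_url}/api/documents/"
}

# Example:
#   PAPERLESS_TOKEN=... search_documents "electricity bill"
```

The response is JSON containing a count and a paginated results list, so it can be piped into a tool such as jq for further filtering.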
Where I Use Paperless-ngx
Here are some ways I use Paperless-ngx in my daily life:
- Bill Management: All my utility bills are scanned and uploaded to Paperless-ngx. They’re automatically tagged and organized by date and provider.
- Receipt Tracking: I scan receipts using my phone, and Paperless-ngx automatically detects the date and can match the vendor as a correspondent. Amounts remain searchable through the OCR text.
- Document Archive: Important documents like contracts and certificates are scanned and tagged for easy retrieval. The OCR makes every word searchable.
- Tax Documents: I have a specific workflow for tax-related documents that automatically tags them with the relevant tax year and category.
What I want to explore further
- Email Integration: I want to explore how I can use email integration to automatically upload documents to Paperless-ngx. I know Paperless-ngx can fetch mail over IMAP, but I haven’t tried it yet. My goal is to read documents from my email inbox and choose specifically which ones get uploaded to Paperless-ngx. These would include invoices, Demat statements, bank statements, etc.
- More Automation: I want to explore more automation possibilities with Paperless-ngx. I’ve already set up a few workflows, but I want to see how far I can go to stop repeating myself.
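One automation hook worth a look is the post-consume script mentioned in the feature list. Paperless-ngx runs it after each document is imported and passes details such as DOCUMENT_ID, DOCUMENT_FILE_NAME, DOCUMENT_CORRESPONDENT and DOCUMENT_TAGS as environment variables. The sketch below only writes a log line per document; the CONSUME_LOG variable and the log path are made up for the example.

```shell
#!/bin/sh
# Sketch of a post-consume hook. Paperless-ngx passes document details to
# the script as environment variables such as DOCUMENT_ID and DOCUMENT_TAGS.
# CONSUME_LOG is a made-up setting for this example, not a Paperless option.

# Build one log line describing the document that was just consumed.
format_entry() {
  printf '%s id=%s file=%s correspondent=%s tags=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "${DOCUMENT_ID:-unknown}" \
    "${DOCUMENT_FILE_NAME:-unknown}" \
    "${DOCUMENT_CORRESPONDENT:-none}" \
    "${DOCUMENT_TAGS:-none}"
}

# Append one line per consumed document.
format_entry >> "${CONSUME_LOG:-/tmp/paperless-consumed.log}"
```

To wire it up, mount the script into the webserver container, make it executable, and point PAPERLESS_POST_CONSUME_SCRIPT at its path inside the container. From there the same hook could send a notification or call the API instead of writing to a log.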
Conclusion
Paperless-ngx is a relatively new addition to my homelab but it has already become an essential part of my digital life. The ability to quickly find any document, combined with the automatic organization and OCR capabilities, has made managing documents actually enjoyable.
I’ve processed over 600 documents through my Paperless-ngx instance, and the system has saved me countless hours of manual filing and searching.
Have you tried Paperless-ngx or similar document management systems? How do you handle your digital documents? Share your experiences in the comments below, or reach out to me on Twitter / Reddit.
Happy document organizing!