Stirling PDF — Self-hosted PDF manipulation powerhouse

Introduction

Welcome to another week of self-hosting various services in my homelab. This week, we’ll be exploring PDF manipulation - specifically, how to process and automate PDF operations securely within our network.

A couple of weeks ago I published a blog post on how I use Paperless-ngx in my homelab. While building a paperless workflow, I often encounter PDFs that need preprocessing - removing passwords, cleaning metadata, or converting formats. Plenty of online tools can do this, but they come with limitations, privacy concerns, and often malicious ads. I also wanted to automate these tasks with n8n, so I started looking for a tool that could process these PDFs programmatically.

That’s where Stirling PDF comes in - a powerful, open-source PDF manipulation tool that you can self-host. What makes it special isn’t just its comprehensive feature set, but how well it integrates with other services through its API, enabling powerful automation workflows.

What is Stirling PDF?

Stirling PDF is a web-based application that provides a comprehensive suite of PDF manipulation tools. Think of it as a self-hosted alternative to online PDF tools, but with more features, better privacy, and automation capabilities.

What really drew me to Stirling PDF was its extensive feature set:

  1. Document Operations:

    • Merge/Split PDFs
    • Compress PDFs
    • Convert to/from various formats
    • Add/Remove passwords
    • Remove metadata and signatures
  2. Image Operations:

    • Extract images from PDFs
    • Convert PDFs to images
    • OCR support
  3. Security Features:

    • Password protection/removal
    • Digital signature removal
    • Sanitization of PDF files (removing JavaScript, etc.)
  4. Advanced Features:

    • API support for automation
    • Multiple language support
    • Mobile-friendly interface
    • Docker support for easy deployment

The project is open-source and actively maintained. You can check out their GitHub repository for more details.

Setting up Stirling PDF

Once again, I am going to be using Docker to run Stirling PDF in my homelab. First, create a new directory:

Terminal window
mkdir stirling-pdf && cd stirling-pdf

Here’s my docker-compose configuration. Save it as docker-compose.yml in the directory you just created:

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    container_name: stirling-pdf
    ports:
      - '8080:8080'
    volumes:
      - ./tessdata:/usr/share/tessdata # For OCR support
      - ./configs:/configs # For configuration files
      - ./logs:/logs # For logs
    restart: unless-stopped

Create the necessary directories:

Terminal window
mkdir -p {tessdata,configs,logs}

Now you can start Stirling PDF:

Terminal window
docker compose up -d
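
The container can take a little while to start, so it may help to confirm the web UI is answering before you point anything at it. Here is a minimal sketch that polls the port published in the compose file above. The `wait_for_http` helper, the `BASE_URL` variable, and the retry defaults are illustrative assumptions, not part of Stirling PDF.

```shell
#!/usr/bin/env bash
# Sketch: wait until the Stirling PDF web UI responds.
# BASE_URL and the retry defaults are assumptions; adjust them for your setup.

BASE_URL="${BASE_URL:-http://localhost:8080}"

# Poll a URL until it responds or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl --fail --silent --output /dev/null --max-time 5 "$url"; then
      echo "up after ${i} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from ${url} after ${attempts} attempt(s)" >&2
  return 1
}

# Example (not run automatically):
#   wait_for_http "$BASE_URL" || docker compose logs --tail 20 stirling-pdf
```

If the check fails, the container logs are usually the quickest place to look for the reason.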

How I am using Stirling PDF in my homelab

When I integrated n8n into my homelab, I wanted to automate the processing of documents before they reach Paperless-ngx. Here are some actual workflows I’ve implemented:

  1. Process Password Protected PDFs

    I receive several password-protected documents that need processing before ingestion into Paperless-ngx.

    Here’s how the workflow looks:

    n8n workflow for Gmail to Paperless-ngx
  2. Using the web app

    I also use the web app extensively to quickly process PDFs, remove passwords, clean metadata, etc.

    Stirling PDF web app
💡

Workflows in detail

If you’d like to know how I built these workflows, let me know in the comments and I can write a separate post about them.
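
To give a rough idea of what the password-removal step boils down to, here is a minimal curl sketch of the kind of request an automation tool would send. It is not the n8n workflow itself. The endpoint path `/api/v1/security/remove-password` and the form fields `fileInput` and `password` reflect my understanding of the Stirling PDF API, so treat them as assumptions and confirm them in the Swagger UI of your own instance. The `api_url` and `remove_password` helpers and the `STIRLING_URL` variable are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch: strip the password from a PDF through the Stirling PDF API.
# Assumptions: the endpoint path and form field names below; verify them
# against the Swagger UI of your instance before relying on this.

STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

# Join the base URL and an API path without doubling slashes.
api_url() {
  printf '%s/api/v1/%s\n' "${STIRLING_URL%/}" "${1#/}"
}

# Upload a protected PDF and save the unlocked copy.
# Usage: remove_password INPUT.pdf PASSWORD OUTPUT.pdf
remove_password() {
  local input="$1" password="$2" output="$3"
  # --form-string sends the password literally, even if it starts with @ or <.
  curl --fail --silent --show-error \
    --request POST "$(api_url security/remove-password)" \
    --form "fileInput=@${input}" \
    --form-string "password=${password}" \
    --output "$output"
}

# Example (not run automatically):
#   remove_password statement.pdf 'my-secret' statement-unlocked.pdf
```

In n8n, an equivalent call can be made with an HTTP Request node that sends the file as multipart form data.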

Features I Love

After using Stirling PDF for several months, here are some features that I find particularly useful:

  1. API First Approach: The comprehensive API makes automation a breeze.

    You can browse the API documentation (Swagger UI) on your self-hosted instance at http://localhost:8080/swagger-ui/index.html, replacing localhost with your server’s address if needed.

  2. Batch Processing: The ability to process multiple files at once saves significant time.

  3. OCR Capabilities: Built-in OCR support means I don’t need a separate tool for text recognition.

  4. Clean Interface: The web interface is intuitive and mobile-friendly.
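
The API and batch processing combine nicely on the command line. As a rough sketch, the loop below posts every PDF in a folder to one endpoint and writes the results to another folder. The `batch_process` helper and `STIRLING_URL` are hypothetical names, and the `security/sanitize-pdf` path and `fileInput` field in the usage line are assumptions you should check against your instance’s Swagger UI. Endpoints that need extra form fields would need more `--form` arguments than shown here.

```shell
#!/usr/bin/env bash
# Sketch: send every PDF in a directory to one Stirling PDF endpoint.
# Assumptions: the endpoint accepts a single "fileInput" upload and returns
# the processed PDF as the response body.

STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

# Usage: batch_process ENDPOINT IN_DIR OUT_DIR
# Returns non-zero if any file failed.
batch_process() {
  local endpoint="$1" in_dir="$2" out_dir="$3"
  local pdf out failures=0
  mkdir -p "$out_dir"
  for pdf in "$in_dir"/*.pdf; do
    # With no matches the glob stays literal, so skip non-existent paths.
    [ -e "$pdf" ] || continue
    out="${out_dir}/$(basename "$pdf")"
    if curl --fail --silent --show-error \
        --form "fileInput=@${pdf}" \
        --output "$out" \
        "${STIRLING_URL%/}/api/v1/${endpoint#/}"; then
      echo "ok: ${pdf}"
    else
      echo "failed: ${pdf}" >&2
      rm -f "$out"   # do not leave partial output behind
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Example (not run automatically):
#   batch_process security/sanitize-pdf ./inbox ./sanitized
```

Processing files one at a time keeps failures isolated: a single bad PDF is reported and skipped instead of aborting the whole run.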

Security Considerations

While Stirling PDF is powerful, it’s important to consider security when setting it up:

  1. Access Control:

    • Enable authentication if the instance is exposed to the internet
    • Use a reverse proxy with TLS (for example, Caddy)
    • Consider Authelia for SSO. I did not cover SSO configuration in this post because it depends heavily on your setup, so it’s best to explore that on your own. If you use Authelia and want me to cover it in a blog post, let me know in the comments.
  2. Data Privacy:

    • Regular cleanup of processed files
    • Secure storage of sensitive documents
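
For the cleanup point, one simple option is a scheduled job that deletes old files from whatever working directory your automation writes to. The sketch below assumes such a directory exists. The `cleanup_old_files` helper, the example path, and the 7-day default are all hypothetical, and the `-delete` action requires GNU or BSD `find`.

```shell
#!/usr/bin/env bash
# Sketch: delete files older than N days from a working directory.
# The directory and retention period are assumptions; point this only at
# folders that hold disposable intermediate files.

# Usage: cleanup_old_files DIR [MAX_AGE_DAYS]
cleanup_old_files() {
  local dir="$1" days="${2:-7}"
  if [ ! -d "$dir" ]; then
    echo "not a directory: ${dir}" >&2
    return 1
  fi
  # -mtime +N matches files modified more than N days ago
  # (find rounds the age down to whole days). -print lists what is removed.
  find "$dir" -type f -mtime "+${days}" -print -delete
}

# Example cron entry (hypothetical paths), running daily at 03:00:
#   0 3 * * * /usr/local/bin/cleanup-pdfs.sh /srv/pdf-work 7
```

Try it on a scratch directory first, since deleted files are not recoverable.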

Conclusion

Stirling PDF has become an essential part of my document processing pipeline. Combined with n8n and Paperless-ngx, it creates a powerful automated workflow that handles my documents securely and efficiently.

Do you use Stirling PDF or similar tools in your homelab? What are your document processing workflows? Share your experiences in the comments below, or reach out to me on Twitter / Reddit.

Happy PDF processing! 🚀