Introduction
Welcome to another week of self-hosting various services in my homelab. This week, we’ll be exploring PDF manipulation - specifically, how to process and automate PDF operations securely within our network.
A couple of weeks ago, I published a blog post on how I use Paperless-ngx in my homelab. While building a paperless workflow, I often encounter PDFs that need preprocessing - removing passwords, cleaning metadata, or converting formats. While there are many online tools available, they come with limitations, privacy concerns, and often malicious ads. I also wanted to automate these tasks using n8n, so I started looking for a tool that would let me process these PDFs programmatically.
That’s where Stirling PDF comes in - a powerful, open-source PDF manipulation tool that you can self-host. What makes it special isn’t just its comprehensive feature set, but how well it integrates with other services through its API, enabling powerful automation workflows.
What is Stirling PDF?
Stirling PDF is a web-based application that provides a comprehensive suite of PDF manipulation tools. Think of it as a self-hosted alternative to online PDF tools, but with more features, better privacy, and automation capabilities.
What really drew me to Stirling PDF was its extensive feature set:
- Document Operations:
  - Merge/Split PDFs
  - Compress PDFs
  - Convert to/from various formats
  - Add/Remove passwords
  - Remove metadata and signatures
- Image Operations:
  - Extract images from PDFs
  - Convert PDFs to images
  - OCR support
- Security Features:
  - Password protection/removal
  - Digital signature removal
  - Sanitization of PDF files (removing JavaScript, etc.)
- Advanced Features:
  - API support for automation
  - Multiple language support
  - Mobile-friendly interface
  - Docker support for easy deployment
The project is open-source and actively maintained. You can check out their GitHub repository for more details.
Setup Stirling PDF
Once again, I am going to be using Docker to run Stirling PDF in my homelab. First, create a new directory:
mkdir stirling-pdf && cd stirling-pdf
Here’s my docker-compose configuration:
services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    container_name: stirling-pdf
    ports:
      - '8080:8080'
    volumes:
      - ./tessdata:/usr/share/tessdata # For OCR support
      - ./configs:/configs # For configuration files
      - ./logs:/logs # For logs
    restart: unless-stopped
Create the necessary directories:
mkdir -p {tessdata,configs,logs}
Now you can start Stirling PDF:
docker compose up -d
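Before wiring the service into anything else, it can help to confirm that the web UI actually answers. The following is a minimal sketch, assuming the `8080:8080` port mapping from the compose file; `wait_for_stirling` is a made-up helper name, and the retry count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Sketch: poll the Stirling PDF web UI until it responds.
# wait_for_stirling is a hypothetical helper; adjust the URL and retries.

STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

wait_for_stirling() {
  local retries="${1:-30}" delay="${2:-2}" i=0
  while [ "$i" -lt "$retries" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses
    if curl --fail --silent --output /dev/null "${STIRLING_URL}/"; then
      echo "Stirling PDF is up at ${STIRLING_URL}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Stirling PDF did not respond after ${retries} attempts" >&2
  return 1
}

# Usage: show recent container logs if the UI never comes up
#   wait_for_stirling 30 2 || docker compose logs --tail=50 stirling-pdf
```

The first start can take a while, so a generous retry count is safer than a single request.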
How I am using Stirling PDF in my homelab
When I integrated n8n into my homelab, I wanted to automate the processing of documents before they reach Paperless-ngx. Here are some actual workflows I’ve implemented:
- Process Password Protected PDFs
  I receive several password-protected documents that need processing before ingestion into Paperless-ngx.
  Here’s how the workflow looks:
  [Image: n8n workflow for Gmail to Paperless-ngx]
- Using the web app
  I also use the web app extensively to quickly process PDFs, remove passwords, clean metadata, etc.
  [Image: Stirling PDF web app]
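To give a feel for the password-removal step outside of n8n, here is a rough curl sketch of the kind of request involved. This is not the n8n workflow itself. The endpoint path `/api/v1/security/remove-password` and the form fields `fileInput` and `password` are assumptions on my part, so verify them in the Swagger UI of your own instance, since paths can differ between versions. The function name and file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: strip the password from a PDF through the Stirling PDF API.
# ASSUMPTIONS to verify in your Swagger UI: the endpoint path and the
# form field names (fileInput, password).

STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

remove_pdf_password() {
  local input="$1" password="$2" output="$3"
  # --form-string sends the password literally, so characters such as
  # ';' or a leading '@' are not interpreted by curl
  curl --fail --silent --show-error \
    -X POST "${STIRLING_URL}/api/v1/security/remove-password" \
    -F "fileInput=@${input}" \
    --form-string "password=${password}" \
    --output "${output}"
}

# Usage (hypothetical file names):
#   remove_pdf_password statement.pdf 'my-secret' statement-unlocked.pdf
```

In n8n, the equivalent would be an HTTP Request node sending the same multipart form, with the unlocked file passed on to the Paperless-ngx upload step.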
Workflows in detail
If you want to know more about how I created these workflows, let me know in the comments and I can write a separate post on that.
Features I Love
After using Stirling PDF for several months, here are some features that I find particularly useful:
- API First Approach: The comprehensive API makes automation a breeze. You can access the API contract on your self-hosted instance by going to http://localhost:8080/swagger-ui/index.html.
- Batch Processing: The ability to process multiple files at once saves significant time.
- OCR Capabilities: Built-in OCR support means I don’t need a separate tool for text recognition.
- Clean Interface: The web interface is intuitive and mobile-friendly.
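The API and batch processing combine nicely in a script. Below is a small sketch that sends every PDF in a folder to a single API endpoint. The helper `stirling_batch` is hypothetical, the endpoint path is supplied by the caller (copy it from the Swagger UI), and the `fileInput` field name is an assumption to check against your instance.

```shell
#!/usr/bin/env bash
# Sketch: send every PDF in a directory to one Stirling PDF API endpoint.
# stirling_batch is a hypothetical helper, not part of Stirling PDF.
# ASSUMPTION: the upload field is named fileInput (check your Swagger UI).

STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

stirling_batch() {
  local endpoint="$1" in_dir="$2" out_dir="$3"
  shift 3                      # any remaining arguments go to curl as-is
  local pdf name failed=0
  mkdir -p "$out_dir"
  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue  # the directory contained no PDFs
    name="$(basename "$pdf")"
    if curl --fail --silent --show-error \
         -X POST "${STIRLING_URL}${endpoint}" \
         -F "fileInput=@${pdf}" "$@" \
         --output "${out_dir}/${name}"; then
      echo "ok: ${name}"
    else
      echo "failed: ${name}" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]          # non-zero exit if any file failed
}

# Usage (endpoint path and password field are assumptions):
#   stirling_batch /api/v1/security/remove-password ./inbox ./unlocked \
#     --form-string "password=my-secret"
```

Passing extra curl arguments through keeps the helper endpoint-agnostic, so the same loop works for any operation that takes one file plus a few form fields.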
Security Considerations
While Stirling PDF is powerful, it’s important to consider security when setting it up:
- Access Control:
  - Enable authentication if exposed to the internet
  - Use a reverse proxy with SSL (such as Caddy)
  - Consider Authelia for SSO. This post does not cover SSO configuration because it depends on your setup, so it is best to explore that on your own. If you are using Authelia and want me to cover it in a blog post, let me know in the comments.
- Data Privacy:
  - Regular cleanup of processed files
  - Secure storage of sensitive documents
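For the cleanup item, one option is a small scheduled job. This sketch assumes your automation leaves intermediate PDFs in a scratch directory on disk. The `purge_old_pdfs` name and the example path are hypothetical, so point it at wherever your own workflow writes files.

```shell
#!/usr/bin/env bash
# Sketch: delete PDFs older than N days from a working directory.
# purge_old_pdfs and the example path are hypothetical.

purge_old_pdfs() {
  local dir="$1" days="${2:-7}"
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -mtime +N matches files modified more than N days ago
  # (find rounds the age down to whole days); -print lists what is removed
  find "$dir" -type f -name '*.pdf' -mtime +"$days" -print -delete
}

# Usage, e.g. from a daily cron job:
#   purge_old_pdfs /srv/n8n/pdf-work 7
```

Only files ending in `.pdf` are touched, so logs or configs that share the directory are left alone.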
Conclusion
Stirling PDF has become an essential part of my document processing pipeline. Combined with n8n and Paperless-ngx, it creates a powerful automated workflow that handles my documents securely and efficiently.
Do you use Stirling PDF or similar tools in your homelab? What are your document processing workflows? Share your experiences in the comments below, or reach out to me on Twitter / Reddit.
Happy PDF processing! 🚀