ArchiveBox
ArchiveBox is a self-hosted internet archiving solution to collect and save sites you wish to view offline.
Features include:
- Uses standard formats such as HTML, JSON, PDF, PNG
- Ability to autosave to archive.org
- Supports scheduled importing
- Supports realtime importing
ArchiveBox Requirements
Ingredients
Already deployed:
- Docker swarm cluster with persistent shared storage
- Traefik configured per design
- DNS entry for the hostname you intend to use (or a wildcard), pointed to your keepalived IP
Related:
- Traefik Forward Auth or Authelia to secure your Traefik-exposed services with an additional layer of authentication
Preparation
Setup data locations
First, create the directories which will hold ArchiveBox's archived data and this recipe's config files:
mkdir -p /var/data/archivebox
mkdir -p /var/data/config/archivebox
cd /var/data/config/archivebox
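The directory setup above can also be scripted so that it is safe to re-run. This is a minimal sketch, not part of the original recipe: the function name `prepare_archivebox_dirs` and its optional root argument are hypothetical, and the `/var/data` default simply mirrors the paths used above.

```shell
#!/bin/sh
# Hypothetical helper: create the ArchiveBox data and config directories.
# $1 is an optional root directory; it defaults to /var/data as in the recipe.
prepare_archivebox_dirs() {
    root="${1:-/var/data}"
    # -p creates missing parent directories and succeeds if they already exist
    mkdir -p "$root/archivebox" "$root/config/archivebox"
}
```

Calling `prepare_archivebox_dirs` with no argument creates both directories under /var/data. You would still change into /var/data/config/archivebox afterwards, before creating the compose file.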
Create docker-compose.yml
Create a docker swarm config file in docker-compose syntax (v3), something like the example below:
Fast-track with premix! 🚀
"Premix" is a git repository which includes the necessary docker-compose and env files for all published recipes. This means that you can launch any recipe with just a git pull and a docker stack deploy 👍.
🚀 Update: Premix now includes an ansible playbook, enabling you to deploy an entire stack + recipes with a single ansible command!
version: '3.2'
services:
  archivebox:
    image: archivebox/archivebox
    command: server --quick-init 0.0.0.0:8000
    ports:
      - 8000:8000
    networks:
      - traefik_public
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Pacific/Auckland
      - USE_COLOR=True
      - SHOW_PROGRESS=False
    deploy:
      labels:
        # traefik common
        - traefik.enable=true
        - traefik.docker.network=traefik_public
        # traefikv1
        - traefik.frontend.rule=Host:archive.example.com
        - traefik.port=8000
        # traefikv2
        - "traefik.http.routers.archive.rule=Host(`archive.example.com`)"
        - "traefik.http.routers.archive.entrypoints=https"
        - "traefik.http.services.archive.loadbalancer.server.port=8000"
    volumes:
      - /var/data/archivebox:/data
networks:
  traefik_public:
    external: true
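Because YAML is indentation-sensitive, it can be worth sanity-checking the file before deploying it. The sketch below assumes the Docker Compose v2 plugin is installed (the recipe itself does not require it) and uses its `config --quiet` subcommand, which parses the file and reports errors without printing it. The function name `check_compose_file` is hypothetical. Compose validation is not identical to what `docker stack deploy` accepts, so treat a pass as a rough check only.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a compose file before deploying it.
# Assumes the Docker Compose v2 plugin ("docker compose") is available.
check_compose_file() {
    file="${1:-docker-compose.yml}"
    if [ ! -f "$file" ]; then
        echo "compose file not found: $file" >&2
        return 1
    fi
    # "config --quiet" only validates the file; it prints nothing on success
    docker compose -f "$file" config --quiet
}
```

Run from /var/data/config/archivebox, `check_compose_file` with no argument checks the docker-compose.yml in the current directory and returns non-zero if the file is missing or fails to parse.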
Initializing ArchiveBox
Once you have created the docker-compose file, run the following command to configure ArchiveBox and create an account:
docker run -v /var/data/archivebox:/data -it archivebox/archivebox init --setup
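After the init command finishes, the data directory should contain ArchiveBox's index. As a rough check, the hypothetical helper below tests for it before you deploy the stack. It is a sketch resting on one assumption: that a successful init leaves an `index.sqlite3` file at the top of the data directory.

```shell
#!/bin/sh
# Hypothetical helper: report whether an ArchiveBox data directory looks initialized.
# Assumption: "archivebox init" creates index.sqlite3 in the data directory.
archivebox_initialized() {
    data_dir="${1:-/var/data/archivebox}"
    [ -f "$data_dir/index.sqlite3" ]
}
```

The function only sets an exit status, so it composes with ordinary shell conditionals, for example `archivebox_initialized || echo "run the init step first"`.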
Serving
Launch ArchiveBox!
Launch the ArchiveBox stack by running docker stack deploy archivebox -c <path-to-docker-compose.yml>
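Once the stack is deployed, you can confirm that the service came up. The sketch below uses the standard `docker stack services` and `docker service logs` commands. The function name `archivebox_status` is hypothetical, and the service name `archivebox_archivebox` assumes the stack was deployed under the name `archivebox` as shown above, since swarm prefixes each service with its stack name.

```shell
#!/bin/sh
# Hypothetical helper: show replica status and recent logs for the deployed stack.
# $1 is the stack name; the recipe deploys it as "archivebox".
archivebox_status() {
    stack="${1:-archivebox}"
    # Lists each service in the stack together with its replica count
    docker stack services "$stack" || return 1
    # Swarm names services <stack>_<service>; the compose service is "archivebox"
    docker service logs --tail 20 "${stack}_archivebox"
}
```

If the replica count stays at 0/1, the log tail is usually the quickest place to look for the reason.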