Archivebox

ArchiveBox is a self-hosted internet archiving solution to collect and save sites you wish to view offline.

Archivebox Screenshot

Features include:

  • Saves snapshots in standard formats such as HTML, JSON, PDF, and PNG
  • Can automatically submit saved URLs to archive.org
  • Supports scheduled importing
  • Supports realtime importing

Archivebox Requirements

Ingredients

Already deployed:

Related:

Preparation

Setup data locations

First, we create directories to hold the data which Archivebox will store and the docker-compose configuration for the stack:

mkdir /var/data/archivebox
mkdir /var/data/config/archivebox
cd /var/data/config/archivebox
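
If you rebuild nodes often, the same layout can be created in a way that is safe to re-run. The following is a minimal sketch, not part of the original recipe: `setup_archivebox_dirs` is a hypothetical helper name, and the `/var/data` default simply mirrors the paths used above.

```shell
# Create the Archivebox data and config directories under a given root.
# setup_archivebox_dirs is a hypothetical helper, not an Archivebox command.
setup_archivebox_dirs() {
    root="${1:-/var/data}"   # assumption: same root as the commands above
    # -p creates missing parent directories and succeeds if they already exist
    mkdir -p "${root}/archivebox" "${root}/config/archivebox"
}
```

Call it as `setup_archivebox_dirs` (or `setup_archivebox_dirs /some/other/root`), then `cd` into the config directory as before.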

Create docker-compose.yml

Create a docker swarm config file in docker-compose syntax (v3), something like the example below:

version: '3.2'

services:
  archivebox:
    image: archivebox/archivebox
    command: server --quick-init 0.0.0.0:8000
    ports:
      - 8000:8000
    networks:
      - traefik_public
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Pacific/Auckland
      - USE_COLOR=True
      - SHOW_PROGRESS=False
    deploy:
      labels:
        # traefik common
        - traefik.enable=true
        - traefik.docker.network=traefik_public
        # traefikv1
        - traefik.frontend.rule=Host:archive.example.com
        - traefik.port=8000
        # traefikv2
        - "traefik.http.routers.archive.rule=Host(`archive.example.com`)"
        - "traefik.http.routers.archive.entrypoints=https"
        - "traefik.http.services.archive.loadbalancer.server.port=8000"
    volumes:
      - /var/data/archivebox:/data

networks:
  traefik_public:
    external: true
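
Because the compose file declares `traefik_public` as an external network, the stack expects that network to exist before it is deployed. A minimal pre-flight sketch, assuming the network was created under exactly that name by your Traefik setup (`require_network` is a hypothetical helper, not a Docker command):

```shell
# Check that an external Docker network exists before deploying a stack.
# require_network is a hypothetical helper name used for illustration.
require_network() {
    if docker network inspect "$1" >/dev/null 2>&1; then
        echo "network $1 found"
    else
        echo "network $1 missing; create it (or deploy Traefik) first" >&2
        return 1
    fi
}
```

Running `require_network traefik_public` on a swarm manager before the deploy step gives a clearer message than a failed deploy would.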

Initializing Archivebox

Once you have created docker-compose.yml, run the following command to initialize the Archivebox data directory and create an admin account:

docker run -v /var/data/archivebox:/data -it archivebox/archivebox init --setup
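
The same one-off container pattern is handy for other Archivebox CLI commands. Here is a small sketch, assuming the data directory from this recipe: `archivebox_cli` and the `ARCHIVEBOX_DATA` variable are hypothetical names, and `--rm` is an addition (not in the command above) so the temporary container is removed when it exits.

```shell
# Directory mounted as /data; assumption: same path as in this recipe.
ARCHIVEBOX_DATA="${ARCHIVEBOX_DATA:-/var/data/archivebox}"

# Run any Archivebox CLI subcommand in a throwaway container.
# archivebox_cli is a hypothetical wrapper, not part of Archivebox itself.
archivebox_cli() {
    docker run --rm -v "${ARCHIVEBOX_DATA}:/data" -it archivebox/archivebox "$@"
}
```

With this in place, `archivebox_cli init --setup` is equivalent to the command above, and `archivebox_cli add 'https://example.com'` should archive a first URL from the shell.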

Serving

Launch Archivebox!

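Launch the Archivebox stack by running docker stack deploy archivebox -c <path-to-docker-compose.yml>

Once the service is running, log in with the account you created during initialization. The web UI listens on port 8000, and is also routed by Traefik at the hostname set in the labels (archive.example.com in the example above).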
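
To deploy and immediately confirm that the service was scheduled, something like the following can be used. It is a sketch under the assumptions of this recipe: `deploy_archivebox` is a hypothetical helper name, and the default compose path is the config directory created earlier.

```shell
# Deploy the stack, then list its services and replica counts.
# deploy_archivebox is a hypothetical helper used for illustration.
deploy_archivebox() {
    compose_file="${1:-/var/data/config/archivebox/docker-compose.yml}"
    docker stack deploy archivebox -c "$compose_file" &&
        docker stack services archivebox
}
```

If the replica count stays at 0/1, `docker service logs archivebox_archivebox` (the stack name followed by the service name) is usually the next place to look.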

Chef's notes 📓


  1. The inclusion of Archivebox was due to the efforts of @bencey in Discord (Thanks Ben!) 
