4chan archiving infrastructure
This repository contains scripts and documentation for archiving 4chan. It is organized as a set of small scripts that follow the concept of DOTADIW, or “Do One Thing and Do It Well.” Most scripts are written in ash. All scripts are tested to run on Alpine Linux.


4chan archive directory

A 4chan archive directory is a directory that contains scraped 4chan threads in their raw, unprocessed form. The structure is compatible with BASC-Archiver.

Scripts assume a structure like this:

  • Everything is contained in a directory named 4chan.
  • Each board gets its own directory.
  • Each thread lives in its own directory inside the respective board.
  • Each thread consists of a JSON file containing the output of the 4chan API and an images directory containing the original media files, without thumbnails.
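
The layout above can be walked with a few lines of POSIX shell. The following is a minimal sketch, not one of the repository's scripts. The function names check_thread and list_threads are hypothetical, and since the name of the JSON file is not specified here, the sketch assumes that any file ending in .json inside the thread directory counts.

```shell
#!/bin/sh
# Sketch only: check_thread and list_threads are hypothetical helpers,
# not scripts shipped with this repository.

# Succeed if the thread directory has an images directory and at least
# one .json file (assumption: the exact JSON file name is not fixed).
check_thread() {
    dir=$1
    [ -d "$dir/images" ] || return 1
    for f in "$dir"/*.json; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# Print "board/thread" for every thread under the archive root that
# matches the expected layout. The root defaults to ./4chan.
list_threads() {
    root=${1:-4chan}
    for board in "$root"/*/; do
        [ -d "$board" ] || continue
        for thread in "$board"*/; do
            [ -d "$thread" ] || continue
            if check_thread "${thread%/}"; then
                b=${board%/}
                t=${thread%/}
                echo "${b##*/}/${t##*/}"
            fi
        done
    done
}
```

After sourcing the file, calling list_threads with the path to a directory named 4chan prints one board/thread line for each thread that has both parts. Thread directories missing the JSON file or the images directory are skipped silently, which makes the function usable as a quick consistency check before further processing.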