4chan archiving infrastructure
Emilia L. 97f776acf6 initial release of utils/update_fts_index.go 10 months ago

WIP

fawnboy

4chan archiving infrastructure

About

This repository contains scripts and documentation for archiving 4chan. It is organized into small scripts that each follow DOTADIW ("Do One Thing And Do It Well"). Most scripts are written in ash, and all of them are tested on Alpine Linux.

Terminology

4chan archive directory

A 4chan archive directory contains scraped 4chan threads in their raw, unprocessed form. Its layout is compatible with BASC-Archiver.

Scripts assume a structure like this:

  • Everything is contained in a directory named 4chan.
  • Each board gets its own directory.
  • Each thread lives in its own directory inside the respective board.
  • Each thread directory contains a JSON file holding the 4chan API output for that thread, and an images directory holding the original media files (without thumbnails).