4chan archiving infrastructure
Emilia L. 97f776acf6 initial release of utils/update_fts_index.go 1 year ago
scripts moongirl.sh: set b2sum digest length to 256bit 1 year ago
utils/src initial release of utils/update_fts_index.go 1 year ago
LICENSE Add myself to the license file 1 year ago
README.md Add about and terminology section to README.md 1 year ago

README.md

WIP

fawnboy

4chan archiving infrastructure

About

This repository contains scripts and documentation for archiving 4chan. It is organized as a set of small scripts that follow the DOTADIW principle, or “Do One Thing and Do It Well.” Most scripts are written in ash (the Almquist shell), and all of them are tested on Alpine Linux.

Terminology

4chan archive directory

A 4chan archive directory is a directory that contains scraped 4chan threads in their raw, unprocessed form. Its structure is compatible with BASC-Archiver.

Scripts assume a structure like this:

  • Everything is contained in a directory named 4chan.
  • Each board gets its own directory.
  • Each thread lives in its own directory inside the respective board.
  • Each thread consists of a JSON file containing the output of the 4chan API and an images directory containing the original media files, without thumbnails.
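
The layout above can be walked with a few lines of POSIX shell. The following is only a minimal sketch, not one of the repository's scripts: the function name `list_threads` is hypothetical, and it assumes that the JSON file sits directly inside the thread directory with a `.json` extension and that the media directory is named `images`.

```shell
#!/bin/sh
# Sketch: list every thread in a 4chan archive directory.
# Assumed layout (the JSON file name itself is an assumption):
#   4chan/<board>/<thread>/<something>.json
#   4chan/<board>/<thread>/images/

# list_threads ROOT
# Prints "<board> <thread>" for each thread directory that contains
# at least one .json file and an images directory.
list_threads() {
    root=$1
    for thread_dir in "$root"/*/*/; do
        # Skip the unexpanded pattern when no thread directories exist.
        [ -d "$thread_dir" ] || continue
        # A thread without an images directory does not match the layout.
        [ -d "${thread_dir}images" ] || continue
        # Require at least one JSON file in the thread directory.
        set -- "$thread_dir"*.json
        [ -f "$1" ] || continue
        thread=$(basename "$thread_dir")
        board=$(basename "$(dirname "$thread_dir")")
        printf '%s %s\n' "$board" "$thread"
    done
}
```

Calling `list_threads ./4chan` prints one board and thread pair per line, which can be piped into other small tools in keeping with the one-task-per-script approach.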