initial release of utils/update_fts_index.go
Creates and updates a very simple full-text search (FTS) index over all threads found in a 4chan archive directory. It extracts the "sub" (subject) and "com" (comment) fields of every post and writes them to plain text files, so that other scripts can iterate over them quickly and easily. HTML is cleaned up with regexes and "html.UnescapeString".
4chan archiving infrastructure
This repository contains scripts and documentation for archiving 4chan. It is organized as a set of small scripts that follow the DOTADIW principle, “Do One Thing and Do It Well.” Most scripts are written in ash, and all of them are tested on Alpine Linux.
A 4chan archive directory is a directory that contains scraped 4chan threads in their raw, unprocessed form. The structure is compatible with BASC-Archiver.
Scripts assume a structure like this: