Digital Archiving

Scripting, oh scripting...

This is a small tool I am working on:

# Snatch -- Script to take a show post URL and derive PDF, PS, TXT,
# and sanitized HTML versions for deposit at Internet Archive. The script
# also downloads the related MP3 podcast file and creates an Ogg version.
# This script assumes that Enscript dumps its output to standard out
# instead of the default printer. Lynx and Ghostscript must also be
# installed for this to work as well as sox and aria2c.
# 7 July 2009 -- Stephen Michael Kellat
# This script is released under a BSD license variant. To review it,
# visit
# Usage:
# [Text URL to download] [common file name for derivative formats] [Podcast URL for download]

# PDF: dump the page as text, typeset it, then convert the PostScript to PDF
lynx -dump "$1" | enscript --margins=50:50:50:50 --word-wrap -G --color --media=letter -o - | ps2pdf - "$2.pdf"
# PostScript
lynx -dump "$1" | enscript --margins=50:50:50:50 --word-wrap -G --color --media=letter -o - > $
# Sanitized HTML
lynx -dump "$1" | enscript --margins=50:50:50:50 --word-wrap -G -whtml --color --media=letter -o - > "$2.html"
# Plain text
lynx -dump "$1" > "$2.txt"
# Podcast MP3, then an Ogg version made from it
aria2c --split=16 -o "$2.mp3" "$3"
sox "$2.mp3" -C8 "$2.ogg"
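
Since the script leans on several outside programs and three positional arguments, a guard along these lines could sit ahead of the pipeline. This is only a rough sketch and not part of Snatch as posted: the function names check_tools and check_args are hypothetical, and the list of programs to check is simply taken from the header comments and the commands above.

```shell
# Hypothetical pre-flight helpers; not part of the original Snatch script.

# Report each named program that cannot be found on PATH.
# Returns 0 if all are present, 1 if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Require exactly three arguments:
# text URL, common file name, podcast URL.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: $0 TEXT_URL COMMON_NAME PODCAST_URL" >&2
        return 1
    fi
}
```

Placed near the top of the script, a line such as `check_args "$@" && check_tools lynx enscript ps2pdf aria2c sox || exit 1` would stop the run before any partial files are written.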

Inch by inch, row by row, the need to archive the run of LISTen so far grows. The likely depository, besides LISNews itself, would be Internet Archive.

Yes, the script is a wee bit messy. It is a work in progress. While I hope I will not have to use it, I am not holding my breath either while waiting for Godot.
