who the F*CK replaced the dead RJD2 youtube URL with some generic ambient alarm clock bullsh*t???
Age 49, Dude
Bureaucrat/Wannabe
NG Motivational Speaker
Joined on 3/25/08
Cyberdevil
Yeah, it is their own archival tool, but my server takes a hit for each page they crawl. I don't know if there's some kind of rate limiting built in, or if the server just gets overloaded by too many requests and starts returning those 503s. It is a shared server, after all. I ran it again yesterday and it just isn't working for me: 503s all the way through on new captures.
The Google Sheet I'm running at the moment is an index of all my posts that haven't been indexed before, ca. 8000 lines of:
https://cyberd.org/a-small-haiku.html
https://cyberd.org/a-little-haiku.html
https://cyberd.org/hardcore-henry-2015.html
https://cyberd.org/cd2k16.html
https://cyberd.org/bullet-to-the-head-2012.html
https://cyberd.org/the-tournament-2009.html
https://cyberd.org/the-island-2005.html
https://cyberd.org/jason-bourne-2016.html
https://cyberd.org/teenage-mutant-ninja-turtles-2-2016.html
https://cyberd.org/something-there-for-you.html
https://cyberd.org/week-36-37-summer-recap.html
https://cyberd.org/musicalish-128.html
https://cyberd.org/musicalish-127.html
https://cyberd.org/musicalish-126.html
https://cyberd.org/skiptrace-2016.html
...you just add a list of URLs in the first column and it runs through them, capturing outlinks on each page too if you so desire. I don't think it crawls indefinitely, just the outlinks on each URL you list. Though that can easily be around a hundred each with media/scripts.
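If the 503s really are the shared server choking on too many requests at once, one workaround would be to feed the list through slowly and back off whenever a 503 comes back. Here's a rough sketch of that idea, not how the Sheet tool itself works. The names `capture_all` and `submit` are made up for illustration: `submit` is a stand-in for whatever actually requests a capture of one URL and hands back the HTTP status code, and the delay and retry numbers are guesses.

```python
import time
from typing import Callable, Dict, Iterable


def capture_all(
    urls: Iterable[str],
    submit: Callable[[str], int],  # hypothetical: requests one capture, returns HTTP status
    delay: float = 5.0,            # assumed pause between URLs, in seconds
    max_retries: int = 3,          # assumed cap on retries per URL
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Submit URLs one at a time, retrying 503s with a doubling wait."""
    results: Dict[str, int] = {}
    for url in urls:
        wait = delay
        retries = 0
        status = submit(url)
        # A 503 suggests the server is overloaded: wait, then try again,
        # doubling the wait each time so the server gets room to recover.
        while status == 503 and retries < max_retries:
            sleep(wait)
            wait *= 2
            retries += 1
            status = submit(url)
        results[url] = status  # last status seen, whether it worked or not
        sleep(delay)           # pace the next URL regardless of outcome
    return results
```

Anything still showing 503 in the returned dict after the retries could go into a second, slower pass. With outlinks turned on, each listed URL can fan out to around a hundred extra requests, so the pause between URLs would probably need to be a lot longer than a few seconds.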
Ooh, don't think I ran into that one before, maybe useful!