Techie Thread: Making Light is Down

See Patrick’s LiveJournal post for details, but the short version is that we’ve lost everything from March 1.

This thread is for co-ordinating solutions to the problem. What we need:

  1. Any caches of Making Light since March 1. If you have tabs open with ML, please save them and send them to Patrick and Teresa (their initials @panix.com).
  2. Any other saved versions: please post what you have here
  3. Offers of assistance in processing these saves
  4. Wisdom in the ways of LJ feeds: apparently there is a makinglight one, which has data back to April 13th
  5. Movable Type and blogging gods and gurus: suggestions, ideas
  6. Time machines you can loan us
  7. Suppliers for magical pixie dust

75 thoughts on “Techie Thread: Making Light is Down”

  1. I just sent pnh a cached copy of “Indistinguishable from Parody” and associated comments.

    Most of the site appears to be cached at Google. It would be best if there were some way to divide up the job of finding and saving the posts there. If someone has a complete list of posts we can pitch in to claim them.

  2. Re: LJ.

    That means that if you find someone who has the feed on their lj (and you don’t mind doing a big sweep backwards), you can find all your posts by going through their friends list. Coding can be found through the page source. For dates of what was posted when (thus preventing you from having to actually look at someone’s entire flist), you can go to Technorati’s feed thingy and find exact dates/times/titles.

  3. Here’s the Making Light LJ feed:

    http://syndicated.livejournal.com/makinglight/

    But it doesn’t go back all that far:

    2:23am, 28th April 2008: “Where do people find the time?”
    9:11pm, 27th April 2008: Open thread 106
    2:20am, 27th April 2008: Eric Clapton, White Power enthusiast
    1:44am, 27th April 2008: Teresa in the Observer
    11:57pm, 26th April 2008: Feeling the Heat
    10:15pm, 26th April 2008: SFWA election results
    3:47pm, 25th April 2008: Indistinguishable from parody
    3:01am, 25th April 2008: The Rather Difficult Font Game
    12:20pm, 23rd April 2008: Live in San Francisco, it’s TNH!
    1:53am, 23rd April 2008: NBC News calls Penn for Hillary
    2:02pm, 17th April 2008: Little Brother
    11:01pm, 16th April 2008: Newsweek invents an alarming trend
    9:00pm, 16th April 2008: Housekeeping
    10:10pm, 14th April 2008: Open thread 105
    11:18pm, 13th April 2008: Could lead to goose-stepping

    However, this could be used as a start for getting things out of Google.

    So for instance, here’s “Could lead to goose-stepping”:
    http://64.233.169.104/search?q=cache:nMdy2rvFsdwJ:nielsenhayden.com/makinglight/archives/010143.html+site:nielsenhayden.com+could+lead+to+goose&hl=en&ct=clnk&cd=1&gl=us&client=firefox-a
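A minimal sketch of how the feed list could be turned into cache lookups. It assumes the archive URLs follow the zero-padded six-digit pattern seen in the links quoted in this thread, and that the search engine accepts a `cache:` query for a page address; the function names and the default `search_base` are illustrative, not anything the site or Google documents here.

```python
from urllib.parse import quote


def archive_url(entry_id: int) -> str:
    """Build a Making Light archive URL from a numeric entry id.

    Assumes the six-digit, zero-padded file names seen in this thread,
    for example 010143.html.
    """
    return f"http://nielsenhayden.com/makinglight/archives/{entry_id:06d}.html"


def cache_query_url(post_url: str,
                    search_base: str = "http://www.google.com/search") -> str:
    """Build a search URL that asks for the cached copy of post_url.

    The cache: query form is an assumption about the search engine;
    other engines need a different search_base and possibly a different query.
    """
    # Drop the scheme so the query is host/path only.
    bare = post_url.split("://", 1)[-1]
    # Percent-encode everything, including ':' and '/'.
    return f"{search_base}?q={quote('cache:' + bare, safe='')}"
```

Calling `cache_query_url(archive_url(10143))` gives a link to try for “Could lead to goose-stepping”; whether a cached copy still exists is up to the search engine.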

  4. Oh bugger. Much sympathy, and all that.

    (I was hoping I’d have an archived RSS feed in NetNewsWire, but alas, my NN configuration is on my MacBook Air which, by a stunning coincidence, is on its way back to Apple for repair right now.)

    For future reference: if you can get enough of the server up and running to run MySQL, just running mysqldump against the raw database tables and throwing its output at a file on another machine would be amazingly useful in rebuilding things.
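A rough sketch of the mysqldump idea, run from the backup machine so the dump file lands there rather than on the ailing server. The database name, user, and host are placeholders (nothing in this thread says what the real ones are); `--host`, `--user`, `--password`, and `--result-file` are standard mysqldump options.

```python
import subprocess


def build_dump_command(database: str, user: str, host: str, outfile: str) -> list:
    """Return the mysqldump argument list for dumping one database to outfile."""
    return [
        "mysqldump",
        f"--host={host}",            # the database server being rescued
        f"--user={user}",
        "--password",                # no value given: mysqldump prompts for it
        f"--result-file={outfile}",  # written on the machine running this script
        database,
    ]


def run_dump(database: str, user: str, host: str, outfile: str) -> None:
    """Run mysqldump and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_dump_command(database, user, host, outfile), check=True)
```

For example, `run_dump("mt_blog", "backup", "db.example.org", "makinglight.sql")` would prompt for the password and write the dump locally; all four values are hypothetical.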

  5. previous comment in moderation limbo, so:

    List of dates & titles, from my LJ feed:

    2:23am, 28th April 2008: “Where do people find the time?”
    9:11pm, 27th April 2008: Open thread 106
    2:20am, 27th April 2008: Eric Clapton, White Power enthusiast
    1:44am, 27th April 2008: Teresa in the Observer
    11:57pm, 26th April 2008: Feeling the Heat
    10:15pm, 26th April 2008: SFWA election results
    3:47pm, 25th April 2008: Indistinguishable from parody
    3:01am, 25th April 2008: The Rather Difficult Font Game
    12:20pm, 23rd April 2008: Live in San Francisco, it’s TNH!
    1:53am, 23rd April 2008: NBC News calls Penn for Hillary
    2:02pm, 17th April 2008: Little Brother
    11:01pm, 16th April 2008: Newsweek invents an alarming trend
    9:00pm, 16th April 2008: Housekeeping
    10:10pm, 14th April 2008: Open thread 105
    11:18pm, 13th April 2008: Could lead to goose-stepping

    As a test, I got “Could lead to goose-stepping” and all 460+ comments out of the Google cache. Aw heck, I’m just going to pull the rest of this list, OK? While I’m doing that, someone should go to Google or one of the feeds and get a list of date/titles for the rest of the missing posts. Then we can pull them out of the Google cache toot sweet.
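If it helps whoever builds the master list, here is a small sketch for turning date/title lines like the ones above into sortable records. It assumes every line follows the “2:23am, 28th April 2008: Title” shape shown in the LJ feed listing; `parse_feed_line` is an illustrative name, and lines in any other shape simply come back as `None`.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Example line: 9:11pm, 27th April 2008: Open thread 106
LINE_RE = re.compile(
    r"^\s*(\d{1,2}):(\d{2})(am|pm), (\d{1,2})(?:st|nd|rd|th) (\w+) (\d{4}): (.+?)\s*$"
)


def parse_feed_line(line: str):
    """Return (datetime, title) for one feed listing line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    hour, minute, ampm, day, month, year, title = match.groups()
    if month not in MONTHS:
        return None
    # Convert the 12-hour clock: 12am becomes 0, 12pm stays 12.
    hour24 = int(hour) % 12 + (12 if ampm == "pm" else 0)
    return datetime(int(year), MONTHS[month], int(day), hour24, int(minute)), title
```

Sorting the parsed records by their datetime gives a checklist in posting order, which makes it easier to divide up the fetching.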

  6. I have to run to a band rehearsal (aargh!), but here’s the latest email from our ISP:

    MT is definitely hard on the servers, especially where there are sites like yours that tend to get absolutely hammered by spammers.

    The drives themselves are definitely a no go, so whatever burst originally caused the problem (the one first picked up by the monitor) was severe enough that it actually took out the whole server: the motherboard is also fried, and this entire server is toast.

    So, here is the plan. I’m going to strip the drives out and take them back to the office to see if by any stretch we can get one or both to come to life long enough to get some data. In the meantime, we will recreate the account and I will take whatever backup you do have and put the pieces back together. For the data itself between your backup and now, if we can’t get anything from the drives, we will look into the possibility of data recovery. The problem is, of course, that the physical damage, if it was from a bad enough surge, won’t allow anything to be recovered. But if it comes down to that, most reputable places will give a thumbs up/thumbs down on the chances of recovering anything so that no one is wasting any time. We will work on this throughout the weekend, and if we are not successful, send out a few queries to known good recovery companies for options.

    I will update this ticket again once I’m back in the office with the drive after cleaning up here and trying a few more things. That update will have the location of the newly created site so that you can begin uploading the data you have and I can work on getting things back up for now while we work on the rest. Hang in there.
    Regards,

    Annette
    Hosting Matters, Inc.
    http://www.hostingmatters.com

  7. So far, I’ve pulled from the Google cache:

    Archives for all posts (does not include comments) from March & April 08.

    The following posts including comments:

    9:11pm, 27th April 2008: Open thread 106
    1:44am, 27th April 2008: Teresa in the Observer
    10:15pm, 26th April 2008: SFWA election results
    3:47pm, 25th April 2008: Indistinguishable from parody
    3:01am, 25th April 2008: The Rather Difficult Font Game
    11:18pm, 13th April 2008: Could lead to goose-stepping

  8. Now I’ve got everything from the list above, *except* the following:

    2:23am, 28th April 2008: “Where do people find the time?”
    2:20am, 27th April 2008: Eric Clapton, White Power enthusiast
    12:20pm, 23rd April 2008: Live in San Francisco, it’s TNH!

    Other people should try to pull them from Google … HA! I just got the Eric Clapton one out of Yahoo’s cache!! Ph33r mai leet skillz, yo!

  9. In the look-and-feel, I’ve got the stylesheets and such.

    Everybody — google how to browse your browser caches.

  10. Weectory!\o/ I got the others out of the Yahoo cache, too.

    Now I’ve gotta do some other stuff, but I’m uploading all the stuff I’ve collected so far to a web address & sending the url to the nh’s & abi.

  11. Bloglines still has the atom feed; if you display all, you can get every front page post going back to 8 January. Of, um, this year.

    Doesn’t help with the comments at all, barely a drop in a commodious bucket, but.

  12. Doctor Science,

    I’ve picked apart your email address and sent you something.

    Abi

  13. I’ve got the text of the front page. Would that help, and if so, where should I send it?

  14. Quoth Erik Olson, wise in the ways of Firefox:

    If you have firefox

    Go to address bar, type “about:cache”

    Examine page. The disk cache will probably be better. Click on “List Cache Entries”

    A (long) list of links appears. Search for “nielsenhayden”

    Click on links. You’ll get a page with information about that cached item. One of the items will be “file on disk”. Browse to that file in your filesystem finder gadget, and recover.

  15. Ok, I have no clue what I’m doing, but I’ve Got Stuff. I’m flipping through history files, and threads (Open Thread 106, Abi Sutherland on Cats) are popping up. I’ve been copying and pasting them into Word, in case I lose them and can’t get them back. Should I send the file somewhere?

  16. Save all and await further updates. We are currently working on a list of what we need, who has it, and where to send it.

    Thank you, everyone.

  17. Google Reader has all the front page posts dating back to March 1 and beyond. I’ve started collecting links to posts/comments in Google Cache starting from where Doctor Science left off. It doesn’t look like the cache necessarily has all of the comments for longer threads: for example, “Heads they win; tails we lose” says it has 320 comments in the thread, but it cuts off in the middle of comment 175.

  18. My caches are empty, alas, and have no cachet.

    But if it’s a matter of physical disk recovery, I’ll gladly donate a contribution to help.

    Also, Google’s cache of the View All By for me has April 27 and earlier.
    For me I searched on
    “kathryn.sunnyvale view all”
    to get this. This is part of the email address.

    Searching on my Displayed name
    “Kathryn from Sunnyvale view all”
    does *not* return the View All By cache.

    If you have anyone’s email address or website used for posting, start searching for the View All By caches and save them.

  19. different techie question: Anyone know how to delete a user defined entry in the about:config for firefox? I meant to make a boolean entry, and made it an integer, now I can’t seem to figure out how to kill it, and I can’t convert it.

  20. I have a bunch of comments, dating to April 3rd (using Kathryn from Sunnyvale’s google cache trick).

  21. I’ve sent the Google Cache links to Abi for posts going back to March 1. Posts with comments cut off in Google cut off even earlier in Yahoo.

  22. Ok, this is going to sound weird, but I have put my *real life* email account on this post, and you can pass it on to the ISP guys or email me yourself if you want. I’ve worked as 2nd-3rd line server support for a variety of years. While I know nothing about the ISP’s setup, I do know a couple of random useful things about servers.

    Were their disks mirrored at all? If so, break the mirror. DON’T remirror, they will only lose everything to corruption.

    The disks in the dead server(s) — are they hot-pluggable, can the ones with customer data be removed and plugged into another server? If so, there are ways of getting data off them. Of course they surely know this.

    But, this is going to sound even weirder, but I’m completely serious — have the ISP guys bag them in something moisture-proof and put them in a freezer for a couple of hours, first. It will help data recovery. Believe me, I KNOW this sounds like it makes no sense, but I have used this in the past. It genuinely helps, if there are disk errors. I have never completely figured out the physics of this, but making sure the disks are extremely cold really seems to make them more readable.

    I’ll monitor the email and here.

  23. Once you have your own View All By, check names you replied to, and find their View All By’s and copy them.

    Also, I found I did have 10186 and 10184 in my Cache. (Sadly, I hadn’t been reading ML much due to deadlines.)

  24. I have nothing much to contribute to immediate recovery, but it does look like at least the front-page articles are all going to be found, which is something. Good luck with the comments!

    And very sorry to hear that this has happened. My sympathies to all at Making Light! I, too, have let failing backups slide too long; though I have been lucky with it myself, and this time you got burned.

  25. I noticed that bloglines.com has all the headers and links for the last couple of weeks (since the last time I checked it – I normally just go to the front page), and you can copy the link and search in Google to find the cached page. Presumably if you set your feed for 60 days or 200 posts or whatever they offer, you could pick them all up that way.

  26. Below are the posts I saved that are missing comments, in case anyone else has/can find them. The first number is the last comment or partial comment where the cache cuts off, and the second is the total number of comments the post had when it was cached.

    Heads they win; tails we lose
    – 175/320

    Deep Value
    – 165/434

    The photograph that terrorized London
    – 203/204

    Open thread 104
    – 239/931

    Open thread 103
    – 222/936

    Greyhawk’s flags at half-staff
    – 200/253

  27. I have created a master list posting for the various things we’re finding. I’ll update it with the information here.

  28. I have Abi Sutherland, on Catz
    Posted by Teresa at 09:42 AM * 576 comments

    Abi, I will give you a working email address to go with this shortly.

  29. Folks have already gotten into the google cache question. I don’t know if anybody’s doing something programmatic, but that might be the easiest way to pull everything together… (I’m about to be on an airplane, or I’d volunteer. Charlie Stross has a fine hand with perl, if he’s got enough time, although I’m volunteering him without the faintest idea of whether he’s got enough time, or has kept his coding up in the past few years.)

    Beyond that, if data recovery from the hard drive perspective is an issue, I can provide names for some places that have been known to do a decent job…

    Any chance of posting more specific details about “server is down hard” ?

  31. I have included my “real” email address above 😉

    fyi, I’m in the EST

    I have all-comments-by for Lee:
    Last entry posted 05.02.08 on entry Open thread 106:
    last fetched: 2008-05-03 17:04:02

    Slushkiller, 770 posts, last post by
    #770 ::: A. J. Luxton ::: (view all by) ::: May 02, 2008, 09:44 AM:
    last fetched:2008-05-03 17:04:23

    Barbara Bauer takes action! (*yawn*), last post by
    #76 ::: John Houghton ::: (view all by) ::: May 02, 2008, 06:44 PM:
    last fetched:2008-05-03 17:04:32

    Open thread 106, last post by
    #281 ::: Clifton Royston ::: (view all by) ::: May 02, 2008, 09:46 PM:
    last fetched:2008-05-03 17:05:10

    Indistinguishable from parody
    #186 ::: Clifton Royston ::: (view all by) ::: May 02, 2008, 06:52 PM:
    last fetched:2008-05-03 17:05:16

    “Where do people find the time?”
    #251 ::: Serge ::: (view all by) ::: May 02, 2008, 03:54 PM:
    last fetched:2008-05-03 17:05:24

    SFWA election results
    #43 ::: albatross ::: (view all by) ::: April 28, 2008, 09:17 PM:
    last fetched:2008-05-03 17:05:31

    All comments xml file:
    from
    Fri, 02 May 2008 08:33:52 -0500
    back to
    Wed, 30 Apr 2008 12:21:51 -0500

    Limited usefulness:
    I have a recent version of the front page.
    last fetched: 2008-05-03 17:03:27

  32. I’ve just emailed Patrick about this, but I have the text (no comments) of all posts back through March 1st, saved in a text file at the moment. Would this be helpful, or are you just trying to recover the comments at this point?

  33. My firefox caches are currently empty (I blew them away a few days ago because the app was acting sick) and my “Undo Tab” history doesn’t go back far enough to have anything I don’t already have up in the window (the last 3 ML threads).

    HOWEVER, I’ve been running Time Machine to a disk hanging off my WiFi basestation since April 8, so I may have old cache files. I’m looking through that now. If there are some particular threads you haven’t resurrected yet, let me know by email.

  34. (@makerfaire writing by phone)

    Don’t forget the tail ends of old threads, especially those funny reactions to spam… using Abi’s VAB could help trace those. ditto xopher.

  37. also, haven’t seen the Internet Archive mentioned. anyone check yet?

  38. I do all my ML reading in Safari rather than Firefox, and my caches are too recently hosed out to be of use. (Damn.) In case any other Safari users out there want to give it a whirl, though: there’s a shareware-but-fully-functional-for-first-7-days app called File Juicer that will take Safari’s cache and create a big folder full of its contents. From there, anything that can search multiple files for a single string (text editors like BBEdit, e.g.) would be able to tell you if you’ve got cached ML pages in there.
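For anyone without a multi-file search tool handy, a minimal sketch of the same check: walk a folder of extracted cache files and list the ones that mention the site. The folder path is whatever your extraction tool produced, and the default search string is just a guess at what a cached ML page would contain. Files that are stored compressed will not match.

```python
from pathlib import Path


def files_containing(folder, needle: bytes = b"nielsenhayden"):
    """Return a sorted list of files under folder whose raw bytes contain needle."""
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            # Unreadable file (permissions, vanished mid-scan): skip it.
            continue
        if needle in data:
            hits.append(path)
    return hits
```

Copy any hits somewhere safe before opening them, so the browser does not overwrite its own cache while you look.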

  39. I’ve just posted a simple how-to on saving pages from cache here: http://www.absolutewrite.com/forums/showpost.php?p=2320536&postcount=6

    It’s heavily crossed out because I totally misunderstood the problem, and fixed my misunderstanding. I think I have it right. Trying to catch up and figure out what I should go save. Maybe someone who’s co-ordinating here could post on that thread? It’s here:

    http://www.absolutewrite.com/forums/showthread.php?t=101505

    (Random comment: Zack, that’s not Kaolin’s name any more. When he got married he changed it to Kaolin Imago Fire.)

  40. Shweta: That makes two times today I’ve failed at reading comprehension. I saw “Kaolin Fire” on his professional website and my brain filled in his old last name despite its not appearing anywhere.

  41. For some curious reason, I downloaded Open thread 105 to my desktop so I have comments through 373 if that can be of use. Will go search for anything else but I had a fit of desk-top cleaning and I fear any others have gone to the trash emptied graveyard.

  42. So it turns out I can get the cache of every single page plus comments in March and April. It’ll just take a wee while.

    However, the issue is that this doesn’t capture all the comments. For example, I only get comments 1-165 for March 31st, and comments 1-203 (out of 204!) for March 30th.

    Even when I search for #166 on March 31st, I don’t get it in cache. I get this: http://www.google.com/search?q=http%3A%2F%2Fnielsenhayden.com%2Fmakinglight%2Farchives%2F010104.html%23010104+%23166&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a

    but clicking on cache doesn’t get me the “re post #166” I see on the search page.

    Any suggestions?

  43. I should have mostly-complete comment collections for the posts I wrote. Occasionally gMail intercepts comment notifications and knocks them into my Spam folder, where they get deleted after 30 days. I just looked, and there were 58 such comments, going back to mid-April. So I can’t say with any confidence that I have full collections of those comment threads.

  44. I’ll let you know what I don’t have once I get an otherwise-complete collection.
    -s

  45. Zack, downloaded, and ARGH.
    Do you have time (if I compile links) to just click them all & get all the comments? Or anyone else who has the same result Zack does?

  46. What I have saved: The numbers are (number of comments saved/total number claimed by cache).

    The starred lines are ones where I have incomplete comments. I’m going to post the links here and on Absolute Write and ask other people to pretty please click them & see if they get more comments. They’re in cache. I just can’t get them. Not sure why.

    March 1, 2008, Department of Who’s Surprised?: 66/66
    March 3, 2008, Can you read this?: 53/53
    March 3, 2008, All come singing: 69/69
    * March 4, 2008, Greyhawk’s flags at half-staff: 199.5/253
    March 11, 2008, Phase one: collect underpants: 265/265
    * March 13, 2008, Open thread 103: 221/936
    March 16, 2008, Literary Divination, A Parlour Game: 106/106
    March 18, 2008, Arthur C. Clarke, 1917-2008: 177/177
    March 20, 2008, Going to need a bigger laser: 174/174
    * March 28, 2008, Open thread 104: 238.5/931
    March 28, 2008, Divided by common errors: 34/34
    * March 30, 2008, The photograph that terrorized London: 202.5/204
    March 31, 2008, Deep Values: 434/434

    April 1, 2008, Amsterdam: 70/70
    April 4, 2008, Pity the Times: 167/167
    April 4, 2008, Forty Years Gone: 70/70
    * April 6, 2008, Heads they win; tails we lose: 174.5/320
    April 6, 2008, Some must employ the scythe: 126/126
    April 9, 2008, Don’t Miss the Deadline: 25/25
    April 11, 2008, Future of Publishing, Part 5,271,009: 32/32
    April 12, 2008, A book by its cover: 37/37
    * April 13, 2008, Could lead to goose-stepping: 150.5/469
    April 13, 2008, Bury my acorns at Wounded Knee: 87/87
    * April 14, 2008, Open thread 105: 215/906
    April 16, 2008, Newsweek invents an alarming trend: 217/217
    April 16, 2008, Housekeeping: 7/7
    April 17, 2008, Little Brother: 180/180
    April 22, 2008, NBC News calls Penn for Hillary: 124/124
    April 23, 2008, Live in San Francisco, it’s TNH!: 18/18
    April 24, 2008, The Rather Difficult Font Game: 125/125 (yahoo cache)
    April 25, 2008, Indistinguishable from parody: 152/152
    April 26, 2008, Eric Clapton, White Power enthusiast: 105/105
    April 26, 2008, Feeling the Heat: 32/32 (msn cache)
    April 26, 2008, Teresa in the Observer: 13/13
    April 26, 2008, SFWA election results: 45/45
    April 27, 2008, “Where do people find the time?”: 198/198
    April 27, 2008, Open thread 106: 107/107
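A small sketch for keeping a tally like the one above honest as more saves come in. It assumes each line ends in “saved/total”, optionally followed by a parenthetical note such as “(yahoo cache)”, and may start with a star; `parse_tally` and `incomplete` are illustrative names.

```python
import re

# Example line: * March 13, 2008, Open thread 103: 221/936
TALLY_RE = re.compile(
    r"^\s*(?:\*\s*)?(.+?): (\d+(?:\.\d+)?)/(\d+)\s*(?:\(.*\))?\s*$"
)


def parse_tally(line: str):
    """Return (label, saved, total) for one tally line, or None if it does not match."""
    match = TALLY_RE.match(line)
    if match is None:
        return None
    label, saved, total = match.groups()
    # saved may be fractional (a comment cut off partway), so keep it as a float.
    return label, float(saved), int(total)


def incomplete(lines):
    """Return (label, comments_missing) for every thread saved short of its total."""
    short = []
    for line in lines:
        parsed = parse_tally(line)
        if parsed is None:
            continue
        label, saved, total = parsed
        if saved < total:
            short.append((label, total - saved))
    return short
```

Running `incomplete` over the list recomputes which threads still need a fuller copy, independent of whether the stars were kept up to date by hand.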

  47. Shweta:

    I got “could lead to goose-stepping” all the way to comment #468 by going through MSN’s cache. What email addy should I use? Or should I put them up on the same site I already put a bunch of things for the nh’s&abi to get to?

  48. Shweta:

    I got all the ones on your list.*buffs fingernails* I’m putting them up where the gang can get at them; let me know if you need me to email them anywhere.

  49. Awesome! If you give me links or files, I can put together a complete all-together March/April set to upload. That might be more user-friendly than two chunks?

    My email is shweta at divmod dot com.

    Next step, checking what I have against Teresa’s list of known last posts. whee.

  50. I just mailed Patrick the full comment thread for Open Thread 106 — and I know it was the full thread, because I’d just posted and was ego-refreshing at the time. (I wasn’t refreshing *that* much. Don’t look at me like that… IT WASN’T MY FAULT!) So that’s 285 of 285.

  51. (Sorry for the posting-hiccups upthread–new smartphone, old habits. Extras can be deleted, please)

    As posted to PNH’s “back” thread, it may not be good to put the March 1 backup back online yet, because Google, Yahoo, et al. will start repopulating their caches with these pages.

    There are folks who might not be back until Monday to do their View All By search (and someone mentioned reference to 600 VAB’s found on Google)

  52. You probably know this already, but… Even if the drive controllers are fried, it’s very likely that some or all of the data on the dead disk can be recovered. It’s expensive, but not as expensive as one might think. (On the order of $1000, as far as I can tell.)

    And yes, if it comes to that and if you’re taking up a collection, I’ll chip in.

  53. Kathryn: Unfortunately, the Internet Archive isn’t helpful right now, although it may be in six months or so. From the FAQ at http://www.archive.org/about/faqs.php#103:

    *****

    Why are there no recent archives in the Wayback Machine?

    It generally takes 6 months or more for pages to appear in the Wayback Machine after they are collected, because of delays in transferring material to long-term storage and indexing.

    There is no access to files before they appear in the Wayback Machine.

    *****

    For example, take http://nielsenhayden.com/makinglight/archives/008845.html (Seatbelts Save Lives), which is missing relatively recent comments. The Archive has “Search Results for Jan 01, 1996 – Nov 06, 2007” with three hits: May 06, 2007, May 17, 2007, Jul 18, 2007.

  54. ML seems like it’s a big enough site that, once you’re fully back up and running, you probably want to consider setting up off-site database replication, which MySQL makes fairly easy. (Basically, SQL queries that modify the database get applied to the primary database and also get forwarded to a database running on another machine.) That way you’re at least more likely to be protected against catastrophic hardware failure, as seems to have occurred here, and the replicated database is closer to real-time current.

    I may be able to offer you a MySQL instance running on a reasonable box which could be used for replication, depending on how big the database is and how much bandwidth it uses. My e-mail address is available via the web site linked from my name up above if you’re interested.

  55. Google has the annoying feature of terminating the cache display after a certain byte count is reached. Use MSN search instead. Unfortunately, their spider didn’t run in time to get the last 100 or so comments of the Shirky thread.

  56. Ditto the replication offer, possibly; and I’d be happy to try to help with whatever I could, but it looks like there are a lot of smart folks in this thread that have things handled. If there’s a need for massaging html back into real data, I could possibly make a script to do that, though I don’t know how complicated Movable Type’s data formats are.
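As a starting point for that kind of massaging, here is a sketch that pulls comment headers out of a saved page and reports which comment numbers are absent. It assumes the page has already been reduced to plain text and that headers look like the ones quoted upthread (“#770 ::: A. J. Luxton ::: (view all by) ::: May 02, 2008, 09:44 AM:”); it knows nothing about Movable Type’s own import format, and the function names are illustrative.

```python
import re

# Example header: #76 ::: John Houghton ::: (view all by) ::: May 02, 2008, 06:44 PM:
HEADER_RE = re.compile(
    r"#(\d+) ::: (.+?) ::: \(view all by\) ::: "
    r"(\w+ \d{1,2}, \d{4}, \d{1,2}:\d{2} [AP]M):"
)


def parse_headers(text: str):
    """Return (number, author, timestamp) for each comment header found, in page order."""
    return [(int(number), author, stamp)
            for number, author, stamp in HEADER_RE.findall(text)]


def missing_numbers(text: str, claimed_total: int):
    """Return the comment numbers from 1 to claimed_total that never appear in text."""
    seen = {number for number, _author, _stamp in parse_headers(text)}
    return [n for n in range(1, claimed_total + 1) if n not in seen]
```

Comparing `missing_numbers` across the Google, Yahoo, and MSN copies of the same thread would show which copy to keep and which tail ends still need hunting down.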

Comments are closed.