Hermits Rock

Creating a Digital Archive

In my ongoing effort to diminish my physical footprint on this world, I’m poised to scan every handwritten note I’ve ever taken. Before I do, however, a few questions:

  1. PDF or TIFF? (The proprietary nature of PDFs concerns me, but I prefer to treat the notes as documents rather than images.)
  2. Is 1 MB per page reasonable enough for legibility?
  3. How many backups of the entire archive would you recommend?


I have no opinion on 1-2.

Two backups are completely sufficient so long as they are in separate physical locations.
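For what it's worth, a backup only counts if the copy is actually readable. Here is a minimal Python sketch, assuming the archive is a folder of scans and each backup is a mounted drive or folder. The function names and the paths in the usage note are made up for illustration. It copies every file to each destination and checks the copy against a SHA-256 checksum of the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(archive_dir: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file to each destination; return copies whose checksum differs."""
    mismatched = []
    sources = sorted(p for p in archive_dir.rglob("*") if p.is_file())
    for source in sources:
        relative = source.relative_to(archive_dir)
        expected = sha256_of(source)
        for destination in destinations:
            target = destination / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copies contents and timestamps
            if sha256_of(target) != expected:
                mismatched.append(target)
    return mismatched
```

Calling something like `back_up(Path("notes"), [Path("/mnt/drive_a/notes"), Path("/mnt/drive_b/notes")])` (hypothetical paths) should return an empty list when every copy matches. The destinations need to live outside the archive folder, and the script cannot make them physically separate; that part is still up to you.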

Somewhat OT: I saw a notice once in the cloakroom of my main research library in Oxford that went something like this: “Laptop bag LOST in this room! The laptop was not inside, but it did contain CDs of data for my PhD collected over the course of several years. These were the only copies!!”

It’s a little difficult for me to feel sorry for someone capable of that level of stupidity.

I’ve been hearing more about network backups lately; that seems like it might be a good option for a lot of things that one likes to keep but doesn’t need critically.

I have several small hard drives lying around, but I might need to buy another backup drive sometime—one that’s bigger than our current HD, so we can back the whole thing up there.

You people are no help, so I’m scanning anyway. I’ve got about 375 pages done already at 350 MB, give or take a few pages. I should have twice that by the end of the day.

I would have no worries about the PDF format. I don’t see Adobe going anywhere any time soon, and even if they were to perish, there will be plenty of utilities for opening PDFs or converting to other formats.

JH is spot on about the separate locations.

you’re welcome for all our help…

you should ask laura’s librarian friends…

Good idea!

Which is also to say, even though I’ve already started, it doesn’t mean I can’t change what I’m doing. After all, I’ve got a few thousand pages to go…

800 pp & 1 Gigabyte down (some notes I’m scanning hi res, so the file size is bigger—the largest single file today was 250 MB)… a lot more to go.
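A rough way to project where this ends up, sketched in Python. The 800 pages and 1 gigabyte come from the tally above; treating a gigabyte as 1000 MB and "a few thousand pages" as 3000 are both assumptions, so swap in real numbers as the scanning goes on.

```python
def estimate_archive_mb(pages_done: int, mb_done: float, total_pages: int) -> tuple[float, float]:
    """Return (average MB per page so far, projected MB for the whole archive)."""
    mb_per_page = mb_done / pages_done
    return mb_per_page, mb_per_page * total_pages


# 800 pages and 1 GB so far; 3000 total pages is a guess, not a known figure.
mb_per_page, total_mb = estimate_archive_mb(pages_done=800, mb_done=1000, total_pages=3000)
print(f"{mb_per_page:.2f} MB per page, roughly {total_mb / 1000:.2f} GB in total")
```

The average is skewed by the occasional hi-res file, so the projection is only as good as the assumption that the rest of the notes look like the first 800 pages.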

PDF is probably okay. Best, of course, would be if you get your hands on some OCR software. If you know Perl or feel like learning it, Greenstone might be helpful.

More about LOCKSS and stuff when I’m at a faster computer. In the meantime, if there’s anything you really want, keep a hard copy.

Most of the stuff isn’t crucial to keep forever, but for some reason I feel uneasy dumping it for good. This seemed to me like a decent compromise between letting go and renting rooms for storage.

Anyway, the scanner has a built-in OCR app, though I don’t know if it works for handwriting. Other than document size, why would that be better than imaging?

OCR makes your stuff searchable. Otherwise, it’s just a bunch of images. Of course, if what you’re preserving is mostly images, then that’s not a problem.

Also, as promised, my friend and fellow librarian/MFA Karen made a video about LOCKSS.

I could probably also run it through OCR software after scanning as an image, yes? Then I will have the best of both worlds!
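One way the "best of both worlds" setup could look, as a sketch only: keep each scanned image and store whatever text the OCR step produces in a `.txt` file of the same name beside it. That layout is an assumption for illustration, not something any particular scanner or OCR program does by default. Searching the archive then comes down to a few lines of Python.

```python
from pathlib import Path


def search_notes(archive_dir: Path, query: str) -> list[str]:
    """Return the names (without extension) of OCR text files that mention the query."""
    needle = query.casefold()
    hits = []
    for text_file in sorted(archive_dir.rglob("*.txt")):
        # errors="replace" keeps the search going if OCR output has odd bytes
        text = text_file.read_text(encoding="utf-8", errors="replace")
        if needle in text.casefold():
            hits.append(text_file.stem)
    return hits
```

A hit such as `page-0042` would point back to the matching image or PDF of that page. How useful this is depends entirely on how well the OCR copes with handwriting.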