[Malvern] Document Storage

Andrew Morris zaglabod at btinternet.com
Mon Sep 17 11:24:35 BST 2007


Ian,

From experience, starting from scratch with a pile of past documents is
time-consuming ... you really don't realise just how long it takes
to scan in a multi-page document. Start with current documents, as they
arrive, and add a batch of older documents to each session. With time,
you will clear the backlog and build a library.

Any scanned document makes for a much bigger file than an original print
file. Take any bill PDF from BT or equivalent, print it and scan the
result. The scan is easily 10-20 times the size of the original file. OCR
doesn't cope well with unusual fonts. Plus, OCR'ed stuff cannot be
used in legal circumstances, but scanned stuff can be; it's how some
evidence has been stored for years (used to be microfilm, now it's
high-res digital).

I scanned everything into PDF on an HP PSC 1210 all-in-one scanner and
printer, which generally produced about 500k-1M per page. That sometimes
compressed a bit in 7zip, but not always. This continued until the
scanner software hit some sort of timeout (after about 3 years), at which
point all the HP Director stuff refused to work (won't start), independent
of the platform it was loaded on (98, 98SE, ME, XP, Vista). Now I can only
do JPG from the scanner button. Don't know why; HP refused to talk to me
once I had given the question and then the age of the scanner. Works fine
on XSANE on Linux, but only to 600dpi. Windows would alias up to 2400.

You can cut the file size by matching the scan resolution to how the
copy will be used. If you want to be able to reprint, then you need
300 dpi; if you only want to read on a screen, then 150dpi is adequate,
even allowing for future resolution upgrades on displays.
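
As a rough illustration of why the resolution matters so much, here is a
small Python sketch of the arithmetic. It assumes an A4 page (8.27 x 11.69
inches) scanned in 24-bit colour and reports the uncompressed size only;
the function name and defaults are my own for illustration, and what ends
up on disk depends on the format and compression used.

```python
def scan_size(dpi, width_in=8.27, height_in=11.69, bits_per_pixel=24):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page.

    The defaults assume an A4 page in 24-bit colour; both are assumptions,
    not properties of any particular scanner.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, raw_bytes


if __name__ == "__main__":
    for dpi in (150, 300, 600):
        w, h, raw = scan_size(dpi)
        print(f"{dpi} dpi: {w} x {h} px, about {raw / 1e6:.1f} MB uncompressed")
```

The pixel count grows with the square of the resolution, so dropping from
300 dpi to 150 dpi cuts the raw data to roughly a quarter.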

I double-store everything current (up to 24 months) on HD and CD/DVD. 
Burn the CD/DVD at low speed for long-term reliability (as you would do 
for any archival stuff). Prune the HD at 24 months.
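
The pruning step could be sketched in Python along these lines. This is
only a sketch under assumptions: the function name, the "scans" directory
and the 730-day cutoff (roughly 24 months) are made up for illustration,
and it treats a file's modification time as the date it was stored. It only
lists candidates and deletes nothing, so you can check the CD/DVD copy
first.

```python
import time
from pathlib import Path


def prune_candidates(root, max_age_days=730, now=None):
    """Return files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    archive = Path.home() / "scans"  # hypothetical location of the HD copies
    if archive.is_dir():
        for path in prune_candidates(archive):
            print(path)
```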

Andy


Ian Pascoe wrote:
> Morning Folks
> 
> In the list's opinion which is the best way to store documents?
> 
> In particular, as my own filing system is, well, non-existent, I was thinking
> about scanning all necessary documents and then storing them either to HD or
> CD / DVD.
> 
> I've been trying to work out in my own mind what would be the better way to
> store these scanned documents that will maintain the clarity and be of
> minimal size.
> 
> So far it's looking like storing them as a TIFF image, either raw or
> compressed through something like 7Zip into a self-extracting file, or
> suchlike. I'm not sure whether it's worth the time to push them through
> an OCR tool and into an appropriate document format.
> 
> It is not necessary for the stored images to be "legal" copies but are
> merely there for my own reference.
> 
> Thoughts, apart from sorting out my paper filing system!
> 
> E


