[Malvern] Document Storage

Ian Pascoe ianpascoe at btinternet.com
Mon Sep 17 20:57:02 BST 2007


Geoff - thanks.

Andy, how do you cope with multi-page documents, and how do you keep
track of which file contains what information?

E

-----Original Message-----
From: malvern-bounces at mailman.lug.org.uk
[mailto:malvern-bounces at mailman.lug.org.uk]On Behalf Of Andrew Morris
Sent: 17 September 2007 11:24
To: Malvern at mailman.lug.org.uk
Subject: Re: [Malvern] Document Storage


Ian,

From experience, starting from scratch with a pile of past documents is
time-consuming ... you really don't realise just how much time it takes
to scan in a multi-page document. Start with current documents, as they
appear, and add in a bunch of back documents to each session. With time,
you will clear the backlog, and build a library.

Any scanned document makes for a much bigger file than an original print
file. Take any bill PDF from BT or equivalent, print it and scan the
result. The scan is easily 10-20 times the file size. OCR doesn't cope
easily with esoteric fonts. Plus, OCR'ed stuff cannot be
used in legal circumstances, but scanned stuff can be; it's how some
evidence has been stored for years (used to be microfilm, now it's
high-res digital).

I scanned everything into PDF on an HP PSC 1210 all-in-one scanner and
printer, generally produced about 500k-1M per page, which sometimes
compressed a bit in 7zip, but not always. That continued until the
scanner software hit some sort of timeout (about 3 years) at which point
all the HP Director stuff refused to work (won't start), independent of
the platform it was loaded on (98, 98SE, ME, XP, Vista). Now I can only
do JPG from the scanner button. Don't know why, HP refused to talk to me
after I gave the question and then the age of the scanner. Works fine on
XSANE on Linux, but only to 600dpi. Windows would alias up to 2400.
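
As a rough illustration of what the 500k-1M per page figure means for
CD/DVD storage, here is a minimal Python sketch. The disc capacities
(700 MB for a CD, 4.7 GB for a single-layer DVD) are nominal values
assumed for the arithmetic, and the function name is made up for this
example; real usable capacity is somewhat lower.

```python
# Back-of-envelope estimate: how many scanned pages fit on one disc.
# Capacities are nominal, assumed values, not measured ones.
CD_BYTES = 700_000_000       # nominal CD-R capacity (assumption)
DVD_BYTES = 4_700_000_000    # nominal single-layer DVD capacity (assumption)

def pages_per_disc(disc_bytes, bytes_per_page):
    """Whole number of pages of the given size that fit on the disc."""
    return disc_bytes // bytes_per_page

for label, capacity in (("CD", CD_BYTES), ("DVD", DVD_BYTES)):
    worst = pages_per_disc(capacity, 1_000_000)  # 1M per page
    best = pages_per_disc(capacity, 500_000)     # 500k per page
    print(f"{label}: roughly {worst} to {best} pages")
```

This ignores filesystem overhead and any gain from 7zip, so treat the
result as an order-of-magnitude guide only.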

You can cut the size of file by choosing your destination media. If you
want to be able to reprint, then you need 300 dpi; if you only want to
read on a screen, then 150dpi is adequate, even allowing for future
resolution upgrades on displays.
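
To see why the choice of resolution matters so much, the sketch below
works out the pixel dimensions and uncompressed size of one page at 150
and 300 dpi. It assumes an A4 page (210 x 297 mm) and 24-bit colour;
both are assumptions for the example, as are the function names. PDF or
JPG files from a scanner will be far smaller once compressed.

```python
# Pixel dimensions and raw (uncompressed) size of one scanned page.
A4_MM = (210, 297)     # A4 page width and height in millimetres (assumption)
MM_PER_INCH = 25.4

def scan_pixels(dpi, page_mm=A4_MM):
    """Return (width_px, height_px) for a page scanned at `dpi`."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

def raw_bytes(dpi, bytes_per_pixel=3, page_mm=A4_MM):
    """Uncompressed size in bytes; 3 bytes per pixel is 24-bit colour."""
    width_px, height_px = scan_pixels(dpi, page_mm)
    return width_px * height_px * bytes_per_pixel

for dpi in (150, 300):
    width_px, height_px = scan_pixels(dpi)
    size_mb = raw_bytes(dpi) / 1e6
    print(f"{dpi} dpi: {width_px} x {height_px} px, about {size_mb:.1f} MB raw")
```

Halving the resolution quarters the pixel count, which is why 150dpi
copies for on-screen reading come out so much smaller than 300dpi ones.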

I double-store everything current (up to 24 months) on HD and CD/DVD.
Burn the CD/DVD at low speed for long-term reliability (as you would do
for any archival stuff). Prune the HD at 24 months.
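
The 24-month prune could be helped along with a small script. This is
only a sketch: it assumes a file's modification time reflects the age of
the document, treats 24 months as 730 days, and uses a made-up function
name. It only lists candidates and deletes nothing, so you can check
they are safely on CD/DVD first.

```python
import os
import time

def prune_candidates(root, max_age_days=730, now=None):
    """Return sorted paths under `root` last modified more than
    `max_age_days` ago (730 days is roughly 24 months)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return sorted(old)
```

Calling prune_candidates() on the folder holding your scans gives the
list of files to review before removing them from the HD.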

Andy


Ian Pascoe wrote:
> Morning Folks
>
> In the list's opinion which is the best way to store documents?
>
> In particular, as my own filing system is, well, non-existent, I was thinking
> about scanning all necessary documents and then storing them either to HD or
> CD / DVD.
>
> I've been trying to work out in my own mind what would be the better way to
> store these scanned documents that will maintain the clarity and be of
> minimal size.
>
> So far it's looking like storing them as a tiff image, but I'm not sure
> whether it's worth the time to push them through an OCR tool and into an
> appropriate document format.  Either raw or compressed through something
> like 7Zip into a self extracting file, or such like
>
> It is not necessary for the stored images to be "legal" copies but are
> merely there for my own reference.
>
> Thoughts, apart from sorting out my paper filing system!
>
> E
>
>
>
> _______________________________________________
> Malvern mailing list
> Malvern at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/malvern
>
>





