[Gllug] VACANCY: Software Developer/Analyst (Cloud Computing)

James Courtier-Dutton james.dutton at gmail.com
Thu Nov 4 13:06:28 UTC 2010


On 4 November 2010 12:39, John Hearns <hearnsj at googlemail.com> wrote:
> On 4 November 2010 10:35, Richard Palmer <richard.d.palmer at kcl.ac.uk> wrote:
>> Dear list,
>>
>>        We have a vacancy at the Centre for e-Research at King's College
>>        London for a software developer/analyst. A quick job summary:
>> internal cloud (using underutilised institutional computers and storage). The
>> purpose of this cloud pilot is to support researchers, both by managing
>> research data (integrated with a Fedora digital repository system) and by
>> allowing the execution of computation-intensive processing.
>
> A reference to this digital repository system would be very interesting.
> As a coincidence, I've been participating in a thread on the Beowulf
> list regarding long-term data storage,
> ie digital archaeology and the problems of accessing long dead data
> when the physical drves are dead,
> or the data format is forgotten and not self-documenting.
> I was having a small rant about data now being stored in the cloud, so
> data should expect to migrate between
> several generations of physical storage type without the user needing
> to know, and it should carry its metadata with itself.

Virtual machine technology has done a good job of mitigating the risk
of losing access to long-dead data.
While the data is still readable by some application, create a VM that
runs that application, so that you can still run it many years in the
future on modern hardware.
If possible, also include any known data format documentation with the VM.

There is still the problem of the dead data being on media that you no
longer have a reader for.
For example, a document on an old 3-inch Amstrad floppy disk, or on
some ancient backup tape format.
The only way round this is to audit all your data and record which
media it is on.
Then put a plan in place for a data refresh per media type.
I.e. starting on date X, retrieve all data on the soon-to-expire media
type Y and copy it to a new, modern media type Z.
Then dispose of the expired media type Y.
It is also sensible to add error-detection checksums to the data, e.g.
sha256sum, in order to detect faulty media that would then require
extra data-recovery work during the media refresh.
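The checksum step above can be sketched in a few lines of Python — this is
only an illustration of the idea, not a finished tool; the function names
and the manifest layout are my own invention. Record a SHA-256 digest for
every file on the old media (type Y), copy to the new media (type Z), then
re-hash and compare before disposing of Y:

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Record a checksum for every file under root (the old media, Y)."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def verify_copy(manifest, root):
    """After copying to the new media Z, list files whose checksum changed."""
    root = Path(root)
    return [name for name, digest in manifest.items()
            if sha256_of(root / name) != digest]
```

A non-empty list from verify_copy() flags exactly the files that need the
"extra data-recovery work" mentioned above, while everything is still
readable from the old media.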
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



