[Gllug] Lots of computing power required - is Amazon EC2 an answer?
Adrian McMenamin
adrian at mcmen.demon.co.uk
Fri Jul 1 18:56:33 UTC 2011
On Thu, June 30, 2011 11:05 pm, Adrian McMenamin wrote:
> I have struggled to find an algorithmic way out of this - but cannot see
> one.
>
> I have some code which is very parallelisable but requires huge computing
> resources - on an AMD64 dual box I have here I reckon it will take about
> 20 days to run - but if I had access to about 600 CPUs it might take
> 20-30 minutes (it's run against a massive XML file, 800 MB or so, with
> different parameters)
>
> I am also doing this for a student project so have no big budget to spend
> - but would be willing to spend some money.
>
> Is Amazon EC2 a possible answer - it seems less than clear to me if this
> is about web serving or computing - so I thought I'd ask where people will
> know the answer straight away :)
>
Well, I spoke too soon in my despair last night. I have found algorithmic
improvements that have speeded the whole thing up by a factor of about 50,
which means it's all manageable now. The code is in Groovy, and I realised
that if I used a specialised version of the Java LinkedHashMap, instead of
just a generic Map, and took better advantage of the locality within the
data, then I was likely to see a speed-up: and indeed I do, at least on my
test file.
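The post doesn't show the specialised LinkedHashMap itself. One common specialisation is a bounded, access-ordered map that evicts its least-recently-used entry through `removeEldestEntry`. It suits data with good locality, because recently touched keys stay in the map. The sketch below assumes that kind of change. It is written in Java, which Groovy can use directly, and the class name `LruMap` and the capacity are hypothetical. It is not the code from the repository.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a LinkedHashMap specialised into a bounded LRU map.
// The capacity passed in is an illustrative assumption, not a tuned value.
public class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruMap(int capacity) {
        // accessOrder = true: iteration runs from least- to most-recently
        // accessed entry, and get() moves an entry to the end
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least-recently-used entry
        // once the map has grown beyond its capacity
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruMap<String, Integer> map = new LruMap<>(2);
        map.put("a", 1);
        map.put("b", 2);
        map.get("a");      // "a" becomes the most recently used entry
        map.put("c", 3);   // evicts "b", the least recently used
        System.out.println(map.keySet()); // prints [a, c]
    }
}
```

The design choice here is the access-ordered constructor: without `accessOrder = true`, a LinkedHashMap keeps insertion order, and the same eviction rule would behave as a FIFO rather than an LRU cache.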
The code is not a secret: it's on GitHub at
https://github.com/mcmenaminadrian, and (again in despair) I described
how it works here:
http://cartesianproduct.wordpress.com/2011/06/30/help-computing-power-or-better-algorithm-required/
The offers of clusters, one on the list and one privately, were great,
and I am sure the code, which is highly parallel, would still whizz
through the job on one. But I am hoping I won't need to take anyone up on
them.
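Each run uses the same input with different parameters, so the runs are independent of one another. That independence is what makes the job easy to spread over many CPUs. The minimal single-machine sketch below is in Java. It assumes a hypothetical `analyse(int)` method standing in for the real per-parameter pass over the XML file. A parallel stream hands each parameter to the common fork/join pool, whose size is derived from the number of available processors. Spreading the same sweep over a cluster or EC2 instances would need a separate job-distribution layer, which is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParameterSweep {

    // Hypothetical stand-in for one full pass over the XML data with a
    // given parameter; it only does some arithmetic so the sketch runs.
    static long analyse(int param) {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += i % (param + 1);
        }
        return acc;
    }

    // Runs analyse() once per parameter, in parallel. Parameters must be
    // distinct, because toConcurrentMap throws on duplicate keys.
    static Map<Integer, Long> sweep(List<Integer> params) {
        return params.parallelStream()
                .collect(Collectors.toConcurrentMap(p -> p, p -> analyse(p)));
    }

    public static void main(String[] args) {
        List<Integer> params = IntStream.rangeClosed(1, 8).boxed()
                .collect(Collectors.toList());
        Map<Integer, Long> results = sweep(params);
        // Result order is unspecified: the map is filled concurrently
        results.forEach((p, r) -> System.out.println(p + " -> " + r));
    }
}
```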
I was intrigued by the idea of building my own cluster, and I really would
consider it seriously, but I wonder whether it would be a diversion when I
am so up against the clock. My project report has to be submitted in
mid-September, and while I am still on track (now that I seem to have
solved this problem), time is extremely tight. Can a cluster realistically
be built from scratch in a day?
Thanks for all the help
Adrian
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug