[Gllug] Lots of computing power required - is Amazon EC2 an answer?

salsaman at xs4all.nl salsaman at xs4all.nl
Fri Jul 1 23:06:39 UTC 2011


On Fri, July 1, 2011 20:56, Adrian McMenamin wrote:
> On Thu, June 30, 2011 11:05 pm, Adrian McMenamin wrote:
>> I have struggled to find an algorithmic way out of this - but cannot see
>> one.
>>
>> I have some code which is very parallelisable but requires huge computing
>> resources - on an AMD 64 dual box I have here I reckon it will take about
>> 20 days to run - but if I had access to about 600 CPUs it might take 20 -
>> 30 minutes (it's run against a massive XML file - 800 MB or so - with
>> different parameters)
>>
>> I am also doing this for a student project so have no big budget to spend
>> - but would be willing to spend some money.
>>
>> Is Amazon EC2 a possible answer - it seems less than clear to me if this
>> is about web serving or computing - so I thought I'd ask where people will
>> know the answer straight away :)
>>
> Well, in despair last night I spoke too soon - I have found algorithmic
> improvements that have sped this all up by about a factor of 50, which
> means it's all manageable now - the code is in Groovy and I realised that
> if I used a specialised version of the Java LinkedHashMap, instead of just
> a generic Map, and took better advantage of the locality within the
> data then I was likely to see a speed-up - and indeed I do, at least on my
> test file.
>
> The code is not a secret - it's on GitHub
> https://github.com/mcmenaminadrian - and (again in despair) I described
> the way it works here:
> http://cartesianproduct.wordpress.com/2011/06/30/help-computing-power-or-better-algorithm-required/
>
> The offers from people to make clusters available - one on the list and
> one privately - were great and it is still the case that the code - which
> is highly parallel - would, I am sure, whizz through it. But I am hoping I
> don't need to take up any offers.
>
> I was intrigued by the idea of building my own cluster - I really would
> seriously consider that, but I wonder if it would be a diversion when I am
> seriously up against the clock on the whole process - my project report
> has to be submitted in mid-September and while I am still on track (now I
> seem to have solved this problem), it is extremely tight for time. Can
> these things be built from scratch in a day realistically?
>
> Thanks for all the help
>
> Adrian
>
>
>
> --
> Gllug mailing list  -  Gllug at gllug.org.uk
> http://lists.gllug.org.uk/mailman/listinfo/gllug
>
>



I was going to suggest hashing, but I was waiting to see the algorithm
first. Good that you found a way to speed it up.
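
For anyone reading the archive later, here is a rough sketch of the general
idea of specialising Java's LinkedHashMap to exploit locality in the data. It
is only an illustration, not Adrian's actual code (that is on his GitHub page):
the class name LocalityCache, the capacity of 2 and the Long/String types are
all made up for the example. It relies on two documented LinkedHashMap
features: the access-order constructor and the removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded map that keeps only the most recently used entries.
 * The name and the capacity are assumptions made for this illustration.
 */
class LocalityCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // assumed limit, not taken from the thread

    LocalityCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each insertion; true evicts the least recently used entry
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LocalityCache<Long, String> cache = new LocalityCache<>(2);
        cache.put(1L, "a");
        cache.put(2L, "b");
        cache.get(1L);      // touching key 1 makes key 2 the eldest
        cache.put(3L, "c"); // exceeds the limit, so key 2 is evicted
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```

If lookups tend to cluster around recently seen keys, a map like this stays
small and hot instead of growing with the whole input, which may be part of
why a change of this kind helps; whether it matches the real fix is a guess.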

Salsaman.

