[Klug-general] MySQL Clusters

Sun Apr 15 11:40:10 UTC 2012

OK first some idea of what I wanted:

In hand I have a 16gb DB which around 70% of that is brought over by 
Pentaho ETL from an AS400 everyday. Many tables are reasonable size, 
with a few of the biggest containing >2 million records. Record sizes 
can be quite large on some of these too.

The idea was to provide 1) a faster backup on the data that exists 
purely in this system and does not come from AS400, 2) high availability 
for this critical data, 3) remove the bottle neck that the Pentaho BI 
analysis and reporting tools, plus other external queries were having on 
performance.

Of course other aspects were being dealt with via other options, however 
the idea of using a MySQL cluster needed to be considered. The set was 
amazingly simple, and I used the suggested preferred config  of four 
storage nodes and one management node. As this was using bits I had 
around I didn't have any machines spare that could handle the memory 
requirements. The suggested memory sizing (as the cluster works by 
caching the WHOLE db to memory) is at least double the size of DB on 
disk, with some other calcs for index sizes and table information. I 
calc'd I would possibly need around 40gb of memory. For testing swap of 
this size would seriously impact performance so I only gave it 15gb of swap.

Config as I said was simple. Install distros on all nodes, install MySQL 
cluster server on all of them. Configure the manager node (which need 
not be high spec, the storage nodes need the memory) with a couple of 
lines in my.cnf, on the nodes do like wise (the MySQL web pages were 
fantastic help), took about 30min to actually get the servers 
configured, was that simple.

Loading the data was then going to be the challenge. I restored a copy 
of the full database onto the management node. Now, to make the db 
cluster aware it is only a matter of changing the table engine to 
ndbcluster so one by one I did an ALTER TABLE xxx SET ENGINE=ndbcluster.

It was quite amazing watching the nodes and see this new table and the 
data replicate to each storage node, the table though was instantly 
available for query.

Did that for all the small ones while watching the memory usage on each 
node. A few times I had to restart nodes when various messages came up 
because tweaks to the configures were required. Rolling restarts are 
very good on the cluster and having four nodes is the ONLY way for this 
to work, some have tried it with two but you would need to bring the 
whole cluster down for any changes. Not good.

As I then started to edge into the swap disk space hit issues as I knew 
I would, the whole point is it should all be in memory. Before I got 
there I did a few tests between my existing MySQL server and this 
cluster. Considering the main server was under load and far out spec the 
processor etc the cluster did still beat it by a factor of 10 most of 
the time. I say most of the time, the documents do say that certain 
features are not supported and its a long list.

One major issue I saw but didnt test was the issue over constant 
clearing of tables and reinsert. The documents mention that memory will 
not be reclaimed (fully) when doing this so for my ETL processes this 
could be a major problem, perhaps requiring a rolling restart each time 
I do this. Not happy about that unless i can test it out.

I did find it easy to break, but then I wasn't really using the full 
memory so that might be understandable. So I would say admin could be 
needed more frequently than I would have liked, but then this is HA 
(high availability computing), and thats understandable.

I eventually killed the project as it became clear that the scale of 
work and cost (will mention that in a moment) would far out weigh our 
current issues and simpler solutions to reduce the issues above would work).

Costs then, well OK the software is free, but the hard won't be. For the 
memory size I looked around and found an IBM xSeries server that could 
handle up to 128Gb of memory and that came to around £800 for the bare 
unit. Lots of drive bays and these are only needed for the OS and disk 
copy of the DB, so I would have gone for two for just mirror RAID for 
each node. The setup for above using just list prices came in at around 
£8000. Not bad really for what it could do if it could deliver what it 
promised.

I am still most impressed by the performance and maybe I will comeback 
to it should I have spare kit a little better than what I had at this time.

If anyone has any particular questions then feel free to ask so long as 
they won't give away company secrets of course. :-)

  Kev

On 13/04/12 13:47, Kevin Groves wrote:
> Hmm. Was going to follow up on the posting I thought I had made about 
> my experiments with MySQL Clusters. Can't find it. Did I imagine that 
> or was there one?
>
> In any case anyone interested before I throw a large email onto the list?
>
> Thanks
> Kev,
>
> _______________________________________________
> Kent mailing list
> Kent at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/kent
>