[Klug-general] MySQL Clusters
kgroves at ksoft-electrical-projects.co.uk
Sun Apr 15 11:40:10 UTC 2012
OK first some idea of what I wanted:
In hand I have a 16gb DB which around 70% of that is brought over by
Pentaho ETL from an AS400 everyday. Many tables are reasonable size,
with a few of the biggest containing >2 million records. Record sizes
can be quite large on some of these too.
The idea was to provide 1) a faster backup on the data that exists
purely in this system and does not come from AS400, 2) high availability
for this critical data, 3) remove the bottle neck that the Pentaho BI
analysis and reporting tools, plus other external queries were having on
Of course other aspects were being dealt with via other options, however
the idea of using a MySQL cluster needed to be considered. The set was
amazingly simple, and I used the suggested preferred config of four
storage nodes and one management node. As this was using bits I had
around I didn't have any machines spare that could handle the memory
requirements. The suggested memory sizing (as the cluster works by
caching the WHOLE db to memory) is at least double the size of DB on
disk, with some other calcs for index sizes and table information. I
calc'd I would possibly need around 40gb of memory. For testing swap of
this size would seriously impact performance so I only gave it 15gb of swap.
Config as I said was simple. Install distros on all nodes, install MySQL
cluster server on all of them. Configure the manager node (which need
not be high spec, the storage nodes need the memory) with a couple of
lines in my.cnf, on the nodes do like wise (the MySQL web pages were
fantastic help), took about 30min to actually get the servers
configured, was that simple.
Loading the data was then going to be the challenge. I restored a copy
of the full database onto the management node. Now, to make the db
cluster aware it is only a matter of changing the table engine to
ndbcluster so one by one I did an ALTER TABLE xxx SET ENGINE=ndbcluster.
It was quite amazing watching the nodes and see this new table and the
data replicate to each storage node, the table though was instantly
available for query.
Did that for all the small ones while watching the memory usage on each
node. A few times I had to restart nodes when various messages came up
because tweaks to the configures were required. Rolling restarts are
very good on the cluster and having four nodes is the ONLY way for this
to work, some have tried it with two but you would need to bring the
whole cluster down for any changes. Not good.
As I then started to edge into the swap disk space hit issues as I knew
I would, the whole point is it should all be in memory. Before I got
there I did a few tests between my existing MySQL server and this
cluster. Considering the main server was under load and far out spec the
processor etc the cluster did still beat it by a factor of 10 most of
the time. I say most of the time, the documents do say that certain
features are not supported and its a long list.
One major issue I saw but didnt test was the issue over constant
clearing of tables and reinsert. The documents mention that memory will
not be reclaimed (fully) when doing this so for my ETL processes this
could be a major problem, perhaps requiring a rolling restart each time
I do this. Not happy about that unless i can test it out.
I did find it easy to break, but then I wasn't really using the full
memory so that might be understandable. So I would say admin could be
needed more frequently than I would have liked, but then this is HA
(high availability computing), and thats understandable.
I eventually killed the project as it became clear that the scale of
work and cost (will mention that in a moment) would far out weigh our
current issues and simpler solutions to reduce the issues above would work).
Costs then, well OK the software is free, but the hard won't be. For the
memory size I looked around and found an IBM xSeries server that could
handle up to 128Gb of memory and that came to around £800 for the bare
unit. Lots of drive bays and these are only needed for the OS and disk
copy of the DB, so I would have gone for two for just mirror RAID for
each node. The setup for above using just list prices came in at around
£8000. Not bad really for what it could do if it could deliver what it
I am still most impressed by the performance and maybe I will comeback
to it should I have spare kit a little better than what I had at this time.
If anyone has any particular questions then feel free to ask so long as
they won't give away company secrets of course. :-)
On 13/04/12 13:47, Kevin Groves wrote:
> Hmm. Was going to follow up on the posting I thought I had made about
> my experiments with MySQL Clusters. Can't find it. Did I imagine that
> or was there one?
> In any case anyone interested before I throw a large email onto the list?
> Kent mailing list
> Kent at mailman.lug.org.uk
More information about the Kent