[Gllug] Virtual disk allocation advice requested

Daniel P. Berrange dan at berrange.com
Fri Jul 4 08:25:14 UTC 2008


On Fri, Jul 04, 2008 at 12:11:16AM +0100, Jose Luis Martinez wrote:
> All the below reasoning is correct, but virtualization relies heavily
> in the assumption that hardware is so fast that such considerations
> are not necessary in most cases.

That's not really entirely accurate.

The first generation of x86 hardware with virtualization extensions
was not really focused on performance - it was merely enablement,
i.e. the ability to run unmodified guest operating systems without
dynamic code re-writing or resorting to paravirtualization.

Even with this hardware assist, pure paravirtualization is still
faster for both CPU and I/O intensive workloads - if you had the
choice you would not use hardware assist. 
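
(As an aside, a quick way to see whether a given box even advertises
the extensions is to look for the 'vmx' (Intel VT-x) or 'svm' (AMD-V)
CPU flags. A minimal Python sketch, assuming a Linux /proc/cpuinfo;
note the flag being advertised doesn't mean it is enabled in the BIOS:)

  # Minimal sketch: report whether the CPU advertises hardware
  # virtualization extensions (Intel "vmx" / AMD "svm") on Linux.
  def hw_virt_flags(path="/proc/cpuinfo"):
      flags = set()
      with open(path) as f:
          for line in f:
              if line.startswith("flags"):
                  flags.update(line.split(":", 1)[1].split())
      return flags & {"vmx", "svm"}

  if __name__ == "__main__":
      found = hw_virt_flags()
      if found:
          print("Hardware virt extensions advertised: " + ", ".join(sorted(found)))
      else:
          print("No vmx/svm flag - full virt needs binary translation or paravirt")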

The second generation of x86 virtualization hardware (AMD's Barcelona
with NPT, and Intel's equivalent EPT) provides more advanced page
table handling, allowing CPU-intensive workloads to match paravirt,
or even exceed it if combined with huge pages. This is principally
about avoiding/reducing the TLB flushes (which kill performance) that
context switches would otherwise cause.
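
To put rough numbers on the TLB point: with 4-level page tables on
both the guest and the nested (NPT/EPT) side, a single guest TLB miss
can cost on the order of n*(m+1)+m = 24 memory references, versus 4
natively; dropping a level on each side with 2MB huge pages brings
that down to 15. A back-of-the-envelope sketch in Python:

  # Worst-case memory references per TLB miss under nested paging:
  # each of the guest's page-table levels is itself a guest-physical
  # address that must be walked through the nested tables, plus a
  # final nested walk for the data address: n*(m+1) + m in total.
  def nested_walk_refs(guest_levels, nested_levels):
      return guest_levels * (nested_levels + 1) + nested_levels

  print("native 4-level walk:          4 refs")
  print("nested 4-level / 4-level:    %d refs" % nested_walk_refs(4, 4))  # 24
  print("2MB huge pages (3 levels):   %d refs" % nested_walk_refs(3, 3))  # 15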

This still doesn't do very much for I/O performance; it mostly helps
CPU-intensive workloads. For any serious I/O performance you need to
paravirtualize the drivers to take into account the fact that you are
virtualized. Emulating real hardware is just too inefficient, because
real hardware was designed around the performance characteristics of
physical machines, and virtualized machines have wildly different
characteristics. You have no choice but to re-design the I/O driver
model around the characteristics of a virtualized machine. Hardware
assist can't help with this.
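
The shape such paravirtualized drivers take is essentially a
shared-memory ring of request descriptors with one notification per
batch, instead of trapping every emulated register poke. A toy Python
sketch of the idea (not Xen's netfront/netback or virtio's actual
ring layout):

  # Toy sketch of the paravirtual I/O idea: the guest queues request
  # descriptors on a shared ring and issues ONE notification per
  # batch, rather than the hypervisor trapping every register access.
  from collections import deque

  class SharedRing:
      def __init__(self):
          self.requests = deque()
          self.responses = deque()
          self.kicks = 0          # guest -> backend notifications

      def submit_batch(self, reqs):
          # guest side: enqueue many requests, notify once
          self.requests.extend(reqs)
          self.kicks += 1         # single hypercall/doorbell per batch

      def service(self):
          # backend (driver domain / host) side: drain and complete
          while self.requests:
              req = self.requests.popleft()
              self.responses.append(("done", req))

  ring = SharedRing()
  ring.submit_batch([("read", sector) for sector in range(64)])
  ring.service()
  print("%d I/O requests completed with %d notification" %
        (len(ring.responses), ring.kicks))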

Eventually x86 will get real IOMMUs, and then in combination with
PCI-IOV your real physical NIC and disk adapter will support
virtualization. This will enable your guest to access dedicated
I/O channels on the physical devices in the underlying host without
needing backend drivers in Dom0 / another guest. Even this won't
necessarily solve the issue though, because an important reason
for virtualization is to de-couple the OS from the hardware to allow
things like migration & running legacy OSes on new hardware. So the
world will primarily use paravirtualized I/O drivers for performance
indefinitely.

> > So inevitably another guest using the NFS server involves a lot of
> > switching, something like this:
> >
> >  +----------------+----------------+----------------+
> >  | dom0           | NFS server     | using NFS      |
> >  |                |                |                |
> >  | bridge         |                |                |
> >  +----------------+----------------+----------------+
> >          <---------- request ---------------
> >       routed by dom0
> >          ------ request --->
> >          <----- disk I/O ---
> >       handled by dom0
> >          ------ disk I/O -->
> >          <----- response ---
> >       routed by dom0
> >          ----------- response -------------->
> >
> > Something like the above, I may have missed out or exaggerated some
> > context switches, but I think that single request/response turns into
> > on the order of 6 context switches.  The situation is much worse if
> > you're not paravirtualizing.

The situation is actually even worse than this in Xen, because you have
extra context switches into the hypervisor, and the scheduling is very
sub-optimal. e.g. if a guest is 10ms into a 30ms timeslice and it blocks
on I/O, then the HV will schedule another VM. There's no guarantee that
this other VM is the one required to service the guest's I/O. Further,
there's no guarantee that this other guest will then schedule the
QEMU device model required to complete the I/O. So in pathological cases
guests doing I/O can waste/lose a lot of their scheduler timeslice.
Xen developers are attempting to solve this problem by introducing the
concept of driver domains and moving the I/O model into these domains
instead of Dom0. The scheduler will allow the guest to hand off directly
to its driver domain upon I/O and not go back into the scheduler, thus
not losing its timeslice and reducing the number of context switches.
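
A crude way to see why that hurts: if the backend only gets the CPU
when its round-robin turn comes up, the blocked guest can sit behind
several unrelated domains' full timeslices before its I/O is even
looked at, whereas a directed hand-off runs the driver domain straight
away. A toy Python model with made-up numbers (this is not Xen's
actual credit scheduler):

  # Toy model: a guest blocks on I/O; with plain round-robin the
  # backend domain only runs when its turn arrives, while a directed
  # hand-off would run it immediately.
  TIMESLICE = 30  # ms

  def io_latency_round_robin(domains, guest, backend):
      """ms from the guest blocking until the backend gets the CPU,
      assuming every intervening domain burns a full timeslice."""
      i = domains.index(guest)
      waited = 0
      for d in domains[i + 1:] + domains[:i]:
          if d == backend:
              return waited
          waited += TIMESLICE
      return waited

  domains = ["guest", "domA", "domB", "domC", "dom0-backend"]
  print("plain round-robin: backend runs after %d ms" %
        io_latency_round_robin(domains, "guest", "dom0-backend"))
  print("directed hand-off to driver domain: backend runs almost immediately")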

KVM is quite different because there is no artificial separation between
the Dom0 and the hypervisor, so you can avoid a whole set of context
switches and thus avoid thrashing the TLB so often. It also avoids the
scheduler problems, because from the Linux scheduler's POV the VM is
just part of the QEMU process. Of course VirtIO is not yet as mature as
Xen's paravirt I/O driver architecture, so performance isn't consistently
better yet, but under certain workloads network is already on a par
with Xen.
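
You can see that directly on a KVM host: each vCPU is just another
thread of the QEMU process, so the ordinary Linux scheduler (and tools
like top/taskset) deal with it like any other task. A small Python
sketch, assuming you pass it the PID of a running QEMU process (thread
naming varies between QEMU versions):

  # Small sketch: list the threads of a running QEMU/KVM process to
  # show that vCPUs are ordinary Linux scheduler entities.
  # Usage: python list_qemu_threads.py <qemu-pid>
  import os, sys

  def list_threads(pid):
      task_dir = "/proc/%s/task" % pid
      for tid in sorted(os.listdir(task_dir), key=int):
          with open(os.path.join(task_dir, tid, "stat")) as f:
              stat = f.read()
          # thread name sits between the parentheses in the stat line
          name = stat[stat.index("(") + 1:stat.rindex(")")]
          print("%s  %s" % (tid, name))

  if __name__ == "__main__":
      list_threads(sys.argv[1])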

> > In practice this is noticably slow.  NFS over UDP was unusable when I
> > tried it, and NFS(v3) over TCP is usable but slow.

In Xen the network drivers are very well optimized for DomU <-> DomU
connectivity, allowing you to get extremely good I/O performance for
TCP at least. They do this by avoiding TCP checksums for the DomU-to-DomU
case, and also by allowing huge MTUs to avoid the need for fragmentation.
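
The large-MTU part is easy to quantify: per-packet header bytes and
per-packet processing are roughly fixed, so the overhead fraction
shrinks as the MTU grows. A rough Python illustration (IPv4 + TCP
headers with no options assumed):

  # Rough illustration of why a large MTU helps DomU <-> DomU traffic:
  # ~40 bytes of IPv4+TCP header (and the per-packet processing cost)
  # get amortised over far more payload as the MTU grows.
  HEADERS = 40  # bytes of IPv4 + TCP header, no options

  for mtu in (1500, 9000, 65535):
      payload = mtu - HEADERS
      packets_per_mib = -(-(1 << 20) // payload)   # ceiling division
      print("MTU %5d: %4.2f%% header overhead, ~%d packets per MiB"
            % (mtu, 100.0 * HEADERS / mtu, packets_per_mib))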

Daniel
-- 
|: http://berrange.com/     -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://freshmeat.net/~danielpb/    -o-   http://gtk-vnc.sourceforge.net :|