VirtIO Memory Ballooning
VirtIO provides memory ballooning: the host system can reclaim memory from virtual machines (VMs) by telling them to give back part of their memory. This is achieved by inflating a memory balloon inside the VM, which reduces the memory available to other tasks inside the VM. Which memory pages are given back is the decision of the guest operating system (OS): it just tells the host OS which pages it no longer needs and will no longer access. The host OS then un-maps those pages from the guest and marks them as unavailable for the guest VM. The host system can then use them for other tasks, like starting even more VMs or other processes.
If the VM later needs more free memory itself, the host can return pages to the guest and shrink the holes. This allows dynamically adjusting the memory available to each VM, even while the VMs keep running.
For this mechanism to work, the guest OS needs a driver for it. The Linux kernel contains support out-of-the-box; for Microsoft Windows, the VirtIO Drivers for Windows package contains the necessary driver.
When setting up a VM, two memory sizes can be specified (in the libvirt domain XML):

<memory unit='GiB'>2</memory>
<currentMemory unit='GiB'>1</currentMemory>
This configures the maximum memory size to 2 GiB and inflates the balloon to leave only 1 GiB available to the guest.
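Both virsh and the statistics shown later report sizes in KiB, while the XML above uses GiB, so a small conversion helper is handy. A minimal sketch (the function name is mine):

```python
def gib_to_kib(gib):
    """Convert GiB (as used in the domain XML) to KiB (as reported by virsh)."""
    return gib * 1024 * 1024

# The 2 GiB maximum configured above shows up as 2097152 KiB in virsh dominfo
print(gib_to_kib(2))  # 2097152
```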
There are two libvirt commands to change those settings:
virsh setmaxmem --domain $VM --size 3G --config
This updates the maximum memory. Please note that Qemu does not allow changing this size while the VM is running, so you need to shut down the VM first.
But you can modify the balloon by running the following command:
virsh setmem --domain $VM --size 1500M --current
To query the actual memory balloon size, you can use the command virsh dominfo $VM:

Max memory:     2097152 KiB
Used memory:    2097152 KiB
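If you want to process these values in a script, the dominfo output is easy to parse. A minimal sketch, assuming you capture the command's stdout yourself (the function name is mine):

```python
def parse_dominfo(text):
    """Parse the 'key: value' lines printed by virsh dominfo into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info

sample = """Max memory:     2097152 KiB
Used memory:    2097152 KiB"""
print(parse_dominfo(sample)["Used memory"])  # 2097152 KiB
```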
There is also the virsh dommemstat --domain $VM command.
Its output depends on whether the guest OS supports the ballooning driver or not.
For example during the boot phase, while GRUB is still running, you will see this:

actual 67108864
last_update 0
rss 67215704
Later on when Linux runs you get this:
actual 2097152
swap_in 10936
swap_out 46632
major_fault 8471
minor_fault 111268966
unused 126932
available 2052340
usable 916624
last_update 1558618059
rss 1922568
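This "name value" format is also trivial to turn into numbers for monitoring scripts. A minimal sketch, again assuming you capture the command output yourself:

```python
def parse_dommemstat(text):
    """Parse the 'name value' pairs printed by virsh dommemstat into ints."""
    stats = {}
    for line in text.splitlines():
        if line.strip():
            name, value = line.split()
            stats[name] = int(value)
    return stats

sample = "actual 2097152\nunused 126932\navailable 2052340"
stats = parse_dommemstat(sample)
print(stats["unused"])  # 126932
```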
For this to work the ballooning driver must be told to update these statistics on a regular basis. This can be enabled by running the following command:
virsh dommemstat --domain $VM --period 5
This can be either applied to the running VM using
--live or to the persistent configuration using
--config, which updates the libvirt XML to look like this:
<memballoon model='virtio'>
  <stats period='5'/>
</memballoon>
The following values are always reported:
actual: The actual memory size in KiB available to the guest with ballooning enabled.
last_update: The time in seconds since the UNIX epoch (1970-01-01) at which the statistics were last updated. 0 means that polling is not enabled.
rss: The resident set size in KiB, which is the number of pages currently "actively" used by the Qemu process on the host system. Qemu by default only allocates pages on demand when they are first accessed. A newly started VM therefore uses only very few pages, but the number of pages increases with each new memory allocation.
The following values are only reported if the guest OS supports them and polling is enabled:
swap_in / swap_out: The number of swapped-in and swapped-out pages as reported by the guest OS since the start of the VM.
major_fault / minor_fault: The number of page faults as reported by the guest OS since the start of the VM.
Minor page faults happen quite often, for example when first accessing newly allocated memory or on copy-on-write. They do not require disk IO per se, as only the internal page table data structures must be updated.
Major page faults on the other hand require disk IO, as data is accessed which must be paged in from disk first.
Inside the Linux kernel these counters are named pgfault and pgmajfault.
unused: That memory is available for immediate use, as it is currently used neither by processes nor by the kernel for caching.
So it is really unused (and is just eating energy while providing no benefit).
Inside the Linux kernel this is named MemFree.
usable: This consists of the free space plus the space which can easily be reclaimed.
This for example includes read caches, which contain data read from IO devices; that data can be read again if the need arises in the future.
Inside the Linux kernel this is named MemAvailable.
available: This is the maximum allowed memory, which is slightly less than the currently configured memory size, as the Linux kernel and the BIOS need some space for themselves.
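Taken together, these three values always satisfy unused ≤ usable ≤ available. A quick check using the numbers from the virsh dommemstat example above (all in KiB):

```python
# Values taken from the virsh dommemstat example output (all in KiB)
stats = {"unused": 126932, "usable": 916624, "available": 2052340}

# The ordering unused <= usable <= available should always hold
assert stats["unused"] <= stats["usable"] <= stats["available"]

# What the guest could additionally free on demand (mostly caches):
reclaimable = stats["usable"] - stats["unused"]
print(reclaimable)  # 789692
```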
Optimizing memory usage
The memory balloon can be used to shift memory between VMs and the host system. But finding the right size is tricky and needs some serious thinking.
For further discussion here is an example from my current VM:
actual     1_536_000   1.5 GB
available  1_491_188   1.4 GB
usable       592_880   0.6 GB
unused       118_808   0.1 GB
actual is the currently configured memory size of 1.5 GiB.
available is slightly less, as the Linux kernel and BIOS need some extra space (~44 MB).
unused is currently really unused and does not contain any data, not even cached data. You can steal this amount of memory by increasing the memory balloon without the guest OS having to do any extra work.
usable is somewhere between unused and available, as it includes data the guest OS can free on demand. This includes caches and other data which can be fetched again from block devices. That will have a higher cost when the data is actually needed again in the future. On the other hand you might be lucky and the data is not needed again at all. But you gain some more free memory to speed up other tasks now.
So this is a bet on the future, which either pays off or makes your VM crawl as slow as hell.
Taking memory away from a VM is equivalent to increasing the memory balloon by some amount.
That amount can be anything up to available, but taking away too much memory from the VM will decrease its performance:
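How much to take is a policy decision. A minimal sketch of the arithmetic, using the example numbers above (the function name and the conservative/aggressive split are mine):

```python
def balloon_target(actual, unused, usable, aggressive=False):
    """Suggest a new memory size in KiB after inflating the balloon.

    Conservative mode reclaims only truly unused pages; aggressive mode
    also reclaims what the guest could free on demand (caches etc.)."""
    reclaim = usable if aggressive else unused
    return actual - reclaim

# Example numbers from the stats above (KiB)
print(balloon_target(1_536_000, 118_808, 592_880))                   # 1417192
print(balloon_target(1_536_000, 118_808, 592_880, aggressive=True))  # 943120
```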
It will start swapping, which requires slow IO operations and takes a lot more time than accessing memory.
If you become too greedy, the Linux kernel might also start killing processes, as its out-of-memory handler will kick in more often.
So you should probably stay below usable or even unused.
It's probably also a good idea to do it iteratively: just take some small amount from many VMs and monitor whether they still run fine with their reduced memory footprint. If they do, you can take more memory from them in the next iteration.
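One iteration of that loop could look like this. A minimal sketch; the step size and the safety floor are assumptions you would tune per VM:

```python
def next_target(current_kib, unused_kib, step_kib=64 * 1024, floor_kib=256 * 1024):
    """One shrinking step: take at most step_kib, never more than what is
    actually unused, and never go below floor_kib."""
    take = min(step_kib, max(unused_kib, 0))
    return max(current_kib - take, floor_kib)

# With the example VM: shrink by one 64 MiB step per iteration
print(next_target(1_536_000, 118_808))  # 1470464
```

The resulting value would then be applied with virsh setmem and the VM monitored before the next iteration.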
Giving memory back to a VM is equivalent to decreasing the memory balloon.
You should do that if the VM starts swapping, which indicates that it is running out of memory.
So if you see a large increase in swap_out, you should shrink the memory balloon.
The same applies when major_fault increases, as major faults need to fetch data from the block device. In contrast, minor_fault does not involve block device access immediately, but still indicates the need for more memory.
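Since all these counters only ever grow, what matters is the delta between two polling samples. A minimal sketch of such a pressure check (the thresholds are arbitrary assumptions):

```python
def should_grow(prev, cur, swap_limit=1000, majflt_limit=100):
    """Compare two dommemstat samples; True means the guest shows memory
    pressure and the balloon should be deflated (memory given back)."""
    return (cur["swap_out"] - prev["swap_out"] > swap_limit
            or cur["major_fault"] - prev["major_fault"] > majflt_limit)

prev = {"swap_out": 46_632, "major_fault": 8_471}
cur = {"swap_out": 49_000, "major_fault": 8_480}
print(should_grow(prev, cur))  # True, swap_out jumped by 2368 pages
```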
Currently (2019-05) ballooning has to be done manually with Qemu. There was a project in 2013 to implement automatic ballooning, but it was never completed.
The naming of the values is somewhat confusing, as different components are involved, which each name the same thing differently:
- In the Linux kernel the balloon driver is implemented in linux:drivers/virtio/virtio_balloon.c.
- It exports some statistics over the VirtIO interface to Qemu, which is implemented in qemu:hw/virtio/virtio-balloon.c.
- libvirt queries it over the JSON protocol, implemented in libvirt:src/qemu/qemu_driver.c.
- The command virsh dommemstat is implemented in libvirt:tools/virsh-domain-monitor.c.
The following table shows the values and how they are named in the different components:
The Linux kernel provides the memory statistics itself in /proc/meminfo in a human readable format:
MemTotal:        1491188 kB
MemFree:           83324 kB
MemAvailable:     435308 kB
Cached:           575780 kB
SReclaimable:      53968 kB
VmallocTotal: 34359738367 kB
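This format can also be parsed with a few lines, e.g. to compare the guest's own view with what the host reports. A minimal sketch (the function name is mine; all listed fields use the kB suffix):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'MemFree: 83324 kB' into {name: KiB}."""
    result = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        result[name.strip()] = int(rest.split()[0])
    return result

sample = "MemTotal: 1491188 kB\nMemFree: 83324 kB\nMemAvailable: 435308 kB"
print(parse_meminfo(sample)["MemAvailable"])  # 435308
```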
More low-level information is available in /proc/vmstat:
nr_free_pages 28989
nr_zone_inactive_anon 97759
nr_zone_active_anon 91940
nr_zone_inactive_file 13150
nr_zone_active_file 95545
nr_zone_unevictable 0
nr_zone_write_pending 146
...
nr_slab_reclaimable 13522
nr_slab_unreclaimable 13643
...
pswpin 3169
pswpout 35433
...
pgfault 112897306
pgmajfault 11888
...
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
...