Implement a check that the (maximum) number of vCPUs doesn't exceed the
machine type's cpu-max setting.
On older versions of QEMU the check is disabled.
Signed-off-by: Michal Novotny <minovotn@redhat.com>
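A minimal sketch of such a limit check. The helper name and the use of 0 as
the "QEMU did not report a limit" sentinel are assumptions made for
illustration, not taken from the patch:

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the requested maximum vCPU count
 * is acceptable for the machine type. machine_cpu_max == 0 is an assumed
 * sentinel meaning "no limit reported" (older QEMU), in which case the
 * check is skipped. */
static bool
vcpus_within_machine_limit(unsigned int max_vcpus,
                           unsigned int machine_cpu_max)
{
    if (machine_cpu_max == 0)
        return true;            /* limit unknown: check disabled */
    return max_vcpus <= machine_cpu_max;
}
```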
On Thu, Jun 27, 2013 at 03:56:42PM +0100, Daniel P. Berrange wrote:
> Hi Security Team,
>
> I've discovered a way for an unprivileged user with a readonly connection
> to libvirtd, to crash the daemon.
Ok, the final patch for this issue will be the simpler variant that
Eric suggested.
The embargo can be considered to be lifted on Monday July 1st, at
0900 UTC
The following is the GIT change that DV or myself will apply to libvirt
GIT master immediately before the 1.1.0 release:
From 177b4165c531a4b3ba7f6ab6aa41dca9ceb0b8cf Mon Sep 17 00:00:00 2001
From: "Daniel P. Berrange" <berrange@redhat.com>
Date: Fri, 28 Jun 2013 10:48:37 +0100
Subject: [PATCH] CVE-2013-2218: Fix crash listing network interfaces with
filters
The virConnectListAllInterfaces method has a double-free of the
'struct netcf_if' object when any of the filtering flags cause
an interface to be skipped over. For example when running the
command 'virsh iface-list --inactive'
This is a regression introduced in release 1.0.6 by
commit 7ac2c4fe62
Author: Guannan Ren <gren@redhat.com>
Date: Tue May 21 21:29:38 2013 +0800
interface: list all interfaces with flags == 0
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=971325
The problem was that if virPCIGetVirtualFunctions was given the name
of a non-existent interface, it would return to its caller without
initializing the pointer to the array of virtual functions to NULL,
and the caller (virNetDevGetVirtualFunctions) would try to VIR_FREE()
the invalid pointer.
The final error message before the crash would be:
virPCIGetVirtualFunctions:2088 :
Failed to open dir '/sys/class/net/eth2/device':
No such file or directory
In this patch I move the initialization in virPCIGetVirtualFunctions()
to the beginning of the function, and also do an explicit
initialization in virNetDevGetVirtualFunctions, just in case someone
in the future adds code into that function prior to the call to
virPCIGetVirtualFunctions.
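The pattern can be sketched as follows. This is not the libvirt code; the
function name and its directory-listing body are hypothetical stand-ins that
only show the out-parameters being reset before the first possible failure:

```c
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical stand-in for a function that fills an out-array. The out
 * parameters are reset before anything can fail, so every early return
 * leaves the caller with values that are safe to free(). */
static int
list_dir_entries(const char *path, char ***entries, size_t *nentries)
{
    DIR *dir;

    *entries = NULL;
    *nentries = 0;

    if (!(dir = opendir(path)))
        return -1;              /* caller may still free(*entries) */

    /* ... a real implementation would populate *entries here ... */
    closedir(dir);
    return 0;
}
```

With the reset done first, a caller that unconditionally frees the array on
its cleanup path no longer touches an uninitialized pointer.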
This fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=979290
https://bugzilla.redhat.com/show_bug.cgi?id=979330
The node device driver was written with the assumption that udev would
use a "change" event to notify libvirt of any change to device status
(including the name of the driver it was bound to). It turns out this
is not the case (see Comment 4 of BZ 979290). That means that a
dumpxml for a device would always show whatever driver happened to be
bound at the time libvirt was started (when the node device cache was
built).
There was already code in the driver (for the benefit of the HAL
backend) that updated the driver name from sysfs each time a device's
info was retrieved from the cache. This patch just enables that manual
update for the udev backend as well.
There were two errors, one as a direct result of commit id '8807b285'
and the other from cut-n-paste
TEST: nodedevxml2xmltest
.............. 14 OK
==25735== 3 bytes in 1 blocks are definitely lost in loss record 1 of 24
==25735== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==25735== by 0x344D2AF275: xmlStrndup (in /usr/lib64/libxml2.so.2.9.1)
==25735== by 0x4D0C767: virNodeDeviceDefParseNode (node_device_conf.c:997)
==25735== by 0x4D0D3D2: virNodeDeviceDefParse (node_device_conf.c:1337)
==25735== by 0x401CA4: testCompareXMLToXMLHelper (nodedevxml2xmltest.c:28)
==25735== by 0x402B2F: virtTestRun (testutils.c:158)
==25735== by 0x401B27: mymain (nodedevxml2xmltest.c:81)
==25735== by 0x40316A: virtTestMain (testutils.c:722)
==25735== by 0x37C1021A04: (below main) (libc-start.c:225)
==25735==
==25735== 16 bytes in 1 blocks are definitely lost in loss record 10 of 24
==25735== at 0x4A08A6E: realloc (vg_replace_malloc.c:662)
==25735== by 0x4C7385E: virReallocN (viralloc.c:184)
==25735== by 0x4C73906: virExpandN (viralloc.c:214)
==25735== by 0x4C73B4A: virInsertElementsN (viralloc.c:324)
==25735== by 0x4D0C84C: virNodeDeviceDefParseNode (node_device_conf.c:1026)
==25735== by 0x4D0D3D2: virNodeDeviceDefParse (node_device_conf.c:1337)
==25735== by 0x401CA4: testCompareXMLToXMLHelper (nodedevxml2xmltest.c:28)
==25735== by 0x402B2F: virtTestRun (testutils.c:158)
==25735== by 0x401B27: mymain (nodedevxml2xmltest.c:81)
==25735== by 0x40316A: virtTestMain (testutils.c:722)
==25735== by 0x37C1021A04: (below main) (libc-start.c:225)
==25735==
PASS: nodedevxml2xmltest
The first error was resolved by adding a missing VIR_FREE(numberStr); in
the new function virNodeDevCapPciDevIommuGroupParseXML().
The second error was a bit more opaque as the error was a result of copying
the free methodology of the existing code in virNodeDevCapsDefFree(). The code
would free each of the entries in the array, but not the memory for the
array itself. Added the necessary VIR_FREE(data->pci_dev.iommuGroupDevices)
and while at it added the missing VIR_FREE(data->pci_dev.virtual_functions)
although there wasn't a test that tripped across it (thus it's been lurking
since commit id 'a010165d').
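A minimal sketch of the second fix, using a hypothetical container type
rather than the real node device structures: each entry is freed, and then
the array that held the entries is freed as well.

```c
#include <stdlib.h>

/* Hypothetical container mirroring the "array of pointers" pattern. */
struct addr_list {
    int **items;     /* heap-allocated array of heap-allocated entries */
    size_t nitems;
};

static void
addr_list_clear(struct addr_list *list)
{
    size_t i;

    for (i = 0; i < list->nitems; i++)
        free(list->items[i]);   /* free each entry ... */
    free(list->items);          /* ... and then the array itself */
    list->items = NULL;
    list->nitems = 0;
}
```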
Commit id '53d5967c' introduced the following:
TEST: storagevolxml2argvtest
.............. 14 OK
==25636== 358 (264 direct, 94 indirect) bytes in 1 blocks are definitely lost in loss record 67 of 75
==25636== at 0x4A06B6F: calloc (vg_replace_malloc.c:593)
==25636== by 0x4C95791: virAlloc (viralloc.c:124)
==25636== by 0x4CA0BB4: virCommandNewArgs (vircommand.c:805)
==25636== by 0x4CA0C88: virCommandNew (vircommand.c:789)
==25636== by 0x408602: virStorageBackendCreateQemuImgCmd (storage_backend.c:849)
==25636== by 0x405427: testCompareXMLToArgvHelper (storagevolxml2argvtest.c:61)
==25636== by 0x4064DF: virtTestRun (testutils.c:158)
==25636== by 0x40516F: mymain (storagevolxml2argvtest.c:195)
==25636== by 0x406B1A: virtTestMain (testutils.c:722)
==25636== by 0x37C1021A04: (below main) (libc-start.c:225)
==25636==
PASS: storagevolxml2argvtest
Commit '861d4056' introduced the following:
TEST: networkxml2xmltest
.................. 18 OK
==25504== 7 bytes in 1 blocks are definitely lost in loss record 5 of 23
==25504== at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==25504== by 0x37C1085D71: strdup (strdup.c:42)
==25504== by 0x4CB835F: virStrdup (virstring.c:546)
==25504== by 0x4CC5179: virXPathString (virxml.c:90)
==25504== by 0x4CC75C2: virNetDevVlanParse (netdev_vlan_conf.c:78)
==25504== by 0x4CF928A: virNetworkPortGroupParseXML (network_conf.c:1555)
==25504== by 0x4CFE385: virNetworkDefParseXML (network_conf.c:2049)
==25504== by 0x4D0113B: virNetworkDefParseNode (network_conf.c:2273)
==25504== by 0x4D01254: virNetworkDefParse (network_conf.c:2234)
==25504== by 0x401E80: testCompareXMLToXMLHelper (networkxml2xmltest.c:32)
==25504== by 0x402D4F: virtTestRun (testutils.c:158)
==25504== by 0x401CE9: mymain (networkxml2xmltest.c:110)
==25504==
PASS: networkxml2xmltest
Also changed the label from error to cleanup and adjusted code since it's
all one exit path
The IF_MAXUNIT macro is not present on all BSDs, so
make its use conditional, to avoid breaking OS-X.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
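One way to make the macro's use conditional is sketched below. The wrapper
macro, the helper function, and in particular the fallback value are
assumptions for illustration; the commit message does not say what the patch
uses when IF_MAXUNIT is absent.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <net/if.h>

/* IF_MAXUNIT is provided by <net/if.h> on some BSDs only. The fallback
 * value below is an assumption for illustration. */
#ifdef IF_MAXUNIT
# define MAX_IF_UNIT IF_MAXUNIT
#else
# define MAX_IF_UNIT 0x7fff
#endif

/* Hypothetical helper: is this a usable interface unit number? */
static int
if_unit_is_valid(int unit)
{
    return unit >= 0 && unit <= MAX_IF_UNIT;
}
```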
The 'in_addr_t' typedef is not present in Mingw64 headers.
Instead we can use the more portable 'struct in_addr' and
then access its 's_addr' field.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
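A small sketch of the portable form. The helper is hypothetical and is
written against POSIX headers; on Windows the socket headers differ, but the
'struct in_addr' / 's_addr' usage is the same idea:

```c
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Builds an IPv4 netmask for the given prefix length and returns it in a
 * struct in_addr (network byte order) instead of relying on in_addr_t. */
static struct in_addr
prefix_to_netmask(unsigned int prefix)
{
    struct in_addr mask;
    uint32_t host;

    if (prefix == 0)
        host = 0;
    else if (prefix >= 32)
        host = 0xffffffffu;
    else
        host = ~(0xffffffffu >> prefix);

    mask.s_addr = htonl(host);
    return mask;
}
```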
The udev based interface backend did not allow querying data over a
read-only connection which is different than how the netcf backend
operates. This brings the behavior in line with the default netcf
backend.
When creating a virtual FC HBA with the virsh/libvirt API, an error
message is returned: "error: Node device not found", and
'nodedev-dumpxml' also shows the wrong wwpn & wwnn for the newly
created device.
Signed-off-by: xschen@tnsoft.com.cn
This reverts f90af69 which switched wwpn & wwnn in the wrong place.
https://www.kernel.org/doc/Documentation/scsi/scsi_fc_transport.txt
Building on FreeBSD had this linker error:
/work/a/ports/devel/libvirt/work/libvirt-1.1.0/src/.libs/libvirt.so:
undefined reference to `virPCIDeviceAddressParse'
This was caused by the new use of virPCIDeviceAddressParse in a
portion of virpci.c that wasn't linux-only (in commit 72c029d8). The
problem was that virPCIDeviceAddressParse had originally been defined
inside #ifdef __linux__ (because it was only used by another function
that was inside the same ifdef).
The solution is to move it out to the part of virpci.c that is
compiled on all platforms.
(Because the portion that was "moved" was 40-50 lines, but only moved
up by 15 lines, the diff for the patch is less than non-informative -
rather than showing that part that I moved, it shows the bit that was
previously before the moved part, and now sits *after* it.)
Implicit controllers may be dependent on device definitions altered
in a post-parse callback. Specifically, if a console device is
defined without the target type, the type will be set in QEMU's
callback. In the case of s390, this is virtio, which requires
an implicit virtio-serial controller.
Signed-off-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com>
If networkUnplugBandwidth is called on a network which has
no bandwidth defined, print a warning instead of crashing.
This can happen when destroying a domain with bandwidth if
bandwidth was removed from the network after the domain was
started.
https://bugzilla.redhat.com/show_bug.cgi?id=975359
This includes adding it to the nodedev parser and formatter, docs, and
test.
An example of the new iommuGroup element that is a part of the output
from "virsh nodedev-dumpxml" (virNodeDeviceGetXMLDesc()):
  <device>
    <name>pci_0000_02_00_1</name>
    <capability type='pci'>
      ...
      <iommuGroup number='12'>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x1'/>
      </iommuGroup>
    </capability>
  </device>
Any device which belongs to an "IOMMU group" (used by vfio) will
have links to all devices of its group listed in
/sys/bus/pci/$device/iommu_group/devices;
/sys/bus/pci/$device/iommu_group is actually a link to
/sys/kernel/iommu_groups/$n, where $n is the group number (there
will be a corresponding device node at /dev/vfio/$n once the
devices are bound to the vfio-pci driver)
The following functions are added:
virPCIDeviceGetIOMMUGroupList
Gets a virPCIDeviceList with one virPCIDevice for each device
in the same IOMMU group as the provided virPCIDevice (a copy of the
original device object is included in the list).
virPCIDeviceAddressIOMMUGroupIterate
Calls the function @actor once for each device in the group that
contains the given virPCIDeviceAddress.
virPCIDeviceAddressGetIOMMUGroupAddresses
Fills in a virPCIDeviceAddressPtr * with an array of
virPCIDeviceAddress, one for each device in the iommu group of the
provided virPCIDeviceAddress (including a copy of the original).
virPCIDeviceAddressGetIOMMUGroupNum
Returns the group number as an int (a valid group number will always
be 0 or greater). If there is no iommu_group link in the device's
directory (usually indicating that vfio isn't loaded), -2 will be
returned. On any real error, -1 will be returned.
We only break out of the while loop if *content is an empty string.
However the buffer has been allocated to BUFSIZ + 1 (8193 in my case),
but it gets overwritten in the next for iteration.
Move VIR_FREE right before we overwrite it to avoid the leak.
==5777== 16,386 bytes in 2 blocks are definitely lost in loss record 1,022 of 1,027
==5777== by 0x5296E28: virReallocN (viralloc.c:184)
==5777== by 0x52B0C66: virFileReadLimFD (virfile.c:1137)
==5777== by 0x52B0E1A: virFileReadAll (virfile.c:1199)
==5777== by 0x529B092: virCgroupGetValueStr (vircgroup.c:534)
==5777== by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
Introduced by 83e4c77.
https://bugzilla.redhat.com/show_bug.cgi?id=978352
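The shape of the fix, as a self-contained sketch. The reader function is a
hypothetical stand-in for the cgroup value read; the point is only where the
free() sits relative to the assignment that overwrites the buffer:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader that returns a freshly allocated string per call:
 * three rounds of data, then an empty string. */
static char *
read_value(int round)
{
    const char *src = round < 3 ? "123\n" : "";
    char *copy = malloc(strlen(src) + 1);

    if (copy)
        strcpy(copy, src);
    return copy;
}

/* Frees the previous buffer right before it is overwritten, so no
 * iteration can leak; the final buffer is freed after the loop. */
static int
drain_values(void)
{
    char *content = NULL;
    int rounds = 0;

    for (;;) {
        free(content);          /* release the previous round's buffer */
        content = read_value(rounds);
        if (!content || !*content)
            break;              /* empty string: nothing left to process */
        rounds++;
    }
    free(content);
    return rounds;
}
```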
Don't check for '\n' at the end of file if zero bytes were read.
Found by valgrind:
==404== Invalid read of size 1
==404== at 0x529B09F: virCgroupGetValueStr (vircgroup.c:540)
==404== by 0x529AF64: virCgroupMoveTask (vircgroup.c:1079)
==404== by 0x1EB475: qemuSetupCgroupForEmulator (qemu_cgroup.c:1061)
==404== by 0x1D9489: qemuProcessStart (qemu_process.c:3801)
==404== by 0x18557E: qemuDomainObjStart (qemu_driver.c:5787)
==404== by 0x190FA4: qemuDomainCreateWithFlags (qemu_driver.c:5839)
Introduced by 0d0b409.
https://bugzilla.redhat.com/show_bug.cgi?id=978356
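A minimal sketch of the guarded check, using a hypothetical helper rather
than the code in virCgroupGetValueStr:

```c
#include <string.h>

/* Strips one trailing '\n' in place. The length check comes first, so an
 * empty buffer is never indexed one byte before its start. */
static void
strip_trailing_newline(char *str)
{
    size_t len = strlen(str);

    if (len > 0 && str[len - 1] == '\n')
        str[len - 1] = '\0';
}
```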
Although SRIOV network cards support setting a vlan tag on their
virtual functions, and although setting this vlan tag via a <vlan>
element in a domain's <interface> works, setting a vlan tag for these
devices in a <network> definition, or in a network <portgroup>
definition is also supposed to work (and the comment that validates
<vlan> usage even says that!). However, the check to allow it only
checked for an openvswitch network, so attempts to add <vlan> to a
network of type='hostdev' would fail.
A loop in qemuPrepareHostdevPCIDevices() intended to cycle through all
the objects on the list pcidevs was doing "while (listcount > 0)", but
nothing in the body of the loop was reducing the size of the list - it
was instead removing items from a *different* list. It has now been
safely changed to a for() loop.
(This isn't as bad as it sounds - it's only a problem in case of an
OOM error.)
qemuGetActivePciHostDeviceList() had been creating a list that
contained pointers to objects that were also on the activePciHostdevs
list. In case of an OOM error, this newly created list would be
virObjectUnref'ed, which would cause everything on the list to be
freed. But all of those objects would still be on the
activePciHostdevs list, which could have very bad consequences if that
list was ever again accessed.
The solution used here is to populate the new list with *copies* of
the objects from the original list. It turns out that on return from
qemuGetActivePciHostDeviceList(), the caller would almost immediately
go through all the device objects and "steal" them (i.e. remove the
pointer from the list but not delete it) all from either one list or
the other; we now instead just *delete* (remove from the list and
free) each device from one list or the other, so in the end we have
the same state.
The "fix" I pushed a few commits ago would still leak a virPCIDevice
in case of an OOM error. Although it's inconsequential in practice,
this patch satisfies my OCD.
The same strings were being re-created multiple times just to save
declaring a new variable. In the meantime, the use of the generic
variable names led to confusion when trying to follow the code. This
patch creates strings for:
stubDriverName (was called "driver" in original args)
stubDriverPath ("/sys/bus/pci/drivers/${stubDriverName}")
driverLink ("${device}/driver")
oldDriverName (the final component of path linked to by
"${device}/driver")
oldDriverPath ("/sys/bus/pci/drivers/${oldDriverName}")
then re-uses them as necessary.
I realized after the fact that it's probably better in the long run to
give this function a name that matches the name of the link used in
sysfs to hold the group (iommu_group).
I'm changing it now because I'm about to add several more functions
that deal with iommu groups.
The driver arg to virPCIDeviceDetach is no longer used (the name of the stub driver is now set in the virPCIDevice object, and virPCIDeviceDetach retrieves it from there). Remove it.
Commit 861d40565 added code (my personal change to "clean up" the
submitter's code, *not* the fault of the submitter) that dereferenced
virtVlan without first checking for NULL. This patch fixes that and,
as part of the fix, cleans up some unnecessary obtuseness.
virNetDevBridgeSetSTPDelay accepts the delay in milliseconds,
but the BSD implementation was expecting seconds. Therefore,
it was working correctly only with delay == 0.
This patch adds functionality to allow libvirt to configure the
'native-tagged' and 'native-untagged' modes on openvswitch networks.
Signed-off-by: Laine Stump <laine@redhat.com>
I just learned that VFIO resets PCI devices when they are assigned to
guests / returned to the host, so it is redundant for libvirt to reset
the devices. This patch inhibits calling virPCIDeviceReset on devices
that will be/were assigned using VFIO.
This patch introduces two new APIs virDomainMigrate3 and
virDomainMigrateToURI3 that may be used in place of their older
variants. These new APIs take optional migration parameters (such as
bandwidth, domain XML, ...) in an array of virTypedParameters, which
makes adding new parameters easier as there's no need to introduce new
APIs whenever a new migration parameter needs to be added. Both APIs are
backward compatible and will automatically use older migration calls in
case the new calls are not supported as long as the typed parameters
array does not contain any parameter which was not supported by the
older calls.
All APIs that take typed parameters are only using params address in
their entry point debug messages. With the new VIR_TYPED_PARAMS_DEBUG
macro, all functions can easily log all individual typed parameters
passed to them.
When an unsupported parameter is passed to virTypedParamsValidate,
VIR_ERR_ARGUMENT_UNSUPPORTED should be returned rather than
VIR_ERR_INVALID_ARG, which is more appropriate for supported parameters
used incorrectly.
virPCIDeviceDetach would previously sometimes consume the input device
object (to put it on the inactive list) and sometimes not. Avoiding
memory leaks required checking beforehand to see if the device was
already on the list, and freeing the device object in the caller only
if there wasn't already an identical object on the inactive list.
This patch makes it consistent - virPCIDeviceDetach will *never*
consume the input virPCIDevice object; if it needs to put one on the
inactive list, it will create a copy and put *that* on the list. This
way the caller knows that it is always their responsibility to free
the device object they created.
virPCIDeviceReattach was making the assumption that the dev object
given to it was one and the same with the dev object on the
inactiveDevs list. If that had been the case, it would not need to
free the dev object it removed from the inactive list, because the
caller of virPCIDeviceReattach always frees the dev object that it
passes in. Since the dev object passed in is *never* the same object
that's on the list (it is a different object with the same name and
attributes, created just for the purpose of searching for the actual
object), simply doing a "ListSteal" to remove the object from the list
results in one leaked object; we need to actually free the object
after removing it from the list.
* virPCIDeviceFindByIDs - find a device on a list w/o creating an object
This makes searching for an existing device on a list lighter weight.
* virPCIDeviceCopy - make a copy of an existing virPCIDevice object.
* virPCIDeviceGetDriverPathAndName - construct new strings containing
1) the name of the driver bound to this device.
2) the full path to the sysfs config for that driver.
(This code was lifted from virPCIDeviceUnbindFromStub, and replaced
there with a call to this new function).
Previously stubDriver was always set from a string literal, so it was
okay to use a const char * that wasn't freed when the virPCIDevice was
freed. This will not be the case in the near future, so it is now a
char* that is allocated in virPCIDeviceSetStubDriver() and freed
during virPCIDeviceFree().
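The ownership change can be sketched with a hypothetical device type (the
struct and function names below are illustrative, not the libvirt ones): the
setter stores a private copy of the name and the cleanup function frees it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical device object that owns its stub driver name. */
struct pci_dev {
    char *stub_driver;
};

/* Stores a private copy of the name; returns 0 on success, -1 on OOM. */
static int
pci_dev_set_stub_driver(struct pci_dev *dev, const char *name)
{
    char *copy = malloc(strlen(name) + 1);

    if (!copy)
        return -1;
    strcpy(copy, name);
    free(dev->stub_driver);     /* drop any previously set name */
    dev->stub_driver = copy;
    return 0;
}

/* Releases the name owned by the device. */
static void
pci_dev_clear(struct pci_dev *dev)
{
    free(dev->stub_driver);
    dev->stub_driver = NULL;
}
```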
libxl supports the LIBXL_DISK_BACKEND_QDISK disk backend, where qemu
is used to provide the disk backend. This patch simply maps the
existing <driver name='qemu'/> to LIBXL_DISK_BACKEND_QDISK.
Specifying an unsupported disk format with the tap driver resulted in
a less than helpful error message
error: Failed to start domain test-hvm
error: internal error libxenlight does not support disk driver qed
Change the message to state that the qed format is not supported by
the tap driver, e.g.
error: Failed to start domain test-hvm
error: internal error libxenlight does not support disk format qed
with disk driver tap
While at it, check for unsupported formats in the other driver
backends.
Add a script which parses the driver API code and validates
that every API registered in a virNNNDriverPtr table contains
an ACL check matching the API name.
NB this currently whitelists a few xen driver functions
which are temporarily lacking in access control checks.
The xen driver is considered insecure until these are
fixed.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Extend the 'gendispatch.pl' script to be able to generate
three new types of file.
- 'aclheader' - defines signatures of helper APIs for
doing authorization checks. There is one helper API
for each API requiring an auth check. Any @acl
annotations result in a method being generated with
a suffix of 'EnsureACL'. If the ACL check requires
examination of flags, an extra 'flags' param will be
present. Some examples
extern int virConnectBaselineCPUEnsureACL(void);
extern int virConnectDomainEventDeregisterEnsureACL(virDomainDefPtr domain);
extern int virDomainAttachDeviceFlagsEnsureACL(virDomainDefPtr domain, unsigned int flags);
Any @aclfilter annotations result in a method being
generated with a suffix of 'CheckACL'.
extern int virConnectListAllDomainsCheckACL(virDomainDefPtr domain);
These are used for filtering individual objects from APIs
which return a list of objects
- 'aclbody' - defines the actual implementation of the
methods described above. This calls into the access
manager APIs. A complex example:
/* Returns: -1 on error (denied==error), 0 on allowed */
int virDomainAttachDeviceFlagsEnsureACL(virConnectPtr conn,
                                        virDomainDefPtr domain,
                                        unsigned int flags)
{
    virAccessManagerPtr mgr;
    int rv;
    if (!(mgr = virAccessManagerGetDefault()))
        return -1;
    if ((rv = virAccessManagerCheckDomain(mgr,
                                          conn->driver->name,
                                          domain,
                                          VIR_ACCESS_PERM_DOMAIN_WRITE)) <= 0) {
        virObjectUnref(mgr);
        if (rv == 0)
            virReportError(VIR_ERR_ACCESS_DENIED, NULL);
        return -1;
    }
    if (((flags & (VIR_DOMAIN_AFFECT_CONFIG|VIR_DOMAIN_AFFECT_LIVE)) == 0) &&
        (rv = virAccessManagerCheckDomain(mgr,
                                          conn->driver->name,
                                          domain,
                                          VIR_ACCESS_PERM_DOMAIN_SAVE)) <= 0) {
        virObjectUnref(mgr);
        if (rv == 0)
            virReportError(VIR_ERR_ACCESS_DENIED, NULL);
        return -1;
    }
    if (((flags & (VIR_DOMAIN_AFFECT_CONFIG)) == (VIR_DOMAIN_AFFECT_CONFIG)) &&
        (rv = virAccessManagerCheckDomain(mgr,
                                          conn->driver->name,
                                          domain,
                                          VIR_ACCESS_PERM_DOMAIN_SAVE)) <= 0) {
        virObjectUnref(mgr);
        if (rv == 0)
            virReportError(VIR_ERR_ACCESS_DENIED, NULL);
        return -1;
    }
    virObjectUnref(mgr);
    return 0;
}
- 'aclsyms' - generates a linker script to export the
APIs to drivers. Some examples
virConnectBaselineCPUEnsureACL;
virConnectCompareCPUEnsureACL;
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Introduce annotations to all RPC messages to declare what
access control checks are required. There are two new
annotations defined:
@acl: <object>:<permission>
@acl: <object>:<permission>:<flagname>
Declare the access control requirements for the API. May be repeated
multiple times, if multiple rules are required.
<object> is one of 'connect', 'domain', 'network', 'storagepool',
'interface', 'nodedev', 'secret'.
<permission> is one of the permissions in access/viraccessperm.h
<flagname> indicates the rule only applies if the named flag
is set in the API call
@aclfilter: <object>:<permission>
Declare an access control filter that will be applied to a list
of objects being returned by an API. This allows the returned
list to be filtered to only show those the user has permissions
against
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add an access control driver that uses the pkcheck command
to check authorization requests. This is fairly inefficient,
particularly for cases where an API returns a list of objects
and needs to check permission for each object.
It would be desirable to use the polkit API but this links
to glib with abort-on-OOM behaviour, so can't be used. The
other alternative is to speak to dbus directly
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Add a new 'access_drivers' config parameter to the libvirtd.conf
configuration file. This allows admins to setup the default
access control drivers to use for API authorization. The same
driver is to be used by all internal drivers & APIs
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
The access control checks in the 'connectOpen' driver method
will require 'conn->driver' to be non-NULL. Set this before
running the 'connectOpen' method and NULL-ify it again on
failure.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
This patch introduces the virAccessManagerPtr class as the
interface between virtualization drivers and the access
control drivers. The viraccessperm.h file defines the
various permissions that will be used for each type of object
libvirt manages
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
It's not used anywhere except for the switch in
virStorageBackendCreateQemuImgOpts, where leaving it in causes
a dead code coverity warning and omitting it breaks compilation
because of unhandled enum value.
Introduced by 6298f74.
libxl contains logic to determine an appropriate devid for new devices
that do not specify one in their configuration. For all device types
except NICs, the libxl driver allows libxl to determine devid. Do the
same for NICs.
Add <features> and <compat> elements to volume target XML.
<compat> is a string which for qcow2 represents the QEMU version
it should be compatible with. Valid values are 0.10 and 1.1.
1.1 is implicit if the <features> element is present, otherwise
qemu-img default is used. 0.10 can be specified to explicitly
create older images after the qemu-img default changes.
<features> contains optional features, so far
<lazy_refcounts/> is available, which enables caching of reference
counters, improving performance for snapshots.
Detect qcow2 images with version 3 in the image header as
VIR_STORAGE_FILE_QCOW2.
These images have a feature bitfield, with just one feature supported
so far: lazy_refcounts.
The header length changed too, moving the location of the backing
format name.
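A rough sketch of the detection, not the libvirt probing code. The field
offsets follow the published qcow2 header layout as I understand it (version
at byte 4, compatible feature bits at byte 80 in a version 3 header, lazy
refcounts as bit 0); the macro and function names are made up for this
example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets per the qcow2 header layout; names are this sketch's own. */
#define QCOW2_VERSION_OFFSET         4    /* 32-bit big-endian */
#define QCOW2_COMPAT_FEATURES_OFFSET 80   /* 64-bit big-endian, v3 only */
#define QCOW2_V3_MIN_HEADER          104
#define QCOW2_COMPAT_LAZY_REFCOUNTS  (UINT64_C(1) << 0)

/* Reads an n-byte big-endian unsigned integer. */
static uint64_t
read_be(const unsigned char *p, size_t n)
{
    uint64_t val = 0;
    size_t i;

    for (i = 0; i < n; i++)
        val = (val << 8) | p[i];
    return val;
}

/* Returns the qcow2 version, or -1 if the buffer is not a qcow2 header. */
static int
qcow2_version(const unsigned char *buf, size_t len)
{
    if (len < 8 || memcmp(buf, "QFI\xfb", 4) != 0)
        return -1;
    return (int)read_be(buf + QCOW2_VERSION_OFFSET, 4);
}

/* Only version 3 headers carry feature bits. */
static bool
qcow2_has_lazy_refcounts(const unsigned char *buf, size_t len)
{
    if (qcow2_version(buf, len) != 3 || len < QCOW2_V3_MIN_HEADER)
        return false;
    return (read_be(buf + QCOW2_COMPAT_FEATURES_OFFSET, 8) &
            QCOW2_COMPAT_LAZY_REFCOUNTS) != 0;
}
```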
Add new CPU features for HyperV:
vapic for virtual APIC support
spinlocks for setting spinlock support
  <features>
    <hyperv>
      <vapic state='on'/>
      <spinlocks state='on' retries='4096'/>
    </hyperv>
  </features>
https://bugzilla.redhat.com/show_bug.cgi?id=784836
Commit 752596b5 broke the build with -Werror
qemu/qemu_hotplug.c: In function 'qemuDomainChangeGraphics':
qemu/qemu_hotplug.c:1980:39: error: declaration of 'listen' shadows a
global declaration [-Werror=shadow]
Fix with s/listen/newlisten/
This fixes the problem reported in:
https://bugzilla.redhat.com/show_bug.cgi?id=972690
When checking for a collision of a new libvirt network's subnet with
any existing routes, we read all of /proc/net/route into memory, then
parse all the entries. The function that we use to read this file
requires a "maximum length" parameter, which had previously been set
to 64*1024. As each line in /proc/net/route is 128 bytes, this would
allow for a maximum of 512 entries in the routing table.
This patch increases that number to 128 * 100000, which allows for
100,000 routing table entries. This means that it's possible that 12MB
would be allocated, but that would only happen if there really were
100,000 route table entries on the system, it's only held for a very
short time.
Since there is no method of specifying an unlimited max (and that
would create a potential denial of service anyway) hopefully this
limit is large enough to accommodate everyone.
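The arithmetic behind those numbers, as a tiny sketch. The 128-byte line
size is the figure given above; the helper name is illustrative:

```c
#include <stddef.h>

/* Each /proc/net/route line is 128 bytes, per the text above. */
enum { ROUTE_LINE_BYTES = 128 };

/* How many routing table entries fit within a given read limit. */
static size_t
max_route_entries(size_t max_len)
{
    return max_len / ROUTE_LINE_BYTES;
}
```

The old limit of 64*1024 bytes gives 512 entries; the new limit of
128 * 100000 bytes (12,800,000 bytes) gives 100,000 entries.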
Currently, we have a bug when updating a graphics device. A graphics device can
have a listen address set. This address is either defined by the user (in which
case its type is VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_ADDRESS) or it can be inherited
from a network (in which case its type is
VIR_DOMAIN_GRAPHICS_LISTEN_TYPE_NETWORK). However, in both cases we have a
listen address to process (e.g. during migration, as I've tried to fix in
7f15ebc7).
Later, when a user tries to update the graphics device (e.g. set a password),
we check if listen addresses match the original as qemu doesn't know how to
change listen address yet. Hence, users are required to not change the listen
address. The implementation then just dumps listen addresses and compares them.
Previously, while dumping the listen addresses, NULL was returned for NETWORK.
After my patch, this is no longer true, and we get a listen address for olddev
even if it is a type of NETWORK. So we have a real string on one side, the NULL
from user's XML on the other side and hence we think user wants to change the
listen address and we refuse it.
Therefore, we must take the type of listen address into account as well.
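A sketch of a type-aware comparison. The enum, struct, and the choice to
compare network names for the NETWORK case are assumptions made for
illustration, not a copy of the qemu driver code:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of a graphics listen configuration. */
enum listen_type { LISTEN_TYPE_ADDRESS, LISTEN_TYPE_NETWORK };

struct listen {
    enum listen_type type;
    const char *address;    /* may be NULL */
    const char *network;    /* may be NULL */
};

/* NULL-tolerant string equality. */
static bool
str_equal_nullable(const char *a, const char *b)
{
    if (!a || !b)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Returns true if the update would change the listen configuration. */
static bool
listen_changed(const struct listen *olddev, const struct listen *newdev)
{
    if (olddev->type != newdev->type)
        return true;

    switch (olddev->type) {
    case LISTEN_TYPE_ADDRESS:
        return !str_equal_nullable(olddev->address, newdev->address);
    case LISTEN_TYPE_NETWORK:
        /* The resolved address is deliberately not compared here. */
        return !str_equal_nullable(olddev->network, newdev->network);
    }
    return true;
}
```

With this shape, an old device of type NETWORK that carries a resolved
address no longer looks different from user XML that names only the network.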
Do not leave uninitialized variables, not all parameters are set in
libxlMake*.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
libxl uses some xenstore entries for hints in memory management
(especially when starting new domain). This includes dom0 memory limit
and Xen free memory margin, based on current system state. Entries are
created at first function usage, so force such call at daemon startup,
which most likely will be before any domain startup.
Also prevent automatic memory management if dom0_mem= option passed to
xen hypervisor - it is known to be incompatible with autoballoon.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
As a consequence of the cgroup layout changes from commit 'cfed9ad4', the
'lxcDomainGetSchedulerParameters[Flags]()' and 'lxcGetSchedulerType()' APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler : posix
cpu_shares : 0
vcpu_period : 0
vcpu_quota : 0
emulator_period: 0
emulator_quota : 0
This patch will restore the capability to return configuration only data
for a non running domain regardless of whether cgroups are available.
As a consequence of the cgroup layout changes from commit '632f78ca', the
'qemuDomainGetSchedulerParameters[Flags]()' and 'qemuGetSchedulerType()' APIs
failed to return data for a non running domain. This can be seen through
a 'virsh schedinfo <domain>' command which returns:
Scheduler : Unknown
error: Requested operation is not valid: cgroup CPU controller is not mounted
Prior to that change a non running domain would return:
Scheduler : posix
cpu_shares : 0
vcpu_period : 0
vcpu_quota : 0
emulator_period: 0
emulator_quota : 0
This patch will restore the capability to return configuration only data
for a non running domain regardless of whether cgroups are available.
Just to reduce the indentation levels. Remove the unneeded
NULL check for disk->file, as virBufferEscapeString doesn't
print anything with NULL arguments.
This flag is meant for errors happening on the source of the migration
and isn't used on the destination. To allow better migration
compatibility, don't propagate it to the destination.
Paolo Bonzini pointed out that it's actually possible to migrate a qemu
instance that was paused due to I/O error and it will be able to work on
the destination if the storage is accessible.
This patch introduces flag VIR_MIGRATE_ABORT_ON_ERROR that cancels the
migration in case an I/O error happens while it's being performed and
allows migration without this flag. This flag can be possibly used for
other error reasons that may be introduced in the future.
Currently, we wait for SPICE to migrate in the very same loop where we
wait for qemu to migrate. This has a disadvantage of slowing seamless
migration down. On one hand, we should not kill the domain until all
SPICE data has been migrated. On the other hand, there is no need to
wait in the very same loop and hence slowing down 'cont' on the
destination. For instance, if users are watching a movie, they can
experience the movie to be stopped for a couple of seconds, as
processors are not running nor on src nor on dst as libvirt waits for
SPICE to migrate. We should move the waiting phase to migration CONFIRM
phase.
The XML output by the HAL backend for a scsi generic device:
<device>
<name>pci_8086_2922_scsi_host_scsi_device_lun0_scsi_generic</name>
<path>/sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/scsi_generic/sg0</path>
<parent>pci_8086_2922_scsi_host_scsi_device_lun0</parent>
<capability type='scsi_generic'>
<char>/dev/sg0</char>
</capability>
</device>
Since a scsi generic device doesn't have the DEVTYPE property set, the
only way to tell whether a device is a scsi generic device is to read
the "SUBSYSTEM" property.
The XML of the scsi generic device will look like:
<device>
<name>scsi_generic_sg0</name>
<path>/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/scsi_generic/sg0</path>
<parent>scsi_0_0_0_0</parent>
<capability type='scsi_generic'>
<char>/dev/sg0</char>
</capability>
</device>
When qemu >= 1.2.0, it is safe to use -device for the primary video
device, as described in 4c993d8ab.
However, the cap flag is missing from QMP capabilities detection. It
can be set safely in virQEMUCapsInitQMPBasic.
Checking whether "devtype" is NULL in every "if" statement is bad: it
wastes cycles and hurts readability. The logic for the case where
"devtype" is NULL is also unclear. This reorganizes the logic into
"if...else" with a chain of "else if".
Other changes:
* Change the function style.
* Remove the useless debug statement.
* Get rid of the goto
* New helper udevDeviceHasProperty to simplify checking whether a
property exists for the device.
* Add comment to clarify "PCI devices don't set the DEVTYPE property"
* s/sysfs path/sysfs name/, as udev_device_get_sysname returns the
name instead of the full path. E.g. "sg0"
* Refactor the comment for setting VIR_NODE_DEV_CAP_NET cap type
a bit.
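The reorganized control flow can be sketched roughly as follows. This
is a self-contained illustration, not the libvirt code: the Property
and Device types, the CapType enum, device_get_property(),
device_has_property() and classify() are all hypothetical, and the
property names and values checked are assumptions chosen for the
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device as a flat list of properties. */
typedef struct { const char *key; const char *value; } Property;
typedef struct { const Property *props; size_t nprops; } Device;

typedef enum {
    CAP_UNKNOWN,
    CAP_USB_DEVICE,
    CAP_SCSI_HOST,
    CAP_PCI,
    CAP_SCSI_GENERIC
} CapType;

static const char *device_get_property(const Device *dev, const char *key)
{
    size_t i;

    for (i = 0; i < dev->nprops; i++) {
        if (strcmp(dev->props[i].key, key) == 0)
            return dev->props[i].value;
    }
    return NULL;
}

/* Helper in the spirit of udevDeviceHasProperty: presence check only. */
static int device_has_property(const Device *dev, const char *key)
{
    return device_get_property(dev, key) != NULL;
}

static CapType classify(const Device *dev)
{
    const char *devtype = device_get_property(dev, "DEVTYPE");
    const char *subsystem;

    if (devtype) {
        /* DEVTYPE is set: test it for NULL once, then chain else-ifs. */
        if (strcmp(devtype, "usb_device") == 0)
            return CAP_USB_DEVICE;
        else if (strcmp(devtype, "scsi_host") == 0)
            return CAP_SCSI_HOST;
        return CAP_UNKNOWN;
    }

    /* DEVTYPE is unset (e.g. PCI devices): look at other properties. */
    if (device_has_property(dev, "PCI_CLASS"))
        return CAP_PCI;

    subsystem = device_get_property(dev, "SUBSYSTEM");
    if (subsystem && strcmp(subsystem, "scsi_generic") == 0)
        return CAP_SCSI_GENERIC;

    return CAP_UNKNOWN;
}
```

The NULL case gets its own branch, so none of the string comparisons
has to repeat the NULL check.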
The device name is constructed by libvirt, so it is not obvious what
the device's sysfs path should be. This exposes the device's sysfs path
in a new tag <path>.
Since the sysfs path is filled in while enumerating the devices by
either udev or HAL, it's an output-only tag.
Call virLogVMessage instead of virLogMessage, since libudev
called us with a va_list object, not a list of arguments.
Honor message priority and strip the trailing newline.
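The distinction is the usual C one between a variadic function and its
va_list counterpart. A minimal sketch of the pattern follows. All names
in it (log_vmessage, library_log_hook, library_emit, last_line,
last_priority) are hypothetical stand-ins, not libvirt or libudev
functions, and the newline is stripped from the formatted result here
purely for brevity.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical va_list-taking logger; it records the last message. */
static char last_line[256];
static int last_priority;

static void log_vmessage(int priority, const char *fmt, va_list ap)
{
    size_t len;

    last_priority = priority;
    vsnprintf(last_line, sizeof(last_line), fmt, ap);

    /* Strip a single trailing newline from the formatted message. */
    len = strlen(last_line);
    if (len > 0 && last_line[len - 1] == '\n')
        last_line[len - 1] = '\0';
}

/* Hypothetical library-style hook. The library hands us a va_list, so
 * it must be forwarded to the v-variant; passing it to a "..." function
 * would treat the va_list itself as the first format argument. */
static void library_log_hook(int priority, const char *fmt, va_list ap)
{
    log_vmessage(priority, fmt, ap);
}

/* Stand-in for the library side, which owns the variadic call. */
static void library_emit(int priority, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    library_log_hook(priority, fmt, ap);
    va_end(ap);
}
```

Calling library_emit(3, "device %s: error %d\n", "sg0", 5) leaves
"device sg0: error 5" in last_line and 3 in last_priority.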
https://bugzilla.redhat.com/show_bug.cgi?id=969152
Without the socket path explicitly specified, the remote driver tried to
connect to the "/system" instance socket even if "/session" was
specified in the uri. With this patch this configuration now produces an
error.
It is still possible to initiate a session connection by specifying
the path to the socket manually and also manually starting the session
daemon. This was also possible prior to this patch.
This is a minimal fix. We may decide to support remote session
connections using ssh, but this will require changes to the remote
driver code, so this fix shouldn't cause regressions if we decide to do
that.
When creating a timer/event handler, reference counting is used. So it
is possible (in theory) that libxlDomainObjPrivateFree is called with a
reference count > 1. The problem is that libxlDomainObjPrivateFree leaves
the object in an invalid state, with the ctx freed but a dangling pointer
still in place. This can cause a timer/event handler to crash.
This patch implements a dispose method for libxlDomainObjPrivate, and moves
freeing the libxl ctx to the dispose method, ensuring the ctx is valid while
the object's reference count is > 0.
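The idea of tying the ctx lifetime to a dispose callback can be
sketched as below. This is a generic, single-threaded illustration and
not the libvirt object system: Context, Object, object_new(),
object_ref(), object_unref() and object_dispose() are hypothetical
names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an expensive context such as a libxl ctx. */
typedef struct {
    int valid;
} Context;

typedef struct Object Object;
struct Object {
    int refs;                      /* not thread-safe in this sketch */
    Context *ctx;
    void (*dispose)(Object *obj);  /* runs only when refs drops to 0 */
};

static int contexts_freed;

/* Release resources owned by the object; called by the last unref. */
static void object_dispose(Object *obj)
{
    free(obj->ctx);
    obj->ctx = NULL;
    contexts_freed++;
}

static Object *object_new(void)
{
    Object *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    obj->ctx = calloc(1, sizeof(*obj->ctx));
    if (!obj->ctx) {
        free(obj);
        return NULL;
    }
    obj->ctx->valid = 1;
    obj->refs = 1;
    obj->dispose = object_dispose;
    return obj;
}

static void object_ref(Object *obj)
{
    obj->refs++;
}

/* Returns 1 if the object was destroyed, 0 if other holders remain. */
static int object_unref(Object *obj)
{
    if (--obj->refs > 0)
        return 0;
    obj->dispose(obj);
    free(obj);
    return 1;
}
```

A timer or event handler that took its own reference with object_ref()
can keep using obj->ctx after the original owner calls object_unref(),
because the ctx is only freed when the last reference goes away.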
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Actually only those interface types are handled correctly so reject
others instead of ignoring settings (i.e. treating as bridge/ethernet
anyway).
Also allow <script/> in 'ethernet' (which should be the only
script-allowing type). Keep <script/> allowed in bridge to be compatible
with legacy 'xen' driver.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Convert input XML to migratable before using it in
qemuDomainSaveImageOpen.
XML in the save image is migratable, i.e. doesn't contain implicit
controllers. If these controllers were in a non-default order in the
input XML, the ABI check would fail. Removing and re-adding these
controllers fixes it.
https://bugzilla.redhat.com/show_bug.cgi?id=834196
The legacy xen toolstack will set pygrub as the bootloader if not
specified. For compatibility, do the same in the libxl driver
iff not using direct kernel boot.
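As a rough sketch of that rule (effective_bootloader() and
default_bootloader are hypothetical names, and the real driver works on
the domain definition rather than on two strings):

```c
#include <stddef.h>

static const char default_bootloader[] = "pygrub";

/* Pick the bootloader to use. An explicit bootloader always wins; the
 * pygrub default applies only when no kernel was given, i.e. when the
 * guest is not using direct kernel boot. */
static const char *effective_bootloader(const char *bootloader,
                                        const char *kernel)
{
    if (bootloader)
        return bootloader;
    if (kernel)
        return NULL;    /* direct kernel boot: no bootloader needed */
    return default_bootloader;
}
```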
Currently, the libxl driver reports a connection type of "xenlight".
To be compatible with the legacy Xen driver, it should return "Xen".
Note: I noticed this while testing the libxl driver on OpenStack.
After switching my Xen compute nodes to use the libxl stack, I
could no longer launch instances on those nodes since
hypervisor_type was reported as "xenlight" instead of "xen".
During a live migration the guest may receive a disk access I/O error.
In this state the guest is unable to continue running on a remote host
after migration as some state may be present in the kernel and not
migrated.
With this patch, the migration is canceled in such a case, so the guest
can either continue on the source if the I/O issues are resolved or has
to be destroyed anyway.
https://bugzilla.redhat.com/show_bug.cgi?id=971485
As of d7f9d82753 we copy the listen
address from the qemu.conf config file in case none has been provided
via XML. But later, when migrating, we should not include such listen
address in the migratable XML as it is something autogenerated, not
requested by user. Moreover, the binding to the listen address will
likely fail, unless the address is '0.0.0.0' or its IPv6 equivalent.
This patch introduces a new boolean attribute to virDomainGraphicsListenDef
to distinguish autofilled listen addresses. However, we must keep the
attribute over libvirtd restarts, so it must be kept within status XML.
This patch fixes changes done in commit 29c1e913e4
that was pushed without implementing review feedback.
The flag introduced by the patch is changed to VIR_DOMAIN_VCPU_GUEST and
documentation makes the difference between regular hotplug and this new
functionality more explicit.
The virsh options that enable the use of the new flag are changed to
"--guest" and the documentation is fixed too.
virProcessGetNamespaces() opens files in /proc/XXX/ns/ which will
later be passed to setns(). We have to make sure that the file
descriptors in the array are in the correct order. In particular
the 'user' namespace must be first otherwise setns() may fail
for other namespaces.
The order has been taken from util-linux's sys-utils/nsenter.c
Also we must ignore EINVAL from setns(), which occurs if the
namespace associated with the fd matches the calling process'
current namespace.
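A sketch of the resulting loop is shown below. To keep it runnable
without privileges, the setns() call is passed in as a function pointer
and exercised with a stub; enter_namespaces(), stub_setns(), ns_order
and stub_calls are hypothetical names. Only the position of 'user' at
the front comes from the text above; the rest of the order listed in
ns_order is an assumption for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* 'user' must come first; the remaining order is assumed here. */
static const char *const ns_order[] = {
    "user", "ipc", "uts", "net", "pid", "mnt"
};

/* Same shape as setns(2), so the real function can be passed in. */
typedef int (*setns_fn)(int fd, int nstype);

/* Enter each namespace in array order. A negative fd means "not
 * available, skip". EINVAL is treated as "already in that namespace"
 * and ignored. Returns 0 on success, -1 on the first other failure. */
static int enter_namespaces(const int *fds, size_t nfds, setns_fn do_setns)
{
    size_t i;

    for (i = 0; i < nfds; i++) {
        if (fds[i] < 0)
            continue;
        if (do_setns(fds[i], 0) < 0 && errno != EINVAL)
            return -1;
    }
    return 0;
}

/* Stub for trying the helper out: fd 42 pretends to refer to the
 * namespace we are already in, fd 99 pretends to be a real failure. */
static int stub_calls;

static int stub_setns(int fd, int nstype)
{
    (void)nstype;
    stub_calls++;
    if (fd == 42) {
        errno = EINVAL;
        return -1;
    }
    if (fd == 99) {
        errno = EPERM;
        return -1;
    }
    return 0;
}
```

The fds array is expected to be filled in the same order as ns_order,
so that the fd for /proc/XXX/ns/user is always tried first.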
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Currently, there's a path to use the ncpuinfo variable uninitialized,
which leads to a compiler warning:
qemu/qemu_driver.c: In function 'qemuDomainGetVcpusFlags':
qemu/qemu_driver.c:4573:9: error: 'ncpuinfo' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
for (i = 0; i < ncpuinfo; i++) {
^
This patch implements support for the "cpu-add" QMP command that plugs
CPUs into a live guest. The "cpu-add" command was introduced in QEMU
1.5. For the hotplug to work, machine type "pc-i440fx-1.5" is required.
This flag allows using qemu guest agent commands to disable
(offline) and enable (online) processors in a live guest that has the
guest agent running.
The qemu guest agent allows onlining and offlining CPUs from the
perspective of the guest. This patch adds helpers that call the
'guest-get-vcpus' and 'guest-set-vcpus' guest agent functions and
convert the data for internal libvirt usage.