slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message	Author
2019-02-27	i2c: Split smbus into parts	Corey Minyard
	smbus.c and smbus.h had device side code, master side code, and smbus.h has some smbus_eeprom.c definitions. Split them into separate files. Signed-off-by: Corey Minyard <cminyard@mvista.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-02-26	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging	Peter Maydell
	Block layer patches: - Block graph change fixes (avoid loops, cope with non-tree graphs) - bdrv_set_aio_context() related fixes - HMP snapshot commands: Use only tag, not the ID to identify snapshots - qemu-img, commit: Error path fixes - block/nvme: Build fix for gcc 9 - MAINTAINERS updates - Fix various issues with bdrv_refresh_filename() - Fix various iotests - Include LUKS overhead in qemu-img measure for qcow2 - A fix for vmdk's image creation interface # gpg: Signature made Mon 25 Feb 2019 14:18:15 GMT # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (71 commits) iotests: Skip 211 on insufficient memory vmdk: false positive of compat6 with hwversion not set iotests: add LUKS payload overhead to 178 qemu-img measure test qcow2: include LUKS payload overhead in qemu-img measure iotests.py: s/_/-/g on keys in qmp_log() iotests: Let 045 be run concurrently iotests: Filter SSH paths iotests.py: Filter filename in any string value iotests.py: Add is_str() iotests: Fix 207 to use QMP filters for qmp_log iotests: Fix 232 for LUKS iotests: Remove superfluous rm from 232 iotests: Fix 237 for Python 2.x iotests: Re-add filename filters iotests: Test json:{} filenames of internal BDSs block: BDS options may lack the "driver" option block/null: Generate filename even with latency-ns block/curl: Implement bdrv_refresh_filename() block/curl: Harmonize option defaults block/nvme: Fix bdrv_refresh_filename() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-26	authz: delete existing ACL implementation	Daniel P. Berrange
	The 'qemu_acl' type was a previous non-QOM based attempt to provide an authorization facility in QEMU. Because it is non-QOM based it cannot be created via the command line and requires special monitor commands to manipulate it. The new QAuthZ subclasses provide a superset of the functionality in qemu_acl, so the latter can now be deleted. The HMP 'acl_*' monitor commands are converted to use the new QAuthZSimple data type instead in order to provide temporary backwards compatibility. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26	authz: add QAuthZPAM object type for authorizing using PAM	Daniel P. Berrange
	Add an authorization backend that talks to PAM to check whether the user identity is allowed. This only uses the PAM account validation facility, which is essentially just a check to see if the provided username is permitted access. It doesn't use the authentication or session parts of PAM, since that's dealt with by the relevant part of QEMU (eg VNC server). Consider starting QEMU with a VNC server and telling it to use TLS with x509 client certificates and configuring it to use PAM to validate the x509 distinguished name. In this example we're telling it to use PAM for the QAuthZ impl with a service name of "qemu-vnc" $ qemu-system-x86_64 \ -object tls-creds-x509,id=tls0,dir=/home/berrange/security/qemutls,\ endpoint=server,verify-peer=yes \ -object authz-pam,id=authz0,service=qemu-vnc \ -vnc :1,tls-creds=tls0,tls-authz=authz0 This requires an /etc/pam.d/qemu-vnc file to be created with the auth rules. A very simple file based whitelist can be set up using $ cat > /etc/pam.d/qemu-vnc <<EOF account requisite pam_listfile.so item=user sense=allow file=/etc/qemu/vnc.allow EOF The /etc/qemu/vnc.allow file simply contains one username per line. Any username not in the file is denied. The usernames in this example are the x509 distinguished name from the client's x509 cert. $ cat > /etc/qemu/vnc.allow <<EOF CN=laptop.berrange.com,O=Berrange Home,L=London,ST=London,C=GB EOF More interesting would be to configure PAM to use an LDAP backend, so that the QEMU authorization check data can be centralized instead of requiring each compute host to have a file maintained. The main limitation with this PAM module is that the rules apply to all QEMU instances on the host. Setting up different rules per VM would require creating a separate PAM service name & config file for every guest. An alternative approach for the future might be to not pass in the plain username to PAM, but instead combine the VM name or UUID with the username. This requires further consideration though. 
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26	authz: add QAuthZListFile object type for a file access control list	Daniel P. Berrangé
	Add a QAuthZListFile object type that implements the QAuthZ interface. This built-in implementation is a proxy around the QAuthZList object type, initializing it from an external file, and optionally, automatically reloading it whenever it changes. To create an instance of this object via the QMP monitor, the syntax used would be: { "execute": "object-add", "arguments": { "qom-type": "authz-list-file", "id": "authz0", "props": { "filename": "/etc/qemu/vnc.acl", "refresh": true } } } If "refresh" is enabled, inotify is used to monitor the file, automatically reloading changes. If an error occurs during reloading, all authorizations will fail until the file is next successfully loaded. The /etc/qemu/vnc.acl file would contain a JSON representation of a QAuthZList object { "rules": [ { "match": "fred", "policy": "allow", "format": "exact" }, { "match": "bob", "policy": "allow", "format": "exact" }, { "match": "danb", "policy": "deny", "format": "exact" }, { "match": "dan*", "policy": "allow", "format": "glob" } ], "policy": "deny" } This sets up an authorization rule that allows 'fred', 'bob' and anyone whose name starts with 'dan', except for 'danb'. Everyone unmatched is denied. The object can be loaded on the command line using -object authz-list-file,id=authz0,filename=/etc/qemu/vnc.acl,refresh=yes Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
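The fail-closed reload behaviour described above can be sketched in Python. This is an illustration only, not QEMU code: the class name `ListFileAuthz` and its methods are hypothetical, `reload()` is called by hand instead of being driven by inotify, Python's `fnmatchcase` stands in for whatever glob matching QEMU uses, and the defaults (format "exact", policy "deny") are assumptions.

```python
import json
from fnmatch import fnmatchcase


class ListFileAuthz:
    """Hypothetical stand-in for an authz-list-file object (not QEMU code)."""

    def __init__(self, filename):
        self.filename = filename
        self.acl = None  # None means "no valid ACL loaded": deny everyone
        self.reload()

    def reload(self):
        """Re-read the ACL file; on any error drop the old ACL (fail closed)."""
        try:
            with open(self.filename, encoding="utf-8") as handle:
                acl = json.load(handle)
            if not isinstance(acl, dict) or not isinstance(acl.get("rules"), list):
                raise ValueError("ACL must be an object with a 'rules' list")
            self.acl = acl
        except (OSError, ValueError):
            # json.JSONDecodeError is a subclass of ValueError
            self.acl = None

    def is_allowed(self, identity):
        if self.acl is None:
            return False  # last load failed: every check is denied
        for rule in self.acl["rules"]:
            if rule.get("format", "exact") == "glob":
                matched = fnmatchcase(identity, rule["match"])
            else:
                matched = identity == rule["match"]
            if matched:
                return rule["policy"] == "allow"  # first matching rule wins
        return self.acl.get("policy", "deny") == "allow"
```

In this sketch a caller would invoke `reload()` whenever the file changes; a broken file then denies every identity until a later reload succeeds, which mirrors the behaviour the commit message describes.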
2019-02-26	authz: add QAuthZList object type for an access control list	Daniel P. Berrange
	Add a QAuthZList object type that implements the QAuthZ interface. This built-in implementation maintains a trivial access control list with a sequence of match rules and a final default policy. This replicates the functionality currently provided by the qemu_acl module. To create an instance of this object via the QMP monitor, the syntax used would be: { "execute": "object-add", "arguments": { "qom-type": "authz-list", "id": "authz0", "props": { "rules": [ { "match": "fred", "policy": "allow", "format": "exact" }, { "match": "bob", "policy": "allow", "format": "exact" }, { "match": "danb", "policy": "deny", "format": "exact" }, { "match": "dan*", "policy": "allow", "format": "glob" } ], "policy": "deny" } } } This sets up an authorization rule that allows 'fred', 'bob' and anyone whose name starts with 'dan', except for 'danb'. Everyone unmatched is denied. It is not currently possible to create this via -object, since there is no syntax supported to specify non-scalar properties for objects. This is likely to be addressed by later support for using JSON with -object, or an equivalent approach. In any case the future "authz-list-file" object can be used from the CLI and is likely a better choice, as it allows the ACL to be refreshed automatically on change. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
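A minimal Python sketch of how such an ordered rule list with a default policy can be evaluated. It is an illustration under assumptions, not the QEMU implementation: the function name `is_allowed` is hypothetical, Python's `fnmatchcase` stands in for QEMU's glob matching, and the rule formats are chosen so that the list behaves as the message describes ('danb' matched exactly, 'dan*' matched as a glob).

```python
from fnmatch import fnmatchcase

# Rules are checked in order; the first match decides the outcome.
RULES = [
    {"match": "fred", "policy": "allow", "format": "exact"},
    {"match": "bob", "policy": "allow", "format": "exact"},
    {"match": "danb", "policy": "deny", "format": "exact"},
    {"match": "dan*", "policy": "allow", "format": "glob"},
]


def is_allowed(identity, rules, default_policy="deny"):
    """Return True if the first matching rule (or the default) says 'allow'."""
    for rule in rules:
        if rule.get("format", "exact") == "glob":
            matched = fnmatchcase(identity, rule["match"])
        else:
            matched = identity == rule["match"]
        if matched:
            return rule["policy"] == "allow"
    return default_policy == "allow"
```

Ordering matters in this sketch: the 'danb' deny rule has to come before the 'dan*' glob, otherwise the glob would match 'danb' first and allow it.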
2019-02-26	authz: add QAuthZSimple object type for easy whitelist auth checks	Daniel P. Berrangé
	In many cases a single VM will just need to whitelist a single identity as the allowed user of network services. This is especially the case for TLS live migration (optionally with NBD storage) where we just need to whitelist the x509 certificate distinguished name of the source QEMU host. Via QMP this can be configured with: { "execute": "object-add", "arguments": { "qom-type": "authz-simple", "id": "authz0", "props": { "identity": "fred" } } } Or via the command line -object authz-simple,id=authz0,identity=fred Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26	authz: add QAuthZ object as an authorization base class	Daniel P. Berrange
	The current qemu_acl module provides a simple access control list facility inside QEMU, which is used via a set of monitor commands acl_show, acl_policy, acl_add, acl_remove & acl_reset. Note there is no ability to create ACLs - the network services (eg VNC server) were expected to create ACLs that they want to check. There is also no way to define ACLs on the command line, nor potentially integrate with external authorization systems like polkit, pam, ldap lookup, etc. The QAuthZ object defines a minimal abstract QOM class that can be subclassed for creating different authorization providers. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2019-02-26	util: add helper APIs for dealing with inotify in portable manner	Daniel P. Berrangé
	The inotify userspace API for reading events is quite horrible, so it is useful to wrap it in a more friendly API to avoid duplicating code across many users in QEMU. Wrapping it also allows introduction of a platform portability layer, so that we can add impls for non-Linux based equivalents in future. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-02-26	softfloat: Implement float128_to_uint32	David Hildenbrand
	Handling it just like float128_to_uint32_round_to_zero, that hopefully is free of bugs :) Documentation basically copied from float128_to_uint64 Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-02-26	softfloat: add float128_is_{normal,denormal}	David Hildenbrand
	Needed on s390x, to test for the data class of a number. It will soon gain a user. A number is considered normal if the exponent is neither 0 nor all 1's. That can be checked by adding 1 to the exponent, and comparing against >= 2 after dropping an eventual overflow into the sign bit. While at it, convert the other floatXX_is_normal functions to use a similar, less error prone calculation, as suggested by Richard H. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
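The exponent trick can be illustrated in Python on IEEE 754 binary64 values (11 exponent bits) rather than float128. This is a sketch of the technique, not the softfloat code: the function name `float64_is_normal` is hypothetical, and masking to the exponent width plays the role of "dropping an eventual overflow into the sign bit".

```python
import struct

EXP_BITS = 11                   # exponent width of IEEE 754 binary64
EXP_MASK = (1 << EXP_BITS) - 1  # 0x7ff, the all-ones exponent
FRAC_BITS = 52                  # fraction width of binary64


def float64_is_normal(value):
    """True if the biased exponent is neither 0 nor all ones."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    exponent = (bits >> FRAC_BITS) & EXP_MASK
    # exponent 0      -> 1 after the increment   (zero or denormal)
    # exponent 0x7ff  -> wraps to 0 after masking (infinity or NaN)
    # anything else   -> at least 2               (normal)
    return ((exponent + 1) & EXP_MASK) >= 2
```

A single comparison replaces the two separate tests against 0 and the all-ones exponent, which is the "less error prone calculation" the message refers to.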
2019-02-26	spapr: add hotplug hooks for PHB hotplug	Greg Kurz
	Hotplugging PHBs is a machine-level operation, but PHBs reside on the main system bus, so we register spapr machine as the handler for the main system bus. Provide the usual pre-plug, plug and unplug-request handlers. Move the checking of the PHB index to the pre-plug handler. It is okay to do that and assert in the realize function because the pre-plug handler is always called, even for the oldest machine types we support. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> (Fixed interrupt controller phandle in "interrupt-map" and TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr_pci: provide node start offset via spapr_populate_pci_dt()	Michael Roth
	PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr: create DR connectors for PHBs	Michael Roth
	Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr_pci: add PHB unrealize	Greg Kurz
	To support PHB hotplug we need to clean up lingering references, memory, child properties, etc. prior to the PHB object being finalized. Generally this will be called as a result of calling object_unparent() on the PHB object, which in turn would normally be called as the result of an unplug() operation. When the PHB is finalized, child objects will be unparented in turn, and finalized if the PHB was the only reference holder, so we don't bother to explicitly unparent child objects of the PHB, with the notable exception of DRCs. This is needed to avoid a QEMU crash when unplugging a PHB and resetting the machine before the guest could handle the event. The DRCs are removed from the QOM tree by pci_unregister_root_bus() and we must make sure we're not leaving stale aliases under the global /dr-connector path. The formula that gives the number of DMA windows is moved to an inline function in the hw/pci-host/spapr.h header because it will have other users. The unrealize function is able to cope with partially realized PHBs. It is hence used to implement proper rollback on the realize error path. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr_irq: Expose the phandle of the interrupt controller	Greg Kurz
	This will be used by PHB hotplug in order to create the "interrupt-map" property of the PHB node. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr: Expose the name of the interrupt controller node	Greg Kurz
	This will be needed by PHB hotplug in order to access the "phandle" property of the interrupt controller node. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	xics: Write source state to KVM at claim time	Greg Kurz
	The pseries machine only uses LSIs to support legacy PCI devices. Every PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming in-kernel XIVE), QEMU synchronizes the state of all irqs, including these LSIs, later on at machine reset. In order to support PHB hotplug, we need a way to tell KVM about the LSIs that doesn't require a machine reset. An easy way to do that is to always inform KVM when an interrupt is claimed, which really isn't a performance path. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr/drc: Drop spapr_drc_attach() fdt argument	Greg Kurz
	All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr/pci: Generate FDT fragment at configure connector time	Greg Kurz
	Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr: Generate FDT fragment for CPUs at configure connector time	Greg Kurz
	Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr: Generate FDT fragment for LMBs at configure connector time	Greg Kurz
	Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	spapr_drc: Allow FDT fragment to be added later	Greg Kurz
	The current logic is to provide the FDT fragment when attaching a device to a DRC. This works perfectly fine for our current hotplug support, but soon we will add support for PHB hotplug which has some constraints that CPU, PCI and LMB devices don't seem to have. The first constraint is that the "ibm,dma-window" property of the PHB node requires the IOMMU to be configured, ie, spapr_tce_table_enable() has been called, which happens during PHB reset. It is okay in the case of hotplug since the device is reset before the hotplug handler is called. On the contrary with coldplug, the hotplug handler is called first and device is only reset during the initial system reset. Trying to create the FDT fragment on the hotplug path in this case, would result in something like this: ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >; This will cause linux in the guest to panic, by simply removing and re-adding the PHB using the drmgr command: page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); The second and maybe more problematic constraint is that the "interrupt-map" property needs to reference the interrupt controller node using the very same phandle that SLOF has already exposed to the guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall at some point to know about this phandle. With the latest QEMU and SLOF, this happens when SLOF gets quiesced. This means that if the PHB gets hotplugged after CAS but before SLOF quiesce, then we're sure that the phandle is not known when the hotplug handler is called. The FDT is only needed when the guest first invokes RTAS to configure the connector actually, long after SLOF quiesce. Let's postpone the creation of FDT fragments for PHBs to rtas_ibm_configure_connector(). Since we only need this for PHBs, introduce a new method in the base DRC class for that. DRC subtypes will be converted to use it in subsequent patches. 
Allow spapr_drc_attach() to be passed a NULL fdt argument if the method is available. When all DRC subtypes have been converted, the fdt argument will eventually disappear. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
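The idea of deferring fragment creation from attach time to configure-connector time can be sketched in Python. Everything here is a toy model with hypothetical names (`ToyPHB`, `ToyDRC`, `build_fdt_fragment`, the `dma_window_size` key); it only shows why building the fragment lazily picks up state that is set by a later reset.

```python
class ToyPHB:
    """Toy device whose DMA window is only configured at reset time."""

    def __init__(self):
        self.dma_window_size = 0  # not configured until reset() runs

    def reset(self):
        self.dma_window_size = 0x40000000

    def build_fdt_fragment(self):
        # Stand-in for a dt_populate()-style method: snapshot current state.
        return {"dma_window_size": self.dma_window_size}


class ToyDRC:
    """Toy connector that builds the fragment on first configure, not on attach."""

    def __init__(self):
        self.dev = None
        self.fdt = None

    def attach(self, dev):
        self.dev = dev
        self.fdt = None  # nothing is generated at attach time

    def configure_connector(self):
        if self.dev is None:
            raise RuntimeError("no device attached")
        if self.fdt is None:
            self.fdt = self.dev.build_fdt_fragment()
        return self.fdt
```

In the coldplug order (attach first, reset later), an attach-time snapshot would have recorded a zero-sized window, while the deferred version sees the configured value.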
2019-02-26	target/ppc/spapr: Set LPCR:HR when using Radix mode	Benjamin Herrenschmidt
	The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	ppc: add host-serial and host-model machine attributes (CVE-2019-8934)	Prasad J Pandit
	On ppc hosts, hypervisor shares following system attributes - /proc/device-tree/system-id - /proc/device-tree/model with a guest. This could lead to information leakage and misuse.[*] Add machine attributes to control such system information exposure to a guest. [*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028 Reported-by: Daniel P. Berrangé <berrange@redhat.com> Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20190218181349.23885-1-ppandit@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26	target/ppc: Add POWER9 external interrupt model	Benjamin Herrenschmidt
	Adds support for the Hypervisor directed interrupts in addition to the OS ones. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - modified the icp_realize() and xive_tctx_realize() to take into account explicitly the POWER9 interrupt model - introduced a specific power9_set_irq for POWER9 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215161648.9600-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-25	Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging	Peter Maydell
	Pull request # gpg: Signature made Fri 22 Feb 2019 14:07:01 GMT # gpg: using RSA key 9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * remotes/stefanha/tags/block-pull-request: (27 commits) tests/virtio-blk: add test for DISCARD command tests/virtio-blk: add test for WRITE_ZEROES command tests/virtio-blk: add virtio_blk_fix_dwz_hdr() function tests/virtio-blk: change assert on data_size in virtio_blk_request() virtio-blk: add DISCARD and WRITE_ZEROES features virtio-blk: set config size depending on the features enabled virtio-net: make VirtIOFeature usable for other virtio devices virtio-blk: add "discard" and "write-zeroes" properties virtio-blk: add host_features field in VirtIOBlock virtio-blk: add acct_failed param to virtio_blk_handle_rw_error() hw/ide: drop iov field from IDEDMA hw/ide: drop iov field from IDEBufferedRequest hw/ide: drop iov field from IDEState tests/test-bdrv-drain: use QEMU_IOVEC_INIT_BUF migration/block: use qemu_iovec_init_buf qemu-img: use qemu_iovec_init_buf block/vmdk: use qemu_iovec_init_buf block/qed: use qemu_iovec_init_buf block/qcow2: use qemu_iovec_init_buf block/qcow: use qemu_iovec_init_buf ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-25	block: Purify .bdrv_refresh_filename()	Max Reitz
	Currently, BlockDriver.bdrv_refresh_filename() is supposed to both refresh the filename (BDS.exact_filename) and set BDS.full_open_options. Now that we have generic code in the central bdrv_refresh_filename() for creating BDS.full_open_options, we can drop the latter part from all BlockDriver.bdrv_refresh_filename() implementations. This also means that we can drop all of the existing default code for this from the global bdrv_refresh_filename() itself. Furthermore, we now have to call BlockDriver.bdrv_refresh_filename() after having set BDS.full_open_options, because the block driver's implementation should now be allowed to depend on BDS.full_open_options being set correctly. Finally, with this patch we can drop the @options parameter from BlockDriver.bdrv_refresh_filename(); also, add a comment on this function's purpose in block/block_int.h while touching its interface. This completely obsoletes blklogwrite's implementation of .bdrv_refresh_filename(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-25-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: Add BlockDriver.bdrv_gather_child_options	Max Reitz
	Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: Add strong_runtime_opts to BlockDriver	Max Reitz
	This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
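A rough Python sketch of the filtering idea: given the options a node was opened with and a per-driver list of "strong" option names, keep only those that can influence the node's contents. This is an assumption-laden illustration, not the bdrv_refresh_filename() logic: the function name `gather_strong_options` is hypothetical, only exact option names are matched, and the option names in the usage below are illustrative.

```python
def gather_strong_options(driver_name, options, strong_runtime_opts):
    """Keep only the options listed as strong, plus the driver name."""
    strong = {
        key: value
        for key, value in options.items()
        if key in strong_runtime_opts
    }
    strong["driver"] = driver_name
    return strong
```

For example, with a strong list of `["size", "read-zeroes"]`, an option such as a latency setting would be dropped because it changes timing but not the data a guest sees:

```python
opts = {"size": 1024, "read-zeroes": True, "latency-ns": 500}
print(gather_strong_options("null-co", opts, ["size", "read-zeroes"]))
```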
2019-02-25	block: Add bdrv_dirname()	Max Reitz
	This function may be implemented by block drivers to derive a directory name from a BDS. Concatenating this g_free()-able string with a relative filename must result in a valid (not necessarily existing) filename, so this is a function that should generally not be implemented by format drivers, because this is protocol-specific. If a BDS's driver does not implement this function, bdrv_dirname() will fall through to the BDS's file if it exists. If it does not, the exact_filename field will be used to generate a directory name. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
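The fallback order described above (driver callback, then the file child, then exact_filename) can be sketched with a toy node type in Python. The names `ToyNode`, `dirname_impl` and `toy_dirname` are hypothetical, and the way a directory is cut out of exact_filename here (everything up to and including the last '/') is a simplifying assumption rather than QEMU's actual path handling.

```python
class ToyNode:
    """Toy stand-in for a block node (not a QEMU type)."""

    def __init__(self, exact_filename="", file=None, dirname_impl=None):
        self.exact_filename = exact_filename
        self.file = file                  # protocol-level child, if any
        self.dirname_impl = dirname_impl  # optional driver-specific callback


def toy_dirname(node):
    """Return a prefix that can be concatenated with a relative filename."""
    if node.dirname_impl is not None:
        return node.dirname_impl(node)    # 1. driver implements it
    if node.file is not None:
        return toy_dirname(node.file)     # 2. fall through to the file child
    if not node.exact_filename:
        raise ValueError("cannot derive a directory name for this node")
    # 3. keep everything up to and including the last '/'
    return node.exact_filename[: node.exact_filename.rfind("/") + 1]
```

A format node layered on a protocol node then resolves relative names against the protocol node's location, for instance `toy_dirname(fmt) + "overlay.qcow2"`.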
2019-02-25	block: bdrv_get_full_backing_filename's ret. val.	Max Reitz
	Make bdrv_get_full_backing_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-12-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: bdrv_get_full_backing_filename_from_...'s ret. val.	Max Reitz
	Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: Make path_combine() return the path	Max Reitz
	Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: Add BDS.auto_backing_file	Max Reitz
	If the backing file is overridden, this most probably does change the guest-visible data of a BDS. Therefore, we will need to consider this in bdrv_refresh_filename(). To see whether it has been overridden, we might want to compare bs->backing_file and bs->backing->bs->filename. However, bs->backing_file is changed by bdrv_set_backing_hd() (which is just used to change the backing child at runtime, without modifying the image header), so bs->backing_file most of the time simply contains a copy of bs->backing->bs->filename anyway, so it is useless for such a comparison. This patch adds an auto_backing_file BDS field which contains the backing file path as indicated by the image header, which is not changed by bdrv_set_backing_hd(). Because of bdrv_refresh_filename() magic, however, a BDS's filename may differ from what has been specified during bdrv_open(). Then, the comparison between bs->auto_backing_file and bs->backing->bs->filename may fail even though bs->backing was opened from bs->auto_backing_file. To mitigate this, we can copy the real BDS's filename (after the whole bdrv_open() and bdrv_refresh_filename() process) into bs->auto_backing_file, if we know the former has been opened based on the latter. This is only possible if no options modifying the backing file's behavior have been specified, though. To simplify things, this patch only copies the filename from the backing file if no options have been specified for it at all. Furthermore, there are cases where an overlay is created by qemu which already contains a BDS's filename (e.g. in blockdev-snapshot-sync). We do not need to worry about updating the overlay's bs->auto_backing_file there, because we actually wrote a post-bdrv_refresh_filename() filename into the image header. So all in all, there will be false negatives where (as of a future patch) bdrv_refresh_filename() will assume that the backing file differs from what was specified in the image header, even though it really does not. 
However, these cases should be limited to where (1) the user actually did override something in the backing chain (e.g. by specifying options for the backing file), or (2) the user executed a QMP command to change some node's backing file (e.g. change-backing-file or block-commit with @backing-file given) where the given filename does not happen to coincide with qemu's idea of the backing BDS's filename. Then again, (1) really is limited to -drive. With -blockdev or blockdev-add, you have to adhere to the schema, so a user cannot give partial "unimportant" options (e.g. by just setting backing.node-name and leaving the rest to the image header). Therefore, trying to fix this would mean trying to fix something for -drive only. To improve on (2), we would need a full infrastructure to "canonicalize" an arbitrary filename (+ options), so it can be compared against another. That seems a bit over the top, considering that filenames nowadays are there mostly for the user's entertainment. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: Use bdrv_refresh_filename() to pull	Max Reitz
	Before this patch, bdrv_refresh_filename() is used in a pushing manner: Whenever the BDS graph is modified, the parents of the modified edges are supposed to be updated (recursively upwards). However, that is nonviable, considering that we want child changes not to concern parents. Also, in the long run we want a pull model anyway: Here, we would have a bdrv_filename() function which returns a BDS's filename, freshly constructed. This patch is an intermediate step. It adds bdrv_refresh_filename() calls before every place a BDS.filename value is used. The only exceptions are protocol drivers that use their own filename, which clearly would not profit from refreshing that filename before. Also, bdrv_get_encrypted_filename() is removed along the way (as a user of BDS.filename), since it is completely unused. In turn, all of the calls to bdrv_refresh_filename() before this patch are removed, because we no longer have to call this function on graph changes. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-2-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-02-25	block: fix bdrv_check_perm for non-tree subgraph	Vladimir Sementsov-Ogievskiy
	bdrv_check_perm in its recursion checks each node in the context of new permissions for one parent, because of the nature of DFS. This works well as long as the children subgraph of the top-most updated node is a tree, i.e. it doesn't have any kind of loops. But if we have a loop (not oriented, of course), i.e. we have two different ways from the top node to some child node, then bdrv_check_perm will do the wrong thing:

	  top
	  | \
	  |  |
	  v  v
	  A  B
	  |  |
	  v  v
	  node

	It will once check the new permissions of node in the context of new A permissions and old B permissions, and once vice versa. This is wrong and may lead to corruption of the permission system. We may start with no permissions and all-shared for both the A->node and B->node relations and finish up with non-shared write permission for both ways. The following commit will add a test which shows this bug. To fix this situation, let's really set BdrvChild permissions during the bdrv_check_perm procedure. And we are happy here, as check-perm is already written in a transactional manner, so we just need to restore the backed-up permissions in _abort. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
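The diamond case described above can be reproduced with a toy model. The following is only a sketch under stated assumptions: the Edge type, PERM_WRITE, compatible(), check_against_old() and check_and_apply() are hypothetical names invented for illustration and are not QEMU's BdrvChild API. It shows why checking each edge against the other edge's old state accepts a conflicting update, and why applying permissions during the check (with rollback on failure) rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERM_WRITE 0x1u   /* hypothetical permission bit */

/* One parent->node edge: what the parent uses and what it tolerates in others. */
typedef struct Edge {
    uint32_t perm;    /* permissions this parent takes on the node */
    uint32_t shared;  /* permissions this parent allows other parents to take */
} Edge;

/* Two edges to the same node are compatible if each one's perms are shared by the other. */
static bool compatible(Edge a, Edge b)
{
    return (a.perm & ~b.shared) == 0 && (b.perm & ~a.shared) == 0;
}

/* Flawed check: each new edge is only compared with the OLD state of the other edge. */
static bool check_against_old(Edge old_a, Edge old_b, Edge new_a, Edge new_b)
{
    return compatible(new_a, old_b) && compatible(old_a, new_b);
}

/*
 * Transactional check: apply A's new permissions first, so that B's new
 * permissions are checked against them. Restore the backup on failure.
 */
static bool check_and_apply(Edge *a, Edge *b, Edge new_a, Edge new_b)
{
    Edge saved_a, saved_b;

    assert(a && b);
    saved_a = *a;
    saved_b = *b;

    if (!compatible(new_a, *b)) {
        return false;
    }
    *a = new_a;
    if (!compatible(*a, new_b)) {
        *a = saved_a;   /* abort: roll back what was already applied */
        *b = saved_b;
        return false;
    }
    *b = new_b;
    return true;
}
```

Starting from two edges with no permissions and write shared, and requesting non-shared write on both, check_against_old() reports success while check_and_apply() fails and leaves both edges unchanged.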
2019-02-25	nbd: Use low-level QIOChannel API in nbd_read_eof()	Kevin Wolf
	Instead of using the convenience wrapper qio_channel_read_all_eof(), use the lower level QIOChannel API. This means duplicating some code, but we'll need this because this coroutine yield is special: We want it to be interruptible so that nbd_client_attach_aio_context() can correctly reenter the coroutine. This moves the bdrv_dec/inc_in_flight() pair into nbd_read_eof(), so that connection_co will always sit in this exact qio_channel_yield() call when bdrv_drain() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25	nbd: Move nbd_read_eof() to nbd/client.c	Kevin Wolf
	The only caller of nbd_read_eof() is nbd_receive_reply(), so it doesn't have to live in the header file, but can move next to its caller. Also add the missing coroutine_fn to the function and its caller. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-02-25	io: Make qio_channel_yield() interruptible	Kevin Wolf
	Similar to how qemu_co_sleep_ns() allows preemption from an external coroutine entry, allow reentering qio_channel_yield() early. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25	block-backend: Make blk_inc/dec_in_flight public	Kevin Wolf
	For some users of BlockBackends, just increasing the in_flight counter is easier than implementing separate handlers in BlockDevOps. Make the helper functions for this public. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25	block/snapshot: remove bdrv_snapshot_delete_by_id_or_name	Daniel Henrique Barboza
	After the previous patch, the only instance of this function left is inside qemu-img.c. qemu-img is using it inside the 'img_snapshot' function to delete snapshots in the SNAPSHOT_DELETE case, based on a "snapshot_name" string that refers to the tag, not ID, of the QEMUSnapshotInfo struct. This can be verified by checking the SNAPSHOT_CREATE case that comes shortly before SNAPSHOT_DELETE. In that case, the same "snapshot_name" variable is copied to the 'name' field of the QEMUSnapshotInfo struct sn: pstrcpy(sn.name, sizeof(sn.name), snapshot_name); Based on that, it is unlikely that "snapshot_name" might contain an "id" in SNAPSHOT_DELETE. This patch changes SNAPSHOT_DELETE to use snapshot_find() and snapshot_delete() instead of bdrv_snapshot_delete_by_id_or_name. After that, there are no instances left of bdrv_snapshot_delete_by_id_or_name in the code, so it is safe to remove it entirely. Suggested-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-02-25	Merge remote-tracking branch 'remotes/kraxel/tags/vga-20190222-pull-request' into staging	Peter Maydell
	vga: bugfixes and edid support for virtio-vga # gpg: Signature made Fri 22 Feb 2019 08:24:25 GMT # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/vga-20190222-pull-request: display/virtio: add edid support. virtio-gpu: remove useless 'waiting' field virtio-gpu: block both 2d and 3d rendering virtio-gpu: remove unused config_size virtio-gpu: remove unused qdev Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-25	Merge remote-tracking branch 'remotes/kraxel/tags/ui-20190222-pull-request' into staging	Peter Maydell
	ui: add support for -display spice-app ui: gtk+sdl bugfixes. # gpg: Signature made Fri 22 Feb 2019 07:53:13 GMT # gpg: using RSA key 4CB6D8EED3E87138 # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full] # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full] # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full] # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138 * remotes/kraxel/tags/ui-20190222-pull-request: display: add -display spice-app launching a Spice client spice: use a default name for the server qapi: document DisplayType enum build-sys: add gio-2.0 check char: register spice ports after spice started char: move SpiceChardev and open_spice_port() to spice.h header spice: do not stop spice if VM is paused spice: merge options lists spice: avoid spice runtime assert char/spice: discard write() if backend is disconnected char/spice: trigger HUP event ui/gtk: Fix the license information sdl2: drop qemu_input_event_send_key_qcode call spice: set device address and device display ID in QXL interface kbd-state: don't block auto-repeat events Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-22	hw/smbios: fix offset of type 3 sku field	Daniel P. Berrangé
	The type 3 SMBIOS structure[1] ends with fields ... 0x14 - contained element count 0x15 - contained element record length 0x16 - sku number The smbios_type_3 struct missed the contained element record length field, causing sku number to be reported at the wrong offset. [1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.1.1.pdf Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20190215153600.1770727-1-berrange@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Fixes: e41fca3da72 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
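The effect of the missing field can be illustrated with two packed structs. This is a minimal sketch, not the actual smbios_type_3 definition: the struct and field names below are hypothetical, and only the last fields of the record are modelled. It shows that dropping the one-byte record length field moves the SKU number one byte too early relative to the contained element count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tail of the record WITHOUT the record length field (the bug). */
struct chassis_tail_broken {
    uint8_t contained_element_count;
    uint8_t sku_number_str;
} __attribute__((packed));

/* Hypothetical tail of the record WITH the record length field (the fix). */
struct chassis_tail_fixed {
    uint8_t contained_element_count;
    uint8_t contained_element_record_length;
    uint8_t sku_number_str;
} __attribute__((packed));

/* The fixed layout is exactly one byte longer than the broken one. */
static_assert(sizeof(struct chassis_tail_fixed) ==
              sizeof(struct chassis_tail_broken) + 1,
              "record length field adds one byte");

/* Distance in bytes from the element count to the SKU field, broken layout. */
static size_t sku_distance_broken(void)
{
    return offsetof(struct chassis_tail_broken, sku_number_str) -
           offsetof(struct chassis_tail_broken, contained_element_count);
}

/* Distance in bytes from the element count to the SKU field, fixed layout. */
static size_t sku_distance_fixed(void)
{
    return offsetof(struct chassis_tail_fixed, sku_number_str) -
           offsetof(struct chassis_tail_fixed, contained_element_count);
}
```

A guest parsing the table by the offsets in the specification would read the SKU string index from the byte after the one the broken layout wrote it to.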
2019-02-22	pci: Move NVIDIA vendor id to the rest of ids	Alexey Kardashevskiy
	sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22	virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size	David Gibson
	The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but we can only actually discard memory in units of the host page size. Now, we handle this very badly: we silently ignore balloon requests that aren't host page aligned, and for requests that are host page aligned we discard the entire host page. The latter can corrupt guest memory if its page size is smaller than the host's. The obvious choice would be to disable the balloon if the host page size is not 4kiB. However, that would break the special case where host and guest have the same page size, but that's larger than 4kiB. That case currently works by accident[1] - and is used in practice on many production POWER systems where 64kiB has long been the Linux default page size on both host and guest. To make the balloon safe, without breaking that useful special case, we need to accumulate 4kiB balloon requests until we have a whole contiguous host page to discard. We could in principle do that across all guest memory, but it would require a large bitmap to track. This patch represents a compromise: we track ballooned subpages for a single contiguous host page at a time. This means that if the guest discards all 4kiB chunks of a host page in succession, we will discard it. This is the expected behaviour in the (host page) == (guest page) != 4kiB case we want to support. If the guest scatters 4kiB requests across different host pages, we don't discard anything, and issue a warning. Not ideal, but at least we don't corrupt guest memory as the previous version could. Warning reporting is kind of a compromise here. Determining whether we're in a problematic state at realize() time is tricky, because we'd have to look at the host pagesizes of all memory backends, but we can't really know if some of those backends could be for special purpose memory that's not subject to ballooning. 
Reporting only when the guest tries to balloon a partial page also isn't great because if the guest page size happens to line up it won't indicate that we're in a non-ideal situation. It could also cause alarming repeated warnings whenever a migration is attempted. So, what we do is warn the first time the guest attempts to balloon a partial host page, whether or not it will end up ballooning the rest of the page immediately afterwards. [1] Because when the guest attempts to balloon a page, it will submit requests for each 4kiB subpage. Most will be ignored, but the one which happens to be host page aligned will discard the whole lot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190214043916.22128-6-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-22	Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190221' into staging	Peter Maydell
	Allow const void * as argument to helpers. Remove obsolete TODO file. # gpg: Signature made Thu 21 Feb 2019 18:59:11 GMT # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20190221: include/exec/helper-head.h: support "const void *" in helper calls tcg: Remove TODO file Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-22	virtio-blk: add DISCARD and WRITE_ZEROES features	Stefano Garzarella
	This patch adds support for the DISCARD and WRITE_ZEROES commands, which were introduced in the virtio-blk protocol to improve performance with SSD backends. We support only one segment per request since multiple segments are not widely used and there are no userspace APIs that allow applications to submit multiple segments in a single call. Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20190221103314.58500-7-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-02-22	virtio-blk: set config size depending on the features enabled	Stefano Garzarella
	Starting with the DISCARD and WRITE_ZEROES features, we use an array of VirtIOFeature (as virtio-net does) to properly set the config size depending on the features enabled. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20190221103314.58500-6-sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
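The idea of deriving the config size from a feature table can be sketched like this. It is an illustration only: FeatureSize, config_size_for(), the feature bits and the byte offsets used in the usage note are all hypothetical and are not taken from the patch or from the virtio specification. Each table entry names a feature mask and the offset just past the last config field that feature needs; the resulting size is the largest such offset among the enabled features, never smaller than a baseline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: which feature bits need config space up to 'end'. */
typedef struct FeatureSize {
    uint64_t flags;  /* feature bit(s) that require the fields before 'end' */
    size_t end;      /* offset just past the last config field they use */
} FeatureSize;

/*
 * Return the config space size needed for the enabled features: the largest
 * 'end' among matching entries, but at least min_size.
 */
static size_t config_size_for(const FeatureSize *map, size_t n,
                              size_t min_size, uint64_t host_features)
{
    size_t size = min_size;
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if ((host_features & map[i].flags) && map[i].end > size) {
            size = map[i].end;
        }
    }
    return size;
}
```

For example, with a baseline of 32 bytes and a table mapping a made-up bit 0 to offset 40 and bit 1 to offset 60, a device with neither feature exposes 32 bytes, one with only bit 0 exposes 40, and one with bit 1 exposes 60. A device without the newer features therefore keeps its old config size, which is presumably the compatibility property the table approach is meant to preserve.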