slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message (Collapse)	Author
2022-10-27	arm: re-randomize rng-seed on reboot	Jason A. Donenfeld
	When the system reboots, the rng-seed that the FDT has should be re-randomized, so that the new boot gets a new seed. Since the FDT is in the ROM region at this point, we add a hook right after the ROM has been added, so that we have a pointer to that copy of the FDT. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-arm@nongnu.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Message-id: 20221025004327.568476-5-Jason@zx2c4.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	x86: do not re-randomize RNG seed on snapshot load	Jason A. Donenfeld
	Snapshot loading is supposed to be deterministic, so we shouldn't re-randomize the various seeds used. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Message-id: 20221025004327.568476-4-Jason@zx2c4.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	device-tree: add re-randomization helper function	Jason A. Donenfeld
	When the system reboots, the rng-seed that the FDT has should be re-randomized, so that the new boot gets a new seed. Several architectures require this functionality, so export a function for injecting a new seed into the given FDT. Cc: Alistair Francis <alistair.francis@wdc.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-id: 20221025004327.568476-3-Jason@zx2c4.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	reset: allow registering handlers that aren't called by snapshot loading	Jason A. Donenfeld
	Snapshot loading only expects to call deterministic handlers, not non-deterministic ones. So introduce a way of registering handlers that won't be called when reseting for snapshots. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Message-id: 20221025004327.568476-2-Jason@zx2c4.com [PMM: updated json doc comment with Markus' text; fixed checkpatch style nit] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Use the max page size in a 2-stage ptw	Richard Henderson
	We had only been reporting the stage2 page size. This causes problems if stage1 is using a larger page size (16k, 2M, etc), but stage2 is using a smaller page size, because cputlb does not set large_page_{addr,mask} properly. Fix by using the max of the two page sizes. Reported-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-15-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Implement FEAT_HAFDBS, dirty bit portion	Richard Henderson
	Perform the atomic update for hardware management of the dirty bit. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-14-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Implement FEAT_HAFDBS, access flag portion	Richard Henderson
	Perform the atomic update for hardware management of the access flag. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-13-richard.henderson@linaro.org [PMM: Fix accidental PROT_WRITE to PAGE_WRITE; add missing main-loop.h include] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Tidy merging of attributes from descriptor and table	Richard Henderson
	Replace some gotos with some nested if statements. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-12-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Consider GP an attribute in get_phys_addr_lpae	Richard Henderson
	Both GP and DBM are in the upper attribute block. Extend the computation of attrs to include them, then simplify the setting of guarded. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-11-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Don't shift attrs in get_phys_addr_lpae	Richard Henderson
	Leave the upper and lower attributes in the place they originate from in the descriptor. Shifting them around is confusing, since one cannot read the bit numbers out of the manual. Also, new attributes have been added which would alter the shifts. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20221024051851.3074715-10-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Fix fault reporting in get_phys_addr_lpae	Richard Henderson
	Always overriding fi->type was incorrect, as we would not properly propagate the fault type from S1_ptw_translate, or arm_ldq_ptw. Simplify things by providing a new label for a translation fault. For other faults, store into fi directly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-9-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Remove loop from get_phys_addr_lpae	Richard Henderson
	The unconditional loop was used both to iterate over levels and to control parsing of attributes. Use an explicit goto in both cases. While this appears less clean for iterating over levels, we will need to jump back into the middle of this loop for atomic updates, which is even uglier. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-8-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Add ARMFault_UnsuppAtomicUpdate	Richard Henderson
	This fault type is to be used with FEAT_HAFDBS when the guest enables hw updates, but places the tables in memory where atomic updates are unsupported. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw	Richard Henderson
	Separate S1 translation from the actual lookup. Will enable lpae hardware updates. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Extract HA and HD in aa64_va_parameters	Richard Henderson
	Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Add isar predicates for FEAT_HAFDBS	Richard Henderson
	The MMFR1 field may indicate support for hardware update of access flag alone, or access flag and dirty bit. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Add ptw_idx to S1Translate	Richard Henderson
	Hoist the computation of the mmu_idx for the ptw up to get_phys_addr_with_struct and get_phys_addr_twostage. This removes the duplicate check for stage2 disabled from the middle of the walk, performing it only once. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20221024051851.3074715-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Introduce regime_is_stage2	Richard Henderson
	Reduce the amount of typing required for this check. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221024051851.3074715-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/imx: reload cmp timer outside of the reload ptimer transaction	Axel Heider
	When running seL4 tests (https://docs.sel4.systems/projects/sel4test) on the sabrelight platform, the timer tests fail. The arm/imx6 EPIT timer interrupt does not fire properly, instead of a e.g. second in can take up to a minute to finally see the interrupt. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1263 Signed-off-by: Axel Heider <axel.heider@hensoldt.net> Message-id: 166663118138.13362.1229967229046092876-0@git.sr.ht Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	hw/hyperv/hyperv.c: Use device_cold_reset() instead of device_legacy_reset()	Peter Maydell
	The semantic difference between the deprecated device_legacy_reset() function and the newer device_cold_reset() function is that the new function resets both the device itself and any qbuses it owns, whereas the legacy function resets just the device itself and nothing else. In hyperv_synic_reset() we reset a SynICState, which has no qbuses, so for this purpose the two functions behave identically and we can stop using the deprecated one. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-id: 20221013171817.1447562-1-peter.maydell@linaro.org
2022-10-27	hw/core/resettable: fix reset level counting	Damien Hedde
	The code for handling the reset level count in the Resettable code has two issues: The reset count is only decremented for the 1->0 case. This means that if there's ever a nested reset that takes the count to 2 then it will never again be decremented. Eventually the count will exceed the '50' limit in resettable_phase_enter() and QEMU will trip over the assertion failure. The repro case in issue 1266 is an example of this that happens now the SCSI subsystem uses three-phase reset. Secondly, the count is decremented only after the exit phase handler is called. Moving the reset count decrement from "just after" to "just before" calling the exit phase handler allows resettable_is_in_reset() to return false during the handler execution. This simplifies reset handling in resettable devices. Typically, a function that updates the device state will just need to read the current reset state and not anymore treat the "in a reset-exit transition" as a special case. Note that the semantics change to the *_is_in_reset() functions will have no effect on the current codebase, because only two devices (hw/char/cadence_uart.c and hw/misc/zynq_sclr.c) currently call those functions, and in neither case do they do it from the device's exit phase methed. Fixes: 4a5fc890 ("scsi: Use device_cold_reset() and bus_cold_reset()") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1266 Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20221020142749.3357951-1-peter.maydell@linaro.org Buglink: https://bugs.launchpad.net/qemu/+bug/1905297 Reported-by: Michael Peter <michael.peter@hensoldt-cyber.com> [PMM: adjust the docs paragraph changed to get the name of the 'enter' phase right and to clarify exactly when the count is adjusted; rewrite the commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked()	Ake Koomsin
	An exception targeting EL2 from lower EL is actually maskable when HCR_E2H and HCR_TGE are both set. This applies to both secure and non-secure Security state. We can remove the conditions that try to suppress masking of interrupts when we are Secure and the exception targets EL2 and Secure EL2 is disabled. This is OK because in that situation arm_phys_excp_target_el() will never return 2 as the target EL. The 'not if secure' check in this function was originally written before arm_hcr_el2_eff(), and back then the target EL returned by arm_phys_excp_target_el() could be 2 even if we were in Secure EL0/EL1; but it is no longer needed. Signed-off-by: Ake Koomsin <ake@igel.co.jp> Message-id: 20221017092432.546881-1-ake@igel.co.jp [PMM: Add commit message paragraph explaining why it's OK to remove the checks on secure and SCR_EEL2] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	hw/arm/virt: Fix devicetree warnings about the virtio-iommu node	Jean-Philippe Brucker
	The "PCI Bus Binding to: IEEE Std 1275-1994" defines the compatible string for a PCIe bus or endpoint as "pci<vendorid>,<deviceid>" or similar. Since the initial binding for PCI virtio-iommu didn't follow this rule, it was modified to accept both strings and ensure backward compatibility. Also, the unit-name for the node should be "device,function". Fix corresponding dt-validate and dtc warnings: pcie@10000000: virtio_iommu@16:compatible: ['virtio,pci-iommu'] does not contain items matching the given schema pcie@10000000: Unevaluated properties are not allowed (... 'virtio_iommu@16' were unexpected) From schema: linux/Documentation/devicetree/bindings/pci/host-generic-pci.yaml virtio_iommu@16: compatible: 'oneOf' conditional failed, one must be fixed: ['virtio,pci-iommu'] is too short 'pci1af4,1057' was expected From schema: dtschema/schemas/pci/pci-bus.yaml Warning (pci_device_reg): /pcie@10000000/virtio_iommu@16: PCI unit address format error, expected "2,0" Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-27	target/arm: Implement FEAT_E0PD	Peter Maydell
	FEAT_E0PD adds new bits E0PD0 and E0PD1 to TCR_EL1, which allow the OS to forbid EL0 access to half of the address space. Since this is an EL0-specific variation on the existing TCR_ELx.{EPD0,EPD1}, we can implement it entirely in aa64_va_parameters(). This requires moving the existing regime_is_user() to internals.h so that the code in helper.c can get at it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20221021160131.3531787-1-peter.maydell@linaro.org
2022-10-27	vl: Allow ThreadContext objects to be created before the sandbox option	David Hildenbrand
	Currently, there is no way to configure a CPU affinity inside QEMU when the sandbox option disables it for QEMU as a whole, for example, via: -sandbox enable=on,resourcecontrol=deny While ThreadContext objects can be created on the QEMU commandline and the CPU affinity can be configured externally via the thread-id, this is insufficient if a ThreadContext with a certain CPU affinity is already required during QEMU startup, before we can intercept QEMU and configure the CPU affinity. Blocking sched_setaffinity() was introduced in 24f8cdc57224 ("seccomp: add resourcecontrol argument to command line"), "to avoid any bigger of the process". However, we only care about once QEMU is running, not when the instance starting QEMU explicitly requests a certain CPU affinity on the QEMU comandline. Right now, for NUMA-aware preallocation of memory backends used for initial machine RAM, one has to: 1) Start QEMU with the memory-backend with "prealloc=off" 2) Pause QEMU before it starts the guest (-S) 3) Create ThreadContext, configure the CPU affinity using the thread-id 4) Configure the ThreadContext as "prealloc-context" of the memory backend 5) Trigger preallocation by setting "prealloc=on" To simplify this handling especially for initial machine RAM, allow creation of ThreadContext objects before parsing sandbox options, such that the CPU affinity requested on the QEMU commandline alongside the sandbox option can be set. As ThreadContext objects essentially only create a persistent context thread and set the CPU affinity, this is easily possible. With this change, we can create a ThreadContext with a CPU affinity on the QEMU commandline and use it for preallocation of memory backends glued to the machine (simplified example): To make "-name debug-threads=on" keep working as expected for the context threads, perform earlier parsing of "-name". qemu-system-x86_64 -m 1G \ -object thread-context,id=tc1,cpu-affinity=3-4 \ -object memory-backend-ram,id=pc.ram,size=1G,prealloc=on,prealloc-threads=2,prealloc-context=tc1 \ -machine memory-backend=pc.ram \ -S -monitor stdio -sandbox enable=on,resourcecontrol=deny And while we can query the current CPU affinity: (qemu) qom-get tc1 cpu-affinity [ 3, 4 ] We can no longer change it from QEMU directly: (qemu) qom-set tc1 cpu-affinity 1-2 Error: Setting CPU affinity failed: Operation not permitted Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-8-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	hostmem: Allow for specifying a ThreadContext for preallocation	David Hildenbrand
	Let's allow for specifying a thread context via the "prealloc-context" property. When set, preallcoation threads will be crated via the thread context -- inheriting the same CPU affinity as the thread context. Pinning preallcoation threads to CPUs can heavily increase performance in NUMA setups, because, preallocation from a CPU close to the target NUMA node(s) is faster then preallocation from a CPU further remote, simply because of memory bandwidth for initializing memory with zeroes. This is especially relevant for very large VMs backed by huge/gigantic pages, whereby preallocation is mandatory. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-7-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	util: Make qemu_prealloc_mem() optionally consume a ThreadContext	David Hildenbrand
	... and implement it under POSIX. When a ThreadContext is provided, create new threads via the context such that these new threads obtain a properly configured CPU affinity. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-6-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	util: Add write-only "node-affinity" property for ThreadContext	David Hildenbrand
	Let's make it easier to pin threads created via a ThreadContext to all host CPUs currently belonging to a given set of host NUMA nodes -- which is the common case. "node-affinity" is simply a shortcut for setting "cpu-affinity" manually to the list of host CPUs belonging to the set of host nodes. This property can only be written. A simple QEMU example to set the CPU affinity to host node 1 on a system with two nodes, 24 CPUs each, whereby odd-numbered host CPUs belong to host node 1: qemu-system-x86_64 -S \ -object thread-context,id=tc1,node-affinity=1 And we can query the cpu-affinity via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47 ] We cannot query the node-affinity: (qemu) qom-get tc1 node-affinity Error: Insufficient permission to perform this operation But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. We'll wire this up next to make it work. Note that if the host CPUs for a host node change due do CPU hot(un)plug CPU onlining/offlining (i.e., lscpu output changes) after the ThreadContext was started, the CPU affinity will not get updated. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221014134720.168738-5-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	util: Introduce ThreadContext user-creatable object	David Hildenbrand
	Setting the CPU affinity of QEMU threads is a bit problematic, because QEMU doesn't always have permissions to set the CPU affinity itself, for example, with seccomp after initialized by QEMU: -sandbox enable=on,resourcecontrol=deny General information about CPU affinities can be found in the man page of taskset: CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. While upper layers are already aware of how to handle CPU affinities for long-lived threads like iothreads or vcpu threads, especially short-lived threads, as used for memory-backend preallocation, are more involved to handle. These threads are created on demand and upper layers are not even able to identify and configure them. Introduce the concept of a ThreadContext, that is essentially a thread used for creating new threads. All threads created via that context thread inherit the configured CPU affinity. Consequently, it's sufficient to create a ThreadContext and configure it once, and have all threads created via that ThreadContext inherit the same CPU affinity. The CPU affinity of a ThreadContext can be configured two ways: (1) Obtaining the thread id via the "thread-id" property and setting the CPU affinity manually (e.g., via taskset). (2) Setting the "cpu-affinity" property and letting QEMU try set the CPU affinity itself. This will fail if QEMU doesn't have permissions to do so anymore after seccomp was initialized. A simple QEMU example to set the CPU affinity to host CPU 0,1,6,7 would be: qemu-system-x86_64 -S \ -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 And we can query it via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 0, 1, 6, 7 ] But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. We'll wire this up next to make it work. In general, the interface behaves like pthread_setaffinity_np(): host CPU numbers that are currently not available are ignored; only host CPU numbers that are impossible with the current kernel will fail. If the list of host CPU numbers does not include a single CPU that is available, setting the CPU affinity will fail. A ThreadContext can be reused, simply by reconfiguring the CPU affinity. Note that the CPU affinity of previously created threads will not get adjusted. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221014134720.168738-4-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	util: Introduce qemu_thread_set_affinity() and qemu_thread_get_affinity()	David Hildenbrand
	Usually, we let upper layers handle CPU pinning, because pthread_setaffinity_np() (-> sched_setaffinity()) is blocked via seccomp when starting QEMU with -sandbox enable=on,resourcecontrol=deny However, we want to configure and observe the CPU affinity of threads from QEMU directly in some cases when the sandbox option is either not enabled or not active yet. So let's add a way to configure CPU pinning via qemu_thread_set_affinity() and obtain CPU affinity via qemu_thread_get_affinity() and implement them under POSIX using pthread_setaffinity_np() + pthread_getaffinity_np(). Implementation under Windows is possible using SetProcessAffinityMask() + GetProcessAffinityMask(), however, that is left as future work. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-3-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	util: Cleanup and rename os_mem_prealloc()	David Hildenbrand
	Let's * give the function a "qemu_" style name make sure the parameters in the implementation match the prototype * rename smp_cpus to max_threads, which makes the semantics of that parameter clearer ... and add a function documentation. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-2-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27	target/s390x: Fix emulation of the VISTR instruction	Thomas Huth
	The element size is encoded in the M3 field, not in the M4 field. Fixes: be6324c6b734 ("s390x/tcg: Implement VECTOR ISOLATE STRING") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1248 Message-Id: <20221012182755.1014853-3-thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-27	tests/tcg/s390x: Test compiler flags only once, not every time	Thomas Huth
	This is common practice, see the Makefile.target in the aarch64 folder for example. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20221012182755.1014853-2-thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-27	s390x/tod-kvm: don't save/restore the TOD in PV guests	Nico Boehr
	Under PV, the guest's TOD clock is under control of the ultravisor and the hypervisor cannot change it. With upcoming kernel changes[1], the Linux kernel will reject QEMU's request to adjust the guest's clock in this case, so don't attempt to set the clock. This avoids the following warning message on save/restore of a PV guest: warning: Unable to set KVM guest TOD clock: Operation not supported [1] https://lore.kernel.org/all/20221011160712.928239-2-nrb@linux.ibm.com/ Fixes: c3347ed0d2ee ("s390x: protvirt: Support unpack facility") Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Message-Id: <20221012123229.1196007-1-nrb@linux.ibm.com> [thuth: Add curly braces] Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-27	s390x: step down as general arch maintainer	Cornelia Huck
	I haven't really been working on s390x for some time now, and in practice, I don't have time for it, either. So let's remove myself from this entry. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20221010160957.40779-1-cohuck@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-27	s390x/pv: remove semicolon from macro definition	Claudio Imbrenda
	Remove spurious semicolon at the end of the macro s390_pv_cmd Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20221010151041.89071-1-imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-10-27	qerror: QERR_PERMISSION_DENIED is no longer used, drop	Markus Armbruster
	Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-5-armbru@redhat.com>
2022-10-27	qtest: Improve error messages when property can not be set right now	Markus Armbruster
	When you try to set qtest property "log" while the qtest object is active, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -chardev socket,id=chrqt0,path=qtest.socket,server=on,wait=off -object qtest,id=qt0,chardev=chrqt0,log=/dev/null QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/qt0 log qtest.log Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'log' can not be set now Same for property "chardev". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-4-armbru@redhat.com>
2022-10-27	backends: Improve error messages when property can no longer be set	Markus Armbruster
	When you try to set virtio-rng property "filename" after the backend has been completed with user_creatable_complete(), the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object rng-random,id=rng0 -device virtio-rng,id=vrng0,rng=rng0 QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/rng0 filename /dev/random Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'filename' can no longer be set Same for cryptodev-vhost-user property "chardev", rng-egd property "chardev", and vhost-user-backend property "chardev". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-3-armbru@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> [Commit message tidied up]
2022-10-27	qom: Improve error messages when property has no getter or setter	Markus Armbruster
	When you try to set a property that has no setter, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /machine type q35 Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'pc-i440fx-7.2-machine.type' is not writable Do the same for getting a property that has no getter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221012153801.2604340-2-armbru@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2022-10-26	bsd-user: Catch up with sys/param.h requirement for machine/pmap.h	Muhammad Moinur Rahman
	Some versions of FreeBSD now require sys/param.h for machine/pmap.h on x86. Include them here to meet that requirement. It does no harm on older versions, so there's no need to #ifdef it. Signed-off-by: Muhammad Moinur Rahman <bofh@FreeBSD.org> Reviewed-by: John Baldwin <jhb@FreeBSD.org> Signed-off-by: Warner Losh <imp@bsdimp.com>
2022-10-26	virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint	Stefan Hajnoczi
	Register guest RAM using BlockRAMRegistrar and set the BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory accesses in I/O requests. This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely on DMA mapping/unmapping. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-14-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	blkio: implement BDRV_REQ_REGISTERED_BUF optimization	Stefan Hajnoczi
	Avoid bounce buffers when QEMUIOVector elements are within previously registered bdrv_register_buf() buffers. The idea is that emulated storage controllers will register guest RAM using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF on I/O requests. Therefore no blkio_map_mem_region() calls are necessary in the performance-critical I/O code path. This optimization doesn't apply if the I/O buffer is internally allocated by QEMU (e.g. qcow2 metadata). There we still take the slow path because BDRV_REQ_REGISTERED_BUF is not set. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	stubs: add qemu_ram_block_from_host() and qemu_ram_get_fd()	Stefan Hajnoczi
	The blkio block driver will need to look up the file descriptor for a given pointer. This is possible in softmmu builds where the RAMBlock API is available for querying guest RAM. Add stubs so tools like qemu-img that link the block layer still build successfully. In this case there is no guest RAM but that is fine. Bounce buffers and their file descriptors will be allocated with libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's qemu_ram_get_fd() in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-12-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	exec/cpu-common: add qemu_ram_get_fd()	Stefan Hajnoczi
	Add a function to get the file descriptor for a RAMBlock. Device emulation code typically uses the MemoryRegion APIs but vhost-style code may use RAMBlock directly for sharing guest memory with another process. This new API will be used by the libblkio block driver so it can share guest memory via .bdrv_register_buf(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	block: add BlockRAMRegistrar	Stefan Hajnoczi
	Emulated devices and other BlockBackend users wishing to take advantage of blk_register_buf() all have the same repetitive job: register RAMBlocks with the BlockBackend using RAMBlockNotifier. Add a BlockRAMRegistrar API to do this. A later commit will use this from hw/block/virtio-blk.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	numa: use QLIST_FOREACH_SAFE() for RAM block notifiers	Stefan Hajnoczi
	Make list traversal work when a callback removes a notifier mid-traversal. This is a cleanup to prevent bugs in the future. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-id: 20221013185908.1297568-9-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	block: return errors from bdrv_register_buf()	Stefan Hajnoczi
	Registering an I/O buffer is only a performance optimization hint but it is still necessary to return errors when it fails. Later patches will need to detect errors when registering buffers but an immediate advantage is that error_report() calls are no longer needed in block driver .bdrv_register_buf() functions. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	block: add BDRV_REQ_REGISTERED_BUF request flag	Stefan Hajnoczi
	Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. Always pass the flag through to driver read/write functions. There is little harm in passing the flag to a driver that does not use it. Passing the flag to drivers avoids changes across many block drivers. Filter drivers would need to explicitly support the flag and pass through to their children when the children support it. That's a lot of code changes and it's hard to remember to do that everywhere, leading to silent reduced performance when the flag is accidentally dropped. The only problematic scenario with the approach in this patch is when a driver passes the flag through to internal I/O requests that don't use the same I/O buffer. In that case the hint may be set when it should actually be clear. This is a rare case though so the risk is low. Some drivers have assert(!flags), which no longer works when BDRV_REQ_REGISTERED_BUF is passed in. These assertions aren't very useful anyway since the functions are called almost exclusively by bdrv_driver_preadv/pwritev() so if we get flags handling right there then the assertion is not needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20221013185908.1297568-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-26	block: use BdrvRequestFlags type for supported flag fields	Stefan Hajnoczi
	Use the enum type so GDB displays the enum members instead of printing a numeric constant. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20221013185908.1297568-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>