aboutsummaryrefslogtreecommitdiff
path: root/hw/i386
AgeCommit message (Collapse)Author
2018-06-20intel-iommu: rework the page walk logicPeter Xu
This patch fixes a potential small window that the DMA page table might be incomplete or invalid when the guest sends domain/context invalidations to a device. This can cause random DMA errors for assigned devices. This is a major change to the VT-d shadow page walking logic. It includes but is not limited to: - For each VTDAddressSpace, now we maintain what IOVA ranges we have mapped and what we have not. With that information, now we only send MAP or UNMAP when necessary. Say, we don't send MAP notifies if we know we have already mapped the range, meanwhile we don't send UNMAP notifies if we know we never mapped the range at all. - Introduce vtd_sync_shadow_page_table[_range] APIs so that we can call in any places to resync the shadow page table for a device. - When we receive domain/context invalidation, we should not really run the replay logic, instead we use the new sync shadow page table API to resync the whole shadow page table without unmapping the whole region. After this change, we'll only do the page walk once for each domain invalidations (before this, it can be multiple, depending on number of notifiers per address space). While at it, the page walking logic is also refactored to be simpler. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Jintack Lim <jintack@cs.columbia.edu> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 63b88968f139b6a77f2f81e6f1eedf70c0170a85) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: trace domain id during page walkPeter Xu
This patch only modifies the trace points. Previously we were tracing page walk levels. They are redundant since we have page mask (size) already. Now we trace something much more useful which is the domain ID of the page walking. That can be very useful when we trace more than one devices on the same system, so that we can know which map is for which domain. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit d118c06ebbee2d23ddf873cae4a809311aa61310) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: pass in address space when page walkPeter Xu
We pass in the VTDAddressSpace too. It'll be used in the follow up patches. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 2f764fa87d2a81812b313dd6d998e10126292653) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: introduce vtd_page_walk_infoPeter Xu
During the recursive page walking of IOVA page tables, some stack variables are constant variables and never changed during the whole page walking procedure. Isolate them into a struct so that we don't need to pass those contants down the stack every time and multiple times. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit fe215b0cbb8c1f4b4af0a64aa5c02042080dd537) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: only do page walk for MAP notifiersPeter Xu
For UNMAP-only IOMMU notifiers, we don't need to walk the page tables. Fasten that procedure by skipping the page table walk. That should boost performance for UNMAP-only notifiers like vhost. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 4f8a62a933a79094e44bc1b16b63bb23e62d67b4) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: add iommu lockPeter Xu
SECURITY IMPLICATION: this patch fixes a potential race when multiple threads access the IOMMU IOTLB cache. Add a per-iommu big lock to protect IOMMU status. Currently the only thing to be protected is the IOTLB/context cache, since that can be accessed even without BQL, e.g., in IO dataplane. Note that we don't need to protect device page tables since that's fully controlled by the guest kernel. However there is still possibility that malicious drivers will program the device to not obey the rule. In that case QEMU can't really do anything useful, instead the guest itself will be responsible for all uncertainties. CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Fam Zheng <famz@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 1d9efa73e12ddf361ea997c2d532cc4afa6674d1) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: remove IntelIOMMUNotifierNodePeter Xu
That is not really necessary. Removing that node struct and put the list entry directly into VTDAddressSpace. It simplfies the code a lot. Since at it, rename the old notifiers_list into vtd_as_with_notifiers. CC: QEMU Stable <qemu-stable@nongnu.org> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit b4a4ba0d68f50f218ee3957b6638dbee32a5eeef) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: send PSI always even if across PDEsPeter Xu
SECURITY IMPLICATION: without this patch, any guest with both assigned device and a vIOMMU might encounter stale IO page mappings even if guest has already unmapped the page, which may lead to guest memory corruption. The stale mappings will only be limited to the guest's own memory range, so it should not affect the host memory or other guests on the host. During IOVA page table walking, there is a special case when the PSI covers one whole PDE (Page Directory Entry, which contains 512 Page Table Entries) or more. In the past, we skip that entry and we don't notify the IOMMU notifiers. This is not correct. We should send UNMAP notification to registered UNMAP notifiers in this case. For UNMAP only notifiers, this might cause IOTLBs cached in the devices even if they were already invalid. For MAP/UNMAP notifiers like vfio-pci, this will cause stale page mappings. This special case doesn't trigger often, but it is very easy to be triggered by nested device assignments, since in that case we'll possibly map the whole L2 guest RAM region into the device's IOVA address space (several GBs at least), which is far bigger than normal kernel driver usages of the device (tens of MBs normally). Without this patch applied to L1 QEMU, nested device assignment to L2 guests will dump some errors like: qemu-system-x86_64: VFIO_MAP_DMA: -17 qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000, 0x7f89a920d000) = -17 (File exists) CC: QEMU Stable <qemu-stable@nongnu.org> Acked-by: Jason Wang <jasowang@redhat.com> [peterx: rewrite the commit message] Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 36d2d52bdb45f5b753a61fdaf0fe7891f1f5b61d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: Extend address width to 48 bitsPrasad Singamsetty
The current implementation of Intel IOMMU code only supports 39 bits iova address width. This patch provides a new parameter (x-aw-bits) for intel-iommu to extend its address width to 48 bits but keeping the default the same (39 bits). The reason for not changing the default is to avoid potential compatibility problems with live migration of intel-iommu enabled QEMU guest. The only valid values for 'x-aw-bits' parameter are 39 and 48. After enabling larger address width (48), we should be able to map larger iova addresses in the guest. For example, a QEMU guest that is configured with large memory ( >=1TB ). To check whether 48 bits aw is enabled, we can grep in the guest dmesg output with line: "DMAR: Host address width 48". Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 37f51384ae05bd50f83308339dbffa3e78404874) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20intel-iommu: Redefine macros to enable supporting 48 bit address widthPrasad Singamsetty
The current implementation of Intel IOMMU code only supports 39 bits host/iova address width so number of macros use hard coded values based on that. This patch is to redefine them so they can be used with variable address widths. This patch doesn't add any new functionality but enables adding support for 48 bit address width. Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> (cherry picked from commit 92e5d85e8345a22e87eda940ffe0f6422eb45360) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: Check validity of mh_header_addrKevin Wolf
I couldn't find a case where this prevents something bad from happening that isn't already caught by other checks, but let's err on the safe side and check that mh_header_addr is as expected. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com> (cherry picked from commit dbf2dce7aabb7723542bd182175904846d70b0f9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: Reject kernels exceeding the address spaceKevin Wolf
The code path where mh_load_end_addr is non-zero in the Multiboot header checks that mh_load_end_addr >= mh_load_addr and so mb_load_size is checked. However, mb_load_size is not checked when calculated from the file size, when mh_load_end_addr is 0. If the kernel binary size is larger than can fit in the address space after load_addr, we ended up with a kernel_size that is smaller than load_size, which means that we read the file into a too small buffer. Add a check to reject kernel files with such Multiboot headers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jack Schwartz <jack.schwartz@oracle.com> (cherry picked from commit b17a9054a0652a1481be48a6729e972abf02412f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: fprintf(stderr...) -> error_report()Jack Schwartz
Change all fprintf(stderr...) calls in hw/i386/multiboot.c to call error_report() instead, including the mb_debug macro. Remove the "\n" from strings passed to all modified calls, since error_report() appends one. Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 4b9006a41ea8818f2385ae5228e07f211bb4a33d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: Use header names when displaying fieldsJack Schwartz
Refer to field names when displaying fields in printf and debug statements. Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit ce5eb6dc4dc5652f7e360a1db817f1d5dafab90f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: Remove unused variables from multiboot.cJack Schwartz
Remove unused variables: mh_mode_type, mh_width, mh_height, mh_depth Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 7a2e43cc96fd017883973caf9ee076ae23a3bebd) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2018-06-20multiboot: bss_end_addr can be zeroJack Schwartz
The multiboot spec (https://www.gnu.org/software/grub/manual/multiboot/), section 3.1.3, allows for bss_end_addr to be zero. A zero bss_end_addr signifies there is no .bss section. Suggested-by: Daniel Kiper <daniel.kiper@oracle.com> Signed-off-by: Jack Schwartz <jack.schwartz@oracle.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 2a8fcd119eb7c6bb3837fc3669eb1b2dfb31daf8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-12-01pc: fix crash on attempted cpu unplugIgor Mammedov
when qemu is started with '-no-acpi' CLI option, an attempt to unplug a CPU using device_del results in null pointer dereference at: #0 object_get_class #1 pc_machine_device_unplug_request_cb #2 qmp_marshal_device_del which is caused by pcms->acpi_dev == NULL due to ACPI support being disabled. Considering that ACPI support is necessary for unplug to work, check that it's enabled and fail unplug request gracefully if no acpi device were found. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-11-16NUMA: Enable adding NUMA node implicitlyDou Liyang
Linux and Windows need ACPI SRAT table to make memory hotplug work properly, however currently QEMU doesn't create SRAT table if numa options aren't present on CLI. Which breaks both linux and windows guests in certain conditions: * Windows: won't enable memory hotplug without SRAT table at all * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT table present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers when memory is hotplugged and guest tries to use it with that drivers. Fix above issues by automatically creating a numa node when QEMU is started with memory hotplug enabled but without '-numa' options on CLI. (PS: auto-create numa node only for new machine types so not to break migration). Which would provide SRAT table to guests without explicit -numa options on CLI and would allow: * Windows: to enable memory hotplug * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit allocated buffers that legacy drivers/hw can handle. [Rewritten by Igor] Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Alistair Francis <alistair23@gmail.com> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: Izumi Taku <izumi.taku@jp.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-11-16hw/pci-host: Fix x86 Host Bridges 64bit PCI holeMarcel Apfelbaum
Currently there is no MMIO range over 4G reserved for PCI hotplug. Since the 32bit PCI hole depends on the number of cold-plugged PCI devices and other factors, it is very possible is too small to hotplug PCI devices with large BARs. Fix it by reserving 2G for I4400FX chipset in order to comply with older Win32 Guest OSes and 32G for Q35 chipset. Even if the new defaults of pci-hole64-size will appear in "info qtree" also for older machines, the property was not implemented so no changes will be visible to guests. Note this is a regression since prev QEMU versions had some range reserved for 64bit PCI hotplug. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-11-05pci-assign: RemovePaolo Bonzini
Legacy PCI device assignment has been removed from Linux in 4.12, and had been deprecated 2 years ago there. We can remove it from QEMU as well. The ROM loading code was shared with Xen PCI passthrough, so move it to hw/xen. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-30Merge remote-tracking branch ↵Peter Maydell
'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging x86/cpu/numa queue, 2017-10-27 # gpg: Signature made Fri 27 Oct 2017 15:17:12 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/x86-and-machine-pull-request: (39 commits) x86: Skip check apic_id_limit for Xen numa: fixup parsed NumaNodeOptions earlier mips: r4k: replace cpu_model with cpu_type mips: mipssim: replace cpu_model with cpu_type mips: Magnum/Acer Pica 61: replace cpu_model with cpu_type mips: fulong2e: replace cpu_model with cpu_type mips: malta/boston: replace cpu_model with cpu_type mips: use object_new() instead of gnew()+object_initialize() sparc: leon3: use generic cpu_model parsing sparc: sparc: use generic cpu_model parsing sparc: sun4u/sun4v/niagara: use generic cpu_model parsing sparc: cleanup cpu type name composition tricore: use generic cpu_model parsing tricore: cleanup cpu type name composition unicore32: use generic cpu_model parsing unicore32: cleanup cpu type name composition xtensa: lx60/lx200/ml605/kc705: use generic cpu_model parsing xtensa: sim: use generic cpu_model parsing xtensa: cleanup cpu type name composition sh4: remove SuperHCPUClass::name field ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-27x86: Skip check apic_id_limit for XenLan Tianyu
Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU check for Xen here when vcpu number is more than 255. Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Message-Id: <1502842933-8323-1-git-send-email-tianyu.lan@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-26xen: Log errno rather than return valueRoss Lagerwall
xen_modified_memory() sets errno to communicate what went wrong so log this rather than the return value which is not interesting. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2017-10-19Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* TCG 8-byte atomic accesses bugfix (Andrew) * Report disk rotation rate (Daniel) * Report invalid scsi-disk block size configuration (Mark) * KVM and memory API MemoryListener fixes (David, Maxime, Peter Xu) * x86 CPU hotplug crash fix (Igor) * Load/store API documentation (Peter Maydell) * Small fixes by myself and Thomas * qdev DEVICE_DELETED deferral (Michael) # gpg: Signature made Wed 18 Oct 2017 10:56:24 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (29 commits) scsi: reject configurations with logical block size > physical block size qdev: defer DEVICE_DEL event until instance_finalize() Revert "qdev: Free QemuOpts when the QOM path goes away" qdev: store DeviceState's canonical path to use when unparenting qemu-pr-helper: use new libmultipath API watch_mem_write: implement 8-byte accesses notdirty_mem_write: implement 8-byte accesses memory: reuse section_from_flat_range() kvm: simplify kvm_align_section() kvm: region_add and region_del is not called on updates kvm: fix error message when failing to unregister slot kvm: tolerate non-existing slot for log_start/log_stop/log_sync kvm: fix alignment of ram address memory: call log_start after region_add target/i386: trap on instructions longer than >15 bytes target/i386: introduce x86_ld*_code tco: add trace events docs/devel/loads-stores.rst: Document our various load and store APIs nios2: define tcg_env build: remove CONFIG_LIBDECNUMBER ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-15pc: remove useless hot_add_cpu initialisationLaurent Vivier
Since 4458fb3a79 (pc: Eliminate pc_default_machine_options()), hot_add_cpu is set in pc_machine_class_init(), so we don't need to set it in pc_q35_machine_options(), pc_i440fx_machine_options() and xenfv_machine_options(), except to clear it in pc_i440fx_1_4_machine_opt(). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15isapc: Remove unnecessary migration compatibility codeEduardo Habkost
We don't touch isapc when we change guest ABI and add new entries to PC_COMPAT_* or new PCMachineClass compat flags. This means isapc never guaranteed guest ABI and cross-QEMU-version live migration compatibility. There's no point in keeping code for kvm-pv-eoi and APIC ID compatibility in pc_init_isa(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devicesEduardo Habkost
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of TYPE_PCI_DEVICE, except: 1) The ones that already have INTERFACE_PCIE_DEVICE set: * base-xhci * e1000e * nvme * pvscsi * vfio-pci * virtio-pci * vmxnet3 2) base-pci-bridge Not all PCI bridges are Conventional PCI devices, so INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes that are actually Conventional PCI: * dec-21154-p2p-bridge * i82801b11-bridge * pbm-bridge * pci-bridge The direct subtypes of base-pci-bridge not touched by this patch are: * xilinx-pcie-root: Already marked as PCIe-only. * pcie-pci-bridge: Already marked as PCIe-only. * pcie-port: all non-abstract subtypes of pcie-port are already marked as PCIe-only devices. 3) megasas-base Not all megasas devices are Conventional PCI devices, so the interface names are added to the subclasses registered by megasas_register_types(), according to information in the megasas_devices[] array. "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". Acked-by: Alberto Garcia <berto@igalia.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-15fw_cfg: add write callbackMarc-André Lureau
Reintroduce the write callback that was removed when write support was removed in commit 023e3148567ac898c7258138f8e86c3c2bb40d07. Contrary to the previous callback implementation, the write_cb callback is called whenever a write happened, so handlers must be ready to handle partial write as necessary. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-12pc: make sure that plugged CPUs are of the same typeIgor Mammedov
heterogeneous cpus are not supported and hotplugging different cpu model crashes QEMU: qemu-system-x86_64 -cpu qemu64 -smp 1,maxcpus=2 (qemu) device_add host-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=foo (qemu) info cpus error: failed to get MSR 0x38d qemu-system-x86_64: target/i386/kvm.c:2121: kvm_get_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Aborted (core dumped) Gracefully fail hotplug process in case of user mistake. Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1507638879-200718-1-git-send-email-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02kvmclock: use the updated system_timer_msrJim Somerville
Fixes e2b6c17 (kvmclock: update system_time_msr address forcibly) which makes a call to get the latest value of the address stored in system_timer_msr, but then uses the old address anyway. Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com> Message-Id: <59b67db0bd15a46ab47c3aa657c81a4c11f168ea.1506702472.git.Jim.Somerville@windriver.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-27migration: pre_save return intDr. David Alan Gilbert
Modify the pre_save method on VMStateDescription to return an int rather than void so that it potentially can fail. Changed zillions of devices to make them return 0; the only case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already had an error_report/return case. Note: If you add an error exit in your pre_save you must emit an error_report to say why. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-20Merge remote-tracking branch ↵Peter Maydell
'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine/CPU/NUMA queue, 2017-09-19 # gpg: Signature made Tue 19 Sep 2017 21:17:01 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: MAINTAINERS: Update git URLs for my trees hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM NUMA: Replace MAX_NODES with nb_numa_nodes in for loop numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly pc: use generic cpu_model parsing vl.c: convert cpu_model to cpu type and set of global properties before machine_init() cpu: make cpu_generic_init() abort QEMU on error qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts hostmem-file: Add "discard-data" option osdep: Define QEMU_MADV_REMOVE vl: Clean up user-creatable objects when exiting Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-19hw/acpi-build: Fix SRAT memory building in case of node 0 without RAMEduardo Habkost
Currently, Using the fisrt node without memory on the machine makes QEMU unhappy. With this example command line: ... \ -m 1024M,slots=4,maxmem=32G \ -numa node,nodeid=0 \ -numa node,mem=1024M,nodeid=1 \ -numa node,nodeid=2 \ -numa node,nodeid=3 \ Guest reports "No NUMA configuration found" and the NUMA topology is wrong. This is because when QEMU builds ACPI SRAT, it regards node 0 as the default node to deal with the memory hole(640K-1M). this means the node0 must have some memory(>1M), but, actually it can have no memory. Fix this problem by cut out the 640K hole in the same way the PCI 4G hole does. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19numa: cpu: calculate/set default node-ids after all -numa CLI options are parsedIgor Mammedov
Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile since defaults calculation uses nb_numa_nodes but callback might be potentially called early before all -numa CLI options are parsed, which would lead to cpus assigned only upto nb_numa_nodes at the time possible_cpu_arch_ids() is called. Issue was introduced by (7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus) and for example CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in possible_cpus array for every non explicitly mapped CPU to the first node. Issue is not visible to guest nor to mgmt interface due to 1) implictly mapped cpus are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However it's fragile to rely on late execution of possible_cpu_arch_ids(), therefore add machine specific callback that returns node-id for CPU and use it to calculate/ set defaults at machine_numa_finish_init() time when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19General warn report fixupsAlistair Francis
Tidy up some of the warn_report() messages after having converted them to use warn_report(). Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <9cb1d23551898c9c9a5f84da6773e99871285120.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19Convert multi-line fprintf() to warn_report()Alistair Francis
Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands: find ./* -type f -exec sed -i \ 'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + Indentation fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 charecters. Some of the lines with newlines in the middle of the string were also manually edit to avoid checkpatch errrors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch, to keep this patch mechanical this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19Convert single line fprintf(.../n) to warn_report()Alistair Francis
Convert all the single line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using this command: find ./* -type f -exec sed -i \ 's|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig' \ {} + Some of the lines were manually edited to reduce the line length to below 80 charecters. The #include lines were manually updated to allow the code to compile. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: James Hogan <james.hogan@imgtec.com> [mips] Message-Id: <ae8f8a7f0a88ded61743dff2adade21f8122a9e7.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19hw/i386: Improve some of the warning messagesAlistair Francis
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1d6ef2ccd9667878ed5820fcf17eef35957ea5d8.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19multiboot: validate multiboot header address valuesPrasad J Pandit
While loading kernel via multiboot-v1 image, (flags & 0x00010000) indicates that multiboot header contains valid addresses to load the kernel image. These addresses are used to compute kernel size and kernel text offset in the OS image. Validate these address values to avoid an OOB access issue. This is CVE-2017-14167. Reported-by: Thomas Garnier <thgarnie@google.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20170907063256.7418-1-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19pc: use generic cpu_model parsingIgor Mammedov
define default CPU type in generic way in pc_machine_class_init() and let common machine code to handle cpu_model parsing Patch also introduces TARGET_DEFAULT_CPU_TYPE define for 2 purposes: * make foo_machine_class_init() look uniform on every target * use define in [bsd|linux]-user targets to pick default cpu type Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1505318697-77161-5-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-08intel_iommu: fix missing BQL in pt fast pathPeter Xu
In vtd_switch_address_space() we did the memory region switch, however it's possible that the caller of it has not taken the BQL at all. Make sure we have it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08hw/acpi: Move acpi_set_pci_info to pcihpAnthony PERARD
HW part of ACPI PCI hotplug in QEMU depends on ACPI_PCIHP_PROP_BSEL being set on a PCI bus that supports ACPI hotplug. It should work regardless of the source of ACPI tables (QEMU generator/legacy SeaBIOS/Xen). So move ACPI_PCIHP_PROP_BSEL initialization into HW ACPI implementation part from QEMU's ACPI table generator. To do PCI passthrough with Xen, the property ACPI_PCIHP_PROP_BSEL needs to be set, but this was done only when ACPI tables are built which is not needed for a Xen guest. The need for the property starts with commit "pc: pcihp: avoid adding ACPI_PCIHP_PROP_BSEL twice" (f0c9d64a68b776374ec4732424a3e27753ce37b6). Adding find_i440fx into stubs so that mips-softmmu target can be built. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-09-08pc: add 2.11 machine typesMarcel Apfelbaum
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-31i386: replace g_malloc()+memcpy() with g_memdup()Marc-André Lureau
I found these pattern via grepping the source tree. I don't have a coccinelle script for it! Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-08-23numa: Move numa_legacy_auto_assign_ram to pc-i440fx-2.9Eduardo Habkost
The 'm->numa_auto_assign_ram = numa_legacy_auto_assign_ram;' line was supposed to be in pc_i440fx_2_9_machine_options() (see commit 3bfe5716 "numa: equally distribute memory on nodes"), but the merge commit adb354dd ("Merge remote-tracking branch 'mst/tags/for_upstream' into staging") moved it to the pc_i440fx_2_10_machine_options(). Move the line back to pc_i440fx_2_9_machine_options(). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20170818190943.23858-1-ehabkost@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-08-22hw/ppc/spapr: Fix segfault when instantiating a 'pc-dimm' without 'memdev'Thomas Huth
QEMU currently crashes when trying to use a 'pc-dimm' on the pseries machine without specifying its 'memdev' property. This happens because pc_dimm_get_memory_region() does not check whether the 'memdev' property has properly been set by the user. Looking closer at this function, it's also obvious that it is using &error_abort to call another function - and this is bad in a function that is used in the hot-plugging calling chain since this can also cause QEMU to exit unexpectedly. So let's fix these issues in a proper way now: Add a "Error **errp" parameter to pc_dimm_get_memory_region() which we use in case the 'memdev' property has not been set by the user, and which we can use instead of the &error_abort, and change the callers of get_memory_region() to make use of this "errp" parameter for proper error checking. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-08-08hw/i386: allow SHPC for Q35 machineAleksandr Bezzubikov
Unmask previously masked SHPC feature in _OSC method. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02pc: acpi: force FADT rev1 for 440fx based machine typesIgor Mammedov
w2k used to boot on QEMU until revision of FADT has been bumped to rev3 (commit 77af8a2b hw/i386: Use Rev3 FADT (ACPI 2.0) instead of Rev1 to improve guest OS support.) Keep PC machine at rev1 to remain compatible and Q35 at rev3 where w2k isn't supported anyway so OSX could run as well. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02pc: make 'pc.rom' readonly when machine has PCI enabledIgor Mammedov
looking at bios ROM mapping in QEMU it seems that only isapc (i.e. not PCI enabled machine) requires ROM being mapped as RW in other cases BIOS is mapped as RO. Do the same for option ROM 'pc.rom' when machine has PCI enabled. As useful side-effect pc.rom MemoryRegion stops being put in vhost memory map (filtered out by vhost_section()), which reduces number of entries by 1. Coincidentally it fixes migration failure reported in "[PATCH V2] vhost: fix a migration failed because of vhost region merge" where following destination CLI with /sys/module/vhost/parameters/max_mem_regions = 8 export DIMMSCOUNT=6 QEMU -enable-kvm \ -netdev type=tap,id=guest0,vhost=on,script=no,vhostforce \ -device virtio-net-pci,netdev=guest0 \ -m 256,slots=256,maxmem=2G \ `i=0; while [ $i -lt $DIMMSCOUNT ]; do echo \ "-object memory-backend-ram,id=m$i,size=128M \ -device pc-dimm,id=d$i,memdev=m$i"; i=$(($i + 1)); \ done` will fail to startup with error: "-device pc-dimm,id=d5,memdev=m5: a used vhost backend has no free memory slots left" while it's possible to add the 6th DIMM during hotplug on source. Issue is caused by the fact that number of entries in vhost map is bigger on 1 entry, when -device is processed, than after guest boots up, and that offending entry belongs to 'pc.rom', it's not like vhost intends to do IO in ROM range so making it RO hides region from vhost and makes number of entries in vhost memory map at -device/machine_done time match number of entries after guest boots. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reported-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-08-02intel_iommu: use access_flags for iotlbPeter Xu
It was cached by read/write separately. Let's merge them. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>