aboutsummaryrefslogtreecommitdiff
path: root/hw/i386
AgeCommit message (Collapse)Author
2019-01-07hw: apply machine compat properties without touching globalsMarc-André Lureau
Similarly to accel properties, move compat properties out of globals registration, and apply the machine compat properties during device_post_init(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>
2019-01-07machines: replace COMPAT define with a static arrayMarc-André Lureau
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com>
2018-12-20x86-iommu: turn on IR by default if properPeter Xu
When the user didn't specify "intremap" for the IOMMU device, we turn it on by default if it is supported. This will turn IR on for the default Q35 platform as long as the IOMMU device is specified on new kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-20x86-iommu: switch intr_supported to OnOffAuto typePeter Xu
Switch the intr_supported variable from a boolean to OnOffAuto type so that we can know whether the user specified it or not. With that we'll have a chance to help the user to choose more wisely where possible. Introduce x86_iommu_ir_supported() to mask these changes. No functional change at all. Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-20q35: set split kernel irqchip as defaultPeter Xu
Starting from QEMU 4.0, let's specify "split" as the default value for kernel-irqchip. So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N (omitting all the "kernel_irqchip_" prefix) Note that this will let the default q35 machine type to depend on Linux version 4.4 or newer because that's where split irqchip is introduced in kernel. But it's fine since we're boosting supported Linux version for QEMU 4.0 to around Linux 4.5. For more information please refer to the discussion on AMD's RDTSCP: https://lore.kernel.org/lkml/20181210181328.GA762@zn.tnic/ Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-20hw/i386: Remove deprecated machines pc-0.10 and pc-0.11Thomas Huth
They've been deprecated for two releases and nobody complained that they are still required anymore, so it's time to remove these now. And while we're at it, mark the other remaining old 0.x machine types as deprecated (since they can not properly be used for live-migration anyway). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2018-12-19hw: acpi: Export and share the ARM RSDP buildSamuel Ortiz
Now that build_rsdp() supports building both legacy and current RSDP tables, we can move it to a generic folder (hw/acpi) and have the i386 ACPI code reuse it in order to reduce code duplication. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com>
2018-12-19hw: i386: Use correct RSDT length for checksumIgor Mammedov
AcpiRsdpDescriptor describes revision 2 RSDP table so using sizeof(*rsdp) for checksum calculation isn't correct since we are adding extra 16 bytes. But acpi_data_push() zeroes out table, so just by luck we are summing up exta zeros which still yelds correct checksum. Fix it up by explicitly stating table size instead of using pointer arithmetics on stucture. PS: Extra 16 bytes are still wasted, but droping them will break migration for machines older than 2.3 due to size mismatch, for 2.3 and older it's not an issue since they are using resizable memory regions (a1666142d) for ACPI blobs. So keep wasting memory to avoid breaking old machines. Fixes: 72c194f7e (i386: ACPI table generation code from seabios) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19hw: acpi: The RSDP build API can return voidSamuel Ortiz
For both x86 and ARM architectures, the internal RSDP build API can return void as the current return value is unused. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19intel_iommu: remove "x-" prefix for "aw-bits"Peter Xu
We're going to have 57bits aw-bits support sooner. It's possibly time to remove the "x-" prefix. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19intel_iommu: dma read/write draining supportPeter Xu
Support DMA read/write draining should be easy for existing VT-d emulation since the emulation itself does not have any request queue there so we don't need to do anything to flush the un-commited queue. What we need to do is to declare the support. These capabilities are required to pass Windows SVVP test program. It is verified that when with parameters "x-aw-bits=48,caching-mode=off" we can pass the Windows SVVP test with this patch applied. Otherwise we'll fail with: IOMMU[0] - DWD (DMA write draining) not supported IOMMU[0] - DWD (DMA read draining) not supported Segment 0 has no DMA remapping capable IOMMU units However since these bits are not declared support for QEMU<=3.1, we'll need a compatibility bit for it and we turn this on by default only for QEMU>=4.0. Please refer to VT-d spec 6.5.4 for more information. CC: Yu Wang <wyu@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1654550 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19intel_iommu: convert invalid traces into error reportsPeter Xu
Report more *_invalid() tracepoints to error_report_once() so that we can detect issues even without tracing enabled. Drop those tracepoints. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19intel_iommu: dump correct iova when failedPeter Xu
The iotlb.iova can be zero if failure really happened. Dump the addr instead. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19hw/smbios: Move to the hw/firmware/ subdirectoryPhilippe Mathieu-Daudé
SMBIOS is just another firmware interface used by some QEMU models. We will later introduce more firmware interfaces in this subdirectory. Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-18qmp: query-current-machine with wakeup-suspend-supportDaniel Henrique Barboza
When issuing the qmp/hmp 'system_wakeup' command, what happens in a nutshell is: - qmp_system_wakeup_request set runstate to RUNNING, sets a wakeup_reason and notify the event - in the main_loop, all vcpus are paused, a system reset is issued, all subscribers of wakeup_notifiers receives a notification, vcpus are then resumed and the wake up QAPI event is fired Note that this procedure alone doesn't ensure that the guest will awake from SUSPENDED state - the subscribers of the wake up event must take action to resume the guest, otherwise the guest will simply reboot. At this moment, only the ACPI machines via acpi_pm1_cnt_init and xen_hvm_init have wake-up from suspend support. However, only the presence of 'system_wakeup' is required for QGA to support 'guest-suspend-ram' and 'guest-suspend-hybrid' at this moment. This means that the user/management will expect to suspend the guest using one of those suspend commands and then resume execution using system_wakeup, regardless of the support offered in system_wakeup in the first place. This patch creates a new API called query-current-machine [1], that holds a new flag called 'wakeup-suspend-support' that indicates if the guest supports wake up from suspend via system_wakeup. The machine is considered to implement wake-up support if a call to a new 'qemu_register_wakeup_support' is made during its init, as it is now being done inside acpi_pm1_cnt_init and xen_hvm_init. This allows for any other machine type to declare wake-up support regardless of ACPI state or wakeup_notifiers subscription, making easier for newer implementations that might have their own mechanisms in the future. This is the expected output of query-current-machine when running a x86 guest: {"execute" : "query-current-machine"} {"return": {"wakeup-suspend-support": true}} Running the same x86 guest, but with the --no-acpi option: {"execute" : "query-current-machine"} {"return": {"wakeup-suspend-support": false}} This is the output when running a pseries guest: {"execute" : "query-current-machine"} {"return": {"wakeup-suspend-support": false}} With this extra tool, management can avoid situations where a guest that does not have proper suspend/wake capabilities ends up in inconsistent state (e.g. https://github.com/open-power-host-os/qemu/issues/31). [1] the decision of creating the query-current-machine API is based on discussions in the QEMU mailing list where it was decided that query-target wasn't a proper place to store the wake-up flag, neither was query-machines because this isn't a static property of the machine object. This new API can then be used to store other dynamic machine properties that are scattered around the code ATM. More info at: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04235.html Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20181205194701.17836-2-danielhb413@gmail.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-12-14hw/i386/multiboot.c: Don't use load_image()Peter Maydell
The load_image() function is deprecated, as it does not let the caller specify how large the buffer to read the file into is. Instead use load_image_size(). While we are converting the code, add the missing error check. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20181130151712.2312-7-peter.maydell@linaro.org
2018-12-14hw/i386/pc.c: Don't use load_image()Peter Maydell
The load_image() function is deprecated, as it does not let the caller specify how large the buffer to read the file into is. Use the glib g_file_get_contents() function instead, which does the whole "allocate memory for the file and read it in" operation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20181130151712.2312-6-peter.maydell@linaro.org
2018-12-11pc: Use default_machine_opts to set suppress_vmdescEduardo Habkost
Instead of setting suppress_vmdesc at instance_init time, set default_machine_opts on pc_i440fx_2_2_machine_options() to implement equivalent behavior. This will let us eliminate the need for pc_compat_*() functions for PC machine-types. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181205205827.19387-6-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-12-11q35/440fx/arm/spapr: Add QEMU 4.0 machine typeAlex Williamson
Including all machine types that might have a pcie-root-port. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <154394083644.28192.8501647946108201466.stgit@gimli.home> Reviewed-by: Eric Auger <eric.auger@redhat.com> [ehabkost: fixed accidental recursion at spapr_machine_3_1_class_options()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-12-11i386: Rename bools in PCMachineState to end in _enabledCorey Minyard
This makes their function more clear and prevents conflicts when adding the actual devices to the machine state, if necessary. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20181107152434.22219-1-minyard@acm.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-11-20hw/i386: add pc-i440fx-3.1 & pc-q35-3.1Marc-André Lureau
We have a couple of PC_COMPAT_3_0, so we should have 3.1 PC machines, and update the 3.0 machines to make use of those. Fixes a "Known issue" from https://wiki.qemu.org/Planning/3.1. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20181120132604.22854-1-marcandre.lureau@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-11-05x86_iommu/amd: Enable Guest virtual APIC supportSingh, Brijesh
Now that amd-iommu support interrupt remapping, enable the GASup in IVRS table and GASup in extended feature register to indicate that IOMMU support guest virtual APIC mode. GASup provides option to guest OS to make use of 128-bit IRTE. Note that the GAMSup is set to zero to indicate that amd-iommu does not support guest virtual APIC mode (aka AVIC) which would be used for the nested VMs. See Table 21 from IOMMU spec for interrupt virtualization controls Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu/amd: Add interrupt remap support when VAPIC is enabledSingh, Brijesh
Emulate the interrupt remapping support when guest virtual APIC is enabled. For more information refer: IOMMU spec rev 3.0 (section 2.2.5.2) When VAPIC is enabled, it uses interrupt remapping as defined in Table 22 and Figure 17 from IOMMU spec. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05i386: acpi: add IVHD device entry for IOAPICSingh, Brijesh
When interrupt remapping is enabled, add a special IVHD device (type IOAPIC). Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu/amd: Add interrupt remap support when VAPIC is not enabledSingh, Brijesh
Emulate the interrupt remapping support when guest virtual APIC is not enabled. For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1 When VAPIC is not enabled, it uses interrupt remapping as defined in Table 20 and Figure 15 from IOMMU spec. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu/amd: Prepare for interrupt remap supportSingh, Brijesh
Register the interrupt remapping callback and read/write ops for the amd-iommu-ir memory region. amd-iommu-ir is set to higher priority to ensure that this region won't be masked out by other memory regions. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu/amd: make the address space naming consistent with intel-iommuSingh, Brijesh
To be consistent with intel-iommu: - rename the address space to use '_' instead of '-' - update the memory region relationships Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu/amd: remove V=1 check from amdvi_validate_dte()Singh, Brijesh
Currently, the amdvi_validate_dte() assumes that a valid DTE will always have V=1. This is not true. The V=1 means that bit[127:1] are valid. A valid DTE can have IV=1 and V=0 (i.e address translation disabled and interrupt remapping enabled) Remove the V=1 check from amdvi_validate_dte(), make the caller responsible to check for V or IV bits. This also fixes a bug in existing code that when error is detected during the translation we'll fail the translation instead of assuming a passthrough mode. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu: move vtd_generate_msi_message in common fileSingh, Brijesh
The vtd_generate_msi_message() in intel-iommu is used to construct a MSI Message from IRQ. A similar function will be needed when we add interrupt remapping support in amd-iommu. Moving the function in common file to avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message(). There is no logic changes in the code flow. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05x86_iommu: move the kernel-irqchip check in common codeSingh, Brijesh
Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD platforms. Move the check in common place. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05intel_iommu: handle invalid ce for shadow syncPeter Xu
We should handle VTD_FR_CONTEXT_ENTRY_P properly when synchronizing shadow page tables. Having invalid context entry there is perfectly valid when we move a device out of an existing domain. When that happens, instead of posting an error we invalidate the whole region. Without this patch, QEMU will crash if we do these steps: (1) start QEMU with VT-d IOMMU and two 10G NICs (ixgbe) (2) bind the NICs with vfio-pci in the guest (3) start testpmd with the NICs applied (4) stop testpmd (5) rebind the NIC back to ixgbe kernel driver The patch should fix it. Reported-by: Pei Zhang <pezhang@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1627272 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05intel_iommu: move ce fetching out when sync shadowPeter Xu
There are two callers for vtd_sync_shadow_page_table_range(): one provided a valid context entry and one not. Move that fetching operation into the caller vtd_sync_shadow_page_table() where we need to fetch the context entry. Meanwhile, remove the error_report_once() directly since we're already tracing all the error cases in the previous call. Instead, return error number back to caller. This will not change anything functional since callers are dropping it after all. We do this move majorly because we want to do something more later in vtd_sync_shadow_page_table(). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05intel_iommu: better handling of dmar state switchPeter Xu
QEMU is not handling the global DMAR switch well, especially when from "on" to "off". Let's first take the example of system reset. Assuming that a guest has IOMMU enabled. When it reboots, we will drop all the existing DMAR mappings to handle the system reset, however we'll still keep the existing memory layouts which has the IOMMU memory region enabled. So after the reboot and before the kernel reloads again, there will be no mapping at all for the host device. That's problematic since any software (for example, SeaBIOS) that runs earlier than the kernel after the reboot will assume the IOMMU is disabled, so any DMA from the software will fail. For example, a guest that boots on an assigned NVMe device might fail to find the boot device after a system reboot/reset and we'll be able to observe SeaBIOS errors if we capture the debugging log: WARNING - Timeout at nvme_wait:144! Meanwhile, we should see DMAR errors on the host of that NVMe device. It's the DMA fault that caused a NVMe driver timeout. The correct fix should be that we do proper switching of device DMA address spaces when system resets, which will setup correct memory regions and notify the backend of the devices. This might not affect much on non-assigned devices since QEMU VT-d emulation will assume a default passthrough mapping if DMAR is not enabled in the GCMD register (please refer to vtd_iommu_translate). However that's required for an assigned devices, since that'll rebuild the correct GPA to HPA mapping that is needed for any DMA operation during guest bootstrap. Besides the system reset, we have some other places that might change the global DMAR status and we'd better do the same thing there. For example, when we change the state of GCMD register, or the DMAR root pointer. Do the same refresh for all these places. For these two places we'll also need to explicitly invalidate the context entry cache and iotlb cache. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1625173 CC: QEMU Stable <qemu-stable@nongnu.org> Reported-by: Cong Li <coli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> -- v2: - do the same for GCMD write, or root pointer update [Alex] - test is carried out by me this time, by observing the vtd_switch_address_space tracepoint after system reboot v3: - rewrite commit message as suggested by Alex Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-05intel_iommu: introduce vtd_reset_caches()Peter Xu
Provide the function and use it in vtd_init(). Used to reset both context entry cache and iotlb cache for the whole IOMMU unit. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-24pc-dimm: pass PCDIMMDevice to pc_dimm_.*plugDavid Hildenbrand
We're plugging/unplugging a PCDIMMDevice, so directly pass this type instead of a more generic DeviceState. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20181005092024.14344-5-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-10-19pc: Fix machine property nvdimm-persistence error handlingMarkus Armbruster
Calling error_report() in a function that takes an Error ** argument is suspicious. pc.c's pc_machine_set_nvdimm_persistence() does that, and then exit()s. Wrong. Attempting to set machine property nvdimm-persistence to a bad value instantly kills the VM: $ qemu-system-x86_64 -nodefaults -S -display none -qmp stdio {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 3}, "package": "v3.0.0-837-gc5e4e49258"}, "capabilities": []}} {"execute": "qmp_capabilities"} {"return": {}} {"execute": "qom-set", "arguments": {"path": "/machine", "property": "nvdimm-persistence", "value": "instadeath"}} -machine nvdimm-persistence=instadeath: unsupported option $ echo $? 1 Broken when commit 11c39b5cd96 (v3.0.0) replaced error_propagate(); return by error_report(); exit() instead of error_setg(); return. Fix that. Fixes: 11c39b5cd966ddc067a1ca0c5392ec9b666c45b7 Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181017082702.5581-10-armbru@redhat.com>
2018-10-02kvmclock: run KVM_KVMCLOCK_CTRL ioctl in vcpu threadYongji Xie
According to KVM API Documentation, we should only run vcpu ioctls from the same thread that was used to create the vcpu. This patch makes KVM_KVMCLOCK_CTRL ioctl consistent with the Documentation. No functional change. Signed-off-by: Yongji Xie <xieyongji@baidu.com> Signed-off-by: Chai Wen <chaiwen@baidu.com> Message-Id: <1531315364-2551-1-git-send-email-xieyongji@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yongji Xie <elohimes@gmail.com>
2018-10-02change get_image_size return type to int64_tLi Zhijian
Previously, if the size of initrd >=2G, qemu exits with error: root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic qemu: error reading initrd large.cgz: No such file or directory root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz 2.5G large.cgz this patch changes the caller side that use this function to calculate size of initrd file as well. v2: update error message and int64_t printing format Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <1536833233-14121-1-git-send-email-lizhijian@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-25Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-09-24' ↵Peter Maydell
into staging Error reporting & miscellaneous patches for 2018-09-24 # gpg: Signature made Mon 24 Sep 2018 16:16:50 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2018-09-24: MAINTAINERS: Fix F: patterns that don't match anything Drop "qemu:" prefix from error_report() arguments qemu-error: make use of {error, warn}_report_once_cond qemu-error: add {error, warn}_report_once_cond Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-24Drop "qemu:" prefix from error_report() argumentsMao Zhongyi
error_report and friends already add a "qemu-system-xxx" prefix to the string, so a "qemu:" prefix is redundant in the string. Just drop it. Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1537495530-580-1-git-send-email-maozhongyi@cmss.chinamobile.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-09-07pc: acpi: revert back to 1 SRAT entry for hotpluggable areaIgor Mammedov
Commit 10efd7e108 "pc: acpi: fix memory hotplug regression by reducing stub SRAT entry size" attemped to fix hotplug regression introduced by 848a1cc1e "hw/acpi-build: build SRAT memory affinity structures for DIMM devices" fixed issue for Windows/3.0+ linux kernels, however it regressed 2.6 based kernels (RHEL6) to the point where guest might crash at boot. Reason is that 2.6 kernel discards SRAT table due too small last entry which down the road leads to crashes. Hack I've tried in 10efd7e108 is also not ACPI spec compliant according to which whole possible RAM should be described in SRAT. Revert 10efd7e108 to fix regression for 2.6 based kernels. With 10efd7e108 reverted, I've also tried splitting SRAT table statically in different ways %/node and %/slot but Windows still fails to online 2nd pc-dimm hot-plugged into node 0 (as described in 10efd7e108) and sometimes even coldplugged pc-dimms where affected with static SRAT partitioning. The only known so far way where Windows stays happy is when we have 1 SRAT entry in the last node covering all hotplug area. Revert 848a1cc1e until we come up with a way to avoid regression on Windows with hotplug area split in several entries. Tested this with 2.6/3.0 based kernels (RHEL6/7) and WS20[08/12/12R2/16]). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-27intel-iommu: replace more vtd_err_* tracesPeter Xu
Replace all the trace_vtd_err_*() hooks with the new error_report_once() since they are similar to trace_vtd_err() - dumping the first error would be mostly enough, then we have them on by default too. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815095328.32414-4-peterx@redhat.com> [Use "%x" instead of "%" PRIx16 to print uint16_t, whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-27intel-iommu: start to use error_report_oncePeter Xu
Replace existing trace_vtd_err() with error_report_once() then stderr will capture something if any of the error happens, meanwhile we don't suffer from any DDOS. Then remove the trace point. Since at it, provide more information where proper (now we can pass parameters into the report function). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815095328.32414-3-peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> [Two format strings fixed, whitespace tidied up] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-23pc-dimm: assign and verify the "addr" property during pre_plugDavid Hildenbrand
We can assign and verify the address before realizing and trying to plug. reading/writing the address property should never fail for DIMMs, so let's reduce error handling a bit by using &error_abort. Getting access to the memory region now might however fail. So forward errors from get_memory_region() properly. As all memory devices should use the alignment of the underlying memory region for guest physical address asignment, do detection of the alignment in pc_dimm_pre_plug(), but allow pc.c to overwrite the alignment for compatibility handling. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23pc: drop memory region alignment check for 0David Hildenbrand
All applicable memory regions always have an alignment > 0. All memory backends result in file_ram_alloc() or qemu_anon_ram_alloc() getting called, setting the alignment to > 0. So a PCDIMM memory region always has an alignment > 0. NVDIMM copy the alignment of the original memory memory region into the handcrafted memory region that will be used at this place. So the check for 0 can be dropped and we can reduce the special handling. Dropping this check makes factoring out of alignment handling easier as compat handling only has to look at pcmc->enforce_aligned_dimm and not care about the alignment of the memory region. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-4-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-23pc-dimm: assign and verify the "slot" property during pre_plugDavid Hildenbrand
We can assign and verify the slot before realizing and trying to plug. reading/writing the slot property should never fail, so let's reduce error handling a bit by using &error_abort. To do this during pre_plug, add and use (x86, ppc) pc_dimm_pre_plug(). Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180801133444.11269-2-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-03pc: acpi: fix memory hotplug regression by reducing stub SRAT entry sizeIgor Mammedov
Commit 848a1cc1e (hw/acpi-build: build SRAT memory affinity structures for DIMM devices) broke the first dimm hotplug in following cases: 1: there is no coldplugged dimm in the last numa node but there is a coldplugged dimm in another node -m 4096,slots=4,maxmem=32G \ -object memory-backend-ram,id=m0,size=2G \ -device pc-dimm,memdev=m0,node=0 \ -numa node,nodeid=0 \ -numa node,nodeid=1 2: if order of dimms on CLI is: 1st plugged dimm in node1 2nd plugged dimm in node0 -m 4096,slots=4,maxmem=32G \ -object memory-backend-ram,size=2G,id=m0 \ -device pc-dimm,memdev=m0,node=1 \ -object memory-backend-ram,id=m1,size=2G \ -device pc-dimm,memdev=m1,node=0 \ -numa node,nodeid=0 \ -numa node,nodeid=1 (qemu) object_add memory-backend-ram,id=m2,size=1G (qemu) device_add pc-dimm,memdev=m2,node=0 the first DIMM hotplug to any node except the last one fails (Windows is unable to online it). Length reduction of stub hotplug memory SRAT entry, fixes issue for some reason. RHBZ: 1609234 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-08-03hw/acpi-build: Add a check for memory-less NUMA nodesDou Liyang
Currently, Qemu ACPI builder doesn't consider the memory-less NUMA nodes, eg: -m 4G,slots=4,maxmem=8G \ -numa node,nodeid=0 \ -numa node,nodeid=1,mem=2G \ -numa node,nodeid=2,mem=2G \ -numa node,nodeid=3\ Guest Linux will report [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0xffffffffffffffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00000000-0x0009ffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00100000-0x7fffffff] [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x80000000-0xbfffffff] [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x100000000-0x13fffffff] [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x140000000-0x13fffffff] [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x140000000-0x33fffffff] hotplug [mem 0x00000000-0xffffffffffffffff] and [mem 0x140000000-0x13fffffff] are bogus. Add a check to avoid building srat memory for memory-less NUMA nodes, also update the test file. Now the info in guest linux will be [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00000000-0x0009ffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x00100000-0x7fffffff] [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x80000000-0xbfffffff] [ 0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x100000000-0x13fffffff] [ 0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x140000000-0x33fffffff] hotplug Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-07-17i386: only parse the initrd_filename once for multiboot modulesDaniel P. Berrangé
The multiboot code parses the initrd_filename twice, first to count how many entries there are, and second to process each entry. This changes the first loop to store the parse module names in a list, and the second loop can now use these names. This avoids having to pass NULL to the get_opt_value() method which means it can safely assume a non-NULL param. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180514171913.17664-3-berrange@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-17i386: fix regression parsing multiboot initrd modulesDaniel P. Berrangé
The logic for parsing the multiboot initrd modules was messed up in commit 950c4e6c94b15cd0d8b63891dddd7a8dbf458e6a Author: Daniel P. Berrangé <berrange@redhat.com> Date: Mon Apr 16 12:17:43 2018 +0100 opts: don't silently truncate long option values Causing the length to be undercounter, and the number of modules over counted. It also passes NULL to get_opt_value() which was not robust at accepting a NULL value. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180514171913.17664-2-berrange@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>