aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2021-03-22acpi: Set proper maximum size for "etc/acpi/rsdp" blobDavid Hildenbrand
Let's also set a maximum size for "etc/acpi/rsdp", so the maximum size doesn't get implicitly set based on the initial table size. In my experiments, the table size was in the range of 22 bytes, so a single page (== what we used until now) seems to be good enough. Now that we have defined maximum sizes for all currently used table types, let's assert that we catch usage with new tables that need a proper maximum size definition. Also assert that our initial size does not exceed the maximum size; while qemu_ram_alloc_internal() properly asserts that the initial RAMBlock size is <= its maximum size, the result might differ when the host page size is bigger than 4k. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-5-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: Move maximum size logic into acpi_add_rom_blob()David Hildenbrand
We want to have safety margins for all tables based on the table type. Let's move the maximum size logic into acpi_add_rom_blob() and make it dependent on the table name, so we don't have to replicate for each and every instance that creates such tables. Suggested-by: Laszlo Ersek <lersek@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-4-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22microvm: Don't open-code "etc/table-loader"David Hildenbrand
Let's just reuse ACPI_BUILD_LOADER_FILE. Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-3-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: Set proper maximum size for "etc/table-loader" blobDavid Hildenbrand
The resizeable memory region / RAMBlock that is created for the cmd blob has a maximum size of whole host pages (e.g., 4k), because RAMBlocks work on full host pages. In addition, in i386 ACPI code: acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE); makes sure to align to multiples of 4k, padding with 0. For example, if our cmd_blob is created with a size of 2k, the maximum size is 4k - we cannot grow beyond that. Growing might be required due to guest action when rebuilding the tables, but also on incoming migration. This automatic generation of the maximum size used to be sufficient, however, there are cases where we cross host pages now when growing at runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when trying to resize the resizeable memory region / RAMBlock: $ build/qemu-system-x86_64 --enable-kvm \ -machine q35,nvdimm=on \ -smp 1 \ -cpu host \ -m size=2G,slots=8,maxmem=4G \ -object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \ -device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \ -nodefaults \ -device vmgenid \ -device intel-iommu Results in: Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument In this configuration, we consume exactly 4k (32 entries, 128 bytes each) when creating the VM. However, once the guest boots up and maps the MCFG, we also create the MCFG table and end up consuming 2 additional entries (pointer + checksum) -- which is where we try resizing the memory region / RAMBlock, however, the maximum size does not allow for it. Currently, we get the following maximum sizes for our different mutable tables based on behavior of resizeable RAMBlock: hw table max_size ------- --------------------------------------------------------- virt "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) virt "etc/table-loader" HOST_PAGE_ALIGN(initial_size) virt "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) i386 "etc/table-loader" HOST_PAGE_ALIGN(initial_size) i386 "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000) microvm "etc/table-loader" HOST_PAGE_ALIGN(initial_size) microvm "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size) Let's set the maximum table size for "etc/table-loader" to 64k, so we can properly grow at runtime, which should be good enough for the future. Migration is not concerned with the maximum size of a RAMBlock, only with the used size - so existing setups are not affected. Of course, we cannot migrate a VM that would have crash when started on older QEMU from new QEMU to older QEMU without failing early on the destination when synchronizing the RAM state: qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram' qemu-system-x86_64: load of migration failed: Invalid argument We'll refactor the code next, to make sure we get rid of this implicit behavior for "etc/acpi/rsdp" as well and to make the code easier to grasp. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Shannon Zhao <shannon.zhaosl@gmail.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Laszlo Ersek <lersek@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210304105554.121674-2-david@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: acpi: add _DSM method to PCI devicesIgor Mammedov
Implement _DSM according to: PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems and wire it up to cold and hot-plugged PCI devices. Feature depends on ACPI hotplug being enabled (as that provides PCI devices descriptions in ACPI and MMIO registers that are reused to fetch acpi-index). acpi-index should work for - cold plugged NICs: $QEMU -device e1000,acpi-index=100 => 'eno100' - hot-plugged (monitor) device_add e1000,acpi-index=200,id=remove_me => 'eno200' - re-plugged (monitor) device_del remove_me (monitor) device_add e1000,acpi-index=1 => 'eno1' Windows also sees index under "PCI Label Id" field in properties dialog but otherwise it doesn't seem to have any effect. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-6-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22acpi: add aml_to_decimalstring() and aml_call6() helpersIgor Mammedov
it will be used by follow up patches Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-5-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: acpi: ensure that acpi-index is uniqueIgor Mammedov
it helps to avoid device naming conflicts when guest OS is configured to use acpi-index for naming. Spec ialso says so: PCI Firmware Specification Revision 3.2 4.6.7. _DSM for Naming a PCI or PCI Express Device Under Operating Systems " Instance number must be unique under \_SB scope. This instance number does not have to be sequential in a given system configuration. " Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-4-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22pci: introduce acpi-index property for PCI deviceIgor Mammedov
In x86/ACPI world, linux distros are using predictable network interface naming since systemd v197. Which on QEMU based VMs results into path based naming scheme, that names network interfaces based on PCI topology. With itm on has to plug NIC in exactly the same bus/slot, which was used when disk image was first provisioned/configured or one risks to loose network configuration due to NIC being renamed to actually used topology. That also restricts freedom to reshape PCI configuration of VM without need to reconfigure used guest image. systemd also offers "onboard" naming scheme which is preferred over PCI slot/topology one, provided that firmware implements: " PCI Firmware Specification 3.1 4.6.7. DSM for Naming a PCI or PCI Express Device Under Operating Systems " that allows to assign user defined index to PCI device, which systemd will use to name NIC. For example, using -device e1000,acpi-index=100 guest will rename NIC to 'eno100', where 'eno' is default prefix for "onboard" naming scheme. This doesn't require any advance configuration on guest side to com in effect at 'onboard' scheme takes priority over path based naming. Hope is that 'acpi-index' it will be easier to consume by management layer, compared to forcing specific PCI topology and/or having several disk image templates for different topologies and will help to simplify process of spawning VM from the same template without need to reconfigure guest NIC. This patch adds, 'acpi-index'* property and wires up a 32bit register on top of pci hotplug register block to pass index value to AML code at runtime. Following patch will add corresponding _DSM code and wire it up to PCI devices described in ACPI. *) name comes from linux kernel terminology Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20210315180102.3008391-3-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22hw/sd: sdhci: Reset the data pointer of s->fifo_buffer[] when a different ↵Bin Meng
block size is programmed If the block size is programmed to a different value from the previous one, reset the data pointer of s->fifo_buffer[] so that s->fifo_buffer[] can be filled in using the new block size in the next transfer. With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xe0000000 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xe000002c 0x1 0x05 write 0xe0000005 0x1 0x02 write 0xe0000007 0x1 0x01 write 0xe0000028 0x1 0x10 write 0x0 0x1 0x23 write 0x2 0x1 0x08 write 0xe000000c 0x1 0x01 write 0xe000000e 0x1 0x20 write 0xe000000f 0x1 0x00 write 0xe000000c 0x1 0x32 write 0xe0000004 0x2 0x0200 write 0xe0000028 0x1 0x00 write 0xe0000003 0x1 0x40 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-6-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Limit block size only when SDHC_BLKSIZE register is writableBin Meng
The codes to limit the maximum block size is only necessary when SDHC_BLKSIZE register is writable. Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-5-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Correctly set the controller status for ADMABin Meng
When an ADMA transfer is started, the codes forget to set the controller status to indicate a transfer is in progress. With this fix, the following 2 reproducers: https://paste.debian.net/plain/1185136 https://paste.debian.net/plain/1185141 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-4-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Don't write to SDHC_SYSAD register when transfer is in progressBin Meng
Per "SD Host Controller Standard Specification Version 7.00" chapter 2.2.1 SDMA System Address Register: This register can be accessed only if no transaction is executing (i.e., after a transaction has stopped). With this fix, the following reproducer: outl 0xcf8 0x80001010 outl 0xcfc 0xfbefff00 outl 0xcf8 0x80001001 outl 0xcfc 0x06000000 write 0xfbefff2c 0x1 0x05 write 0xfbefff0f 0x1 0x37 write 0xfbefff0a 0x1 0x01 write 0xfbefff0f 0x1 0x29 write 0xfbefff0f 0x1 0x02 write 0xfbefff0f 0x1 0x03 write 0xfbefff04 0x1 0x01 write 0xfbefff05 0x1 0x01 write 0xfbefff07 0x1 0x02 write 0xfbefff0c 0x1 0x33 write 0xfbefff0e 0x1 0x20 write 0xfbefff0f 0x1 0x00 write 0xfbefff2a 0x1 0x01 write 0xfbefff0c 0x1 0x00 write 0xfbefff03 0x1 0x00 write 0xfbefff05 0x1 0x00 write 0xfbefff2a 0x1 0x02 write 0xfbefff0c 0x1 0x32 write 0xfbefff01 0x1 0x01 write 0xfbefff02 0x1 0x01 write 0xfbefff03 0x1 0x01 cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -machine accel=qtest -m 512M \ -nodefaults -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Tested-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-3-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sdhci: Don't transfer any data when command time outBin Meng
At the end of sdhci_send_command(), it starts a data transfer if the command register indicates data is associated. But the data transfer should only be initiated when the command execution has succeeded. With this fix, the following reproducer: outl 0xcf8 0x80001810 outl 0xcfc 0xe1068000 outl 0xcf8 0x80001804 outw 0xcfc 0x7 write 0xe106802c 0x1 0x0f write 0xe1068004 0xc 0x2801d10101fffffbff28a384 write 0xe106800c 0x1f 0x9dacbbcad9e8f7061524334251606f7e8d9cabbac9d8e7f60514233241505f write 0xe1068003 0x28 0x80d000251480d000252280d000253080d000253e80d000254c80d000255a80d000256880d0002576 write 0xe1068003 0x1 0xfe cannot be reproduced with the following QEMU command line: $ qemu-system-x86_64 -nographic -M pc-q35-5.0 \ -device sdhci-pci,sd-spec-version=3 \ -drive if=sd,index=0,file=null-co://,format=raw,id=mydrive \ -device sd-card,drive=mydrive \ -monitor none -serial none -qtest stdio Cc: qemu-stable@nongnu.org Fixes: CVE-2020-17380 Fixes: CVE-2020-25085 Fixes: CVE-2021-3409 Fixes: d7dfca0807a0 ("hw/sdhci: introduce standard SD host controller") Reported-by: Alexander Bulekov <alxndr@bu.edu> Reported-by: Cornelius Aschermann (Ruhr-Universität Bochum) Reported-by: Sergej Schumilo (Ruhr-Universität Bochum) Reported-by: Simon Wörner (Ruhr-Universität Bochum) Buglink: https://bugs.launchpad.net/qemu/+bug/1892960 Buglink: https://bugs.launchpad.net/qemu/+bug/1909418 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1928146 Acked-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Alexander Bulekov <alxndr@bu.edu> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210303122639.20004-2-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sd: Actually perform the erase operationBin Meng
At present the sd_erase() does not erase the requested range of card data to 0xFFs. Let's make the erase operation actually happen. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <1613811493-58815-1-git-send-email-bmeng.cn@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22hw/sd: sd: Fix build error when DEBUG_SD is onBin Meng
"qemu-common.h" should be included to provide the forward declaration of qemu_hexdump() when DEBUG_SD is on. Signed-off-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210228050609.24779-1-bmeng.cn@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-03-22virtio-pmem: fix virtio_pmem_resp assign problemWang Liang
ret in virtio_pmem_resp is a uint32_t variable, which should be assigned using virtio_stl_p. The kernel side driver does not guarantee virtio_pmem_resp to be initialized to zero in advance, So sometimes the flush operation will fail. Signed-off-by: Wang Liang <wangliangzz@inspur.com> Message-Id: <20210317024145.271212-1-wangliangzz@126.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22vhost-user: Monitor slave channel in vhost_user_read()Greg Kurz
Now that everything is in place, have the nested event loop to monitor the slave channel. The source in the main event loop is destroyed and recreated to ensure any pending even for the slave channel that was previously detected is purged. This guarantees that the main loop wont invoke slave_read() based on an event that was already handled by the nested loop. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-7-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Introduce nested event loop in vhost_user_read()Greg Kurz
A deadlock condition potentially exists if a vhost-user process needs to request something to QEMU on the slave channel while processing a vhost-user message. This doesn't seem to affect any vhost-user implementation so far, but this is currently biting the upcoming enablement of DAX with virtio-fs. The issue is being observed when the guest does an emergency reboot while a mapping still exits in the DAX window, which is very easy to get with a busy enough workload (e.g. as simulated by blogbench [1]) : - QEMU sends VHOST_USER_GET_VRING_BASE to virtiofsd. - In order to complete the request, virtiofsd then asks QEMU to remove the mapping on the slave channel. All these dialogs are synchronous, hence the deadlock. As pointed out by Stefan Hajnoczi: When QEMU's vhost-user master implementation sends a vhost-user protocol message, vhost_user_read() does a "blocking" read during which slave_fd is not monitored by QEMU. The natural solution for this issue is an event loop. The main event loop cannot be nested though since we have no guarantees that its fd handlers are prepared for re-entrancy. Introduce a new event loop that only monitors the chardev I/O for now in vhost_user_read() and push the actual reading to a one-shot handler. A subsequent patch will teach the loop to monitor and process messages from the slave channel as well. [1] https://github.com/jedisct1/Blogbench Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-6-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Convert slave channel to QIOChannelSocketGreg Kurz
The slave channel is implemented with socketpair() : QEMU creates the pair, passes one of the socket to virtiofsd and monitors the other one with the main event loop using qemu_set_fd_handler(). In order to fix a potential deadlock between QEMU and a vhost-user external process (e.g. virtiofsd with DAX), we want to be able to monitor and service the slave channel while handling vhost-user requests. Prepare ground for this by converting the slave channel to be a QIOChannelSocket. This will make monitoring of the slave channel as simple as calling qio_channel_add_watch_source(). Since the connection is already established between the two sockets, only incoming I/O (G_IO_IN) and disconnect (G_IO_HUP) need to be serviced. This also allows to get rid of the ancillary data parsing since QIOChannelSocket can do this for us. Note that the MSG_CTRUNC check is dropped on the way because QIOChannelSocket ignores this case. This isn't a problem since slave_read() provisions space for 8 file descriptors, but affected vhost-user slave protocol messages generally only convey one. If for some reason a buggy implementation passes more file descriptors, no need to break the connection, just like we don't break it if some other type of ancillary data is received : this isn't explicitely violating the protocol per-se so it seems better to ignore it. The current code errors out on short reads and writes. Use the qio_channel_*_all() variants to address this on the way. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-5-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Factor out duplicated slave_fd teardown codeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-4-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Fix double-close on slave_read() error pathGreg Kurz
Some message types, e.g. VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG, can convey file descriptors. These must be closed before returning from slave_read() to avoid being leaked. This can currently be done in two different places: [1] just after the request has been processed [2] on the error path, under the goto label err: These path are supposed to be mutually exclusive but they are not actually. If the VHOST_USER_NEED_REPLY_MASK flag was passed and the sending of the reply fails, both [1] and [2] are performed with the same descriptor values. This can potentially cause subtle bugs if one of the descriptor was recycled by some other thread in the meantime. This code duplication complicates rollback for no real good benefit. Do the closing in a unique place, under a new fdcleanup: goto label at the end of the function. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-3-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22vhost-user: Drop misleading EAGAIN checks in slave_read()Greg Kurz
slave_read() checks EAGAIN when reading or writing to the socket fails. This gives the impression that the slave channel is in non-blocking mode, which is certainly not the case with the current code base. And the rest of the code isn't actually ready to cope with non-blocking I/O. Just drop the checks everywhere in this function for the sake of clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312092212.782255-2-groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-03-22virtio: Fix virtio_mmio_read()/virtio_mmio_write()Laurent Vivier
Both functions don't check the personality of the interface (legacy or modern) before accessing the configuration memory and always use virtio_config_readX()/virtio_config_writeX(). With this patch, they now check the personality and in legacy mode call virtio_config_readX()/virtio_config_writeX(), otherwise call virtio_config_modern_readX()/virtio_config_modern_writeX(). This change has been tested with virtio-mmio guests (virt stretch/armhf and virt sid/m68k) and virtio-pci guests (pseries RHEL-7.3/ppc64 and /ppc64le). Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20210314200300.3259170-1-laurent@vivier.eu> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-03-22hw/net: virtio-net: Initialize nc->do_not_pad to trueBin Meng
For virtio-net, there is no need to pad the Ethernet frame size to 60 bytes before sending to it. Signed-off-by: Bin Meng <bmeng.cn@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-03-19Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell
staging * fixes for i386 TCG paging * fixes for Hyper-V enlightenments * avoid uninitialized variable warning # gpg: Signature made Fri 19 Mar 2021 14:38:12 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: tests/qtest: cleanup the testcase for bug 1878642 hw/intc/i8259: Refactor pic_read_irq() to avoid uninitialized variable i386: Make migration fail when Hyper-V reenlightenment was enabled but 'user_tsc_khz' is unset i386: Fix 'hypercall_hypercall' typo target/i386: svm: do not discard high 32 bits of EXITINFO1 target/i386: fail if toggling LA57 in 64-bit mode target/i386: allow modifying TCG phys-addr-bits qom: use qemu_printf to print help for user-creatable objects Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-19hw: Replace anti-social QOM type namesMarkus Armbruster
Several QOM type names contain ',': ARM,bitband-memory etraxfs,pic etraxfs,serial etraxfs,timer fsl,imx25 fsl,imx31 fsl,imx6 fsl,imx6ul fsl,imx7 grlib,ahbpnp grlib,apbpnp grlib,apbuart grlib,gptimer grlib,irqmp qemu,register SUNW,bpp SUNW,CS4231 SUNW,DBRI SUNW,DBRI.prom SUNW,fdtwo SUNW,sx SUNW,tcx xilinx,zynq_slcr xlnx,zynqmp xlnx,zynqmp-pmu-soc xlnx,zynq-xadc These are all device types. They can't be plugged with -device / device_add, except for xlnx,zynqmp-pmu-soc, and I doubt that one actually works. They *can* be used with -device / device_add to request help. Usability is poor, though: you have to double the comma, like this: $ qemu-system-x86_64 -device SUNW,,fdtwo,help Trap for the unwary. The fact that this was broken in device-introspect-test for more than six years until commit e27bd49876 fixed it demonstrates that "the unwary" includes seasoned developers. One QOM type name contains ' ': "ICH9 SMB". Because having to remember just one way to quote would be too easy. Rename the "SUNW,FOO types to "sun-FOO". Summarily replace ',' and ' ' by '-' in the other type names. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210304140229.575481-2-armbru@redhat.com> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-19fdc: Inline fdctrl_connect_drives() into fdctrl_realize_common()Markus Armbruster
The previous commit rendered the name fdctrl_connect_drives() somewhat misleading. Get rid of it by inlining the (now pretty simple) function into its only caller. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20210309161214.1402527-4-armbru@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2021-03-19fdc: Drop deprecated floppy configurationMarkus Armbruster
Drop the crap deprecated in commit 4a27a638e7 "fdc: Deprecate configuring floppies with -global isa-fdc" (v5.1.0). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20210309161214.1402527-3-armbru@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2021-03-19hw/intc/i8259: Refactor pic_read_irq() to avoid uninitialized variablePhilippe Mathieu-Daudé
Some compiler versions are smart enough to detect a potentially uninitialized variable, but are not smart enough to detect that this cannot happen due to the code flow: ../hw/intc/i8259.c: In function ‘pic_read_irq’: ../hw/intc/i8259.c:203:13: error: ‘irq2’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 203 | irq = irq2 + 8; | ~~~~^~~~~~~~~~ Restrict irq2 variable use to the inner statement. Fixes: 78ef2b6989f ("i8259: Reorder intack in pic_read_irq") Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210318163059.3686596-1-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-19Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches and object-add QAPIfication - QAPIfy object-add and --object - stream: Fail gracefully if permission is denied - storage-daemon: Fix crash on quit when job is still running - curl: Fix use after free - char: Deprecate backend aliases, fix QMP query-chardev-backends - Fix image creation option defaults that exist in both the format and the protocol layer (e.g. 'cluster_size' in qcow2 and rbd; the qcow2 default was incorrectly applied to the rbd layer) # gpg: Signature made Fri 19 Mar 2021 09:18:22 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (42 commits) vl: allow passing JSON to -object qom: move user_creatable_add_opts logic to vl.c and QAPIfy it tests: convert check-qom-proplist to keyval qom: Support JSON in HMP object_add and tools --object char: Simplify chardev_name_foreach() char: Deprecate backend aliases 'tty' and 'parport' char: Skip CLI aliases in query-chardev-backends qom: Add user_creatable_parse_str() hmp: QAPIfy object_add qemu-img: Use user_creatable_process_cmdline() for --object qom: Add user_creatable_add_from_str() qemu-nbd: Use user_creatable_process_cmdline() for --object qemu-io: Use user_creatable_process_cmdline() for --object qom: Factor out user_creatable_process_cmdline() qom: Remove user_creatable_add_dict() qemu-storage-daemon: Implement --object with qmp_object_add() qom: Make "object" QemuOptsList optional qapi/qom: QAPIfy object-add qapi/qom: Add ObjectOptions for x-remote-object qapi/qom: Add ObjectOptions for input-* ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-19qapi/qom: QAPIfy object-addKevin Wolf
This converts object-add from 'gen': false to the ObjectOptions QAPI type. As an immediate benefit, clients can now use QAPI schema introspection for user creatable QOM objects. It is also the first step towards making the QAPI schema the only external interface for the creation of user creatable objects. Once all other places (HMP and command lines of the system emulator and all tools) go through QAPI, too, some object implementations can be simplified because some checks (e.g. that mandatory options are set) are already performed by QAPI, and in another step, QOM boilerplate code could be generated from the schema. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Krempa <pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2021-03-18Merge remote-tracking branch 'remotes/philmd/tags/pflash-20210318' into stagingPeter Maydell
Parallel NOR Flash patches queue - Code movement to ease maintainability - Tracing improvements # gpg: Signature made Thu 18 Mar 2021 15:44:12 GMT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * remotes/philmd/tags/pflash-20210318: hw/block/pflash_cfi: Replace DPRINTF with trace events hw/block/pflash_cfi01: Correct the type of PFlashCFI01.ro hw/block/pflash_cfi01: Clarify trace events hw/block/pflash_cfi02: Add DeviceReset method hw/block/pflash_cfi02: Factor out pflash_reset_state_machine() hw/block/pflash_cfi02: Rename register_memory(true) as mode_read_array hw/block/pflash_cfi02: Open-code pflash_register_memory(rom=false) hw/block/pflash_cfi02: Set rom_mode to true in pflash_setup_mappings() hw/block/pflash_cfi02: Extract pflash_cfi02_fill_cfi_table() hw/block/pflash_cfi01: Extract pflash_cfi01_fill_cfi_table() hw/block/pflash_cfi: Fix code style for checkpatch.pl Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-18Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into ↵Peter Maydell
staging emulated nvme updates and fixes * fixes for Coverity CID 1450756, 1450757 and 1450758 (me) * fix for a bug in zone management receive (me) * metadata and end-to-end data protection support (me & Gollu Appalanaidu) * verify support (Gollu Appalanaidu) * multiple lba formats and format nvm support (Minwoo Im) and a couple of misc refactorings from me. v2: - remove an unintended submodule update. Argh. # gpg: Signature made Thu 18 Mar 2021 11:53:48 GMT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: hw/block/nvme: add support for the format nvm command hw/block/nvme: pull lba format initialization hw/block/nvme: prefer runtime helpers instead of device parameters hw/block/nvme: support multiple lba formats hw/block/nvme: add non-mdts command size limit for verify hw/block/nvme: add verify command hw/block/nvme: end-to-end data protection hw/block/nvme: add metadata support hw/block/nvme: fix zone management receive reporting too many zones hw/block/nvme: assert namespaces array indices hw/block/nvme: fix potential overflow Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-18Merge remote-tracking branch ↵Peter Maydell
'remotes/berrange-gitlab/tags/dep-many-pull-request' into staging Remove many old deprecated features The following features have been deprecated for well over the 2 release cycle we promise ``-drive file=json:{...{'driver':'file'}}`` (since 3.0) ``-vnc acl`` (since 4.0.0) ``-mon ...,control=readline,pretty=on|off`` (since 4.1) ``migrate_set_downtime`` and ``migrate_set_speed`` (since 2.8.0) ``query-named-block-nodes`` result ``encryption_key_missing`` (since 2.10.0) ``query-block`` result ``inserted.encryption_key_missing`` (since 2.10.0) ``migrate-set-cache-size`` and ``query-migrate-cache-size`` (since 2.11.0) ``query-named-block-nodes`` and ``query-block`` result dirty-bitmaps[i].status (since 4.0) ``query-cpus`` (since 2.12.0) ``query-cpus-fast`` ``arch`` output member (since 3.0.0) ``query-events`` (since 4.0) chardev client socket with ``wait`` option (since 4.0) ``acl_show``, ``acl_reset``, ``acl_policy``, ``acl_add``, ``acl_remove`` (since 4.0.0) ``ide-drive`` (since 4.2) ``scsi-disk`` (since 4.2) # gpg: Signature made Thu 18 Mar 2021 09:23:39 GMT # gpg: using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full] # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" [full] # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange-gitlab/tags/dep-many-pull-request: block: remove support for using "file" driver with block/char devices block: remove 'dirty-bitmaps' field from 'BlockInfo' struct block: remove dirty bitmaps 'status' field block: remove 'encryption_key_missing' flag from QAPI hw/scsi: remove 'scsi-disk' device hw/ide: remove 'ide-drive' device chardev: reject use of 'wait' flag for socket client chardevs machine: remove 'arch' field from 'query-cpus-fast' QMP command machine: remove 'query-cpus' QMP command migrate: remove QMP/HMP commands for speed, downtime and cache size monitor: remove 'query-events' QMP command monitor: raise error when 'pretty' option is used with HMP ui, monitor: remove deprecated VNC ACL option and HMP commands Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-18Merge remote-tracking branch ↵Peter Maydell
'remotes/stsquad/tags/pull-misc-6.0-updates-170321-2' into staging Final fixes for 6.0 - plugins physical address changes - syscall tracking plugin - plugin kernel-doc comments (without integration) - libfdt build fix for guest-loader # gpg: Signature made Wed 17 Mar 2021 07:19:23 GMT # gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full] # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * remotes/stsquad/tags/pull-misc-6.0-updates-170321-2: hw/core: Only build guest-loader if libfdt is available plugins: Fixes typo in qemu-plugin.h plugins: getting qemu_plugin_get_hwaddr only expose one function prototype plugins: expand kernel-doc for memory query and instrumentation plugins: expand kernel-doc for instruction query and instrumentation plugins: expand inline exec kernel-doc documentation. plugins: add qemu_plugin_id_t to kernel-doc plugins: add qemu_plugin_cb_flags to kernel-doc plugins: expand the typedef kernel-docs for translation plugins: expand the callback typedef kernel-docs plugins: cleanup kernel-doc for qemu_plugin_install plugins: expand kernel-doc for qemu_info_t plugins: Expose physical addresses instead of device offsets plugins: new syscalls plugin utils: Use fixed-point arithmetic in qemu_strtosz Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-18hw/block/nvme: add support for the format nvm commandMinwoo Im
Format NVM admin command can make a namespace or namespaces to be with different LBA size and metadata size with protection information types. This patch introduces Format NVM command with LBA format, Metadata, and Protection Information for the device. The secure erase operation things and support for formatting zoned namespaces are yet to be added. The parameter checks inside of this patch has been referred from Keith's old branch. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> [anaidu.gollu: rebased on e2e] Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: rebased for reworked aio tracking] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: pull lba format initializationKlaus Jensen
Pull lba format initialization code into separate function in preparation for Format NVM support. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
2021-03-18hw/block/nvme: prefer runtime helpers instead of device parametersKlaus Jensen
In preparation for Format NVM support, use runtime helpers instead of the constant device parameters when getting lba size information etc. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
2021-03-18hw/block/nvme: support multiple lba formatsMinwoo Im
This patch introduces multiple LBA formats supported with the typical logical block sizes of 512 bytes and 4096 bytes as well as metadata sizes of 0, 8, 16 and 64 bytes. The format will be chosed based on the lbads and ms parameters of the nvme-ns device. Signed-off-by: Minwoo Im <minwoo.im@samsung.com> [k.jensen: resurrected and rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: add non-mdts command size limit for verifyKlaus Jensen
Verify is not subject to MDTS, so a single Verify command may result in excessive amounts of allocated memory. Impose a limit on the data size by adding support for TP 4040 ("Non-MDTS Command Size Limits"). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: add verify commandGollu Appalanaidu
See NVM Express 1.4, section 6.14 ("Verify Command"). Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: rebased, refactored for e2e] Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: end-to-end data protectionKlaus Jensen
Add support for namespaces formatted with protection information. The type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is selected with the `pi` nvme-ns device parameter. If the number of metadata bytes is larger than 8, the `pil` nvme-ns device parameter may be used to control the location of the 8-byte DIF tuple. The default `pil` value of '0', causes the DIF tuple to be transferred as the last 8 bytes of the metadata. Set to 1 to store this in the first eight bytes instead. Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: add metadata supportKlaus Jensen
Add support for metadata in the form of extended logical blocks as well as a separate buffer of data. The new `ms` nvme-ns device parameter specifies the size of metadata per logical block in bytes. The `mset` nvme-ns device parameter controls whether metadata is transfered as part of an extended lba (set to '1') or in a separate buffer (set to '0', the default). Regardsless of the scheme chosen with `mset`, metadata is stored at the end of the namespace backing block device. This requires the user provided PRP/SGLs to be walked and "split" into data and metadata scatter/gather lists if the extended logical block scheme is used, but has the advantage of not breaking the deallocated blocks support. Co-authored-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-18hw/block/nvme: fix zone management receive reporting too many zonesKlaus Jensen
nvme_zone_mgmt_recv uses nvme_ns_nlbas() to get the number of LBAs in the namespace and then calculates the number of zones to report by incrementing slba with ZSZE until exceeding the number of LBAs as returned by nvme_ns_nlbas(). This is bad because the namespace might be of such as size that some LBAs are valid, but are not part of any zone, causing zone management receive to report one additional (but non-existing) zone. Fix this with a conventional loop on i < ns->num_zones instead. Fixes: a479335bfaf3 ("hw/block/nvme: Support Zoned Namespace Command Set") Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-18hw/block/nvme: assert namespaces array indicesKlaus Jensen
Coverity complains about a possible memory corruption in the nvme_ns_attach and _detach functions. While we should not (famous last words) be able to reach this function without nsid having previously been validated, this is still an open door for future misuse. Make Coverity and maintainers happy by asserting that the index into the array is valid. Also, while not detected by Coverity (yet), add an assert in nvme_subsys_ns and nvme_subsys_register_ns as well since a similar issue is exists there. Fixes: 037953b5b299 ("hw/block/nvme: support namespace detach") Fixes: CID 1450757 Fixes: CID 1450758 Cc: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-03-18hw/block/nvme: fix potential overflowKlaus Jensen
page_size is a uint32_t, and zasl is a uint8_t, so the expression `page_size << zasl` is done using 32-bit arithmetic and might overflow. Since we then compare this against a 64 bit data_size value, Coverity complains that we might overflow unintentionally. An MDTS/ZASL value in excess of 4GiB is probably impractical, but it is not entirely unrealistic, so add a cast such that we handle that case properly. Fixes: 578d914b263c ("hw/block/nvme: align zoned.zasl with mdts") Fixes: CID 1450756 Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-18hw/block/pflash_cfi: Replace DPRINTF with trace eventsDavid Edmondson
Rather than having a device specific debug implementation in pflash_cfi01.c and pflash_cfi02.c, use the standard tracing facility. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210216142721.1985543-2-david.edmondson@oracle.com> [PMD: Rebased, fixed pflash_write_block_erase trace event format] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
2021-03-18hw/block/pflash_cfi01: Correct the type of PFlashCFI01.roDavid Edmondson
PFlashCFI01.ro is a bool, declare it as such. Signed-off-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210216142721.1985543-3-david.edmondson@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
2021-03-18hw/block/pflash_cfi01: Clarify trace eventsPhilippe Mathieu-Daudé
Use the 'mode_read_array' event when we set the device in such mode, and use the 'reset' event in DeviceReset handler. Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210310170528.1184868-10-philmd@redhat.com>
2021-03-18hw/block/pflash_cfi02: Add DeviceReset methodPhilippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Message-Id: <20210310170528.1184868-9-philmd@redhat.com>