slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message (Collapse)	Author
2021-03-10	spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state	Daniel Henrique Barboza
	Handling errors in memory hotunplug in the pSeries machine is more complex than any other device type, because there are all the complications that other devices has, and more. For instance, determining a timeout for a DIMM hotunplug must consider if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs. The size of the DIMM is also a factor, given that longer DIMMs naturally takes longer to be hotunplugged from the kernel. And there's also the guest memory usage to be considered: if there's a process that is consuming memory that would be lost by the DIMM unplug, the kernel will postpone the unplug process until the process finishes, and then initiate the regular hotunplug process. The first two considerations are manageable, but the last one is a deal breaker. There is no sane way for the pSeries machine to determine the memory load in the guest when attempting a DIMM hotunplug - and even if there was a way, the guest can start using all the RAM in the middle of the unplug process and invalidate our previous assumptions - and in result we can't even begin to calculate a timeout for the operation. This means that we can't implement a viable timeout mechanism for memory unplug in pSeries. Going back to why we would consider an unplug timeout, the reason is that we can't know if the kernel is giving up the unplug. Turns out that, sometimes, we can. Consider a failed memory hotunplug attempt where the kernel will error out with the following message: 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs' This happens when there is a LMB that the kernel gave up in removing, and the LMBs previously marked for removal are now being added back. This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and after that update_lmb_associativity_index(). In this function, the kernel is configuring the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in section "ibm,configure-connector RTAS Call": 'A subsequent sequence of calls to ibm,configure-connector with the same entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of devices which were not completely configured.' We can use this kernel behavior in our favor. If a DRC connector reconfiguration for a LMB that we marked as unplug pending happens, this indicates that the kernel changed its mind about the unplug and is reasserting that it will keep using all the LMBs of the DIMM. In this case, it's safe to assume that the whole DIMM device unplug was cancelled. This patch hops into rtas_ibm_configure_connector() and, in the scenario described above, clear the unplug state for the DIMM device. This will not solve all the problems we still have with memory unplug, but it will cover this case where the kernel reconfigures LMBs after a failed unplug. We are a bit more resilient, without using an unreliable timeout, and we didn't make the remaining error cases any worse. [1] arch/powerpc/platforms/pseries/hotplug-memory.c Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	spapr_drc.c: add hotunplug timeout for CPUs	Daniel Henrique Barboza
	There is a reliable way to make a CPU hotunplug fail in the pseries machine. Hotplug a CPU A, then offline all other CPUs inside the guest but A. When trying to hotunplug A the guest kernel will refuse to do it, because A is now the last online CPU of the guest. PAPR has no 'error callback' in this situation to report back to the platform, so the guest kernel will deny the unplug in silent and QEMU will never know what happened. The unplug pending state of A will remain until the guest is shutdown or rebooted. Previous attempts of fixing it (see [1] and [2]) were aimed at trying to mitigate the effects of the problem. In [1] we were trying to guess which guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU layer, avoiding the scenario described above because QEMU is now failing in behalf of the guest. This is not robust because the last online CPU of the guest can change while we're in the middle of the unplug process, and our initial assumptions are now invalid. In [2] we were accepting that our unplug process is uncertain and the user should be allowed to spam the IRQ hotunplug queue of the guest in case the CPU hotunplug fails. This patch presents another alternative, using the timeout infrastructure introduced in the previous patch. CPU hotunplugs in the pSeries machine will now timeout after 15 seconds. This is a long time for a single CPU unplug to occur, regardless of guest load - although the user is strongly encouraged to not hotunplug devices from a guest under high load - and we can be sure that something went wrong if it takes longer than that for the guest to release the CPU (the same can't be said about memory hotunplug - more on that in the next patch). Timing out the unplug operation will reset the unplug state of the CPU and allow the user to try it again, regardless of the error situation that prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations for CPU hotunplug errors in pSeries, timing out the operation is an admission that we have no control in the process, and must assume the worst case if the operation doesn't succeed in a sensible time frame. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html Reported-by: Xujun Ma <xuma@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	spapr_drc.c: introduce unplug_timeout_timer	Daniel Henrique Barboza
	The LoPAR spec provides no way for the guest kernel to report failure of hotplug/hotunplug events. This wouldn't be bad if those operations were granted to always succeed, but that's far for the reality. What ends up happening is that, in the case of a failed hotunplug, regardless of whether it was a QEMU error or a guest misbehavior, the pSeries machine is retaining the unplug state of the device in the running guest. This state is cleanup in machine reset, where it is assumed that this state represents a device that is pending unplug, and the device is hotunpluged from the board. Until the reset occurs, any hotunplug operation of the same device is forbid because there is a pending unplug state. This behavior has at least one undesirable side effect. A long standing pending unplug state is, more often than not, the result of a hotunplug error. The user had to dealt with it, since retrying to unplug the device is noy allowed, and then in the machine reset we're removing the device from the guest. This means that we're failing the user twice - failed to hotunplug when asked, then hotunplugged without notice. Solutions to this problem range between trying to predict when the hotunplug will fail and forbid the operation from the QEMU layer, from opening up the IRQ queue to allow for multiple hotunplug attempts, from telling the users to 'reboot the machine if something goes wrong'. The first solution is flawed because we can't fully predict guest behavior from QEMU, the second solution is a trial and error remediation that counts on a hope that the unplug will eventually succeed, and the third is ... well. This patch introduces a crude, but effective solution to hotunplug errors in the pSeries machine. For each unplug done, we'll timeout after some time. If a certain amount of time passes, we'll cleanup the hotunplug state from the machine. During the timeout period, any unplug operations in the same device will still be blocked. After that, we'll assume that the guest failed the operation, and allow the user to try again. If the timeout is too short we'll prevent legitimate hotunplug situations to occur, so we'll need to overestimate the regular time an unplug operation takes to succeed to account that. The true solution for the hotunplug errors in the pSeries machines is a PAPR change to allow for the guest to warn the platform about it. For now, the work done in this timeout design can be used for the new PAPR 'abort hcall' in the future, given that for both cases we'll need code to cleanup the existing unplug states of the DRCs. At this moment we're adding the basic wiring of the timer into the DRC. Next patch will use the timer to timeout failed CPU hotunplugs. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()	Daniel Henrique Barboza
	spapr_drc_detach() is not the best name for what the function does. The function does not detach the DRC, it makes an uncommited attempt to do it. It'll mark the DRC as pending unplug, via the 'unplug_request' flag, and only if the DRC state is drck->empty_state it will detach the DRC, via spapr_drc_release(). This is a contrast with its pair spapr_drc_attach(), where the function is indeed creating the DRC QOM object. If you know what spapr_drc_attach() does, you can be misled into thinking that spapr_drc_detach() is removing the DRC from QEMU internal state, which isn't true. The current role of this function is better described as a request for detach, since there's no guarantee that we're going to detach the DRC in the end. Rename the function to spapr_drc_unplug_request to reflect what is is doing. The initial idea was to change the name to spapr_drc_detach_request(), and later on change the unplug_request flag to detach_request. However, unplug_request is a migratable boolean for a long time now and renaming it is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request is more natural than spapr_drc_detach_request setting drc->unplug_request. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable	Daniel Henrique Barboza
	When moving a physical DRC to "Available", drc_isolate_physical() will move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked for unplug, call spapr_drc_detach(). For physical DRCs, drck->empty_state is STATE_PHYSICAL_POWERON, meaning that we're sure that spapr_drc_detach() will end up calling spapr_drc_release() in the end. Likewise, for logical DRCs, drc_set_unusable will move the DRC to "Unusable" state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is the drck->empty_state for logical DRCs. spapr_drc_detach() will call spapr_drc_release() in this case as well. In both scenarios, spapr_drc_detach() is being used as a spapr_drc_release(), wrapper, where we also set unplug_requested (which is already true, otherwise spapr_drc_detach() wouldn't be called in the first place) and check if drc->state == drck->empty_state, which we also know it's guaranteed to be true because we just set it. Just use spapr_drc_release() in these functions to be clear of our intentions in both these functions. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()	Daniel Henrique Barboza
	drc_isolate_logical() is used to move the DRC from the "Configured" to the "Available" state, erroring out if the DRC is in the unexpected "Unisolate" state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC is already in "Available" or in "Unusable" state. When moving from "Configured" to "Available", the DRC is moved to the LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if true, spapr_drc_detach() is called. What spapr_drc_detach() does then is: - set drc->unplug_requested to true. In fact, this is the only place where unplug_request is set to true; - does nothing else if drc->state != drck->empty_state. If the DRC state is equal to drck->empty_state, spapr_drc_release() is called. For logical DRCs, drck->empty_state = LOGICAL_UNUSABLE. In short, calling spapr_drc_detach() in drc_isolate_logical() does nothing. It'll set unplug_request to true again ('again' since it was already true - otherwise the function wouldn't be called), and will return without calling spapr_drc_release() because the DRC is not in LOGICAL_UNUSABLE, since drc_isolate_logical() just moved it to LOGICAL_AVAILABLE. The only place where the logical DRC is released is when called from drc_set_unusable(), when it is moved to the "Unusable" state. As it should, according to PAPR. Even though calling spapr_drc_detach() in drc_isolate_logical() is benign, removing it will avoid further thought about the matter. So let's go ahead and do that. As a note, this logic was introduced in commit bbf5c878ab76. Since then, the DRC handling code was refactored and enhanced, and PAPR itself went through some changes in the DRC area as well. It is expected that some assumptions we had back then are now deprecated. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210211225246.17315-2-danielhb413@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	hw/display/sm501: Inline template header into C file	Peter Maydell
	We no longer need to include sm501_template.h multiple times, so we can simply inline its contents into sm501.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20210212180653.27588-4-peter.maydell@linaro.org> Acked-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	hw/display/sm501: Expand out macros in template header	Peter Maydell
	Now that we only include sm501_template.h for the DEPTH==32 case, we can expand out the uses of the BPP, PIXEL_TYPE and PIXEL_NAME macros in that header. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20210212180653.27588-3-peter.maydell@linaro.org> Acked-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-10	hw/display/sm501: Remove dead code for non-32-bit RGB surfaces	Peter Maydell
	For a long time now the UI layer has guaranteed that the console surface is always 32 bits per pixel RGB. Remove the legacy dead code from the sm501 display device which was handling the possibility that the console surface was some other format. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20210212180653.27588-2-peter.maydell@linaro.org> Acked-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-03-09	hw/lm32/Kconfig: Have MILKYMIST select LM32_DEVICES	Philippe Mathieu-Daudé
	The Milkymist board requires more than the PTIMER. Directly select the LM32_DEVICES. This fixes: /usr/bin/ld: libqemu-lm32-softmmu.fa.p/target_lm32_gdbstub.c.o: in function `lm32_cpu_gdb_read_register': target/lm32/gdbstub.c:46: undefined reference to `lm32_pic_get_im' target/lm32/gdbstub.c:48: undefined reference to `lm32_pic_get_ip' libqemu-lm32-softmmu.fa.p/target_lm32_op_helper.c.o: in function `helper_wcsr_im': target/lm32/op_helper.c:107: undefined reference to `lm32_pic_set_im' libqemu-lm32-softmmu.fa.p/target_lm32_op_helper.c.o: in function `helper_wcsr_ip': target/lm32/op_helper.c:114: undefined reference to `lm32_pic_set_ip' libqemu-lm32-softmmu.fa.p/target_lm32_op_helper.c.o: in function `helper_wcsr_jtx': target/lm32/op_helper.c:120: undefined reference to `lm32_juart_set_jtx' libqemu-lm32-softmmu.fa.p/target_lm32_op_helper.c.o: in function `helper_wcsr_jrx': target/lm32/op_helper.c:125: undefined reference to `lm32_juart_set_jrx' libqemu-lm32-softmmu.fa.p/target_lm32_translate.c.o: in function `lm32_cpu_dump_state': target/lm32/translate.c:1161: undefined reference to `lm32_pic_get_ip' target/lm32/translate.c:1161: undefined reference to `lm32_pic_get_im' Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210221225626.2589247-4-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	hw/lm32/Kconfig: Rename CONFIG_LM32 -> CONFIG_LM32_DEVICES	Philippe Mathieu-Daudé
	We want to be able to use the 'LM32' config for architecture specific features. As CONFIG_LM32 is only used to select peripherals, rename it CONFIG_LM32_DEVICES. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210221225626.2589247-3-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	hw/lm32/Kconfig: Introduce CONFIG_LM32_EVR for lm32-evr/uclinux boards	Philippe Mathieu-Daudé
	We want to be able to use the 'LM32' config for architecture specific features. Introduce CONFIG_LM32_EVR to select the lm32-evr / lm32-uclinux boards. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210221225626.2589247-2-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging	Peter Maydell
	Block layer patches: - qemu-storage-daemon: add --pidfile option - qemu-storage-daemon: CLI error messages include the option name now - vhost-user-blk export: Misc fixes - docs: Improvements for qemu-storage-daemon documentation - parallels: load bitmap extension - backup-top: Don't crash on post-finalize accesses - Improve error messages related to node-name options - iotests improvements # gpg: Signature made Mon 08 Mar 2021 17:01:41 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (30 commits) blockdev: Clarify error messages pertaining to 'node-name' block: Clarify error messages pertaining to 'node-name' docs: qsd: Explain --export nbd,name=... default MAINTAINERS: update parallels block driver iotests: add parallels-read-bitmap test iotests.py: add unarchive_sample_image() helper parallels: support bitmap extension for read-only mode block/parallels: BDRVParallelsState: add cluster_size field parallels.txt: fix bitmap L1 table description qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public block/export: port virtio-blk read/write range check block/export: port virtio-blk discard/write zeroes input validation block/export: fix vhost-user-blk export sector number calculation block/export: use VIRTIO_BLK_SECTOR_BITS block/export: fix blk_size double byteswap libqtest: add qtest_remove_abrt_handler() libqtest: add qtest_kill_qemu() libqtest: add qtest_socket_server() vhost-user-blk: fix blkcfg->num_queues endianness docs: replace insecure /tmp examples in qsd docs ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-09	virtio-gpu: Adjust code space style	lijiejun
	Fix code style. Operator needs align with eight spaces, and delete line space. Signed-off-by: lijiejun <a_lijiejun@163.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <1615292050-108748-1-git-send-email-a_lijiejun@163.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	exec/memory: Use struct Object typedef	Philippe Mathieu-Daudé
	We forward-declare Object typedef in "qemu/typedefs.h" since commit ca27b5eb7cd ("qom/object: Move Object typedef to 'qemu/typedefs.h'"). Use it everywhere to make the code simpler. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20210225182003.3629342-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	vhost_user_gpu: Drop dead check for g_malloc() failure	Markus Armbruster
	Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20210126124240.2081959-3-armbru@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	scsi: Silence gcc warning	Eric Blake
	On Fedora 33, gcc 10.2.1 notes that scsi_cdb_length(buf) can set len==-1, which in turn overflows g_malloc(): [5/5] Linking target qemu-system-x86_64 In function ‘scsi_disk_new_request_dump’, inlined from ‘scsi_new_request’ at ../hw/scsi/scsi-disk.c:2608:9: ../hw/scsi/scsi-disk.c:2582:19: warning: argument 1 value ‘18446744073709551612’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=] 2582 \| line_buffer = g_malloc(len * 5 + 1); \| ^ Silence it with a decent assertion, since we only convert a buffer to bytes when we have a valid cdb length. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210209152350.207958-1-eblake@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	Various spelling fixes	Michael Tokarev
	An assorted set of spelling fixes in various places. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20210309111510.79495-1-mjt@msgid.tls.msk.ru> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-09	Merge remote-tracking branch 'remotes/mcayland/tags/qemu-sparc-20210307' ↵	Peter Maydell
	into staging qemu-sparc queue # gpg: Signature made Sun 07 Mar 2021 12:07:13 GMT # gpg: using RSA key CC621AB98E82200D915CC9C45BC2C56FAE0F321F # gpg: issuer "mark.cave-ayland@ilande.co.uk" # gpg: Good signature from "Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>" [full] # Primary key fingerprint: CC62 1AB9 8E82 200D 915C C9C4 5BC2 C56F AE0F 321F * remotes/mcayland/tags/qemu-sparc-20210307: (42 commits) esp: add support for unaligned accesses esp: implement non-DMA transfers in PDMA mode esp: add trivial implementation of the ESP_RFLAGS register esp: convert cmdbuf from array to Fifo8 esp: convert ti_buf from array to Fifo8 esp: transition to message out phase after SATN and stop command esp: add maxlen parameter to get_cmd() esp: raise interrupt after every non-DMA byte transferred to the FIFO esp: remove old deferred command completion mechanism esp: defer command completion interrupt on incoming data transfers esp: latch individual bits in ESP_RINTR register esp: implement FIFO flush command esp: add 4 byte PDMA read and write transfers esp: remove pdma_origin from ESPState esp: use FIFO for PDMA transfers between initiator and device esp: fix PDMA target selection esp: rename get_cmd_cb() to esp_select() esp: remove CMD pdma_origin esp: use in-built TC to determine PDMA transfer length esp: use ti_wptr/ti_rptr to manage the current FIFO position for PDMA ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-03-09	hw/misc: Model KCS devices in the Aspeed LPC controller	Andrew Jeffery
	Keyboard-Controller-Style devices for IPMI purposes are exposed via LPC IO cycles from the BMC to the host. Expose support on the BMC side by implementing the usual MMIO behaviours, and expose the ability to inspect the KCS registers in "host" style by accessing QOM properties associated with each register. The model caters to the IRQ style of both the AST2600 and the earlier SoCs (AST2400 and AST2500). The AST2600 allocates an IRQ for each LPC sub-device, while there is a single IRQ shared across all subdevices on the AST2400 and AST2500. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210302014317.915120-6-andrew@aj.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	hw/misc: Add a basic Aspeed LPC controller model	Cédric Le Goater
	This is a very minimal framework to access registers which are used to configure the AHB memory mapping of the flash chips on the LPC HC Firmware address space. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Message-Id: <20210302014317.915120-5-andrew@aj.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	hw/arm: ast2600: Correct the iBT interrupt ID	Andrew Jeffery
	The AST2600 allocates distinct GIC IRQs for the LPC subdevices such as the iBT device. Previously on the AST2400 and AST2500 the LPC subdevices shared a single LPC IRQ. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210302014317.915120-4-andrew@aj.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	hw/arm: ast2600: Set AST2600_MAX_IRQ to value from datasheet	Andrew Jeffery
	The datasheet says we have 197 IRQs allocated, and we need more than 128 to describe IRQs from LPC devices. Raise the value now to allow modelling of the LPC devices. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210302014317.915120-3-andrew@aj.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	hw/arm: ast2600: Force a multiple of 32 of IRQs for the GIC	Andrew Jeffery
	This appears to be a requirement of the GIC model. The AST2600 allocates 197 GIC IRQs, which we will adjust shortly. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210302014317.915120-2-andrew@aj.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	arm/ast2600: Fix SMP booting with -kernel	Joel Stanley
	The ast2600 machines do not have PSCI firmware, so this property should have never been set. Removing this node fixes SMP booting Linux kernels that have PSCI enabled, as Linux fails to find PSCI in the device tree and falls back to the soc-specific method for enabling secondary CPUs. The comment is out of date as Qemu has supported -kernel booting since 9bb6d14081ce ("aspeed: Add boot stub for smp booting"), in v5.1. Fixes: f25c0ae1079d ("aspeed/soc: Add AST2600 support") Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210303010505.635621-1-joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-03-09	hw/block/nvme: support Identify NS Attached Controller List	Minwoo Im
	Support Identify command for Namespace attached controller list. This command handler will traverse the controller instances in the given subsystem to figure out whether the specified nsid is attached to the controllers or not. The 4096bytes Identify data will return with the first entry (16bits) indicating the number of the controller id entries. So, the data can hold up to 2047 entries for the controller ids. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: rebased for dma refactor] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: support changed namespace asynchronous event	Minwoo Im
	If namespace inventory is changed due to some reasons (e.g., namespace attachment/detachment), controller can send out event notifier to the host to manage namespaces. This patch sends out the AEN to the host after either attach or detach namespaces from controllers. To support clear of the event from the controller, this patch also implemented Get Log Page command for Changed Namespace List log type. To return namespace id list through the command, when namespace inventory is updated, id is added to the per-controller list (changed_ns_list). To indicate the support of this async event, this patch set OAES(Optional Asynchronous Events Supported) in Identify Controller data structure. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: support namespace attachment command	Minwoo Im
	This patch supports Namespace Attachment command for the pre-defined nvme-ns device nodes. Of course, attach/detach namespace should only be supported in case 'subsys' is given. This is because if we detach a namespace from a controller, somebody needs to manage the detached, but allocated namespace in the NVMe subsystem. As command effect for the namespace attachment command is registered, the host will be notified that namespace inventory is changed so that host will rescan the namespace inventory after this command. For example, kernel driver manages this command effect via passthru IOCTL. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> [k.jensen: rebased for dma refactor] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: refactor nvme_select_ns_iocs	Minwoo Im
	This patch has no functional changes. This patch just refactored nvme_select_ns_iocs() to iterate the attached namespaces of the controlller and make it invoke __nvme_select_ns_iocs(). Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: support allocated namespace type	Minwoo Im
	From NVMe spec 1.4b "6.1.5. NSID and Namespace Relationships" defines valid namespace types: - Unallocated: Not exists in the NVMe subsystem - Allocated: Exists in the NVMe subsystem - Inactive: Not attached to the controller - Active: Attached to the controller This patch added support for allocated, but not attached namespace type: !nvme_ns(n, nsid) && nvme_subsys_ns(n->subsys, nsid) nvme_ns() returns attached namespace instance of the given controller and nvme_subsys_ns() returns allocated namespace instance in the subsystem. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: fix allocated namespace list to 256	Minwoo Im
	Expand allocated namespace list (subsys->namespaces) to have 256 entries which is a value lager than at least NVME_MAX_NAMESPACES which is for attached namespace list in a controller. Allocated namespace list should at least larger than attached namespace list. n->num_namespaces = NVME_MAX_NAMESPACES; The above line will set the NN field by id->nn so that the subsystem should also prepare at least this number of namespace list entries. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: fix namespaces array to 1-based	Minwoo Im
	subsys->namespaces array used to be sized to NVME_SUBSYS_MAX_NAMESPACES. But subsys->namespaces are being accessed with 1-based namespace id which means the very first array entry will always be empty(NULL). Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: support namespace detach	Minwoo Im
	Given that now we have nvme-subsys device supported, we can manage namespace allocated, but not attached: detached. This patch introduced a parameter for nvme-ns device named 'detached'. This parameter indicates whether the given namespace device is detached from a entire NVMe subsystem('subsys' given case, shared namespace) or a controller('bus' given case, private namespace). - Allocated namespace 1) Shared ns in the subsystem 'subsys0': -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,subsys=subsys0,detached=true 2) Private ns for the controller 'nvme0' of the subsystem 'subsys0': -device nvme-subsys,id=subsys0 -device nvme,serial=foo,id=nvme0,subsys=subsys0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true 3) (Invalid case) Controller 'nvme0' has no subsystem to manage ns: -device nvme,serial=foo,id=nvme0 -device nvme-ns,id=ns1,drive=blknvme0,nsid=1,bus=nvme0,detached=true Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: refactor nvme_dma	Klaus Jensen
	The nvme_dma function doesn't just do DMA (QEMUSGList-based) memory transfers; it also handles QEMUIOVector copies. Introduce the NvmeTxDirection enum and rename to nvme_tx. Remove mapping of PRPs/SGLs from nvme_tx and instead assert that they have been mapped previously. This allows more fine-grained use in subsequent patches. Add new (better named) helpers, nvme_{c2h,h2c}, that does both PRP/SGL mapping and transfer. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: remove the req dependency in map functions	Klaus Jensen
	The PRP and SGL mapping functions does not have any particular need for the entire NvmeRequest as a parameter. Clean it up. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: try to deal with the iov/qsg duality	Klaus Jensen
	Introduce NvmeSg and try to deal with that pesky qsg/iov duality that haunts all the memory-related functions. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: fix strerror printing	Klaus Jensen
	Fix missing sign inversion. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: remove block accounting for write zeroes	Klaus Jensen
	A Write Zeroes commands should not be counted in either the 'Data Units Written' or in 'Host Write Commands' SMART/Health Information Log page. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: remove redundant len member in compare context	Klaus Jensen
	The 'len' member of the nvme_compare_ctx struct is redundant since the same information is available in the 'iov' member. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: report non-mdts command size limit for dsm	Gollu Appalanaidu
	Dataset Management is not subject to MDTS, but exceeded a certain size per range causes internal looping. Report this limit (DMRSL) in the NVM command set specific identify controller data structure. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: add trace event for zone read check	Gollu Appalanaidu
	Add a trace event for the offline zone condition when checking zone read. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: split commit] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: fix potential compilation error	Gollu Appalanaidu
	assert may be compiled to a noop and we could end up returning an uninitialized status. Fix this by always returning Internal Device Error as a fallback. Note that, as pointed out by Philippe, per commit 262a69f4282 ("osdep.h: Prohibit disabling assert() in supported builds") this shouldn't be possible. But clean it up so we don't worry about it again. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> [k.jensen: split commit] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: add identify trace event	Gollu Appalanaidu
	Add a trace event for the Identify command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
2021-03-09	hw/block/nvme: remove unnecessary endian conversion	Gollu Appalanaidu
	Remove an unnecessary le_to_cpu conversion in Identify. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
2021-03-09	hw/block/nvme: align zoned.zasl with mdts	Klaus Jensen
	ZASL (Zone Append Size Limit) is defined exactly like MDTS (Maximum Data Transfer Size), that is, it is a value in units of the minimum memory page size (CAP.MPSMIN) and is reported as a power of two. The 'mdts' nvme device parameter is specified as in the spec, but the 'zoned.append_size_limit' parameter is specified in bytes. This is suboptimal for a number of reasons: 1. It is just plain confusing wrt. the definition of mdts. 2. There is a lot of complexity involved in validating the value; it must be a power of two, it should be larger than 4k, if it is zero we set it internally to mdts, but still report it as zero. 3. While "hw/block/nvme: improve invalid zasl value reporting" slightly improved the handling of the parameter, the validation is still wrong; it does not depend on CC.MPS, it depends on CAP.MPSMIN. And we are not even checking that it is actually less than or equal to MDTS, which is kinda the one condition it must satisfy. Fix this by defining zasl exactly like mdts and checking the one thing that it must satisfy (that it is less than or equal to mdts). Also, change the default value from 128KiB to 0 (aka, whatever mdts is). Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: deduplicate bad mdts trace event	Klaus Jensen
	If mdts is exceeded, trace it from a single place. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: document 'mdts' nvme device parameter	Klaus Jensen
	Document the 'mdts' nvme device parameter. Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org>
2021-03-09	hw/block/nvme: add broadcast nsid support flush command	Gollu Appalanaidu
	Add support for using the broadcast nsid to issue a flush on all namespaces through a single command. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2021-03-09	hw/block/nvme: use locally assigned QEMU IEEE OUI	Gollu Appalanaidu
	Commit 6eb7a071292a ("hw/block/nvme: change controller pci id") changed the controller to use a Red Hat assigned PCI Device and Vendor ID, but did not change the IEEE OUI away from the Intel IEEE OUI. Fix that and use the locally assigned QEMU IEEE OUI instead if the `use-intel-id` parameter is not explicitly set. Also reverse the Intel IEEE OUI bytes. Signed-off-by: Gollu Appalanaidu <anaidu.gollu@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-03-09	hw/block/nvme: improve invalid zasl value reporting	Klaus Jensen
	The Zone Append Size Limit (ZASL) must be at least 4096 bytes, so improve the user experience by adding an early parameter check in nvme_check_constraints. When ZASL is still too small due to the host configuring the device for an even larger page size, convert the trace point in nvme_start_ctrl to an NVME_GUEST_ERR such that this is logged by QEMU instead of only traced. Reported-by: Corne <info@dantalion.nl> Cc: Dmitry Fomichev <Dmitry.Fomichev@wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>