aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-03-04include/block/block_int: split header into I/O and global state APIEmanuele Giuseppe Esposito
Similarly to the previous patch, split block_int.h in block_int-io.h and block_int-global-state.h block_int-common.h contains the structures shared between the two headers, and the functions that can't be categorized as I/O or global state. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-12-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block.c: assertions to the block layer permissions APIEmanuele Giuseppe Esposito
Now that we "covered" the three main cases where the permission API was being used under BQL (fuse, amend and invalidate_cache), we can safely assert for the permission functions implemented in block.c Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-11-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04IO_CODE and IO_OR_GS_CODE for block-backend I/O APIEmanuele Giuseppe Esposito
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-10-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block/block-backend.c: assertions for block-backendEmanuele Giuseppe Esposito
All the global state (GS) API functions will check that qemu_in_main_thread() returns true. If not, it means that the safety of BQL cannot be guaranteed, and they need to be moved to I/O. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-9-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/sysemu/block-backend: split header into I/O and global state (GS) APIEmanuele Giuseppe Esposito
Similarly to the previous patches, split block-backend.h in block-backend-io.h and block-backend-global-state.h In addition, remove "block/block.h" include as it seems it is not necessary anymore, together with "qemu/iov.h" block-backend-common.h contains the structures shared between the two headers, and the functions that can't be categorized as I/O or global state. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-8-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block/export/fuse.c: allow writable exports to take RESIZE permissionEmanuele Giuseppe Esposito
Allow writable exports to get BLK_PERM_RESIZE permission from creation, in fuse_export_create(). In this way, there is no need to give the permission in fuse_do_truncate(), which might be run in an iothread. Permissions should be set only in the main thread, so in any case if an iothread tries to set RESIZE, it will be blocked. Also assert in fuse_do_truncate that if we give the RESIZE permission we can then restore the original ones. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303151616.325444-7-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04IO_CODE and IO_OR_GS_CODE for block I/O APIEmanuele Giuseppe Esposito
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04assertions for block global state APIEmanuele Giuseppe Esposito
All the global state (GS) API functions will check that qemu_in_main_thread() returns true. If not, it means that the safety of BQL cannot be guaranteed, and they need to be moved to I/O. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/block: split header into I/O and global state APIEmanuele Giuseppe Esposito
block.h currently contains a mix of functions: some of them run under the BQL and modify the block layer graph, others are instead thread-safe and perform I/O in iothreads. Some others can only be called by either the main loop or the iothread running the AioContext (and not other iothreads), and using them in another thread would cause deadlocks, and therefore it is not ideal to define them as I/O. It is not easy to understand which function is part of which group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it. The "GS" functions need the BQL, and often use aio_context_acquire/release and/or drain to be sure they can modify the graph safely. The I/O function are instead thread safe, and can run in any AioContext. "I/O or GS" functions run instead in the main loop or in a single iothread, and use BDRV_POLL_WHILE(). By splitting the header in two files, block-io.h and block-global-state.h we have a clearer view on what needs what kind of protection. block-common.h contains common structures shared by both headers. block.h is left there for legacy and to avoid changing all includes in all c files that use the block APIs. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04main loop: macros to mark GS and I/O functionsEmanuele Giuseppe Esposito
Righ now, IO_CODE and IO_OR_GS_CODE are nop, as there isn't really a way to check that a function is only called in I/O. On the other side, we can use qemu_in_main_thread() to check if we are in the main loop. The usage of macros makes easy to extend them in the future without making changes in all callers. They will also visually help understanding in which category each function is, without looking at the header. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-3-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04main-loop.h: introduce qemu_in_main_thread()Emanuele Giuseppe Esposito
When invoked from the main loop, this function is the same as qemu_mutex_iothread_locked, and returns true if the BQL is held. When invoked from iothreads or tests, it returns true only if the current AioContext is the Main Loop. This essentially just extends qemu_mutex_iothread_locked to work also in unit tests or other users like storage-daemon, that run in the Main Loop but end up using the implementation in stubs/iothread-lock.c. Using qemu_mutex_iothread_locked in unit tests defaults to false because they use the implementation in stubs/iothread-lock, making all assertions added in next patches fail despite the AioContext is still the main loop. See the comment in the function header for more information. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-2-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04iotests/185: Add post-READY quit testsHanna Reitz
185 tests quitting qemu while a block job is active. It does not specifically test quitting qemu while a mirror or active commit job is in its READY phase. Add two test cases for this, where we respectively mirror or commit to an external QSD instance, which provides a throttled block device. qemu is supposed to cancel the job so that it can quit as soon as possible instead of waiting for the job to complete (which it did before 6.2). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303164814.284974-5-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04qsd: Add --daemonizeHanna Reitz
To implement this, we reuse the existing daemonizing functions from the system emulator, which mainly do the following: - Fork off a child process, and set up a pipe between parent and child - The parent process waits until the child sends a status byte over the pipe (0 means that the child was set up successfully; anything else (including errors or EOF) means that the child was not set up successfully), and then exits with an appropriate exit status - The child process enters a new session (forking off again), changes the umask, and will ignore terminal signals from then on - Once set-up is complete, the child will chdir to /, redirect all standard I/O streams to /dev/null, and tell the parent that set-up has been completed successfully In contrast to qemu-nbd's --fork implementation, during the set up phase, error messages are not piped through the parent process. qemu-nbd mainly does this to detect errors, though (while os_daemonize() has the child explicitly signal success after set up); because we do not redirect stderr after forking, error messages continue to appear on whatever the parent's stderr was (until set up is complete). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303164814.284974-4-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04qsd: Add pre-init argument parsing passHanna Reitz
In contrast to qemu-nbd (where it is called --fork) and the system emulator, QSD does not have a --daemonize switch yet. Just like them, QSD allows setting up block devices and exports on the command line. When doing so, it is often necessary for whoever invoked the QSD to wait until these exports are fully set up. A --daemonize switch allows precisely this, by virtue of the parent process exiting once everything is set up. Note that there are alternative ways of waiting for all exports to be set up, for example: - Passing the --pidfile option and waiting until the respective file exists (but I do not know if there is a way of implementing this without a busy wait loop) - Set up some network server (e.g. on a Unix socket) and have the QSD connect to it after all arguments have been processed by appending corresponding --chardev and --monitor options to the command line, and then wait until the QSD connects Having a --daemonize option would make this simpler, though, without having to rely on additional tools (to set up a network server) or busy waiting. Implementing a --daemonize switch means having to fork the QSD process. Ideally, we should do this as early as possible: All the parent process has to do is to wait for the child process to signal completion of its set-up phase, and therefore there is basically no initialization that needs to be done before the fork. On the other hand, forking after initialization steps means having to consider how those steps (like setting up the block layer or QMP) interact with a later fork, which is often not trivial. In order to fork this early, we must scan the command line for --daemonize long before our current process_options() call. Instead of adding custom new code to do so, just reuse process_options() and give it a @pre_init_pass argument to distinguish the two passes. I believe there are some other switches but --daemonize that deserve parsing in the first pass: - --help and --version are supposed to only print some text and then immediately exit (so any initialization we do would be for naught). This changes behavior, because now "--blockdev inv-drv --help" will print a help text instead of complaining about the --blockdev argument. Note that this is similar in behavior to other tools, though: "--help" is generally immediately acted upon when finding it in the argument list, potentially before other arguments (even ones before it) are acted on. For example, "ls /does-not-exist --help" prints a help text and does not complain about ENOENT. - --pidfile does not need initialization, and is already exempted from the sequential order that process_options() claims to strictly follow (the PID file is only created after all arguments are processed, not at the time the --pidfile argument appears), so it makes sense to include it in the same category as --daemonize. - Invalid arguments should always be reported as soon as possible. (The same caveat with --help applies: That means that "--blockdev inv-drv --inv-arg" will now complain about --inv-arg, not inv-drv.) This patch does make some references to --daemonize without having implemented it yet, but that will happen in the next patch. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220303164814.284974-3-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04os-posix: Add os_set_daemonize()Hanna Reitz
The daemonizing functions in os-posix (os_daemonize() and os_setup_post()) only daemonize the process if the static `daemonize` variable is set. Right now, it can only be set by os_parse_cmd_args(). In order to use os_daemonize() and os_setup_post() from the storage daemon to have it be daemonized, we need some other way to set this `daemonize` variable, because I would rather not tap into the system emulator's arg-parsing code. Therefore, this patch adds an os_set_daemonize() function, which will return an error on os-win32 (because daemonizing is not supported there). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303164814.284974-2-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04cpus: use coroutine TLS macros for iothread_lockedStefan Hajnoczi
qemu_mutex_iothread_locked() may be used from coroutines. Standard __thread variables cannot be used by coroutines. Use the coroutine TLS macros instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220222140150.27240-5-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04rcu: use coroutine TLS macrosStefan Hajnoczi
RCU may be used from coroutines. Standard __thread variables cannot be used by coroutines. Use the coroutine TLS macros instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220222140150.27240-4-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04util/async: replace __thread with QEMU TLS macrosStefan Hajnoczi
QEMU TLS macros must be used to make TLS variables safe with coroutines. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220222140150.27240-3-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04tls: add macros for coroutine-safe TLS variablesStefan Hajnoczi
Compiler optimizations can cache TLS values across coroutine yield points, resulting in stale values from the previous thread when a coroutine is re-entered by a new thread. Serge Guelton developed an __attribute__((noinline)) wrapper and tested it with clang and gcc. I formatted his idea according to QEMU's coding style and wrote documentation. The compiler can still optimize based on analyzing noinline code, so an asm volatile barrier with an output constraint is required to prevent unwanted optimizations. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1952483 Suggested-by: Serge Guelton <sguelton@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220222140150.27240-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: move BQL logic of bdrv_co_invalidate_cache in bdrv_activateEmanuele Giuseppe Esposito
Split bdrv_co_invalidate cache in two: the Global State (under BQL) code that takes care of permissions and running GS callbacks, and leave only the I/O code (->bdrv_co_invalidate_cache) running in the I/O coroutine. The only side effect is that bdrv_co_invalidate_cache is not recursive anymore, and so is every direct call to bdrv_invalidate_cache(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220209105452.1694545-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: rename bdrv_invalidate_cache_all, blk_invalidate_cache and ↵Emanuele Giuseppe Esposito
test_sync_op_invalidate_cache Following the bdrv_activate renaming, change also the name of the respective callers. bdrv_invalidate_cache_all -> bdrv_activate_all blk_invalidate_cache -> blk_activate test_sync_op_invalidate_cache -> test_sync_op_activate No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: introduce bdrv_activateEmanuele Giuseppe Esposito
This function is currently just a wrapper for bdrv_invalidate_cache(), but in future will contain the code of bdrv_co_invalidate_cache() that has to always be protected by BQL, and leave the rest in the I/O coroutine. Replace all bdrv_invalidate_cache() invokations with bdrv_activate(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04crypto: distinguish between main loop and I/O in ↵Emanuele Giuseppe Esposito
block_crypto_amend_options_generic_luks block_crypto_amend_options_generic_luks uses the block layer permission API, therefore it should be called with the BQL held. However, the same function is being called by two BlockDriver callbacks: bdrv_amend_options (under BQL) and bdrv_co_amend (I/O). The latter is I/O because it is invoked by block/amend.c's blockdev_amend_run(), a .run callback of the amend JobDriver. Therefore we want to change this function to still perform the permission check, but making sure it is done under BQL regardless of the caller context. Remove the permission check in block_crypto_amend_options_generic_luks() and: - in block_crypto_amend_options_luks() (BQL case, called by .bdrv_amend_options()), reuse helper functions block_crypto_amend_{prepare/cleanup} that take care of checking permissions. - for block_crypto_co_amend_luks() (I/O case, called by .bdrv_co_amend()), don't check for permissions but delegate .bdrv_amend_pre_run() and .bdrv_amend_clean() to do it, performing these checks before and after the job runs in its aiocontext. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220209105452.1694545-3-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04crypto: perform permission checks under BQLEmanuele Giuseppe Esposito
Move the permission API calls into driver-specific callbacks that always run under BQL. In this case, bdrv_crypto_luks needs to perform permission checks before and after qcrypto_block_amend_options(). The problem is that the caller, block_crypto_amend_options_generic_luks(), can also run in I/O from .bdrv_co_amend(). This does not comply with Global State-I/O API split, as permissions API must always run under BQL. Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean() callbacks. These two callbacks are guaranteed to be invoked under BQL, respectively before and after .bdrv_co_amend(). They take care of performing the permission checks in the same way as they are currently done before and after qcrypto_block_amend_options(). These callbacks are in preparation for next patch, where we delete the original permission check. Right now they just add redundant control. Then, call .bdrv_amend_pre_run() before job_start in qmp_x_blockdev_amend(), so that it will be run before the job coroutine is created and stay in the main loop. As a cleanup, use JobDriver's .clean() callback to call .bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL. After this patch, permission failures occur early in the blockdev-amend job to update a LUKS volume's keys. iotest 296 must now expect them in x-blockdev-amend's QMP reply instead of waiting for the actual job to fail later. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220209105452.1694545-2-eesposit@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220304153729.711387-6-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04Merge remote-tracking branch 'remotes/nvme/tags/nvme-next-pull-request' into ↵Peter Maydell
staging hw/nvme updates - add enhanced protection information (64-bit guard) # gpg: Signature made Fri 04 Mar 2022 06:23:36 GMT # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * remotes/nvme/tags/nvme-next-pull-request: hw/nvme: 64-bit pi support hw/nvme: add pi tuple size helper hw/nvme: add support for the lbafee hbs feature hw/nvme: move format parameter parsing hw/nvme: add host behavior support feature hw/nvme: move dif/pi prototypes into dif.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-03-04hw/vhost-user-i2c: Add support for VIRTIO_I2C_F_ZERO_LENGTH_REQUESTViresh Kumar
VIRTIO_I2C_F_ZERO_LENGTH_REQUEST is a mandatory feature, that must be implemented by everyone. Add its support. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <fc47ab63b1cd414319c9201e8d6c7705b5ec3bd9.1644490993.git.viresh.kumar@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04virtio: fix the condition for iommu_platform not supportedHalil Pasic
The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but unsupported") claims to fail the device hotplug when iommu_platform is requested, but not supported by the (vhost) device. On the first glance the condition for detecting that situation looks perfect, but because a certain peculiarity of virtio_platform it ain't. In fact the aforementioned commit introduces a regression. It breaks virtio-fs support for Secure Execution, and most likely also for AMD SEV or any other confidential guest scenario that relies encrypted guest memory. The same also applies to any other vhost device that does not support _F_ACCESS_PLATFORM. The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates "device can not access all of the guest RAM" and "iova != gpa, thus device needs to translate iova". Confidential guest technologies currently rely on the device/hypervisor offering _F_ACCESS_PLATFORM, so that, after the feature has been negotiated, the guest grants access to the portions of memory the device needs to see. So in for confidential guests, generally, _F_ACCESS_PLATFORM is about the restricted access to memory, but not about the addresses used being something else than guest physical addresses. This is the very reason for which commit f7ef7e6e3b ("vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM") fences _F_ACCESS_PLATFORM from the vhost device that does not need it, because on the vhost interface it only means "I/O address translation is needed". This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the situation when _F_ACCESS_PLATFORM is requested, but no I/O translation by the device, and thus no device capability is needed. In this situation claiming that the device does not support iommu_plattform=on is counter-productive. So let us stop doing that! Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but unsupported") Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Message-Id: <20220207112857.607829-1-pasic@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04vhost-user: fix VirtQ notifier cleanupXueming Li
When vhost-user device cleanup, remove notifier MR and munmaps notifier address in the event-handling thread, VM CPU thread writing the notifier in concurrent fails with an error of accessing invalid address. It happens because MR is still being referenced and accessed in another thread while the underlying notifier mmap address is being freed and becomes invalid. This patch calls RCU and munmap notifiers in the callback after the memory flatview update finish. Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers") Cc: qemu-stable@nongnu.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Message-Id: <20220207071929.527149-3-xuemingl@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04vhost-user: remove VirtQ notifier restoreXueming Li
Notifier set when vhost-user backend asks qemu to mmap an FD and offset. When vhost-user backend restart or getting killed, VQ notifier FD and mmap addresses become invalid. After backend restart, MR contains the invalid address will be restored and fail on notifier access. On the other hand, qemu should munmap the notifier, release underlying hardware resources to enable backend restart and allocate hardware notifier resources correctly. Qemu shouldn't reference and use resources of disconnected backend. This patch removes VQ notifier restore, uses the default vhost-user notifier to avoid invalid address access. After backend restart, the backend should ask qemu to install a hardware notifier if needed. Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers") Cc: qemu-stable@nongnu.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Message-Id: <20220207071929.527149-2-xuemingl@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04hw/smbios: add assertion to ensure handles of tables 19 and 32 do not collideAni Sinha
Since change dcf359832eec02 ("hw/smbios: fix table memory corruption with large memory vms") we reserve additional space between handle numbers of tables 17 and 19 for large VMs. This may cause table 19 to collide with table 32 in their handle numbers for those large VMs. This change adds an assertion to ensure numbers do not collide. If they do, qemu crashes with useful debug information for taking additional steps. Signed-off-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220223143322.927136-8-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04hw/smbios: fix overlapping table handle numbers with large memory vmsAni Sinha
The current smbios table implementation splits the main memory in 16 GiB (DIMM like) chunks. With the current smbios table assignment code, we can have only 512 such chunks before the 16 bit handle numbers in the header for tables 17 and 19 conflict. A guest with more than 8 TiB of memory will hit this limitation and would fail with the following assertion in isa-debugcon: ASSERT_EFI_ERROR (Status = Already started) ASSERT /builddir/build/BUILD/edk2-ca407c7246bf/OvmfPkg/SmbiosPlatformDxe/SmbiosPlatformDxe.c(125): !EFI_ERROR (Status) This change adds an additional offset between tables 17 and 19 handle numbers when configuring VMs larger than 8 TiB of memory. The value of the offset is calculated to be equal to the additional space required to be reserved in order to accomodate more DIMM entries without the table handles colliding. In normal cases where the VM memory is smaller or equal to 8 TiB, this offset value is 0. Hence in this case, no additional handle numbers are reserved and table handle values remain as before. Since smbios memory is not transmitted over the wire during migration, this change can break migration for large memory vms if the guest is in the middle of generating the tables during migration. However, in those situations, qemu generates invalid table handles anyway with or without this fix. Hence, we do not preserve the old bug by introducing compat knobs/machine types. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2023977 Signed-off-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20220223143322.927136-7-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04hw/smbios: code cleanup - use macro definitions for table header handlesAni Sinha
This is a minor cleanup. Using macro definitions makes the code more readable. It is at once clear which tables use which handle numbers in their header. It also makes it easy to calculate the gaps between the numbers and update them if needed. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Ani Sinha <ani@anisinha.ca> Message-Id: <20220223143322.927136-6-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-04hw/acpi/erst: clean up unused IS_UEFI_CPER_RECORD macroAni Sinha
This change is cosmetic. IS_UEFI_CPER_RECORD macro definition that was added as a part of the ERST implementation seems to be unused. Remove it. CC: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Ani Sinha <ani@anisinha.ca> Message-Id: <20220223143322.927136-5-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04docs/acpi/erst: add device id for ACPI ERST device in pci-ids.txtAni Sinha
Adding device ID for ERST device in pci-ids.txt. It was missed when ERST related patches were reviewed. CC: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Eric DeVolder <eric.devolder@oracle.com> Signed-off-by: Ani Sinha <ani@anisinha.ca> Message-Id: <20220223143322.927136-4-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04MAINTAINERS: no need to add my name explicitly as a reviewer for VIOT tablesAni Sinha
I am already listed as a reviewer for ACPI/SMBIOS subsystem. There is no need to again add me as a reviewer for ACPI/VIOT. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Ani Sinha <ani@anisinha.ca> Message-Id: <20220223143322.927136-3-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04ACPI ERST: specification for ERST supportEric DeVolder
Information on the implementation of the ACPI ERST support. Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Acked-by: Ani Sinha <ani@anisinha.ca> Message-Id: <20220223143322.927136-2-ani@anisinha.ca> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04qom: assert integer does not overflowMichael S. Tsirkin
QOM reference counting is not designed with an infinite amount of references in mind, trying to take a reference in a loop without dropping a reference will overflow the integer. It is generally a symptom of a reference leak (a missing deref, commonly as part of error handling - such as one fixed here: https://lore.kernel.org/r/20220228095058.27899-1-sgarzare%40redhat.com ). All this can lead to either freeing the object too early (memory corruption) or never freeing it (memory leak). If we happen to dereference at just the right time (when it's wrapping around to 0), we might eventually assert when dereferencing, but the real problem is an extra object_ref so let's assert there to make such issues cleaner and easier to debug. Some micro-benchmarking shows using fetch and add this is essentially free on x86. Since multiple threads could be incrementing in parallel, we assert around INT_MAX to make sure none of these approach the wrap around point: this way we get a memory leak and not a memory corruption, the former is generally easier to debug. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04hw/display/vmware_vga: replace fprintf calls with trace eventsCarwyn Ellis
Debug output was always being sent to STDERR. This has been replaced with trace events. Signed-off-by: Carwyn Ellis <carwynellis@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220206183956.10694-2-carwynellis@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-nios-20220303' ↵Peter Maydell
into staging Rewrite nios2 interrupt handling # gpg: Signature made Thu 03 Mar 2022 19:52:33 GMT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth-gitlab/tags/pull-nios-20220303: target/nios2: Rewrite interrupt handling target/nios2: Special case ipending in rdctl and wrctl target/nios2: Split mmu_write target/nios2: Hoist R_ZERO check in rdctl target/nios2: Only build mmu.c for system mode target/nios2: Replace MMU_LOG with tracepoints target/nios2: Remove mmu_read_debug Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-03-04edid: Fix clock of Detailed Timing DescriptorAkihiko Odaki
The clock field is 16-bits in EDID Detailed Timing Descriptor, but edid_desc_timing assumed it is 32-bit. Write the 16-bit value if it fits in 16-bit. Write DisplayID otherwise. Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-Id: <20220213021529.2248-1-akihiko.odaki@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04softmmu/qdev-monitor: Add virtio-gpu-gl aliasesAkihiko Odaki
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-Id: <20220213021800.2525-1-akihiko.odaki@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04ui/cocoa: Add Services menuAkihiko Odaki
Services menu functionality of Cocoa is described at: https://developer.apple.com/design/human-interface-guidelines/macos/extensions/services/ Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220214091320.51750-1-akihiko.odaki@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04ui/clipboard: fix use-after-free regressionMarc-André Lureau
The same info may be used to update the clipboard, and may be freed before being ref'ed again. Fixes: 70a54b01693ed ("ui: avoid compiler warnings from unused clipboard info variable") Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220214115917.1679568-1-marcandre.lureau@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04ui: do not create a surface when resizing a GL scanoutMarc-André Lureau
qemu_console_resize() will create a blank surface and replace the current scanout with it if called while the current scanout is GL (texture or dmabuf). This is not only very costly, but also can produce glitches on the display/listener side. Instead, compare the current console size with the fitting console functions, which also works when the scanout is GL. Note: there might be still an unnecessary surface creation on calling qemu_console_resize() when the size is actually changing, but display backends currently rely on DisplaySurface details during dpy_gfx_switch() to handle various resize aspects. We would need more refactoring to handle resize without DisplaySurface, this is left for a future improvement. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220214201337.1814787-4-marcandre.lureau@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04ui/console: fix texture leak when calling surface_gl_create_texture()Marc-André Lureau
Make surface_gl_create_texture() idempotent: if the surface is already bound to a texture, do not create a new one. This fixes texture leaks when there are multiple DBus listeners, for example. Reported-by: Akihiko Odaki <akihiko.odaki@gmail.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220214201337.1814787-3-marcandre.lureau@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04ui/console: fix crash when using gl context with non-gl listenersMarc-André Lureau
The commit 7cc712e98 ("ui: dispatch GL events to all listener") mechanically replaced the dpy_gl calls with a dispatch loop, using the same pre-conditions. However, it didn't take into account that all listeners do not have to implement the GL callbacks. Add the missing pre-conditions before calling the callbacks. Fix crash when running a GL-enabled VM with "-device virtio-gpu-gl-pci -display egl-headless -vnc :0". Fixes: 7cc712e98 ("ui: dispatch GL events to all listener") Reported-by: Akihiko Odaki <akihiko.odaki@gmail.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220214201337.1814787-2-marcandre.lureau@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04docs: Add spec of OVMF GUIDed table for SEV guestsDov Murik
Add docs/specs/sev-guest-firmware.rst which describes the GUIDed table in the end of OVMF's image which is parsed by QEMU, and currently used to describe some values for SEV and SEV-ES guests. Signed-off-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220103091413.2869-1-dovmurik@linux.ibm.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04hw/i386: Replace magic number with field length calculationDov Murik
Replce the literal magic number 48 with length calculation (32 bytes at the end of the firmware after the table footer + 16 bytes of the OVMF table footer GUID). No functional change intended. Signed-off-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220222071906.2632426-3-dovmurik@linux.ibm.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04hw/i386: Improve bounds checking in OVMF table parsingDov Murik
When pc_system_parse_ovmf_flash() parses the optional GUIDed table in the end of the OVMF flash memory area, the table length field is checked for sizes that are too small, but doesn't error on sizes that are too big (bigger than the flash content itself). Add a check for maximal size of the OVMF table, and add an error report in case the size is invalid. In such a case, an error like this will be displayed during launch: qemu-system-x86_64: OVMF table has invalid size 4047 and the table parsing is skipped. Signed-off-by: Dov Murik <dovmurik@linux.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220222071906.2632426-2-dovmurik@linux.ibm.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2022-03-04coreaudio: Notify error in coreaudio_init_outAkihiko Odaki
Otherwise, the audio subsystem tries to use the voice and eventually aborts due to the maximum number of samples in the buffer is not set. Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220226115953.60335-1-akihiko.odaki@gmail.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>