aboutsummaryrefslogtreecommitdiff
path: root/include/exec/memory.h
AgeCommit message (Collapse)Author
2023-08-01misc: Fix some typos in documentation and commentsStefan Weil
Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230730180329.851576-1-sw@weilnetz.de> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbol for the min value of memory listener priorityIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of the memory listener instead of the hard-coded magic value 0. Add explicit initialization. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbol for memory listener priority for device backendIsaku Yamahata
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value for memory listener to replace the hard-coded value 10 for the device backend. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbolic value for memory listener priority for accelIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_ACCEL for the symbolic value for the memory listener to replace the hard-coded value 10 for accel. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-13exec/memory: Introduce RAM_NAMED_FILE flagSteve Sistare
migrate_ignore_shared() is an optimization that avoids copying memory that is visible and can be mapped on the target. However, a memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED flag set is not migrated when migrate_ignore_shared() is true. This is wrong, because the block has no named backing store, and its contents will be lost. To fix, ignore shared memory iff it is a named file. Define a new flag RAM_NAMED_FILE to distinguish this case. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1686151116-253260-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-05-23hostmem-file: add offset optionAlexander Graf
Add an option for hostmem-file to start the memory object at an offset into the target file. This is useful if multiple memory objects reside inside the same target file, such as a device node. In particular, it's useful to map guest memory directly into /dev/mem for experimentation. To make this work consistently, also fix up all places in QEMU that expect fd offsets to be 0. Signed-off-by: Alexander Graf <graf@amazon.com> Message-Id: <20230403221421.60877-1-graf@amazon.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-05-18migration: Add last stage indicator to global dirty logGavin Shan
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-28memory: prevent dma-reentracy issuesAlexander Bulekov
Add a flag to the DeviceState, when a device is engaged in PIO/MMIO/DMA. This flag is set/checked prior to calling a device's MemoryRegion handlers, and set when device code initiates DMA. The purpose of this flag is to prevent two types of DMA-based reentrancy issues: 1.) mmio -> dma -> mmio case 2.) bh -> dma write -> mmio case These issues have led to problems such as stack-exhaustion and use-after-frees. Summary of the problem from Peter Maydell: https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282 Resolves: CVE-2023-0330 Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230427211013.2994127-2-alxndr@bu.edu> [thuth: Replace warn_report() with warn_report_once()] Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-03-16exec/memory: Fix kernel-doc warningBernhard Beschow
During build the kernel-doc script complains about the following issue: src/docs/../include/exec/memory.h:1741: warning: Function parameter or member 'n' not described in 'memory_region_unmap_iommu_notifier_range' src/docs/../include/exec/memory.h:1741: warning: Excess function parameter 'notifier' description in 'memory_region_unmap_iommu_notifier_range' Settle on "notifier" for consistency with other memory functions. Fixes: 7caebbf9ea53 ("memory: introduce memory_region_unmap_iommu_notifier_range()") Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230315072552.47117-1-shentey@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-02memory: introduce memory_region_unmap_iommu_notifier_range()Jason Wang
This patch introduces a new helper to unmap the range of a specific IOMMU notifier. Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-4-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-27intel-iommu: Document iova_treePeter Xu
It seems not super clear on when iova_tree is used, and why. Add a rich comment above iova_tree to track why we needed the iova_tree, and when we need it. Also comment for the map/unmap messages, on how they're used and implications (e.g. unmap can be larger than the mapped ranges). Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20230109193727.1360190-1-peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-01-09exec/memory: Expose memory_region_access_valid()Philippe Mathieu-Daudé
Instead of having hardware device poking into memory internal API, expose memory_region_access_valid(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221217152454.96388-2-philmd@linaro.org> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-11-11Fix several typos in documentation (found by codespell)Stefan Weil
Those typos are in files which are used to generate the QEMU manual. Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-Id: <20221110190825.879620-1-sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Ani Sinha <ani@anisinha.ca> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> [thuth: update sentence in can.rst as suggested by Peter] Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-11-07Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi
into staging pci,pc,virtio: features, tests, fixes, cleanups lots of acpi rework first version of biosbits infrastructure ASID support in vhost-vdpa core_count2 support in smbios PCIe DOE emulation virtio vq reset HMAT support part of infrastructure for viommu support in vhost-vdpa VTD PASID support fixes, tests all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut # uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s # 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X # Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur # 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU # EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo= # =zTCn # -----END PGP SIGNATURE----- # gpg: Signature made Mon 07 Nov 2022 14:27:53 EST # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits) checkpatch: better pattern for inline comments hw/virtio: introduce virtio_device_should_start tests/acpi: update tables for new core count test bios-tables-test: add test for number of cores > 255 tests/acpi: allow changes for core_count2 test bios-tables-test: teach test to use smbios 3.0 tables hw/smbios: add core_count2 to smbios table type 4 vhost-user: Support vhost_dev_start vhost: Change the sequence of device start intel-iommu: PASID support intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function intel-iommu: drop VTDBus intel-iommu: don't warn guest errors when getting rid2pasid entry vfio: move implement of vfio_get_xlat_addr() to memory.c tests: virt: Update expected *.acpihmatvirt tables tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators hw/arm/virt: Enable HMAT on arm virt machine tests: Add HMAT AArch64/virt empty table files tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT: tests: acpi: q35: add test for hmat nodes without initiators ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-07vfio: move implement of vfio_get_xlat_addr() to memory.cCindy Lu
- Move the implement vfio_get_xlat_addr to softmmu/memory.c, and change the name to memory_get_xlat_addr(). So we can use this function on other devices, such as vDPA device. - Add a new function vfio_get_xlat_addr in vfio/common.c, and it will check whether the memory is backed by a discard manager. then device can have its own warning. Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20221031031020.1405111-2-lulu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-11-05Fix some typos in documentation and commentsStefan Weil
Most of them were found and fixed using codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20221030105944.311940-1-sw@weilnetz.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-07-20softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodicallyHyman Huang(黄勇)
Introduce the third method GLOBAL_DIRTY_LIMIT of dirty tracking for calculate dirtyrate periodly for dirty page rate limit. Add dirtylimit.c to implement dirtyrate calculation periodly, which will be used for dirty page rate limit. Add dirtylimit.h to export util functions for dirty page rate limit implementation. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-15vfio-user: handle PCI BAR accessesJagannathan Raman
Determine the BARs used by the PCI device and register handlers to manage the access to the same. Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 3373e10b5be5f42846f0632d4382466e1698c505.1655151679.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-04-06Replace TARGET_WORDS_BIGENDIANMarc-André Lureau
Convert the TARGET_WORDS_BIGENDIAN macro, similarly to what was done with HOST_BIG_ENDIAN. The new TARGET_BIG_ENDIAN macro is either 0 or 1, and thus should always be defined to prevent misuse. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-8-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace config-time define HOST_WORDS_BIGENDIANMarc-André Lureau
Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-20exec/memory: Extract address_space_set() from dma_memory_set()Philippe Mathieu-Daudé
dma_memory_set() does a DMA barrier, set the address space with a constant value. The constant value filling code is not specific to DMA and can be used for AddressSpace. Extract it as a new helper: address_space_set(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [lv: rebase] Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20220115203725.3834712-2-laurent@vivier.eu>
2022-01-18memory: Update description of memory_region_is_mapped()David Hildenbrand
Let's update the documentation, making it clearer what the semantics of memory_region_is_mapped() actually are. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211102164317.45658-4-david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-01-18memory: Make memory_region_is_mapped() succeed when mapped via an aliasDavid Hildenbrand
memory_region_is_mapped() currently does not return "true" when a memory region is mapped via an alias. Assuming we have: alias (A0) -> alias (A1) -> region (R0) Mapping A0 would currently only make memory_region_is_mapped() succeed on A0, but not on A1 and R0. Let's fix that by adding a "mapped_via_alias" counter to memory regions and updating it accordingly when an alias gets (un)mapped. I am not aware of actual issues, this is rather a cleanup to make it consistent. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211102164317.45658-3-david@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2021-11-01memory: Introduce replay_discarded callback for RamDiscardManagerDavid Hildenbrand
Introduce replay_discarded callback similar to our existing replay_populated callback, to be used my migration code to never migrate discarded memory. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01memory: make global_dirty_tracking a bitmaskHyman Huang(黄勇)
since dirty ring has been introduced, there are two methods to track dirty pages of vm. it seems that "logging" has a hint on the method, so rename the global_dirty_log to global_dirty_tracking would make description more accurate. dirty rate measurement may start or stop dirty tracking during calculation. this conflict with migration because stop dirty tracking make migration leave dirty pages out then that'll be a problem. make global_dirty_tracking a bitmask can let both migration and dirty rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION and GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty tracking aims for, migration or dirty rate. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-09-30memory: Name all the memory listenersPeter Xu
Provide a name field for all the memory listeners. It can be used to identify which memory listener is which. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210817013553.30584-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-09-30memory: Add RAM_PROTECTED flag to skip IOMMU mappingsSean Christopherson
Add a new RAMBlock flag to denote "protected" memory, i.e. memory that looks and acts like RAM but is inaccessible via normal mechanisms, including DMA. Use the flag to skip protected memory regions when mapping RAM for DMA in VFIO. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-08softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard typesDavid Hildenbrand
We want to separate the two cases whereby we discard ram - uncoordinated: e.g., virito-balloon - coordinated: e.g., virtio-mem coordinated via the RamDiscardManager Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-12-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08memory: Helpers to copy/free a MemoryRegionSectionDavid Hildenbrand
In case one wants to create a permanent copy of a MemoryRegionSections, one needs access to flatview_ref()/flatview_unref(). Instead of exposing these, let's just add helpers to copy/free a MemoryRegionSection and properly adjust references. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-3-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-07-08memory: Introduce RamDiscardManager for RAM memory regionsDavid Hildenbrand
We have some special RAM memory regions (managed by virtio-mem), whereby the guest agreed to only use selected memory ranges. "unused" parts are discarded so they won't consume memory - to logically unplug these memory ranges. Before the VM is allowed to use such logically unplugged memory again, coordination with the hypervisor is required. This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only coordinated parts are valid to be used/accessed by the VM. In most cases, we don't care about that - e.g., in KVM, we simply have a single KVM memory slot. However, in case of vfio, registering the whole region with the kernel results in all pages getting pinned, and therefore an unexpected high memory consumption - discarding of RAM in that context is broken. Let's introduce a way to coordinate discarding/populating memory within a RAM memory region with such special consumers of RAM memory regions: they can register as listeners and get updates on memory getting discarded and populated. Using this machinery, vfio will be able to map only the currently populated parts, resulting in discarded parts not getting pinned and not consuming memory. A RamDiscardManager has to be set for a memory region before it is getting mapped, and cannot change while the memory region is mapped. Note: At some point, we might want to let RAMBlock users (esp. vfio used for nvme://) consume this interface as well. We'll need RAMBlock notifier calls when a RAMBlock is getting mapped/unmapped (via the corresponding memory region), so we can properly register a listener there as well. Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Auger Eric <eric.auger@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: teawater <teawaterz@linux.alibaba.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210413095531.25603-2-david@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-06-15memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()David Hildenbrand
Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()David Hildenbrand
Let's forward ram_flags instead, renaming memory_region_init_ram_shared_nomigrate() into memory_region_init_ram_flags_nomigrate(). Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-6-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()David Hildenbrand
Let's pass in ram flags just like we do with qemu_ram_alloc_from_file(), to clean up and prepare for more flags. Simplify the documentation of passed ram flags: Looking at our documentation of RAM_SHARED and RAM_PMEM is sufficient, no need to be repetitive. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-05docs: fix broken referenceJohn Snow
Long story short, we need a space here for the reference to work correctly. Longer story: Without the space, kerneldoc generates a line like this: one of :c:type:`MemoryListener.region_add\(\) <MemoryListener>`,:c:type:`MemoryListener.region_del\(\) Sphinx does not process the role information correctly, so we get this (my pseudo-notation) construct: <text>,:c:type:</text> <reference target="MemoryListener">MemoryListener.region_del()</reference> which does not reference the desired entity, and leaves some extra junk in the rendered output. See https://qemu-project.gitlab.io/qemu/devel/memory.html#c.MemoryListener member log_start for an example of the broken output as it looks today. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20210511192950.2061326-1-jsnow@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-28Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210526' ↵Peter Maydell
into staging Adjust types for some memory access functions. Reduce inclusion of tcg headers. Fix watchpoints vs replay. Fix tcg/aarch64 roli expansion. Introduce SysemuCPUOps structure. # gpg: Signature made Thu 27 May 2021 00:43:54 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth-gitlab/tags/pull-tcg-20210526: (31 commits) hw/core: Constify TCGCPUOps target/mips: Fold jazz behaviour into mips_cpu_do_transaction_failed cpu: Move CPUClass::get_paging_enabled to SysemuCPUOps cpu: Move CPUClass::get_memory_mapping to SysemuCPUOps cpu: Move CPUClass::get_phys_page_debug to SysemuCPUOps cpu: Move CPUClass::asidx_from_attrs to SysemuCPUOps cpu: Move CPUClass::write_elf* to SysemuCPUOps cpu: Move CPUClass::get_crash_info to SysemuCPUOps cpu: Move CPUClass::virtio_is_big_endian to SysemuCPUOps cpu: Move CPUClass::vmsd to SysemuCPUOps cpu: Introduce SysemuCPUOps structure cpu: Move AVR target vmsd field from CPUClass to DeviceClass cpu: Rename CPUClass vmsd -> legacy_vmsd cpu: Assert DeviceClass::vmsd is NULL on user emulation cpu: Directly use get_memory_mapping() fallback handlers in place cpu: Directly use get_paging_enabled() fallback handlers in place cpu: Directly use cpu_write_elf*() fallback handlers in place cpu: Introduce cpu_virtio_is_big_endian() cpu: Un-inline cpu_get_phys_page_debug and cpu_asidx_from_attrs cpu: Split as cpu-common / cpu-sysemu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-26exec/memory: Use correct type sizePhilippe Mathieu-Daudé
Use uint8_t for (unsigned) byte. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210518183655.1711377-7-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-05-26memory: Introduce log_sync_global() to memory listenerPeter Xu
Some of the memory listener may want to do log synchronization without being able to specify a range of memory to sync but always globally. Such a memory listener should provide this new method instead of the log_sync() method. Obviously we can also achieve similar thing when we put the global sync logic into a log_sync() handler. However that's not efficient enough because otherwise memory_global_dirty_log_sync() may do the global sync N times, where N is the number of flat ranges in the address space. Make this new method be exclusive to log_sync(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210506160549.130416-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-05-13migration/ram: Handle RAM block resizes during precopyDavid Hildenbrand
Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of precopy, the ram block size must not change on the source, after syncing the RAM block list in ram_save_setup(), so as long as the guest is still running on the source. Resizing can be trigger *after* (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Use the ram block notifier to get notified about resizes. Let's simply cancel migration and indicate the reason. We'll continue running on the source. No harm done. Update the documentation. Postcopy will be handled separately. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-5-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge
2021-03-23memory: Add offset_in_region to flatview_cb argumentsPeter Maydell
The function flatview_for_each_range() calls a callback for each range in a FlatView. Currently the callback gets the start and length of the range and the MemoryRegion involved, but not the offset within the MemoryRegion. Add this to the callback's arguments; we're going to want it for a new use in the next commit. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-4-peter.maydell@linaro.org
2021-03-23memory: Document flatview_for_each_range()Peter Maydell
Add a documentation comment describing flatview_for_each_range(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-3-peter.maydell@linaro.org
2021-03-23memory: Make flatview_cb return bool, not intPeter Maydell
The return value of the flatview_cb callback passed to the flatview_for_each_range() function is zero if the iteration through the ranges should continue, or non-zero to break out of it. Use a bool for this rather than int. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20210318174823.18066-2-peter.maydell@linaro.org
2021-03-19memory: Drop "qemu:" prefix from QOM memory region type namesMarkus Armbruster
Almost all QOM type names consist only of letters, digits, '-', '_', and '.'. Just two contain ':': "qemu:memory-region" and "qemu:iommu-memory-region". Neither can be plugged with -object. Rename them to "memory-region" and "iommu-memory-region". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210304140229.575481-3-armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-09exec/memory: Use struct Object typedefPhilippe Mathieu-Daudé
We forward-declare Object typedef in "qemu/typedefs.h" since commit ca27b5eb7cd ("qom/object: Move Object typedef to 'qemu/typedefs.h'"). Use it everywhere to make the code simpler. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20210225182003.3629342-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-02-09memory: alloc RAM from file at offsetJagannathan Raman
Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-02-09Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell
staging * Fuzzing improvements (Qiuhao, Alexander) * i386: Fix BMI decoding for instructions with the 0x66 prefix (David) * initial attempt at fixing event_notifier emulation (Maxim) * i386: PKS emulation, fix for "qemu-system-i386 -cpu host" (myself) * meson: RBD test fixes (myself) * meson: TCI warnings (Philippe) * Leaner build for --disable-guest-agent, --disable-system and --disable-tools (Philippe, Stefan) * --enable-tcg-interpreter fix (Richard) * i386: SVM feature bits (Wei) * KVM bugfix (Thomas H.) * Add missing MemoryRegionOps callbacks (PJP) # gpg: Signature made Mon 08 Feb 2021 14:15:35 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (46 commits) target/i386: Expose VMX entry/exit load pkrs control bits target/i386: Add support for save/load IA32_PKRS MSR imx7-ccm: add digprog mmio write method tz-ppc: add dummy read/write methods spapr_pci: add spapr msi read method nvram: add nrf51_soc flash read method prep: add ppc-parity write method vfio: add quirk device write method pci-host: designware: add pcie-msi read method hw/pci-host: add pci-intack write method cpu-throttle: Remove timer_mod() from cpu_throttle_set() replay: rng-builtin support pc-bios/descriptors: fix paths in json files replay: fix replay of the interrupts accel/kvm/kvm-all: Fix wrong return code handling in dirty log code qapi/meson: Restrict UI module to system emulation and tools qapi/meson: Restrict system-mode specific modules qapi/meson: Remove QMP from user-mode emulation qapi/meson: Restrict qdev code to system-mode emulation meson: Restrict emulation code ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-02-08fuzz: ignore address_space_map is_write flagAlexander Bulekov
We passed an is_write flag to the fuzz_dma_read_cb function to differentiate between the mapped DMA regions that need to be populated with fuzzed data, and those that don't. We simply passed through the address_space_map is_write parameter. The goal was to cut down on unnecessarily populating mapped DMA regions, when they are not read from. Unfortunately, nothing precludes code from reading from regions mapped with is_write=true. For example, see: https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04729.html This patch removes the is_write parameter to fuzz_dma_read_cb. As a result, we will fill all mapped DMA regions with fuzzed data, ignoring the specified transfer direction. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Message-Id: <20210120060255.558535-1-alxndr@bu.edu>
2021-02-08migration: support UFFD write fault processing in ram_save_iterate()Andrey Gruzdev
In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit
2021-02-08migration: introduce UFFD-WP low-level interface helpersAndrey Gruzdev
Glue code to the userfaultfd kernel implementation. Querying feature support, createing file descriptor, feature control, memory region registration, IOCTLs on registered registered regions. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20210129101407.103458-3-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up range.start casting for 32bit
2021-02-01memory: add readonly support to memory_region_init_ram_from_file()Stefan Hajnoczi
There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when creating a memory region from a file. This functionality is needed since the underlying host file may not allow writing. Add a bool readonly argument to memory_region_init_ram_from_file() and the APIs it calls. Extend memory_region_init_ram_from_file() rather than introducing a memory_region_init_rom_from_file() API so that callers can easily make a choice between read/write and read-only at runtime without calling different APIs. No new RAMBlock flag is introduced for read-only because it's unclear whether RAMBlocks need to know that they are read-only. Pass a bool readonly argument instead. Both of these design decisions can be changed in the future. It just seemed like the simplest approach to me. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20210104171320.575838-2-stefanha@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-08memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationTypeEugenio Pérez
This allows us to differentiate between regular IOMMU map/unmap events and DEVIOTLB unmap. Doing so, notifiers that only need device IOTLB invalidations will not receive regular IOMMU unmappings. Adapt intel and vhost to use it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20201116165506.31315-4-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>