aboutsummaryrefslogtreecommitdiff
path: root/hw/mem
AgeCommit message (Collapse)Author
2024-09-13hw: Use device_class_set_legacy_reset() instead of opencodingPeter Maydell
Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset. This change was produced with: spatch --macro-file scripts/cocci-macro-file.h \ --sp-file scripts/coccinelle/device-reset.cocci \ --keep-comments --smpl-spacing --in-place --dir hw Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
2024-07-21hw/cxl/cxl-mailbox-utils: Add device DDR5 ECS control featureShiju Jose
CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 Error Check Scrub (ECS) control feature. The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts. The ECS control feature allows the request to configure ECS input configurations during system boot or at run-time. The ECS control allows the requester to change the log entry type, the ECS threshold count provided that the request is within the definition specified in DDR5 mode registers, change mode between codeword mode and row count mode, and reset the ECS counter. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://lore.kernel.org/r/20240223085902.1549-4-shiju.jose@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705123039.963781-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-21hw/cxl/cxl-mailbox-utils: Add device patrol scrub control featureShiju Jose
CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control feature. The device patrol scrub proactively locates and makes corrections to errors in regular cycle. The patrol scrub control allows the request to configure patrol scrub input configurations. The patrol scrub control allows the requester to specify the number of hours for which the patrol scrub cycles must be completed, provided that the requested number is not less than the minimum number of hours for the patrol scrub cycle that the device is capable of. In addition, the patrol scrub controls allow the host to disable and enable the feature in case disabling of the feature is needed for other purposes such as performance-aware operations which require the background operations to be turned off. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://lore.kernel.org/r/20240223085902.1549-3-shiju.jose@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705123039.963781-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-21hw/cxl/mbox: replace sanitize_running() with cxl_dev_media_disabled()Hyeonggon Yoo
The spec states that reads/writes should have no effect and a part of commands should be ignored when the media is disabled, not when the sanitize command is running. Introduce cxl_dev_media_disabled() to check if the media is disabled and replace sanitize_running() with it. Make sure that the media has been correctly disabled during sanitation by adding an assert to __toggle_media(). Now, enabling when already enabled or vice versa results in an assert() failure. Suggested-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/r/20231222090051.3265307-4-42.hyeyoo@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705120643.959422-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-21hw/cxl: Add get scan media capabilities cmd supportDavidlohr Bueso
Use simple heuristics to determine the cost of scanning any given chunk, assuming cost is equal across the whole device, without differentiating between volatile or persistent partitions. This is aligned to the fact that these constraints are not enforced in respective poison query commands. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20230908073152.4386-3-dave@stgolabs.net Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705120643.959422-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-21hw/cxl: Check for multiple mappings of memory backends.Jonathan Cameron
Similar protection to that provided for -numa memdev=x to make sure that memory used to back a type3 device is not also mapped as normal RAM, or for multiple type3 devices. This is an easy footgun to remove and seems multiple people have run into it. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240705113956.941732-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-03hw/cxl/events: Improve QMP interfaces and documentation for add/release ↵Jonathan Cameron
dynamic capacity. New DCD command definitions updated in response to review comments from Markus. - Used CxlXXXX instead of CXLXXXXX for newly added types. - Expanded some abreviations in type names to be easier to read. - Additional documentation for some fields. - Replace slightly vague cxl r3.1 references with "Compute Express Link (CXL) Specification, Revision 3.1, XXXX" to bring them inline with what it says on the specification cover. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240625170805.359278-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/cxl: Fix read from bogus memoryIra Weiny
Peter and coverity report: We've passed '&data' to address_space_write(), which means "read from the address on the stack where the function argument 'data' lives", so instead of writing 64 bytes of data to the guest , we'll write 64 bytes which start with a host pointer value and then continue with whatever happens to be on the host stack after that. Indeed the intention was to write 64 bytes of data at the address given. Fix the parameter to address_space_write(). Reported-by: Peter Maydell <peter.maydell@linaro.org> Link: https://lore.kernel.org/all/CAFEAcA-u4sytGwTKsb__Y+_+0O2-WwARntm3x8WNhvL1WfHOBg@mail.gmail.com/ Fixes: 6bda41a69bdc ("hw/cxl: Add clear poison mailbox command support.") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Message-Id: <20240531-fix-poison-set-cacheline-v1-1-e3bc7e8f1158@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-07-01hw/mem/cxl_type3: Allow to release extent superset in QMP interfaceFan Ni
Before the change, the QMP interface used for add/release DC extents only allows to release an extent whose DPA range is contained by a single accepted extent in the device. With the change, we relax the constraints. As long as the DPA range of the extent is covered by accepted extents, we allow the release. Tested-by: Svetly Todorov <svetly.todorov@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-15-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/mem/cxl_type3: Add DPA range validation for accesses to DC regionsFan Ni
All DPA ranges in the DC regions are invalid to access until an extent covering the range has been successfully accepted by the host. A bitmap is added to each region to record whether a DC block in the region has been backed by a DC extent. Each bit in the bitmap represents a DC block. When a DC extent is accepted, all the bits representing the blocks in the extent are set, which will be cleared when the extent is released. Tested-by: Svetly Todorov <svetly.todorov@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-13-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extentsFan Ni
To simulate FM functionalities for initiating Dynamic Capacity Add (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue add/release dynamic capacity extents requests. With the change, we allow to release an extent only when its DPA range is contained by a single accepted extent in the device. That is to say, extent superset release is not supported yet. 1. Add dynamic capacity extents: For example, the command to add two continuous extents (each 128MiB long) to region 0 (starting at DPA offset 0) looks like below: { "execute": "qmp_capabilities" } { "execute": "cxl-add-dynamic-capacity", "arguments": { "path": "/machine/peripheral/cxl-dcd0", "host-id": 0, "selection-policy": "prescriptive", "region": 0, "extents": [ { "offset": 0, "len": 134217728 }, { "offset": 134217728, "len": 134217728 } ] } } 2. Release dynamic capacity extents: For example, the command to release an extent of size 128MiB from region 0 (DPA offset 128MiB) looks like below: { "execute": "cxl-release-dynamic-capacity", "arguments": { "path": "/machine/peripheral/cxl-dcd0", "host-id": 0, "removal-policy":"prescriptive", "region": 0, "extents": [ { "offset": 134217728, "len": 134217728 } ] } } Tested-by: Svetly Todorov <svetly.todorov@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-12-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release ↵Fan Ni
dynamic capacity response Per CXL spec 3.1, two mailbox commands are implemented: Add Dynamic Capacity Response (Opcode 4802h) 8.2.9.9.9.3, and Release Dynamic Capacity (Opcode 4803h) 8.2.9.9.9.4. For the process of the above two commands, we use two-pass approach. Pass 1: Check whether the input payload is valid or not; if not, skip Pass 2 and return mailbox process error. Pass 2: Do the real work--add or release extents, respectively. Tested-by: Svetly Todorov <svetly.todorov@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-11-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/mem/cxl_type3: Add DC extent list representative and get DC extent list ↵Fan Ni
mailbox support Add dynamic capacity extent list representative to the definition of CXLType3Dev and implement get DC extent list mailbox command per CXL.spec.3.1:.8.2.9.9.9.2. Tested-by: Svetly Todorov <svetly.todorov@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-10-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/mem/cxl_type3: Add host backend and address space handling for DC regionsFan Ni
Add (file/memory backed) host backend for DCD. All the dynamic capacity regions will share a single, large enough host backend. Set up address space for DC regions to support read/write operations to dynamic capacity for DCD. With the change, the following support is added: 1. Add a new property to type3 device "volatile-dc-memdev" to point to host memory backend for dynamic capacity. Currently, all DC regions share one host backend; 2. Add namespace for dynamic capacity for read/write support; 3. Create cdat entries for each dynamic capacity region. Reviewed-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-9-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size ↵Fan Ni
instead of mr as argument The function ct3_build_cdat_entries_for_mr only uses size of the passed memory region argument, refactor the function definition to make the passed arguments more specific. Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-8-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-01hw/mem/cxl_type3: Add support to create DC regions to type3 memory devicesFan Ni
With the change, when setting up memory for type3 memory device, we can create DC regions. A property 'num-dc-regions' is added to ct3_props to allow users to pass the number of DC regions to create. To make it easier, other region parameters like region base, length, and block size are hard coded. If needed, these parameters can be added easily. With the change, we can create DC regions with proper kernel side support like below: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_dc_region) echo $region > /sys/bus/cxl/devices/decoder0.0/create_dc_region echo 256 > /sys/bus/cxl/devices/$region/interleave_granularity echo 1 > /sys/bus/cxl/devices/$region/interleave_ways echo "dc0" >/sys/bus/cxl/devices/decoder2.0/mode echo 0x40000000 >/sys/bus/cxl/devices/decoder2.0/dpa_size echo 0x40000000 > /sys/bus/cxl/devices/$region/size echo "decoder2.0" > /sys/bus/cxl/devices/$region/target0 echo 1 > /sys/bus/cxl/devices/$region/commit echo $region > /sys/bus/cxl/drivers/cxl_region/bind Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-7-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
2024-07-01include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 ↵Fan Ni
memory devices Rename mem_size as static_mem_size for type3 memdev to cover static RAM and pmem capacity, preparing for the introduction of dynamic capacity to support dynamic capacity devices. Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20240523174651.1089554-6-nifan.cxl@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-06-19hw/mem/memory-device: Remove legacy_align from memory_device_pre_plug()Philippe Mathieu-Daudé
'legacy_align' is always NULL, remove it, simplifying memory_device_pre_plug(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240617071118.60464-16-philmd@linaro.org>
2024-06-19hw/mem/pc-dimm: Remove legacy_align argument from pc_dimm_pre_plug()Philippe Mathieu-Daudé
'legacy_align' is always NULL, remove it. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240617071118.60464-15-philmd@linaro.org>
2024-04-25hw/cxl/cxl-cdat: Make cxl_doe_cdat_init() return booleanZhao Liu
As error.h suggested, the best practice for callee is to return something to indicate success / failure. With returned boolean, there's no need to dereference @errp to check failure case. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-ID: <20240418100433.1085447-4-zhao1.liu@linux.intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-04-18memory-device: move stubs out of stubs/Paolo Bonzini
Since the memory-device stubs are needed exactly when the Kconfig symbols are not needed, move them to hw/mem/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240408155330.522792-15-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-12hw/mem/cxl_type3: Fix missing ERRP_GUARD() in ct3_realize()Zhao Liu
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in ct3_realize(), @errp is dereferenced without ERRP_GUARD(): cxl_doe_cdat_init(cxl_cstate, errp); if (*errp) { goto err_free_special_ops; } Here we check *errp, because cxl_doe_cdat_init() returns void. And ct3_realize() - as a PCIDeviceClass.realize() method - doesn't get the NULL @errp parameter, it hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in ct3_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240223085653.1255438-4-zhao1.liu@linux.intel.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-09hw/mem/cxl_type3: Fix problem with g_steal_pointer()Thomas Huth
When setting GLIB_VERSION_MAX_ALLOWED to GLIB_VERSION_2_58 or higher, glib adds type safety checks to the g_steal_pointer() macro. This triggers errors in the ct3_build_cdat_entries_for_mr() function which uses the g_steal_pointer() for type-casting from one pointer type to the other (which also looks quite weird since the local pointers have all been declared with g_autofree though they are never freed here). Fix it by using a proper typecast instead. For making this possible, we have to remove the QEMU_PACKED attribute from some structs since GCC otherwise complains that the source and destination pointer might have different alignment restrictions. Removing the QEMU_PACKED should be fine here since the structs are already naturally aligned. Anyway, add some QEMU_BUILD_BUG_ON() statements to make sure that we've got the right sizes (without padding in the structs). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-02-14hw/cxl: Standardize all references on CXL r3.1 and minor updatesJonathan Cameron
Previously not all references mentioned any spec version at all. Given r3.1 is the current specification available for evaluation at www.computeexpresslink.org update references to refer to that. Hopefully this won't become a never ending job. A few structure definitions have been updated to add new fields. Defaults of 0 and read only are valid choices for these new DVSEC registers so go with that for now. There are additional error codes and some of the 'questions' in the comments are resolved now. Update documentation reference to point to the CXL r3.1 specification with naming closer to what is on the cover. For cases where there are structure version numbers, add defines so they can be found next to the register definitions. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240126121636.24611-6-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14hw/mem/cxl_type3: Fix potential divide by zero reported by coverityJonathan Cameron
Fixes Coverity ID 1522368. Currently error_fatal is set if interleave_ways_dec() is going to return 0 but we should handle that zero return explicitly. Reported-by: Stefan Hajnoczi <stefanha@gmail.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240126120132.24248-10-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14hw/mem/cxl_type3: Drop handling of failure of g_malloc0() and g_malloc()Jonathan Cameron
As g_malloc0/g_malloc() will just exit QEMU on failure there is no point in checking for it failing. Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240126120132.24248-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-06memory-device: reintroduce memory region size checkDavid Hildenbrand
We used to check that the memory region size is multiples of the overall requested address alignment for the device memory address. We removed that check, because there are cases (i.e., hv-balloon) where devices unconditionally request an address alignment that has a very large alignment (i.e., 32 GiB), but the actual memory device size might not be multiples of that alignment. However, this change: (a) allows for some practically impossible DIMM sizes, like "1GB+1 byte". (b) allows for DIMMs that partially cover hugetlb pages, previously reported in [1]. Both scenarios don't make any sense: we might even waste memory. So let's reintroduce that check, but only check that the memory region size is multiples of the memory region alignment (i.e., page size, huge page size), but not any additional memory device requirements communicated using md->get_min_alignment(). The following examples now fail again as expected: (a) 1M with 2M THP qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-ram,id=mem1,size=1M \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x200000 (b) 1G+1byte qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-ram,id=mem1,size=1073741825B \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x200000 (c) Unliagned hugetlb size (2M) qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-file,id=mem1,mem-path=/dev/hugepages/tmp,size=511M \ -device pc-dimm,id=dimm1,memdev=mem1 backend memory size must be multiple of 0x200000 (d) Unliagned hugetlb size (1G) qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-file,id=mem1,mem-path=/dev/hugepages1G/tmp,size=2047M \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x40000000 Note that this fix depends on a hv-balloon change to communicate its additional alignment requirements using get_min_alignment() instead of through the memory region. [1] https://lkml.kernel.org/r/f77d641d500324525ac036fe1827b3070de75fc1.1701088320.git.mprivozn@redhat.com Message-ID: <20240117135554.787344-3-david@redhat.com> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Reported-by: Michal Privoznik <mprivozn@redhat.com> Fixes: eb1b7c4bd413 ("memory-device: Drop size alignment check") Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-12-31meson: remove CONFIG_ALLPaolo Bonzini
CONFIG_ALL is tricky to use and was ported over to Meson from the recursive processing of Makefile variables. Meson sourcesets however have all_sources() and all_dependencies() methods that remove the need for it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-11-15hw/mem/memory-device.c: spelling fix: ontainingMichael Tokarev
Fixes: 6c1b28e9e405 "memory-device: Support empty memory devices" Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-07Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi
into staging virtio,pc,pci: features, fixes virtio sound card support vhost-user: back-end state migration cxl: line length reduction enabling fabric management vhost-vdpa: shadow virtqueue hash calculation Support shadow virtqueue RSS Support tests: CPU topology related smbios test cases Fixes, cleanups all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmVKDDoPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpF08H/0Zts8uvkHbgiOEJw4JMHU6/VaCipfIYsp01 # GSfwYOyEsXJ7GIxKWaCiMnWXEm7tebNCPKf3DoUtcAojQj3vuF9XbWBKw/bfRn83 # nGO/iiwbYViSKxkwqUI+Up5YiN9o0M8gBFrY0kScPezbnYmo5u2bcADdEEq6gH68 # D0Ea8i+WmszL891ypvgCDBL2ObDk3qX3vA5Q6J2I+HKX2ofJM59BwaKwS5ghw+IG # BmbKXUZJNjUQfN9dQ7vJuiuqdknJ2xUzwW2Vn612ffarbOZB1DZ6ruWlrHty5TjX # 0w4IXEJPBgZYbX9oc6zvTQnbLDBJbDU89mnme0TcmNMKWmQKTtc= # =vEv+ # -----END PGP SIGNATURE----- # gpg: Signature made Tue 07 Nov 2023 18:06:50 HKT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (63 commits) acpi/tests/avocado/bits: enable console logging from bits VM acpi/tests/avocado/bits: enforce 32-bit SMBIOS entry point hw/cxl: Add tunneled command support to mailbox for switch cci. hw/cxl: Add dummy security state get hw/cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functions hw/cxl/mbox: Add Get Background Operation Status Command hw/cxl: Add support for device sanitation hw/cxl/mbox: Wire up interrupts for background completion hw/cxl/mbox: Add support for background operations hw/cxl: Implement Physical Ports status retrieval hw/pci-bridge/cxl_downstream: Set default link width and link speed hw/cxl/mbox: Add Physical Switch Identify command. hw/cxl/mbox: Add Information and Status / Identify command hw/cxl: Add a switch mailbox CCI function hw/pci-bridge/cxl_upstream: Move defintion of device to header. hw/cxl/mbox: Generalize the CCI command processing hw/cxl/mbox: Pull the CCI definition out of the CXLDeviceState hw/cxl/mbox: Split mailbox command payload into separate input and output hw/cxl/mbox: Pull the payload out of struct cxl_cmd and make instances constant hw/cxl: Fix a QEMU_BUILD_BUG_ON() in switch statement scope issue. ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-11-07hw/cxl: Add tunneled command support to mailbox for switch cci.Jonathan Cameron
This implementation of tunneling makes the choice that our Type 3 device is a Logical Device (LD) of a Multi-Logical Device (MLD) that just happens to only have one LD for now. Tunneling is supported from a Switch Mailbox CCI (and shortly via MCTP over I2C connected to the switch MCTP CCI) via an outer level to the FM owned LD in the MLD Type 3 device. From there an inner tunnel may be used to access particular LDs. Protocol wise, the following is what happens in a real system but we don't emulate the transports - just the destinations and the payloads. ( Host -> Switch Mailbox CCI - in band FM-API mailbox command or Host -> Switch MCTP CCI - MCTP over I2C using the CXL FM-API MCTP Binding. ) then (if a tunnel command) Switch -> Type 3 FM Owned LD - MCTP over PCI VDM using the CXL FM-API binding (addressed by switch port) then (if unwrapped command also a tunnel command) Type 3 FM Owned LD to LD0 via internal transport (addressed by LD number) or (added shortly) Host to Type 3 FM Owned MCTP CCI - MCTP over I2C using the CXL FM-API MCTP Binding. then (if unwrapped comand is a tunnel comamnd) Type 3 FM Owned LD to LD0 via internal transport. (addressed by LD number) It is worth noting that the tunneling commands over PCI VDM presumably use the appropriate MCTP binding depending on opcode. This may be the CXL FMAPI binding or the CXL Memory Device Binding. Additional commands will need to be added to make this useful beyond testing the tunneling works. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20231023160806.13206-18-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07hw/cxl/type3: Cleanup multiple CXL_TYPE3() calls in read/write functionsGregory Price
Call CXL_TYPE3 once at top of function to avoid multiple invocations. Signed-off-by: Gregory Price <gregory.price@memverge.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20231023160806.13206-16-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07hw/cxl: Add support for device sanitationDavidlohr Bueso
Make use of the background operations through the sanitize command, per CXL 3.0 specs. Traditionally run times can be rather long, depending on the size of the media. Estimate times based on: https://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20231023160806.13206-14-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07hw/cxl/mbox: Pull the CCI definition out of the CXLDeviceStateJonathan Cameron
Enables having multiple CCIs per devices. Each CCI (mailbox) has it's own state and command list, so they can't share a single structure. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20231023160806.13206-4-Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07hw/cxl: Line length reductionsJonathan Cameron
Michael Tsirkin observed that there were some unnecessarily long lines in the CXL code in a recent review. This patch is intended to rectify that where it does not hurt readability. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Fan Ni <fan.ni@samsung.com> Message-Id: <20231023140210.3089-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-06memory-device: Drop size alignment checkDavid Hildenbrand
There is no strong requirement that the size has to be multiples of the requested alignment, let's drop it. This is a preparation for hv-baloon. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-11-03memory-device: Support empty memory devicesDavid Hildenbrand
Let's support empty memory devices -- memory devices that don't have a memory device region in the current configuration. hv-balloon with an optional memdev is the primary use case. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2023-10-12memory-device,vhost: Support automatic decision on the number of memslotsDavid Hildenbrand
We want to support memory devices that can automatically decide how many memslots they will use. In the worst case, they have to use a single memslot. The target use cases are virtio-mem and the hyper-v balloon. Let's calculate a reasonable limit such a memory device may use, and instruct the device to make a decision based on that limit. Use a simple heuristic that considers: * A memslot soft-limit for all memory devices of 256; also, to not consume too many memslots -- which could harm performance. * Actually still free and unreserved memslots * The percentage of the remaining device memory region that memory device will occupy. Further, while we properly check before plugging a memory device whether there still is are free memslots, we have other memslot consumers (such as boot memory, PCI BARs) that don't perform any checks and might dynamically consume memslots without any prior reservation. So we might succeed in plugging a memory device, but once we dynamically map a PCI BAR we would be in trouble. Doing accounting / reservation / checks for all such users is problematic (e.g., sometimes we might temporarily split boot memory into two memslots, triggered by the BIOS). We use the historic magic memslot number of 509 as orientation to when supporting 256 memory devices -> memslots (leaving 253 for boot memory and other devices) has been proven to work reliable. We'll fallback to suggesting a single memslot if we don't have at least 509 total memslots. Plugging vhost devices with less than 509 memslots available while we have memory devices plugged that consume multiple memslots due to automatic decisions can be problematic. Most configurations might just fail due to "limit < used + reserved", however, it can also happen that these memory devices would suddenly consume memslots that would actually be required by other memslot consumers (boot, PCI BARs) later. Note that this has always been sketchy with vhost devices that support only a small number of memslots; but we don't want to make it any worse.So let's keep it simple and simply reject plugging such vhost devices in such a configuration. Eventually, all vhost devices that want to be fully compatible with such memory devices should support a decent number of memslots (>= 509). Message-ID: <20230926185738.277351-13-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12memory-device,vhost: Support memory devices that dynamically consume memslotsDavid Hildenbrand
We want to support memory devices that have a dynamically managed memory region container as device memory region. This device memory region maps multiple RAM memory subregions (e.g., aliases to the same RAM memory region), whereby these subregions can be (un)mapped on demand. Each RAM subregion will consume a memslot in KVM and vhost, resulting in such a new device consuming memslots dynamically, and initially usually 0. We already track the number of used vs. required memslots for all memslots. From that, we can derive the number of reserved memslots that must not be used otherwise. The target use case is virtio-mem and the hyper-v balloon, which will dynamically map aliases to RAM memory region into their device memory region container. Properly document what's supported and what's not and extend the vhost memslot check accordingly. Message-ID: <20230926185738.277351-10-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12memory-device: Track required and actually used memslots in DeviceMemoryStateDavid Hildenbrand
Let's track how many memslots are required by plugged memory devices and how many are currently actually getting used by plugged memory devices. "required - used" is the number of reserved memslots. For now, the number of used and required memslots is always equal, and there are no reservations. This is a preparation for memory devices that want to dynamically consume memslots after initially specifying how many they require -- where we'll end up with reserved memslots. To track the number of used memslots, create a new address space for our device memory and register a memory listener (add/remove) for that address space. Message-ID: <20230926185738.277351-9-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12memory-device: Support memory devices with multiple memslotsDavid Hildenbrand
We want to support memory devices that have a memory region container as device memory region that maps multiple RAM memory regions. Let's start by supporting memory devices that statically map multiple RAM memory regions and, thereby, consume multiple memslots. We already have one device that uses a container as device memory region: NVDIMMs. However, a NVDIMM always ends up consuming exactly one memslot. Let's add support for that by asking the memory device via a new callback how many memslots it requires. Message-ID: <20230926185738.277351-7-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12vhost: Return number of free memslotsDavid Hildenbrand
Let's return the number of free slots instead of only checking if there is a free slot. Required to support memory devices that consume multiple memslots. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-6-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12kvm: Return number of free memslotsDavid Hildenbrand
Let's return the number of free slots instead of only checking if there is a free slot. While at it, check all address spaces, which will also consider SMM under x86 correctly. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-5-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-04hw/cxl: Support 4 HDM decoders at all levels of topologyJonathan Cameron
Support these decoders in CXL host bridges (pxb-cxl), CXL Switch USP and CXL Type 3 end points. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20230913132523.29780-5-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04hw/cxl: Fix and use same calculation for HDM decoder block size everywhereJonathan Cameron
In order to avoid having the size of the per HDM decoder register block repeated in lots of places, create the register definitions for HDM decoder 1 and use the offset between the first registers in HDM decoder 0 and HDM decoder 1 to establish the offset. Calculate in each function as this is more obvious and leads to shorter line lengths than a single #define which would need a long name to be specific enough. Note that the code currently only supports one decoder, so the bugs this fixes don't actually affect anything. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230913132523.29780-4-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-21hw/mem/cxl_type3: Add missing copyright and license noticeJonathan Cameron
This has been missing from the start. Assume it should match with cxl/cxl-component-utils.c as both were part of early postings from Ben. Reported-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-09-21hw/other: spelling fixesMichael Tokarev
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-19nvdimm: Reject writing label data to ROM instead of crashing QEMUDavid Hildenbrand
Currently, when using a true R/O NVDIMM (ROM memory backend) with a label area, the VM can easily crash QEMU by trying to write to the label area, because the ROM memory is mmap'ed without PROT_WRITE. [root@vm-0 ~]# ndctl disable-region region0 disabled 1 region [root@vm-0 ~]# ndctl zero-labels nmem0 -> QEMU segfaults Let's remember whether we have a ROM memory backend and properly reject the write request: [root@vm-0 ~]# ndctl disable-region region0 disabled 1 region [root@vm-0 ~]# ndctl zero-labels nmem0 zeroed 0 nmem In comparison, on a system with a R/W NVDIMM: [root@vm-0 ~]# ndctl disable-region region0 disabled 1 region [root@vm-0 ~]# ndctl zero-labels nmem0 zeroed 1 nmem For ACPI, just return "unsupported", like if no label exists. For spapr, return "H_P2", similar to when no label area exists. Could we rely on the "unarmed" property? Maybe, but it looks cleaner to only disallow what certainly cannot work. After all "unarmed=on" primarily means: cannot accept persistent writes. In theory, there might be setups where devices with "unarmed=on" set could be used to host non-persistent data (temporary files, system RAM, ...); for example, in Linux, admins can overwrite the "readonly" setting and still write to the device -- which will work as long as we're not using ROM. Allowing writing label data in such configurations can make sense. Message-ID: <20230906120503.359863-2-david@redhat.com> Fixes: dbd730e85987 ("nvdimm: check -object memory-backend-file, readonly=on option") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12memory-device: Track used region size in DeviceMemoryStateDavid Hildenbrand
Let's avoid iterating over all devices and simply track it in the DeviceMemoryState. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230623124553.400585-11-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-07-12memory-device: Refactor memory_device_pre_plug()David Hildenbrand
Let's move memory_device_check_addable() and basic checks out of memory_device_get_free_addr() directly into memory_device_pre_plug(). Separating basic checks from address assignment is cleaner and prepares for further changes. As all memory device users now use memory_devices_init(), and that function enforces that the size is 0, we can drop the check for an empty region. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230623124553.400585-10-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>