aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-03-01migration/multifd: Allow multifd without packetsFabiano Rosas
For the upcoming support to the new 'mapped-ram' migration stream format, we cannot use multifd packets because each write into the ramblock section in the migration file is expected to contain only the guest pages. They are written at their respective offsets relative to the ramblock section header. There is no space for the packet information and the expected gains from the new approach come partly from being able to write the pages sequentially without extraneous data in between. The new format also simply doesn't need the packets and all necessary information can be taken from the standard migration headers with some (future) changes to multifd code. Use the presence of the mapped-ram capability to decide whether to send packets. This only moves code under multifd_use_packets(), it has no effect for now as mapped-ram cannot yet be enabled with multifd. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-15-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/multifd: Decouple recv method from pagesFabiano Rosas
Next patches will abstract the type of data being received by the channels, so do some cleanup now to remove references to pages and dependency on 'normal_num'. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-14-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/multifd: Rename MultiFDSend|RecvParams::data to compress_dataFabiano Rosas
Use a more specific name for the compression data so we can use the generic for the multifd core code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-13-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01tests/qtest/migration: Add tests for mapped-ram file-based migrationFabiano Rosas
Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-12-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/ram: Add incoming 'mapped-ram' migrationFabiano Rosas
Add the necessary code to parse the format changes for the 'mapped-ram' capability. One of the more notable changes in behavior is that in the 'mapped-ram' case ram pages are restored in one go rather than constantly looping through the migration stream. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-11-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/ram: Add outgoing 'mapped-ram' migrationFabiano Rosas
Implement the outgoing migration side for the 'mapped-ram' capability. A bitmap is introduced to track which pages have been written in the migration file. Pages are written at a fixed location for every ramblock. Zero pages are ignored as they'd be zero in the destination migration as well. The migration stream is altered to put the dirty pages for a ramblock after its header instead of having a sequential stream of pages that follow the ramblock headers. Without mapped-ram (current): With mapped-ram (new): --------------------- -------------------------------- | ramblock 1 header | | ramblock 1 header | --------------------- -------------------------------- | ramblock 2 header | | ramblock 1 mapped-ram header | --------------------- -------------------------------- | ... | | padding to next 1MB boundary | --------------------- | ... | | ramblock n header | -------------------------------- --------------------- | ramblock 1 pages | | RAM_SAVE_FLAG_EOS | | ... | --------------------- -------------------------------- | stream of pages | | ramblock 2 header | | (iter 1) | -------------------------------- | ... | | ramblock 2 mapped-ram header | --------------------- -------------------------------- | RAM_SAVE_FLAG_EOS | | padding to next 1MB boundary | --------------------- | ... | | stream of pages | -------------------------------- | (iter 2) | | ramblock 2 pages | | ... | | ... | --------------------- -------------------------------- | ... | | ... | --------------------- -------------------------------- | RAM_SAVE_FLAG_EOS | -------------------------------- | ... | -------------------------------- where: - ramblock header: the generic information for a ramblock, such as idstr, used_len, etc. - ramblock mapped-ram header: the new information added by this feature: bitmap of pages written, bitmap size and offset of pages in the migration file. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-10-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration: Add mapped-ram URI compatibility checkFabiano Rosas
The mapped-ram migration format needs a channel that supports seeking to be able to write each page to an arbitrary offset in the migration stream. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/ram: Introduce 'mapped-ram' migration capabilityFabiano Rosas
Add a new migration capability 'mapped-ram'. The core of the feature is to ensure that RAM pages are mapped directly to offsets in the resulting migration file instead of being streamed at arbitrary points. The reasons why we'd want such behavior are: - The resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's 10s of GBs, provided that the workload constantly redirties memory. - It paves the way to implement O_DIRECT-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. - It allows the usage of multifd so we can write RAM pages to the migration file in parallel. For now, enabling the capability has no effect. The next couple of patches implement the core functionality. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/qemu-file: add utility methods for working with seekable channelsFabiano Rosas
Add utility methods that will be needed when implementing 'mapped-ram' migration capability. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01io: fsync before closing a file channelFabiano Rosas
Make sure the data is flushed to disk before closing file channels. This is to ensure data is on disk and not lost in the event of a host crash. This is currently being implemented to affect the migration code when migrating to a file, but all QIOChannelFile users should benefit from the change. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Acked-by: "Daniel P. Berrangé" <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01io: implement io_pwritev/preadv for QIOChannelFileNikolay Borisov
The upcoming 'mapped-ram' feature will require qemu to write data to (and restore from) specific offsets of the migration file. Add a minimal implementation of pwritev/preadv and expose them via the io_pwritev and io_preadv interfaces. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01io: Add generic pwritev/preadv interfaceNikolay Borisov
Introduce basic pwritev/preadv support in the generic channel layer. Specific implementation will follow for the file channel as this is required in order to support migration streams with fixed location of each ram page. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel fileNikolay Borisov
Add a generic QIOChannel feature SEEKABLE which would be used by the qemu_file* apis. For the time being this will be only implemented for file channels. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration/multifd: Cleanup multifd_recv_sync_mainFabiano Rosas
Some minor cleanups and documentation for multifd_recv_sync_main. Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01chardev/char-socket: Fix TLS io channels sending too much data to the backendThomas Huth
Commit ffda5db65a ("io/channel-tls: fix handling of bigger read buffers") changed the behavior of the TLS io channels to schedule a second reading attempt if there is still incoming data pending. This caused a regression with backends like the sclpconsole that check in their read function that the sender does not try to write more bytes to it than the device can currently handle. The problem can be reproduced like this: 1) In one terminal, do this: mkdir qemu-pki cd qemu-pki openssl genrsa 2048 > ca-key.pem openssl req -new -x509 -nodes -days 365000 -key ca-key.pem -out ca-cert.pem # enter some dummy value for the cert openssl genrsa 2048 > server-key.pem openssl req -new -x509 -nodes -days 365000 -key server-key.pem \ -out server-cert.pem # enter some other dummy values for the cert gnutls-serv --echo --x509cafile ca-cert.pem --x509keyfile server-key.pem \ --x509certfile server-cert.pem -p 8338 2) In another terminal, do this: wget https://download.fedoraproject.org/pub/fedora-secondary/releases/39/Cloud/s390x/images/Fedora-Cloud-Base-39-1.5.s390x.qcow2 qemu-system-s390x -nographic -nodefaults \ -hda Fedora-Cloud-Base-39-1.5.s390x.qcow2 \ -object tls-creds-x509,id=tls0,endpoint=client,verify-peer=false,dir=$PWD/qemu-pki \ -chardev socket,id=tls_chardev,host=localhost,port=8338,tls-creds=tls0 \ -device sclpconsole,chardev=tls_chardev,id=tls_serial QEMU then aborts after a second or two with: qemu-system-s390x: ../hw/char/sclpconsole.c:73: chr_read: Assertion `size <= SIZE_BUFFER_VT220 - scon->iov_data_len' failed. Aborted (core dumped) It looks like the second read does not trigger the chr_can_read() function to be called before the second read, which should normally always be done before sending bytes to a character device to see how much it can handle, so the s->max_size in tcp_chr_read() still contains the old value from the previous read. Let's make sure that we use the up-to-date value by calling tcp_chr_read_poll() again here. Fixes: ffda5db65a ("io/channel-tls: fix handling of bigger read buffers") Buglink: https://issues.redhat.com/browse/RHEL-24614 Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Message-ID: <20240229104339.42574-1-thuth@redhat.com> Reviewed-by: Antoine Damhet <antoine.damhet@blade-group.com> Tested-by: Antoine Damhet <antoine.damhet@blade-group.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01tests/unit/test-util-sockets: Remove temporary file after testThomas Huth
test-util-sockets leaves the temporary socket files around in the temporary files folder. Let's better remove them at the end of the testing. Fixes: 4d3a329af5 ("tests/util-sockets: add abstract unix socket cases") Message-ID: <20240226082728.249753-1-thuth@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01hw/usb/bus.c: PCAP adding 0xA in Windows versionBenjamin David Lunt
Since Windows text files use CRLFs for all \n, the Windows version of QEMU inserts a CR in the PCAP stream when a LF is encountered when using USB PCAP files. This is due to the fact that the PCAP file is opened as TEXT instead of BINARY. To show an example, when using a very common protocol to USB disks, the BBB protocol uses a 10-byte command packet. For example, the READ_CAPACITY(10) command will have a command block length of 10 (0xA). When this 10-byte command (part of the 31-byte CBW) is placed into the PCAP file, the Windows file manager inserts a 0xD before the 0xA, turning the 31-byte CBW into a 32-byte CBW. Actual CBW: 0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0a 25 USBC...........% 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ............... PCAP CBW 0040 55 53 42 43 01 00 00 00 08 00 00 00 80 00 0d 0a USBC............ 0050 25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 %.............. I believe simply opening the PCAP file as BINARY instead of TEXT will fix this issue. Resolves: https://bugs.launchpad.net/qemu/+bug/2054889 Signed-off-by: Benjamin David Lunt <benlunt@fysnet.net> Message-ID: <000101da6823$ce1bbf80$6a533e80$@fysnet.net> [thuth: Break long line to avoid checkpatch.pl error] Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01hw/intc/Kconfig: Fix GIC settings when using "--without-default-devices"Thomas Huth
When using "--without-default-devices", the ARM_GICV3_TCG and ARM_GIC_KVM settings currently get disabled, though the arm virt machine is only of very limited use in that case. This also causes the migration-test to fail in such builds. Let's make sure that we always keep the GIC switches enabled in the --without-default-devices builds, too. Message-ID: <20240221110059.152665-1-thuth@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01libqos/virtio.c: fix 'avail_event' offset in qvring_init()Daniel Henrique Barboza
In qvring_init() we're writing vq->used->avail_event at "vq->used + 2 + array_size". The struct pointed by vq->used is, from virtio_ring.h Linux header): * // A ring of used descriptor heads with free-running index. * __virtio16 used_flags; * __virtio16 used_idx; * struct vring_used_elem used[num]; * __virtio16 avail_event_idx; So 'flags' is the word right at vq->used. 'idx' is vq->used + 2. We need to skip 'used_idx' by adding + 2 bytes, and then sum the vector size, to reach avail_event_idx. An example on how to properly access this field can be found in qvirtqueue_kick(): avail_event = qvirtio_readw(d, qts, vq->used + 4 + sizeof(struct vring_used_elem) * vq->size); This error was detected when enabling the RISC-V 'virt' libqos machine. The 'idx' test from vhost-user-blk-test.c errors out with a timeout in qvirtio_wait_used_elem(). The timeout happens because when processing the first element, 'avail_event' is read in qvirtqueue_kick() as non-zero because we didn't initialize it properly (and the memory at that point happened to be non-zero). 'idx' is 0. All of this makes this condition fail because "idx - avail_event" will overflow and be non-zero: /* < 1 because we add elements to avail queue one by one */ if ((flags & VRING_USED_F_NO_NOTIFY) == 0 && (!vq->event || (uint16_t)(idx-avail_event) < 1)) { d->bus->virtqueue_kick(d, vq); } As a result the virtqueue is never kicked and we'll timeout waiting for it. Fixes: 1053587c3f ("libqos: Added EVENT_IDX support") Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240217192607.32565-3-dbarboza@ventanamicro.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01libqos/virtio.c: init all elems in qvring_indirect_desc_setup()Daniel Henrique Barboza
The loop isn't setting the values for the last element. Every other element is being initialized with addr = 0, flags = VRING_DESC_F_NEXT and next = i + 1. The last elem is never touched. This became a problem when enabling a RISC-V 'virt' libqos machine in the 'indirect' test of virti-blk-test.c. The 'flags' for the last element will end up being an odd number (since we didn't touch it). Being an odd number it will be mistaken by VRING_DESC_F_NEXT, which happens to be 1. Deep into hw/virt/virtio.c, in virtqueue_split_pop(), into virtqueue_split_read_next_desc(), a check for VRING_DESC_F_NEXT will be made to see if we're supposed to chain. The code will keep up chaining in the last element because the uninitialized value happens to be odd. We'll error out right after that because desc->next (which is also uninitialized) will be >= max. A VIRTQUEUE_READ_DESC_ERROR will be returned, with an error message like this in the stderr: qemu-system-riscv64: Desc next is 49391 Since we never returned, we'll end up timing out at qvirtio_wait_used_elem(): ERROR:../tests/qtest/libqos/virtio.c:236:qvirtio_wait_used_elem: assertion failed: (g_get_monotonic_time() - start_time <= timeout_us) The root cause is using uninitialized values from guest_alloc() in qvring_indirect_desc_setup(). There's no guarantee that the memory pages retrieved will be zeroed, so we can't make assumptions. In fact, commit 5b4f72f5e8 ("tests/qtest: properly initialise the vring used idx") fixed a similar problem stating "It is probably not wise to assume guest memory is zeroed anyway". I concur. Initialize all elems in qvring_indirect_desc_setup(). Fixes: f294b029aa ("libqos: Added indirect descriptor support to virtio implementation") Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240217192607.32565-2-dbarboza@ventanamicro.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-01tests/migration: Set compression level in migration testsBryan Zhang
Adds calls to set compression level for `zstd` and `zlib` migration tests, just to make sure that the calls work. Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Link: https://lore.kernel.org/r/20240301035901.4006936-3-bryan.zhang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration: Properly apply migration compression level parametersBryan Zhang
Some glue code was missing, so that using `qmp_migrate_set_parameters` to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This commit adds the glue code to fix that. Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zhang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01migration: massage cpr-reboot documentationSteve Sistare
Re-wrap the cpr-reboot documentation to 70 columns, use '@' for cpr-reboot references, capitalize COLO and VFIO, and tweak the wording. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1709218462-3640-1-git-send-email-steven.sistare@oracle.com [peterx: s/qemu/QEMU per Markus's suggestion] Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-29linux-user/loongarch64: Remove TARGET_FORCE_SHMLBARichard Henderson
The kernel abi was changed with commit d23b77953f5a4fbf94c05157b186aac2a247ae32 Author: Huacai Chen <chenhuacai@kernel.org> Date: Wed Jan 17 12:43:08 2024 +0800 LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE during the v6.8 cycle. Reviewed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29linux-user/x86_64: Handle the vsyscall page in open_self_maps_{2,4}Richard Henderson
This is the only case in which we expect to have no host memory backing for a guest memory page, because in general linux user processes cannot map any pages in the top half of the 64-bit address space. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2170 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29tcg/optimize: fix uninitialized variablePaolo Bonzini
The variables uext_opc and sext_opc are used without initialization if TCG_TARGET_extract_i{32,64}_valid returns false. The result, depending on the compiler, might be the generation of extract and sextract opcodes with invalid offset and count, or just random data in the TCG opcode stream. Fixes: ceb9ee06b71 ("tcg/optimize: Handle TCG_COND_TST{EQ,NE}", 2024-02-03) Cc: Richard Henderson <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20240228110641.287205-1-pbonzini@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29linux-user: Remove pgb_dynamic alignment assertionRichard Henderson
The assertion was never correct, because the alignment is a composite of the image alignment and SHMLBA. Even if the image alignment didn't match the image address, an assertion would not be correct -- more appropriate would be an error message about an ill formed image. But the image cannot be held to SHMLBA under any circumstances. Fixes: ee94743034b ("linux-user: completely re-write init_guest_space") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2157 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reported-by: Alexey Sheplyakov <asheplyakov@yandex.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-02-29target/alpha: Enable TARGET_PAGE_BITS_VARY for user-onlyRichard Henderson
Since alpha binaries are generally built for multiple page sizes, it is trivial to allow the page size to vary. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-34-richard.henderson@linaro.org>
2024-02-29target/ppc: Enable TARGET_PAGE_BITS_VARY for user-onlyRichard Henderson
Since ppc binaries are generally built for multiple page sizes, it is trivial to allow the page size to vary. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-33-richard.henderson@linaro.org>
2024-02-29linux-user: Bound mmap_min_addr by host page sizeRichard Henderson
Bizzarely, it is possible to set /proc/sys/vm/mmap_min_addr to a value below the host page size. Fix that. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-32-richard.henderson@linaro.org>
2024-02-29target/arm: Enable TARGET_PAGE_BITS_VARY for AArch64 user-onlyRichard Henderson
Since aarch64 binaries are generally built for multiple page sizes, it is trivial to allow the page size to vary. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-31-richard.henderson@linaro.org>
2024-02-29linux-user: Allow TARGET_PAGE_BITS_VARYRichard Henderson
If set, match the host and guest page sizes. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-30-richard.henderson@linaro.org>
2024-02-29accel/tcg: Disconnect TargetPageDataNode from page sizeRichard Henderson
Dynamically size the node for the runtime target page size. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-29-richard.henderson@linaro.org>
2024-02-29cpu: Remove page_size_initRichard Henderson
Move qemu_host_page_{size,mask} and HOST_PAGE_ALIGN into bsd-user. It should be removed from bsd-user as well, but defer that cleanup. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-28-richard.henderson@linaro.org>
2024-02-29*-user: Deprecate and disable -p pagesizeRichard Henderson
This option controls the host page size. From the mis-usage in our own testsuite, this is easily confused with guest page size. The only thing that occurs when changing the host page size is that stuff breaks, because one cannot actually change the host page size. Therefore reject all but the no-op setting as part of the deprecation process. Reviewed-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-27-richard.henderson@linaro.org>
2024-02-29tests/tcg: Extend file in linux-madvise.cRichard Henderson
When guest page size > host page size, this test can fail due to the SIGBUS protection hack. Avoid this by making sure that the file size is at least one guest page. Visible with alpha guest on x86_64 host. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-26-richard.henderson@linaro.org>
2024-02-29tests/tcg: Remove run-test-mmap-*Richard Henderson
These tests are confused, because -p does not change the guest page size, but the host page size. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-25-richard.henderson@linaro.org>
2024-02-29linux-user: Split out mmap_h_gt_gRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-24-richard.henderson@linaro.org>
2024-02-29linux-user: Split out mmap_h_lt_gRichard Henderson
Work much harder to get alignment and mapping beyond the end of the file correct. Both of which are excercised by our test-mmap for alpha (8k pages) on any 4k page host. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-23-richard.henderson@linaro.org>
2024-02-29linux-user: Split out mmap_h_eq_gRichard Henderson
Move the MAX_FIXED_NOREPLACE check for reserved_va earlier. Move the computation of host_prot earlier. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-22-richard.henderson@linaro.org>
2024-02-29linux-user: Use do_munmap for target_mmap failureRichard Henderson
For the cases for which the host mmap succeeds, but does not yield the desired address, use do_munmap to restore the reserved_va memory reservation. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29linux-user: Split out do_munmapRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-29linux-user: Do early mmap placement only for reserved_vaRichard Henderson
For reserved_va, place all non-fixed maps then proceed as for MAP_FIXED. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-21-richard.henderson@linaro.org>
2024-02-29linux-user: Split out mmap_endRichard Henderson
Use a subroutine instead of a goto within target_mmap__locked. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-20-richard.henderson@linaro.org>
2024-02-29linux-user: Fix sub-host-page mmapRichard Henderson
We cannot skip over the_end1 to the_end, because we fail to record the validity of the guest page with the interval tree. Remove "the_end" and rename "the_end1" to "the_end". Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-19-richard.henderson@linaro.org>
2024-02-29linux-user: Move some mmap checks outside the lockRichard Henderson
Basic validation of operands does not require the lock. Hoist them from target_mmap__locked back into target_mmap. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-18-richard.henderson@linaro.org>
2024-02-29linux-user: Split out target_mmap__lockedRichard Henderson
All "goto fail" may be transformed to "return -1". Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-17-richard.henderson@linaro.org>
2024-02-29linux-user: Remove qemu_host_page_size from mainRichard Henderson
Use qemu_real_host_page_size() instead. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-16-richard.henderson@linaro.org>
2024-02-29softmmu/physmem: Remove HOST_PAGE_ALIGNRichard Henderson
Align allocation sizes to the maximum of host and target page sizes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-15-richard.henderson@linaro.org>
2024-02-29softmmu/physmem: Remove qemu_host_page_sizeRichard Henderson
Use qemu_real_host_page_size() instead. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-14-richard.henderson@linaro.org>