aboutsummaryrefslogtreecommitdiff
path: root/exec.c
AgeCommit message (Collapse)Author
2019-12-16Memory: Enable writeback for given memory regionBeata Michalska
Add an option to trigger memory writeback to sync given memory region with the corresponding backing store, case one is available. This extends the support for persistent memory, allowing syncing on-demand. Signed-off-by: Beata Michalska <beata.michalska@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20191121000843.24844-3-beata.michalska@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-30Merge remote-tracking branch ↵Peter Maydell
'remotes/stsquad/tags/pull-tcg-plugins-281019-4' into staging TCG Plugins initial implementation - use --enable-plugins @ configure - low impact introspection (-plugin empty.so to measure overhead) - plugins cannot alter guest state - example plugins included in source tree (tests/plugins) - -d plugin to enable plugin output in logs - check-tcg runs extra tests when plugins enabled - documentation in docs/devel/plugins.rst # gpg: Signature made Mon 28 Oct 2019 15:13:23 GMT # gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44 # gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full] # Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44 * remotes/stsquad/tags/pull-tcg-plugins-281019-4: (57 commits) travis.yml: enable linux-gcc-debug-tcg cache MAINTAINERS: add me for the TCG plugins code scripts/checkpatch.pl: don't complain about (foo, /* empty */) .travis.yml: add --enable-plugins tests include/exec: wrap cpu_ldst.h in CONFIG_TCG accel/stubs: reduce headers from tcg-stub tests/plugin: add hotpages to analyse memory access patterns tests/plugin: add instruction execution breakdown tests/plugin: add a hotblocks plugin tests/tcg: enable plugin testing tests/tcg: drop test-i386-fprem from TESTS when not SLOW tests/tcg: move "virtual" tests to EXTRA_TESTS tests/tcg: set QEMU_OPTS for all cris runs tests/tcg/Makefile.target: fix path to config-host.mak tests/plugin: add sample plugins linux-user: support -plugin option vl: support -plugin option plugin: add qemu_plugin_outs helper plugin: add qemu_plugin_insn_disas helper plugin: expand the plugin_init function to include an info block ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-29Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20191028' into stagingPeter Maydell
Improvements for TARGET_PAGE_BITS_VARY Fix for TCI ld16u_i64. Fix for segv on icount execute from i/o memory. Two misc cleanups. # gpg: Signature made Mon 28 Oct 2019 14:55:08 GMT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20191028: translate-all: Remove tb_alloc translate-all: fix uninitialized tb->orig_tb cputlb: Fix tlb_vaddr_to_host exec: Cache TARGET_PAGE_MASK for TARGET_PAGE_BITS_VARY exec: Promote TARGET_PAGE_MASK to target_long exec: Restrict TARGET_PAGE_BITS_VARY assert to CONFIG_DEBUG_TCG exec: Use const alias for TARGET_PAGE_BITS_VARY configure: Detect compiler support for __attribute__((alias)) exec: Split out variable page size support to exec-vary.c cpu: use ROUND_UP() to define xxx_PAGE_ALIGN cputlb: ensure _cmmu helper functions follow the naming standard tci: Add implementation for INDEX_op_ld16u_i64 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-10-28cpu: hook plugin vcpu eventsEmilio G. Cota
Signed-off-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
2019-10-28exec: Split out variable page size support to exec-vary.cRichard Henderson
The next patch will play a trick with "const" that will confuse the compiler about the uses of target_page_bits within exec.c. Moving everything to a new file prevents this confusion. No functional change so far. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-10-26core: replace getpagesize() with qemu_real_host_page_sizeWei Yang
There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-11rcu: Use automatic rc_read unlock in core memory/exec codeDr. David Alan Gilbert
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-6-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-10-04memory: allow memory_region_register_iommu_notifier() to failEric Auger
Currently, when a notifier is attempted to be registered and its flags are not supported (especially the MAP one) by the IOMMU MR, we generally abruptly exit in the IOMMU code. The failure could be handled more nicely in the caller and especially in the VFIO code. So let's allow memory_region_register_iommu_notifier() to fail as well as notify_flag_changed() callback. All sites implementing the callback are updated. This patch does not yet remove the exit(1) in the amd_iommu code. in SMMUv3 we turn the warning message into an error message saying that the assigned device would not work properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-03replay: don't synchronize memory operations in replay modePavel Dovgalyuk
Commit 9458a9a1df1a4c719e24512394d548c1fc7abd22 added synchronization of vCPU and migration operations through calling run_on_cpu operation. However, in replay mode this synchronization is unneeded, because I/O and vCPU threads are already synchronized. This patch disables such synchronization for record/replay mode. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@gmail.com>
2019-09-25cputlb: Pass retaddr to tb_check_watchpointRichard Henderson
Fixes the previous TLB_WATCHPOINT patches because we are currently failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint. Pass down the retaddr directly because it's readily available. Fixes: 50b107c5d61 Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25cputlb: Remove tb_invalidate_phys_page_range is_cpu_write_accessRichard Henderson
All callers pass false to this argument. Remove it and pass the constant on to tb_invalidate_phys_page_range__locked. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25cputlb: Merge and move memory_notdirty_write_{prepare,complete}Richard Henderson
Since 9458a9a1df1a, all readers of the dirty bitmaps wait for the rcu lock, which means that they wait until the end of any executing TranslationBlock. As a consequence, there is no need for the actual access to happen in between the _prepare and _complete. Therefore, we can improve things by merging the two functions into notdirty_write and dropping the NotDirtyInfo structure. In addition, the only users of notdirty_write are in cputlb.c, so move the merged function there. Pass in the CPUIOTLBEntry from which the ram_addr_t may be computed. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25cputlb: Partially inline memory_region_section_get_iotlbRichard Henderson
There is only one caller, tlb_set_page_with_attrs. We cannot inline the entire function because the AddressSpaceDispatch structure is private to exec.c, and cannot easily be moved to include/exec/memory-internal.h. Compute is_ram and is_romd once within tlb_set_page_with_attrs. Fold the number of tests against these predicates. Compute cpu_physical_memory_is_clean outside of the tlb lock region. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25cputlb: Move NOTDIRTY handling from I/O path to TLB pathRichard Henderson
Pages that we want to track for NOTDIRTY are RAM. We do not really need to go through the I/O path to handle them. Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25cputlb: Move ROM handling from I/O path to TLB pathRichard Henderson
It does not require going through the whole I/O path in order to discard a write. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-25exec: Adjust notdirty tracingRichard Henderson
The memory_region_tb_read tracepoint is unreachable, since notdirty is supposed to apply only to writes. The memory_region_tb_write tracepoint is mis-named, because notdirty is not only used for TB invalidation. It is also used for e.g. VGA RAM updates and migration. Replace memory_region_tb_write with memory_notdirty_write_access, and place it in memory_notdirty_write_prepare where it can catch all of the instances. Add memory_notdirty_set_dirty to log when we no longer intercept writes to a page. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-16memory: fetch pmem size in get_file_size()Stefan Hajnoczi
Neither stat(2) nor lseek(2) report the size of Linux devdax pmem character device nodes. Commit 314aec4a6e06844937f1677f6cba21981005f389 ("hostmem-file: reject invalid pmem file sizes") added code to hostmem-file.c to fetch the size from sysfs and compare against the user-provided size=NUM parameter: if (backend->size > size) { error_setg(errp, "size property %" PRIu64 " is larger than " "pmem file \"%s\" size %" PRIu64, backend->size, fb->mem_path, size); return; } It turns out that exec.c:qemu_ram_alloc_from_fd() already has an equivalent size check but it skips devdax pmem character devices because lseek(2) returns 0: if (file_size > 0 && file_size < size) { error_setg(errp, "backing store %s size 0x%" PRIx64 " does not match 'size' option 0x" RAM_ADDR_FMT, mem_path, file_size, size); return NULL; } This patch moves the devdax pmem file size code into get_file_size() so that we check the memory size in a single place: qemu_ram_alloc_from_fd(). This simplifies the code and makes it more general. This also fixes the problem that hostmem-file only checks the devdax pmem file size when the pmem=on parameter is given. An unchecked size=NUM parameter can lead to SIGBUS in QEMU so we must always fetch the file size for Linux devdax pmem character device nodes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190830093056.12572-1-stefanha@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16exec.c: add a check between constants to see whether we could skipWei Yang
The maximum level is defined as P_L2_LEVELS and skip is defined with 6 bits, which means if P_L2_LEVELS < (1 << 6), skip never exceeds the boundary. Since this check is between two constants, which leverages compiler to optimize the code based on different configuration. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190321082555.21118-7-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16exec.c: correct the maximum skip value during compactWei Yang
skip is defined with 6 bits. So the maximum value should be (1 << 6). Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190321082555.21118-6-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16exec.c: subpage->sub_section is already initialized to 0Wei Yang
In subpage_init(), we will set subpage->sub_section to PHYS_SECTION_UNASSIGNED by subpage_register. Since PHYS_SECTION_UNASSIGNED is defined to be 0, and we allocate subpage with g_malloc0, this means subpage->sub_section is already initialized to 0. This patch removes the redundant setup for a new subpage and also fix the code style. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190321082555.21118-5-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16exec.c: get nodes_nb_alloc with one MAX calculationWei Yang
The purpose of these two MAX here is to get the maximum of these three variables: A: map->nodes_nb + nodes B: map->nodes_nb_alloc C: alloc_hint We can write it like MAX(A, B, C). Since the if condition says A > B, this means MAX(A, B, C) = MAX(A, C). This patch just simplify the calculation a bit. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190321082555.21118-4-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16exec.c: replace hwaddr with uint64_t for better understandingWei Yang
Function phys_page_set() and phys_page_set_level() 's argument *nb* stands for number of pages to set instead of hardware address. This would be more proper to use uint64_t instead of hwaddr for its type. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190321082555.21118-2-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-04Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190903' into stagingPeter Maydell
Allow page table bit to swap endianness. Reorganize watchpoints out of i/o path. Return host address from probe_write / probe_access. # gpg: Signature made Tue 03 Sep 2019 16:47:50 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20190903: (36 commits) tcg: Factor out probe_write() logic into probe_access() tcg: Make probe_write() return a pointer to the host page s390x/tcg: Pass a size to probe_write() in do_csst() hppa/tcg: Call probe_write() also for CONFIG_USER_ONLY mips/tcg: Call probe_write() for CONFIG_USER_ONLY as well tcg: Enforce single page access in probe_write() tcg: Factor out CONFIG_USER_ONLY probe_write() from s390x code s390x/tcg: Fix length calculation in probe_write_access() s390x/tcg: Use guest_addr_valid() instead of h2g_valid() in probe_write_access() tcg: Check for watchpoints in probe_write() cputlb: Handle watchpoints via TLB_WATCHPOINT cputlb: Remove double-alignment in store_helper cputlb: Fix size operand for tlb_fill on unaligned store exec: Factor out cpu_watchpoint_address_matches cputlb: Fold TLB_RECHECK into TLB_INVALID_MASK exec: Factor out core logic of check_watchpoint() exec: Move user-only watchpoint stubs inline target/sparc: sun4u Invert Endian TTE bit target/sparc: Add TLB entry with attributes cputlb: Byte swap memory transaction attribute ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-09-03cputlb: Handle watchpoints via TLB_WATCHPOINTRichard Henderson
The raising of exceptions from check_watchpoint, buried inside of the I/O subsystem, is fundamentally broken. We do not have the helper return address with which we can unwind guest state. Replace PHYS_SECTION_WATCH and io_mem_watch with TLB_WATCHPOINT. Move the call to cpu_check_watchpoint into the cputlb helpers where we do have the helper return address. This allows watchpoints on RAM to bypass the full i/o access path. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03exec: Factor out cpu_watchpoint_address_matchesRichard Henderson
We want to move the check for watchpoints from memory_region_section_get_iotlb to tlb_set_page_with_attrs. Isolate the loop over watchpoints to an exported function. Rename the existing cpu_watchpoint_address_matches to watchpoint_address_matches, since it doesn't actually have a cpu argument. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03exec: Factor out core logic of check_watchpoint()David Hildenbrand
We want to perform the same checks in probe_write() to trigger a cpu exit before doing any modifications. We'll have to pass a PC. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20190823100741.9621-9-david@redhat.com> [rth: Use vaddr for len, like other watchpoint functions; Move user-only stub to static inline.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03exec: Move user-only watchpoint stubs inlineRichard Henderson
Let the user-only watchpoint stubs resolve to empty inline functions. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03memory: Single byte swap along the I/O pathTony Nguyen
Now that MemOp has been pushed down into the memory API, and callers are encoding endianness, we can collapse byte swaps along the I/O path into the accelerator and target independent adjust_endianness. Collapsing byte swaps along the I/O path enables additional endian inversion logic, e.g. SPARC64 Invert Endian TTE bit, with redundant byte swaps cancelling out. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <911ff31af11922a9afba9b7ce128af8b8b80f316.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03memory: Access MemoryRegion with endiannessTony Nguyen
Preparation for collapsing the two byte swaps adjust_endianness and handle_bswap into the former. Call memory_region_dispatch_{read|write} with endianness encoded into the "MemOp op" operand. This patch does not change any behaviour as memory_region_dispatch_{read|write} is yet to handle the endianness. Once it does handle endianness, callers with byte swaps can collapse them into adjust_endianness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03exec: Access MemoryRegion with MemOpTony Nguyen
The memory_region_dispatch_{read|write} operand "unsigned size" is being converted into a "MemOp op". Convert interfaces by using no-op size_memop. After all interfaces are converted, size_memop will be implemented and the memory_region_dispatch_{read|write} operand "unsigned size" will be converted into a "MemOp op". As size_memop is a no-op, this patch does not change any behaviour. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <3b042deef0a60dd49ae2320ece92120ba6027f2b.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03numa: move numa global variable numa_info into MachineStateTao Xu
Move existing numa global numa_info (renamed as "nodes") into NumaState. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-5-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-09-03numa: move numa global variable nb_numa_nodes into MachineStateTao Xu
Add struct NumaState in MachineState and move existing numa global nb_numa_nodes(renamed as "num_nodes") into NumaState. And add variable numa_support into MachineClass to decide which submachines support NUMA. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190809065731.9097-3-tao3.xu@intel.com> [ehabkost: include hw/boards.h again to fix build failures] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-08-22Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2019-08-21' ↵Peter Maydell
into staging Monitor patches for 2019-08-21 # gpg: Signature made Wed 21 Aug 2019 16:35:07 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-monitor-2019-08-21: monitor/qmp: Update comment for commit 4eaca8de268 qdev: Collect HMP handlers command handlers in qdev-monitor.c qapi: Move query-target from misc.json to machine.json hw/core: Move cpu.c, cpu.h from qom/ to hw/core/ Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-08-21hw/core: Move cpu.c, cpu.h from qom/ to hw/core/Markus Armbruster
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190709152053.16670-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Rebased onto merge commit 95a9457fd44; missed instances of qom/cpu.h in comments replaced]
2019-08-20memory: fix race between TCG and accesses to dirty bitmapPaolo Bonzini
There is a race between TCG and accesses to the dirty log: vCPU thread reader thread ----------------------- ----------------------- TLB check -> slow path notdirty_mem_write write to RAM set dirty flag clear dirty flag TLB check -> fast path read memory write to RAM Fortunately, in order to fix it, no change is required to the vCPU thread. However, the reader thread must delay the read after the vCPU thread has finished the write. This can be approximated conservatively by run_on_cpu, which waits for the end of the current translation block. A similar technique is used by KVM, which has to do a synchronous TLB flush after doing a test-and-clear of the dirty-page flags. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-16numa: Move remaining NUMA declarations from sysemu.h to numa.hMarkus Armbruster
Commit e35704ba9c "numa: Move NUMA declarations from sysemu.h to numa.h" left a few NUMA-related macros behind. Move them now. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-26-armbru@redhat.com>
2019-08-16Include hw/hw.h exactly where neededMarkus Armbruster
In my "build everything" tree, changing hw/hw.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The previous commits have left only the declaration of hw_error() in hw/hw.h. This permits dropping most of its inclusions. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-19-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-07-15memory: Introduce memory listener hook log_clear()Peter Xu
Introduce a new memory region listener hook log_clear() to allow the listeners to hook onto the points where the dirty bitmap is cleared by the bitmap users. Previously log_sync() contains two operations: - dirty bitmap collection, and, - dirty bitmap clear on remote site. Let's take KVM as example - log_sync() for KVM will first copy the kernel dirty bitmap to userspace, and at the same time we'll clear the dirty bitmap there along with re-protecting all the guest pages again. We add this new log_clear() interface only to split the old log_sync() into two separated procedures: - use log_sync() to collect the collection only, and, - use log_clear() to clear the remote dirty bitmap. With the new interface, the memory listener users will still be able to decide how to implement the log synchronization procedure, e.g., they can still only provide log_sync() method only and put all the two procedures within log_sync() (that's how the old KVM works before KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is introduced). However with this new interface the memory listener users will start to have a chance to postpone the log clear operation explicitly if the module supports. That can really benefit users like KVM at least for host kernels that support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2. There are three places that can clear dirty bits in any one of the dirty bitmap in the ram_list.dirty_memory[3] array: cpu_physical_memory_snapshot_and_clear_dirty cpu_physical_memory_test_and_clear_dirty cpu_physical_memory_sync_dirty_bitmap Currently we hook directly into each of the functions to notify about the log_clear(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: Pass mr into snapshot_and_clear_dirtyPeter Xu
Also we change the 2nd parameter of it to be the relative offset within the memory region. This is to be used in follow up patches. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190603065056.25211-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-05general: Replace global smp variables with smp machine propertiesLike Xu
Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2019-06-11qemu-common: Move tcg_enabled() etc. to sysemu/tcg.hMarkus Armbruster
Other accelerators have their own headers: sysemu/hax.h, sysemu/hvf.h, sysemu/kvm.h, sysemu/whpx.h. Only tcg_enabled() & friends sit in qemu-common.h. This necessitates inclusion of qemu-common.h into headers, which is against the rules spelled out in qemu-common.h's file comment. Move tcg_enabled() & friends into their own header sysemu/tcg.h, and adjust #include directives. Cc: Richard Henderson <rth@twiddle.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Rebased with conflicts resolved automatically, except for accel/tcg/tcg-all.c]
2019-04-26Merge remote-tracking branch ↵Peter Maydell
'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine queue, 2019-04-25 * 4.1 machine-types (Cornelia Huck) * Support MAP_SYNC on pmem memory backends (Zhang Yi) * -cpu parsing fixes and cleanups (Eduardo Habkost) * machine initialization cleanups (Wei Yang, Markus Armbruster) # gpg: Signature made Thu 25 Apr 2019 18:54:57 BST # gpg: using RSA key 2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: util/mmap-alloc: support MAP_SYNC in qemu_ram_mmap() linux-headers: add linux/mman.h. scripts/update-linux-headers: add linux/mman.h util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmap cpu: Fix crash with empty -cpu option cpu: Rename parse_cpu_model() to parse_cpu_option() vl: Simplify machine_parse() vl: Clean up after previous commit vl.c: allocate TYPE_MACHINE list once during bootup vl.c: make find_default_machine() local hw: add compat machines for 4.1 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-04-25util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmapZhang Yi
besides the existing 'shared' flags, we are going to add 'is_pmem' to qemu_ram_mmap(), which indicated the memory backend file is a persist memory. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25cpu: Fix crash with empty -cpu optionEduardo Habkost
Fix the following crash: $ qemu-system-x86_64 -cpu '' qemu-system-x86_64: qom/cpu.c:291: cpu_class_by_name: \ Assertion `cpu_model && cc->class_by_name' failed. Regression test script included. Fixes: 99193d8f2ef5 ("cpu: drop unnecessary NULL check and cpu_common_class_by_name()") Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190418034501.5038-1-ehabkost@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25cpu: Rename parse_cpu_model() to parse_cpu_option()Eduardo Habkost
The "model[,option...]" string parsed by the function is not just a CPU model. Rename the function and its argument to indicate it expects the full "-cpu" option to be provided. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190417025944.16154-2-ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-04-25exec: Introduce qemu_maxrampagesize() and rename qemu_getrampagesize()David Hildenbrand
Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it, properly rename find_max_supported_pagesize() to find_min_backend_pagesize(). s390x is actually interested into the maximum ram pagesize, so introduce and use qemu_maxrampagesize(). Add a TODO, indicating that looking at any mapped memory backends is not 100% correct in some cases. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190417113143.5551-3-david@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-18qom/cpu: Simplify how CPUClass:cpu_dump_state() printsMarkus Armbruster
CPUClass method dump_statistics() takes an fprintf()-like callback and a FILE * to pass to it. Most callers pass fprintf() and stderr. log_cpu_state() passes fprintf() and qemu_log_file. hmp_info_registers() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The callback gets passed around a lot, which is tiresome. The type-punning around monitor_fprintf() is ugly. Drop the callback, and call qemu_fprintf() instead. Also gets rid of the type-punning, since qemu_fprintf() takes NULL instead of the current monitor cast to FILE *. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-15-armbru@redhat.com>
2019-04-18memory: Clean up how mtree_info() printsMarkus Armbruster
mtree_info() takes an fprintf()-like callback and a FILE * to pass to it, and so do its helper functions. Passing around callback and argument is rather tiresome. Its only caller hmp_info_mtree() passes monitor_printf() cast to fprintf_function and the current monitor cast to FILE *. The type-punning is technically undefined behaviour, but works in practice. Clean up: drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-9-armbru@redhat.com>
2019-03-29exec: Only count mapped memory backends for qemu_getrampagesize()David Gibson
qemu_getrampagesize() works out the minimum host page size backing any of guest RAM. This is required in a few places, such as for POWER8 PAPR KVM guests, because limitations of the hardware virtualization mean the guest can't use pagesizes larger than the host pages backing its memory. However, it currently checks against *every* memory backend, whether or not it is actually mapped into guest memory at the moment. This is incorrect. This can cause a problem attempting to add memory to a POWER8 pseries KVM guest which is configured to allow hugepages in the guest (e.g. -machine cap-hpt-max-page-size=16m). If you attempt to add non-hugepage, you can (correctly) create a memory backend, however it (correctly) will throw an error when you attempt to map that memory into the guest by 'device_add'ing a pc-dimm. What's not correct is that if you then reset the guest a startup check against qemu_getrampagesize() will cause a fatal error because of the new memory object, even though it's not mapped into the guest. This patch corrects the problem by adjusting find_max_supported_pagesize() (called from qemu_getrampagesize() via object_child_foreach) to exclude non-mapped memory backends. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>