path: root/kvm-all.c
Age  Commit message  Author
2012-03-01Merge remote-tracking branch 'qemu-kvm/memory/urgent' into stagingAnthony Liguori
* qemu-kvm/memory/urgent: kvm: fix unaligned slots
2012-03-01Merge remote-tracking branch 'qemu-kvm/memory/core' into stagingAnthony Liguori
* qemu-kvm/memory/core: (30 commits) memory: allow phys_map tree paths to terminate early memory: unify PhysPageEntry::node and ::leaf memory: change phys_page_set() to set multiple pages memory: switch phys_page_set() to a recursive implementation memory: replace phys_page_find_alloc() with phys_page_set() memory: simplify multipage/subpage registration memory: give phys_page_find() its own tree search loop memory: make phys_page_find() return a MemoryRegionSection memory: move tlb flush to MemoryListener commit callback memory: unify the two branches of cpu_register_physical_memory_log() memory: fix RAM subpages in newly initialized pages memory: compress phys_map node pointers to 16 bits memory: store MemoryRegionSection pointers in phys_map memory: unify phys_map last level with intermediate levels memory: remove first level of l1_phys_map memory: change memory registration to rebuild the memory map on each change memory: support stateless memory listeners memory: split memory listener for the two address spaces xen: ignore I/O memory regions memory: allow MemoryListeners to observe a specific address space ...
2012-03-01kvm: fix unaligned slotsAvi Kivity
kvm_set_phys_mem() may be passed sections that are not aligned to a page boundary. The current code simply brute-forces the alignment which leads to an inconsistency and an abort(). Fix by aligning the start and the end of the section correctly, discarding any unaligned head or tail. This was triggered by a guest sizing a 64-bit BAR that is smaller than a page with PCI_COMMAND_MEMORY enabled and the upper dword clear. Signed-off-by: Avi Kivity <avi@redhat.com>
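The alignment described above can be illustrated with a minimal sketch. This is not the code from the patch; the `Range` type, the `align_section()` name, and the 4 KiB page size are assumptions made for illustration. The start is rounded up and the end rounded down, so an unaligned head or tail is dropped, and a section smaller than a page collapses to size zero.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical helper type: a [start, start + size) range. */
typedef struct {
    uint64_t start;
    uint64_t size;
} Range;

/* Shrink [start, start + size) to the largest page-aligned range it contains. */
Range align_section(uint64_t start, uint64_t size)
{
    uint64_t mask = EXAMPLE_PAGE_SIZE - 1;
    uint64_t end = start + size;
    uint64_t aligned_start = (start + mask) & ~mask;  /* round start up */
    uint64_t aligned_end = end & ~mask;               /* round end down */
    Range r = { aligned_start, 0 };

    /* If nothing page-sized survives, the result is an empty range. */
    if (aligned_end > aligned_start) {
        r.size = aligned_end - aligned_start;
    }
    return r;
}
```

For example, a section starting at 0x1800 with size 0x3000 shrinks to start 0x2000, size 0x2000, while a 0x20-byte section inside a single page yields size 0 and would simply be skipped.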
2012-02-29memory: support stateless memory listenersAvi Kivity
Current memory listeners are incremental; that is, they are expected to maintain their own state, and receive callbacks for changes to that state. This patch adds support for stateless listeners; these work by receiving a ->begin() callback (which tells them that new state is coming), a sequence of ->region_add() and ->region_nop() callbacks, and then a ->commit() callback which signifies the end of the new state. They should ignore ->region_del() callbacks. Signed-off-by: Avi Kivity <avi@redhat.com>
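A stateless listener of the kind described above might look roughly like the following sketch. The `StatelessListener` struct and the `listener_*` function names are hypothetical and do not mirror QEMU's MemoryListener definition; the sketch only shows the begin / region_add / region_nop / commit sequence rebuilding the full state each time, with region_del ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 16  /* arbitrary limit for this sketch */

typedef struct {
    uint64_t start;
    uint64_t size;
} Region;

/* Hypothetical listener: "pending" is rebuilt between begin and commit. */
typedef struct {
    Region pending[MAX_REGIONS];
    size_t n_pending;
    Region active[MAX_REGIONS];
    size_t n_active;
} StatelessListener;

/* begin: new state is coming, so forget anything collected so far. */
void listener_begin(StatelessListener *l)
{
    l->n_pending = 0;
}

/* region_add: a region that is part of the new state. */
void listener_region_add(StatelessListener *l, uint64_t start, uint64_t size)
{
    if (l->n_pending < MAX_REGIONS) {
        l->pending[l->n_pending].start = start;
        l->pending[l->n_pending].size = size;
        l->n_pending++;
    }
}

/* region_nop: an unchanged region; it still belongs to the new state. */
void listener_region_nop(StatelessListener *l, uint64_t start, uint64_t size)
{
    listener_region_add(l, start, size);
}

/* region_del: ignored, because the state is rebuilt from scratch anyway. */
void listener_region_del(StatelessListener *l, uint64_t start, uint64_t size)
{
    (void)l;
    (void)start;
    (void)size;
}

/* commit: the new state is complete, so publish it. */
void listener_commit(StatelessListener *l)
{
    memcpy(l->active, l->pending, l->n_pending * sizeof(Region));
    l->n_active = l->n_pending;
}
```

The design choice sketched here is that the listener never patches its previous state; whatever it sees between begin and commit is the whole picture.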
2012-02-29memory: allow MemoryListeners to observe a specific address spaceAvi Kivity
Ignore any regions not belonging to a specified address space. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-02-29memory: move ioeventfd ops to MemoryListenerAvi Kivity
This way the accelerator (kvm) can handle them directly. Signed-off-by: Avi Kivity <avi@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
2012-02-29memory: switch memory listeners to a QTAILQAvi Kivity
This allows reverse iteration, which in turn allows consistent ordering among multiple listeners: l1->add l2->add l2->del l1->del Signed-off-by: Avi Kivity <avi@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
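The ordering above can be sketched without QEMU's queue macros. The example below uses a plain array instead of a QTAILQ, and the `Listener` type and `notify_*` names are hypothetical; it only demonstrates the idea that additions walk the listeners forward while deletions walk them in reverse, which is what a doubly linked tail queue makes possible.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical listener: only a name, so the call order can be recorded. */
typedef struct {
    const char *name;
} Listener;

/* Append "name->event " to the log buffer, truncating if it is full. */
static void log_event(char *log, size_t cap, const char *name, const char *event)
{
    size_t used = strlen(log);
    if (used < cap) {
        snprintf(log + used, cap - used, "%s->%s ", name, event);
    }
}

/* Additions are delivered in registration order. */
void notify_add(const Listener *ls, size_t n, char *log, size_t cap)
{
    for (size_t i = 0; i < n; i++) {
        log_event(log, cap, ls[i].name, "add");
    }
}

/* Deletions are delivered in reverse registration order. */
void notify_del(const Listener *ls, size_t n, char *log, size_t cap)
{
    for (size_t i = n; i-- > 0;) {
        log_event(log, cap, ls[i].name, "del");
    }
}
```

With two listeners named l1 and l2, calling `notify_add()` and then `notify_del()` records the sequence l1->add l2->add l2->del l1->del, matching the order given in the commit message.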
2012-02-18kvm: Set cpu_single_env only onceJan Kiszka
As we have thread-local cpu_single_env now and KVM uses exactly one thread per VCPU, we can drop the cpu_single_env updates from the loop and initialize this variable only once during setup. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-02-08kvm: Implement kvm_irqchip_in_kernel like kvm_enabledJan Kiszka
To both avoid that kvm_irqchip_in_kernel always has to be paired with kvm_enabled and that the former ends up in a function call, implement it like the latter. This means keeping the state in a global variable and defining kvm_irqchip_in_kernel as a preprocessor macro. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
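The difference between the two styles can be sketched as follows. All identifiers here carry an `example_` prefix and are hypothetical; they are not the names used in the patch. The point is that the macro form reads a global directly, so callers pay no function call and the flag is simply false when the feature was never enabled.

```c
#include <stdbool.h>

/* Hypothetical global flag, set once during setup. */
bool example_irqchip_allowed;

/* Function form: every query costs a call. */
bool example_irqchip_in_kernel_fn(void)
{
    return example_irqchip_allowed;
}

/* Macro form: expands to a plain read of the global variable. */
#define example_irqchip_in_kernel() (example_irqchip_allowed)
```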
2012-02-02KVM: Fix compilation on non-x86Alexander Graf
Commit 84b058d broke compilation for KVM on non-x86 targets, which don't have KVM_CAP_IRQ_ROUTING defined. Fix by not using the unavailable constant when it's not around. Signed-off-by: Alexander Graf <agraf@suse.de>
2012-01-25memory: change dirty setting APIs to take a sizeBlue Swirl
Instead of each target knowing or guessing the guest page size, just pass the desired size of dirtied memory area. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-01-23Merge remote-tracking branch 'qemu-kvm/uq/master' into stagingAnthony Liguori
* qemu-kvm/uq/master: kvm: Activate in-kernel irqchip support kvm: x86: Add user space part for in-kernel IOAPIC kvm: x86: Add user space part for in-kernel i8259 kvm: x86: Add user space part for in-kernel APIC kvm: x86: Establish IRQ0 override control kvm: Introduce core services for in-kernel irqchip support memory: Introduce memory_region_init_reservation ioapic: Factor out base class for KVM reuse ioapic: Drop post-load irr initialization i8259: Factor out base class for KVM reuse i8259: Completely privatize PicState apic: Open-code timer save/restore apic: Factor out base class for KVM reuse apic: Introduce apic_report_irq_delivered apic: Inject external NMI events via LINT1 apic: Stop timer on reset kvm: Move kvmclock into hw/kvm folder msi: Generalize msix_supported to msi_supported hyper-v: initialize Hyper-V CPUID leaves. hyper-v: introduce Hyper-V support infrastructure. Conflicts: Makefile.target Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-01-21Fix dirty logging with 32-bit qemu & 64-bit guestsBenjamin Herrenschmidt
The kvm_get_dirty_pages_log_range() function uses two address variables to step through the monitored memory region to update the dirty log. However, these variables have type unsigned long, which can overflow if running a 64-bit guest with a 32-bit qemu binary. This patch changes these to target_phys_addr_t which will have the correct size. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
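The overflow can be reproduced in a small sketch. Here `uint32_t` stands in for `unsigned long` on a 32-bit host and `uint64_t` for a type wide enough to hold guest physical addresses; the function names are hypothetical and this is not the code from the patch.

```c
#include <stdint.h>

/* End of a guest range computed in a 32-bit variable: wraps past 4 GiB. */
uint32_t range_end_32(uint64_t start, uint64_t len)
{
    return (uint32_t)(start + len);
}

/* End of the same range computed in a 64-bit variable: no truncation. */
uint64_t range_end_64(uint64_t start, uint64_t len)
{
    return start + len;
}
```

For a range starting at 0xFFFFF000 with length 0x2000, the 32-bit version yields 0x1000, which lies below the start address, while the 64-bit version yields 0x100001000 as expected.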
2012-01-19kvm: x86: Establish IRQ0 override controlJan Kiszka
KVM is forced to disable the IRQ0 override when we run with in-kernel irqchip but without IRQ routing support of the kernel. Set the fwcfg value correspondingly. This aligns us with qemu-kvm. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2012-01-19kvm: Introduce core services for in-kernel irqchip supportJan Kiszka
Add the basic infrastructure to activate in-kernel irqchip support, inject interrupts into these models, and maintain IRQ routes. Routing is optional and depends on the host arch supporting KVM_CAP_IRQ_ROUTING. When it's not available on x86, we lose the HPET as we can't route GSI0 to IOAPIC pin 2. In-kernel irqchip support will eventually be controlled by the machine property 'kernel_irqchip', but this is not yet wired up. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2012-01-15kvm: flush the dirty log when unregistering a slotAvi Kivity
Otherwise, the dirty log information is lost in the kernel forever. Fixes opensuse-12.1 boot screen, which changes the vga windows rapidly. Signed-off-by: Avi Kivity <avi@redhat.com>
2012-01-03kvm: avoid cpu_get_physical_page_desc()Avi Kivity
This reaches into the innards of the memory core, which are being changed. Switch to a memory API version. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-20kvm: convert to MemoryListener APIAvi Kivity
Drop the use of cpu_register_phys_memory_client() in favour of the new MemoryListener API. The new API simplifies the caller, since there is no need to deal with splitting and merging slots; however this is not exploited in this patch. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-20kvm: switch kvm slots to use host virtual address instead of ram_addr_tAvi Kivity
This simplifies a later switch to the memory API in slot management. Signed-off-by: Avi Kivity <avi@redhat.com>
2011-12-16kvm: Print something before calling abort() if KVM_RUN failsMichael Ellerman
It's a little unfriendly to call abort() without printing any sort of error message. So turn the DPRINTK() into an fprintf(stderr, ...). Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-11-01kvm_init didn't set return value after create vm failedXu He Jie
And kvm_ioctl(s, KVM_CREATE_VM, 0)'s return value can be < -1, so change the check of vmfd at label 'err'. Signed-off-by: Xu He Jie <xuhj@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-10-24kvm: avoid reentering kvm_flush_coalesced_mmio_buffer()Avi Kivity
mmio callbacks invoked by kvm_flush_coalesced_mmio_buffer() may themselves indirectly call kvm_flush_coalesced_mmio_buffer(). Prevent reentering the function by checking a flag that indicates we're processing coalesced mmio requests. Signed-off-by: Avi Kivity <avi@redhat.com>
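A flag-based reentrancy guard of the kind described above can be sketched like this. The `FlushState` struct and `flush_buffer()` function are hypothetical; the nested call inside the body stands in for an MMIO callback that indirectly triggers another flush.

```c
#include <stdbool.h>

/* Hypothetical state for the sketch. */
typedef struct {
    bool in_flush;    /* true while a flush is being processed */
    int flushes_run;  /* how many times the flush body actually ran */
} FlushState;

void flush_buffer(FlushState *s)
{
    /* A flush is already in progress further up the call stack: do nothing. */
    if (s->in_flush) {
        return;
    }
    s->in_flush = true;
    s->flushes_run++;

    /* Simulated callback that indirectly calls the flush function again. */
    flush_buffer(s);

    s->in_flush = false;
}
```

A single-threaded flag is enough here because the nested call happens on the same stack; a guard against concurrent callers would need a different mechanism.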
2011-10-04RunState: Rename enum values as generated by the QAPILuiz Capitulino
Next commit will convert the query-status command to use the RunState type as generated by the QAPI. In order to "transparently" replace the current enum by the QAPI one, we have to make some changes to some enum values. As the changes are simple renames, I'll do them in one shot. The changes are: - Rename the prefix from RSTATE_ to RUN_STATE_ - RUN_STATE_SAVEVM to RUN_STATE_SAVE_VM - RUN_STATE_IN_MIGRATE to RUN_STATE_INMIGRATE - RUN_STATE_PANICKED to RUN_STATE_INTERNAL_ERROR - RUN_STATE_POST_MIGRATE to RUN_STATE_POSTMIGRATE - RUN_STATE_PRE_LAUNCH to RUN_STATE_PRELAUNCH - RUN_STATE_PRE_MIGRATE to RUN_STATE_PREMIGRATE - RUN_STATE_RESTORE to RUN_STATE_RESTORE_VM - RUN_STATE_PRE_MIGRATE to RUN_STATE_FINISH_MIGRATE Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-09-15Replace the VMSTOP macros with a proper state typeLuiz Capitulino
Today, when notifying a VM state change with vm_state_notify(), we pass a VMSTOP macro as the 'reason' argument. This is not ideal because the VMSTOP macros tell why qemu stopped and not exactly what the current VM state is. One example to demonstrate this problem is that vm_start() calls vm_state_notify() with reason=0, which turns out to be VMSTOP_USER. This commit fixes that by replacing the VMSTOP macros with a proper state type called RunState. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2011-09-02main: force enabling of I/O threadAnthony Liguori
Enabling the I/O thread by default seems like an important part of declaring 1.0. Besides allowing true SMP support with KVM, the I/O thread means that the TCG VCPU doesn't have to multiplex itself with the I/O dispatch routines which currently requires a (racy) signal based alarm system. I know there have been concerns about performance. I think so far the ones that have come up (virtio-net) are most likely due to secondary reasons like decreased batching. I think we ought to force enabling I/O thread early in 1.0 development and commit to resolving any lingering issues. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-20Use glib memory allocation and free functionsAnthony Liguori
qemu_malloc/qemu_free no longer exist after this commit. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-08-05kvm: Drop obsolete KVM_IOEVENTFD #ifdefsJan Kiszka
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-06-20kvm: Drop KVM_CAP build dependenciesJan Kiszka
No longer needed with accompanied kernel headers. We are only left with build dependencies that are controlled by kvm arch headers. CC: Alexander Graf <agraf@suse.de> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-06-20kvm: Drop useless zero-initializationsJan Kiszka
Backing KVMState is already zero-initialized. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-09kvm: ppc: warn user on PAGE_SIZE mismatchAlexander Graf
On PPC, the default PAGE_SIZE is 64kb. Unfortunately, the hardware alignments don't match here: There are RAM and MMIO regions within a single page when it's 64kb in size. So the only way out for now is to tell the user that he should use 4k PAGE_SIZE. This patch gives the user a hint on that, telling him that failing to register a prefix slot is most likely to be caused by mismatching PAGE_SIZE. This way it's also more future-proof, as bigger PAGE_SIZE can easily be supported by other machines then, as long as they stick to 64kb granularities. Signed-off-by: Alexander Graf <agraf@suse.de>
2011-05-05Merge remote branch 'origin/master' into pciMichael S. Tsirkin
Conflicts: exec.c
2011-05-02kvm: use qemu_free consistentlyPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-02fix crash in migration, 32-bit userspace on 64-bit hostMichael Tokarev
This change fixes a long-standing immediate crash (memory corruption and abort in glibc malloc code) in migration on 32bits. The bug is present since this commit: commit 692d9aca97b865b0f7903565274a52606910f129 Author: Bruce Rogers <brogers@novell.com> Date: Wed Sep 23 16:13:18 2009 -0600 qemu-kvm: allocate correct size for dirty bitmap The dirty bitmap copied out to userspace is stored in a long array, and gets copied out to userspace accordingly. This patch accounts for that correctly. Currently I'm seeing kvm crashing due to writing beyond the end of the alloc'd dirty bitmap memory, because the buffer has the wrong size. Signed-off-by: Bruce Rogers Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ int kvm_get_dirty_pages_range(kvm_context_t kvm, unsigned long phys_addr, - buf = qemu_malloc((slots[i].len / 4096 + 7) / 8 + 2); + buf = qemu_malloc(BITMAP_SIZE(slots[i].len)); r = kvm_get_map(kvm, KVM_GET_DIRTY_LOG, i, buf); BITMAP_SIZE is now open-coded in that function, like this: size = ALIGN(((mem->memory_size) >> TARGET_PAGE_BITS), HOST_LONG_BITS) / 8; The problem is that HOST_LONG_BITS in 32bit userspace is 32 but it's 64 in 64bit kernel. So userspace aligns this to 32, and kernel to 64, but since no length is passed from userspace to kernel on ioctl, kernel uses its size calculation and copies 4 extra bytes to userspace, corrupting memory. Here's how it looks like during migrate execution: our=20, kern=24 our=4, kern=8 ... our=4, kern=8 our=4064, kern=4064 our=512, kern=512 our=4, kern=8 our=20, kern=24 our=4, kern=8 ... our=4, kern=8 our=4064, kern=4064 *** glibc detected *** ./x86_64-softmmu/qemu-system-x86_64: realloc(): invalid next size: 0x08f20528 *** (our is userspace size above, kern is the size as calculated by the kernel). Fix this by always aligning to 64 in a hope that no platform will have sizeof(long)>8 any time soon, and add a comment describing it all. It's a small price to pay for bad kernel design. 
Alternatively it's possible to fix that in the kernel by using different size calculation depending on the current process. But this becomes quite ugly. Special thanks goes to Stefan Hajnoczi for spotting the fundamental cause of the issue, and to Alexander Graf for his support in #qemu. Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> CC: Bruce Rogers <brogers@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>
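The size mismatch described above can be checked with a short sketch of the calculation. The helper names and the 12-bit page shift are assumptions for illustration; the formula follows the open-coded expression quoted in the message, with the alignment passed in as a parameter so the 32-bit and 64-bit results can be compared.

```c
#include <stdint.h>

#define EXAMPLE_PAGE_BITS 12  /* assumed 4 KiB target pages */

/* Round v up to a multiple of a, where a is a power of two. */
uint64_t align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Bytes needed for a dirty bitmap with one bit per page, aligned to long_bits. */
uint64_t dirty_bitmap_bytes(uint64_t memory_size, uint64_t long_bits)
{
    return align_up(memory_size >> EXAMPLE_PAGE_BITS, long_bits) / 8;
}
```

A 640 KiB slot has 160 pages: aligning to 32 bits gives 20 bytes, aligning to 64 bits gives 24 bytes, which matches the "our=20, kern=24" lines in the trace. A 64 KiB slot gives 4 versus 8 bytes. Always aligning to 64 in userspace makes both sides agree.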
2011-05-02kvm: Install specialized interrupt handlerJan Kiszka
KVM only requires to set the raised IRQ in CPUState and to kick the receiving vcpu if it is remote. Installing a specialized handler allows potential future changes to the TCG code path without risking KVM side effects. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-04-06kvm: halve number of set memory calls for vgaMichael S. Tsirkin
Use the new API to reduce the number of these (expensive) system calls. Note: using this API, we should be able to get rid of vga_dirty_log_xxx APIs. Using them doesn't affect the performance though because we detect that the log_dirty flag is set and ignore the call. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2011-04-06cpu: add set_memory flag to request dirty loggingMichael S. Tsirkin
Pass the flag to all cpu notifiers, doing nothing at this point. Will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2011-03-16kvm: x86: Push kvm_arch_debug to kvm_arch_handle_exitJan Kiszka
There are no generic bits remaining in the handling of KVM_EXIT_DEBUG. So push its logic completely into arch hands, i.e. only x86 so far. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Align kvm_arch_handle_exit to kvm_cpu_exec changesJan Kiszka
Make the return code of kvm_arch_handle_exit directly usable for kvm_cpu_exec. This is straightforward for x86 and ppc, just s390 would require more work. Avoid this for now by pushing the return code translation logic into s390's kvm_arch_handle_exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Rework inner loop of kvm_cpu_execJan Kiszka
Let kvm_cpu_exec return EXCP_* values consistently and generate those codes already inside its inner loop. This means we will now re-enter the kernel while ret == 0. Update kvm_handle_internal_error accordingly, but keep kvm_arch_handle_exit untouched, it will be converted in a separate step. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Reorder error handling of KVM_RUNJan Kiszka
Test for general errors first as this is the slower path. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Keep KVM_RUN return value in separate variableJan Kiszka
Avoid using 'ret' both for the return value of KVM_RUN as well as the code kvm_cpu_exec is supposed to return. Both have no direct relation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Consider EXIT_DEBUG unknown without CAP_SET_GUEST_DEBUGJan Kiszka
Without KVM_CAP_SET_GUEST_DEBUG, we neither motivate the kernel to report KVM_EXIT_DEBUG nor do we expect such exits. So fall through to the arch code which will simply report an unknown exit reason. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-16kvm: Mark VCPU state dirty on creationJan Kiszka
This avoids that early cpu_synchronize_state calls try to retrieve an uninitialized state from the kernel. That even causes a deadlock if io-thread is enabled. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-15kvm: Rename kvm_arch_process_irqchip_events to async_eventsJan Kiszka
We will broaden the scope of this function on x86 beyond irqchip events. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-15kvm: Fix build warning when KVM_CAP_SET_GUEST_DEBUG is lackingJan Kiszka
Original fix by David Gibson. CC: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-02-14kvm: Make kvm_state globally availableJan Kiszka
KVM-assisted devices need access to it but we have no clean channel to distribute a reference. As a workaround until there is a better solution, export kvm_state for global use, though use should remain restricted to the mentioned scenario. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-02-14Introduce log_start/log_stop in CPUPhysMemoryClientAnthony PERARD
In order to use log_start/log_stop with Xen as well in the vga code, these two operations have been put in CPUPhysMemoryClient. The two new functions cpu_physical_log_start and cpu_physical_log_stop are used in hw/vga.c and replace kvm_log_start/stop. With this, vga no longer depends on the kvm header. [ Jan: rebasing and style fixlets ] Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-02-14kvm: Remove unneeded memory slot reservationJan Kiszka
The number of slots and the location of private ones changed several times in KVM's early days. However, it's stable since 2.6.29 (our required baseline), and slots 8..11 are no longer reserved since then. So remove this unneeded restriction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> CC: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-02-14kvm: Separate TCG from KVM cpu executionJan Kiszka
Mixing up TCG bits with KVM already led to problems around eflags emulation on x86. Moreover, quite some code that TCG requires on cpu entry/exit is useless for KVM. So dispatch between tcg_cpu_exec and kvm_cpu_exec as early as possible. The core logic of cpu_halted from cpu_exec is added to kvm_arch_process_irqchip_events. Moving away from cpu_exec makes exception_index meaningless for KVM, we can simply pass the exit reason directly (only "EXCP_DEBUG vs. rest" is relevant). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-02-14Improve vm_stop reason declarationsJan Kiszka
Define and use dedicated constants for vm_stop reasons, they actually have nothing to do with the EXCP_* defines used so far. At this chance, specify more detailed reasons so that VM state change handlers can evaluate them. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>