Age  Commit message  Author
2017-09-22chardev: add Chardev.gcontext fieldPeter Xu
It caches the gcontext that is used to poll the chardev IO. Before this patch, we only passed it in via chr_update_read_handlers(). However, that may not be enough if the char backend is disconnected and reconnected afterward. Some chardev code still assumes that the context is NULL (which means the main context); that will be fixed in follow-up patches. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-3-git-send-email-peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22chardev: new qemu_chr_be_update_read_handlers()Peter Xu
Add a wrapper for the chr_update_read_handler(). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1505975754-21555-2-git-send-email-peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22scsi: add persistent reservation manager using qemu-pr-helperPaolo Bonzini
This adds a concrete subclass of pr-manager that talks to qemu-pr-helper. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22scsi: add multipath support to qemu-pr-helperPaolo Bonzini
Proper support of persistent reservation for multipath devices requires communication with the multipath daemon, so that the reservation is registered and applied when a path comes up. The device mapper utilities provide a library to do so; this patch makes qemu-pr-helper.c detect multipath devices and, when one is found, delegate the operation to libmpathpersist. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22scsi: build qemu-pr-helperPaolo Bonzini
Introduce a privileged helper to run persistent reservation commands. This lets virtual machines send persistent reservations without using CAP_SYS_RAWIO or out-of-tree patches. The helper uses Unix permissions and SCM_RIGHTS to restrict access to processes that can access its socket and prove that they have an open file descriptor for a raw SCSI device. The next patch will also correct the usage of persistent reservations with multipath devices. It would also be possible to add support for Linux's IOC_PR_* ioctls in the future, in order to support NVMe devices. For now, however, only SCSI is supported. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
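The SCM_RIGHTS mechanism mentioned above can be illustrated with a minimal sketch. It is not the helper's actual protocol: the one-byte payload and the function names `send_fd` and `recv_fd` are assumptions made for this example. It only shows how a client can hand an open descriptor to the process at the other end of a Unix socket, which is how it can prove that it already has the device open.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for exactly one descriptor, suitably aligned. */
union fd_cmsg {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one dummy byte with fd attached as SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive the byte and return the attached fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;   /* no descriptor was passed: reject the request */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A receiver built this way never opens the device by name; it only operates on descriptors that the peer was already allowed to open, so the socket's Unix permissions and the device's permissions both have to be satisfied.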
2017-09-22scsi, file-posix: add support for persistent reservation managementPaolo Bonzini
It is a common requirement for virtual machines to send persistent reservations, but this currently requires either running QEMU with CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged QEMU bypass Linux's filter on SG_IO commands.

As an alternative mechanism, the next patches will introduce a privileged helper to run persistent reservation commands without expanding QEMU's attack surface unnecessarily.

The helper is invoked through a "pr-manager" QOM object, to which file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and PERSISTENT RESERVE IN commands. For example:

    $ qemu-system-x86_64 \
        -device virtio-scsi \
        -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
        -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
        -device scsi-block,drive=hd

or:

    $ qemu-system-x86_64 \
        -device virtio-scsi \
        -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
        -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
        -device scsi-block,drive=hd

Multiple pr-manager implementations are conceivable and possible, though only one is implemented right now. For example, a pr-manager could:

- talk directly to the multipath daemon from a privileged QEMU (i.e. QEMU links to libmpathpersist); this makes reservation work properly with multipath, but still requires CAP_SYS_RAWIO
- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)
- more interestingly, implement reservations directly in QEMU through file system locks or a shared database (e.g. sqlite)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22memory: Share special empty FlatViewAlexey Kardashevskiy
This shares a cached empty FlatView among address spaces. The empty FV is used every time a root MR renders into a FV without memory sections, which happens when the MR or its children are not enabled or are zero-sized. The empty_view is not NULL to keep the rest of memory API intact; it also has a dispatch tree for the same reason. On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves the number of FlatViews in use (557 -> 260) and dispatch tables (~800000 -> ~370000). In an unrelated experiment with 112 non-virtio devices on x86 ("-M pc"), only 4 FlatViews are alive, and about 2000 are created at startup. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-16-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22memory: seek FlatView sharing candidates among children subregionsPaolo Bonzini
A container can be used instead of an alias to allow switching between multiple subregions. In this case we cannot directly share the subregions (since they only belong to a single parent), but if the subregions are aliases we can in turn walk those. This is not enough to remove all sources of quadratic FlatView creation, but it enables sharing of the PCI bus master FlatViews (and their AddressSpaceDispatch structures) across all PCI devices. For 112 virtio-net-pci devices, boot time is reduced from 25 to 10 seconds and memory consumption from 1.4 to 1 G. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22memory: trace FlatView creation and destructionPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22memory: Create FlatView directlyAlexey Kardashevskiy
This avoids the usual memory_region_transaction_commit(), which rebuilds all FVs. On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this brings down the boot time from 25s to 20s and reduces the number of temporary FVs allocated during machine construction (~800000 -> ~640000) and the number of temporary dispatch trees (~370000 -> ~300000); the total memory footprint goes down (18G -> 17G). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-18-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-22memory: Get rid of address_space_init_shareableAlexey Kardashevskiy
Since FlatViews are shared now and ASes not, this gets rid of address_space_init_shareable(). This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-17-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Rework "info mtree" to print flat views and dispatch treesAlexey Kardashevskiy
This adds a new "-d" switch to "info mtree" to print dispatch tree internals. This changes the way "-f" is handled: it now prints flat views and the associated address spaces. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-15-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Do not allocate FlatView in address_space_initAlexey Kardashevskiy
This creates a new AS object without any FlatView as memory_region_transaction_commit() may want to reuse the empty FV. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-14-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Share FlatView's and dispatch trees between address spacesAlexey Kardashevskiy
This allows sharing flat views between address spaces (AS) when the same root memory region is used when creating a new address space. This is done by walking through all ASes and caching one FlatView per physical root MR (i.e. not aliased). This removes the search for duplicates from address_space_init_shareable() as FlatViews are shared elsewhere and keeping as::ref_count correct seems an unnecessary and useless complication. This should cause no change in memory use or boot time yet. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-13-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
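The idea of caching one view per physical root can be sketched as follows. This is a simplified illustration, not QEMU's code: the `DemoRegion` and `DemoView` types, the fixed-size table, and the rule that any alias resolves straight to its target are all assumptions made for the example (the real memory API has stricter conditions for when an alias can be looked through).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_VIEWS 16

/* Hypothetical stand-ins for MemoryRegion and FlatView. */
typedef struct DemoRegion {
    const char *name;
    struct DemoRegion *alias;   /* non-NULL if this region aliases another */
} DemoRegion;

typedef struct DemoView {
    const DemoRegion *root;     /* physical (non-alias) root region */
    int users;                  /* number of address spaces sharing the view */
} DemoView;

static DemoView demo_views[DEMO_MAX_VIEWS];
static size_t demo_view_count;

/* Follow aliases until a region that is not an alias is reached. */
static const DemoRegion *demo_physical_root(const DemoRegion *mr)
{
    while (mr->alias) {
        mr = mr->alias;
    }
    return mr;
}

/* Return the view cached for mr's physical root, creating it on first use. */
static DemoView *demo_view_get(const DemoRegion *mr)
{
    const DemoRegion *root = demo_physical_root(mr);
    DemoView *view;
    size_t i;

    for (i = 0; i < demo_view_count; i++) {
        if (demo_views[i].root == root) {
            demo_views[i].users++;
            return &demo_views[i];
        }
    }
    assert(demo_view_count < DEMO_MAX_VIEWS);
    view = &demo_views[demo_view_count++];
    view->root = root;
    view->users = 1;
    return view;
}
```

Two address spaces whose roots resolve to the same physical region get the same `DemoView` pointer, so the expensive per-view structures are built once instead of once per address space.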
2017-09-21memory: Move address_space_update_ioeventfdsAlexey Kardashevskiy
So it is called (twice) from the same function. This is to make the next patches a bit simpler. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-12-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Alloc dispatch tree where topology is generatedAlexey Kardashevskiy
This is to make next patches simpler. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-11-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Store physical root MR in FlatViewAlexey Kardashevskiy
Address spaces get to keep a root MR (alias or not) but FlatView stores the actual MR as this is going to be used later on to decide whether to share a particular FlatView or not. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-10-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Rename mem_begin/mem_commit/mem_add helpersAlexey Kardashevskiy
This renames some helpers to reflect better what they do. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-9-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Cleanup after switching to FlatViewAlexey Kardashevskiy
We store AddressSpaceDispatch* in FlatView anyway so there is no need to carry it from mem_add() to register_subpage/register_multipage. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-8-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Switch memory from using AddressSpace to FlatViewAlexey Kardashevskiy
FlatView's will be shared between AddressSpace's and subpage_t and MemoryRegionSection cannot store AS anymore, hence this change.

In particular, for:

     typedef struct subpage_t {
         MemoryRegion iomem;
    -    AddressSpace *as;
    +    FlatView *fv;
         hwaddr base;
         uint16_t sub_section[];
     } subpage_t;

     struct MemoryRegionSection {
         MemoryRegion *mr;
    -    AddressSpace *address_space;
    +    FlatView *fv;
         hwaddr offset_within_region;
         Int128 size;
         hwaddr offset_within_address_space;
         bool readonly;
     };

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-Id: <20170921085110.25598-7-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Remove AddressSpace pointer from AddressSpaceDispatchAlexey Kardashevskiy
AS in ASD is only used to pass AS from mem_begin() to register_subpage() to store it in MemoryRegionSection; we can do this directly now. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-6-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Move AddressSpaceDispatch from AddressSpace to FlatViewAlexey Kardashevskiy
As we are going to share FlatView's between AddressSpace's, and AddressSpaceDispatch is a structure to perform quick lookup in FlatView, this moves ASD to FlatView. After the previously open-coded ASD rendering, we can also remove as->next_dispatch as the new FlatView pointer is stored on a stack and set to an AS atomically. flatview_destroy() is executed under RCU instead of address_space_dispatch_free() now. This makes mem_begin/mem_commit work with ASD and mem_add with FV, as later on mem_add will be taking FV as an argument anyway. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-5-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Move FlatView allocation to a helperAlexey Kardashevskiy
This moves FlatView allocation and initialization to a helper. While we are here, replace g_new with g_new0 so that we need not bother if we add new fields in the future. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-4-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: Open code FlatView renderingAlexey Kardashevskiy
We are going to share FlatView's between AddressSpace's and per-AS memory listeners won't suit the purpose anymore, so open code the dispatch tree rendering. Since there is a good chance that dispatch_listener was the only listener, this avoids address_space_update_topology_pass() if there are no registered listeners; this should improve startup time. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-3-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21exec: Explicitly export target AS from address_space_translate_internalAlexey Kardashevskiy
This adds an AS** parameter to address_space_do_translate() to make it easier for the next patch to share FlatViews. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170921085110.25598-2-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: avoid "resurrection" of dead FlatViewsPaolo Bonzini
It's possible for address_space_get_flatview() as it currently stands to cause a use-after-free for the returned FlatView, if the reference count is incremented after the FlatView has been replaced by a writer:

       thread 1             thread 2             RCU thread
      -------------------------------------------------------------
       rcu_read_lock
       read as->current_map
                            set as->current_map
                            flatview_unref
                               '--> call_rcu
       flatview_ref
       [ref=1]
       rcu_read_unlock
                                                 flatview_destroy
       <badness>

Since FlatViews are not updated very often, we can just detect the situation using a new atomic op atomic_fetch_inc_nonzero, similar to Linux's atomic_inc_not_zero, which performs the refcount increment only if it hasn't already hit zero. This is similar to Linux commit de09a9771a53 ("CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials", 2010-07-29).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
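The "increment only if not already zero" operation described above can be sketched with C11 atomics. This is an illustration of the technique, not QEMU's implementation: the names `demo_fetch_inc_nonzero`, `DemoObj`, `demo_tryget` and `demo_put` are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a FlatView. */
typedef struct DemoObj {
    atomic_uint ref;
} DemoObj;

/*
 * Increment *ptr unless it is zero, and return the value seen before the
 * increment. A return value of 0 means the object is already dead and the
 * counter was left untouched.
 */
static unsigned demo_fetch_inc_nonzero(atomic_uint *ptr)
{
    unsigned old = atomic_load(ptr);

    /* On failure the compare-exchange reloads 'old', so just retry. */
    while (old != 0 &&
           !atomic_compare_exchange_weak(ptr, &old, old + 1)) {
        /* retry with the freshly observed value */
    }
    return old;
}

/* Try to take a reference; fails if the count has already dropped to zero. */
static bool demo_tryget(DemoObj *obj)
{
    return demo_fetch_inc_nonzero(&obj->ref) > 0;
}

/* Drop a reference; returns true if this was the last one. */
static bool demo_put(DemoObj *obj)
{
    unsigned old = atomic_fetch_sub(&obj->ref, 1);

    assert(old > 0);
    return old == 1;
}
```

A reader that gets `false` from `demo_tryget()` knows the pointer it loaded is stale and has to reload the current pointer and try again, rather than resurrecting an object whose destruction is already scheduled.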
2017-09-21atomic: update documentationPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21memory: avoid a name clash with access macroKONRAD Frederic
This avoids a name clash with the access macro on windows 64:

    make
      CHK version_gen.h
      CC      aarch64-softmmu/memory.o
    /home/konrad/qemu/memory.c: In function 'access_with_adjusted_size':
    /home/konrad/qemu/memory.c:591:73: error: macro "access" passed 7 arguments, \
        but takes just 2
                            (size - access_size - i) * 8, access_mask, attrs);
                                                                            ^

Signed-off-by: KONRAD Frederic <frederic.konrad@adacore.com>
Message-Id: <1505988260-8483-1-git-send-email-frederic.konrad@adacore.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21kvm: drop wrong assertion creating problems with pflashDavid Hildenbrand
pflash toggles mr->romd_mode, so this assert does not always hold. 1) A device was added with !mr->romd_mode, therefore effectively not creating a kvm slot as we want to trap every access (add = false). 2) mr->romd_mode was toggled on before removing it. There is now actually no slot to remove and the assert is wrong. So let's just drop the assert. Reported-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170920145025.19403-1-david@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-21virtio-serial: add enable_backend callbackPavel Butsykin
We should guarantee that RAM will not be modified while the VM is in a stopped state, otherwise it can lead to negative consequences during post-copy migration. In the RUN_STATE_FINISH_MIGRATE step, it's expected that RAM on the source side will not be modified, as this could lead to inconsistent VM state on the destination side. Also, RAM access during postcopy-ram migration with the release-ram capability enabled can lead to sad consequences. Let's add an enable_backend() callback to avoid undesirable virtqueue changes in the guest memory. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Message-Id: <20170919120733.22020-1-pbutsykin@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-20Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into stagingPeter Maydell
These patches fix regressions in 2.10

# gpg: Signature made Wed 20 Sep 2017 07:51:07 BST
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg:                 aka "Greg Kurz <groug@free.fr>"
# gpg:                 aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg:                 aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg:                 aka "[jpeg image of size 3330]"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  9pfs: check the size of transport buffer before marshaling
  9pfs: fix name_to_path assertion in v9fs_complete_rename()
  9pfs: fix readdir() for 9p2000.u

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-20Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into stagingPeter Maydell
Machine/CPU/NUMA queue, 2017-09-19

# gpg: Signature made Tue 19 Sep 2017 21:17:01 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  MAINTAINERS: Update git URLs for my trees
  hw/acpi-build: Fix SRAT memory building in case of node 0 without RAM
  NUMA: Replace MAX_NODES with nb_numa_nodes in for loop
  numa: cpu: calculate/set default node-ids after all -numa CLI options are parsed
  arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly
  pc: use generic cpu_model parsing
  vl.c: convert cpu_model to cpu type and set of global properties before machine_init()
  cpu: make cpu_generic_init() abort QEMU on error
  qom: cpus: split cpu_generic_init() on feature parsing and cpu creation parts
  hostmem-file: Add "discard-data" option
  osdep: Define QEMU_MADV_REMOVE
  vl: Clean up user-creatable objects when exiting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-209pfs: check the size of transport buffer before marshalingJan Dakinevich
v9fs_do_readdir_with_stat() should check for a maximum buffer size before attempting to marshal the gathered data. Otherwise, the buffers are assumed to be misconfigured and the transport is marked as broken. The patch brings v9fs_do_readdir_with_stat() into conformity with the behavior of v9fs_do_readdir(). Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit 8d37de41cab1 # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-209pfs: fix name_to_path assertion in v9fs_complete_rename()Jan Dakinevich
The third parameter of v9fs_co_name_to_path() must not contain a `/' character. The issue is most likely related to the 9p2000.u protocol only. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> [groug, regression caused by commit f57f5878578a # 2.10] Signed-off-by: Greg Kurz <groug@kaod.org>
2017-09-209pfs: fix readdir() for 9p2000.uJan Dakinevich
If the client is using 9p2000.u, the following occurs:

    $ cd ${virtfs_shared_dir}
    $ mkdir -p a/b/c
    $ ls a/b
    ls: cannot access 'a/b/a': No such file or directory
    ls: cannot access 'a/b/b': No such file or directory
    a  b  c

instead of the expected:

    $ ls a/b
    c

This is a regression introduced by commit f57f5878578a; local_name_to_path() now resolves ".." and "." in paths, and v9fs_do_readdir_with_stat()->stat_to_v9stat() then copies the basename of the resulting path to the response. With the example above, this means that "." and ".." are turned into "b" and "a" respectively...

stat_to_v9stat() currently assumes it is passed a full canonicalized path and uses it to do two different things:

1) to pass it to v9fs_co_readlink() in case the file is a symbolic link
2) to set the name field of the V9fsStat structure to the basename part of the given path

It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat().

v9fs_stat() really needs 1) and 2) to be performed since it starts with the full canonicalized path stored in the fid. It is different for v9fs_do_readdir_with_stat() though because the name we want to put into the V9fsStat structure is the d_name field of the dirent actually (ie, we want to keep the "." and ".." special names). So, we only need 1) in this case.

This patch hence adds a basename argument to stat_to_v9stat(), to be used to set the name field of the V9fsStat structure, and moves the basename logic to v9fs_stat().

Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
(groug, renamed old name argument to path and updated changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
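The difference between the two callers can be made concrete with a small sketch. It does not reproduce the 9pfs code: `DemoStat`, `demo_basename` and `demo_fill_name` are hypothetical names, and the only point is that the name written into the reply comes from an explicit argument rather than being derived from the resolved path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reply structure holding only the entry name. */
typedef struct DemoStat {
    char name[64];
} DemoStat;

/* Return the part of path after the last '/', or path itself if none. */
static const char *demo_basename(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}

/* The caller decides which name goes into the reply. */
static void demo_fill_name(DemoStat *st, const char *basename)
{
    assert(strchr(basename, '/') == NULL);
    snprintf(st->name, sizeof(st->name), "%s", basename);
}
```

A stat-style caller would pass `demo_basename(full_path)`, while a readdir-style caller would pass the directory entry's own name, so "." stays "." even though the resolved path of "a/b/." is "a/b" (whose basename is "b").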
2017-09-19MAINTAINERS: Update git URLs for my treesEduardo Habkost
List the branches where I queue patches for Machine Core, NUMA, Memory Backends, and X86. Update the NUMA section to list the "machine-next" branch instead of "numa". Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170901153928.17058-1-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19hw/acpi-build: Fix SRAT memory building in case of node 0 without RAMEduardo Habkost
Currently, using the first node without memory on the machine makes QEMU unhappy. With this example command line:

    ... \
    -m 1024M,slots=4,maxmem=32G \
    -numa node,nodeid=0 \
    -numa node,mem=1024M,nodeid=1 \
    -numa node,nodeid=2 \
    -numa node,nodeid=3 \

the guest reports "No NUMA configuration found" and the NUMA topology is wrong.

This is because when QEMU builds the ACPI SRAT, it regards node 0 as the default node to deal with the memory hole (640K-1M). This means that node 0 must have some memory (>1M), but it can actually have no memory.

Fix this problem by cutting out the 640K hole in the same way the PCI 4G hole is cut out.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Message-Id: <1504231805-30957-2-git-send-email-douly.fnst@cn.fujitsu.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19NUMA: Replace MAX_NODES with nb_numa_nodes in for loopDou Liyang
In QEMU, the number of NUMA nodes is determined by parse_numa_opts(). Then, QEMU uses it for iteration, for example:

    for (i = 0; i < nb_numa_nodes; i++)

However, memory_region_allocate_system_memory() uses MAX_NODES instead of nb_numa_nodes. So, replace MAX_NODES with nb_numa_nodes to keep the code consistent and reduce the number of loop iterations.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Message-Id: <1503387936-3483-1-git-send-email-douly.fnst@cn.fujitsu.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19numa: cpu: calculate/set default node-ids after all -numa CLI options are parsedIgor Mammedov
Calculating default node-ids for CPUs in possible_cpu_arch_ids() is rather fragile, since the defaults calculation uses nb_numa_nodes but the callback might be called early, before all -numa CLI options are parsed, which would lead to CPUs being assigned only up to the value of nb_numa_nodes at the time possible_cpu_arch_ids() is called. The issue was introduced by (7c88e65 numa: mirror cpu to node mapping in MachineState::possible_cpus) and, for example, the CLI: -smp 4 -numa node,cpus=0 -numa node would set props.node-id in the possible_cpus array for every non-explicitly mapped CPU to the first node. The issue is visible neither to the guest nor to the mgmt interface because 1) implicitly mapped CPUs are forced to the first node in case of partial mapping 2) in case of default mapping possible_cpu_arch_ids() is called after all -numa options are parsed (resulting in correct mapping). However, it's fragile to rely on late execution of possible_cpu_arch_ids(), therefore add a machine specific callback that returns the node-id for a CPU and use it to calculate/set defaults at machine_numa_finish_init() time, when all -numa options are parsed. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496314408-163972-1-git-send-email-imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-09-19Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170919-v2' into stagingPeter Maydell
Assorted s390x patches:
- introduce virtio-gpu-ccw, with virtio-gpu endian fixes
- lots of cleanup in the s390x code
- make device_add work for s390x cpus
- enable seccomp on s390x
- an ivshmem endian fix
- set the reserved DHCP client architecture id for netboot
- fixes in the css and pci support

# gpg: Signature made Tue 19 Sep 2017 17:39:45 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20170919-v2: (38 commits)
  MAINTAINERS/s390x: add terminal3270.c
  virtio-ccw: Create a virtio gpu device for the ccw bus
  virtio-gpu: Handle endian conversion
  s390x/ccw: create s390 phb for compat reasons as well
  configure: Allow --enable-seccomp on s390x, too
  virtio-ccw: remove stale comments on endianness
  s390x: allow CPU hotplug in random core-id order
  s390x: generate sclp cpu information from possible_cpus
  s390x: get rid of cpu_s390x_create()
  s390x: get rid of cpu_states and use possible_cpus instead
  s390x: implement query-hotpluggable-cpus
  s390x: CPU hot unplug via device_del cannot work for now
  s390x: allow cpu hotplug via device_add
  s390x: print CPU definitions in sorted order
  target/s390x: rename next_cpu_id to next_core_id
  target/s390x: use "core-id" for cpu number/address/id handling
  target/s390x: set cpu->id for linux user when realizing
  s390x: allow only 1 CPU with TCG
  target/s390x: use program_interrupt() in per_check_exception()
  target/s390x: use trigger_pgm_exception() in s390_cpu_handle_mmu_fault()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-09-19MAINTAINERS/s390x: add terminal3270.cChristian Borntraeger
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20170918130455.144262-1-borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19virtio-ccw: Create a virtio gpu device for the ccw busFarhan Ali
Wire up the virtio-gpu device for the CCW bus. The virtio-gpu is a virtio-1 device, so disable revision 0. Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <6c53f939cf2d64b66d2a6878b29c9bf3820f3d5b.1505485574.git.alifm@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19virtio-gpu: Handle endian conversionFarhan Ali
Virtio GPU code currently only supports little endian format, and so using the Virtio GPU device on a big endian machine does not work. Let's fix it by supporting the correct host cpu byte order. Signed-off-by: Farhan Ali <alifm@linux.vnet.ibm.com> Message-Id: <dc748e15f36db808f90b4f2393bc29ba7556a9f6.1505485574.git.alifm@linux.vnet.ibm.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
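The kind of conversion involved can be sketched as below. This is a generic illustration, not the patch itself: the `DemoCtrlHdr` layout and the helper names are assumptions made for this example. Fields that travel in little endian format are decoded byte by byte, so the result is the same on little endian and big endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little endian value, independent of host byte order. */
static uint32_t demo_load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Encode a host-order 32-bit value as little endian bytes. */
static void demo_store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical command header, held in host byte order once decoded. */
typedef struct DemoCtrlHdr {
    uint32_t type;
    uint32_t flags;
} DemoCtrlHdr;

/* Convert a header from its wire (little endian) form to host order. */
static DemoCtrlHdr demo_hdr_from_wire(const uint8_t *buf, size_t len)
{
    DemoCtrlHdr hdr;

    assert(len >= 8);
    hdr.type = demo_load_le32(buf);
    hdr.flags = demo_load_le32(buf + 4);
    return hdr;
}
```

Casting the wire buffer directly to a struct only happens to work when the host is little endian; decoding each field explicitly, as above, is what makes the same code correct on a big endian host.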
2017-09-19s390x/ccw: create s390 phb for compat reasons as wellCornelia Huck
d32bd032d8 ("s390x/ccw: create s390 phb conditionally") made registering the s390 pci host bridge conditional on presence of the zpci facility bit. Sadly, that breaks migration from machines that did not use the cpu model (2.7 and previous). Create the s390 phb for pre-cpu model machines as well: We can tweak s390_has_feat() to always indicate the zpci facility bit when no cpu model is available (on 2.7 and previous compat machines). Fixes: d32bd032d8 ("s390x/ccw: create s390 phb conditionally") Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19configure: Allow --enable-seccomp on s390x, tooThomas Huth
libseccomp supports s390x since version 2.3.0, and I was able to start a VM with "-sandbox on" without any obvious problems by using this patch, so it should be safe to allow --enable-seccomp on s390x nowadays, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <1505385363-27717-1-git-send-email-thuth@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Eduardo Otubo <otubo@redhat.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19virtio-ccw: remove stale comments on endiannessHalil Pasic
We have two stale comments suggesting one should think about virtio config space endianness a bit longer. We have just done that, and came to the conclusion we are fine as is: it's the responsibility of the virtio device and not of the transport (and that is how it works now). Putting the responsibility into the transport isn't even possible, because the transport would have to know about the config space layout of each device. Let us remove the stale comments. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20170914105535.47941-1-pasic@linux.vnet.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19s390x: allow CPU hotplug in random core-id orderDavid Hildenbrand
SCLP correctly indicates the core-id, aka CPU address, for each available CPU. As the core-id corresponds to cpu_index, a newly created kvm vcpu also gets assigned this core-id as its vcpu id. So SIGP in the kernel works correctly (it uses the vcpu id to look up the correct CPU). So there should be nothing hindering us from hotplugging CPUs in random core-id order. This now makes sure that the output from "query-hotpluggable-cpus" is completely true. Until now, a specific order was implicit. Performance-wise, hotplugging CPUs in non-sequential order might not be the best thing to do, as VCPU lookup inside KVM might be a little slower. But that doesn't hinder us from supporting it. next_core_id is now used by linux user only. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170913132417.24384-23-david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19s390x: generate sclp cpu information from possible_cpusDavid Hildenbrand
This is the first step to allow hot plugging of CPUs in a non-sequential order. Whether a cpu is available ("plugged") can be decided directly by looking at the cpu state pointer. This makes sure that only cpus really attached to the machine are reported. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170913132417.24384-22-david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19s390x: get rid of cpu_s390x_create()David Hildenbrand
Now that there is only one user of cpu_s390x_create() left, make cpu creation look like on x86.

- Perform the model/properties split and checks in s390_init_cpus()
- Parse features only once without having to remember if already parsed
- Pass only the typename to s390x_new_cpu()
- Use the typename of an existing CPU for hotplug via cpu-add

Acked-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170913132417.24384-21-david@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-09-19s390x: get rid of cpu_states and use possible_cpus insteadDavid Hildenbrand
Now that we have possible_cpus, we can get rid of the global variable and rewrite s390_cpu_addr2state() to use it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170913132417.24384-20-david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
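A lookup of this shape can be sketched as follows. It is only an illustration under stated assumptions: `DemoCpu`, `DemoSlot`, `DemoCpuList` and the function names are hypothetical and do not mirror QEMU's types. The table has one slot per possible core-id, and a NULL pointer in a slot means that the CPU is not plugged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state. */
typedef struct DemoCpu {
    uint32_t core_id;
} DemoCpu;

/* One slot per possible core-id; cpu == NULL means "not plugged". */
typedef struct DemoSlot {
    DemoCpu *cpu;
} DemoSlot;

typedef struct DemoCpuList {
    size_t len;
    DemoSlot *slots;
} DemoCpuList;

/* Map a CPU address to its state, or NULL if out of range or unplugged. */
static DemoCpu *demo_addr2state(const DemoCpuList *list, uint16_t addr)
{
    if (addr >= list->len) {
        return NULL;
    }
    return list->slots[addr].cpu;
}

/* Plug a CPU into the slot that matches its core-id, in any order. */
static void demo_plug(DemoCpuList *list, DemoCpu *cpu)
{
    assert(cpu->core_id < list->len);
    assert(list->slots[cpu->core_id].cpu == NULL);
    list->slots[cpu->core_id].cpu = cpu;
}

/* Count the CPUs that are actually attached. */
static size_t demo_count_plugged(const DemoCpuList *list)
{
    size_t i, n = 0;

    for (i = 0; i < list->len; i++) {
        if (list->slots[i].cpu) {
            n++;
        }
    }
    return n;
}
```

Because the slot index is the core-id, nothing in the lookup depends on the order in which CPUs were plugged, and no separate global array of CPU states is needed.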