aboutsummaryrefslogtreecommitdiff
path: root/util/oslib-posix.c
AgeCommit message (Collapse)Author
2015-12-02util/mmap-alloc: fix hugetlb support on ppc64Michael S. Tsirkin
Since commit 8561c9244ddf1122d "exec: allocate PROT_NONE pages on top of RAM", it is no longer possible to back guest RAM with hugepages on ppc64 hosts: mmap(NULL, 285212672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff57000000 mmap(0x3fff57000000, 268435456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 19, 0) = -1 EBUSY (Device or resource busy) This is because on ppc64, Linux fixes a page size for a virtual address at mmap time, so we can't switch a range of memory from anonymous small pages to hugetlbs with MAP_FIXED. See commit d0f13e3c20b6fb73ccb467bdca97fa7cf5a574cd ("[POWERPC] Introduce address space "slices"") in Linux history for the details. Detect this and create the PROT_NONE mapping using the same fd. Naturally, this makes the guard page bigger with hugetlbfs. Based on patch by Greg Kurz. Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-10-22Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
vhost, pc, virtio features, fixes, cleanups New features: VT-d support for devices behind a bridge vhost-user migration support Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 22 Oct 2015 12:39:19 BST using RSA key ID D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" * remotes/mst/tags/for_upstream: (37 commits) hw/isa/lpc_ich9: inject the SMI on the VCPU that is writing to APM_CNT i386: keep cpu_model field in MachineState uptodate vhost: set the correct queue index in case of migration with multiqueue piix: fix resource leak reported by Coverity seccomp: add memfd_create to whitelist vhost-user-test: check ownership during migration vhost-user-test: add live-migration test vhost-user-test: learn to tweak various qemu arguments vhost-user-test: wrap server in TestServer struct vhost-user-test: remove useless static check vhost-user-test: move wait_for_fds() out vhost: add migration block if memfd failed vhost-user: use an enum helper for features mask vhost user: add rarp sending after live migration for legacy guest vhost user: add support of live migration net: add trace_vhost_user_event vhost-user: document migration log vhost: use a function for each call vhost-user: add a migration blocker vhost-user: send log shm fd along with log_base ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-10-21exec: factor out duplicate mmap codeMichael S. Tsirkin
Anonymous and file-backed RAM allocation are now almost exactly the same. Reduce code duplication by moving RAM mmap code out of oslib-posix.c and exec.c. Reported-by: Marc-André Lureau <mlureau@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-10-20osdep: add qemu_fork() wrapper for safely handling signalsDaniel P. Berrange
When using regular fork() the child process of course inherits all the parents' signal handlers. If the child then proceeds to close() any open file descriptors, it may break some of those registered signal handlers. The child generally does not want to ever run any of the signal handlers that the parent may have installed in the short time before it exec's. The parent may also have blocked various signals which the child process will want enabled. This introduces a wrapper qemu_fork() that takes care to sanitize signal handling across fork. Before forking it blocks all signals in the parent thread. After fork returns, the parent unblocks the signals and carries on as usual. The child, however, resets all the signal handlers back to their defaults before it unblocks signals. The child process can now exec the binary in a "clean" signal environment. Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
2015-10-01oslib: allocate PROT_NONE pages on top of RAMMichael S. Tsirkin
This inserts a read and write protected page between RAM and QEMU memory. This makes it harder to exploit QEMU bugs resulting from buffer overflows in devices using variants of cpu_physical_memory_map, dma_memory_map etc. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01oslib: rework anonimous RAM allocationMichael S. Tsirkin
At the moment we first allocate RAM, sometimes more than necessary for alignment reasons. We then free the extra RAM. Rework this to avoid the temporary allocation: reserve the range by mapping it with PROT_NONE, then use just the necessary range with MAP_FIXED. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2015-05-22util: allow \n to terminate password inputDaniel P. Berrange
The qemu_read_password() method looks for \r to terminate the reading of the a password. This is what will be seen when reading the password from a TTY. When scripting though, it is useful to be able to send the password via a pipe, in which case we must look for \n to terminate password input. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22util: move read_password method out of qemu-img into osdep/oslibDaniel P. Berrange
The qemu-img.c file has a read_password() method impl that is used to prompt for passwords on the console, with impls for POSIX and Windows. This will be needed by qemu-io.c too, so move it into the QEMU osdep/oslib files where it can be shared without code duplication Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10oslib-posix: Fix compiler warning (-Wclobbered) and simplify the codeStefan Weil
gcc reports this warning with -Wclobbered: util/oslib-posix.c: In function ‘os_mem_prealloc’: util/oslib-posix.c:374:49: error: argument ‘memory’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Fix this and simplify the code by using an existing macro. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-11-23memory: expose alignment used for allocating RAM as MemoryRegion APIIgor Mammedov
introduce memory_region_get_alignment() that returns underlying memory block alignment or 0 if it's not relevant/implemented for backend. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-11-02util: Improve os_mem_prealloc error messageMichal Privoznik
Currently, when the preallocating guest memory process fails, a not so helpful error message is printed out: # virsh start migt10 error: Failed to start domain migt10 error: internal error: process exited while connecting to monitor: os_mem_prealloc: failed to preallocate pages From the error message it's not clear at the first glance where the problem lies. However, changing the error message might give users a clue. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-08-15block: Introduce qemu_try_blockalign()Kevin Wolf
This function returns NULL instead of aborting when an allocation fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-07-10oslib-posix: Fix new compiler error with -WclobberedStefan Weil
Newer versions of gcc report a warning (or an error with -Werror) when compiler option -Wclobbered (or -Wextra) is active: util/oslib-posix.c:372:12: error: variable ‘hpagesize’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] The rewritten code fixes this warning: variable 'hpagesize' is now set and used in a block without any call of sigsetjmp or similar functions. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-19memory: move preallocation code out of exec.cPaolo Bonzini
So that backends can use it. Since we need the page size for efficiency, move code to compute it out of translate-all.c and into util/oslib-win32.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Hu Tao <hutao@cn.fujitsu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-03-13oslib-posix: Fix build on FreeBSDAndreas Färber
Commit 10f5bff622cad71645e22c027b77ac31e51008ef (util: Split out exec_dir from os_find_datadir) moved code from os-posix.c to util/oslib-posix.c but forgot to move a FreeBSD #include alongside, needed for CTL_KERN among others. Cc: Fam Zheng <famz@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andreas Färber <andreas.faerber@web.de> Message-id: 1394717279-23406-1-git-send-email-andreas.faerber@web.de Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-02-20util: Split out exec_dir from os_find_datadirFam Zheng
With this change, main() calls qemu_init_exec_dir and uses argv[0] to init exec_dir. The saved value can be retrieved with qemu_get_exec_dir later. It will be reused by module loading. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-24qemu_memalign: Allow small alignmentsKevin Wolf
The functions used by qemu_memalign() require an alignment that is at least sizeof(void*). Adjust it if it is too small. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-22osdep: add qemu_set_tty_echo()Stefan Hajnoczi
Using stdin with readline.c requires disabling echo and line buffering. Add a portable wrapper to set the terminal attributes under Linux and Windows. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-02util: add socket_set_fast_reuse function which will replace setting SO_REUSEADDRSebastian Ottlik
If a socket is closed it remains in TIME_WAIT state for some time. On operating systems using BSD sockets the endpoint of the socket may not be reused while in this state unless SO_REUSEADDR was set on the socket. On windows on the other hand the default behaviour is to allow reuse (i.e. identical to SO_REUSEADDR on other operating systems) and setting SO_REUSEADDR on a socket allows it to be bound to a endpoint even if the endpoint is already used by another socket independently of the other sockets state. This can even result in undefined behaviour. Many sockets used by QEMU should not block the use of their endpoint after being closed while they are still in TIME_WAIT state. Currently QEMU sets SO_REUSEADDR for such sockets, which can lead to problems on Windows. This patch introduces the function socket_set_fast_reuse that should be used instead of setting SO_REUSEADDR when fast socket reuse is desired and behaves correctly on all operating systems. As a failure of this function can only be caused by bad QEMU internal errors, an assertion handles these situations. The return value is still passed on, to minimize changes in client code and prevent unused variable warnings if NDEBUG is defined. Signed-off-by: Sebastian Ottlik <ottlik@fzi.de> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-09-12exec: Don't abort when we can't allocate guest memoryMarkus Armbruster
We abort() on memory allocation failure. abort() is appropriate for programming errors. Maybe most memory allocation failures are programming errors, maybe not. But guest memory allocation failure isn't, and aborting when the user asks for more memory than we can provide is not nice. exit(1) instead, and do it in just one place, so the error message is consistent. Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Acked-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-id: 1375276272-15988-8-git-send-email-armbru@redhat.com Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
2013-05-30osdep: add qemu_get_local_state_pathname()Laszlo Ersek
This function returns ${prefix}/var/RELATIVE_PATHNAME on POSIX-y systems, and <CSIDL_COMMON_APPDATA>/RELATIVE_PATHNAME on Win32. http://msdn.microsoft.com/en-us/library/bb762494.aspx [...] This folder is used for application data that is not user specific. For example, an application can store a spell-check dictionary, a database of clip art, or a log file in the CSIDL_COMMON_APPDATA folder. [...] Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2013-05-14osdep: introduce qemu_anon_ram_free to free qemu_anon_ram_alloc-ed memoryPaolo Bonzini
We switched from qemu_memalign to mmap() but then we don't modify qemu_vfree() to do a munmap() over free(). Which we cannot do because qemu_vfree() frees memory allocated by qemu_{mem,block}align. Introduce a new function that does the munmap(), luckily the size is available in the RAMBlock. Reported-by: Amos Kong <akong@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Amos Kong <akong@redhat.com> Message-id: 1368454796-14989-3-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-14osdep, kvm: rename low-level RAM allocation functionsPaolo Bonzini
This is preparatory to the introduction of a separate freeing API. Reported-by: Amos Kong <akong@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Amos Kong <akong@redhat.com> Message-id: 1368454796-14989-2-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-04-16migration: initialize RAM to zeroPaolo Bonzini
Using qemu_memalign only leaves the RAM zero by chance, because libc will usually use mmap to satisfy our huge requests. But memory will not be zero when using MALLOC_PERTURB_ with a nonzero value. In the case of incoming migration, this breaks a recently-introduced invariant (commit f1c7279, migration: do not sent zero pages in bulk stage, 2013-03-26). To fix this, use mmap ourselves to get a well-aligned, always zero block for the RAM. Mmap-ed memory is easy to "trim" at the sides. This also removes the need to do something special on valgrind (see commit c2a8238a, Support running QEMU on Valgrind, 2011-10-31), thus effectively reverts that patch. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-id: 1365522223-20153-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-04-02oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()Stefan Hajnoczi
The fcntl(fd, F_SETFL, O_NONBLOCK) flag is not specific to sockets. Rename to qemu_set_nonblock() just like qemu_set_cloexec(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2013-03-05oslib-posix: Align to permit transparent hugepages on ARM LinuxPeter Maydell
ARM Linux (like x86-64 Linux) can use transparent hugepages for KVM if memory blocks are 2MiB aligned; set QEMU_VMALLOC_ALIGN accordingly. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-01-26bsd-user: avoid conflict with qemu_vmallocBlue Swirl
Rename qemu_vmalloc() to bsd_vmalloc(), adjust the only user. Remove #ifdeffery in oslib-posix.c. Tested-by: Andreas Färber <andreas.faerber@web.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-12build: move libqemuutil.a components to util/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>