aboutsummaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2021-06-04Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into ↵Peter Maydell
staging # gpg: Signature made Fri 04 Jun 2021 08:26:16 BST # gpg: using RSA key EF04965B398D6211 # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211 * remotes/jasowang/tags/net-pull-request: MAINTAINERS: Added eBPF maintainers information. docs: Added eBPF documentation. virtio-net: Added eBPF RSS to virtio-net. ebpf: Added eBPF RSS loader. ebpf: Added eBPF RSS program. net: Added SetSteeringEBPF method for NetClientState. net/tap: Added TUNSETSTEERINGEBPF code. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-04ebpf: Added eBPF RSS program.Andrew Melnychenko
RSS program and Makefile to build it. The bpftool used to generate '.h' file. The data in that file may be loaded by libbpf. EBPF compilation is not required for building qemu. You can use Makefile if you need to regenerate rss.bpf.skeleton.h. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-05-26tools/virtiofsd/fuse_opt.c: Replaced a malloc with GLib's g_try_mallocMahmoud Mandour
Replaced a malloc() call and its respective free() with GLib's g_try_malloc() and g_free() calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210314032324.45142-8-ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26tools/virtiofsd/buffer.c: replaced a calloc call with GLib's g_try_new0Mahmoud Mandour
Replaced a call to calloc() and its respective free() call with GLib's g_try_new0() and g_free() calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210314032324.45142-7-ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Set req->reply_sent right after sending replyVivek Goyal
There is no reason to set it in label "err". We should be able to set it right after sending reply. It is easier to read. Also got rid of label "err" because now only thing it was doing was return a code. We can return from the error location itself and no need to first jump to label "err". Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-8-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Check EOF before short readVivek Goyal
In virtio_send_data_iov() we are checking first for short read and then EOF condition. Change the order. Basically check for error and EOF first and last remaining piece is short ready which will lead to retry automatically at the end of while loop. Just that it is little simpler to read to the code. There is no need to call "continue" and also one less call of "len-=ret". Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-7-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Simplify skip byte logicVivek Goyal
We need to skip bytes in two cases. a. Before we start reading into in_sg, we need to skip iov_len bytes in the beginning which typically will have fuse_out_header. b. If preadv() does a short read, then we need to retry preadv() with remainig bytes and skip the bytes preadv() read in short read. For case a, there is no reason that skipping logic be inside the while loop. Move it outside. And only retain logic "b" inside while loop. Also get rid of variable "skip_size". Looks like we can do without it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-6-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: get rid of in_sg_left variableVivek Goyal
in_sg_left seems to be being used primarly for debugging purpose. It is keeping track of how many bytes are left in the scatter list we are reading into. We already have another variable "len" which keeps track how many bytes are left to be read. And in_sg_left is greater than or equal to len. We have already ensured that in the beginning of function. if (in_len < tosend_len) { fuse_log(FUSE_LOG_ERR, "%s: elem %d too small for data len %zd\n", __func__, elem->index, tosend_len); ret = E2BIG; goto err; } So in_sg_left seems like a redundant variable. It probably was useful for debugging when code was being developed. Get rid of it. It helps simplify this function. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-5-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Use iov_discard_front() to skip bytesVivek Goyal
There are places where we need to skip few bytes from front of the iovec array. We have our own custom code for that. Looks like iov_discard_front() can do same thing. So use that helper instead. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-4-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Get rid of unreachable code in readVivek Goyal
pvreadv() can return following. - error - 0 in case of EOF - short read We seem to handle all the cases already. We are retrying read in case of short read. So another check for short read seems like dead code. Get rid of it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-3-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-26virtiofsd: Check for EINTR in preadv() and retryVivek Goyal
We don't seem to check for EINTR and retry. There are other places in code where we check for EINTR. So lets add a check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210518213538.693422-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-13virtiofsd: Fix check of chown()'s return valueGreg Kurz
Otherwise you always get this warning when using --socket-group=users vhost socket failed to set group to users (100) While here, print out the error if chown() fails. Fixes: f6698f2b03b0 ("tools/virtiofsd: add support for --socket-group") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <162040394890.714971.15502455176528384778.stgit@bahia.lan> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-06virtiofsd/fuse_virtio.c: Changed allocations of locals to GLibMahmoud Mandour
Replaced the allocation of local variables from malloc() to GLib allocation functions. In one instance, dropped the usage to an assert after a malloc() call and used g_malloc() instead. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-8-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd/passthrough_ll.c: Changed local allocations to GLib functionsMahmoud Mandour
Changed the allocations of some local variables to GLib's allocation functions, such as g_try_malloc0(), and annotated those variables as g_autofree. Subsequently, I was able to remove the calls to free(). Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-7-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Changed allocations of fv_VuDev & its internals to GLib functionsMahmoud Mandour
Changed the allocations of fv_VuDev structs, VuDev structs, and fv_QueueInfo strcuts from using calloc()/realloc() & free() to using the equivalent functions from GLib. In instances, removed the pair of allocation and assertion for non-NULL checking with a GLib function that aborts on error. Removed NULL-checking for fv_VuDev struct allocation and used a GLib function that crashes on error; namely, g_new0(). This is because allocating one struct should not be a problem on an healthy system. Also following the pattern of aborting-on-null behaviour that is taken with allocating VuDev structs and fv_QueueInfo structs. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-6-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Changed allocation of lo_map_elems to GLib's functionsMahmoud Mandour
Replaced (re)allocation of lo_map_elem structs from realloc() to GLib's g_try_realloc_n() and replaced the respective free() call with a g_free(). Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-5-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Changed allocations of fuse_session to GLib's functionsMahmoud Mandour
Replaced the allocation and deallocation of fuse_session structs from calloc() and free() calls to g_try_new0() and g_free(). Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-4-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Changed allocations of iovec to GLib's functionsMahmoud Mandour
Replaced the calls to malloc()/calloc() and their respective calls to free() of iovec structs with GLib's allocation and deallocation functions and used g_autofree when appropriate. Replaced the allocation of in_sg_cpy to g_new() instead of a call to calloc() and a null-checking assertion. Not g_new0() because the buffer is immediately overwritten using memcpy. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210427181333.148176-1-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Changed allocations of fuse_req to GLib functionsMahmoud Mandour
Replaced the allocation and deallocation of fuse_req structs using calloc()/free() call pairs to a GLib's g_try_new0() and g_free(). Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210420154643.58439-2-ma.mandourr@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-06virtiofsd: Don't assume header layoutDr. David Alan Gilbert
virtiofsd incorrectly assumed a fixed set of header layout in the virt queue; assuming that the fuse and write headers were conveniently separated from the data; the spec doesn't allow us to take that convenience, so fix it up to deal with it the hard way. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210428110100.27757-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-05-06virtiofs: Fixup printf argsDr. David Alan Gilbert
Fixup some fuse_log printf args for 32bit compatibility. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210428110100.27757-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-05-06virtiofsd: Add help for -o xattr-mappingCarlos Venegas
The option is not documented in help. Add small help about the option. Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com> Message-Id: <20210414201207.3612432-3-jose.carlos.venegas.munoz@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
2021-05-06virtiofsd: Allow use "-o xattrmap" without "-o xattr"Carlos Venegas
When -o xattrmap is used, it will not work unless xattr is enabled. This patch enables xattr when -o xattrmap is used. Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com> Message-Id: <20210414201207.3612432-2-jose.carlos.venegas.munoz@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
2021-05-06virtiofsd: Fix side-effect in assert()Greg Kurz
It is bad practice to put an expression with a side-effect in assert() because the side-effect won't happen if the code is compiled with -DNDEBUG. Use an intermediate variable. Consolidate this in an macro to have proper line numbers when the assertion is hit. virtiofsd: ../../tools/virtiofsd/passthrough_ll.c:2797: lo_getxattr: Assertion `fchdir_res == 0' failed. Aborted 2796 /* fchdir should not fail here */ =>2797 FCHDIR_NOFAIL(lo->proc_self_fd); 2798 ret = getxattr(procname, name, value, size); 2799 FCHDIR_NOFAIL(lo->root.fd); Fixes: bdfd66788349 ("virtiofsd: Fix xattr operations") Cc: misono.tomohiro@jp.fujitsu.com Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210409100627.451573-1-groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2021-04-06virtiofsd: Fix security.capability comparisonDr. David Alan Gilbert
My security fix for the security.capability remap has a silly early segfault in a simple case where there is an xattrmapping but it doesn't remap the security.capability. Fixes: e586edcb41054 ("virtiofs: drop remapped security.capability xattr as needed") Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210401145845.78445-1-dgilbert@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-24tools/virtiofsd: include --socket-group in helpAlex Bennée
I confused myself wandering if this had been merged by looking at the help output. It seems fuse_opt doesn't automagically add to help output so lets do it now. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Updates: f6698f2b03 ("tools/virtiofsd: add support for --socket-group") Message-Id: <20210323165308.15244-5-alex.bennee@linaro.org>
2021-03-15virtiofsd: Convert some functions to return boolGreg Kurz
Both currently only return 0 or 1. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-3-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15virtiofsd: Don't allow empty paths in lookup_name()Greg Kurz
When passed an empty filename, lookup_name() returns the inode of the parent directory, unless the parent is the root in which case the st_dev doesn't match and lo_find() returns NULL. This is because lookup_name() passes AT_EMPTY_PATH down to fstatat() or statx(). This behavior doesn't quite make sense because users of lookup_name() then pass the name to unlinkat(), renameat() or renameat2(), all of which will always fail on empty names. Drop AT_EMPTY_PATH from the flags in lookup_name() so that it has the consistent behavior of "returning an existing child inode or NULL" for all directories. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-2-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15virtiofsd: Don't allow empty filenamesGreg Kurz
POSIX.1-2017 clearly stipulates that empty filenames aren't allowed ([1] and [2]). Since virtiofsd is supposed to mirror the host file system hierarchy and the host can be assumed to be linux, we don't really expect clients to pass requests with an empty path in it. If they do so anyway, this would eventually cause an error when trying to create/lookup the actual inode on the underlying POSIX filesystem. But this could still confuse some code that wouldn't be ready to cope with this. Filter out empty names coming from the client at the top level, so that the rest doesn't have to care about it. This is done everywhere we already call is_safe_path_component(), but in a separate helper since the usual error for empty path names is ENOENT instead of EINVAL. [1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_170 [2] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_13 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210312141003.819108-4-groug@kaod.org> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15virtiofsd: Add qemu version and copyright infoVivek Goyal
Option "-V" currently displays the fuse protocol version virtiofsd is using. For example, I see this. $ ./virtiofsd -V "using FUSE kernel interface version 7.33" People also want to know software version of virtiofsd so that they can figure out if a certain fix is part of currently running virtiofsd or not. Eric Ernst ran into this issue. David Gilbert thinks that it probably is best that we simply carry the qemu version and display that information given we are part of qemu tree. So this patch enhances version information and also adds qemu version and copyright info. Not sure if copyright information is supposed to be displayed along with version info. Given qemu-storage-daemon and other utilities are doing it, so I continued with same pattern. This is how now output looks like. $ ./virtiofsd -V virtiofsd version 5.2.50 (v5.2.0-2357-gcbcf09872a-dirty) Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers using FUSE kernel interface version 7.33 Reported-by: Eric Ernst <eric.g.ernst@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210303195339.GB3793@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15virtiofsd: Release vu_dispatch_lock when stopping queueGreg Kurz
QEMU can stop a virtqueue by sending a VHOST_USER_GET_VRING_BASE request to virtiofsd. As with all other vhost-user protocol messages, the thread that runs the main event loop in virtiofsd takes the vu_dispatch lock in write mode. This ensures that no other thread can access virtqueues or memory tables at the same time. In the case of VHOST_USER_GET_VRING_BASE, the main thread basically notifies the queue thread that it should terminate and waits for its termination: main() virtio_loop() vu_dispatch_wrlock() vu_dispatch() vu_process_message() vu_get_vring_base_exec() fv_queue_cleanup_thread() pthread_join() Unfortunately, the queue thread ends up calling virtio_send_msg() at some point, which itself needs to grab the lock: fv_queue_thread() g_list_foreach() fv_queue_worker() fuse_session_process_buf_int() do_release() lo_release() fuse_reply_err() send_reply() send_reply_iov() fuse_send_reply_iov_nofree() fuse_send_msg() virtio_send_msg() vu_dispatch_rdlock() <-- Deadlock ! Simply have the main thread to release the lock before going to sleep and take it back afterwards. A very similar patch was already sent by Vivek Goyal sometime back: https://listman.redhat.com/archives/virtio-fs/2021-January/msg00073.html The only difference here is that this done in fv_queue_set_started() because fv_queue_cleanup_thread() can also be called from virtio_loop() without the lock being held. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210312092212.782255-8-groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-04virtiofs: drop remapped security.capability xattr as neededDr. David Alan Gilbert
On Linux, the 'security.capability' xattr holds a set of capabilities that can change when an executable is run, giving a limited form of privilege escalation to those programs that the writer of the file deemed worthy. Any write causes the 'security.capability' xattr to be dropped, stopping anyone from gaining privilege by modifying a blessed file. Fuse relies on the daemon to do this dropping, and in turn the daemon relies on the host kernel to drop the xattr for it. However, with the addition of -o xattrmap, the xattr that the guest stores its capabilities in is now not the same as the one that the host kernel automatically clears. Where the mapping changes 'security.capability', explicitly clear the remapped name to preserve the same behaviour. This bug is assigned CVE-2021-20263. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
2021-02-16virtiofsd: Do not use a thread pool by defaultVivek Goyal
Currently we created a thread pool (With 64 max threads per pool) for each virtqueue. We hoped that this will provide us with better scalability and performance. But in practice, we are getting better numbers in most of the cases when we don't create a thread pool at all and a single thread per virtqueue receives the request and processes it. Hence, I am proposing that we switch to no thread pool by default (equivalent of --thread-pool-size=0). This will provide out of box better performance to most of the users. In fact other users have confirmed that not using a thread pool gives them better numbers. So why not use this as default. It can be changed when somebody can fix the issues with thread pool performance. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20210210182744.27324-2-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2Vivek Goyal
This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2". By default this is enabled as long as client supports it Enabling this option helps with performance in write path. Without this option, currently every write is first preceeded with a getxattr() operation to find out if security.capability is set. (Write is supposed to clear security.capability). With this option enabled, server is signing up for clearing security.capability on every WRITE and also clearing suid/sgid subject to certain rules. This gets rid of extra getxattr() call for every WRITE and improves performance. This is true when virtiofsd is run with option -o xattr. What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server implementation. It needs to adhere to following rules. Thanks to Miklos for this summary. - clear "security.capability" on write, truncate and chown unconditionally - clear suid/sgid in case of following. Note, sgid is cleared only if group executable bit is set. o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set. o setattr has FATTR_UID or FATTR_GID o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set. o write has FUSE_WRITE_KILL_SUIDGID >From Linux VFS client perspective, here are the requirements. - caps are always cleared on chown/write/truncate - suid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID. - sgid is always cleared on chown, while for truncate/write it is cleared only if caller does not have CAP_FSETID as well as file has group execute permission. virtiofsd implementation has not changed much to adhere to above ruls. And reason being that current assumption is that we are running on Linux and on top of filesystems like ext4/xfs which already follow above rules. On write, truncate, chown, seucurity.capability is cleared. And virtiofsd drops CAP_FSETID if need be and that will lead to clearing of suid/sgid. But if virtiofsd is running on top a filesystem which breaks above assumptions, then it will have to take extra actions to emulate above. That's a TODO for later when need arises. Note: create normally is supposed to be called only when file does not exist. So generally there should not be any question of clearing setuid/setgid. But it is possible that after client checks that file is not present, some other client creates file on server and this race can trigger sending FUSE_CREATE. In that case, if O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID is also set. v3: - Resolved conflicts due to lo_inode_open() changes. - Moved capability code in lo_do_open() so that both lo_open() and lo_create() can benefit from common code. - Dropped changes to kernel headers as these are part of qemu already. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210208224024.43555-3-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16virtiofsd: Save error code early at the failure callsiteVivek Goyal
Change error code handling slightly in lo_setattr(). Right now we seem to jump to out_err and assume that "errno" is valid and use that to send reply. But if caller has to do some other operations before jumping to out_err, then it does the dance of first saving errno to saverr and the restore errno before jumping to out_err. This makes it more confusing. I am about to make more changes where caller will have to do some work after error before jumping to out_err. I found it easier to change the convention a bit. That is caller saves error in "saverr" before jumping to out_err. And out_err uses "saverr" to send error back and does not rely on "errno" having actual error. v3: Resolved conflicts in lo_setattr() due to lo_inode_open() changes. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210208224024.43555-2-vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16tools/virtiofsd: Replace the word 'whitelist'Philippe Mathieu-Daudé
Follow the inclusive terminology from the "Conscious Language in your Open Source Projects" guidelines [*] and replace the words "whitelist" appropriately. [*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210205171817.2108907-3-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16virtiofsd: vu_dispatch locking should never failGreg Kurz
pthread_rwlock_rdlock() and pthread_rwlock_wrlock() can fail if a deadlock condition is detected or the current thread already owns the lock. They can also fail, like pthread_rwlock_unlock(), if the mutex wasn't properly initialized. None of these are ever expected to happen with fv_VuDev::vu_dispatch_rwlock. Some users already check the return value and assert, some others don't. Introduce rdlock/wrlock/unlock wrappers that just do the former and use them everywhere for improved consistency and robustness. This is just cleanup. It doesn't fix any actual issue. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210203182434.93870-1-groug@kaod.org> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-16virtiofsd: Allow to build it without the toolsWainer dos Santos Moschetta
This changed the Meson build script to allow virtiofsd be built even though the tools build is disabled, thus honoring the --enable-virtiofsd option. Fixes: cece116c939d219070b250338439c2d16f94e3da (configure: add option for virtiofsd) Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Message-Id: <20210201211456.1133364-2-wainersm@redhat.com> Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04virtiofsd: Add restart_syscall to the seccomp whitelistGreg Kurz
This is how linux restarts some system calls after SIGSTOP/SIGCONT. This is needed to avoid virtiofsd termination when resuming execution under GDB for example. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210201193305.136390-1-groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04virtiofsd: Add _llseek to the seccomp whitelistGreg Kurz
This is how glibc implements lseek(2) on POWER. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1917692 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210121171540.1449777-1-groug@kaod.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04virtiofsd: prevent opening of special files (CVE-2020-35517)Stefan Hajnoczi
A well-behaved FUSE client does not attempt to open special files with FUSE_OPEN because they are handled on the client side (e.g. device nodes are handled by client-side device drivers). The check to prevent virtiofsd from opening special files is missing in a few cases, most notably FUSE_OPEN. A malicious client can cause virtiofsd to open a device node, potentially allowing the guest to escape. This can be exploited by a modified guest device driver. It is not exploitable from guest userspace since the guest kernel will handle special files inside the guest instead of sending FUSE requests. This patch fixes this issue by introducing the lo_inode_open() function to check the file type before opening it. This is a short-term solution because it does not prevent a compromised virtiofsd process from opening device nodes on the host. Restructure lo_create() to try O_CREAT | O_EXCL first. Note that O_CREAT | O_EXCL does not follow symlinks, so O_NOFOLLOW masking is not necessary here. If the file exists and the user did not specify O_EXCL, open it via lo_do_open(). Reported-by: Alex Xu <alex@alxu.ca> Fixes: CVE-2020-35517 Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210204150208.367837-4-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04virtiofsd: optionally return inode pointer from lo_do_lookup()Stefan Hajnoczi
lo_do_lookup() finds an existing inode or allocates a new one. It increments nlookup so that the inode stays alive until the client releases it. Existing callers don't need the struct lo_inode so the function doesn't return it. Extend the function to optionally return the inode. The next commit will need it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20210204150208.367837-3-stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-04virtiofsd: extract lo_do_open() from lo_open()Stefan Hajnoczi
Both lo_open() and lo_create() have similar code to open a file. Extract a common lo_do_open() function from lo_open() that will be used by lo_create() in a later commit. Since lo_do_open() does not otherwise need fuse_req_t req, convert lo_add_fd_mapping() to use struct lo_data *lo instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210204150208.367837-2-stefanha@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-01-06cap_ng: convert to mesonPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-06seccomp: convert to mesonPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-18virtiofsd: Remove useless code about send_notify_iovAlex Chen
The 'ch' will be NULL in the following stack: send_notify_iov()->fuse_send_msg()->virtio_send_msg(), and this may lead to NULL pointer dereferenced in virtio_send_msg(). But send_notify_iov() was never called, so remove the useless code about send_notify_iov() to fix this problem. Signed-off-by: Alex Chen <alex.chen@huawei.com> Message-Id: <20201214121615.29967-1-alex.chen@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18virtiofsd: update FUSE_FORGET comment on "lo_inode.nlookup"Laszlo Ersek
Miklos confirms it's *only* the FUSE_FORGET request that the client can use for decrementing "lo_inode.nlookup". Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Fixes: 1222f015558fc34cea02aa3a5a92de608c82cec8 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Message-Id: <20201208073936.8629-1-lersek@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18virtiofsd: Check file type in lo_flush()Vivek Goyal
Currently lo_flush() is written in such a way that it expects to receive a FLUSH requests on a regular file (and not directories). For example, we call lo_fi_fd() which searches lo->fd_map. If we open directories using opendir(), we keep don't keep track of these in lo->fd_map instead we keep them in lo->dir_map. So we expect lo_flush() to be called on regular files only. Even linux fuse client calls FLUSH only for regular files and not directories. So put a check for filetype and return EBADF if lo_flush() is called on a non-regular file. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201211142544.GB3285@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18virtiofsd: Disable posix_lock hash table if remote locks are not enabledVivek Goyal
If remote posix locks are not enabled (lo->posix_lock == false), then disable code paths taken to initialize inode->posix_lock hash table and corresponding destruction and search etc. lo_getlk() and lo_setlk() have been modified to return ENOSYS if daemon does not support posix lock but client still sends a lock/unlock request. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201207183021.22752-3-vgoyal@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-18virtiofsd: Set up posix_lock hash table for root inodeVivek Goyal
We setup per inode hash table ->posix_lock to support remote posix locks. But we forgot to initialize this table for root inode. Laszlo managed to trigger an issue where he sent a FUSE_FLUSH request for root inode and lo_flush() found inode with inode->posix_lock NULL and accessing this table crashed virtiofsd. May be we can get rid of initializing this hash table for directory objects completely. But that optimization is for another day. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Message-Id: <20201207195539.GB3107@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>