aboutsummaryrefslogtreecommitdiff
path: root/hw/net
AgeCommit message (Collapse)Author
2016-06-02Add ENET/Gbps Ethernet support to FEC deviceJean-Christophe Dubois
The ENET device (present in i.MX6) is "derived" from FEC and backward compatible with it. This patch adds the necessary support of the added feature in the ENET device to allow Linux to use it (on supported processors). Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: move FEC device to a register array structure.Jean-Christophe Dubois
This is to prepare for the ENET Gb device of the i.MX6. Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: Rename i.MX FEC defines to ENET_XXXJean-Christophe Dubois
Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: reset TX/RX descriptors when FEC is disabled.Jean-Christophe Dubois
According to the FEC chapter of i.MX25 reference manual RX adn TX descriptors are reseted when the FEC device is disabled through ECR. Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: Fix FEC code for ECR register reset value.Jean-Christophe Dubois
According to the FEC chapter of i.MX25 reference manual ECR register is initialized at 0xf0000000 at reset time. We fix the value. Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: Fix FEC code for MDIO address selectionJean-Christophe Dubois
According to the FEC chapter of i.MX25 reference manual When writing to MMFR register, the MDIO device and adress are selected by bit 27 to 23 and bit 22 to 18 respectively. This is a total of 10 bits that need to be used by the Phy chip/address decoding function. This patch fixes the number of bits used from 9 to 10. Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02i.MX: Fix FEC code for MDIO operation selectionJean-Christophe Dubois
According to the FEC chapter of i.MX25 reference manual When writing the MMFR register, bit 29 and 28 select the requested operation. * 10 means read operation with valid MII mgmt frame * 11 means read operation with non compliant MII mgmt frame * 01 means write operation with valid MII mgmt frame * 00 means write operation with non compliant MII mgmt frame So while bit 28 does change beween read/write for valid MII mgmt frame, the mening is inverted for non compliant MII mgmt frame. Bit 29 on the other hand means read/write whatever the type of mgmt frame involved. So this patch change the operation selection from bit 28 to bit 29 as it is more generic. Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02net: Introduce e1000e device emulationDmitry Fleytman
This patch introduces emulation for the Intel 82574 adapter, AKA e1000e. This implementation is derived from the e1000 emulation code, and utilizes the TX/RX packet abstractions that were initially developed for the vmxnet3 device. Although some parts of the introduced code may be shared with e1000, the differences are substantial enough so that the only shared resources for the two devices are the definitions in hw/net/e1000_regs.h. Similarly to vmxnet3, the new device uses virtio headers for task offloads (for backends that support virtio extensions). Usage of virtio headers may be forcibly disabled via a boolean device property "vnet" (which is enabled by default). In such case task offloads will be performed in software, in the same way it is done on backends that do not support virtio headers. The device code is split into two parts: 1. hw/net/e1000e.c: QEMU-specific code for a network device; 2. hw/net/e1000e_core.[hc]: Device emulation according to the spec. The new device name is e1000e. Intel specifications for the 82574 controller are available at: http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf Throughput measurement results (iperf2): Fedora 22 guest, TCP, RX 4 ++------------------------------------------+ | | | X X X X X 3.5 ++ X X X X | | X | | | 3 ++ | G | X | b | | / 2.5 ++ | s | | | | 2 ++ | | | | | 1.5 X+ | | | + + + + + + + + + + + + 1 ++--+---+---+---+---+---+---+---+---+---+---+ 32 64 128 256 512 1 2 4 8 16 32 64 B B B B B KB KB KB KB KB KB KB Buffer size Fedora 22 guest, TCP, TX 18 ++-------------------------------------------+ | X | 16 ++ X X X X X | X | 14 ++ | | | 12 ++ | G | X | b 10 ++ | / | | s 8 ++ | | | 6 ++ X | | | 4 ++ | | X | 2 ++ X | X + + + + + + + + + + + 0 ++--+---+---+---+---+----+---+---+---+---+---+ 32 64 128 256 512 1 2 4 8 16 32 64 B B B B B KB KB KB KB KB KB KB Buffer size Fedora 22 guest, UDP, RX 3 ++------------------------------------------+ | X | | 2.5 ++ | | | | | 2 ++ X | G | | b | | / 1.5 ++ | s | X | | | 1 ++ | | | | X | 0.5 ++ | | X | X + + + + + 0 ++-------+--------+-------+--------+--------+ 32 64 128 256 512 1 B B B B B KB Datagram size Fedora 22 guest, UDP, TX 1 ++------------------------------------------+ | X 0.9 ++ | | | 0.8 ++ | 0.7 ++ | | | G 0.6 ++ | b | | / 0.5 ++ | s | X | 0.4 ++ | | | 0.3 ++ | 0.2 ++ X | | | 0.1 ++ X | X X + + + + 0 ++-------+--------+-------+--------+--------+ 32 64 128 256 512 1 B B B B B KB Datagram size Windows 2012R2 guest, TCP, RX 3.2 ++------------------------------------------+ | X | 3 ++ | | | 2.8 ++ | | | 2.6 ++ X | G | X X X X X b 2.4 ++ X X | / | | s 2.2 ++ | | | 2 ++ | | X X | 1.8 ++ | | | 1.6 X+ | + + + + + + + + + + + + 1.4 ++--+---+---+---+---+---+---+---+---+---+---+ 32 64 128 256 512 1 2 4 8 16 32 64 B B B B B KB KB KB KB KB KB KB Buffer size Windows 2012R2 guest, TCP, TX 14 ++-------------------------------------------+ | | | X X 12 ++ | | | 10 ++ | | | G | | b 8 ++ | / | X | s 6 ++ | | | | | 4 ++ X | | | 2 ++ | | X X X | + X X + + X X + + + + + 0 X+--+---+---+---+---+----+---+---+---+---+---+ 32 64 128 256 512 1 2 4 8 16 32 64 B B B B B KB KB KB KB KB KB KB Buffer size Windows 2012R2 guest, UDP, RX 1.6 ++------------------------------------------X | | 1.4 ++ | | | 1.2 ++ | | X | | | G 1 ++ | b | | / 0.8 ++ | s | | 0.6 ++ X | | | 0.4 ++ | | X | | | 0.2 ++ X | X + + + + + 0 ++-------+--------+-------+--------+--------+ 32 64 128 256 512 1 B B B B B KB Datagram size Windows 2012R2 guest, UDP, TX 0.6 ++------------------------------------------+ | X | | 0.5 ++ | | | | | 0.4 ++ | G | | b | | / 0.3 ++ X | s | | | | 0.2 ++ | | | | X | 0.1 ++ | | X | X X + + + + 0 ++-------+--------+-------+--------+--------+ 32 64 128 256 512 1 B B B B B KB Datagram size Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02e1000: Move out code that will be reused in e1000eDmitry Fleytman
Code that will be shared moved to a separate files. Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02e1000_regs: Add definitions for Intel 82574-specific bitsDmitry Fleytman
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*Dmitry Fleytman
To make this device and network packets abstractions ready for IOMMU. Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02net_pkt: Extend packet abstraction as required by e1000e functionalityDmitry Fleytman
This patch extends the TX/RX packet abstractions with features that will be used by the e1000e device implementation. Changes are: 1. Support iovec lists for RX buffers 2. Deeper RX packets parsing 3. Loopback option for TX packets 4. Extended VLAN headers handling 5. RSS processing for RX packets Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02rtl8139: Move more TCP definitions to common headerDmitry Fleytman
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02net_pkt: Name vmxnet3 packet abstractions more genericDmitry Fleytman
This patch drops "vmx" prefix from packet abstractions names to emphasize the fact they are generic and not tied to any specific network device. These abstractions will be reused by e1000e emulation implementation introduced by following patches so their names need generalization. This patch (except renamed files, adjusted comments and changes in MAINTAINTERS) was produced by: git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g" git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g" git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g" git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g" git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g" git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g" sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02vmxnet3: Use common MAC address tracing macrosDmitry Fleytman
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-06-02vmxnet3: Use generic function for DSN capability definitionDmitry Fleytman
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com> Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-05-27hw/net/spapr_llan: Provide counter with dropped rx frames to the guestThomas Huth
The last 8 bytes of the receive buffer list page (that has been supplied by the guest with the H_REGISTER_LOGICAL_LAN call) contain a counter for frames that have been dropped because there was no suitable receive buffer available. This patch introduces code to use this field to provide the information about dropped rx packets to the guest. There it can be queried with "ethtool -S eth0 | grep rx_no_buffer". Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-27hw/net/spapr_llan: Delay flushing of the RX queue while adding new RX buffersThomas Huth
Currently, the spapr-vlan device is trying to flush the RX queue after each RX buffer that has been added by the guest via the H_ADD_LOGICAL_LAN_BUFFER hypercall. In case the receive buffer pool was empty before, we only pass single packets to the guest this way. This can cause very bad performance if a sender is trying to stream fragmented UDP packets to the guest. For example when using the UDP_STREAM test from netperf with UDP packets that are much bigger than the MTU size, almost all UDP packets are dropped in the guest since the chances are quite high that at least one of the fragments got lost on the way. When flushing the receive queue, it's much better if we'd have a bunch of receive buffers available already, so that fragmented packets can be passed to the guest in one go. To do this, the spapr_vlan_receive() function should return 0 instead of -1 if there are no more receive buffers available, so that receive_disabled = 1 gets temporarily set for the receive queue, and we have to delay the queue flushing at the end of h_add_logical_lan_buffer() a little bit by using a timer, so that the guest gets a chance to add multiple RX buffers before we flush the queue again. This improves the UDP_STREAM test with the spapr-vlan device a lot: Running netserver -p 44444 -L <guestip> -f -D -4 in the guest, and netperf -p 44444 -L <hostip> -H <guestip> -t UDP_STREAM -l 60 -- -m 16384 in the host, I get the following values _without_ this patch: Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 229376 16384 60.00 1738970 0 3798.83 229376 60.00 23 0.05 That "0.05" means that almost all UDP packets got lost/discarded at the receiving side. With this patch applied, the value look much better: Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 229376 16384 60.00 1789104 0 3908.35 229376 60.00 22818 49.85 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-25net: mipsnet: check packet length against bufferPrasad J Pandit
When receiving packets over MIPSnet network device, it uses receive buffer of size 1514 bytes. In case the controller accepts large(MTU) packets, it could lead to memory corruption. Add check to avoid it. Reported by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-05-23hw/net/opencores_eth: Allocating Large sized arrays to heapZhou Jie
open_eth_start_xmit has a huge stack usage of 65536 bytes approx. Moving large arrays to heap to reduce stack usage. Reduce size of a buffer allocated on stack to 0x600 bytes, which is the maximal frame length when HUGEN bit is not set in MODER, only allocate buffer on heap when that is too small. Thus heap is not used in typical use case. Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-23hw/net/opencores_eth: use mii.hMax Filippov
Drop local definitions of MII registers and use constants from mii.h for registers and register bits. No functional changes. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-05-19hw: explicitly include qemu/log.hPaolo Bonzini
Move the inclusion out of hw/hw.h, most files do not need it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19hw: do not use VMSTATE_*TLPaolo Bonzini
Reserve this to CPU state serialization. Luckily, they were only used by sPAPR devices and these are ppc64 only. So there is no change to migration format. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-18Fix some typos found by codespellStefan Weil
Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-04-11net: stellaris_enet: check packet length against receive bufferPrasad J Pandit
When receiving packets over Stellaris ethernet controller, it uses receive buffer of size 2048 bytes. In case the controller accepts large(MTU) packets, it could lead to memory corruption. Add check to avoid it. Reported-by: Oleksandr Bazhaniuk <oleksandr.bazhaniuk@intel.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-id: 1460095428-22698-1-git-send-email-ppandit@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-08Merge remote-tracking branch 'remotes/xtensa/tags/20160408-xtensa' into stagingPeter Maydell
Xtensa-related fixes: - fix networking on xtfpga platform in linux v4.5 by indicating autonegotiation completion in opencores_eth MII BMSR. # gpg: Signature made Thu 07 Apr 2016 23:33:59 BST using RSA key ID F83FA044 # gpg: Good signature from "Max Filippov <max.filippov@cogentembedded.com>" # gpg: aka "Max Filippov <jcmvbkbc@gmail.com>" * remotes/xtensa/tags/20160408-xtensa: opencores_eth: indicate autonegotiation completion Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-04-06rtl8139: using CP_TX_OWN for ownership transferring during txJason Wang
Through CP_TX_OWN and CP_RX_OWN points to the same bit, we'd better use CP_TX_OWN for tx descriptor handling. Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-04-04opencores_eth: indicate autonegotiation completionMax Filippov
Indicate that autonegotiation is complete in the MII BMSR. This fixes networking on xtfpga platform in linux v4.5. Cc: qemu-stable@nongnu.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2016-03-30Revert "e1000: fix hang of win2k12 shutdown with flood ping"Sameeh Jubran
This reverts commit 9596ef7c7b8528bedb240792ea1fb598543ad3c4. This workaround in order to fix endless interrupts is no longer needed because it was superseded by the previous patch (e1000: Fixing interrupt pace). Signed-off-by: Sameeh Jubran <sameeh@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-30e1000: Fixing interrupts pace.Sameeh Jubran
This patch introduces an upper bound for number of interrupts per second. Without this bound an interrupt storm can occur as it has been observed on Windows 10 when disabling the device. According to the SPEC - Intel PCI/PCI-X Family of Gigabit Ethernet Controllers Software Developer's Manual, section 13.4.18 - the Ethernet controller guarantees a maximum observable interrupt rate of 7813 interrupts/sec. If there is no upper bound this could lead to an interrupt storm by e1000 (when mit_delay < 500) causing interrupts to fire at a very high pace. Thus if mit_delay < 500 then the delay should be set to the minimum delay possible which is 500. This can be calculated easily as follows: Interval = 10^9 / (7813 * 256) = 500. Signed-off-by: Sameeh Jubran <sameeh@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-24Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* Log filtering from Alex and Peter * Chardev fix from Marc-André * config.status tweak from David * Header file tweaks from Markus, myself and Veronia (Outreachy candidate) * get_ticks_per_sec() removal from Rutuja (Outreachy candidate) * Coverity fix from myself * PKE implementation from myself, based on rth's XSAVE support # gpg: Signature made Thu 24 Mar 2016 20:15:11 GMT using RSA key ID 78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" * remotes/bonzini/tags/for-upstream: (28 commits) target-i386: implement PKE for TCG config.status: Pass extra parameters char: translate from QIOChannel error to errno exec: fix error handling in file_ram_alloc cputlb: modernise the debug support qemu-log: support simple pid substitution for logs target-arm: dfilter support for in_asm qemu-log: dfilter-ise exec, out_asm, op and opt_op qemu-log: new option -dfilter to limit output qemu-log: Improve the "exec" TB execution logging qemu-log: Avoid function call for disabled qemu_log_mask logging qemu-log: correct help text for -d cpu tcg: pass down TranslationBlock to tcg_code_gen util: move declarations out of qemu-common.h Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND hw: explicitly include qemu-common.h and cpu.h include/crypto: Include qapi-types.h or qemu/bswap.h instead of qemu-common.h isa: Move DMA_transfer_handler from qemu-common.h to hw/isa/isa.h Move ParallelIOArg from qemu-common.h to sysemu/char.h Move QEMU_ALIGN_*() from qemu-common.h to qemu/osdep.h ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Conflicts: scripts/clean-includes
2016-03-24hw/net/spapr_llan: Enable the RX buffer pools by default for new machinesThomas Huth
RX buffer pools are now enabled by default for new machine types. For older machine types, they are still disabled to avoid breaking migration. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24hw/net/spapr_llan: Fix receive buffer handling for better performanceThomas Huth
tl;dr: This patch introduces an alternate way of handling the receive buffers of the spapr-vlan device, resulting in much better receive performance for the guest. Full story: One of our testers recently discovered that the performance of the spapr-vlan device is very poor compared to other NICs, and that a simple "ping -i 0.2 -s 65507 someip" in the guest can result in more than 50% lost ping packets (especially with older guest kernels < 3.17). After doing some analysis, it was clear that there is a problem with the way we handle the receive buffers in spapr_llan.c: The ibmveth driver of the guest Linux kernel tries to add a lot of buffers into several buffer pools (with 512, 2048 and 65536 byte sizes by default, but it can be changed via the entries in the /sys/devices/vio/1000/pool* directories of the guest). However, the spapr-vlan device of QEMU only tries to squeeze all receive buffer descriptors into one single page which has been supplied by the guest during the H_REGISTER_LOGICAL_LAN call, without taking care of different buffer sizes. This has two bad effects: First, only a very limited number of buffer descriptors is accepted at all. Second, we also hand 64k buffers to the guest even if the 2k buffers would fit better - and this results in dropped packets in the IP layer of the guest since too much skbuf memory is used. Though it seems at a first glance like PAPR says that we should store the receive buffer descriptors in the page that is supplied during the H_REGISTER_LOGICAL_LAN call, chapter 16.4.1.2 in the LoPAPR spec declares that "the contents of these descriptors are architecturally opaque, none of these descriptors are manipulated by code above the architected interfaces". That means we don't have to store the RX buffer descriptors in this page, but can also manage the receive buffers at the hypervisor level only. This is now what we are doing here: Introducing proper RX buffer pools which are also sorted by size of the buffers, so we can hand out a buffer with the best fitting size when a packet has been received. To avoid problems with migration from/to older version of QEMU, the old behavior is also retained and enabled by default. The new buffer management has to be enabled via a new "use-rx-buffer-pools" property. Now with the new buffer pool management enabled, the problem with "ping -s 65507" is fixed for me, and the throughput of a simple test with wget increases from creeping 3MB/s up to 20MB/s! Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-24hw/net/spapr_llan: Extract rx buffer code into separate functionsThomas Huth
Refactor the code a little bit by extracting the code that reads and writes the receive buffer list page into separate functions. There should be no functional change in this patch, this is just a preparation for the upcoming extensions that introduce receive buffer pools. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-22Replaced get_tick_per_sec() by NANOSECONDS_PER_SECONDRutuja Shah
This patch replaces get_ticks_per_sec() calls with the macro NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec() is then removed. This replacement improves the readability and understandability of code. For example, timer_mod(fdctrl->result_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50)); NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns matches the unit of the expression on the right side of the plus. Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22hw: explicitly include qemu-common.h and cpu.hPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22Clean up includes some moreMarkus Armbruster
Manually drop redundant includes that scripts/clean-includes misses, e.g. because they're hidden in generator programs, or they use the wrong kind of delimiter. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22include/qemu/osdep.h: Don't include qapi/error.hMarkus Armbruster
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-16i.MX: Add missing descriptions in devices.Jean-Christophe Dubois
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jean-Christophe Dubois <jcd@tribudubois.net> Message-id: f1f565eb9dffdeb582feb1b15ba9e8b0afcf5468.1456868959.git.jcd@tribudubois.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-03-08rocker: allow user to specify rocker world by propertyJiri Pirko
Add property to specify rocker world. All ports will be assigned to this world. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08rocker: add name field into WorldOps ale let world specify its nameJiri Pirko
Also use this in world_name getter function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08rocker: return -ENOMEM in case of some world alloc failsJiri Pirko
Until now, 0 is returned in this error case. Fix it ro return -ENOMEM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08rocker: forbid to change world typeJiri Pirko
Port to world assignment should be permitted only by qemu user. Driver should not be able to do it, so forbid that possibility. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-03-08net: ne2000: check ring buffer control registersPrasad J Pandit
Ne2000 NIC uses ring buffer of NE2000_MEM_SIZE(49152) bytes to process network packets. Registers PSTART & PSTOP define ring buffer size & location. Setting these registers to invalid values could lead to infinite loop or OOB r/w access issues. Add check to avoid it. Reported-by: Yang Hongke <yanghongke@huawei.com> Tested-by: Yang Hongke <yanghongke@huawei.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
2016-02-23all: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2016-02-18vhost-user interrupt management fixesVictor Kaplansky
Since guest_mask_notifier can not be used in vhost-user mode due to buffering implied by unix control socket, force use_mask_notifier on virtio devices of vhost-user interfaces, and send correct callfd to the guest at vhost start. Using guest_notifier_mask function in vhost-user case may break interrupt mask paradigm, because mask/unmask is not really done when returning from guest_notifier_mask call, instead message is posted in a unix socket, and processed later. Add an option boolean flag 'use_mask_notifier' to disable the use of guest_notifier_mask in virtio pci. Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: Victor Kaplansky <victork@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-02-16vhost-net: revert support of cross-endian vnet headersGreg Kurz
Cross-endian is now handled by the core virtio-net code. This patch reverts: commit 5be7d9f1b1452613b95c6ba70b8d7ad3d0797991 vhost-net: tell tap backend about the vnet endianness and commit cf0a628f6e81bfc9b7a944fa0b80c3594836df56 net: set endianness on all backend devices Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-16virtio-net: use the backend cross-endian capabilitiesGreg Kurz
When running a fully emulated device in cross-endian conditions, including a virtio 1.0 device offered to a big endian guest, we need to fix the vnet headers. This is currently handled by the virtio_net_hdr_swap() function in the core virtio-net code but it should actually be handled by the net backend. With this patch, virtio-net now tries to configure the backend to do the endian fixing when the device starts (i.e. drivers sets the CONFIG_OK bit). If the backend cannot support the requested endiannes, we have to fallback onto virtio_net_hdr_swap(): this is recorded in the needs_vnet_hdr_swap flag, to be used in the TX and RX paths. Note that we reset the backend to the default behaviour (guest native endianness) when the device stops (i.e. device status had CONFIG_OK bit and driver unsets it). This is needed, with the linux tap backend at least, otherwise the guest may lose network connectivity if rebooted into a different endianness. The current vhost-net code also tries to configure net backends. This will be no more needed and will be reverted in a subsequent patch. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-02-08qom: Swap 'name' next to visitor in ObjectPropertyAccessorEric Blake
Similar to the previous patch, it's nice to have all functions in the tree that involve a visitor and a name for conversion to or from QAPI to consistently stick the 'name' parameter next to the Visitor parameter. Done by manually changing include/qom/object.h and qom/object.c, then running this Coccinelle script and touching up the fallout (Coccinelle insisted on adding some trailing whitespace). @ rule1 @ identifier fn; typedef Object, Visitor, Error; identifier obj, v, opaque, name, errp; @@ void fn - (Object *obj, Visitor *v, void *opaque, const char *name, + (Object *obj, Visitor *v, const char *name, void *opaque, Error **errp) { ... } @@ identifier rule1.fn; expression obj, v, opaque, name, errp; @@ fn(obj, v, - opaque, name, + name, opaque, errp) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <1454075341-13658-20-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-02-08qapi: Swap visit_* arguments for consistent 'name' placementEric Blake
JSON uses "name":value, but many of our visitor interfaces were called with visit_type_FOO(v, &value, name, errp). This can be a bit confusing to have to mentally swap the parameter order to match JSON order. It's particularly bad for visit_start_struct(), where the 'name' parameter is smack in the middle of the otherwise-related group of 'obj, kind, size' parameters! It's time to do a global swap of the parameter ordering, so that the 'name' parameter is always immediately after the Visitor argument. Additional reason in favor of the swap: the existing include/qjson.h prefers listing 'name' first in json_prop_*(), and I have plans to unify that file with the qapi visitors; listing 'name' first in qapi will minimize churn to the (admittedly few) qjson.h clients. Later patches will then fix docs, object.h, visitor-impl.h, and those clients to match. Done by first patching scripts/qapi*.py by hand to make generated files do what I want, then by running the following Coccinelle script to affect the rest of the code base: $ spatch --sp-file script `git grep -l '\bvisit_' -- '**/*.[ch]'` I then had to apply some touchups (Coccinelle insisted on TAB indentation in visitor.h, and botched the signature of visit_type_enum() by rewriting 'const char *const strings[]' to the syntactically invalid 'const char*const[] strings'). The movement of parameters is sufficient to provoke compiler errors if any callers were missed. // Part 1: Swap declaration order @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_start_struct -(TV v, TObj OBJ, T1 ARG1, const char *name, T2 ARG2, TErr errp) +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type bool, TV, T1; identifier ARG1; @@ bool visit_optional -(TV v, T1 ARG1, const char *name) +(TV v, const char *name, T1 ARG1) { ... } @@ type TV, TErr, TObj, T1; identifier OBJ, ARG1; @@ void visit_get_next_type -(TV v, TObj OBJ, T1 ARG1, const char *name, TErr errp) +(TV v, const char *name, TObj OBJ, T1 ARG1, TErr errp) { ... } @@ type TV, TErr, TObj, T1, T2; identifier OBJ, ARG1, ARG2; @@ void visit_type_enum -(TV v, TObj OBJ, T1 ARG1, T2 ARG2, const char *name, TErr errp) +(TV v, const char *name, TObj OBJ, T1 ARG1, T2 ARG2, TErr errp) { ... } @@ type TV, TErr, TObj; identifier OBJ; identifier VISIT_TYPE =~ "^visit_type_"; @@ void VISIT_TYPE -(TV v, TObj OBJ, const char *name, TErr errp) +(TV v, const char *name, TObj OBJ, TErr errp) { ... } // Part 2: swap caller order @@ expression V, NAME, OBJ, ARG1, ARG2, ERR; identifier VISIT_TYPE =~ "^visit_type_"; @@ ( -visit_start_struct(V, OBJ, ARG1, NAME, ARG2, ERR) +visit_start_struct(V, NAME, OBJ, ARG1, ARG2, ERR) | -visit_optional(V, ARG1, NAME) +visit_optional(V, NAME, ARG1) | -visit_get_next_type(V, OBJ, ARG1, NAME, ERR) +visit_get_next_type(V, NAME, OBJ, ARG1, ERR) | -visit_type_enum(V, OBJ, ARG1, ARG2, NAME, ERR) +visit_type_enum(V, NAME, OBJ, ARG1, ARG2, ERR) | -VISIT_TYPE(V, OBJ, NAME, ERR) +VISIT_TYPE(V, NAME, OBJ, ERR) ) Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <1454075341-13658-19-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>