Commit Graph

95 Commits

Author SHA1 Message Date
Zoltan Kiss 00aefceb2f xen-netback: Trivial format string fix
There is a "%" after pending_idx instead of ":".

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-04 10:49:53 -04:00
Zoltan Kiss bdab82759b xen-netback: Grant copy the header instead of map and memcpy
An old inefficiency of the TX path that we are grant mapping the first slot,
and then copy the header part to the linear area. Instead, doing a grant copy
for that header straight on is more reasonable. Especially because there are
ongoing efforts to make Xen avoiding TLB flush after unmap when the page were
not touched in Dom0. In the original way the memcpy ruined that.
The key changes:
- the vif has a tx_copy_ops array again
- xenvif_tx_build_gops sets up the grant copy operations
- we don't have to figure out whether the header and first frag are on the same
  grant mapped page or not
Note, we only grant copy PKT_PROT_LEN bytes from the first slot, the rest (if
any) will be on the first frag, which is grant mapped. If the first slot is
smaller than PKT_PROT_LEN, then we grant copy that, and later __pskb_pull_tail
will pull more from the frags (if any)

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:22:40 -04:00
Zoltan Kiss 9074ce2493 xen-netback: Rename map ops
Rename identifiers to state explicitly that they refer to map ops.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:22:38 -04:00
Wei Liu e9d8b2c296 xen-netback: disable rogue vif in kthread context
When netback discovers frontend is sending malformed packet it will
disables the interface which serves that frontend.

However disabling a network interface involving taking a mutex which
cannot be done in softirq context, so we need to defer this process to
kthread context.

This patch does the following:
1. introduce a flag to indicate the interface is disabled.
2. check that flag in TX path, don't do any work if it's true.
3. check that flag in RX path, turn off that interface if it's true.

The reason to disable it in RX path is because RX uses kthread. After
this change the behavior of netback is still consistent -- it won't do
any TX work for a rogue frontend, and the interface will be eventually
turned off.

Also change a "continue" to "break" after xenvif_fatal_tx_err, as it
doesn't make sense to continue processing packets if frontend is rogue.

This is a fix for XSA-90.

Reported-by: Török Edwin <edwin@etorok.net>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-01 16:25:51 -04:00
David S. Miller 0b70195e0c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netback/netback.c

A bug fix overlapped with changing how the netback SKB control
block is implemented.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:56:43 -04:00
Paul Durrant 1425c7a4e8 xen-netback: BUG_ON in xenvif_rx_action() not catching overflow
The BUG_ON to catch ring overflow in xenvif_rx_action() makes the assumption
that meta_slots_used == ring slots used. This is not necessarily the case
for GSO packets, because the non-prefix GSO protocol consumes one more ring
slot than meta-slot for the 'extra_info'. This patch changes the test to
actually check ring slots.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant a02eb4732c xen-netback: worse-case estimate in xenvif_rx_action is underestimating
The worse-case estimate for skb ring slot usage in xenvif_rx_action()
fails to take fragment page_offset into account. The page_offset does,
however, affect the number of times the fragmentation code calls
start_new_rx_buffer() (i.e. consume another slot) and the worse-case
should assume that will always return true. This patch adds the page_offset
into the DIV_ROUND_UP for each frag.

Unfortunately some frontends aggressively limit the number of requests
they post into the shared ring so to avoid an estimate that is 'too'
pessimal it is capped at MAX_SKB_FRAGS.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Paul Durrant 0576eddf24 xen-netback: remove pointless clause from if statement
This patch removes a test in start_new_rx_buffer() that checks whether
a copy operation is less than MAX_BUFFER_OFFSET in length, since
MAX_BUFFER_OFFSET is defined to be PAGE_SIZE and the only caller of
start_new_rx_buffer() already limits copy operations to PAGE_SIZE or less.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Reported-By: Sander Eikelenboom <linux@eikelenboom.it>
Tested-By: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:50:34 -04:00
Zoltan Kiss 7aceb47a9d xen-netback: Functional follow-up patch for grant mapping series
Ian made some late comments about the grant mapping series, I incorporated the
functional outcomes into this patch:

- use callback_param macro to shorten access to pending_tx_info in
  xenvif_fill_frags() and xenvif_tx_submit()
- print an error message in xenvif_idx_unmap() before panic

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
Zoltan Kiss 0e59a4a553 xen-netback: Non-functional follow-up patch for grant mapping series
Ian made some late comments about the grant mapping series, I incorporated the
non-functional outcomes into this patch:

- typo fixes in a comment of xenvif_free(), and add another one there as well
- typo fix for comment of rx_drain_timeout_msecs
- remove stale comment before calling xenvif_grant_handle_reset()

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
Zoltan Kiss 869b9b19b3 xen-netback: Stop using xenvif_tx_pending_slots_available
Since the early days TX stops if there isn't enough free pending slots to
consume a maximum sized (slot-wise) packet. Probably the reason for that is to
avoid the case when we don't have enough free pending slot in the ring to finish
the packet. But if we make sure that the pending ring has the same size as the
shared ring, that shouldn't really happen. The frontend can only post packets
which fit the to the free space of the shared ring. If it doesn't, the frontend
has to stop, as it can only increase the req_prod when the whole packet fits
onto the ring.
This patch avoid using this checking, makes sure the 2 ring has the same size,
and remove a checking from the callback. As now we don't stop the NAPI instance
on this condition, we don't have to wake it up if we free pending slots up.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:33:42 -04:00
David S. Miller 2c5f4f8422 xen-netback: Proper printf format for ptrdiff_t is 't'.
This fixes:

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_dealloc_action’:
drivers/net/xen-netback/netback.c:1573:8: warning: format ‘%x’ expects argument of type ‘unsigned int’, but argument 3 has type ‘long int’ [-Wformat=]

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 19:02:16 -04:00
Zoltan Kiss 397dfd9f93 Revert "xen-netback: Aggregate TX unmap operations"
This reverts commit e9275f5e2d. This commit is the
last in the netback grant mapping series, and it tries to do more aggressive
aggreagtion of unmap operations. However practical use showed almost no
positive effect, whilst with certain frontends it causes significant performance
regression.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 18:58:07 -04:00
David S. Miller 85dcce7a73 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:31:55 -04:00
Annie Li 5bd0767086 Xen-netback: Fix issue caused by using gso_type wrongly
Current netback uses gso_type to check whether the skb contains
gso offload, and this is wrong. Gso_size is the right one to
check gso existence, and gso_type is only used to check gso type.

Some skbs contains nonzero gso_type and zero gso_size, current
netback would treat these skbs as gso and create wrong response
for this. This also causes ssh failure to domu from other server.

V2: use skb_is_gso function as Paul Durrant suggested

Signed-off-by: Annie Li <annie.li@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-10 21:57:50 -04:00
Zoltan Kiss e9275f5e2d xen-netback: Aggregate TX unmap operations
Unmapping causes TLB flushing, therefore we should make it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 milisec.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:57:21 -05:00
Zoltan Kiss 093507885a xen-netback: Timeout packets in RX path
A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc start to queue packets for that VIF. If those packets came from an
another guest, it can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained in every 10 seconds.
The QDisc queue in worst case takes 3 round to flush usually.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:57:15 -05:00
Zoltan Kiss e3377f36ca xen-netback: Handle guests with too many frags
Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy and coalesce the frags into a brand new one and send it to the stack
- unmap the 2 old skb's pages

It's also introduces new stat counters, which help determine how often the guest
sends a packet with more than MAX_SKB_FRAGS frags.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise malicious guests can block
other guests by not releasing their sent packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 1bb332af4c xen-netback: Add stat counters for zerocopy
These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 62bad3199a xen-netback: Remove old TX grant copy definitons and fix indentations
These became obsolete with grant mapping. I've left intentionally the
indentations in this way, to improve readability of previous patches.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss f53c3fe8da xen-netback: Introduce TX grant mapping
This patch introduces grant mapping on netback TX path. It replaces grant copy
operations, ditching grant copy coalescing along the way. Another solution for
copy coalescing is introduced in "xen-netback: Handle guests with too many
frags", older guests and Windows can broke before that patch applies.
There is a callback (xenvif_zerocopy_callback) from core stack to release the
slots back to the guests when kfree_skb or skb_orphan_frags called. It feeds a
separate dealloc thread, as scheduling NAPI instance from there is inefficient,
therefore we can't do dealloc from the instance.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 3e2234b314 xen-netback: Handle foreign mapped pages on the guest RX path
RX path need to know if the SKB fragments are stored on pages from another
domain.
Logically this patch should be after introducing the grant mapping itself, as
it makes sense only after that. But to keep bisectability, I moved it here. It
shouldn't change any functionality here. xenvif_zerocopy_callback and
ubuf_to_vif are just stubs here, they will be introduced properly later on.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:35 -05:00
Zoltan Kiss 121fa4b777 xen-netback: Minor refactoring of netback code
This patch contains a few bits of refactoring before introducing the grant
mapping changes:
- introducing xenvif_tx_pending_slots_available(), as this is used several
  times, and will be used more often
- rename the thread to vifX.Y-guest-rx, to signify it does RX work from the
  guest point of view

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:34 -05:00
Zoltan Kiss 8f13dd9612 xen-netback: Use skb->cb for pending_idx
Storing the pending_idx at the first byte of the linear buffer never looked
good, skb->cb is a more proper place for this. It also prevents the header to
be directly grant copied there, and we don't have the pending_idx after we
copied the header here, so it's time to change it.
It also introduces helpers for the RX side

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-07 15:56:34 -05:00
Zoltan Kiss 9ab9831b4c xen-netback: Fix Rx stall due to race condition
The recent patch to fix receive side flow control
(11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread
spinning if frontend is unresponsive) solved the spinning thread problem,
however caused an another one. The receive side can stall, if:
- [THREAD] xenvif_rx_action sets rx_queue_stopped to true
- [INTERRUPT] interrupt happens, and sets rx_event to true
- [THREAD] then xenvif_kthread sets rx_event to false
- [THREAD] rx_work_todo doesn't return true anymore

Also, if interrupt sent but there is still no room in the ring, it take quite a
long time until xenvif_rx_action realize it. This patch ditch that two variable,
and rework rx_work_todo. If the thread finds it can't fit more skb's into the
ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
kept as 0. Then rx_work_todo will check if:
- there is something to send to the ring (like before)
- there is space for the topmost packet in the queue

I think that's more natural and optimal thing to test than two bool which are
set somewhere else.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-05 16:24:08 -08:00
Paul Durrant 2721637c1c xen-netback: use new skb_checksum_setup function
Use skb_checksum_setup to set up partial checksum offsets rather
then a private implementation.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:24:19 -08:00
Paul Durrant 11b57f9025 xen-netback: stop vif thread spinning if frontend is unresponsive
The recent patch to improve guest receive side flow control (ca2f09f2) had a
slight flaw in the wait condition for the vif thread in that any remaining
skbs in the guest receive side netback internal queue would prevent the
thread from sleeping. An unresponsive frontend can lead to a permanently
non-empty internal queue and thus the thread will spin. In this case the
thread should really sleep until the frontend becomes responsive again.

This patch adds an extra flag to the vif which is set if the shared ring
is full and cleared when skbs are drained into the shared ring. Thus,
if the thread runs, finds the shared ring full and can make no progress the
flag remains set. If the flag remains set then the thread will sleep,
regardless of a non-empty queue, until the next event from the frontend.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 23:05:46 -05:00
David S. Miller 56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Paul Durrant ac3d5ac277 xen-netback: fix guest-receive-side array sizes
The sizes chosen for the metadata and grant_copy_op arrays on the guest
receive size are wrong;

- The meta array is needlessly twice the ring size, when we only ever
  consume a single array element per RX ring slot
- The grant_copy_op array is way too small. It's sized based on a bogus
  assumption: that at most two copy ops will be used per ring slot. This
  may have been true at some point in the past but it's clear from looking
  at start_new_rx_buffer() that a new ring slot is only consumed if a frag
  would overflow the current slot (plus some other conditions) so the actual
  limit is MAX_SKB_FRAGS grant_copy_ops per ring slot.

This patch fixes those two sizing issues and, because grant_copy_ops grows
so much, it pulls it out into a separate chunk of vmalloc()ed memory.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 22:31:30 -05:00
Paul Durrant b89587a7af xen-netback: add gso_segs calculation
netback already has code which parses IPv4 and v6 headers to set up checksum
offsets and these are always applied to GSO packets being sent from
frontends. It's therefore suboptimal that GSOs are being marked
SKB_GSO_DODGY to defer the gso_segs calculation when netback already has all
necessary information to hand to do the calculation. This patch adds that
calculation.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 15:11:49 -05:00
Wei Yongjun 0c8d087c04 xen-netback: fix some error return code
'err' is overwrited to 0 after maybe_pull_tail() call, so the error
code was not set if skb_partial_csum_set() call failed. Fix to return
error -EPROTO from those error handling case instead of 0.

Fixes: d52eb0d46f ('xen-netback: make sure skb linear area covers checksum field')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-19 14:58:47 -05:00
David S. Miller 143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Wei Yongjun 7022ef8b2a xen-netback: fix fragments error handling in checksum_setup_ip()
Fix to return -EPROTO error if fragments detected in checksum_setup_ip().

Fixes: 1431fb31ec ('xen-netback: fix fragment detection in checksum setup')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:18:18 -05:00
Paul Durrant a3314f3d40 xen-netback: fix gso_prefix check
There is a mistake in checking the gso_prefix mask when passing large
packets to a guest. The wrong shift is applied to the bit - the raw skb
gso type is used rather then the translated one. This leads to large packets
being handed to the guest without the GSO metadata. This patch fixes the
check.

The mistake manifested as errors whilst running Microsoft HCK large packet
offload tests between a pair of Windows 8 VMs. I have verified this patch
fixes those errors.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 15:47:18 -05:00
Paul Durrant d9601a36ff xen-netback: napi: don't prematurely request a tx event
This patch changes the RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_build_tx_gops to a check for RING_HAS_UNCONSUMED_REQUESTS as the
former call has the side effect of advancing the ring event pointer and
therefore inviting another interrupt from the frontend before the napi
poll has actually finished, thereby defeating the point of napi.

The event pointer is updated by RING_FINAL_CHECK_FOR_REQUESTS in
xenvif_poll, the napi poll function, if the work done is less than the
budget i.e. when actually transitioning back to interrupt mode.

Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:35:38 -05:00
Paul Durrant 10574059ce xen-netback: napi: fix abuse of budget
netback seems to be somewhat confused about the napi budget parameter. The
parameter is supposed to limit the number of skbs processed in each poll,
but netback has this confused with grant operations.

This patch fixes that, properly limiting the work done in each poll. Note
that this limit makes sure we do not process any more data from the shared
ring than we intend to pass back from the poll. This is important to
prevent tx_queue potentially growing without bound.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:35:38 -05:00
Paul Durrant d52eb0d46f xen-netback: make sure skb linear area covers checksum field
skb_partial_csum_set requires that the linear area of the skb covers the
checksum field. The checksum setup code in netback was only doing that
pullup in the case when the pseudo header checksum was being recalculated
though. This patch makes that pullup unconditional. (I pullup the whole
transport header just for simplicity; the requirement is only for the check
field but in the case of UDP this is the last field in the header and in the
case of TCP it's the last but one).

The lack of pullup manifested as failures running Microsoft HCK network
tests on a pair of Windows 8 VMs and it has been verified that this patch
fixes the problem.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 16:46:24 -05:00
Paul Durrant ca2f09f2b2 xen-netback: improve guest-receive-side flow control
The way that flow control works without this patch is that, in start_xmit()
the code uses xenvif_count_skb_slots() to predict how many slots
xenvif_gop_skb() will consume and then adds this to a 'req_cons_peek'
counter which it then uses to determine if the shared ring has that amount
of space available by checking whether 'req_prod' has passed that value.
If the ring doesn't have space the tx queue is stopped.
xenvif_gop_skb() will then consume slots and update 'req_cons' and issue
responses, updating 'rsp_prod' as it goes. The frontend will consume those
responses and post new requests, by updating req_prod. So, req_prod chases
req_cons which chases rsp_prod, and can never exceed that value. Thus if
xenvif_count_skb_slots() ever returns a number of slots greater than
xenvif_gop_skb() uses, req_cons_peek will get to a value that req_prod cannot
possibly achieve (since it's limited by the 'real' req_cons) and, if this
happens enough times, req_cons_peek gets more than a ring size ahead of
req_cons and the tx queue then remains stopped forever waiting for an
unachievable amount of space to become available in the ring.

Having two routines trying to calculate the same value is always going to be
fragile, so this patch does away with that. All we essentially need to do is
make sure that we have 'enough stuff' on our internal queue without letting
it build up uncontrollably. So start_xmit() makes a cheap optimistic check
of how much space is needed for an skb and only turns the queue off if that
is unachievable. net_rx_action() is the place where we could do with an
accurate predicition but, since that has proven tricky to calculate, a cheap
worse-case (but not too bad) estimate is all we really need since the only
thing we *must* prevent is xenvif_gop_skb() consuming more slots than are
available.

Without this patch I can trivially stall netback permanently by just doing
a large guest to guest file copy between two Windows Server 2008R2 VMs on a
single host.

Patch tested with frontends in:
- Windows Server 2008R2
- CentOS 6.0
- Debian Squeeze
- Debian Wheezy
- SLES11

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:33:12 -05:00
Paul Durrant 1431fb31ec xen-netback: fix fragment detection in checksum setup
The code to detect fragments in checksum_setup() was missing for IPv4 and
too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
with a fragment header even if they are not a fragment - i.e. offset is zero,
and M bit is not set).

This patch also incorporates a fix to callers of maybe_pull_tail() where
skb->network_header was being erroneously added to the length argument.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
cc: David Miller <davem@davemloft.net>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-05 20:31:40 -05:00
Andy Whitcroft ae5e8127b7 xen-netback: include definition of csum_ipv6_magic
We are now using csum_ipv6_magic, include the appropriate header.
Avoids the following error:

    drivers/net/xen-netback/netback.c:1313:4: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
        tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:38:06 -05:00
David S. Miller 394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Wei Liu 059dfa6a93 xen-netback: use jiffies_64 value to calculate credit timeout
time_after_eq() only works if the delta is < MAX_ULONG/2.

For a 32bit Dom0, if netfront sends packets at a very low rate, the time
between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2
and the test for timer_after_eq() will be incorrect. Credit will not be
replenished and the guest may become unable to send packets (e.g., if
prior to the long gap, all credit was exhausted).

Use jiffies_64 variant to mitigate this problem for 32bit Dom0.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jason Luan <jianhai.luan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 00:24:49 -04:00
Paul Durrant 82cada22a0 xen-netback: enable IPv6 TCP GSO to the guest
This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:35:17 -04:00
Paul Durrant a946858768 xen-netback: handle IPv6 TCP GSO packets from the guest
This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:35:17 -04:00
Paul Durrant 2eba61d55e xen-netback: add support for IPv6 checksum offload from guest
For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-17 15:35:16 -04:00
Wei Liu 33bc801ddd Revert "xen-netback: improve ring effeciency for guest RX"
This reverts commit 4f0581d258.

The named changeset is causing problem. Let's aim to make this part less
fragile before trying to improve things.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Matt Wilson <msw@amazon.com>
Cc: Xi Xiong <xixiong@amazon.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08 15:10:48 -04:00
Wei Liu 4f0581d258 xen-netback: improve ring effeciency for guest RX
There was a bug that netback routines netbk/xenvif_skb_count_slots and
netbk/xenvif_gop_frag_copy disagreed with each other, which caused
netback to push wrong number of responses to netfront, which caused
netfront to eventually crash. The bug was fixed in 6e43fc04a
("xen-netback: count number required slots for an skb more carefully").

Commit 6e43fc04a focused on backport-ability. The drawback with the
existing packing scheme is that the ring is not used effeciently, as
stated in 6e43fc04a.

skb->data like:
    |        1111|222222222222|3333        |

is arranged as:
    |1111        |222222222222|3333        |

If we can do this:
    |111122222222|22223333    |
That would save one ring slot, which improves ring effeciency.

This patch effectively reverts 6e43fc04a. That patch made count_slots
agree with gop_frag_copy, while this patch goes the other way around --
make gop_frag_copy agree with count_slots. The end result is that they
still agree with each other, and the ring is now arranged like:
    |111122222222|22223333    |

The patch that improves packing was first posted by Xi Xong and Matt
Wilson. I only rebase it on top of net-next and rewrite commit message,
so I retain all their SoBs. For more infomation about the original bug
please refer to email listed below and commit message of 6e43fc04a.

Original patch:
http://lists.xen.org/archives/html/xen-devel/2013-07/msg00760.html

Signed-off-by: Xi Xiong <xixiong@amazon.com>
Reviewed-by: Matt Wilson <msw@amazon.com>
[ msw: minor code cleanups, rewrote commit message, adjusted code
  to count RX slots instead of meta structures ]
Signed-off-by: Matt Wilson <msw@amazon.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
[ liuw: rebased on top of net-next tree, rewrote commit message, coding
  style cleanup. ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-30 19:14:11 -04:00
David Vrabel 6e43fc04a6 xen-netback: count number required slots for an skb more carefully
When a VM is providing an iSCSI target and the LUN is used by the
backend domain, the generated skbs for direct I/O writes to the disk
have large, multi-page skb->data but no frags.

With some lengths and starting offsets, xen_netbk_count_skb_slots()
would be one short because the simple calculation of
DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
decisions made by start_new_rx_buffer() which does not guarantee
responses are fully packed.

For example, a skb with length < 2 pages but which spans 3 pages would
be counted as requiring 2 slots but would actually use 3 slots.

skb->data:

    |        1111|222222222222|3333        |

Fully packed, this would need 2 slots:

    |111122222222|22223333    |

But because the 2nd page wholy fits into a slot it is not split across
slots and goes into a slot of its own:

    |1111        |222222222222|3333        |

Miscounting the number of slots means netback may push more responses
than the number of available requests.  This will cause the frontend
to get very confused and report "Too many frags/slots".  The frontend
never recovers and will eventually BUG.

Fix this by counting the number of required slots more carefully.  In
xen_netbk_count_skb_slots(), more closely follow the algorithm used by
xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
is the dry-run equivalent of netbk_gop_frag_copy().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12 23:22:13 -04:00
Wei Liu 7376419a46 xen-netback: rename functions
As we move to 1:1 model and melt xen_netbk and xenvif together, it would
be better to use single prefix for all functions in xen-netback.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 01:18:04 -04:00
Wei Liu b3f980bd82 xen-netback: switch to NAPI + kthread 1:1 model
This patch implements 1:1 model netback. NAPI and kthread are utilized
to do the weight-lifting job:

- NAPI is used for guest side TX (host side RX)
- kthread is used for guest side RX (host side TX)

Xenvif and xen_netbk are made into one structure to reduce code size.

This model provides better scheduling fairness among vifs. It is also
prerequisite for implementing multiqueue for Xen netback.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29 01:18:04 -04:00