tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.

This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need for an extra recvmsg()
call, which would only return -EAGAIN, after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN while another thread is already reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; if there is no data, it may instead have been woken up by a
pending socket error. A zerocopy read that returns 0 bytes would thus
need to be followed by recvmsg() to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.
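
The decision an application faces after a zerocopy receive can be sketched as follows. This is only an illustration, not part of the patch: `struct zc_result`, `enum zc_action`, and `zc_next_action()` are hypothetical names, the struct mirrors just the two output fields relevant here, and recv_skip_hint handling is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical mirror of the two output fields that matter here. */
struct zc_result {
	uint32_t length; /* bytes mapped by the zerocopy receive */
	int32_t  err;    /* socket error reported alongside the result */
};

enum zc_action {
	ZC_CONSUME,      /* data was mapped: process it */
	ZC_HANDLE_ERROR, /* the call failed or the socket has an error */
	ZC_WAIT,         /* spurious wakeup: return to epoll_wait() */
};

/*
 * Decide what to do after a zerocopy receive.
 * rc is the return value of the getsockopt() call; zc holds the
 * fields the kernel filled in. No follow-up recvmsg() is needed to
 * tell "another thread won" apart from "the socket has an error".
 */
static enum zc_action zc_next_action(int rc, const struct zc_result *zc)
{
	if (rc != 0)
		return ZC_HANDLE_ERROR; /* the call itself failed */
	if (zc->length > 0)
		return ZC_CONSUME;
	if (zc->err != 0)
		return ZC_HANDLE_ERROR; /* woken up by a socket error */
	return ZC_WAIT;                 /* no data, no error */
}
```

With a zero-length result and a zero err field, the thread can go straight back to waiting; previously that case required a recvmsg() call to rule out an error.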

Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Roy 2020-02-14 15:30:50 -08:00 committed by David S. Miller
parent c8856c0514
commit 33946518d4
2 changed files with 8 additions and 1 deletions

@@ -346,5 +346,6 @@ struct tcp_zerocopy_receive {
 	__u32 length;		/* in/out: number of bytes to map/mapped */
 	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
 	__u32 inq; /* out: amount of bytes in read queue */
+	__s32 err; /* out: socket error */
 };
 #endif /* _UAPI_LINUX_TCP_H */

@@ -3676,14 +3676,20 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc);
 		release_sock(sk);
+		if (len == sizeof(zc))
+			goto zerocopy_rcv_sk_err;
 		switch (len) {
-		case sizeof(zc):
+		case offsetofend(struct tcp_zerocopy_receive, err):
+			goto zerocopy_rcv_sk_err;
 		case offsetofend(struct tcp_zerocopy_receive, inq):
 			goto zerocopy_rcv_inq;
 		case offsetofend(struct tcp_zerocopy_receive, length):
 		default:
 			goto zerocopy_rcv_out;
 		}
+zerocopy_rcv_sk_err:
+		if (!err)
+			zc.err = sock_error(sk);
 zerocopy_rcv_inq:
 		zc.inq = tcp_inq_hint(sk);
 zerocopy_rcv_out:
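
The length-based dispatch in the hunk above lets callers built against an older, shorter struct keep working, since the kernel only fills the fields that fit in the length the caller passed. A rough userspace sketch of that idea follows. It rests on assumptions: `struct zc_receive`, `OFFSETOFEND`, the `FILL_*` flags, and `zc_fields_for_len()` are hypothetical names, and the leading `address` member is taken from the upstream header rather than from the lines shown in this diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct; `address` is not visible in the hunk. */
struct zc_receive {
	uint64_t address;
	uint32_t length;
	uint32_t recv_skip_hint;
	uint32_t inq;
	int32_t  err;
};

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

enum { FILL_INQ = 1 << 0, FILL_ERR = 1 << 1 };

/*
 * Return which optional output fields would be filled for a caller
 * that passed `len` bytes. Labels fall through from newest to oldest
 * field, so a longer struct also receives every earlier field.
 */
static unsigned int zc_fields_for_len(size_t len)
{
	unsigned int fields = 0;

	if (len == sizeof(struct zc_receive))
		goto fill_err;
	switch (len) {
	case OFFSETOFEND(struct zc_receive, err):
		goto fill_err;
	case OFFSETOFEND(struct zc_receive, inq):
		goto fill_inq;
	case OFFSETOFEND(struct zc_receive, length):
	default:
		goto out;
	}
fill_err:
	fields |= FILL_ERR;
fill_inq:
	fields |= FILL_INQ;
out:
	return fields;
}
```

In this layout the end of `err` coincides with the end of the struct, which would explain why the diff replaces `case sizeof(zc):` instead of keeping it next to the new case: two case labels with the same value do not compile.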