This is quite a bit of code that logically depends here since
it has to deal with all the remain-on-channel logic.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a mgmt-tx operation is aborted before it runs, the wrong
cookie is reported back to userspace, and the ack_skb gets
leaked since the frame is freed directly instead of freeing
it using ieee80211_free_txskb(). Fix that.
Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If some code stops the queues more times than having started
(for when refcounting is used), warn on and reset the counter
to 0 to avoid blocking forever.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When freeing the TX skb for an off-channel TX, use the correct
API to also free the ACK skb that might have been allocated.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a new station is added to AP/GO interfaces the default behaviour
is for it to be added authenticated and associated, due to backwards
compatibility. To prevent that, the driver must be able to do that
(setting the NL80211_FEATURE_FULL_AP_CLIENT_STATE feature flag) and
userspace must set the flag mask to auth|assoc and clear the set.
Handle this quirk in the API entirely in nl80211, and always push the
full flags to the drivers. NL80211_FEATURE_FULL_AP_CLIENT_STATE is
still required for userspace to be allowed to set the mask including
those bits, but after checking that add both flags to the mask and
set in case userspace didn't set them otherwise.
This obsoletes the mac80211 code handling this difference, no other
driver is currently using these flags.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix nl80211_set_station() to use the value of NL80211_ATTR_STA_AID
attribute instead of NL80211_ATTR_PEER_AID attribute.
Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This commit adds implementation for abort scan in mac80211.
Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[adjust to wdev change in previous patch and clean up code a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement new functionality for aborting an ongoing scan.
Add NL80211_CMD_ABORT_SCAN to the nl80211 interface. After
aborting the scan, driver shall provide the scan status by
calling cfg80211_scan_done().
Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[change command to take wdev instead of netdev so that it
can be used on p2p-device scans]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add new VIF flag, that will allow get NOA update
notification when driver will request this, even
this is not pure P2P vif (eg. STA vif).
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We had another change to fix this in mac80211, but the hwsim
"hardware" scan should also be fixed. Obviously this one isn't
important since it's not real hardware, but we'd better be
consistent.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Last caller of this function was removed in 3.17 in commit
97dc94f1d9.
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
add ieee80211_iter_keys_rcu() to iterate over uploaded
keys in atomic context (when rcu is locked)
The station removal code removes the keys only after
calling synchronize_net(), so it's not safe to iterate
the keys at this point (and postponing the actual key
deletion with call_rcu() might result in some
badly-ordered ops calls).
Add a flag to indicate a station is being removed,
and skip the configured keys if it's set.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can happen when the driver needs to send less frames
than expected and then needs to close the SP.
Mac80211 still needs to set the more_data properly based
on its buffer state (ps_tx_buffer and buffered frames on
other TIDs).
To that end, refactor the code that delivers frames upon
uAPSD trigger frames to be able to get only the more_data
bit without actually delivering those frames in case the
driver is just asking to set a NDP with EOSP and MORE_DATA
bit properly set.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In this attribute's documentation, it was not clear whether the delay
started counting when WoWLAN net-detect was enabled or when the system
was suspended. The correct answer is that it starts when the system
suspends (which is when, in practice, the scan is scheduled). Clarify
that in the nl80211.h documentation.
Suggested-by: Samuel Tan <samueltan@google.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This really should never happen except very early in the process
of bringing up a new driver, at which point you'll have to add
more debugging in the driver and this string isn't useful. Remove
it and save some size (when it's even compiled in.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This indicates a driver key selection issue, but even then there's
no point in printing it all the time, so ratelimit it. Also remove
the priv pointer from it -- people debugging will only have a single
device anyway and it's useless as anything but a cookie.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no point in printing the mpath pointer since it can't
be used for anything - print the MAC address instead (like in
the forwarding case.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The function is a very simple wrapper around another one,
just adds a few default parameters, so replace it with a
static inline instead of using EXPORT_SYMBOL, reducing
the module size slightly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Complete the tracepoint with the missing data - it's not printed
by default (a lot of it is dynamic arrays) but will be recorded
and be available during post-processing.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some devices or drivers cannot deal with having the same station
address for different virtual interfaces, say as a client to two
virtual AP interfaces. Rather than requiring each driver with a
limitation like that to enforce it, add a hardware flag for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
I want to get the full off-channel bugfix since later code depends on
it, as well as the AP client state change so I can revert it correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It is not necessary to use two brackets. As such, the redudant brackets
are removed.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/renesas/ravb_main.c
kernel/bpf/syscall.c
net/ipv4/ipmr.c
All three conflicts were cases of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:
1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
2) IDs for some new iwlwifi chips, from Oren Givon.
3) Fix rtlwifi lockups on boot, from Larry Finger.
4) Fix memory leak in fm10k, from Stephen Hemminger.
5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.
6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.
8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.
9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.
10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.
11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.
12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.
13) Fix double free in VRF code, from Nikolay Aleksandrov.
14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.
17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.
18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.
19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.
20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.
21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.
22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.
23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.
24) Don't actively scan radar channels, from Antonio Quartulli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
events on pids. It filters all events where only tasks with their pid in that
file exists. It also handles the sched_switch and sched_wakeup trace events
where the current task does not have its pid in the file, but the task
either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both of
these tracepoints use the same class as the sched_wakeup tracepoint, and
they too should be included in what gets filtered by the set_event_pid file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWX7XpAAoJEKKk/i67LK/8G6gH/0W9QxCt/iS+J+x3gAPPs+/9
jtwZgAOWjq+118ZpWORtgRcoH2r/sJUNwlXqhMojAHsPlwZsr6TXkWJkgyNdZZ7B
QdUtZrr+egGYvd7TE0ONi/XrLTe9VLtBQsh5pN7l9fF9TjxYUmu5V9LplH9z0RxW
Hw8EzqGzG2iZnXYCnErtu5jRLmr18f2u9aUptPAc4bYPLVUUw9M9MqRV/ZwQxsaX
1mfIoR5SVC5IWW/R07qjULlbFpvNXkVJ56HwXMVBN44mYz3eUGYBKzjyAJ0Ugymf
CNDPzh4HgVFsEqDedr0D8T5WZNJSUErdbHVSWze+CCUNfYikvU7gNmxNJ89Q3P4=
=LsnU
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"During the merge window I added a new file that is used to filter
trace events on pids. It filters all events where only tasks with
their pid in that file exists. It also handles the sched_switch and
sched_wakeup trace events where the current task does not have its pid
in the file, but the task either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both
of these tracepoints use the same class as the sched_wakeup
tracepoint, and they too should be included in what gets filtered by
the set_event_pid file"
* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJWX2JpAAoJEGt7eEactAAd+9cQAJmn3zt0orj/sASv7BeF0h5d
sRfAhkBVOTZur8MgVj1c7fNzT3h1HYNei6c4SA2+rphy6Vbifoli1nLNloC+1Ld4
2WXllEqVe473GqVofCxZHsYZPr2Inmhj7uMiDqvoKUiSRz7phmkY9m+Vju6WZG/W
F6FrTLqFS7UDHIYNYH1DNVSScd/89Gu6pHZvEpoHkrsvt5rEEZiPAQ7sDB4MAMSm
amETtuBqgX83gHR2G4UT2Z9r8TVdzhO+s7vvdVjj0qbP6C6BaS9IUXDjmm3gOvHy
G7j9MJuUC8w/2fZ5A5/l94OuN5rF/ZFMNkn2e6OIzg0HjEZh74CeLl21CnuxdNpB
ECmDVbKoI3OVoFbhEl7P5fBokzZsqhAXpZOmbYEeFRyO6lF2Mv9uzttsF6EOmCX0
BjIoEXOWA2o6IUD/8M6NjW+/B58SDDVi9Mg6D+7Dn7rUFlQ4pddjb0m94bI8GQQU
wl7gROMvYR3tIhiMs1bLF9jJgA831WGWu9eiq8mT2kHPaEV2bFO7OK+SUxyZu1M7
UhN4eoLpU84v9QNJ34N8RCiYxEZ1e6HQxBwQn/fDIWOjOHryZoArhicFY9aOEja4
9xBI9OJhBWOL4N4AFdmTExBdYudSgCTpX+/gQ4tSfedz3lqF79y8+PILwv6E1Q6D
8pScH/4pVo4v5omGaMpA
=vui7
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Better to just warn the user that something really odd is going on and
continue to run.
Suggested-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Typically we return error pointers when we want to use those
pointers in the non-error case, but this function is just
returning error pointers or NULL for success. Change the style to
plain int to follow normal kernel coding styles.
Cc: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5405ff6e15 ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.
This commit fixes this bug.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stas Sergeev says:
====================
mvneta: implement ethtool autonegotiation control
These 2 patches add an ability to control the
autonegotiation via ethtool. For example:
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
This is needed if you want to connect the mvneta's MII
to different switches or PHYs: the ones the do support
the in-band status, and the ones that do not.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to do
ethtool -s eth0 autoneg off
ethtool -s eth0 autoneg on
to disable or enable autonegotiation at run-time.
Without that functionality, the only way to control the autonegotiation
is to modify the device tree.
This is needed if you plan to use the same kernel with
different ethernet switches, the ones that support the in-band
status and the ones that not.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves autoneg-related bit manipulations to the single place.
CC: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Stas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These new helpers simplify implementing multi-driver modules and
properly handle failure to register one driver by unregistering all
previously registered drivers.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Register PF_PPPOX with pppox module rather than with pppoe,
so that pppoe doesn't get loaded for any PF_PPPOX socket.
* Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO()
instead of using pppox's own naming scheme.
* While there, add auto-loading feature for pptp.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: set mac address and uc_list bug fixes.
Fix ndo_set_mac_address() for PF and VF.
Re-apply uc_list after chip reset.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Hajnoczi says:
====================
Add virtio transport for AF_VSOCK
v2:
* Rebased onto Linux v4.4-rc2
* vhost: Refuse to assign reserved CIDs
* vhost: Refuse guest CID if already in use
* vhost: Only accept correctly addressed packets (no spoofing!)
* vhost: Support flexible rx/tx descriptor layout
* vhost: Add missing total_tx_buf decrement
* virtio_transport: Fix total_tx_buf accounting
* virtio_transport: Add virtio_transport global mutex to prevent races
* common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
semantics)
* common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
* common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
* common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
* common: Fix peer_buf_alloc inheritance on child socket
This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors. It is currently only implemented for VMware's VMCI transport.
This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855
Most of the work was done by Asias He and Gerd Hoffmann a while back. I have
picked up the series again.
The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock
Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.
virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics. Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial). This simplifies the guest<->host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.
Overview
--------
This series adds 3 pieces:
1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
2. virtio_transport.ko - guest driver
3. drivers/vhost/vsock.ko - host driver
Howto
-----
The following kernel options are needed:
CONFIG_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS_COMMON=y
CONFIG_VHOST_VSOCK=m
Launch QEMU as follows:
# qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3
Guest and host can communicate via AF_VSOCK sockets. The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
Status
------
There are a few design changes I'd like to make to the virtio-vsock device:
1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
Spoofing packets is also impossible so the security aspects of the 3-way
handshake (including syn cookie) add nothing. The next version will have a
single operation to establish a connection.
2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients
can transmit to the same listen socket. There is no way for the clients to
coordinate buffer space with each other fairly. The next version will drop
credit-based flow control for SOCK_DGRAM and only rely on best-effort
delivery. SOCK_STREAM still has guaranteed delivery.
3. In the next version only the host will be able to establish connections
(i.e. to connect to a guest agent). This is for security reasons since
there is currently no ability to provide host services only to certain
guests. This also matches how AF_VSOCK works on modern VMware hypervisors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable virtio-vsock and vhost-vsock.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>