The change to reschedule tasklet if more data arrives in ring buffer
can cause performance regression if host timing is such that the
next response happens in small window.
Go back to a modified version of the original looping behavior.
If the race occurs in a small time, then loop. But if the tasklet
has been running for a long interval due to flood, then reschedule
the tasklet to allow migration to ksoftirqd.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the simple boolean batched_reading into a tri-value.
For future NAPI support in netvsc driver, the callback needs to
occur directly in interrupt handler.
Batched mode is also changed to disable host interrupts immediately
in interrupt routine (to avoid unnecessary host signals), and the
tasklet is rescheduled if more data is detected.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make the event handling tasklet per channel rather than per-cpu.
This allows for better fairness when getting lots of data on the same
cpu.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hv_context structure had several arrays which were per-cpu
and was allocating small structures (tasklet_struct). Instead use
a single per-cpu array.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use standard kernel operations for find first set bit to traverse
the channel bit array. This has added benefit of speeding up
lookup on 64 bit and because it uses find first set instruction.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As part of the effort to separate out architecture specific code,
extract hypervisor version information in an architecture specific
file.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DoS protection conditions were altered in WS2016 and now it's easy to get
-EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
netvsc device in a loop). All vmbus_post_msg() callers don't retry the
operation and we usually end up with a non-functional device or crash.
While host's DoS protection conditions are unknown to me my tests show that
it can take up to 10 seconds before the message is sent so doing udelay()
is not an option, we really need to sleep. Almost all vmbus_post_msg()
callers are ready to sleep but there is one special case:
vmbus_initiate_unload() which can be called from interrupt/NMI context and
we can't sleep there. I'm also not sure about the lonely
vmbus_send_tl_connect_request() which has no in-tree users but its external
users are most likely waiting for the host to reply so sleeping there is
also appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a new driver to enable userspace networking on VMBus.
It is based largely on the similar driver that already exists
for PCI, and earlier work done by Brocade to support DPDK.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current delay between retries is unnecessarily high and is negatively
affecting the time it takes to boot the system.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for moving some ring buffer functionality out of the
vmbus driver, export the API for signaling the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
WS2012 R2 and above hosts can support kexec in that thay can support
reconnecting to the host (as would be needed in the kexec path)
on any CPU. Enable this. Pre ws2012 r2 hosts don't have this ability
and consequently cannot support kexec.
Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
wait_for_completion() may sleep, it enables interrupts and this
is something we really want to avoid on crashes because interrupt
handlers can cause other crashes. Switch to the recently introduced
vmbus_wait_for_unload() doing busy wait instead.
Reported-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Radim Kr.má<rkrcmar@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hvsock driver needs this API to release all the resources related
to the channel.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cleanup vmbus_set_event() by inlining the hypercall to post
the event and since the return value of vmbus_set_event() is not checked,
make it void. As part of this cleanup, get rid of the function
hv_signal_event() as it is only callled from vmbus_set_event().
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Force all channel messages to be delivered on CPU0. These messages are not
performance critical and are used during the setup and teardown of the
channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
spinlock is unnecessary here.
mutex is enough.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for Windows 10.
Signed-off-by: Keith Mange <keith.mange@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Implement the protocol for tearing down the monitor state established with
the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most of the retries can be done within a millisecond successfully, so we
sleep 1ms before the first retry, then gradually increase the retry
interval to 2^n with max value of 2048ms. Doing so, we will have shorter
overall delay time, because most of the cases succeed within 1-2 attempts.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the 2 fucntions can safely run in vmbus_connection.work_queue without
hang, we don't need to schedule new work items into the per-channel workqueue.
Actally we can even remove the per-channel workqueue now -- we'll do it
in the next patch.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch is a continuation of the rescind handling cleanup work. We cannot
block in the global message handling work context especially if we are blocking
waiting for the host to wake us up. I would like to thank
Dexuan Cui <decui@microsoft.com> for observing this problem.
The current char-next branch is broken and this patch fixes
the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently we log messages when either we are not able to map an ID to a
channel or when the channel does not have a callback associated
(in the channel interrupt handling path). These messages don't add
any value, get rid of them.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I got HV_STATUS_INVALID_CONNECTION_ID on Hyper-V 2008 R2 when keeping running
"rmmod hv_netvsc; modprobe hv_netvsc; rmmod hv_utils; modprobe hv_utils"
in a Linux guest. Looks the host has some kind of throttling mechanism if
some kinds of hypercalls are sent too frequently.
Without the patch, the driver can occasionally fail to load.
Also let's retry HV_STATUS_INSUFFICIENT_MEMORY, though we didn't get it
before.
Removed 'case -ENOMEM', since the hypervisor doesn't return this.
CC: "K. Y. Srinivasan" <kys@microsoft.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to destroy hv_vmbus_con on module shutdown, otherwise the following
crash is sometimes observed:
[ 76.569845] hv_vmbus: Hyper-V Host Build:9600-6.3-17-0.17039; Vmbus version:3.0
[ 82.598859] BUG: unable to handle kernel paging request at ffffffffa0003480
[ 82.599287] IP: [<ffffffffa0003480>] 0xffffffffa0003480
[ 82.599287] PGD 1f34067 PUD 1f35063 PMD 3f72d067 PTE 0
[ 82.599287] Oops: 0010 [#1] SMP
[ 82.599287] Modules linked in: [last unloaded: hv_vmbus]
[ 82.599287] CPU: 0 PID: 26 Comm: kworker/0:1 Not tainted 3.19.0-rc5_bug923184+ #488
[ 82.599287] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 82.599287] Workqueue: hv_vmbus_con 0xffffffffa0003480
[ 82.599287] task: ffff88007b6ddfa0 ti: ffff88007f8f8000 task.ti: ffff88007f8f8000
[ 82.599287] RIP: 0010:[<ffffffffa0003480>] [<ffffffffa0003480>] 0xffffffffa0003480
[ 82.599287] RSP: 0018:ffff88007f8fbe00 EFLAGS: 00010202
...
To avoid memory leaks we need to free monitor_pages and int_page for
vmbus_connection. Implement vmbus_disconnect() function by separating cleanup
path from vmbus_connect().
As we use hv_vmbus_con to release channels (see free_channel() in channel_mgmt.c)
we need to make sure the work was done before we remove the queue, do that with
drain_workqueue(). We also need to avoid handling messages which can (potentially)
create new channels, so set vmbus_connection.conn_state = DISCONNECTED at the very
beginning of vmbus_exit() and check for that in vmbus_onmessage_work().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Replace calls for smp_processor_id() to get_cpu() to get the CPU ID of
the current CPU. In these instances, there is no correctness issue with
regards to preemption, we just need the current CPU ID.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Posting messages to the host can fail because of transient resource
related failures. Correctly deal with these failures and increase the
number of attempts to post the message before giving up.
In this version of the patch, I have normalized the error code to
Linux error code.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Win8, we have implemented several optimizations to improve the
scalability and performance of the VMBUS transport between the Host and the
Guest. Some of the non-performance critical services cannot leverage these
optimization since they only read and process one message at a time.
Make adjustments to the callback dispatch code to account for the way
non-performance critical drivers handle reading of the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We try to free two pages when only one has been allocated.
Cleanup path is unlikely, so I haven't found any trace that would fit,
but I hope that free_pages_prepare() does catch it.
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the mapping of the relID to channel is done under the protection of a
single spin lock. Starting with ws2012, each channel is bound to a specific VCPU
in the guest. Use this binding to eliminate the spin lock by setting up
per-cpu state for mapping relId to the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
By ensuring that we set the callback handler to NULL in the channel close
path on the same CPU that the channel is bound to, we can eliminate this lock
acquisition and release in a performance critical path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality
is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the
VMBUS protocol.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the guest attempts to connect with the host when there may already be a
connection with the host (as would be the case during the kdump/kexec path),
it is difficult to guarantee timely response from the host. Starting with
WS2012 R2, the host supports this ability to re-connect with the host
(explicitly to support kexec). Prior to responding to the guest, the host
needs to ensure that device states based on the previous connection to
the host have been properly torn down. This may introduce unbounded delays.
To deal with this issue, don't do a timed wait during the initial connect
with the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During the initial VMBUS connect phase, starting with WS2012 R2, we should
specify the VPCU in the guest that should receive the notification. Fix this
issue. This fix is required to properly connect to the host in the kexeced
kernel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org> [3.9+]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need/want the mei fixes in here so we can apply other updates that
are depending on them.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 666b9adc80 terminated vmbus
version negotiation incorrectly. We need to terminate the version
negotiation only if the current negotiation were to timeout.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Olaf Hering <ohering@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
monitor_pages was a void pointer, containing an unknown number of arrays that
we just "knew" were a child and parent array of a specific size. Instead of
that implicit knowledge, let's make them a real pointer, allowing us to have
type safety, and a semblance of sane addressing schemes.
Tested-by: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code would attempt to negotiate a different protocol version if
the current negotiation timed out. This triggers an assert in the host (on debug
builds). Avoid this by negotiating a newer version only if the host properly
rejects the current version being negotiated.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Win8, the host supports multiple sub-channels for a given
device. As in the past, the initial channel offer specifies the device and
is associated with both the type and the instance GUIDs. For performance
critical devices, the host may support multiple sub-channels. The sub-channels
share the same type and instance GUID as the primary channel. The number of
sub-channels offerrred to the guest depends on the number of virtual CPUs
assigned to the guest. The guest can request the creation of these sub-channels
and once created and opened, the guest can distribute the traffic across all
the channels (the primary and the sub-channels). A request sent on a sub-channel
will have the response delivered on the same sub-channel.
At channel (sub-channel) creation we bind the channel interrupt to a CPU and
with this sub-channel support we will be able to spread the interrupt load
of a given device across all available CPUs.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now, cleanup and consolidate reporting of host and vmbus version numbers.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that we have implemented all of the Win8 (WS2012) functionality, negotiate
Win8 protocol with the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Win8 (WS2012), the event page can be used to directly get the
channel ID that needs servicing. Modify the channel event handling code
to take advantage of this feature.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On win7 (ws2008 R2) and beyond, we have the notion of having dedicated interrupts on
a per-channel basis. When a channel has a dedicated interrupt assigned, there is no need
to set the interrupt bit in the shared page. Implement this optimization.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To support version specific optimization in various vmbus drivers,
move the vmbus definitions to the public header file.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for supporting a per-connection signaling mechanism,
change the signature of vmbus_set_event(). This change is also
needed to implement other aspects of the signaling optimization.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In preparation for implementing a per-connection signaling framework,
change the signature of the function hv_signal_event(). The current
code uses a global handle for signaling the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Export the negotiated vmbus version as this may be useful for
individual drivers.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code hard coded the vmbus version independent of the host
it was running on. Add code to dynamically negotiate the most appropriate
version.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that we have the infratructure for correctly determining when we
should signal the host; optimize the signaling on the read side -
signaling the guest from the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No one outside of the hyperv core needs to include the asm/hyperv.h
file, so don't put it in the "global" include/linux/hyperv.h file.
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After many years wandering the desert, it is finally time for the
Microsoft HyperV code to move out of the staging directory. Or at least
the core hyperv bus code, and the utility driver, the rest still have
some review to get through by the various subsystem maintainers.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>