Commit Graph

86 Commits

Author SHA1 Message Date
Vitaly Kuznetsov 5abbbb75d7 Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural
Memory blocks can be onlined in random order. When this order is not natural
some memory pages are not onlined because of the redundant check in
hv_online_page().

Here is a real world scenario:
1) Host tries to hot-add the following (process_hot_add):
  pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144

2) This results in adding 4 memory blocks:
[  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
[  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
[  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
[  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
The last one is incomplete but we have a special has->covered_end_pfn counter to
avoid onlining non-backed frames and the hv_bring_pgs_online() function to bring
them online later on.

3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
$ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
/sys/devices/system/memory/memory10/state offline
/sys/devices/system/memory/memory11/state offline
/sys/devices/system/memory/memory12/state offline
/sys/devices/system/memory/memory9/state offline

4) We bring them online in non-natural order:
$ grep MemTotal /proc/meminfo
MemTotal:         966348 kB
$ echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
MemTotal:        1019596 kB
$ echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB
$ echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB

As you can see memory9 block gives us zero additional memory. We can also
observe a huge discrepancy between host- and guest-reported memory sizes.

The root cause of the issue is the redundant pg >= covered_start_pfn check (and
covered_start_pfn advancing) in hv_online_page(). When an upper memory block is
being onlined before a lower one (memory12 and memory11 in the above case) we
advance the covered_start_pfn pointer and all memory9 pages fail the check. If
the assumption that the host always gives us requests in sequential order and
that pg_start always equals rg_start when the first request for a new HA region
is received (that's the case in my testing) is correct, then we can get rid of
covered_start_pfn, and the pg >= start_pfn check in hv_online_page() is
sufficient.
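
The scenario above can be replayed with a small user-space model. This is only a
sketch based on the description in this message, not the driver code: struct
ha_model, online_block_old() and online_block_new() are hypothetical names, and
the model assumes that the old check advanced covered_start_pfn by one for every
page it onlined.

```c
#include <assert.h>

#define BLOCK_PAGES 0x8000UL	/* one 128 MB memory block in 4 KB pages */

/* Hypothetical model of one hot-add region; not the driver's own struct. */
struct ha_model {
	unsigned long start_pfn;		/* first pfn of the region */
	unsigned long covered_start_pfn;	/* advanced by the old check */
	unsigned long covered_end_pfn;		/* first pfn not backed by the host */
};

/* Old behaviour (assumed): compare against covered_start_pfn and advance it. */
static unsigned long online_block_old(struct ha_model *has,
				      unsigned long block_pfn)
{
	unsigned long pfn, onlined = 0;

	assert(block_pfn % BLOCK_PAGES == 0);
	for (pfn = block_pfn; pfn < block_pfn + BLOCK_PAGES; pfn++) {
		if (pfn >= has->covered_start_pfn &&
		    pfn < has->covered_end_pfn) {
			has->covered_start_pfn++;
			onlined++;
		}
	}
	return onlined;
}

/* New behaviour: only the fixed start_pfn and covered_end_pfn matter. */
static unsigned long online_block_new(const struct ha_model *has,
				      unsigned long block_pfn)
{
	unsigned long pfn, onlined = 0;

	assert(block_pfn % BLOCK_PAGES == 0);
	for (pfn = block_pfn; pfn < block_pfn + BLOCK_PAGES; pfn++) {
		if (pfn >= has->start_pfn && pfn < has->covered_end_pfn)
			onlined++;
	}
	return onlined;
}
```

With start_pfn = 0x48000 and covered_end_pfn = 0x48000 + 111616, onlining the
blocks at 0x60000 (memory12), 0x58000 (memory11) and 0x48000 (memory9) in that
order gives 13312, 32768 and 0 pages in the old model, which matches the
53248 kB, 131072 kB and 0 kB MemTotal steps shown above. The new model onlines
all 32768 pages of memory9.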

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Vitaly Kuznetsov f3f6eb8087 Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure
When add_memory() fails the following BUG is observed:
[  743.646107] hv_balloon: hot_add memory failed error is -17
[  743.679973]
[  743.680930] =====================================
[  743.680930] [ BUG: bad unlock balance detected! ]
[  743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
[  743.680930] -------------------------------------
[  743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
[  743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
[  743.680930] but there are no more locks to release!

This happens because we don't reacquire ha_region_mutex on this path while
hot_add_req() expects us to hold it, as it does an unconditional
mutex_unlock(). Acquire the lock on the error path.
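
The imbalance can be illustrated with a single-threaded model. This is a sketch
under assumptions, not the driver code: struct model_mutex and the model_*()
functions are hypothetical, and the model assumes the callee releases the lock
before the add step and is expected to hold it again when it returns.

```c
#include <assert.h>

/* Hypothetical stand-in for a mutex that counts unbalanced unlocks. */
struct model_mutex {
	int held;
	int bad_unlocks;
};

static void model_lock(struct model_mutex *m)
{
	assert(!m->held);	/* single-threaded model: no recursion */
	m->held = 1;
}

static void model_unlock(struct model_mutex *m)
{
	if (!m->held)
		m->bad_unlocks++;	/* an unlock with nothing to release */
	else
		m->held = 0;
}

/* Callee: assumed to drop the lock around the add step. */
static int model_mem_hot_add(struct model_mutex *m, int add_result, int fixed)
{
	model_unlock(m);
	if (add_result != 0) {
		if (fixed)
			model_lock(m);	/* reacquire on the error path */
		return add_result;
	}
	model_lock(m);
	return 0;
}

/* Caller: takes the lock, calls the callee, then unlocks unconditionally. */
static int model_hot_add_req(struct model_mutex *m, int add_result, int fixed)
{
	model_lock(m);
	model_mem_hot_add(m, add_result, fixed);
	model_unlock(m);
	return m->bad_unlocks;
}
```

Calling model_hot_add_req() with add_result = -17 and fixed = 0 records one
unbalanced unlock; with fixed = 1 the count stays at zero.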

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
Vitaly Kuznetsov 530d15b907 Drivers: hv: hv_balloon: refuse to balloon below the floor
When the host asks us to balloon up we need to be sure we're not committing
suicide by overballooning. Use the already existing 'floor' metric as our
lowest possible value for free RAM.
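
A minimal sketch of such a clamp, assuming all quantities are counted in pages.
The helper name clamp_balloon_request() and its parameters are hypothetical and
are not taken from the driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many of the requested pages may be ballooned out
 * without letting the available pages drop below the floor.
 */
static unsigned long clamp_balloon_request(unsigned long avail_pages,
					   unsigned long floor_pages,
					   unsigned long requested)
{
	unsigned long room, granted;

	if (avail_pages <= floor_pages)
		return 0;	/* already at or below the floor */

	room = avail_pages - floor_pages;
	granted = requested < room ? requested : room;
	assert(granted <= requested);
	return granted;
}
```

For example, with 1000 available pages and a floor of 200, a request for 900
pages would be cut down to 800 in this model.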

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Vitaly Kuznetsov 549fd280b1 Drivers: hv: hv_balloon: report offline pages as being used
When hot-added memory pages are not brought online or when some memory blocks
are sent offline the subsequent ballooning process kills the guest with the OOM
killer. This happens because we report these pages as neither used nor free,
and the host algorithm apparently considers them unused. Keep track of all
online/offline operations and report all currently offline pages as being
used so the host won't try to balloon them out.
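
The accounting can be sketched as follows. This is an illustration only:
struct balloon_model and pages_reported_used() are hypothetical names, and the
real report may include further terms.

```c
#include <assert.h>

/* Hypothetical counters, all in pages; not the driver's own struct. */
struct balloon_model {
	unsigned long committed;	/* pages the guest itself has in use */
	unsigned long ballooned;	/* pages currently ballooned out */
	unsigned long hot_added;	/* pages hot-added by the host */
	unsigned long onlined;		/* hot-added pages currently online */
};

/* Count hot-added pages that are not online as used. */
static unsigned long pages_reported_used(const struct balloon_model *dm)
{
	unsigned long offline = 0;

	if (dm->hot_added > dm->onlined)
		offline = dm->hot_added - dm->onlined;

	assert(offline <= dm->hot_added);
	return dm->committed + dm->ballooned + offline;
}
```

In this model a 32768-page block that was hot-added but never onlined raises the
reported usage by 32768 pages, so the host has no reason to treat it as free.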

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Vitaly Kuznetsov b05d8d9ef5 Drivers: hv: hv_balloon: eliminate the trylock path in acquire/release_region_mutex
When many memory regions are being added and automatically onlined the
following lockup is sometimes observed:

INFO: task udevd:1872 blocked for more than 120 seconds.
...
Call Trace:
 [<ffffffff816ec0bc>] schedule_timeout+0x22c/0x350
 [<ffffffff816eb98f>] wait_for_common+0x10f/0x160
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20
 [<ffffffff816eb9fd>] wait_for_completion+0x1d/0x20
 [<ffffffff8144cb9c>] hv_memory_notifier+0xdc/0x120
 [<ffffffff816f298c>] notifier_call_chain+0x4c/0x70
...

When several memory blocks are going online simultaneously we get several
hv_memory_notifier() callers trying to acquire the ha_region_mutex. When this
mutex is being held by hot_add_req() all these competing acquire_region_mutex()
calls do mutex_trylock, fail, and queue themselves into wait_for_completion(..).
However, when we do complete() from release_region_mutex() only one of them
wakes up. This could be solved by changing complete() -> complete_all(), but
memory onlining can be delayed as well; in that case we can still get several
hv_memory_notifier() runners at the same time trying to grab the mutex. Only
one of them will succeed and the others will hang forever as complete() is not
being called. We don't see this issue often because we have a 5 sec onlining
timeout in hv_mem_hot_add() and usually all udev events arrive in this time
frame.

Get rid of the trylock path, waiting on the mutex is supposed to provide the
required serialization.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:31:47 -08:00
Nicholas Mc Guire b057b3ad16 hv: hv_balloon: match var type to return type of wait_for_completion
The return type of wait_for_completion_timeout() is unsigned long, not int. This
patch changes the type of t from int to unsigned long.

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01 19:30:08 -08:00
K. Y. Srinivasan ab3de22bb4 Drivers: hv: hv_balloon: Don't post pressure status from interrupt context
We currently release memory (balloon down) in the interrupt context and we also
post memory status while releasing memory. Rather than posting the status
in the interrupt context, wakeup the status posting thread to post the status.
This will address the inconsistent lock state that Sitsofe Wheeler <sitsofe@gmail.com>
reported:

http://lkml.iu.edu/hypermail/linux/kernel/1411.1/00075.html

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-25 09:17:57 -08:00
K. Y. Srinivasan 22f88475b6 Drivers: hv: hv_balloon: Fix a locking bug in the balloon driver
We support memory hot-add in the Hyper-V balloon driver by hot adding an appropriately
sized and aligned region and controlling the on-lining of pages within that region
based on the pages that the host wants us to online. We do this because the
granularity and alignment requirements in Linux are different from what Windows
expects. The state to manage the onlining of pages needs to be correctly
protected. Fix this bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-25 09:17:57 -08:00
K. Y. Srinivasan 79208c57da Drivers: hv: hv_balloon: Make adjustments in computing the floor
Make adjustments in computing the balloon floor. The current computation
of the balloon floor was not appropriate for virtual machines with more than
10 GB of assigned memory - we would get into situations where the host would
aggressively balloon down the guest and leave the guest in an unusable state.
This patch fixes the issue by raising the floor.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-25 09:17:57 -08:00
Dexuan Cui f671223847 hv: hv_balloon: avoid memory leak on alloc_error of 2MB memory block
If num_ballooned is not 0, we shouldn't neglect the
already-partially-allocated 2MB memory block(s).

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-26 19:03:04 -08:00
K. Y. Srinivasan ae339336dc Drivers: hv: balloon: Ensure pressure reports are posted regularly
The current code posts periodic memory pressure status from a dedicated thread.
Under some conditions, especially when we are releasing a lot of memory into
the guest, we may not send timely pressure reports back to the host. Fix this
issue by reporting pressure in all contexts that can be active in this driver.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-03 19:25:17 -04:00
K. Y. Srinivasan 5dba4c56df Drivers: hv: Balloon: Make pressure posting thread sleep interruptibly
The non-interruptible sleep of the memory pressure posting thread
results in higher reported load average. Make this sleep interruptible.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-15 12:08:38 -08:00
Olaf Hering cfc25993e8 Drivers: hv: remove HV_DRV_VERSION
Remove HV_DRV_VERSION, it has no meaning for upstream drivers.

Initially it was supposed to show the "Linux Integration Services"
version, now it is not in sync anymore with the out-of-tree drivers
available from the MSFT website.

The only place where a version string is still required is the KVP
command "IntegrationServicesVersion" which is handled by
tools/hv/hv_kvp_daemon.c. To satisfy such KVP request from the host pass
the current string to the daemon during KVP userland registration.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by:  K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-02 11:34:30 +08:00
Greg Kroah-Hartman 9c5891bd43 Merge 3.11-rc3 into char-misc-next.
This resolves a merge issue with:
	drivers/misc/mei/init.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-29 11:50:17 -07:00
K. Y. Srinivasan 20138d6cb8 Drivers: hv: balloon: Initialize the transaction ID just before sending the packet
Each message sent from the guest carries with it a transaction ID.
Assign the transaction ID just before putting the message on the VMBUS.
This would help in debugging on the host side.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-26 16:40:42 -07:00
K. Y. Srinivasan c5e2254f8d Drivers: hv: balloon: Do not post pressure status if interrupted
When we are posting pressure status, we may get interrupted and handle
the un-balloon operation. In this case just don't post the status as we
know the pressure status is stale.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-16 23:19:19 -07:00
K. Y. Srinivasan ed07ec93e8 Drivers: hv: balloon: Fix a bug in the hot-add code
As we hot-add 128 MB chunks of memory, we wait to ensure that the memory
is onlined before attempting to hot-add the next chunk. If the udev rule for
memory hot-add is not executed within the allowed time, we would rollback the
state and abort further hot-add. Since the hot-add has succeeded and the only
failure is that the memory is not onlined within the allowed time, we should not
be rolling back the state. Fix this bug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Stable <stable@vger.kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-16 23:19:19 -07:00
K. Y. Srinivasan 7f4f2302a1 Drivers: hv: Notify the host of permanent hot-add failures
If memory hot-add fails with the error -EEXIST, then this is a permanent
failure. Notify the host of this information, so the host will not attempt
hot-add again. If the failure is transient, the host will attempt
a hot-add again after some delay.

In this version of the patch, I have added some additional comments
to clarify how the host treats different failure conditions.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29 09:03:21 -07:00
K. Y. Srinivasan f766dc1ea5 Drivers: hv: balloon: Support 2M page allocations for ballooning
On Hyper-V it will be very efficient to use 2M allocations in the guest as this
makes the ballooning protocol with the host that much more efficient. Hyper-V
uses page ranges (start pfn : number of pages) to specify memory being moved
around and with 2M pages this encoding can be very efficient. However, when
memory is returned to the guest, the host does not guarantee any granularity.
To deal with this issue, split the page soon after a successful 2M allocation
so that this memory can potentially be freed as 4K pages.

If 2M allocations fail, we revert to 4K allocations.
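
To illustrate why 2M allocations make the (start pfn : number of pages) encoding
compact, here is a sketch that coalesces a list of page frame numbers into
ranges. struct pfn_range and build_ranges() are hypothetical names used for
illustration and do not reflect the actual message layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range descriptor: start pfn plus number of pages. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long num_pages;
};

/*
 * Merge each pfn into the previous range when it directly follows it,
 * otherwise open a new range. Returns the number of ranges written and
 * stops early once max_ranges entries are in use.
 */
static size_t build_ranges(const unsigned long *pfns, size_t n,
			   struct pfn_range *out, size_t max_ranges)
{
	size_t i, used = 0;

	assert(out != NULL || max_ranges == 0);
	for (i = 0; i < n; i++) {
		if (used > 0 &&
		    out[used - 1].start_pfn + out[used - 1].num_pages == pfns[i]) {
			out[used - 1].num_pages++;
			continue;
		}
		if (used == max_ranges)
			break;
		out[used].start_pfn = pfns[i];
		out[used].num_pages = 1;
		used++;
	}
	return used;
}
```

A 2M allocation is 512 contiguous 4K pages and collapses into a single range,
while 512 scattered 4K pages could need up to 512 ranges.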

In this version of the patch, based on the feedback from Michal Hocko
<mhocko@suse.cz>, I have added some additional commentary to the patch
description.

Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29 09:03:21 -07:00
K. Y. Srinivasan 647965a268 Drivers: hv: balloon: Permit Linux to specify hot-add alignment requirements
Some Windows hosts permit the guest to specify memory hot-add alignment
requirements (if any). Linux currently requires a 128MB alignment on memory
segments that can be hot-added. Specify this alignment requirement to the
host.
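
In page frame terms, a 128MB alignment corresponds to 0x8000 pages if 4 KB pages
are assumed. The sketch below shows the rounding this implies; region_start()
and region_pages() are hypothetical helpers, not driver functions.

```c
#include <assert.h>

#define HA_CHUNK_PAGES 0x8000UL	/* 128 MB in 4 KB pages (assumed page size) */

/* Round a starting pfn down to the previous 128 MB boundary. */
static unsigned long region_start(unsigned long pg_start)
{
	return pg_start - pg_start % HA_CHUNK_PAGES;
}

/* Size, in pages, of the aligned region covering pfn_cnt pages at pg_start. */
static unsigned long region_pages(unsigned long pg_start, unsigned long pfn_cnt)
{
	unsigned long start = region_start(pg_start);
	unsigned long end = pg_start + pfn_cnt;
	unsigned long chunks;

	assert(end >= pg_start);	/* no wrap-around in this model */
	chunks = (end - start + HA_CHUNK_PAGES - 1) / HA_CHUNK_PAGES;
	return chunks * HA_CHUNK_PAGES;
}
```

For a request of 111616 pages starting at pfn 0x48000 this yields 131072 pages,
that is, four 128MB chunks.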

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-29 08:53:12 -07:00
Wei Yongjun a6025a2a86 Drivers: hv: balloon: make local functions static
Make local functions static where they are not used outside this file.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-25 13:31:24 -07:00
K. Y. Srinivasan 1cac8cd4d1 Drivers: hv: balloon: Implement hot-add functionality
Implement the memory hot-add functionality. With this, Linux guests can fully
participate in the Dynamic Memory protocol implemented in the Windows hosts.

In this version of the patch, based on Olaf Hering's feedback, I have gotten
rid of the module level dependency on MEMORY_HOTPLUG. Instead the code within
the driver that depends on MEMORY_HOTPLUG has the appropriate compilation
switches. This would allow this driver to support pure ballooning in cases
where the kernel does not support memory hotplug.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-15 12:11:51 -07:00
K. Y. Srinivasan 0cf40a3e66 Drivers: hv: balloon: Make the balloon driver not unloadable
The balloon driver is stateful. For instance, it needs to keep track of pages
that have been ballooned out to properly post pressure reports. This state cannot
be re-constructed if the driver were to be unloaded and subsequently loaded.
Furthermore, as we support memory hot-add as part of this driver, this driver becomes
even more stateful and this state cannot be re-created. Make the balloon driver
not unloadable to deal with this issue.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-15 12:11:50 -07:00
K. Y. Srinivasan c51af826cf Drivers: hv: balloon: Execute hot-add code in a separate context
Execute the hot-add operation in a separate work context.
This allows us to decouple the pressure reporting activity from the
"hot-add" activity. Testing has shown that this makes the guest more
responsive to hot add requests.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-15 12:11:50 -07:00
K. Y. Srinivasan 6571b2dab4 Drivers: hv: balloon: Execute balloon inflation in a separate context
Execute the balloon inflation operation in a separate work context.
This allows us to decouple the pressure reporting activity from the
ballooning activity. Testing has shown that this decoupling makes the
guest more responsive.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-15 12:11:50 -07:00
K. Y. Srinivasan 7a64b864a0 Drivers: hv: balloon: Do not request completion notification
There is no need to request completion notification; get rid of it.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-15 12:11:50 -07:00
K. Y. Srinivasan 1c7db96f6f Drivers: hv: balloon: Prevent the host from ballooning the guest too low
Based on the amount of memory being managed set a floor on how low the
guest can be ballooned.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-08 15:42:01 -08:00
K. Y. Srinivasan e500d158fb Drivers: hv: balloon: Add a parameter to delay pressure reporting
Delay reporting memory pressure by a specified amount of time.
This addresses the problem where the host may take memory balancing
decisions based on incorrect memory pressure data that will be posted
as soon as the balloon driver is loaded.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-08 15:42:01 -08:00
K. Y. Srinivasan 0731572b6c Drivers: hv: balloon: Make adjustments to the pressure report
The host expects that the pressure report includes the pressure due to the
pages that have been ballooned. Make necessary adjustments to reflect that.
Also, include the free memory information in the pressure report.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-30 00:06:07 -05:00
Greg Kroah-Hartman 74790147fb Merge 3.8-rc5 into char-misc-next
This pulls in all of the 3.8-rc5 fixes into this branch so we can test easier.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 12:34:27 -08:00
K. Y. Srinivasan d13984e5c7 Drivers: hv: Use consolidated GUID definitions
Use the consolidated GUID definitions in the util and balloon drivers.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25 11:17:31 -08:00
K. Y. Srinivasan 33080c1cda Drivers: hv: balloon: Fix a memory leak
The send buffer was being leaked; fix it.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 11:58:00 -08:00
K. Y. Srinivasan 6427a0d771 Drivers: hv: balloon: Fix a bug in the definition of struct dm_info_msg
There is a bug in the definition of struct dm_info_msg. This patch fixes
the definition of this structure and makes the corresponding adjustments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 11:58:00 -08:00
Wei Yongjun 10d498b13f hv: hv_balloon: remove duplicated include from hv_balloon.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-17 10:41:46 -08:00
Greg Kroah-Hartman 989623c7d6 hv: hv_balloon: mark a function static
This resolves the following sparse warning:

drivers/hv/hv_balloon.c:548:6: sparse: symbol 'free_balloon_pages' was not declared. Should it be static?

Reported-by: Xie ChanglongX <changlongx.xie@intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-21 12:46:40 -08:00
K. Y. Srinivasan 9aa8b50b2b Drivers: hv: Add Hyper-V balloon driver
Add the basic balloon driver. Windows hosts dynamically manage the guest
memory allocation via a combination of memory hot add and ballooning. Memory
hot add is used to grow the guest memory up to the maximum memory that can be
allocated to the guest. Ballooning is used to both shrink as well as expand
up to the max memory. Supporting hot add needs additional support from the
host. We will support hot add when this support is available. For now,
by setting the VM startup memory to the VM max memory, we can use
ballooning alone to dynamically manage memory allocation amongst
competing guests on a given host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-15 15:42:09 -08:00