When the guest schedules a SIE with a FORMAT-0 CRYCB,
we are able to schedule it in the host with a FORMAT-1
CRYCB if the host uses FORMAT-1 or FORMAT-0.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Message-Id: <20180925231641.4954-22-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When the host and the guest both use a FORMAT-0 CRYCB,
we copy the guest's FORMAT-0 APCB to a shadow CRYCB
for use by vSIE.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Message-Id: <20180925231641.4954-21-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When the host and guest both use a FORMAT-1 CRYCB, we copy
the guest's FORMAT-0 APCB to a shadow CRYCB for use by
vSIE.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Message-Id: <20180925231641.4954-20-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When the guest and the host both use CRYCB FORMAT-2,
we copy the guest's FORMAT-1 APCB to a FORMAT-1
shadow APCB.
This patch also cleans up the shadow_crycb() function.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Message-Id: <20180925231641.4954-19-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The comment preceding the shadow_crycb function is
misleading, we effectively accept FORMAT2 CRYCB in the
guest.
When using FORMAT2 in the host we do not need to or with
FORMAT1.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180925231641.4954-18-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We need to handle the validity checks for the crycb, no matter what the
settings for the keywrappings are. So lets move the keywrapping checks
after we have done the validy checks.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180925231641.4954-17-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When we clear the Crypto Control Block (CRYCB) used by a guest
level 2, the vSIE shadow CRYCB for guest level 3 must be updated
before the guest uses it.
We achieve this by using the KVM_REQ_VSIE_RESTART synchronous
request for each vCPU belonging to the guest to force the reload
of the shadow CRYCB before rerunning the guest level 3.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Message-Id: <20180925231641.4954-16-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Implements the VFIO_DEVICE_RESET ioctl. This ioctl zeroizes
all of the AP queues assigned to the guest.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20180925231641.4954-15-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured
for a mediated matrix device when it is released.
Zeroizing a queue resets the queue, clears all pending
messages for the queue entries and disables adapter interruptions
associated with the queue.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-14-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Adds support for the VFIO_DEVICE_GET_INFO ioctl to the VFIO
AP Matrix device driver. This is a minimal implementation,
as vfio-ap does not use I/O regions.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20180925231641.4954-13-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Implements the open callback on the mediated matrix device.
The function registers a group notifier to receive notification
of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
the vfio_ap device driver will get access to the guest's
kvm structure. The open callback must ensure that only one
mediated device shall be opened per guest.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-12-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduces a new KVM function to clear the APCB0 and APCB1 in the guest's
CRYCB. This effectively clears all bits of the APM, AQM and ADM masks
configured for the guest. The VCPUs are taken out of SIE to ensure the
VCPUs do not get out of sync.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-11-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Provides a sysfs interface to view the AP matrix configured for the
mediated matrix device.
The relevant sysfs structures are:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. matrix
To view the matrix configured for the mediated matrix device,
print the matrix file:
cat matrix
Below are examples of the output from the above command:
Example 1: Adapters and domains assigned
Assignments:
Adapters 5 and 6
Domains 4 and 71 (0x47)
Output
05.0004
05.0047
06.0004
06.0047
Examples 2: Only adapters assigned
Assignments:
Adapters 5 and 6
Output:
05.
06.
Examples 3: Only domains assigned
Assignments:
Domains 4 and 71 (0x47)
Output:
.0004
.0047
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20180925231641.4954-10-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Provides the sysfs interfaces for:
1. Assigning AP control domains to the mediated matrix device
2. Unassigning AP control domains from a mediated matrix device
3. Displaying the control domains assigned to a mediated matrix
device
The IDs of the AP control domains assigned to the mediated matrix
device are stored in an AP domain mask (ADM). The bits in the ADM,
from most significant to least significant bit, correspond to
AP domain numbers 0 to 255. On some systems, the maximum allowable
domain number may be less than 255 - depending upon the host's
AP configuration - and assignment may be rejected if the input
domain ID exceeds the limit.
When a control domain is assigned, the bit corresponding its domain
ID will be set in the ADM. Likewise, when a domain is unassigned,
the bit corresponding to its domain ID will be cleared in the ADM.
The relevant sysfs structures are:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_control_domain
.................. unassign_control_domain
To assign a control domain to the $uuid mediated matrix device's
ADM, write its domain number to the assign_control_domain file.
To unassign a domain, write its domain number to the
unassign_control_domain file. The domain number is specified
using conventional semantics: If it begins with 0x the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it is parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign control domain 173 (0xad) to the mediated
matrix device $uuid:
echo 173 > assign_control_domain
or
echo 0255 > assign_control_domain
or
echo 0xad > assign_control_domain
To unassign control domain 173 (0xad):
echo 173 > unassign_control_domain
or
echo 0255 > unassign_control_domain
or
echo 0xad > unassign_control_domain
The assignment will be rejected if the APQI exceeds the maximum
value for an AP domain:
* If the AP Extended Addressing (APXA) facility is installed,
the max value is 255
* Else the max value is 15
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20180925231641.4954-9-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduces two new sysfs attributes for the VFIO mediated
matrix device for assigning AP domains to and unassigning
AP domains from a mediated matrix device. The IDs of the
AP domains assigned to the mediated matrix device will be
stored in an AP queue mask (AQM).
The bits in the AQM, from most significant to least
significant bit, correspond to AP queue index (APQI) 0 to
255 (note that an APQI is synonymous with with a domain ID).
On some systems, the maximum allowable domain number may be
less than 255 - depending upon the host's AP configuration -
and assignment may be rejected if the input domain ID exceeds
the limit.
When a domain is assigned, the bit corresponding to the APQI
will be set in the AQM. Likewise, when a domain is unassigned,
the bit corresponding to the APQI will be cleared from the AQM.
In order to successfully assign a domain, the APQNs derived from
the domain ID being assigned and the adapter numbers of all
adapters previously assigned:
1. Must be bound to the vfio_ap device driver.
2. Must not be assigned to any other mediated matrix device.
If there are no adapters assigned to the mdev, then there must
be an AP queue bound to the vfio_ap device driver with an
APQN containing the domain ID (i.e., APQI), otherwise all
adapters subsequently assigned will fail because there will be no
AP queues bound with an APQN containing the APQI.
Assigning or un-assigning an AP domain will also be rejected if
a guest using the mediated matrix device is running.
The relevant sysfs structures are:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_domain
.................. unassign_domain
To assign a domain to the $uuid mediated matrix device,
write the domain's ID to the assign_domain file. To
unassign a domain, write the domain's ID to the
unassign_domain file. The ID is specified using
conventional semantics: If it begins with 0x, the number
will be parsed as a hexadecimal (case insensitive) number;
if it begins with 0, it will be parsed as an octal number;
otherwise, it will be parsed as a decimal number.
For example, to assign domain 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_domain
or
echo 0255 > assign_domain
or
echo 0xad > assign_domain
To unassign domain 173 (0xad):
echo 173 > unassign_domain
or
echo 0255 > unassign_domain
or
echo 0xad > unassign_domain
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20180925231641.4954-8-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduces two new sysfs attributes for the VFIO mediated
matrix device for assigning AP adapters to and unassigning
AP adapters from a mediated matrix device. The IDs of the
AP adapters assigned to the mediated matrix device will be
stored in an AP mask (APM).
The bits in the APM, from most significant to least significant
bit, correspond to AP adapter IDs (APID) 0 to 255. On
some systems, the maximum allowable adapter number may be less
than 255 - depending upon the host's AP configuration - and
assignment may be rejected if the input adapter ID exceeds the
limit.
When an adapter is assigned, the bit corresponding to the APID
will be set in the APM. Likewise, when an adapter is
unassigned, the bit corresponding to the APID will be cleared
from the APM.
In order to successfully assign an adapter, the APQNs derived from
the adapter ID being assigned and the queue indexes of all domains
previously assigned:
1. Must be bound to the vfio_ap device driver.
2. Must not be assigned to any other mediated matrix device
If there are no domains assigned to the mdev, then there must
be an AP queue bound to the vfio_ap device driver with an
APQN containing the APID, otherwise all domains
subsequently assigned will fail because there will be no
AP queues bound with an APQN containing the adapter ID.
Assigning or un-assigning an AP adapter will be rejected if
a guest using the mediated matrix device is running.
The relevant sysfs structures are:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
...............[$uuid]
.................. assign_adapter
.................. unassign_adapter
To assign an adapter to the $uuid mediated matrix device's APM,
write the APID to the assign_adapter file. To unassign an adapter,
write the APID to the unassign_adapter file. The APID is specified
using conventional semantics: If it begins with 0x the number will
be parsed as a hexadecimal number; if it begins with a 0 the number
will be parsed as an octal number; otherwise, it will be parsed as a
decimal number.
For example, to assign adapter 173 (0xad) to the mediated matrix
device $uuid:
echo 173 > assign_adapter
or
echo 0xad > assign_adapter
or
echo 0255 > assign_adapter
To unassign adapter 173 (0xad):
echo 173 > unassign_adapter
or
echo 0xad > unassign_adapter
or
echo 0255 > unassign_adapter
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-7-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Registers the matrix device created by the VFIO AP device
driver with the VFIO mediated device framework.
Registering the matrix device will create the sysfs
structures needed to create mediated matrix devices
each of which will be used to configure the AP matrix
for a guest and connect it to the VFIO AP device driver.
Registering the matrix device with the VFIO mediated device
framework will create the following sysfs structures:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ create
To create a mediated device for the AP matrix device, write a UUID
to the create file:
uuidgen > create
A symbolic link to the mediated device's directory will be created in the
devices subdirectory named after the generated $uuid:
/sys/devices/vfio_ap/matrix/
...... [mdev_supported_types]
......... [vfio_ap-passthrough]
............ [devices]
............... [$uuid]
A symbolic link to the mediated device will also be created
in the vfio_ap matrix's directory:
/sys/devices/vfio_ap/matrix/[$uuid]
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <20180925231641.4954-6-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduces a new AP device driver. This device driver
is built on the VFIO mediated device framework. The framework
provides sysfs interfaces that facilitate passthrough
access by guests to devices installed on the linux host.
The VFIO AP device driver will serve two purposes:
1. Provide the interfaces to reserve AP devices for exclusive
use by KVM guests. This is accomplished by unbinding the
devices to be reserved for guest usage from the zcrypt
device driver and binding them to the VFIO AP device driver.
2. Implements the functions, callbacks and sysfs attribute
interfaces required to create one or more VFIO mediated
devices each of which will be used to configure the AP
matrix for a guest and serve as a file descriptor
for facilitating communication between QEMU and the
VFIO AP device driver.
When the VFIO AP device driver is initialized:
* It registers with the AP bus for control of type 10 (CEX4
and newer) AP queue devices. This limitation was imposed
due to:
1. A desire to keep the code as simple as possible;
2. Some older models are no longer supported by the kernel
and others are getting close to end of service.
3. A lack of older systems on which to test older devices.
The probe and remove callbacks will be provided to support
the binding/unbinding of AP queue devices to/from the VFIO
AP device driver.
* Creates a matrix device, /sys/devices/vfio_ap/matrix,
to serve as the parent of the mediated devices created, one
for each guest, and to hold the APQNs of the AP devices bound to
the VFIO AP device driver.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-5-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch refactors the code that initializes and sets up the
crypto configuration for a guest. The following changes are
implemented via this patch:
1. Introduces a flag indicating AP instructions executed on
the guest shall be interpreted by the firmware. This flag
is used to set a bit in the guest's state description
indicating AP instructions are to be interpreted.
2. Replace code implementing AP interfaces with code supplied
by the AP bus to query the AP configuration.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <20180925231641.4954-4-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When we change the crycb (or execution controls), we also have to make sure
that the vSIE shadow datastructures properly consider the changed
values before rerunning the vSIE. We can achieve that by simply using a
VCPU request now.
This has to be a synchronous request (== handled before entering the
(v)SIE again).
The request will make sure that the vSIE handler is left, and that the
request will be processed (NOP), therefore forcing a reload of all
vSIE data (including rebuilding the crycb) when re-entering the vSIE
interception handler the next time.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180925231641.4954-3-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
VCPU requests and VCPU blocking right now don't take care of the vSIE
(as it was not necessary until now). But we want to have synchronous VCPU
requests that will also be handled before running the vSIE again.
So let's simulate a SIE entry of the VCPU when calling the sie during
vSIE handling and check for PROG_ flags. The existing infrastructure
(e.g. exit_sie()) will then detect that the SIE (in form of the vSIE) is
running and properly kick the vSIE CPU, resulting in it leaving the vSIE
loop and therefore the vSIE interception handler, allowing it to handle
VCPU requests.
E.g. if we want to modify the crycb of the VCPU and make sure that any
masks also get applied to the VSIE crycb shadow (which uses masks from the
VCPU crycb), we will need a way to hinder the vSIE from running and make
sure to process the updated crycb before reentering the vSIE again.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180925231641.4954-2-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM has an old optimization whereby accesses to the kernel GS base MSR
are trapped when the guest is in 32-bit and not when it is in 64-bit mode.
The idea is that swapgs is not available in 32-bit mode, thus the
guest has no reason to access the MSR unless in 64-bit mode and
32-bit applications need not pay the price of switching the kernel GS
base between the host and the guest values.
However, this optimization adds complexity to the code for little
benefit (these days most guests are going to be 64-bit anyway) and in fact
broke after commit 678e315e78 ("KVM: vmx: add dedicated utility to
access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base
can be corrupted across SMIs and UEFI Secure Boot is therefore broken
(a secure boot Linux guest, for example, fails to reach the login prompt
about half the time). This patch just removes the optimization; the
kernel GS base MSR is now never trapped by KVM, similarly to the FS and
GS base MSRs.
Fixes: 678e315e78
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Fix OMAP Device Tree compatible strings to match DT
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAlunqzUACgkQUa+KL4f8
d2H0pA/7Bobyk3ENOmKag1KbF6N8cXVhh7RZ4UQ7zZW3ReFmOYcScA9UkFNgYbxx
LSW+0Tw7IjWbeTi43AxYMnOYIXCDcaqJma72XsOPXD8dS2YEv7wHthAxzVdenjUR
7CMA5QrsLZKrJk466iArip4k7VLeUDRW7eVPsE9UOL3jzcaRsfzXxUvA4+BWBl/X
ueSh2NGMD2e6yqAYIwstVaEeUNdjEcSz+vnk0m2UAvJM2ox4e5OBMF5FHucZ42KN
LFQbOBCqhCKJ26hudQ+Zc3purkvHVnKSZHqKA4IvMP5Rb+BjPH0sU6gpTidaxQ2x
S6uzYELA+IbTqlKlgH1W84aThqaMqwSOJzjoi0zAQcNvuaffblNyqLUQEoSsyDgj
2aVAr+A5Duxt2A8qckKWk5dy6TAgcE00ImGSIndferUw1WCj22xsTQc3J3xO46r0
IYT6TLsmDoNo913gFfB6t8OdvNWaUMLGABe89NnQFd2mfBo3HMbS14JJAOnH3V2K
cCSbOdj2jHhw7zBfPTts7qSQC3CJw2QluhOPLf9NBl0AHJZYuk30DsTCNX1GDkyq
m73cP0owXREOSWR3LpS8M3pbbu+Jd/JOWkPVRv3mgA/47DPqNbH1A4nXz8mdDqhQ
ao+44ri2D3hUeFsHsjc+oKYsgxuoPPKGssPIFwJU8dT925RZX/Q=
=5veh
-----END PGP SIGNATURE-----
Merge tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Lee writes:
"MFD fixes for v4.19
- Fix Dialog DA9063 regulator constraints issue causing failure in
probe
- Fix OMAP Device Tree compatible strings to match DT"
* tag 'mfd-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: omap-usb-host: Fix dts probe of children
mfd: da9063: Fix DT probing with constraints
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW6dGhgAKCRCAXGG7T9hj
vs1UAPwPSDmelfUus+P5ndRQZdK/iQWuRgQUe9Gd3RUVTfcQ7AEAljcN4/dSj7SB
hOgRlCp5WB1s5/vFF7z4jc2wtqvOPAk=
=8P9c
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Juergen writes:
"xen:
Two small fixes for xen drivers."
* tag 'for-linus-4.19d-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: issue warning message when out of grant maptrack entries
xen/x86/vpmu: Zero struct pt_regs before calling into sample handling code
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlumsbMQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvepEACYfXXSUuDrZ/hlTOnkCEuOIMhvwDlCSq+z
QjLPUQCmnIeWthSbTAZ8skUEgqUtlkMkJcTUB+JIDlub3hZVMvNDvVi7JH5kkDVo
GuVXA+Ip4O9AOuwMObgGXD8uWpCJ1muF2oUrFnZsUYwQLyZ/vnQFXPxdyVtagZtL
Udwr9U2jfOzLa4fhhn+lDmLXLQSS4F1s4BxSn0HGlq7MIU3+/miZeMOwo+xtw+cn
a06pdd+sdIHWbOiLP0ycVbtwasaNhdVG5vj0ViMnZw/N1MidBirzqbsdZ3dGdnoE
V1Ba1axoMdm/mpGt7Xb0J2oze7rAcGNAGHJ0dEU51oj8X32kIwlXuLTJWv3Z0joF
KEvJBVGK+2/K3jbmibrHFXE7pe+9cKGgV+s3C973XpoYtdIMxi4D7iADwXvSf+vw
8bWUp5RTf0aMNTdkqn7Mj+w0NzNgd4pSLgNgUZdrWpggTl021xOkyS2T/cNDY9pj
J4eJu+gSNN8dAu3NZmZb+7mdGaXwXoaKMWnYFr3ADf9bpkSosQOfSFClRXFqR1ng
54b3nVL9UdqQNltuppu/LCQWAbU464jAS3uoR3h4e4WGFVL4oMpA6OMc0XzODkEO
X+7LHq5tQcPu9eTFmGjchS1tK6AhyQVkHeOkQaLdJLWvlQoMOQgmeAhh4HaF7XqM
dBeuwZqBqQ==
=7uhx
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180922' of git://git.kernel.dk/linux-block
Jens writes:
"Just a single fix in this pull request, fixing a regression in
/proc/diskstats caused by the unification of timestamps."
* tag 'for-linus-20180922' of git://git.kernel.dk/linux-block:
block: use nanosecond resolution for iostat
Thomas writes:
"A set of fixes for x86:
- Resolve the kvmclock regression on AMD systems with memory
encryption enabled. The rework of the kvmclock memory allocation
during early boot results in encrypted storage, which is not
shareable with the hypervisor. Create a new section for this data
which is mapped unencrypted and take care that the later
allocations for shared kvmclock memory is unencrypted as well.
- Fix the build regression in the paravirt code introduced by the
recent spectre v2 updates.
- Ensure that the initial static page tables cover the fixmap space
correctly so early console always works. This worked so far by
chance, but recent modifications to the fixmap layout can -
depending on kernel configuration - move the relevant entries to a
different place which is not covered by the initial static page
tables.
- Address the regressions and issues which got introduced with the
recent extensions to the Intel Recource Director Technology code.
- Update maintainer entries to document reality"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Expand static page table for fixmap space
MAINTAINERS: Add X86 MM entry
x86/intel_rdt: Add Reinette as co-maintainer for RDT
MAINTAINERS: Add Borislav to the x86 maintainers
x86/paravirt: Fix some warning messages
x86/intel_rdt: Fix incorrect loop end condition
x86/intel_rdt: Fix exclusive mode handling of MBA resource
x86/intel_rdt: Fix incorrect loop end condition
x86/intel_rdt: Do not allow pseudo-locking of MBA resource
x86/intel_rdt: Fix unchecked MSR access
x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
x86/intel_rdt: Global closid helper to support future fixes
x86/intel_rdt: Fix size reporting of MBA resource
x86/intel_rdt: Fix data type in parsing callbacks
x86/kvm: Use __bss_decrypted attribute in shared variables
x86/mm: Add .bss..decrypted section to hold shared variables
Thomas writes:
"- Provide a strerror_r wrapper so lib/bpf can be built on systems
without _GNU_SOURCE
- Unbreak the man page generator when building out of tree"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf Documentation: Fix out-of-tree asciidoctor man page generation
tools lib bpf: Provide wrapper for strerror_r to build in !_GNU_SOURCE systems
Thomas writes:
"Make the EFI arm stub device tree loader default on to unbreak
existing EFI boot loaders which do not have DTB support."
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/libstub/arm: default EFI_ARMSTUB_DTB_LOADER to y
Klaus Kusche reported that the I/O busy time in /proc/diskstats was not
updating properly on 4.18. This is because we started using ktime to
track elapsed time, and we convert nanoseconds to jiffies when we update
the partition counter. However, this gets rounded down, so any I/Os that
take less than a jiffy are not accounted for. Previously in this case,
the value of jiffies would sometimes increment while we were doing I/O,
so at least some I/Os were accounted for.
Let's convert the stats to use nanoseconds internally. We still report
milliseconds as before, now more accurately than ever. The value is
still truncated to 32 bits for backwards compatibility.
Fixes: 522a777566 ("block: consolidate struct request timestamp fields")
Cc: stable@vger.kernel.org
Reported-by: Klaus Kusche <klaus.kusche@computerix.info>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Two fixes for the Intel pin controllers than cause
problems on laptops.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbpSNBAAoJEEEQszewGV1zhIEQAJmn3wGuCSnycJghe0oCNtGz
fk2YcWmg1TmzaXsq1zhtL7Nx3wKHrNrvVOw091Wm1nP+XO9REglu+03mXIxpmb8i
wGSBDApsux+5Pit+SxKKbryDHlOeXmhMA3dBPQTdM7phaiLSkH36BucKYNjAesqu
qYa2AXrxy4akb8ot6lLZQNWaBNWRsFZ07X6y9Rf9H4cBEaFf2XSzf4EmbCdMNjtN
agJQURc8pTaJ7Z4ovtx0fs9E4jzvI+4Ln+sT6JlxCv/NL36pI71YaiRSaBI9LphV
jL1nW5cjE6gMdSNZ0LVuR4bD5M+P7mAFx98rVQ/MlSADfn2uxpOLW8CiJHkXCCuI
WG6lRmatJgWfw2be/01GGr9UA9J25QdLragMtz+lIp0SUrI+PTcnir2TksmuEJ0/
xZWHMaTDeFsnmBVhEe22jimnJsTf8tbupDYIp3eYq6TSAxZbFY5YlhxksE8UGJ0t
kgjrapBpQul6hC5KXbYXCOPqbibp6auyu/wThAS3KKU0XrfZOpkPjLvp9gW1306Q
C+wlg7BViQNQzYt9B4/G6jcXB3TDV/AbCQYaMYSRFJ01OW0LA/49HHLt1bfM6h4E
Ve/wwu5FHLoaR1uSK6n68wcONjcU7O2J5lVNxwzJ4KP147piB01NwzxRQqgyGtun
yYHKoEx/hM8eEw2BNYW/
=NdkN
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Linus writes:
"Pin control fixes for v4.19:
- Two fixes for the Intel pin controllers than cause
problems on laptops."
* tag 'pinctrl-v4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: Do pin translation in other GPIO operations as well
pinctrl: cannonlake: Fix gpio base for GPP-E
I swear I would have sent it the same to Linus! The main cause for
this is that I was on vacation until two weeks ago and it took a while
to sort all the pending patches between 4.19 and 4.20, test them and
so on.
It's mostly small bugfixes and cleanups, mostly around x86 nested
virtualization. One important change, not related to nested
virtualization, is that the ability for the guest kernel to trap CPUID
instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is now
masked by default. This is because the feature is detected through an
MSR; a very bad idea that Intel seems to like more and more. Some
applications choke if the other fields of that MSR are not initialized
as on real hardware, hence we have to disable the whole MSR by default,
as was the case before Linux 4.12.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbpPo1AAoJEL/70l94x66DdxgH/is0qe6ZBtzb6Qc0W+8mHHD7
nxIkWAs2V5NsouJ750YwRQ+0Ym407+wlNt30acdBUEoXhrnA5/TvyGq999XvCL96
upWEIxpIgbvTMX/e2nLhe4wQdhsboUK4r0/B9IFgVFYrdCt5uRXjB2G4ewxcqxL/
GxxqrAKhaRsbQG9Xv0Fw5Vohh/Ls6fQDJcyuY1EBnbMpVenq2QDLI6cOAPXncyFb
uLN6ov4GNCWIPckwxejri5XhZesUOsafrmn48sApShh4T6TrisrdtSYdzl+DGza+
j5vhIEwdFO5kulZ3viuhqKJOnS2+F6wvfZ75IKT0tEKeU2bi+ifGDyGRefSF6Q0=
=YXLw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Paolo writes:
"It's mostly small bugfixes and cleanups, mostly around x86 nested
virtualization. One important change, not related to nested
virtualization, is that the ability for the guest kernel to trap
CPUID instructions (in Linux that's the ARCH_SET_CPUID arch_prctl) is
now masked by default. This is because the feature is detected
through an MSR; a very bad idea that Intel seems to like more and
more. Some applications choke if the other fields of that MSR are
not initialized as on real hardware, hence we have to disable the
whole MSR by default, as was the case before Linux 4.12."
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits)
KVM: nVMX: Fix bad cleanup on error of get/set nested state IOCTLs
kvm: selftests: Add platform_info_test
KVM: x86: Control guest reads of MSR_PLATFORM_INFO
KVM: x86: Turbo bits in MSR_PLATFORM_INFO
nVMX x86: Check VPID value on vmentry of L2 guests
nVMX x86: check posted-interrupt descriptor addresss on vmentry of L2
KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv
KVM: VMX: check nested state and CR4.VMXE against SMM
kvm: x86: make kvm_{load|put}_guest_fpu() static
x86/hyper-v: rename ipi_arg_{ex,non_ex} structures
KVM: VMX: use preemption timer to force immediate VMExit
KVM: VMX: modify preemption timer bit only when arming timer
KVM: VMX: immediately mark preemption timer expired only for zero value
KVM: SVM: Switch to bitmap_zalloc()
KVM/MMU: Fix comment in walk_shadow_page_lockless_end()
kvm: selftests: use -pthread instead of -lpthread
KVM: x86: don't reset root in kvm_mmu_setup()
kvm: mmu: Don't read PDPTEs when paging is not enabled
x86/kvm/lapic: always disable MMIO interface in x2APIC mode
KVM: s390: Make huge pages unavailable in ucontrol VMs
...
- A wrong UBIFS assertion in mount code
- Fix for a NULL pointer deref in mount code
- Revert of a bad fix for xattrs
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAlukuf8WHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wRnyD/40jc4PA1pP3dJcXqdqwSeN+vJZ
YWu307rE+giWWg6vRKa8zu3z614579UC7EE/I9miSrPNPYCi9VTLNzJGNMvn8aSf
8seMrbhW0kz29T4YZ5fESr6x3hMsSdmIWZkPobruul8oBGmNcUm3Ag3MBM4XbQTm
5AcWUdXVQUXPROK/Xm04hNa5pJ6wkSu5CxA+mf9YM2ZYYPPdCblxxq3NsDX99vDw
gZup7JndSzpk+IpZMQIEKUP83vh5YgCU06YPjXi5YIfXl+sQnpH0fyrWm3+1C0Yz
4T80rd3fm+7+obXdAI29mhcoZ09fGayozfqjU/fV3edW43+pUvRWced13vuLqz9Q
JannCFeN+v/nB6OPlj4JGkgNunZ5M3r6ZZIQBbY9RJuZgG0/FMbpYb62RY1z/jg9
YA7e3J/R02i7tHnPbcIz6ngKE0c42VKxTdATaybXVunx5ZPip7txj9OeJbfIYUrr
CbLMdiqIJUAAheHMUrTiF3jDNwEiMhBZA4dAGDvLmpLuyfMlN8Lwje9BdRd5VNKp
zwEGUQkDWLxniFYLtmbceFSa4IQT6z70XJn1pqdDvgjxfp8trEqhtp9GMxA2u6TA
adcug5gUzRWMQ2QuyoKIv6tkjmnII5N2KzhZH9hwBReRJUlKY9WQ9jowcGrPXhbw
F2prIwBxp8drneOeKg==
=CGLq
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs
Richard writes:
"This pull request contains fixes for UBIFS:
- A wrong UBIFS assertion in mount code
- Fix for a NULL pointer deref in mount code
- Revert of a bad fix for xattrs"
* tag 'upstream-4.19-rc4' of git://git.infradead.org/linux-ubifs:
Revert "ubifs: xattr: Don't operate on deleted inodes"
ubifs: drop false positive assertion
ubifs: Check for name being NULL while mounting
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlukFZsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpi5MD/424bOq72KPsGNGU6io2c56ecIDItpZiDvq
srDPKN/pdGQ5WZuP2E21Zz+Ry/LP4niUvjZmY88UC9SXx9nxkESyCccxVFGmlGXN
Ttno1l47bu9nvAVPBdZ8brvsODCXk03RiJUDf9NrJsROMB3Xhbo1DtxyjwFwJ/Fr
aokAD3KvFitIbrHhTozxyvzpT55icVu+ykNQypnwdmWSyvebuCvrnQTxRJDNyrgk
wYUB9bhjGaCJzfBT4U7Pi5Is7t16cOSLmw0nv7uo/CUvk1xmSNajxRybCG1IEIFs
eg7Xyo2AFd/+BMwp7cFdOncjLbLzNJW0tJn3bsoJXPVE+MAy2EYYgk760fgqePMV
8afzG6l+obDCefCySj6SMVhXZi3YE640MHWeRBxGJVW5HmWq7Himlt2BUJBvcP72
y6syVoU5gkOTtUun60uF4Lloc0If6MPqvHA5mfDQx6hmQEi/H2XBE0YJxyQoxsZg
8FDq6KMBlmBlxdzfCtOMGNPPS9LIilvgFUpfsR3XzbY5nUhKB4ZZmMCWe9qm6hOl
1yoT6RS1tNV5PL1wQu1cj+d/RHaM7ok0F9FeKhofh1w3NG2x1qajTLNHLb/no7u2
eOJ8mmDxOvJOmU+9H7/Eu4jb2jhpaZPlrX//lxF3VzJcfGwU1wprjsbAjB9mxN1i
4k8CjG0h6w==
=z9pB
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180920' of git://git.kernel.dk/linux-block
Jens writes:
"Storage fixes for 4.19-rc5
- Fix for leaking kernel pointer in floppy ioctl (Andy Whitcroft)
- NVMe pull request from Christoph, and a single ANA log page fix
(Hannes)
- Regression fix for libata qd32 support, where we trigger an illegal
active command transition. This fixes a CD-ROM detection issue that
was reported, but could also trigger premature completion of the
internal tag (me)"
* tag 'for-linus-20180920' of git://git.kernel.dk/linux-block:
floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl
libata: mask swap internal and hardware tag
nvme: count all ANA groups for ANA Log page
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbpDR8AAoJEAx081l5xIa+9doQAJGNbihlBJx9TSYjejuAW5SD
Rb0WQVCaHaHIzyaCTTcN+1GPryCUBPrEnF5DMic8WodWDIxxTGp0c4pjYRazVIqh
V7l+IeZAmUXYMlv5jTGMgYaCXDuxaKPA1kCVAGg/q//XxBR6RCMRweZiGjl/+oqX
Yt3N+nM2NWmzt6t96fYrDyHdPKsdAVB8ogkd4z/GBJ1xo1hU+MlDrJx4j+R/yGHk
WZ34ThPmACMnpnyskamJ2J8gmhmHPe7Qt7/izbsusLH1CYXnvwKL+iMkQ3YGvUuS
9cob6u9ar/8WgfaRevsU0L3t0e68UHiQiD0O4R3YGxl32Wumd7BPCY0uUW3Q+I64
1ikA6AtzdCz64LtWhulxKtnLKN/lekfG5qbSuKxGt/ppHY3PY0qWWUiG2CEIjtPh
OYFLg8/3lEhDAoyW0Qn9oUCQ6uWXUFAqaVV8L5d+zFZzKkJm4msyEVOf3MZXVTTw
EqcogCfNpqm/T5gsyZ4YSWPZhp+OlVJGEXTIgA4TBciKvDjlcD86BH9cdQx5MtnO
pLf1bCAJPVufU1zBa81CNbDzwGl475s0eY8NpFJztCn+43ews0bhTtpwSdn8F4BS
4pU4hlZPbCuAekDUXiDvVzQveXCU45wI2wL1tVqpTZS4MglOFz7nnFriI7I3eh1B
/Dh//KeQ0qDxu4P/XSlL
=56tH
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm
David writes:
"drm fixes for 4.19-rc5:
- core: fix debugfs for atomic, fix the check for atomic for
non-modesetting drivers
- amdgpu: adds a new PCI id, some kfd fixes and a sdma fix
- i915: a bunch of GVT fixes.
- vc4: scaling fix
- vmwgfx: modesetting fixes and a old buffer eviction fix
- udl: framebuffer destruction fix
- sun4i: disable on R40 fix until next kernel
- pl111: NULL termination on table fix"
* tag 'drm-fixes-2018-09-21' of git://anongit.freedesktop.org/drm/drm: (21 commits)
drm/amdkfd: Fix ATS capablity was not reported correctly on some APUs
drm/amdkfd: Change the control stack MTYPE from UC to NC on GFX9
drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
drm/vmwgfx: Fix buffer object eviction
drm/vmwgfx: Don't impose STDU limits on framebuffer size
drm/vmwgfx: limit mode size for all display unit to texture_max
drm/vmwgfx: limit screen size to stdu_max during check_modeset
drm/vmwgfx: don't check for old_crtc_state enable status
drm/amdgpu: add new polaris pci id
drm: sun4i: drop second PLL from A64 HDMI PHY
drm: fix drm_drv_uses_atomic_modeset on non modesetting drivers.
drm/i915/gvt: clear ggtt entries when destroy vgpu
drm/i915/gvt: request srcu_read_lock before checking if one gfn is valid
drm/i915/gvt: Add GEN9_CLKGATE_DIS_4 to default BXT mmio handler
drm/i915/gvt: Init PHY related registers for BXT
drm/atomic: Use drm_drv_uses_atomic_modeset() for debugfs creation
drm/fb-helper: Remove set but not used variable 'connector_funcs'
drm: udl: Destroy framebuffer only if it was initialized
drm/sun4i: Remove R40 display pipeline compatibles
drm/pl111: Make sure of_device_id tables are NULL terminated
...
We met a kernel panic when enabling earlycon, which is due to the fixmap
address of earlycon is not statically setup.
Currently the static fixmap setup in head_64.S only covers 2M virtual
address space, while it actually could be in 4M space with different
kernel configurations, e.g. when VSYSCALL emulation is disabled.
So increase the static space to 4M for now by defining FIXMAP_PMD_NUM to 2,
and add a build time check to ensure that the fixmap is covered by the
initial static page tables.
Fixes: 1ad83c858c ("x86_64,vsyscall: Make vsyscall emulation configurable")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: kernel test robot <rong.a.chen@intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com> (Xen parts)
Cc: H Peter Anvin <hpa@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180920025828.23699-1-feng.tang@intel.com
9092c71bb7 ("mm: use sc->priority for slab shrink targets") changed the
way that the target slab pressure is calculated and made it
priority-based:
delta = freeable >> priority;
delta *= 4;
do_div(delta, shrinker->seeks);
The problem is that on a default priority (which is 12) no pressure is
applied at all, if the number of potentially reclaimable objects is less
than 4096 (1<<12).
This causes the last objects on slab caches of no longer used cgroups to
(almost) never get reclaimed. It's obviously a waste of memory.
It can be especially painful, if these stale objects are holding a
reference to a dying cgroup. Slab LRU lists are reparented on memcg
offlining, but corresponding objects are still holding a reference to the
dying cgroup. If we don't scan these objects, the dying cgroup can't go
away. Most likely, the parent cgroup hasn't any directly charged objects,
only remaining objects from dying children cgroups. So it can easily hold
a reference to hundreds of dying cgroups.
If there are no big spikes in memory pressure, and new memory cgroups are
created and destroyed periodically, this causes the number of dying
cgroups grow steadily, causing a slow-ish and hard-to-detect memory
"leak". It's not a real leak, as the memory can be eventually reclaimed,
but it could not happen in a real life at all. I've seen hosts with a
steadily climbing number of dying cgroups, which doesn't show any signs of
a decline in months, despite the host is loaded with a production
workload.
It is an obvious waste of memory, and to prevent it, let's apply a minimal
pressure even on small shrinker lists. E.g. if there are freeable
objects, let's scan at least min(freeable, scan_batch) objects.
This fix significantly improves a chance of a dying cgroup to be
reclaimed, and together with some previous patches stops the steady growth
of the dying cgroups number on some of our hosts.
Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
Fixes: 9092c71bb7 ("mm: use sc->priority for slab shrink targets")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'm' kcore_list item could point to kclist_head, and it is incorrect to
look at m->addr / m->size in this case.
There is no choice but to run through the list of entries for every
address if we did not find any entry in the previous iteration
Reset 'm' to NULL in that case at Omar Sandoval's suggestion.
[akpm@linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/1536100702-28706-1-git-send-email-asmadeus@codewreck.org
Fixes: bf991c2231 ("proc/kcore: optimize multiple page reads")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: James Morse <james.morse@arm.com>
Cc: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make the clone and fork syscalls return EAGAIN when the limit on the
number of pids /proc/sys/kernel/pid_max is exceeded.
Currently, when the pid_max limit is exceeded, the kernel will return
ENOSPC from the fork and clone syscalls. This is contrary to the
documented behaviour, which explicitly calls out the pid_max case as one
where EAGAIN should be returned. It also leads to really confusing error
messages in userspace programs which will complain about a lack of disk
space when they fail to create processes/threads for this reason.
This error is being returned because alloc_pid() uses the idr api to find
a new pid; when there are none available, idr_alloc_cyclic() returns
-ENOSPC, and this is being propagated back to userspace.
This behaviour has been broken before, and was explicitly fixed in
commit 35f71bc0a0 ("fork: report pid reservation failure properly"),
so I think -EAGAIN is definitely the right thing to return in this case.
The current behaviour change dates from commit 95846ecf9d ("pid:
replace pid bitmap implementation with IDR AIP") and was I believe
unintentional.
This patch has no impact on the case where allocating a pid fails because
the child reaper for the namespace is dead; that case will still return
-ENOMEM.
Link: http://lkml.kernel.org/r/20180903111016.46461-1-ktsanaktsidis@zendesk.com
Fixes: 95846ecf9d ("pid: replace pid bitmap implementation with IDR AIP")
Signed-off-by: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Gargi Sharma <gs051095@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dave, Andy and Peter are de facto overseing the mm parts of X86. Add an
explicit maintainers entry.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reinette Chatre is doing great job on enabling pseudo-locking and other
features in RDT. Add her as co-maintainer for RDT.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1537472228-221799-1-git-send-email-fenghua.yu@intel.com
This reverts commit 11a6fc3dc7.
UBIFS wants to assert that xattr operations are only issued on files
with positive link count. The said patch made this operations return
-ENOENT for unlinked files such that the asserts will no longer trigger.
This was wrong since xattr operations are perfectly fine on unlinked
files.
Instead the assertions need to be fixed/removed.
Cc: <stable@vger.kernel.org>
Fixes: 11a6fc3dc7 ("ubifs: xattr: Don't operate on deleted inodes")
Reported-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
The following sequence triggers
ubifs_assert(c, c->lst.taken_empty_lebs > 0);
at the end of ubifs_remount_fs():
mount -t ubifs /dev/ubi0_0 /mnt
echo 1 > /sys/kernel/debug/ubifs/ubi0_0/ro_error
umount /mnt
mount -t ubifs -o ro /dev/ubix_y /mnt
mount -o remount,ro /mnt
The resulting
UBIFS assert failed in ubifs_remount_fs at 1878 (pid 161)
is a false positive. In the case above c->lst.taken_empty_lebs has
never been changed from its initial zero value. This will only happen
when the deferred recovery is done.
Fix this by doing the assertion only when recovery has been done
already.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
The requested device name can be NULL or an empty string.
Check for that and refuse to continue. UBIFS has to do this manually
since we cannot use mount_bdev(), which checks for this condition.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Reported-by: syzbot+38bd0f7865e5c6379280@syzkaller.appspotmail.com
Signed-off-by: Richard Weinberger <richard@nod.at>