All currently known ECs in the wild are very sensitive to timing.
Specifically the ECs are known to drop a transfer if more than 8 ms
passes from the assertion of the chip select until the transfer
finishes.
Let's use the new feature introduced in the patch ("spi: Allow SPI
devices to request the pumping thread be realtime") to request the SPI
pumping thread be realtime. This means that if we get shunted off to
the SPI thread for whatever reason we won't get downgraded to low
priority.
NOTES:
- We still need to keep ourselves as high priority since the SPI core
doesn't guarantee that all transfers end up on the pumping thread
(in fact, it tries pretty hard to do them in the calling context).
- If future Chrome OS ECs ever fix themselves to be less sensitive
then we could consider adding a property (or compatible string) to
not set this property. For now we need it across the board.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
In commit 37a186225a ("platform/chrome: cros_ec_spi: Transfer
messages at high priority") we moved transfers to a high priority
workqueue. This helped make them much more reliable.
...but, we still saw failures.
We were actually finding ourselves competing for time with dm-crypt
which also scheduled work on HIGHPRI workqueues. While we can
consider reverting the change that made dm-crypt run its work at
HIGHPRI, the argument in commit a1b89132dc ("dm crypt: use
WQ_HIGHPRI for the IO and crypt workqueues") is somewhat compelling.
It does make sense for IO to be scheduled at a priority that's higher
than the default user priority. It also turns out that dm-crypt isn't
alone in using high priority like this. loop_prepare_queue() does
something similar for loopback devices.
Looking in more detail, it can be seen that the high priority
workqueue isn't actually that high of a priority. It runs at MIN_NICE
which is _fairly_ high priority but still below all real time
priority.
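To make the gap concrete, here is a small userspace sketch of the
scheduler's internal priority scale. The constants and the two
conversions mirror include/linux/sched/prio.h and the kernel's normal
priority calculation as I understand them; the function names
nice_to_prio() and rt_to_prio() are made up for this illustration and
none of this is code from the driver. A lower number means a more
urgent task.

```c
#include <assert.h>

/* Values mirrored from the kernel's include/linux/sched/prio.h. */
#define MAX_NICE     19
#define MIN_NICE     (-20)
#define NICE_WIDTH   (MAX_NICE - MIN_NICE + 1)       /* 40 */
#define MAX_RT_PRIO  100
#define DEFAULT_PRIO (MAX_RT_PRIO + NICE_WIDTH / 2)  /* 120 */

/*
 * Internal priority of a normal (non real time) task.
 * Hypothetical helper name; lower return value means more urgent.
 */
int nice_to_prio(int nice)
{
	assert(nice >= MIN_NICE && nice <= MAX_NICE);
	return nice + DEFAULT_PRIO;
}

/*
 * Internal priority of a SCHED_FIFO / SCHED_RR task with the given
 * real time priority (1 .. MAX_RT_PRIO - 1). Hypothetical helper name.
 */
int rt_to_prio(int rt_priority)
{
	assert(rt_priority >= 1 && rt_priority <= MAX_RT_PRIO - 1);
	return MAX_RT_PRIO - 1 - rt_priority;
}
```

With these definitions nice_to_prio(MIN_NICE) works out to 100, while
real time priorities 1 through MAX_RT_PRIO - 1 map to 98 down to 0, so
even the lowest real time task outranks the high priority workqueue.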
Should we move cros_ec_spi to real time priority to fix our problems,
or is this just escalating a priority war? I'll argue here that
cros_ec_spi _does_ belong at real time priority. Specifically,
cros_ec_spi actually needs to run quickly for correctness. As I
understand it, this is exactly what real time priority is for.
There currently doesn't appear to be any way to use the standard
workqueue APIs with a real time priority, so we'll switch over to
using a kthread worker. We'll match the priority that the SPI
core uses when it wants to do things on a realtime thread and just use
"MAX_RT_PRIO - 1".
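As a rough userspace analogy only (the driver itself uses a kthread
worker inside the kernel, which is not shown here), the sketch below
asks for SCHED_FIFO at the highest priority the system offers, which
on Linux corresponds to the same value as "MAX_RT_PRIO - 1". The
helper names rt_param_max() and make_self_realtime() are hypothetical.
Raising a thread to real time priority normally needs privileges, so
the second helper is expected to fail with -EPERM for ordinary users.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/*
 * Fill *param with the highest SCHED_FIFO priority the system offers.
 * Returns 0 on success or a negative errno value.
 */
int rt_param_max(struct sched_param *param)
{
	int max = sched_get_priority_max(SCHED_FIFO);

	assert(param != NULL);
	if (max < 0)
		return -errno;
	param->sched_priority = max;
	return 0;
}

/*
 * Try to switch the calling thread to SCHED_FIFO at that priority.
 * Returns 0 on success or a negative errno value (often -EPERM).
 */
int make_self_realtime(void)
{
	struct sched_param param = { .sched_priority = 0 };
	int ret = rt_param_max(&param);

	if (ret)
		return ret;
	if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
		return -errno;
	return 0;
}
```

A dedicated worker thread would call make_self_realtime() once at
startup and then service queued transfers, so that the time-critical
work never runs at a priority that ordinary IO work can compete with.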
This commit plus the patch ("platform/chrome: cros_ec_spi: Request the
SPI thread be realtime") are enough to get communications very close
to 100% reliable (the only known problem left is when serial console
is turned on, which isn't something that happens in shipping devices).
Specifically this test case now passes (tested on rk3288-veyron-jerry):
  dd if=/dev/zero of=/var/log/foo.txt bs=4M count=512&
  while true; do
    ectool version > /dev/null;
  done
It should be noted that "/var/log" is encrypted (and goes through
dm-crypt) and also passes through a loopback device.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
The Chrome OS EC driver attaches to devices using the of_match_table
even when ACPI is the underlying firmware. It does this using the
magic PRP0001 ACPI HID, which tells ACPI to go find an OF compatible
string under the hood and match on that.
The cros_ec_spi driver needs to provide the of_match_table regardless
of whether CONFIG_OF is enabled or not, since the table is used by
ACPI for PRP0001 devices.
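The effect can be modelled in a few lines of standalone C. This is a
simplified sketch, not the driver source: struct of_device_id is cut
down to a single field, the of_match_ptr() macro is reproduced from
include/linux/of.h as I understand it, the "google,cros-ec-spi"
compatible string is used for illustration, and the three helper
functions are hypothetical. I am assuming here that the table was
previously hidden behind of_match_ptr(); CONFIG_OF is deliberately
left undefined to model a kernel built without OF support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model a kernel built without CONFIG_OF. */
#undef CONFIG_OF

/* Simplified stand-in for the kernel's struct of_device_id. */
struct of_device_id {
	char compatible[128];
};

/* Mirrors of_match_ptr() from include/linux/of.h. */
#ifdef CONFIG_OF
#define of_match_ptr(ptr) (ptr)
#else
#define of_match_ptr(ptr) NULL
#endif

static const struct of_device_id cros_ec_spi_of_match[] = {
	{ .compatible = "google,cros-ec-spi" },
	{ .compatible = "" },	/* sentinel */
};

/* Table pointer as a driver would see it through of_match_ptr(). */
const struct of_device_id *table_via_of_match_ptr(void)
{
	return of_match_ptr(cros_ec_spi_of_match);
}

/* Table pointer when it is assigned unconditionally. */
const struct of_device_id *table_unconditional(void)
{
	return cros_ec_spi_of_match;
}

/* Return 1 if the table lists the given compatible string, else 0. */
int table_matches(const struct of_device_id *table, const char *compatible)
{
	assert(compatible != NULL);
	if (!table)
		return 0;
	for (; table->compatible[0] != '\0'; table++)
		if (strcmp(table->compatible, compatible) == 0)
			return 1;
	return 0;
}
```

Without CONFIG_OF the wrapped pointer is NULL, so a PRP0001 device has
nothing to match against, while the unconditional pointer still finds
the compatible string.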
Signed-off-by: Evan Green <evgreen@chromium.org>
Reviewed-by: Benson Leung <bleung@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
The software running on the Chrome OS Embedded Controller (cros_ec)
handles SPI transfers in a bit of a wonky way. Specifically if the EC
sees too long of a delay in a SPI transfer it will give up and the
transfer will be counted as failed. Unfortunately the timeout is
fairly short, though the actual number may be different for different
EC codebases.
We can end up tripping the timeout pretty easily if we happen to
preempt the task running the SPI transfer and don't get back to it for
a little while.
Historically this hasn't been a _huge_ deal because:
1. On old devices Chrome OS used to run PREEMPT_VOLUNTARY. That meant
we were pretty unlikely to take a big break from the transfer.
2. On recent devices we had faster / more processors.
3. Recent devices didn't use "cros-ec-spi-pre-delay". Using that
delay makes us more likely to trip the timeout.
4. For whatever reasons (I didn't dig) old kernels seem to be less
likely to trip this.
5. For the most part it's kinda OK if a few transfers to the EC fail.
Mostly we're just polling the battery or doing some other task
where we'll try again.
Even with the above things, this issue has reared its ugly head
periodically. We could solve this in a nice way by adding reliable
retries to the EC protocol [1] or by re-designing the code in the EC
codebase to allow it to wait longer, but that code doesn't ever seem
to get changed. ...and even if it did, it wouldn't help old devices.
It's now time to finally take a crack at making this a little better.
This patch isn't guaranteed to make every cros_ec SPI transfer
perfect, but it should improve things by a few orders of magnitude.
Specifically you can try this on a rk3288-veyron Chromebook (which is
slower and also _does_ need "cros-ec-spi-pre-delay"):
  md5sum /dev/zero &
  md5sum /dev/zero &
  md5sum /dev/zero &
  md5sum /dev/zero &
  while true; do
    cat /sys/class/power_supply/sbs-20-000b/charge_now > /dev/null;
  done
...before this patch you'll see boatloads of errors. After this patch I
don't see any in the testing I did.
The way this patch works is by effectively boosting the priority of
the cros_ec transfers. As far as I know there is no simple way to
just boost the priority of the current process temporarily so the way
we accomplish this is by queuing the work on the system_highpri_wq.
NOTE: this patch relies on the fact that the SPI framework attempts to
push the messages out on the calling context (which is the one that is
boosted to high priority). As I understand from earlier (long ago)
discussions with Mark Brown this should be a fine assumption. Even if
it isn't true sometimes this patch will still not make things worse.
[1] https://crbug.com/678675
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Adopt the SPDX license identifier headers to ease license compliance
management. Also change the description to a more appropriate one.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Use devm_mfd_add_devices() for adding cros-ec core MFD child devices. This
removes the need for a remove callback in platform/chrome to remove the
MFD child devices.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
There are some cros-ec transport drivers (I2C, SPI) living in MFD, while
others (LPC) live in drivers/platform. The transport drivers are more
platform specific. So, move the I2C and SPI transport drivers to the
platform/chrome directory. The patch also removes the MFD_ prefix of
their Kconfig symbols.
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Benson Leung <bleung@chromium.org>