Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access to one of two watermarks lower than "min", which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites:
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
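The flag split described above can be modelled in user space roughly as
follows. This is only a sketch: the SK_GFP_* bit values and every sketch_*
name are made up for illustration, and the blocking helper is assumed to
test the direct-reclaim bit, which is what the text implies.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int sketch_gfp_t;

/* Illustrative bit values only; the kernel's real values differ. */
#define SK_GFP_HIGH            0x01u  /* may use the high priority reserve */
#define SK_GFP_ATOMIC          0x02u  /* truly atomic, no fallback */
#define SK_GFP_DIRECT_RECLAIM  0x04u  /* caller may sleep and reclaim */
#define SK_GFP_KSWAPD_RECLAIM  0x08u  /* caller wants kswapd woken */
#define SK_GFP_WAIT (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* Model of the helper: blocking is allowed only with direct reclaim. */
bool sketch_allow_blocking(sketch_gfp_t flags)
{
    return (flags & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Model of the historical test against the (now two-bit) wait mask. */
bool sketch_old_wait_check(sketch_gfp_t flags)
{
    return (flags & SK_GFP_WAIT) != 0;
}

/* Mempool-backed caller: no direct reclaim, but kswapd is still woken. */
sketch_gfp_t sketch_mempool_flags(sketch_gfp_t flags)
{
    return flags & ~SK_GFP_DIRECT_RECLAIM;
}

/* Usage: the old test misfires on a caller that must not block. */
void sketch_demo(void)
{
    sketch_gfp_t f = sketch_mempool_flags(SK_GFP_WAIT);

    assert(!sketch_allow_blocking(f));
    assert(sketch_old_wait_check(f));
}
```

The last function shows the false positive mentioned above: the flags still
carry the kswapd bit, so a plain mask test against the old name reports
"may block" for a caller that cannot.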
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are multiple types of users of mtd->reboot_notifier.notifier_call:
(1) A while back, the cfi_cmdset_000{1,2} chip drivers implemented a
reboot notifier to (on a best effort basis) attempt to reset their flash
chips before rebooting.
(2) More recently, we implemented a common _reboot() hook so that MTD
drivers (particularly, NAND flash) could better halt I/O operations
without having to reimplement the same notifier boilerplate.
Currently, the WARN_ONCE() condition here was written to handle (2), but
at the same time it mis-diagnosed case (1) as an already-registered MTD.
Let's fix this by having the WARN_ONCE() condition better imitate the
condition that immediately follows it. (Wow, I don't know how I missed
that one.)
(Side note: Unfortunately, we can't yet combine the reboot notifier code
for (1) and (2) with a patch like [1], because some users of (1) also
use mtdconcat, and so the mtd_info struct from cfi_cmdset_000{1,2} won't
actually get registered with mtdcore, and therefore their reboot
notifier won't get registered.)
[1] http://patchwork.ozlabs.org/patch/417981/
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Jesper Nilsson <jespern@axis.com>
Cc: linux-cris-kernel@axis.com
Tested-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Due to a wrong assumption in ofpart, ofpart fails on Exynos on SPI chips
with no partitions because the subnode containing controller data
confuses the ofpart parser.
Thus compiling in ofpart support automatically fails probing any SPI NOR
flash without partitions on Exynos.
Compiling in a partitioning scheme should not cause the probe of an
otherwise valid device to fail.
Instead, let's do the following:
* try parsers until one succeeds
* if no parser succeeds, report the first error we saw
* even in the failure case, allow MTD to probe, with fallback
partitions or no partitions at all -- the master device will still be
registered
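The three rules above can be sketched as a small loop. The parser signature
and all sketch_* names here are hypothetical (positive = number of
partitions found, 0 = none, negative = error); the real parser interface is
richer than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser hook: >0 partitions found, 0 none, <0 error. */
typedef int (*sketch_parser_fn)(void);

/*
 * Try each parser in order and stop at the first success. If none
 * succeeds, store the first error seen in *first_err (0 if there was
 * none) and return 0 so the caller can still register the master.
 */
int sketch_parse_partitions(const sketch_parser_fn *parsers, size_t n,
                            int *first_err)
{
    assert(parsers != NULL && first_err != NULL);
    *first_err = 0;
    for (size_t i = 0; i < n; i++) {
        int ret = parsers[i]();

        if (ret > 0)
            return ret;
        if (ret < 0 && *first_err == 0)
            *first_err = ret;
    }
    return 0;
}

/* Sample parsers used only to exercise the loop. */
int sketch_parser_fails(void) { return -22; }
int sketch_parser_empty(void) { return 0; }
int sketch_parser_two(void)   { return 2; }
```

A caller would log *first_err when the result is 0 and then carry on with
fallback partitions or none at all.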
Issue report and comments initially by Michal Suchanek.
Reported-by: Michal Suchanek <hramrach@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
When CONFIG_MTD_PARTITIONED_MASTER=y, it is fatal to call
mtd_device_parse_register() twice on the same MTD, as we try to register
the same device/kobject multiple times.
When CONFIG_MTD_PARTITIONED_MASTER=n, calling
mtd_device_parse_register() twice is more of a nuisance, as we can mostly
navigate around any conflicting actions.
But anyway, doing so is a Bad Thing (TM), and we should complain loudly
for any drivers that try to do this.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Since commit 3efe41be22 ("mtd: implement common reboot notifier
boilerplate"), we might try to register a reboot notifier for an MTD
that failed to register. Let's avoid this by making the error path
clearer.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
If a parent device is set, add_mtd_device() has enough knowledge to fill
in some sane default values for the module name and owner. Do so if they
aren't already set.
Signed-off-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
add_mtd_device() has a comment suggesting that the caller should have
set dev.parent. This is required to have the parent device symlink show
up in sysfs, but not for proper operation of the mtd device itself.
Currently we have five drivers registering mtd devices during module
initialization, so they don't actually provide a parent device to link
to. That means we cannot WARN_ON() here, as it would trigger false
positives.
Make the comment a bit less firm in its assertion that dev.parent should
be set.
Signed-off-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
It makes more sense to return error statuses, not 1/0.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Use dev_pm_ops instead of the legacy suspend/resume callbacks for the MTD
class suspend and resume operations.
While we are at it, slightly reorder things to avoid the need for forward
declarations.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
For many use cases, it helps to have a device node for the entire
MTD device as well as device nodes for the individual partitions.
For example, this allows querying the entire device's properties.
A common idiom is to create an additional partition which spans
over the whole device.
This patch makes a config option, CONFIG_MTD_PARTITIONED_MASTER,
which makes the master partition present even when the device is
partitioned. This isn't turned on by default since it presents
a backwards-incompatible device numbering.
The patch also makes the parent of a partition device be the master,
if the config flag is set, now that the master is a full device.
Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
NAND:
* Add new Hisilicon NAND driver for Hip04
* Add default reboot handler, to ensure all outstanding erase transactions
complete in time
* jz4740: convert to use GPIO descriptor API
* Atmel: add support for sama5d4
* Change default bitflip threshold to 75% of correction strength
* Miscellaneous cleanups and bugfixes
SPI NOR:
* Freescale QuadSPI:
- Fix a few probe() and remove() issues
- Add a MAINTAINERS entry for this driver
- Tweak transfer size to increase read performance
- Add suspend/resume support
* Add Micron quad I/O support
* ST FSM SPI: miscellaneous fixes
JFFS2:
* gracefully handle corrupted 'offset' field found on flash
Other:
* bcm47xxpart: add tweaks for a few new devices
* mtdconcat: set return lengths properly for mtd_write_oob()
* map_ram: enable use with mtdoops
* maps: support fallback to ROM/UBI for write-protected NOR flash
Merge tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"NAND:
- Add new Hisilicon NAND driver for Hip04
- Add default reboot handler, to ensure all outstanding erase
transactions complete in time
- jz4740: convert to use GPIO descriptor API
- Atmel: add support for sama5d4
- Change default bitflip threshold to 75% of correction strength
- Miscellaneous cleanups and bugfixes
SPI NOR:
- Freescale QuadSPI:
- Fix a few probe() and remove() issues
- Add a MAINTAINERS entry for this driver
- Tweak transfer size to increase read performance
- Add suspend/resume support
- Add Micron quad I/O support
- ST FSM SPI: miscellaneous fixes
JFFS2:
- gracefully handle corrupted 'offset' field found on flash
Other:
- bcm47xxpart: add tweaks for a few new devices
- mtdconcat: set return lengths properly for mtd_write_oob()
- map_ram: enable use with mtdoops
- maps: support fallback to ROM/UBI for write-protected NOR flash"
* tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd: (46 commits)
mtd: hisilicon: && vs & typo
jffs2: fix handling of corrupted summary length
mtd: hisilicon: add device tree binding documentation
mtd: hisilicon: add a new NAND controller driver for hisilicon hip04 Soc
mtd: avoid registering reboot notifier twice
mtd: concat: set the return lengths properly
mtd: kconfig: replace PPC_OF with PPC
mtd: denali: remove unnecessary stubs
mtd: nand: remove redundant local variable
MAINTAINERS: add maintainer entry for FREESCALE QUAD SPI driver
mtd: fsl-quadspi: improve read performance by increase AHB transfer size
mtd: fsl-quadspi: Remove unnecessary 'map_failed' label
mtd: fsl-quadspi: Remove unneeded success/error messages
mtd: fsl-quadspi: Fix the error paths
mtd: nand: omap: drop condition with no effect
mtd: nand: jz4740: Convert to GPIO descriptor API
mtd: nand: Request strength instead of bytes for soft BCH
mtd: nand: default bitflip-reporting threshold to 75% of correction strength
mtd: atmel_nand: introduce a new compatible string for sama5d4 chip
mtd: atmel_nand: return max bitflips in all sectors in pmecc_correction()
...
Calling mtd_device_parse_register with the same mtd_info
(e.g. registering several partitions on a single device)
would add the same reboot notifier twice, causing an
infinite loop in notifier_chain_register during boot up.
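To see why a double registration is harmful, here is a simplified user-space
model of a priority-ordered chain insert with no duplicate check. The
struct, the sketch_* names and the "callback already set" guard are
assumptions made for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_notifier {
    int priority;
    int (*call)(void);              /* non-NULL once registered */
    struct sketch_notifier *next;
};

/* Priority-ordered insert that does not look for duplicates. */
void sketch_chain_register(struct sketch_notifier **head,
                           struct sketch_notifier *n)
{
    assert(head != NULL && n != NULL);
    while (*head != NULL) {
        if (n->priority > (*head)->priority)
            break;
        head = &(*head)->next;
    }
    n->next = *head;
    *head = n;
}

/* Guarded variant: skip the insert if a callback is already set. */
int sketch_register_once(struct sketch_notifier **head,
                         struct sketch_notifier *n, int (*cb)(void))
{
    assert(n != NULL && cb != NULL);
    if (n->call != NULL)
        return 0;
    n->call = cb;
    sketch_chain_register(head, n);
    return 1;
}

int sketch_reboot_cb(void) { return 0; }
```

Inserting the same node twice through the unguarded function leaves the node
pointing at itself, so any later walk of the chain never terminates.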
Signed-off-by: Niklas Cassel <nks@flawful.org>
[Brian: add FIXME comments]
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
The recently added mtd_mmap_capabilities can be used from loadable
modules, in particular romfs, but is not exported, so we get
ERROR: "mtd_mmap_capabilities" [fs/romfs/romfs.ko] undefined!
This adds the missing export.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: b4caecd480 ("fs: introduce f_op->mmap_capabilities for nommu mmap support")
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since "BDI: Provide backing device capability information [try #3]" the
backing_dev_info structure also provides flags for the kind of mmap
operation available in a nommu environment, which is entirely unrelated
to its original purpose.
Introduce a new nommu-only file operation to provide this information to
the nommu mmap code instead. Splitting this from the backing_dev_info
structure allows removing lots of backing_dev_info instances that aren't
otherwise needed, and entirely gets rid of the concept of providing a
backing_dev_info for a character device. It also removes the need for
the mtd_inodefs filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
cfi_cmdset_000{1,2}.c already implement their own reboot notifiers, and
we're going to add one for NAND. Let's put the boilerplate in one place.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Tested-by: Scott Branden <sbranden@broadcom.com>
MTD used to allow compiling out character device support. This was
dropped in the following commit, but some of the accompanying logic was
never dropped:
commit 660685d9d1
Author: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Date: Thu Mar 14 13:27:40 2013 +0200
mtd: merge mtdchar module with mtdcore
The weird logic was flagged by Coverity.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
When checking the upper boundary (i.e., whether an address is higher
than the maximum size of the MTD), we should be doing an inclusive check
(greater or equal). For instance, an address of 16MB (0x1000000) on a
16MB device is invalid.
The strengthening of this bounds check is redundant for those which
already have an address+length check and ensure that the length is
non-zero, but let's just fix them all, for completeness.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
In addition to mtd_block_isbad(), which checks if a block is bad or
reserved, we need a way to check if a block is reserved only (but not
bad). This commit adds an MTD interface for it, in a similar fashion to
mtd_block_isbad().
While here, fix mtd_block_isbad() so the out-of-bounds checking is done
before the callback check.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Tested-by: Pekon Gupta <pekon@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
These new sysfs device attributes allow us to retrieve the ECC and bad
block stats by poking a sysfs file, which is often more convenient than
using the ioctl.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Tested-by: Pekon Gupta <pekon@ti.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
If a write to one time programmable memory (OTP) hits the end of this
memory area, no more data can be written. The count variable in
mtdchar_write() in drivers/mtd/mtdchar.c is not decreased anymore.
We are trapped in the loop forever, mtdchar_write() will never return
in this case.
The desired behavior of a write in such a case is described in [1]:
- Try to write as much data as possible, truncate the write to fit into
the available memory and return the number of bytes that actually
have been written.
- If no data could be written at all, return -ENOSPC.
This patch fixes the behavior of OTP write if there is not enough space
for all data:
1) mtd_write_user_prot_reg() in drivers/mtd/mtdcore.c is modified to
return -ENOSPC if no data could be written at all.
2) mtdchar_write() is modified to handle -ENOSPC correctly. Exit if a
write returned -ENOSPC and yield the correct return value, either
the number of bytes that could be written, or -ENOSPC, if no data
could be written at all.
Furthermore the patch harmonizes the behavior of the OTP memory write
in drivers/mtd/devices/mtd_dataflash.c with the other implementations
and the requirements from [1]. Instead of returning -EINVAL if the data
does not fit into the OTP memory, we try to write as much data as
possible/truncate the write.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
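The intended write behavior can be modelled like this. The OTP region
struct, the chunk size parameter and the sketch_* functions are
illustrative assumptions; the real code paths are mtdchar_write() and
mtd_write_user_prot_reg().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical OTP region: 'size' bytes in total, 'used' already written. */
struct sketch_otp {
    size_t size;
    size_t used;
};

/* Lower layer: truncate to the room left, -ENOSPC if nothing fits. */
int sketch_otp_write(struct sketch_otp *otp, size_t len, size_t *retlen)
{
    size_t room = otp->size - otp->used;

    *retlen = 0;
    if (len == 0)
        return 0;
    if (room == 0)
        return -ENOSPC;
    *retlen = len < room ? len : room;
    otp->used += *retlen;
    return 0;
}

/* Chunked caller loop: stop on -ENOSPC instead of spinning forever. */
long sketch_chardev_write(struct sketch_otp *otp, size_t count, size_t chunk)
{
    size_t total = 0;

    assert(otp != NULL && chunk > 0);
    while (count > 0) {
        size_t len = count < chunk ? count : chunk;
        size_t retlen;
        int ret = sketch_otp_write(otp, len, &retlen);

        if (ret == -ENOSPC)
            return total ? (long)total : -ENOSPC;
        if (ret < 0)
            return ret;
        total += retlen;
        count -= retlen;
    }
    return (long)total;
}
```

Writing 16 bytes into a 10-byte region returns 10; a further write to the
now full region returns -ENOSPC.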
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Use new ATTRIBUTE_GROUPS macro to declare attribute groups.
Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
This patch moves the char and block major number definitions
to major.h to be with the rest of the major numbers.
While doing this, include major.h in the files that need it.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
The current mtd_type_show() misses the MTD_MLCNANDFLASH case.
This patch adds the case for it, and also updates the ABI.
Signed-off-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Add a new sys node to show the ecc step size.
The application can then use this node to get the ecc step
size.
Signed-off-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Calling dev_set_name with a single parameter causes it to be handled as a
format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents,
including wrappers like device_create*() and bdi_register().
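The hazard is the usual printf-style one. The sketch below uses a
hypothetical stand-in, sketch_set_name(), built on vsnprintf(); it is not
dev_set_name() itself, but it shows how a '%' in dynamic content is
interpreted when the string is passed as the format.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for a printf-style naming helper. */
int sketch_set_name(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;  /* length of the formatted name */
}
```

With a label of "cache%%1", sketch_set_name(buf, sizeof(buf), label) stores
"cache%1" because "%%" is consumed as a format escape, while
sketch_set_name(buf, sizeof(buf), "%s", label) stores the label unchanged.
A label containing a real conversion such as "%d" or "%s" would be worse,
since it reads arguments that were never passed.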
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Support partitions larger than 4GiB in device tree
- Support for new SPI chips
Merge tag 'for-linus-20130509' of git://git.infradead.org/linux-mtd
Pull MTD update from David Woodhouse:
- Lots of cleanups from Artem, including deletion of some obsolete
drivers
- Support partitions larger than 4GiB in device tree
- Support for new SPI chips
* tag 'for-linus-20130509' of git://git.infradead.org/linux-mtd: (83 commits)
mtd: omap2: Use module_platform_driver()
mtd: bf5xx_nand: Use module_platform_driver()
mtd: denali_dt: Remove redundant use of of_match_ptr
mtd: denali_dt: Change return value to fix smatch warning
mtd: denali_dt: Use module_platform_driver()
mtd: denali_dt: Fix incorrect error check
mtd: nand: subpage write support for hardware based ECC schemes
mtd: omap2: use msecs_to_jiffies()
mtd: nand_ids: use size macros
mtd: nand_ids: improve LEGACY_ID_NAND macro a bit
mtd: add 4 Toshiba nand chips for the full-id case
mtd: add the support to parse out the full-id nand type
mtd: add new fields to nand_flash_dev{}
mtd: sh_flctl: Use of_match_ptr() macro
mtd: gpio: Use of_match_ptr() macro
mtd: gpio: Use devm_kzalloc()
mtd: davinci_nand: Use of_match_ptr()
mtd: dataflash: Use of_match_ptr() macro
mtd: remove h720x flash support
mtd: onenand: remove OneNAND simulator
...
The MTD subsystem has historically tried to be as configurable as possible. The
side-effect of this is that its configuration menu is rather large, and we are
gradually shrinking it. For example, we recently merged partitions support with
the mtdcore.
This patch does the next step - it merges the mtdchar module to mtdcore. And in
this case this is not only about eliminating too fine-grained separation and
simplifying the configuration menu. This is also about eliminating a seemingly
useless kernel module.
Indeed, mtdchar is a module that allows user-space to make use of MTD devices
via /dev/mtd* character devices. If users do not enable it, they simply cannot
use MTD devices at all. They cannot read or write the flash contents. Is it a
sane and useful setup? I believe not. And everyone just enables mtdchar.
Having mtdchar separate is also a little bit harmful. People sometimes miss the
fact that they need to enable an additional configuration option to have
user-space MTD interfaces, and then they wonder why on earth the kernel does
not allow using the flash? They spend time asking around.
Thus, let's just get rid of this module and make it part of mtd core.
Note, mtdchar had an additional configuration option to enable OTP interfaces,
which are present on some flashes. I removed that option as well - it saves a
really tiny amount of space.
[dwmw2: Strictly speaking, you can mount file systems on MTD devices just
fine without the mtdchar (or mtdblock) devices; you just can't do
other manipulations directly on the underlying device. But still I
agree that it makes sense to make this unconditional. And Yay! we
get to kill off an instance of checking CONFIG_foo_MODULE, which is
an abomination that should never happen.]
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove a couple of useless '#ifdef CONFIG_PROC_FS's around procfs functions
which anyway turn into empty functions in the 'proc_fs.h' file when CONFIG_PROC_FS
is not defined.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
'mtd_device_parse_register()' and 'parse_mtd_partitions()' functions accept
an array of character pointers. These functions modify neither the pointers nor
the characters they point to. The characters are actually names of the MTD
parsers.
At the moment, the argument type is 'const char **', which means that only the
names of the parsers are constant. Let's turn the argument type into 'const
char * const *', which means that both names and the pointers which point to
them are constant.
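A small illustration of the new argument type. The helper
sketch_count_parsers() and the array name are hypothetical, and the list of
parser names is just an example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * 'const char * const *': the callee can modify neither the pointers in
 * the array nor the characters they point to.
 */
size_t sketch_count_parsers(const char * const *types)
{
    size_t n = 0;

    assert(types != NULL);
    while (types[n] != NULL)
        n++;
    return n;
}

/* With the stricter type, a caller's table can itself be fully const. */
const char * const sketch_probe_types[] = { "cmdlinepart", "ofpart", NULL };
```

Passing a fully const table like this to a 'const char **' parameter would
need a cast; with the stricter type it does not.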
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commits a50915394f and
d7c3b937bd.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It appears that this patch was innocent, and we hope that "mm: avoid
waking kswapd for THP allocations when compaction is deferred or
contended" will fix the final kswapd-spinning cause.
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following:
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
Call Trace:
preempt_schedule+0x42/0x60
_raw_spin_unlock+0x55/0x60
put_super+0x31/0x40
drop_super+0x22/0x30
prune_super+0x149/0x1b0
shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
is a storm of THP requests that are simply rejected, it will still be
the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time. This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
it will loop, shrinking a small number of pages and calling
shrink_slab() on each iteration.
The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When transparent huge pages were introduced, memory compaction and swap
storms were an issue, and the kernel had to be careful to not make THP
allocations cause pageout or compaction.
Now that we have working compaction deferral, kswapd is smart enough to
invoke compaction and the quadratic behaviour around isolate_free_pages
has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
[minchan@kernel.org: Comment fix]
[mgorman@suse.de: Avoid direct reclaim for deferred compaction]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mtd_read_oob() has some unexpected similarities to mtd_read(). For
instance, when ops->datbuf != NULL, nand_base.c might return max_bitflips;
however, when ops->datbuf == NULL, nand_base's code potentially could
return -EUCLEAN (no in-tree drivers do this yet). In any case where the
driver might return max_bitflips, we should translate this into an
appropriate return code using the bitflip_threshold.
Essentially, mtd_read_oob() duplicates the logic from mtd_read().
This prevents users of mtd_read_oob() from receiving a positive return
value (i.e., from max_bitflips) and interpreting it as an unknown error.
Artem: amend comments.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
mtd_read_oob() will be expanded a little, so don't leave it in the header
as a static inline function.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The drivers' _read() method, absent an error, returns a non-negative integer
indicating the maximum number of bit errors that were corrected in any one
region comprising an ecc step. MTD returns -EUCLEAN if this is >=
bitflip_threshold, 0 otherwise. If bitflip_threshold is zero, the comparison is
not made since these devices lack ECC and always return zero in the non-error
case (thanks Brian)¹. Note that this is a subtle change to the driver
interface.
This and the preceding patches in this set were tested with ubi on top of the
nandsim and docg4 devices, running the ubi test io_basic from mtd-utils.
¹ http://lists.infradead.org/pipermail/linux-mtd/2012-March/040468.html
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Ivan Djelic <ivan.djelic@parrot.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
An element 'bitflip_threshold' is added to struct mtd_info, and also exposed as
a read/write variable in sysfs. This will be used to determine whether or not
mtd_read() returns -EUCLEAN or 0 (absent a hard error). If the driver leaves it
as zero, mtd will set it to a default value of ecc_strength.
This v2 adds the line that propagates bitflip_threshold from the master to the
partitions - thanks Ivan¹.
¹ http://lists.infradead.org/pipermail/linux-mtd/2012-April/040900.html
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
ecc_strength element of struct mtd_info is exposed as a read-only variable in
sysfs.
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Initialization of 'erase_info->fail_addr' to MTD_FAIL_ADDR_UNKNOWN prior to
the erase operation is duplicated across several MTD drivers, and also taken
care of by some MTD users as well.
Harmonize it: initialize 'fail_addr' within 'mtd_erase()' interface.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch changes all the OTP functions like 'mtd_get_fact_prot_info()' and
makes them return zero immediately if the input 'len' parameter is 0. This is
not really needed currently, but most of the other functions do this, and it is
just consistent to do the same in the OTP functions.
This patch also moves the OTP functions from the header file to mtdcore.c
because they become a bit too big for being inlined.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In many places in drivers we verify for the zero length, but this is very
inconsistent across drivers. This is obviously the right thing to do, though.
This patch moves the check to the MTD API functions instead and removes a lot
of duplication.
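Roughly, the pattern looks like this. The struct, the driver hook and the
call counter are hypothetical and exist only to show that the driver is not
entered for a zero-length request; the sketch also zeroes 'retlen' in the
wrapper, as the API functions do.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mtd {
    int (*read)(size_t from, size_t len, size_t *retlen, unsigned char *buf);
};

int sketch_driver_calls;  /* how many times the driver hook ran */

/* Sample driver: no zero-length check of its own. */
int sketch_driver_read(size_t from, size_t len, size_t *retlen,
                       unsigned char *buf)
{
    (void)from;
    for (size_t i = 0; i < len; i++)
        buf[i] = 0xff;
    *retlen = len;
    sketch_driver_calls++;
    return 0;
}

/* API-level wrapper: the zero-length case is handled once, here. */
int sketch_mtd_read(const struct sketch_mtd *mtd, size_t from, size_t len,
                    size_t *retlen, unsigned char *buf)
{
    assert(mtd != NULL && mtd->read != NULL && retlen != NULL);
    *retlen = 0;
    if (len == 0)
        return 0;
    return mtd->read(from, len, retlen, buf);
}
```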
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Some MTD drivers return -EINVAL if the 'phys' parameter is not NULL, trying to
convey that they cannot return the physical address. However, this is not very
logical because they still can return the virtual address ('virt'). But some
drivers (lpddr) just ignore the 'phys' parameter instead, which is a more
logical thing to do.
Let's harmonize this and:
1. Always initialize 'virt' and 'phys' to 'NULL' in 'mtd_point()'.
2. Do not return an error if the physical address cannot be found.
So as a result, all drivers will set 'phys' to 'NULL' if it is not supported.
None of the 'mtd_point()' users use 'phys' anyway, so this should not break
anything. I guess we could also just delete this parameter later.
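In sketch form, with hypothetical types (the physical address is modelled
as an unsigned long, so its "NULL" is 0), the harmonized behavior is:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_mtd {
    /* Optional hook; a driver without a physical address ignores 'phys'. */
    int (*point)(size_t from, size_t len, size_t *retlen,
                 void **virt, unsigned long *phys);
};

/* API-level wrapper: outputs are reset before the driver is called. */
int sketch_point(const struct sketch_mtd *mtd, size_t from, size_t len,
                 size_t *retlen, void **virt, unsigned long *phys)
{
    assert(mtd != NULL && retlen != NULL && virt != NULL);
    *retlen = 0;
    *virt = NULL;
    if (phys != NULL)
        *phys = 0;
    if (mtd->point == NULL)
        return -EOPNOTSUPP;
    return mtd->point(from, len, retlen, virt, phys);
}

static char sketch_ram[16];

/* Sample driver: knows the virtual address only, and returns no error. */
int sketch_ram_point(size_t from, size_t len, size_t *retlen,
                     void **virt, unsigned long *phys)
{
    (void)phys;
    *virt = sketch_ram + from;
    *retlen = len;
    return 0;
}
```

The driver never touches 'phys', so the caller sees the wrapper's default
instead of stale data or an -EINVAL.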
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The MTD API functions now zero the 'retlen' parameter before calling
the driver's method, so do not do this again in drivers. This removes the
duplicated '*retlen = 0' assignment from the following methods:
'mtd_point()'
'mtd_read()'
'mtd_write()'
'mtd_writev()'
'mtd_panic_write()'
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>