Commit Graph

213802 Commits

Author SHA1 Message Date
Ryusuke Konishi c05dbfc260 nilfs2: accept 64-bit checkpoint numbers in cp mount option
The current implementation doesn't mount snapshots with checkpoint
numbers larger than INT_MAX since it uses match_int() for parsing
"cp=" mount option.

This uses simple_strtoull() for the conversion to resolve the issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 2879ed66e4 nilfs2: remove own inode allocator and destructor for metadata files
This finally removes own inode allocator and destructor functions for
metadata files.  Several routines, nilfs_mdt_new(),
nilfs_mdt_new_common(), nilfs_mdt_clear(), nilfs_mdt_destroy(), and
nilfs_alloc_inode_common() will be gone.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:39 +09:00
Ryusuke Konishi 090fd5b101 nilfs2: get rid of back pointer to writable sb instance
Nilfs object holds a back pointer to a writable super block instance
in nilfs->ns_writer, and this became eliminable since sb is now made
per device and all inodes have a valid pointer to it.

This deletes the ns_writer pointer and a reader/writer semaphore
protecting it.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi c6e071884a nilfs2: get rid of mi_nilfs back pointer to nilfs object
This removes a back pointer to nilfs object from nilfs_mdt_info
structure that is attached to metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi 032dbb3b50 nilfs2: see state of root dentry for mount check of snapshots
After applied the patch that unified sb instances, root dentry of
snapshots can be left in dcache even after their trees are unmounted.

The orphan root dentry/inode keeps a root object, and this causes
false positive of nilfs_checkpoint_is_mounted function.

This resolves the issue by having nilfs_checkpoint_is_mounted test
whether the root dentry is busy or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi f1e89c86fd nilfs2: use iget for all metadata files
This makes use of iget5_locked to allocate or get inode for metadata
files to stop using own inode allocator.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi c1c1d70920 nilfs2: get rid of GCDAT inode
This applies prepared rollback function and redirect function of
metadata file to DAT file, and eliminates GCDAT inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:38 +09:00
Ryusuke Konishi b1f6a4f294 nilfs2: add routines to redirect access to buffers of DAT file
During garbage collection (GC), DAT file, which converts virtual block
number to real block number, may return disk block number that is not
yet written to the device.

To avoid access to unwritten blocks, the current implementation stores
changes to the caches of GCDAT during GC and atomically commit the
changes into the DAT file after they are written to the device.

This patch, instead, adds a function that makes a copy of specified
buffer and stores it in nilfs_shadow_map, and a function to get the
backup copy as needed (nilfs_mdt_freeze_buffer and
nilfs_mdt_get_frozen_buffer respectively).

Before DAT changes block number in an entry block, it makes a copy and
redirect access to the buffer so that address conversion function
(i.e. nilfs_dat_translate) refers to the old address saved in the
copy.

This patch gives requisites for such redirection.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi ebdfed4dc5 nilfs2: add routines to roll back state of DAT file
This adds optional function to metadata files which makes a copy of
bmap, page caches, and b-tree node cache, and rolls back to the copy
as needed.

This enhancement is intended to displace gcdat inode that provides a
similar function in a different way.

In this patch, nilfs_shadow_map structure is added to store a copy of
the foregoing states.  nilfs_mdt_setup_shadow_map relates this
structure to a metadata file.  And, nilfs_mdt_save_to_shadow_map() and
nilfs_mdt_restore_from_shadow_map() provides save and restore
functions respectively.  Finally, nilfs_mdt_clear_shadow_map() clears
states of nilfs_shadow_map.

The copy of b-tree node cache and page cache is made by duplicating
only dirty pages into corresponding caches in nilfs_shadow_map.  Their
restoration is done by clearing dirty pages from original caches and
by copying dirty pages back from nilfs_shadow_map.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi a8070dd365 nilfs2: add routines to save and restore bmap state
This adds routines to save and restore the state of bmap structure.
The bmap state is stored in a given nilfs_bmap_store object.

These routines will be used to roll back the state of dat inode
without using gcdat inode.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi adbb39b548 nilfs2: do not allocate nilfs_mdt_info structure to gc-inodes
GC-inode now doesn't need the nilfs_mdt_info structure and there is no
reason that it is a sort of metadata files.

This stops the allocation and makes them not dependent on metadata
file routines.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi 518d1a6a1d nilfs2: allow nilfs_clear_inode to clear metadata file inodes
Allows clear inode function (nilfs_clear_inode) to handle metadata
files that uses bitmap-based object alloctor.  DAT and ifile
correspond to this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:37 +09:00
Ryusuke Konishi b453c95eb8 nilfs2: get rid of snapshot mount flag
This flag is a fake used to distinguish type of super block instance.
And, it got obsolete by the unification of sb.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi 348fe8da13 nilfs2: simplify life cycle management of nilfs object
This stops pre-allocating nilfs object in nilfs_get_sb routine, and
stops managing its life cycle by reference counting.

nilfs_find_or_create_nilfs() function, nilfs->ns_mount_mutex,
nilfs_objects list, and the reference counter will be removed through
the simplification.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi f11459ad7d nilfs2: do not allocate multiple super block instances for a device
This stops allocating multiple super block instances for a device.

All snapshots and a current mode mount (i.e. latest tree) will be
controlled with nilfs_root objects that are kept within an sb
instance.

nilfs_get_sb() is rewritten so that it always has a root object for
the latest tree and snapshots make additional root objects.

The root dentry of the latest tree is binded to sb->s_root even if it
isn't attached on a directory.  Root dentries of snapshots or the
latest tree are binded to mnt->mnt_root on which they are mounted.

With this patch, nilfs_find_sbinfo() function, nilfs->ns_supers list,
and nilfs->ns_current back pointer, are deleted.  In addition,
init_nilfs() and load_nilfs() are simplified since they will be called
once for a device, not repeatedly called for mount points.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi ab4d8f7ebf nilfs2: split out nilfs_attach_snapshot
This splits the code to attach snapshots into a separate routine for
convenience sake.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:36 +09:00
Ryusuke Konishi 367ea33486 nilfs2: split out nilfs_get_root_dentry
This splits the code to allocate root dentry into a separate routine
for convenience in successive changes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi dc3d3b810a nilfs2: deny write access to inodes in snapshots
Snapshots of nilfs are read-only.

After super block instances (sb) will be unified, nilfs will need to
check write access by a way other than implicit test with
IS_RDONLY(inode).  This is because IS_RDONLY() refers to MS_RDONLY bit
of inode->i_sb->s_flags and it will become inaccurate after the
unification of sb.

To prepare for the issue, this uses i_op->permission to deny write
access to inodes in snapshots.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi fd52202930 nilfs2: use checkpoint tree for mount check of snapshots
This rewrites nilfs_checkpoint_is_mounted() function so that it
decides whether a checkpoint is mounted by whether the corresponding
root object is found in checkpoint tree.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi b7c0634204 nilfs2: move inode count and block count into root object
This moves sbi->s_inodes_count and sbi->s_blocks_count into nilfs_root
object.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi e912a5b668 nilfs2: use root object to get ifile
This rewrites functions using ifile so that they get ifile from
nilfs_root object, and will remove sbi->s_ifile.  Some functions that
don't know the root object are extended to receive it from caller.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:35 +09:00
Ryusuke Konishi 8e656fd518 nilfs2: make snapshots in checkpoint tree exportable
The previous export operations cannot handle multiple versions of
a filesystem if they belong to the same sb instance.

This adds a new type of file handle and extends export operations so
that they can get the inode specified by a checkpoint number as well
as an inode number and a generation number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 4d8d9293dc nilfs2: set pointer to root object in inodes
This puts a pointer to nilfs_root object in the private part of
on-memory inode, and makes nilfs_iget function pick up the inode with
the same root object.

Non-root inodes inherit its nilfs_root object from parent inode.  That
of the root inode is allocated through nilfs_attach_checkpoint()
function.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi ba65ae4729 nilfs2: add checkpoint tree to nilfs object
To hold multiple versions of a filesystem in one sb instance, a new
on-memory structure is necessary to handle one or more checkpoints.

This adds a red-black tree of checkpoints to nilfs object, and adds
lookup and create functions for them.

Each checkpoint is represented by "nilfs_root" structure, and this
structure has rb_node to configure the rb-tree.

The nilfs_root object is identified with a checkpoint number.  For
each snapshot, a nilfs_root object is allocated and the checkpoint
number of snapshot is assigned to it.  For a regular mount
(i.e. current mode mount), NILFS_CPTREE_CURRENT_CNO constant is
assigned to the corresponding nilfs_root object.

Each nilfs_root object has an ifile inode and some counters.  These
items will displace those of nilfs_sb_info structure in successive
patches.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 263d90cefc nilfs2: remove own inode hash used for GC
This uses inode hash function that vfs provides instead of the own
hash table for caching gc inodes.  This finally removes the own inode
hash from nilfs.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 5e19a995f4 nilfs2: separate initializer of metadata file inode
This separates a part of initialization code of metadata file inode,
and makes it available from the nilfs iget function that a later patch
will add to.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:34 +09:00
Ryusuke Konishi 0e14a3595b nilfs2: use iget5_locked to get inode
This uses iget5_locked instead of iget_locked so that gc cache can
look up inodes with an inode number and an optional checkpoint number.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 6c43f41000 nilfs2: keep zero value in i_cno except for gc-inodes
On-memory inode structures of nilfs have a member "i_cno" which stores
a checkpoint number related to the inode.  For gc-inodes, this field
indicates version of data each gc-inode caches for GC.  Log writer
temporarily uses "i_cno" to transfer the latest checkpoint number.

This stops the latter use and lets only gc-inodes use it.

The purpose of this patch is to allow the successive change use
"i_cno" for inode lookup.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 7d6cd92fe2 nilfs2: allow nilfs_dirty_inode to mark metadata file inodes dirty
This allows sop->dirty_inode callback function (nilfs_dirty_inode) to
handle metadata file inodes.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi b91c9a97c9 nilfs2: allow nilfs_destroy_inode to destroy metadata file inodes
The current nilfs_destroy_inode() doesn't handle metadata file inodes
including gc inodes (dummy inodes used for garbage collection).

This allows nilfs_destroy_inode() to destroy inodes of metadata files.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Ryusuke Konishi 9566a7a851 nilfs2: accept future revisions
Compatibility of nilfs partitions is now managed with three feature
sets.  This changes old compatibility check with revision number so
that it can accept future revisions.

Note that we can stop support of experimental versions of nilfs that
doesn't know the feature sets by incrementing NILFS_CURRENT_REV.  We
don't have to do it soon, but it would be a possible option whenever
the need arises.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-10-23 09:24:33 +09:00
Linus Torvalds 91b745016c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove in_workqueue_context()
  workqueue: Clarify that schedule_on_each_cpu is synchronous
  memory_hotplug: drop spurious calls to flush_scheduled_work()
  shpchp: update workqueue usage
  pciehp: update workqueue usage
  isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr()
  workqueue: add and use WQ_MEM_RECLAIM flag
  workqueue: fix HIGHPRI handling in keep_working()
  workqueue: add queue_work and activate_work trace points
  workqueue: prepare for more tracepoints
  workqueue: implement flush[_delayed]_work_sync()
  workqueue: factor out start_flush_work()
  workqueue: cleanup flush/cancel functions
  workqueue: implement alloc_ordered_workqueue()

Fix up trivial conflict in fs/gfs2/main.c as per Tejun
2010-10-22 17:13:10 -07:00
Linus Torvalds 04cc69768e Merge branch 'for-2.6.37/misc' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/misc' of git://git.kernel.dk/linux-2.6-block:
  pipe: fix failure to return error code on ->confirm()
2010-10-22 17:07:56 -07:00
Linus Torvalds a2887097f2 Merge branch 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
  xen-blkfront: disable barrier/flush write support
  Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
  block: remove BLKDEV_IFL_WAIT
  aic7xxx_old: removed unused 'req' variable
  block: remove the BH_Eopnotsupp flag
  block: remove the BLKDEV_IFL_BARRIER flag
  block: remove the WRITE_BARRIER flag
  swap: do not send discards as barriers
  fat: do not send discards as barriers
  ext4: do not send discards as barriers
  jbd2: replace barriers with explicit flush / FUA usage
  jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
  jbd: replace barriers with explicit flush / FUA usage
  nilfs2: replace barriers with explicit flush / FUA usage
  reiserfs: replace barriers with explicit flush / FUA usage
  gfs2: replace barriers with explicit flush / FUA usage
  btrfs: replace barriers with explicit flush / FUA usage
  xfs: replace barriers with explicit flush / FUA usage
  block: pass gfp_mask and flags to sb_issue_discard
  dm: convey that all flushes are processed as empty
  ...
2010-10-22 17:07:18 -07:00
Linus Torvalds 8abfc6e7a4 Merge branch 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits)
  cciss: fix PCI IDs for new Smart Array controllers
  drbd: add race-breaker to drbd_go_diskless
  drbd: use dynamic_dev_dbg to optionally log uuid changes
  dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
  drbd: cleanup: change "<= 0" to "== 0"
  drbd: relax the grace period of the md_sync timer again
  drbd: add some more explicit drbd_md_sync
  drbd: drop wrong debug asserts, fix recently introduced race
  drbd: cleanup useless leftover warn/error printk's
  drbd: add explicit drbd_md_sync to drbd_resync_finished
  drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
  drbd: fix for possible deadlock on IO error during resync
  drbd: fix unlikely access after free and list corruption
  drbd: fix for spurious fullsync (uuids rotated too fast)
  drbd: allow for explicit resync-finished notifications
  drbd: preparation commit, using full state in receive_state()
  drbd: drbd_send_ack_dp must not rely on header information
  drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
  drbd: Fixed a stupid copy and paste error
  drbd: Allow larger values for c-fill-target.
  ...

Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
2010-10-22 17:03:12 -07:00
Linus Torvalds e9dd2b6837 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
  cfq-iosched: Fix a gcc 4.5 warning and put some comments
  block: Turn bvec_k{un,}map_irq() into static inline functions
  block: fix accounting bug on cross partition merges
  block: Make the integrity mapped property a bio flag
  block: Fix double free in blk_integrity_unregister
  block: Ensure physical block size is unsigned int
  blkio-throttle: Fix possible multiplication overflow in iops calculations
  blkio-throttle: limit max iops value to UINT_MAX
  blkio-throttle: There is no need to convert jiffies to milli seconds
  blkio-throttle: Fix link failure failure on i386
  blkio: Recalculate the throttled bio dispatch time upon throttle limit change
  blkio: Add root group to td->tg_list
  blkio: deletion of a cgroup was causes oops
  blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: revert bad fix for memory hotplug causing bounces
  Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: Prevent hang_check firing during long I/O
  cfq: improve fsync performance for small files
  ...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
2010-10-22 17:00:32 -07:00
Linus Torvalds 4f3a29dada Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6:
  UBI: tighten the corrupted PEB criteria
  UBI: fix check_data_ff return code
  UBI: remember copy_flag while scanning
  UBI: preserve corrupted PEBs
  UBI: add truly corrupted PEBs to corrupted list
  UBI: introduce debugging helper function
  UBI: make check_pattern function non-static
  UBI: do not put eraseblocks to the corrupted list unnecessarily
  UBI: separate out corrupted list
  UBI: change cascade of ifs to switch statements
  UBI: rename a local variable
  UBI: handle bit-flips when no header found
  UBI: remove duplicate IO error codes
  UBI: rename IO error code
  UBI: fix small 80 characters limit style issue
  UBI: cleanup and simplify Kconfig
2010-10-22 16:34:23 -07:00
Linus Torvalds 06d362931a Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: do not allocate unneeded scan buffer
  UBIFS: do not forget to cancel timers
  UBIFS: remove a bit of unneeded code
  UBIFS: add a commentary about log recovery
  UBIFS: avoid kernel error if ubifs superblock read fails
  UBIFS: introduce new flags for RO mounts
  UBIFS: introduce new flag for RO due to errors
  UBIFS: check return code of pnode_lookup
  UBIFS: check return code of ubifs_lpt_lookup
  UBIFS: improve error reporting when reading bad node
  UBIFS: introduce list sorting debugging checks
  UBIFS: fix assertion warnings in comparison function
  UBIFS: mark unused key objects as invalid
  UBIFS: do not write rubbish into truncation scanning node
  UBIFS: improve assertion in node comparison functions
  UBIFS: do not use key type in list_sort
  UBIFS: do not look up truncation nodes
  UBIFS: fix assertion warning
  UBIFS: do not treat ENOSPC specially
  UBIFS: switch to RO mode after synchronizing
2010-10-22 16:34:03 -07:00
Jason Wessel 495363d380 kdb,debug_core: adjust master cpu switch logic against new debug_core locking
The kdb shell needs to enforce switching back to the original CPU that
took the exception before restoring normal kernel execution.  Resuming
from a different CPU than what took the original exception will cause
problems with spin locks that are freed from the a different processor
than had taken the lock.

The special logic in dbg_cpu_switch() can go away entirely with
because the state of what cpus want to be masters or slaves will
remain unchanged between entry and exit of the debug_core exception
context.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:13 -05:00
Jason Wessel dfee3a7b92 debug_core: refactor locking for master/slave cpus
For quite some time there have been problems with memory barriers and
various races with NMI on multi processor systems using the kernel
debugger.  The algorithm for entering the kernel debug core and
resuming kernel execution was racy and had several known edge case
problems with attempting to debug something on a heavily loaded system
using breakpoints that are hit repeatedly and quickly.

The prior "locking" design entry worked as follows:

  * The atomic counter kgdb_active was used with atomic exchange in
    order to elect a master cpu out of all the cpus that may have
    taken a debug exception.
  * The master cpu increments all elements of passive_cpu_wait[].
  * The master cpu issues the round up cpus message.
  * Each "slave cpu" that enters the debug core increments its own
    element in cpu_in_kgdb[].
  * Each "slave cpu" spins on passive_cpu_wait[] until it becomes 0.
  * The master cpu debugs the system.

The new scheme removes the two arrays of atomic counters and replaces
them with 2 single counters.  One counter is used to count the number
of cpus waiting to become a master cpu (because one or more hit an
exception). The second counter is use to indicate how many cpus have
entered as slave cpus.

The new entry logic works as follows:

  * One or more cpus enters via kgdb_handle_exception() and increments
    the masters_in_kgdb. Each cpu attempts to get the spin lock called
    dbg_master_lock.
  * The master cpu sets kgdb_active to the current cpu.
  * The master cpu takes the spinlock dbg_slave_lock.
  * The master cpu asks to round up all the other cpus.
  * Each slave cpu that is not already in kgdb_handle_exception()
    will enter and increment slaves_in_kgdb.  Each slave will now spin
    try_locking on dbg_slave_lock.
  * The master cpu waits for the sum of masters_in_kgdb and slaves_in_kgdb
    to be equal to the sum of the online cpus.
  * The master cpu debugs the system.

In the new design the kgdb_active can only be changed while holding
dbg_master_lock.  Stress testing has not turned up any further
entry/exit races that existed in the prior locking design.  The prior
locking design suffered from atomic variables not being truly atomic
(in the capacity as used by kgdb) along with memory barrier races.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
2010-10-22 15:34:13 -05:00
Dongdong Deng 39a0715f5a x86,kgdb: remove unnecessary call to kgdb_correct_hw_break()
The kernel debug_core invokes hw breakpoint install and removal via
call backs.  The architecture specific kgdb stubs only need to
implement the call backs and not actually call the functions.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: x86@kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
2010-10-22 15:34:13 -05:00
Dongdong Deng c1bb9a9c19 debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter()
The slave cpus do not have the hw breakpoints disabled upon entry to
the debug_core and as a result could cause unrecoverable recursive
faults on badly placed breakpoints, or get out of sync with the arch
specific hw breakpoint operations.

This patch addresses the problem by invoking kgdb_disable_hw_debug()
earlier in kgdb_enter_cpu for each cpu that enters the debug core.

The hw breakpoint dis/enable flow should be:

master_debug_cpu   slave_debug_cpu
         \              /
          kgdb_cpu_enter
                |
        kgdb_disable_hw_debug --> uninstall pre-enabled hw_breakpoint
                |
 do add/rm dis/enable operates to hw_breakpoints on master_debug_cpu..
                |
        correct_hw_break --> correct/install the enabled hw_breakpoint
                |
           leave_kgdb

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:12 -05:00
Jason Wessel 91b152aa85 kdb,kgdb: fix sparse fixups
Fix the following sparse warnings:

kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static?
kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static?
kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces)
kgdb.c:652:26:    expected void const *ptr
kgdb.c:652:26:    got struct perf_event *[noderef] <asn:3>*pev

The one in kgdb.c required the (void * __force) because of the return
code from register_wide_hw_breakpoint looking like:

        return (void __percpu __force *)ERR_PTR(err);

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:12 -05:00
Jason Wessel 75d14edee5 kdb: Fix oops in kdb_unregister
Nothing should try to use kdb_commands directly as sometimes it is
null.  Instead, use the for_each_kdbcmd() iterator.

This particular problem dates back to the initial kdb merge (2.6.35),
but at that point nothing was dynamically unregistering commands from
the kdb shell.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:12 -05:00
Jason Wessel e3bda3ac33 kdb,ftdump: Remove reference to internal kdb include
Now that include/linux/kdb.h properly exports all the functions
required to dynamically add a kdb shell command, the reference to the
private kdb header can be removed.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:11 -05:00
Jason Wessel f7030bbc44 kdb: Allow kernel loadable modules to add kdb shell functions
In order to allow kernel modules to dynamically add a command to the
kdb shell the kdb_register, kdb_register_repeat, kdb_unregister, and
kdb_printf need to be exported as GPL symbols.

Any kernel module that adds a dynamic kdb shell function should only
need to include linux/kdb.h.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:11 -05:00
Jason Wessel fb70b5888b debug_core: stop rcu warnings on kernel resume
When returning from the kernel debugger reset the rcu jiffies_stall
value to prevent the rcu stall detector from sending NMI events which
invoke a stack dump for each cpu in the system.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:10 -05:00
Jason Wessel 16cdc628c3 debug_core: move all watch dog syncs to a single function
Move the various clock and watch dog syncs to a single function in
advance of adding another sync for the rcu stall detector.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-10-22 15:34:10 -05:00
Jason Wessel fad99fac26 x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35
HW breakpoints events stopped working correctly with kgdb as a result
of commit: 018cbffe68 (Merge commit
'v2.6.33' into perf/core), later commit:
ba773f7c51 (x86,kgdb: Fix hw breakpoint
regression) allowed breakpoints to propagate to the debugger core but
did not completely address the original regression in functionality
found in 2.6.35.

When the DR_STEP flag is set in dr6 along with any of the DR_TRAP
bits, the kgdb exception handler will enter once from the
hw_breakpoint API call back and again from the die notifier for
do_debug(), which causes the debugger to stop twice and also for the
kgdb regression tests to fail running under kvm with:

echo V2I1 > /sys/module/kgdbts/parameters/kgdbts

To address the problem, the kgdb overflow handler needs to implement
the same logic as the ptrace overflow handler call back with respect
to updating the virtual copy of dr6.  This will allow the kgdb
do_debug() die notifier to properly handle the exception and the
attached debugger, or kgdb test suite, will only receive a single
notification.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: x86@kernel.org
2010-10-22 15:34:10 -05:00
Mike Frysinger b9ac41e314 Blackfin: bfin_spi.h: add MMR peripheral layout
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-10-22 16:30:03 -04:00