linux

Commit Graph

Author	SHA1	Message	Date
Chris Metcalf	e658a6f14d	tile: avoid using clocksource_cyc2ns with absolute cycle count For large values of "mult" and long uptimes, the intermediate result of "cycles * mult" can overflow 64 bits. For example, the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; we have mult = 853, and after 208.5 days, we overflow 64 bits. Since clocksource_cyc2ns() is intended to be used for relative cycle counts, not absolute cycle counts, performance is more importance than accepting a wider range of cycle values. So, just use mult_frac() directly in tile's sched_clock(). Commit `4cecf6d401` ("sched, x86: Avoid unnecessary overflow in sched_clock") by Salman Qazi results in essentially the same generated code for x86 as this change does for tile. In fact, a follow-on change by Salman introduced mult_frac() and switched to using it, so the C code was largely identical at that point too. Peter Zijlstra then added mul_u64_u32_shr() and switched x86 to use it. This is, in principle, better; by optimizing the 64x64->64 multiplies to be 32x32->64 multiplies we can potentially save some time. However, the compiler piplines the 64x64->64 multiplies pretty well, and the conditional branch in the generic mul_u64_u32_shr() causes some bubbles in execution, with the result that it's pretty much a wash. If tilegx provided its own implementation of mul_u64_u32_shr() without the conditional branch, we could potentially save 3 cycles, but that seems like small gain for a fair amount of additional build scaffolding; no other platform currently provides a mul_u64_u32_shr() override, and tile doesn't currently have an <asm/div64.h> header to put the override in. Additionally, gcc currently has an optimization bug that prevents it from recognizing the opportunity to use a 32x32->64 multiply, and so the result would be no better than the existing mult_frac() until such time as the compiler is fixed. For now, just using mult_frac() seems like the right answer. Cc: stable@kernel.org [v3.4+] Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>	2016-11-23 15:28:54 -05:00
Viresh Kumar	38715df206	tile/time: Migrate to new 'set-state' interface Migrate tile driver to the new 'set-state' interface provided by clockevents core, the earlier 'set-mode' interface is marked obsolete now. This also enables us to implement callbacks for new states of clockevent devices, for example: ONESHOT_STOPPED. Cc: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>	2015-07-30 12:32:15 -04:00
Peter Zijlstra	876e78818d	time: Rename timekeeper::tkr to timekeeper::tkr_mono In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:45:06 +01:00
Joe Perches	f47436734d	tile: Use the more common pr_warn instead of pr_warning And other message logging neatening. Other miscellanea: o coalesce formats o realign arguments o standardize a couple of macros o use __func__ instead of embedding the function name Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2014-11-11 15:51:42 -05:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Chris Metcalf	78410af511	tile: add clock_gettime support to vDSO This change adds support for clock_gettime with CLOCK_REALTIME and CLOCK_MONOTONIC using vDSO. It also updates the vdso struct nomenclature used for the clocks to match the x86 code to keep it easier to update going forward. We also support the *_COARSE clockid_t, for apps that want speed but aren't concerned about fine-grained timestamps; this saves about 20 cycles per call (see http://lwn.net/Articles/342018/). Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: John Stultz <john.stultz@linaro.org>	2014-10-02 13:56:07 -04:00
Chris Metcalf	94fb1afbcb	tile: switch to using seqlocks for the vDSO time code Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2014-10-02 10:48:12 -04:00
Christoph Lameter	b4f501916c	tile: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:54 -04:00
Thomas Gleixner	d28ede8379	timekeeping: Create struct tk_read_base and use it in struct timekeeper The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:53 -07:00
Thomas Gleixner	4a0e637738	clocksource: Get rid of cycle_last cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:52 -07:00
Thomas Gleixner	dc01c9fae1	tile: Convert VDSO timekeeping to the precise mechanism The code was only halfarsed converted to the new VSDO update mechanism and still uses the inaccurate base value which lacks the fractional part of xtime_nsec. Fix it up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 10:16:50 -07:00
Henrik Austad	767f30210b	tile: avoid overflow in ns2cycles In commit `4cecf6d401` ("sched, x86: Avoid unnecessary overflow in sched_clock") and in recent patch "clocksource: avoid unnecessary overflow in cyclecounter_cyc2ns()" https://lkml.org/lkml/2014/3/4/17, the mult-shift approach is replaced by 2 steps to avoid storing a large, intermediate value that could overflow. arch/tile/kernel/time.c has a similar pattern in cycles2ns, and this copies the same pattern in this function CC: John Stultz <johnstul@us.ibm.com> CC: Mike Galbraith <bitbucket@online.de> CC: Salman Qazi <sqazi@google.com> Signed-off-by: Henrik Austad <henrik@austad.us> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2014-03-06 11:53:12 -05:00
Linus Torvalds	4de9ad9bc0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull Tile arch updates from Chris Metcalf: "These changes bring in a bunch of new functionality that has been maintained internally at Tilera over the last year, plus other stray bits of work that I've taken into the tile tree from other folks. The changes include some PCI root complex work, interrupt-driven console support, support for performing fast-path unaligned data fixups by kernel-based JIT code generation, CONFIG_PREEMPT support, vDSO support for gettimeofday(), a serial driver for the tilegx on-chip UART, KGDB support, more optimized string routines, support for ftrace and kprobes, improved ASLR, and many bug fixes. We also remove support for the old TILE64 chip, which is no longer buildable" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits) tile: refresh tile defconfig files tile: rework <asm/cmpxchg.h> tile PCI RC: make default consistent DMA mask 32-bit tile: add null check for kzalloc in tile/kernel/setup.c tile: make __write_once a synonym for __read_mostly tile: remove support for TILE64 tile: use asm-generic/bitops/builtin-*.h tile: eliminate no-op "noatomichash" boot argument tile: use standard tile_bundle_bits type in traps.c tile: simplify code referencing hypervisor API addresses tile: change <asm/system.h> to <asm/switch_to.h> in comments tile: mark pcibios_init() as __init tile: check for correct compiler earlier in asm-offsets.c tile: use standard 'generic-y' model for <asm/hw_irq.h> tile: use asm-generic version of <asm/local64.h> tile PCI RC: add comment about "PCI hole" problem tile: remove DEBUG_EXTRA_FLAGS kernel config option tile: add virt_to_kpte() API and clean up and document behavior tile: support FRAME_POINTER tile: support reporting Tilera hypervisor statistics ...	2013-09-06 11:14:33 -07:00
Chris Metcalf	4a556f4f56	tile: implement gettimeofday() via vDSO This change creates the framework for vDSO calls, makes the existing rt_sigreturn() mechanism use it, and adds a fast gettimeofday(). Now that we need to expose the vDSO address to userspace, we add AT_SYSINFO_EHDR to the set of aux entries provided to userspace. (You can disable any extra vDSO support by booting with vdso=0, but the rt_sigreturn vDSO page will still be provided.) Note that glibc has supported the tile vDSO since release 2.17. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2013-08-13 16:26:21 -04:00
Paul Gortmaker	18f894c145	tile: delete __cpuinit usage from all tile files The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit `5e427ec2d0` ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/tile uses of the __cpuinit macros from all C files. Currently tile does not have any __CPUINIT used in assembly files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2013-07-14 19:36:54 -04:00
Henrik Austad	39e8202bb3	tile: ns2cycles should use __raw_get_cpu_var ns2cycles use per_cpu variables, and will, eventually, find its way into smp_processor_id(). This is not safe in a preemptible kernel; preemption should ideally be disabled. BUG: using smp_processor_id() in preemptible [00000000] code: systemd-modules/367 caller is ns2cycles+0x40/0xb8 Starting stack dump of tid 367, pid 367 (systemd-modules) on cpu 2 at cycle 20969956421 frame 0: 0xfffffff70004b860 dump_stack+0x0/0x20 (sp 0xfffffe407993fa90) frame 1: 0xfffffff7006abc28 debug_smp_processor_id+0x1a8/0x1e0 (sp 0xfffffe407993fa90) frame 2: 0xfffffff7004d7b40 ns2cycles+0x40/0xb8 (sp 0xfffffe407993fab8) frame 3: 0xfffffff7004dc578 __ndelay+0x38/0x80 (sp 0xfffffe407993fae0) However, in this case: - the frequency is the same accross all cores - we use the data read-only - we do not scale the frequency Which means that we can use the __raw_get_cpu_var instead. Signed-off-by: Henrik Austad <haustad@cisco.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2013-03-26 13:52:16 -04:00
John Stultz	9f14517bd6	clocksource: tile: convert to use clocksource_register_hz Convert tile to use clocksource_register_hz. CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2011-06-03 17:26:21 -04:00
Chris Metcalf	28d717411b	arch/tile: various header improvements for building drivers This change adds a number of missing headers in asm (fb.h, parport.h, serial.h, and vga.h) using the minimal generic versions. It also adds a number of missing interfaces that showed up as build failures when trying to build various drivers not normally included in the "tile" distribution: ioremap_wc(), memset_io(), io{read,write}{16,32}be(), virt_to_bus(), bus_to_virt(), irq_canonicalize(), __pte(), __pgd(), and __pmd(). I also added a cast in virt_to_page() since not all callers pass a pointer. I fixed <asm/stat.h> to properly include a __KERNEL__ guard for the __ARCH_WANT_STAT64 symbol, and <asm/swab.h> to use __builtin_bswap32() even for our 64-bit architecture, since the same code is produced. I added an export for get_cycles(), since it's used in some modules. And I made <arch/spr_def.h> properly include the __KERNEL__ guard, even though it's not yet exported, since it likely will be soon. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2011-05-04 14:40:54 -04:00
Chris Metcalf	1337173148	arch/tile: fix __ndelay etc to work better The current implementations of __ndelay and __udelay call a hypervisor service to delay, but the hypervisor service isn't actually implemented very well, and the consensus is that Linux should handle figuring this out natively and not use a hypervisor service. By converting nanoseconds to cycles, and then spinning until the cycle counter reaches the desired cycle, we get several benefits: first, we are sensitive to the actual clock speed; second, we use less power by issuing a slow SPR read once every six cycles while we delay; and third, we properly handle the case of an interrupt by exiting at the target time rather than after some number of cycles. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2011-03-01 16:20:04 -05:00
Chris Metcalf	5d966115de	arch/tile: bomb raw_local_irq_ to arch_local_irq_ This completes the tile migration to the new naming scheme for the architecture-specific irq management code. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2010-11-01 15:30:42 -04:00
Chris Metcalf	749dc6f252	arch/tile: Use separate, better minsec values for clocksource and sched_clock. We were using the same 5-sec minsec for the clocksource and sched_clock that we were using for the clock_event_device. For the clock_event_device that's exactly right since it has a short maximum countdown time. But for sched_clock we want to avoid wraparound when converting from ticks to nsec over a much longer window, so we force a shift of 10. And for clocksource it seems dodgy to use a 5-sec minsec as well, so we copy some other platforms and force a shift of 22. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2010-08-13 08:24:22 -04:00
Chris Metcalf	0707ad30d1	arch/tile: Miscellaneous cleanup changes. This commit is primarily changes caused by reviewing "sparse" and "checkpatch" output on our sources, so is somewhat noisy, since things like "printk() -> pr_err()" (or whatever) throughout the codebase tend to get tedious to read. Rather than trying to tease apart precisely which things changed due to which type of code review, this commit includes various cleanups in the code: - sparse: Add declarations in headers for globals. - sparse: Fix __user annotations. - sparse: Using gfp_t consistently instead of int. - sparse: removing functions not actually used. - checkpatch: Clean up printk() warnings by using pr_info(), etc.; also avoid partial-line printks except in bootup code. - checkpatch: Use exposed structs rather than typedefs. - checkpatch: Change some C99 comments to C89 comments. In addition, a couple of minor other changes are rolled in to this commit: - Add support for a "raise" instruction to cause SIGFPE, etc., to be raised. - Remove some compat code that is unnecessary when we fully eliminate some of the deprecated syscalls from the generic syscall ABI. - Update the tile_defconfig to reflect current config contents. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Arnd Bergmann <arnd@arndb.de>	2010-07-06 13:41:51 -04:00
Chris Metcalf	867e359b97	arch/tile: core support for Tilera 32-bit chips. This change is the core kernel support for TILEPro and TILE64 chips. No driver support (except the console driver) is included yet. This includes the relevant Linux headers in asm/; the low-level low-level "Tile architecture" headers in arch/, which are shared with the hypervisor, etc., and are build-system agnostic; and the relevant hypervisor headers in hv/. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Reviewed-by: Paul Mundt <lethal@linux-sh.org>	2010-06-04 17:11:18 -04:00

23 Commits