Import Upstream version 0.2.5

su-fang 2022-10-10 17:01:59 +08:00
commit 3cb230e352
31 changed files with 8736 additions and 0 deletions

27
LICENSE Normal file

@ -0,0 +1,27 @@
The MIT License (MIT)
Copyright (c) 2014 Jeff Quast <contact@jeffquast.com>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Markus Kuhn -- 2007-05-26 (Unicode 5.0)
Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.

2
MANIFEST.in Normal file

@ -0,0 +1,2 @@
include LICENSE *.rst
recursive-include tests *.py

306
PKG-INFO Normal file

@ -0,0 +1,306 @@
Metadata-Version: 1.1
Name: wcwidth
Version: 0.2.5
Summary: Measures the displayed width of unicode strings in a terminal
Home-page: https://github.com/jquast/wcwidth
Author: Jeff Quast
Author-email: contact@jeffquast.com
License: MIT
Description: |pypi_downloads| |codecov| |license|
============
Introduction
============
This library is mainly for CLI programs that carefully produce output for
Terminals, or pretend to be an emulator.
**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
there are categories of characters that *occupy 2 cells* (full-wide), and
others that *occupy 0* cells (zero-width).
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*
Installation
------------
The stable version of this package is maintained on pypi, install using pip::
pip install wcwidth
Example
-------
**Problem**: given the following phrase (Japanese),
>>> text = u'コンニチハ'
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the `rjust` function, the
output length is wrong::
>>> print(len('コンニチハ'))
5
>>> print('コンニチハ'.rjust(20, '_'))
_____コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this::
>>> def wc_rjust(text, length, padding=' '):
... from wcwidth import wcswidth
... return padding * max(0, (length - wcswidth(text))) + text
...
Our **Solution** uses wcswidth to determine the string length correctly::
>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10
>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ
Choosing a Version
------------------
Export an environment variable, ``UNICODE_VERSION``. This should be done by
*terminal emulators* or those developers experimenting with authoring one of
their own, from shell::
$ export UNICODE_VERSION=13.0
If unspecified, the latest version is used. If your Terminal Emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.
wcwidth, wcswidth
-----------------
Use ``wcwidth()`` to determine the printable width of a *single unicode
character*, and ``wcswidth()`` to determine the printable width of a *string
of unicode characters*.
Briefly, return values of function ``wcwidth()`` are:
``-1``
Indeterminate (not printable).
``0``
Does not advance the cursor, such as NULL or Combining.
``2``
Characters of category East Asian Wide (W) or East Asian
Full-width (F) which are displayed using two terminal cells.
``1``
All others.
Function ``wcswidth()`` simply returns the sum of the ``wcwidth()`` values of
each character in a string, or ``-1`` if any character in the string has a
width of ``-1``.
Full API Documentation at http://wcwidth.readthedocs.org
==========
Developing
==========
Install wcwidth in editable mode::
pip install -e .
Execute unit tests using tox_::
tox
Regenerate python code tables from latest Unicode Specification data files::
tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. Be
sure to first ``pip install -r requirements-develop.txt`` from this project's
main folder. For example, an interactive browser for testing::
./bin/wcwidth-browser.py
Uses
----
This library is used in:
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
Python.
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
interactive command lines in Python.
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
based on compositing 2d arrays of text.
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
and a command-line utility.
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
text.
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
animations.
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
UIs.
Other Languages
---------------
- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: ``LD_PRELOAD`` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C
History
-------
0.2.0 *2020-06-01*
* **Enhancement**: Unicode version may be selected by exporting the
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
* **Enhancement**:
API Documentation is published to readthedocs.org.
* **Updated** tables for *all* Unicode Specifications with files
published in a programmatically consumable format, versions 4.1.0
through 13.0.
0.1.9 *2020-03-22*
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
* **Updated** tables to Unicode Specification 13.0.0.
0.1.8 *2020-01-01*
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
0.1.7 *2016-07-01*
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
0.1.6 *2016-01-08 Production/Stable*
* ``LICENSE`` file now included with distribution.
0.1.5 *2015-09-13 Alpha*
* **Bugfix**:
Resolution of the "combining_ character width" issue: characters that
previously returned -1 now often (correctly) return 0.
Resolved by `Philip Craig`_ via `PR #11`_.
* **Deprecated**:
The module path ``wcwidth.table_comb`` is no longer available,
it has been superseded by module path ``wcwidth.table_zero``.
0.1.4 *2014-11-20 Pre-Alpha*
* **Feature**: ``wcswidth()`` now determines printable length
for (most) combining_ characters. The developer's tool
`bin/wcwidth-browser.py`_ is improved to display combining_
characters when provided the ``--combining`` option
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
* **Feature**: added static analysis (prospector_) to testing
framework.
0.1.3 *2014-10-29 Pre-Alpha*
* **Bugfix**: 2nd parameter of wcswidth was not honored.
(`Thomas Ballinger`_, `PR #4`_).
0.1.2 *2014-10-28 Pre-Alpha*
* **Updated** tables to Unicode Specification 7.0.0.
(`Thomas Ballinger`_, `PR #3`_).
0.1.1 *2014-05-14 Pre-Alpha*
* Initial release to pypi, based on Unicode Specification 6.3.0.
This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
.. _`tox`: https://testrun.org/tox/latest/install.html
.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`selectel/pyte`: https://github.com/selectel/pyte
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
.. _`Avram Lubkin`: https://github.com/avylove
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
:alt: Downloads
:target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
:alt: codecov.io Code Coverage
:target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
:target: https://pypi.python.org/pypi/wcwidth/
:alt: MIT License
Keywords: cjk,combining,console,eastasian,emoji,emulator,terminal,unicode,wcswidth,wcwidth,xterm
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Localization
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Terminals

280
README.rst Normal file

@ -0,0 +1,280 @@
|pypi_downloads| |codecov| |license|
============
Introduction
============
This library is mainly for CLI programs that carefully produce output for
Terminals, or pretend to be an emulator.
**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
there are categories of characters that *occupy 2 cells* (full-wide), and
others that *occupy 0* cells (zero-width).
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*
Installation
------------
The stable version of this package is maintained on pypi, install using pip::
pip install wcwidth
Example
-------
**Problem**: given the following phrase (Japanese),
>>> text = u'コンニチハ'
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the `rjust` function, the
output length is wrong::
>>> print(len('コンニチハ'))
5
>>> print('コンニチハ'.rjust(20, '_'))
_____コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this::
>>> def wc_rjust(text, length, padding=' '):
... from wcwidth import wcswidth
... return padding * max(0, (length - wcswidth(text))) + text
...
Our **Solution** uses wcswidth to determine the string length correctly::
>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10
>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ
Choosing a Version
------------------
Export an environment variable, ``UNICODE_VERSION``. This should be done by
*terminal emulators* or those developers experimenting with authoring one of
their own, from shell::
$ export UNICODE_VERSION=13.0
If unspecified, the latest version is used. If your Terminal Emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.
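As a rough illustration of how a program might honor this variable, the sketch below reads ``UNICODE_VERSION`` from the environment and falls back to the newest known version when it is unset or unparsable. The function name ``pick_unicode_version`` and the ``SUPPORTED`` list are hypothetical and are not part of wcwidth's API; the library's own matching rules may differ.

```python
import os

# Hypothetical list of table versions, oldest to newest (illustrative only).
SUPPORTED = ("4.1.0", "5.0.0", "9.0.0", "12.0.0", "13.0.0")


def _as_tuple(version):
    """Convert a dotted version such as '13.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pick_unicode_version(environ=None):
    """Return the newest supported version not newer than UNICODE_VERSION."""
    environ = os.environ if environ is None else environ
    requested = environ.get("UNICODE_VERSION")
    if not requested:
        return SUPPORTED[-1]  # unspecified: use the latest
    try:
        wanted = _as_tuple(requested)
    except ValueError:
        return SUPPORTED[-1]  # unparsable value: fall back to the latest
    # Pad so that '13.0' compares equal to '13.0.0'.
    wanted = wanted + (0,) * (3 - len(wanted))
    candidates = [v for v in SUPPORTED if _as_tuple(v) <= wanted]
    # Older than every known table: use the oldest one available.
    return candidates[-1] if candidates else SUPPORTED[0]
```

Passing a plain dictionary instead of ``os.environ`` keeps the sketch easy to try out without touching the real shell environment.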
wcwidth, wcswidth
-----------------
Use ``wcwidth()`` to determine the printable width of a *single unicode
character*, and ``wcswidth()`` to determine the printable width of a *string
of unicode characters*.
Briefly, return values of function ``wcwidth()`` are:
``-1``
Indeterminate (not printable).
``0``
Does not advance the cursor, such as NULL or Combining.
``2``
Characters of category East Asian Wide (W) or East Asian
Full-width (F) which are displayed using two terminal cells.
``1``
All others.
Function ``wcswidth()`` simply returns the sum of the ``wcwidth()`` values of
each character in a string, or ``-1`` if any character in the string has a
width of ``-1``.
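The return-value rules above can be sketched with the standard library alone. This is only an approximation for illustration: ``approx_wcwidth`` and ``approx_wcswidth`` are hypothetical names, and they rely on Python's ``unicodedata`` module rather than on wcwidth's own versioned tables, so results may differ from the library for some characters.

```python
import unicodedata


def approx_wcwidth(char):
    """Approximate cell width of one character using stdlib data only."""
    if char == "\x00":
        return 0  # NULL does not advance the cursor
    category = unicodedata.category(char)
    if category == "Cc":
        return -1  # other control characters are not printable
    if category in ("Mn", "Me"):
        return 0  # combining marks occupy no cell of their own
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # East Asian Wide or Fullwidth
    return 1  # all others


def approx_wcswidth(text):
    """Sum the per-character widths, or return -1 if any width is -1."""
    total = 0
    for char in text:
        width = approx_wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total
```

Under these stand-in rules, the five katakana characters from the example above add up to 10 cells, while any string containing a control character such as BEL yields ``-1``.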
Full API Documentation at http://wcwidth.readthedocs.org
==========
Developing
==========
Install wcwidth in editable mode::
pip install -e .
Execute unit tests using tox_::
tox
Regenerate python code tables from latest Unicode Specification data files::
tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. Be
sure to first ``pip install -r requirements-develop.txt`` from this project's
main folder. For example, an interactive browser for testing::
./bin/wcwidth-browser.py
Uses
----
This library is used in:
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
Python.
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
interactive command lines in Python.
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
based on compositing 2d arrays of text.
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
and a command-line utility.
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
text.
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
animations.
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
UIs.
Other Languages
---------------
- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: ``LD_PRELOAD`` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C
History
-------
0.2.0 *2020-06-01*
* **Enhancement**: Unicode version may be selected by exporting the
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
* **Enhancement**:
API Documentation is published to readthedocs.org.
* **Updated** tables for *all* Unicode Specifications with files
published in a programmatically consumable format, versions 4.1.0
through 13.0.
0.1.9 *2020-03-22*
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
* **Updated** tables to Unicode Specification 13.0.0.
0.1.8 *2020-01-01*
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
0.1.7 *2016-07-01*
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
0.1.6 *2016-01-08 Production/Stable*
* ``LICENSE`` file now included with distribution.
0.1.5 *2015-09-13 Alpha*
* **Bugfix**:
Resolution of the "combining_ character width" issue: characters that
previously returned -1 now often (correctly) return 0.
Resolved by `Philip Craig`_ via `PR #11`_.
* **Deprecated**:
The module path ``wcwidth.table_comb`` is no longer available,
it has been superseded by module path ``wcwidth.table_zero``.
0.1.4 *2014-11-20 Pre-Alpha*
* **Feature**: ``wcswidth()`` now determines printable length
for (most) combining_ characters. The developer's tool
`bin/wcwidth-browser.py`_ is improved to display combining_
characters when provided the ``--combining`` option
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
* **Feature**: added static analysis (prospector_) to testing
framework.
0.1.3 *2014-10-29 Pre-Alpha*
* **Bugfix**: 2nd parameter of wcswidth was not honored.
(`Thomas Ballinger`_, `PR #4`_).
0.1.2 *2014-10-28 Pre-Alpha*
* **Updated** tables to Unicode Specification 7.0.0.
(`Thomas Ballinger`_, `PR #3`_).
0.1.1 *2014-05-14 Pre-Alpha*
* Initial release to pypi, based on Unicode Specification 6.3.0.
This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
.. _`tox`: https://testrun.org/tox/latest/install.html
.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`selectel/pyte`: https://github.com/selectel/pyte
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
.. _`Avram Lubkin`: https://github.com/avylove
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
:alt: Downloads
:target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
:alt: codecov.io Code Coverage
:target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
:target: https://pypi.python.org/pypi/wcwidth/
:alt: MIT License

47
bin/new-wide-by-version.py Executable file

@ -0,0 +1,47 @@
#!/usr/bin/env python3
"""
Display new wide unicode point values, by version.
For example::
"5.0.0": [
12752,
12753,
12754,
...
Means that chr(12752) through chr(12754) are new WIDE values
for Unicode version 5.0.0, and were not WIDE values for the
previous version (4.1.0).
"""
# std imports
import sys
import json
# List new WIDE characters at each unicode version.
#
def main():
    from wcwidth import WIDE_EASTASIAN, _bisearch
    versions = list(WIDE_EASTASIAN.keys())
    results = {}
    for version in versions:
        prev_idx = versions.index(version) - 1
        if prev_idx == -1:
            # the first version has no predecessor to compare against
            continue
        previous_version = versions[prev_idx]
        previous_table = WIDE_EASTASIAN[previous_version]
        for start, end in WIDE_EASTASIAN[version]:
            # table ranges are inclusive (start, end), so add 1 to
            # include the end point in the iteration
            for value in range(start, end + 1):
                if not _bisearch(value, previous_table):
                    results[version] = results.get(version, []) + [value]
                    if '--debug' in sys.argv:
                        print(f'version {version} has unicode character '
                              f'0x{value:05x} ({chr(value)}) but previous '
                              f'version, {previous_version} does not.',
                              file=sys.stderr)
    print(json.dumps(results, indent=4))


if __name__ == '__main__':
    main()

42
bin/run_codecov.py Normal file

@ -0,0 +1,42 @@
"""Workaround for https://github.com/codecov/codecov-python/issues/158."""
# std imports
import sys
import time
# 3rd party
import codecov
RETRIES = 5
TIMEOUT = 2
def main():
    """Run codecov up to RETRIES times. On the final attempt, let it exit normally."""
    # Make a copy of argv and make sure --required is in it
    args = sys.argv[1:]
    if '--required' not in args:
        args.append('--required')
    for num in range(1, RETRIES + 1):
        print('Running codecov attempt %d: ' % num)
        # On the last, let codecov handle the exit
        if num == RETRIES:
            codecov.main()
        try:
            codecov.main(*args)
        except SystemExit as err:
            # If there's no exit code, it was successful
            if err.code:
                time.sleep(TIMEOUT)
            else:
                sys.exit(err.code)
        else:
            break


if __name__ == '__main__':
    main()

331
bin/update-tables.py Normal file

@ -0,0 +1,331 @@
#!/usr/bin/env python
"""
Update the python Unicode tables for wcwidth.
https://github.com/jquast/wcwidth
"""
from __future__ import print_function
# std imports
import os
import re
import glob
import json
import codecs
import string
import urllib
import datetime
import collections
import unicodedata
try:
    # py2
    from urllib2 import urlopen
except ImportError:
    # py3
    from urllib.request import urlopen
URL_UNICODE_DERIVED_AGE = 'file:///usr/share/unicode/DerivedAge.txt'
EXCLUDE_VERSIONS = ['2.0.0', '2.1.2', '3.0.0', '3.1.0', '3.2.0', '4.0.0']
PATH_UP = os.path.relpath(
os.path.join(
os.path.dirname(__file__),
os.path.pardir))
PATH_DOCS = os.path.join(PATH_UP, 'docs')
PATH_DATA = os.path.join(PATH_UP, 'data')
PATH_CODE = os.path.join(PATH_UP, 'wcwidth')
FILE_RST = os.path.join(PATH_DOCS, 'unicode_version.rst')
FILE_PATCH_FROM = "release files:"
FILE_PATCH_TO = "======="
# use chr() for py3.x,
# unichr() for py2.x
try:
    _ = unichr(0)
except NameError as err:
    if err.args[0] == "name 'unichr' is not defined":
        # pylint: disable=C0103,W0622
        # Invalid constant name "unichr" (col 8)
        # Redefining built-in 'unichr' (col 8)
        unichr = chr
    else:
        raise
TableDef = collections.namedtuple('table', ['version', 'date', 'values'])
def main():
    """Update east-asian, combining and zero width tables."""
    versions = get_unicode_versions()
    do_east_asian(versions)
    do_zero_width(versions)
    do_rst_file_update()
    do_unicode_versions(versions)


def get_unicode_versions():
    """Fetch, determine, and return Unicode Versions for processing."""
    fname = os.path.join(PATH_DATA, 'DerivedAge.txt')
    do_retrieve(url=URL_UNICODE_DERIVED_AGE, fname=fname)
    pattern = re.compile(r'#.*assigned in Unicode ([0-9.]+)')
    versions = []
    for line in open(fname, 'r'):
        if match := re.match(pattern, line):
            version = match.group(1)
            if version not in EXCLUDE_VERSIONS:
                versions.append(version)
    versions.sort(key=lambda ver: list(map(int, ver.split('.'))))
    return versions
def do_rst_file_update():
    """Patch unicode_versions.rst to reflect the data files used in release."""
    # read in,
    data_in = codecs.open(FILE_RST, 'r', 'utf8').read()
    # search for beginning and end positions,
    pos_begin = data_in.find(FILE_PATCH_FROM)
    assert pos_begin != -1, (pos_begin, FILE_PATCH_FROM)
    pos_begin += len(FILE_PATCH_FROM)
    data_out = data_in[:pos_begin] + '\n\n'
    # find all filenames with a version number in it,
    # sort filenames by name, then dotted number, ascending
    glob_pattern = os.path.join(PATH_DATA, '*[0-9]*.txt')
    filenames = glob.glob(glob_pattern)
    filenames.sort(key=lambda ver: [ver.split(
        '-')[0]] + list(map(int, ver.split('-')[-1][:-4].split('.'))))
    # copy file description as-is, formatted
    for fpath in filenames:
        if description := describe_file_header(fpath):
            data_out += f'\n{description}'
    # write.
    print(f"patching {FILE_RST} ..")
    codecs.open(
        FILE_RST, 'w', 'utf8').write(data_out)
def do_east_asian(versions):
    """Fetch and update east-asian tables."""
    table = {}
    for version in versions:
        fin = os.path.join(PATH_DATA, 'EastAsianWidth-{version}.txt')
        fout = os.path.join(PATH_CODE, 'table_wide.py')
        url = ('file:///usr/share/unicode/EastAsianWidth.txt')
        try:
            do_retrieve(url=url.format(version=version),
                        fname=fin.format(version=version))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
        else:
            table[version] = parse_east_asian(
                fname=fin.format(version=version),
                properties=(u'W', u'F',))
    do_write_table(fname=fout, variable='WIDE_EASTASIAN', table=table)


def do_zero_width(versions):
    """Fetch and update zero width tables."""
    table = {}
    fout = os.path.join(PATH_CODE, 'table_zero.py')
    for version in versions:
        fin = os.path.join(PATH_DATA, 'DerivedGeneralCategory-{version}.txt')
        url = ('file:///usr/share/unicode/extracted/DerivedGeneralCategory.txt')
        try:
            do_retrieve(url=url.format(version=version),
                        fname=fin.format(version=version))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
        else:
            table[version] = parse_category(
                fname=fin.format(version=version),
                categories=('Me', 'Mn',))
    do_write_table(fname=fout, variable='ZERO_WIDTH', table=table)
def make_table(values):
    """Return a tuple of lookup tables for given values."""
    table = collections.deque()
    start, end = values[0], values[0]
    for num, value in enumerate(values):
        if num == 0:
            table.append((value, value,))
            continue
        start, end = table.pop()
        if end == value - 1:
            # value extends the current run: widen its inclusive end
            table.append((start, value,))
        else:
            # gap found: keep the finished run and begin a new one
            table.append((start, end,))
            table.append((value, value,))
    return tuple(table)


def do_retrieve(url, fname):
    """Retrieve given url to target filepath fname."""
    folder = os.path.dirname(fname)
    if not os.path.exists(folder):
        os.makedirs(folder)
        print(f"{folder}{os.path.sep} created.")
    if not os.path.exists(fname):
        try:
            with open(fname, 'wb') as fout:
                print(f"retrieving {url}: ", end='', flush=True)
                resp = urlopen(url)
                fout.write(resp.read())
        except BaseException:
            print('failed')
            os.unlink(fname)
            raise
        print(f"{fname} saved.")
    return fname
def describe_file_header(fpath):
    header_2 = [line.lstrip('# ').rstrip() for line in
                codecs.open(fpath, 'r', 'utf8').readlines()[:2]]
    # fmt:
    #
    # ``EastAsianWidth-8.0.0.txt``
    # *2015-02-10, 21:00:00 GMT [KW, LI]*
    fmt = '``{0}``\n *{1}*\n'
    if len(header_2) == 0:
        return ''
    assert len(header_2) == 2, (fpath, header_2)
    return fmt.format(*header_2)
def parse_east_asian(fname, properties=(u'W', u'F',)):
    """Parse unicode east-asian width tables."""
    print(f'parsing {fname}: ', end='', flush=True)
    version, date, values = None, None, []
    for line in open(fname, 'rb'):
        uline = line.decode('utf-8')
        if version is None:
            version = uline.split(None, 1)[1].rstrip()
            continue
        if date is None:
            date = uline.split(':', 1)[1].rstrip()
            continue
        if uline.startswith('#') or not uline.lstrip():
            continue
        addrs, details = uline.split(';', 1)
        if any(details.startswith(property)
               for property in properties):
            start, stop = addrs, addrs
            if '..' in addrs:
                start, stop = addrs.split('..')
            values.extend(range(int(start, 16), int(stop, 16) + 1))
    print('ok')
    return TableDef(version, date, values)


def parse_category(fname, categories):
    """Parse unicode category tables."""
    print(f'parsing {fname}: ', end='', flush=True)
    version, date, values = None, None, []
    for line in open(fname, 'rb'):
        uline = line.decode('utf-8')
        if version is None:
            version = uline.split(None, 1)[1].rstrip()
            continue
        if date is None:
            date = uline.split(':', 1)[1].rstrip()
            continue
        if uline.startswith('#') or not uline.lstrip():
            continue
        addrs, details = uline.split(';', 1)
        addrs, details = addrs.rstrip(), details.lstrip()
        if any(details.startswith(f'{value} #')
               for value in categories):
            start, stop = addrs, addrs
            if '..' in addrs:
                start, stop = addrs.split('..')
            values.extend(range(int(start, 16), int(stop, 16) + 1))
    print('ok')
    return TableDef(version, date, sorted(values))
def do_write_table(fname, variable, table):
    """Write combining tables to filesystem as python code."""
    # pylint: disable=R0914
    # Too many local variables (19/15) (col 4)
    utc_now = datetime.datetime.utcnow()
    indent = ' ' * 8
    with open(fname, 'w') as fout:
        print(f"writing {fname} ... ", end='')
        fout.write(
            f'"""{variable.title()} table, created by bin/update-tables.py."""\n'
            f"{variable} = {{\n")
        for version_key, version_table in table.items():
            if not version_table.values:
                continue
            fout.write(
                f"{indent[:-4]}'{version_key}': (\n"
                f"{indent}# Source: {version_table.version}\n"
                f"{indent}# Date: {version_table.date}\n"
                f"{indent}#")
            for start, end in make_table(version_table.values):
                ucs_start, ucs_end = unichr(start), unichr(end)
                hex_start, hex_end = (f'0x{start:05x}', f'0x{end:05x}')
                try:
                    name_start = string.capwords(unicodedata.name(ucs_start))
                except ValueError:
                    name_start = u'(nil)'
                try:
                    name_end = string.capwords(unicodedata.name(ucs_end))
                except ValueError:
                    name_end = u'(nil)'
                fout.write(f'\n{indent}')
                comment_startpart = name_start[:24].rstrip()
                comment_endpart = name_end[:24].rstrip()
                fout.write(f'({hex_start}, {hex_end},),')
                fout.write(f' # {comment_startpart:24s}..{comment_endpart}')
            fout.write(f'\n{indent[:-4]}),\n')
        fout.write('}\n')
    print("complete.")
def do_unicode_versions(versions):
"""Write unicode_versions.py function list_versions()."""
fname = os.path.join(PATH_CODE, 'unicode_versions.py')
print(f"writing {fname} ... ", end='')
utc_now = datetime.datetime.utcnow()
version_tuples_str = '\n '.join(
f'"{ver}",' for ver in versions)
with open(fname, 'w') as fp:
fp.write(f"""\"\"\"
Exports function list_versions() for unicode version level support.
This code generated by {__file__} on {utc_now}.
\"\"\"
def list_versions():
\"\"\"
Return Unicode version levels supported by this module release.
Any of the version strings returned may be used as keyword argument
``unicode_version`` to the ``wcwidth()`` family of functions.
:returns: Supported Unicode version numbers in ascending sorted order.
:rtype: tuple[str]
\"\"\"
return (
{version_tuples_str}
)
""")
print('done.')
if __name__ == '__main__':
main()

706
bin/wcwidth-browser.py Executable file

@ -0,0 +1,706 @@
#!/usr/bin/env python
"""
A terminal browser, similar to less(1) for testing printable width of unicode.
This displays the full range of unicode points for 1 or 2-character wide
ideograms, with pipes ('|') that should always align for any terminal that
supports utf-8.
Usage:
  ./bin/wcwidth-browser.py [--wide=<n>]
                           [--alignment=<str>]
                           [--combining]
                           [--help]
Options:
  --wide=<int>        Browse 1 or 2 character-wide cells. [default: 2]
  --alignment=<str>   Choose left or right alignment. [default: left]
  --combining         Use combining character generator.
  --help              Display usage
"""
# pylint: disable=C0103,W0622
# Invalid constant name "echo"
# Invalid constant name "flushout" (col 4)
# Invalid module name "wcwidth-browser"
from __future__ import division, print_function
# std imports
import sys
import signal
import string
import functools
import unicodedata
# 3rd party
import docopt
import blessed
# local
from wcwidth import ZERO_WIDTH, wcwidth, list_versions, _wcmatch_version
#: print function alias, does not end with line terminator.
echo = functools.partial(print, end='')
flushout = functools.partial(print, end='', flush=True)
#: printable length of highest unicode character description
LIMIT_UCS = 0x3fffd
UCS_PRINTLEN = len('{value:0x}'.format(value=LIMIT_UCS))
def readline(term, width):
"""A rudimentary readline implementation."""
text = ''
while True:
inp = term.inkey()
if inp.code == term.KEY_ENTER:
break
if inp.code == term.KEY_ESCAPE or inp == chr(3):
text = None
break
if not inp.is_sequence and len(text) < width:
text += inp
echo(inp)
flushout()
elif inp.code in (term.KEY_BACKSPACE, term.KEY_DELETE):
if text:
text = text[:-1]
echo('\b \b')
flushout()
return text
class WcWideCharacterGenerator(object):
"""Generator yields unicode characters of the given ``width``."""
# pylint: disable=R0903
# Too few public methods (0/2)
def __init__(self, width=2, unicode_version='auto'):
"""
Class constructor.
:param width: generate characters of given width.
:param str unicode_version: Unicode Version for render.
:type width: int
"""
self.characters = (
chr(idx) for idx in range(LIMIT_UCS)
if wcwidth(chr(idx), unicode_version=unicode_version) == width)
def __iter__(self):
"""Special method called by iter()."""
return self
def __next__(self):
"""Special method called by next()."""
while True:
ucs = next(self.characters)
try:
name = string.capwords(unicodedata.name(ucs))
except ValueError:
continue
return (ucs, name)
class WcCombinedCharacterGenerator(object):
"""Generator yields unicode characters with combining."""
# pylint: disable=R0903
# Too few public methods (0/2)
def __init__(self, width=1):
"""
Class constructor.
:param int width: generate characters of given width.
"""
self.characters = []
letters_o = ('o' * width)
last_version = list_versions()[-1]
for (begin, end) in ZERO_WIDTH[last_version]:
for val in [_val for _val in
range(begin, end + 1)
if _val <= LIMIT_UCS]:
self.characters.append(
letters_o[:1] +
chr(val) +
letters_o[wcwidth(chr(val)) + 1:])
self.characters.reverse()
def __iter__(self):
"""Special method called by iter()."""
return self
def __next__(self):
"""
Special method called by next().
:return: unicode character and name, as tuple.
:rtype: tuple[unicode, unicode]
:raises StopIteration: no more characters
"""
while True:
if not self.characters:
raise StopIteration
ucs = self.characters.pop()
try:
name = string.capwords(unicodedata.name(ucs[1]))
except ValueError:
continue
return (ucs, name)
# python 2.6 - 3.3 compatibility
next = __next__
class Style(object):
"""Styling decorator class instance for terminal output."""
# pylint: disable=R0903
# Too few public methods (0/2)
@staticmethod
def attr_major(text):
"""non-stylized callable for "major" text, for non-ttys."""
return text
@staticmethod
def attr_minor(text):
"""non-stylized callable for "minor" text, for non-ttys."""
return text
delimiter = '|'
continuation = ' $'
header_hint = '-'
header_fill = '='
name_len = 10
alignment = 'right'
def __init__(self, **kwargs):
"""
Class constructor.
Any given keyword arguments are assigned to the class attribute of the same name.
"""
for key, val in kwargs.items():
setattr(self, key, val)
class Screen(object):
"""Represents terminal style, data dimensions, and drawables."""
intro_msg_fmt = ('Delimiters ({delim}) should align, '
'unicode version is {version}.')
def __init__(self, term, style, wide=2):
"""Class constructor."""
self.term = term
self.style = style
self.wide = wide
@property
def header(self):
"""Text of joined segments producing full heading."""
return self.head_item * self.num_columns
@property
def hint_width(self):
"""Width of a column segment."""
return sum((len(self.style.delimiter),
self.wide,
len(self.style.delimiter),
len(' '),
UCS_PRINTLEN + 2,
len(' '),
self.style.name_len,))
@property
def head_item(self):
"""Text of a single column heading."""
delimiter = self.style.attr_minor(self.style.delimiter)
hint = self.style.header_hint * self.wide
heading = ('{delimiter}{hint}{delimiter}'
.format(delimiter=delimiter, hint=hint))
def alignment(*args):
if self.style.alignment == 'right':
return self.term.rjust(*args)
return self.term.ljust(*args)
txt = alignment(heading, self.hint_width, self.style.header_fill)
return self.style.attr_major(txt)
def msg_intro(self, version):
"""Introductory message displayed above heading."""
return self.term.center(self.intro_msg_fmt.format(
delim=self.style.attr_minor(self.style.delimiter),
version=self.style.attr_minor(version))).rstrip()
@property
def row_ends(self):
"""Bottom of page."""
return self.term.height - 1
@property
def num_columns(self):
"""Number of columns displayed."""
if self.term.is_a_tty:
return self.term.width // self.hint_width
return 1
@property
def num_rows(self):
"""Number of rows displayed."""
return self.row_ends - self.row_begins - 1
@property
def row_begins(self):
"""Top row displayed for content."""
# pylint: disable=R0201
# Method could be a function (col 4)
return 2
@property
def page_size(self):
"""Number of unicode characters displayed per page."""
return self.num_rows * self.num_columns
class Pager(object):
"""A less(1)-like browser for browsing unicode characters."""
# pylint: disable=too-many-instance-attributes
#: screen state for next draw method(s).
STATE_CLEAN, STATE_DIRTY, STATE_REFRESH = 0, 1, 2
def __init__(self, term, screen, character_factory):
"""
Class constructor.
:param term: blessed Terminal class instance.
:type term: blessed.Terminal
:param screen: Screen class instance.
:type screen: Screen
:param character_factory: Character factory generator.
:type character_factory: callable returning iterable.
"""
self.term = term
self.screen = screen
self.character_factory = character_factory
self.unicode_version = 'auto'
self.dirty = self.STATE_REFRESH
self.last_page = 0
self._page_data = list()
def on_resize(self, *args):
"""Signal handler callback for SIGWINCH."""
# pylint: disable=W0613
# Unused argument 'args'
assert self.term.width >= self.screen.hint_width, (
'Screen too small {}, must be at least {}'.format(
self.term.width, self.screen.hint_width))
self._set_lastpage()
self.dirty = self.STATE_REFRESH
def _set_lastpage(self):
"""Calculate value of class attribute ``last_page``."""
self.last_page = (len(self._page_data) - 1) // self.screen.page_size
def display_initialize(self):
"""Display 'please wait' message."""
echo(self.term.home + self.term.clear)
echo(self.term.move_y(self.term.height // 2))
echo(self.term.center('Initializing page data ...').rstrip())
flushout()
def initialize_page_data(self):
"""Initialize the page data for the given screen."""
# pylint: disable=attribute-defined-outside-init
if self.term.is_a_tty:
self.display_initialize()
self.character_generator = self.character_factory(
self.screen.wide)
self._page_data = list()
while True:
try:
self._page_data.append(next(self.character_generator))
except StopIteration:
break
self._set_lastpage()
def page_data(self, idx, offset):
"""
Return character data for page of given index and offset.
:param idx: page index.
:type idx: int
:param offset: scrolling region offset of current page.
:type offset: int
:returns: normalized ``(idx, offset)`` and list of tuples in form of ``(ucs, name)``
:rtype: ((int, int), list[(unicode, unicode)])
"""
size = self.screen.page_size
while offset < 0 and idx:
offset += size
idx -= 1
offset = max(0, offset)
while offset >= size:
offset -= size
idx += 1
if idx == self.last_page:
offset = 0
idx = min(max(0, idx), self.last_page)
start = (idx * self.screen.page_size) + offset
end = start + self.screen.page_size
return (idx, offset), self._page_data[start:end]
def _run_notty(self, writer):
"""Pager run method for terminals that are not a tty."""
page_idx = page_offset = 0
while True:
npage_idx, _ = self.draw(writer, page_idx + 1, page_offset)
if npage_idx == self.last_page:
# page displayed was last page, quit.
break
page_idx = npage_idx
self.dirty = self.STATE_DIRTY
def _run_tty(self, writer, reader):
"""Pager run method for terminals that are a tty."""
# allow window-change signal to reflow screen
signal.signal(signal.SIGWINCH, self.on_resize)
page_idx = page_offset = 0
while True:
if self.dirty:
page_idx, page_offset = self.draw(writer,
page_idx,
page_offset)
self.dirty = self.STATE_CLEAN
inp = reader(timeout=0.25)
if inp is not None:
nxt, noff = self.process_keystroke(inp,
page_idx,
page_offset)
if self.dirty:
continue
if not self.dirty:
self.dirty = nxt != page_idx or noff != page_offset
page_idx, page_offset = nxt, noff
if page_idx == -1:
return
def run(self, writer, reader):
"""
Pager entry point.
In interactive mode (terminal is a tty), run until
``process_keystroke()`` detects quit keystroke ('q'). In
non-interactive mode, exit after displaying all unicode points.
:param writer: callable writes to output stream, receiving unicode.
:type writer: callable
:param reader: callable reads keystrokes from input stream, sending
instance of blessed.keyboard.Keystroke.
:type reader: callable
"""
self.initialize_page_data()
if not self.term.is_a_tty:
self._run_notty(writer)
else:
self._run_tty(writer, reader)
def process_keystroke(self, inp, idx, offset):
"""
Process keystroke ``inp``, adjusting screen parameters.
:param inp: return value of blessed.Terminal.inkey().
:type inp: blessed.keyboard.Keystroke
:param idx: page index.
:type idx: int
:param offset: scrolling region offset of current page.
:type offset: int
:returns: tuple of next (idx, offset).
:rtype: (int, int)
"""
if inp.lower() == 'q':
# exit
return (-1, -1)
self._process_keystroke_commands(inp)
idx, offset = self._process_keystroke_movement(inp, idx, offset)
return idx, offset
def _process_keystroke_commands(self, inp):
"""Process keystrokes that issue commands (side effects)."""
if inp in ('1', '2') and self.screen.wide != int(inp):
# change between 1 or 2-character wide mode.
self.screen.wide = int(inp)
self.initialize_page_data()
self.on_resize(None, None)
elif inp == 'c':
# switch on/off combining characters
self.character_factory = (
WcWideCharacterGenerator
if self.character_factory != WcWideCharacterGenerator
else WcCombinedCharacterGenerator)
self.initialize_page_data()
self.on_resize(None, None)
elif inp in ('_', '-'):
# adjust name length -2
nlen = max(1, self.screen.style.name_len - 2)
if nlen != self.screen.style.name_len:
self.screen.style.name_len = nlen
self.on_resize(None, None)
elif inp in ('+', '='):
# adjust name length +2
nlen = min(self.term.width - 8, self.screen.style.name_len + 2)
if nlen != self.screen.style.name_len:
self.screen.style.name_len = nlen
self.on_resize(None, None)
elif inp == 'v':
with self.term.location(x=0, y=self.term.height - 2):
print(self.term.clear_eos())
input_selection_msg = (
"--> Enter unicode version [{versions}] ("
"current: {self.unicode_version}):".format(
versions=', '.join(list_versions()),
self=self))
echo('\n'.join(self.term.wrap(input_selection_msg,
subsequent_indent=' ')))
echo(' ')
flushout()
inp = readline(self.term, width=max(map(len, list_versions())))
if inp is not None and inp.strip() and inp != self.unicode_version:
# set new unicode version -- page data must be
# re-initialized. Any version is legal, underlying
# library performs best-match (with warnings)
self.unicode_version = _wcmatch_version(inp)
self.initialize_page_data()
self.on_resize(None, None)
def _process_keystroke_movement(self, inp, idx, offset):
"""Process keystrokes that adjust index and offset."""
term = self.term
# a little vi-inspired.
if inp in ('y', 'k') or inp.code in (term.KEY_UP,):
# scroll backward 1 line
offset -= self.screen.num_columns
elif inp in ('e', 'j') or inp.code in (term.KEY_ENTER,
term.KEY_DOWN,):
# scroll forward 1 line
offset = offset + self.screen.num_columns
elif inp in ('f', ' ') or inp.code in (term.KEY_PGDOWN,):
# scroll forward 1 page
idx += 1
elif inp == 'b' or inp.code in (term.KEY_PGUP,):
# scroll backward 1 page
idx = max(0, idx - 1)
elif inp == 'F' or inp.code in (term.KEY_SDOWN,):
# scroll forward 10 pages
idx = max(0, idx + 10)
elif inp == 'B' or inp.code in (term.KEY_SUP,):
# scroll backward 10 pages
idx = max(0, idx - 10)
elif inp.code == term.KEY_HOME:
# top
idx, offset = (0, 0)
elif inp == 'G' or inp.code == term.KEY_END:
# bottom
idx, offset = (self.last_page, 0)
elif inp == '\x0c':
# ^L: force a full redraw, including the heading
self.dirty = self.STATE_REFRESH
return idx, offset
def draw(self, writer, idx, offset):
"""
Draw the current page view to ``writer``.
:param callable writer: callable writes to output stream, receiving unicode.
:param int idx: current page index.
:param int offset: scrolling region offset of current page.
:returns: tuple of next (idx, offset).
:rtype: (int, int)
"""
# as our screen can be resized while we're mid-calculation,
# our self.dirty flag can become re-toggled; because we are
# not re-flowing our pagination, we must begin over again.
while self.dirty:
self.draw_heading(writer)
self.dirty = self.STATE_CLEAN
(idx, offset), data = self.page_data(idx, offset)
for txt in self.page_view(data):
writer(txt)
self.draw_status(writer, idx)
flushout()
return idx, offset
def draw_heading(self, writer):
"""
Conditionally redraw screen when ``dirty`` attribute is valued REFRESH.
When Pager attribute ``dirty`` is ``STATE_REFRESH``, cursor is moved
to (0,0), screen is cleared, and heading is displayed.
:param callable writer: callable writes to output stream, receiving unicode.
:return: True if class attribute ``dirty`` is ``STATE_REFRESH``.
:rtype: bool
"""
if self.dirty == self.STATE_REFRESH:
writer(''.join(
(self.term.home, self.term.clear,
self.screen.msg_intro(version=self.unicode_version), '\n',
self.screen.header, '\n',)))
return True
return False
def draw_status(self, writer, idx):
"""
Conditionally draw status bar when output terminal is a tty.
:param callable writer: callable writes to output stream, receiving unicode.
:param int idx: current page position index.
"""
if self.term.is_a_tty:
writer(self.term.hide_cursor())
style = self.screen.style
writer(self.term.move(self.term.height - 1))
if idx == self.last_page:
last_end = '(END)'
else:
last_end = '/{0}'.format(self.last_page)
txt = ('Page {idx}{last_end} - '
'{q} to quit, [keys: {keyset}]'
.format(idx=style.attr_minor('{0}'.format(idx)),
last_end=style.attr_major(last_end),
keyset=style.attr_major('kjfbvc12-='),
q=style.attr_minor('q')))
writer(self.term.center(txt).rstrip())
def page_view(self, data):
"""
Generator yields text to be displayed for the current unicode pageview.
:param list[(unicode, unicode)] data: The current page's data as tuple
of ``(ucs, name)``.
:returns: generator for full-page text for display
"""
if self.term.is_a_tty:
yield self.term.move(self.screen.row_begins, 0)
# sequence clears to end-of-line
clear_eol = self.term.clear_eol
# sequence clears to end-of-screen
clear_eos = self.term.clear_eos
# track our current column and row, where column is
# the whole segment of unicode value text, and draw
# only self.screen.num_columns before end-of-line.
#
# use clear_eol at end of each row to erase over any
# "ghosted" text, and clear_eos at end of screen to
# clear the same, especially for the final page which
# is often short.
col = 0
for ucs, name in data:
val = self.text_entry(ucs, name)
col += 1
if col == self.screen.num_columns:
col = 0
if self.term.is_a_tty:
val = ''.join((val, clear_eol, '\n'))
else:
val = ''.join((val.rstrip(), '\n'))
yield val
if self.term.is_a_tty:
yield ''.join((clear_eol, '\n', clear_eos))
def text_entry(self, ucs, name):
"""
Display a single column segment row describing ``(ucs, name)``.
:param str ucs: target unicode point character string.
:param str name: name of unicode point.
:return: formatted text for display.
:rtype: unicode
"""
style = self.screen.style
if len(name) > style.name_len:
idx = max(0, style.name_len - len(style.continuation))
name = ''.join((name[:idx], style.continuation if idx else ''))
if style.alignment == 'right':
fmt = ' '.join(('0x{val:0>{ucs_printlen}x}',
'{name:<{name_len}s}',
'{delimiter}{ucs}{delimiter}'
))
else:
fmt = ' '.join(('{delimiter}{ucs}{delimiter}',
'0x{val:0>{ucs_printlen}x}',
'{name:<{name_len}s}'))
delimiter = style.attr_minor(style.delimiter)
if len(ucs) != 1:
# determine display of combining characters
val = ord(ucs[1])
# a combining character displayed of any fg color
# will reset the foreground character of the cell
# combined with (iTerm2, OSX).
disp_ucs = style.attr_major(ucs[0:2])
if len(ucs) > 2:
disp_ucs += ucs[2]
else:
# non-combining
val = ord(ucs)
disp_ucs = style.attr_major(ucs)
return fmt.format(name_len=style.name_len,
ucs_printlen=UCS_PRINTLEN,
delimiter=delimiter,
name=name,
ucs=disp_ucs,
val=val)
def validate_args(opts):
"""Validate and return options provided by docopt parsing."""
if opts['--wide'] is None:
opts['--wide'] = 2
else:
assert opts['--wide'] in ("1", "2"), opts['--wide']
if opts['--alignment'] is None:
opts['--alignment'] = 'left'
else:
assert opts['--alignment'] in ('left', 'right'), opts['--alignment']
opts['--wide'] = int(opts['--wide'])
opts['character_factory'] = WcWideCharacterGenerator
if opts['--combining']:
opts['character_factory'] = WcCombinedCharacterGenerator
return opts
def main(opts):
"""Program entry point."""
term = blessed.Terminal()
style = Style()
# if the terminal supports colors, use a Style instance with some
# standout colors (magenta, cyan).
if term.number_of_colors:
style = Style(attr_major=term.magenta,
attr_minor=term.bright_cyan,
alignment=opts['--alignment'])
style.name_len = 10
screen = Screen(term, style, wide=opts['--wide'])
pager = Pager(term, screen, opts['character_factory'])
with term.location(), term.cbreak(), \
term.fullscreen(), term.hidden_cursor():
pager.run(writer=echo, reader=term.inkey)
return 0
if __name__ == '__main__':
sys.exit(main(validate_args(docopt.docopt(__doc__))))

138
bin/wcwidth-libc-comparator.py Executable file

@ -0,0 +1,138 @@
#!/usr/bin/env python
# coding: utf-8
"""
Manual tests comparing wcwidth.py to libc's wcwidth(3) and wcswidth(3).
https://github.com/jquast/wcwidth
This suite of tests compares the libc return values with the pure-python return
values. Although wcwidth(3) is POSIX, its actual implementation may differ,
so these tests are not guaranteed to be successful on all platforms, especially
where wcwidth(3)/wcswidth(3) is out of date. That is the case on many
platforms, which usually conform only to unicode specification 1.0 or 2.0.
This program accepts one optional command-line argument, the unicode version
level for our library to use when comparing to libc.
"""
# pylint: disable=C0103
# Invalid module name "wcwidth-libc-comparator"
from __future__ import print_function
# std imports
import sys
import locale
import warnings
import ctypes.util
import unicodedata
# local
import wcwidth
def is_named(ucs):
"""
Whether the unicode point ``ucs`` has a name.
:rtype: bool
"""
try:
return bool(unicodedata.name(ucs))
except ValueError:
return False
def is_not_combining(ucs):
"""Whether the unicode point ``ucs`` is not a combining character."""
return not unicodedata.combining(ucs)
def report_ucs_msg(ucs, wcwidth_libc, wcwidth_local):
"""
Return string report of combining character differences.
:param ucs: unicode point.
:type ucs: unicode
:param wcwidth_libc: libc-wcwidth's reported character length.
:type wcwidth_libc: int
:param wcwidth_local: wcwidth's reported character length.
:type wcwidth_local: int
:rtype: unicode
"""
ucp = (ucs.encode('unicode_escape')[2:]
.decode('ascii')
.upper()
.lstrip('0'))
url = "http://codepoints.net/U+{}".format(ucp)
name = unicodedata.name(ucs)
return (u"libc,ours={},{} [--o{}o--] name={} val={} {}"
" ".format(wcwidth_libc, wcwidth_local, ucs, name, ord(ucs), url))
# use chr() for py3.x,
# unichr() for py2.x
try:
_ = unichr(0)
except NameError as err:
if err.args[0] == "name 'unichr' is not defined":
# pylint: disable=W0622
# Redefining built-in 'unichr' (col 8)
unichr = chr
else:
raise
if sys.maxunicode < 1114111:
warnings.warn('narrow Python build, only a small subset of '
'characters may be tested.')
def _is_equal_wcwidth(libc, ucs, unicode_version):
w_libc = libc.wcwidth(ucs)
w_local = wcwidth.wcwidth(ucs, unicode_version)
assert w_libc == w_local, report_ucs_msg(ucs, w_libc, w_local)
def main(using_locale=('en_US', 'UTF-8',)):
"""
Program entry point.
Load the entire Unicode table into memory, excluding those that:
- are not named (func unicodedata.name raises ValueError),
- are combining characters.
Using ``locale``, for each unicode character string compare libc's
wcwidth with local wcwidth.wcwidth() function; when they differ,
report a detailed AssertionError to stdout.
"""
all_ucs = (ucs for ucs in
[unichr(val) for val in range(sys.maxunicode)]
if is_named(ucs) and is_not_combining(ucs))
libc_name = ctypes.util.find_library('c')
if not libc_name:
raise ImportError("Can't find C library.")
libc = ctypes.cdll.LoadLibrary(libc_name)
libc.wcwidth.argtypes = [ctypes.c_wchar, ]
libc.wcwidth.restype = ctypes.c_int
assert getattr(libc, 'wcwidth', None) is not None
assert getattr(libc, 'wcswidth', None) is not None
locale.setlocale(locale.LC_ALL, using_locale)
unicode_version = 'latest'
if len(sys.argv) > 1:
unicode_version = sys.argv[1]
for ucs in all_ucs:
try:
_is_equal_wcwidth(libc, ucs, unicode_version)
except AssertionError as err:
print(err)
if __name__ == '__main__':
main()

38
docs/api.rst Normal file

@ -0,0 +1,38 @@
==========
Public API
==========
This package follows SEMVER_ rules for versioning. Therefore, for all of the
given function signatures, at example version 1.1.1, you may use version
dependency ``>=1.1.1,<2.0`` for forward compatibility with future wcwidth
versions.
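The dependency range above can be checked by hand with plain tuple comparison.
The sketch below is illustrative only: the helper names ``parse_version`` and
``is_compatible`` are hypothetical, it handles only dotted numeric versions,
and installers such as pip perform this resolution themselves.

```python
def parse_version(text):
    """Convert a dotted numeric version such as '1.1.1' to a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def is_compatible(installed, minimum='1.1.1', below='2.0'):
    """Return True when ``minimum <= installed < below``."""
    return (parse_version(minimum)
            <= parse_version(installed)
            < parse_version(below))


print(is_compatible('1.4.0'))  # True: same major version, newer release
print(is_compatible('2.0.0'))  # False: next major version
```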
.. autofunction:: wcwidth.wcwidth
.. autofunction:: wcwidth.wcswidth
.. autofunction:: wcwidth.list_versions
.. _SEMVER: https://semver.org
===========
Private API
===========
These functions should only be used for wcwidth development. Dependent
packages should use them only with care and with a frozen version dependency,
as these functions may change names, signatures, or disappear entirely at any
time in the future, and such changes are not reflected by SEMVER rules.
If a stable public API for any of the given functions is needed, please suggest
a Pull Request!
.. autofunction:: wcwidth._bisearch
.. autofunction:: wcwidth._wcversion_value
.. autofunction:: wcwidth._wcmatch_version
.. autofunction:: wcwidth._get_package_version

178
docs/conf.py Normal file

@ -0,0 +1,178 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# wcwidth documentation build configuration file, created by
# sphinx-quickstart on Fri Oct 20 15:18:02 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
# local
import wcwidth
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.coverage',
'sphinx.ext.viewcode']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffixes as a list of strings:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = 'wcwidth'
copyright = '2017, Jeff Quast'
author = 'Jeff Quast'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version,
# The full version, including alpha/beta/rc tags.
release = version = wcwidth.__version__
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
# html_sidebars = {
# '**': [
# 'about.html',
# 'navigation.html',
# 'relations.html', # needs 'show_related': True theme option to display
# 'searchbox.html',
# 'donate.html',
# ]
# }
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'wcwidthdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'wcwidth.tex', 'wcwidth Documentation',
'Jeff Quast', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'wcwidth', 'wcwidth Documentation',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'wcwidth', 'wcwidth Documentation',
author, 'wcwidth', 'One line description of project.',
'Miscellaneous'),
]
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}

15
docs/index.rst Normal file

@ -0,0 +1,15 @@
wcwidth
=======
.. toctree::
intro
unicode_version
api
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

280
docs/intro.rst Normal file

@ -0,0 +1,280 @@
|pypi_downloads| |codecov| |license|
============
Introduction
============
This library is mainly for CLI programs that carefully produce output for
Terminals, or pretend to be an emulator.
**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
there are categories of characters that *occupy 2 cells* (full-wide), and
others that *occupy 0* cells (zero-width).
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*
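As a rough illustration of the problem, the sketch below uses only the standard
library's ``unicodedata`` module to classify a character as occupying zero, one,
or two cells. The helper name ``approx_width`` is hypothetical and its rule is a
simplification; it is not how this library's tables are built, and it ignores
non-printable characters entirely.

```python
import unicodedata


def approx_width(char):
    """Approximate the terminal cell width of one character (illustrative only)."""
    if unicodedata.combining(char):
        # combining marks are drawn over the previous cell
        return 0
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        # East Asian Wide (W) and Fullwidth (F) occupy two cells
        return 2
    return 1


text = u'コンニチハ'
print(len(text))                           # 5 codepoints
print(sum(approx_width(c) for c in text))  # 10 cells
```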
Installation
------------
The stable version of this package is maintained on pypi, install using pip::
pip install wcwidth
Example
-------
**Problem**: given the following phrase (Japanese),
>>> text = u'コンニチハ'
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the `rjust` function, the
output length is wrong::
>>> print(len('コンニチハ'))
5
>>> print('コンニチハ'.rjust(20, '_'))
_______________コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this::
>>> def wc_rjust(text, length, padding=' '):
... from wcwidth import wcswidth
... return padding * max(0, (length - wcswidth(text))) + text
...
Our **Solution** uses wcswidth to determine the string length correctly::
>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10
>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ
Choosing a Version
------------------
Export an environment variable, ``UNICODE_VERSION``. This should be done by
*terminal emulators* or those developers experimenting with authoring one of
their own, from shell::
$ export UNICODE_VERSION=13.0
If unspecified, the latest version is used. If your Terminal Emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.
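How a requested version might be resolved to a supported one can be sketched as follows. This is not the library's implementation: ``match_version``, ``version_key`` and the ``SUPPORTED`` tuple are hypothetical, and the rules (nearest lower version, fall back to the lowest or latest with a warning) are only inferred from the behavior exercised in ``tests/test_ucslevel.py``.

```python
import os
import warnings

# Hypothetical subset of supported versions, for illustration only.
SUPPORTED = ('4.1.0', '5.0.0', '8.0.0', '12.1.0', '13.0.0')


def version_key(version):
    """Turn '12.1' into (12, 1, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split('.')]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def match_version(requested, supported=SUPPORTED):
    """Return the supported version nearest to, but not above, ``requested``."""
    latest = supported[-1]
    if requested == 'latest':
        return latest
    try:
        wanted = version_key(requested)
    except ValueError:
        warnings.warn('invalid version %r, using latest' % (requested,))
        return latest
    if wanted < version_key(supported[0]):
        warnings.warn('version %r is lower than any supported' % (requested,))
        return supported[0]
    best = supported[0]
    for candidate in supported:
        if version_key(candidate) <= wanted:
            best = candidate
    return best


# Resolve whatever the shell exported, defaulting to the latest tables.
print(match_version(os.environ.get('UNICODE_VERSION', 'latest')))
```

Padding short versions such as ``8`` to three components matters here: without it, tuple comparison would rank ``(8,)`` below ``(8, 0, 0)`` and select an older table.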
wcwidth, wcswidth
-----------------
Use function ``wcwidth()`` to determine the length of a *single unicode
character*, and ``wcswidth()`` to determine the length of a *string of
unicode characters*.
Briefly, return values of function ``wcwidth()`` are:
``-1``
Indeterminate (not printable).
``0``
Does not advance the cursor, such as NULL or Combining.
``2``
Characters of category East Asian Wide (W) or East Asian
Full-width (F) which are displayed using two terminal cells.
``1``
All others.
Function ``wcswidth()`` simply returns the sum of the ``wcwidth()`` values of
each character in a string, or ``-1`` if any character in the string is ``-1``.
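The return values described above can be illustrated with a small approximation built only on the standard library. This is a sketch under stated assumptions, not the library's algorithm: ``approx_wcwidth`` and ``approx_wcswidth`` are hypothetical names, the classification relies on Python's bundled ``unicodedata`` tables rather than this package's own, and edge cases will differ from the real functions.

```python
import unicodedata


def approx_wcwidth(ch):
    """Classify one character as -1, 0, 1 or 2 cells (approximation)."""
    if ch == '\x00':
        return 0                 # NULL does not advance the cursor
    category = unicodedata.category(ch)
    if category == 'Cc':
        return -1                # other control characters: not printable
    if category in ('Mn', 'Me'):
        return 0                 # non-spacing and enclosing combining marks
    if unicodedata.east_asian_width(ch) in ('W', 'F'):
        return 2                 # East Asian Wide or Fullwidth
    return 1                     # all others


def approx_wcswidth(text, n=None):
    """Sum per-character widths of the first ``n`` characters.

    Returns -1 as soon as any character is not printable.
    """
    total = 0
    for ch in text[:n]:
        width = approx_wcwidth(ch)
        if width < 0:
            return -1
        total += width
    return total


print(approx_wcswidth('コンニチハ, セカイ!'))
print(approx_wcswidth('cafe\u0301'))
print(approx_wcswidth('\x1b[0m'))
```

The early return is the important design point: a single ``-1`` makes the width of the whole string indeterminate, so callers should check for it before using the result as a padding amount.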
Full API Documentation at http://wcwidth.readthedocs.org
==========
Developing
==========
Install wcwidth in editable mode::
pip install -e .
Execute unit tests using tox_::
tox
Regenerate python code tables from latest Unicode Specification data files::
tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. Just
be sure to first ``pip install -r requirements-develop.txt`` from this project's
main folder. For example, an interactive browser for testing::
./bin/wcwidth-browser.py
Uses
----
This library is used in:
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
Python.
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
interactive command lines in Python.
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
based on compositing 2d arrays of text.
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
and a command-line utility.
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
text.
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
animations.
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
UIs.
Other Languages
---------------
- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: `LD_PRELOAD` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C
History
-------
0.2.0 *2020-06-01*
* **Enhancement**: Unicode version may be selected by exporting the
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
* **Enhancement**:
API Documentation is published to readthedocs.org.
* **Updated** tables for *all* Unicode Specifications with files
published in a programmatically consumable format, versions 4.1.0
through 13.0.
0.1.9 *2020-03-22*
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
* **Updated** tables to Unicode Specification 13.0.0.
0.1.8 *2020-01-01*
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
0.1.7 *2016-07-01*
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
0.1.6 *2016-01-08 Production/Stable*
* ``LICENSE`` file now included with distribution.
0.1.5 *2015-09-13 Alpha*
* **Bugfix**:
Resolution of "combining_ character width" issue: most notably,
characters that previously returned -1 now often (correctly) return 0.
Resolved by `Philip Craig`_ via `PR #11`_.
* **Deprecated**:
The module path ``wcwidth.table_comb`` is no longer available,
it has been superseded by module path ``wcwidth.table_zero``.
0.1.4 *2014-11-20 Pre-Alpha*
* **Feature**: ``wcswidth()`` now determines printable length
for (most) combining_ characters. The developer's tool
`bin/wcwidth-browser.py`_ is improved to display combining_
characters when provided the ``--combining`` option
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
* **Feature**: added static analysis (prospector_) to testing
framework.
0.1.3 *2014-10-29 Pre-Alpha*
* **Bugfix**: 2nd parameter of wcswidth was not honored.
(`Thomas Ballinger`_, `PR #4`_).
0.1.2 *2014-10-28 Pre-Alpha*
* **Updated** tables to Unicode Specification 7.0.0.
(`Thomas Ballinger`_, `PR #3`_).
0.1.1 *2014-05-14 Pre-Alpha*
* Initial release to PyPI, based on Unicode Specification 6.3.0.
This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
.. _`tox`: https://testrun.org/tox/latest/install.html
.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`selectel/pyte`: https://github.com/selectel/pyte
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
.. _`Avram Lubkin`: https://github.com/avylove
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
:alt: Downloads
:target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
:alt: codecov.io Code Coverage
:target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
:target: https://pypi.python.org/pypi/wcwidth/
:alt: MIT License

4
docs/requirements.txt Normal file

@ -0,0 +1,4 @@
Sphinx
sphinx-paramlinks
sphinx_rtd_theme
sphinxcontrib-manpage

104
docs/unicode_version.rst Normal file

@ -0,0 +1,104 @@
=====================
Unicode release files
=====================
This library aims to be forward-looking, portable, and most correct.
The most current release of this API is based on the Unicode Standard
release files:
``DerivedGeneralCategory-4.1.0.txt``
*Date: 2005-02-26, 02:35:50 GMT [MD]*
``DerivedGeneralCategory-5.0.0.txt``
*Date: 2006-02-27, 23:41:27 GMT [MD]*
``DerivedGeneralCategory-5.1.0.txt``
*Date: 2008-03-20, 17:54:57 GMT [MD]*
``DerivedGeneralCategory-5.2.0.txt``
*Date: 2009-08-22, 04:58:21 GMT [MD]*
``DerivedGeneralCategory-6.0.0.txt``
*Date: 2010-08-19, 00:48:09 GMT [MD]*
``DerivedGeneralCategory-6.1.0.txt``
*Date: 2011-11-27, 05:10:22 GMT [MD]*
``DerivedGeneralCategory-6.2.0.txt``
*Date: 2012-05-20, 00:42:34 GMT [MD]*
``DerivedGeneralCategory-6.3.0.txt``
*Date: 2013-07-05, 14:08:45 GMT [MD]*
``DerivedGeneralCategory-7.0.0.txt``
*Date: 2014-02-07, 18:42:12 GMT [MD]*
``DerivedGeneralCategory-8.0.0.txt``
*Date: 2015-02-13, 13:47:11 GMT [MD]*
``DerivedGeneralCategory-9.0.0.txt``
*Date: 2016-06-01, 10:34:26 GMT*
``DerivedGeneralCategory-10.0.0.txt``
*Date: 2017-03-08, 08:41:49 GMT*
``DerivedGeneralCategory-11.0.0.txt``
*Date: 2018-02-21, 05:34:04 GMT*
``DerivedGeneralCategory-12.0.0.txt``
*Date: 2019-01-22, 08:18:28 GMT*
``DerivedGeneralCategory-12.1.0.txt``
*Date: 2019-03-10, 10:53:08 GMT*
``DerivedGeneralCategory-13.0.0.txt``
*Date: 2019-10-21, 14:30:32 GMT*
``EastAsianWidth-4.1.0.txt``
*Date: 2005-03-17, 15:21:00 PST [KW]*
``EastAsianWidth-5.0.0.txt``
*Date: 2006-02-15, 14:39:00 PST [KW]*
``EastAsianWidth-5.1.0.txt``
*Date: 2008-03-20, 17:42:00 PDT [KW]*
``EastAsianWidth-5.2.0.txt``
*Date: 2009-06-09, 17:47:00 PDT [KW]*
``EastAsianWidth-6.0.0.txt``
*Date: 2010-08-17, 12:17:00 PDT [KW]*
``EastAsianWidth-6.1.0.txt``
*Date: 2011-09-19, 18:46:00 GMT [KW]*
``EastAsianWidth-6.2.0.txt``
*Date: 2012-05-15, 18:30:00 GMT [KW]*
``EastAsianWidth-6.3.0.txt``
*Date: 2013-02-05, 20:09:00 GMT [KW, LI]*
``EastAsianWidth-7.0.0.txt``
*Date: 2014-02-28, 23:15:00 GMT [KW, LI]*
``EastAsianWidth-8.0.0.txt``
*Date: 2015-02-10, 21:00:00 GMT [KW, LI]*
``EastAsianWidth-9.0.0.txt``
*Date: 2016-05-27, 17:00:00 GMT [KW, LI]*
``EastAsianWidth-10.0.0.txt``
*Date: 2017-03-08, 02:00:00 GMT [KW, LI]*
``EastAsianWidth-11.0.0.txt``
*Date: 2018-05-14, 09:41:59 GMT [KW, LI]*
``EastAsianWidth-12.0.0.txt``
*Date: 2019-01-21, 14:12:58 GMT [KW, LI]*
``EastAsianWidth-12.1.0.txt``
*Date: 2019-03-31, 22:01:58 GMT [KW, LI]*
``EastAsianWidth-13.0.0.txt``
*Date: 2029-01-21, 18:14:00 GMT [KW, LI]*

10
setup.cfg Normal file

@ -0,0 +1,10 @@
[bdist_wheel]
universal = 1
[metadata]
license_file = LICENSE
[egg_info]
tag_build =
tag_date = 0

99
setup.py Executable file

@ -0,0 +1,99 @@
#!/usr/bin/env python
"""
Setup.py distribution file for wcwidth.

https://github.com/jquast/wcwidth
"""
# std imports
import os
import codecs

# 3rd party
import setuptools


def _get_here(fname):
    return os.path.join(os.path.dirname(__file__), fname)


class _SetupUpdate(setuptools.Command):
    # This is a compatibility shim, some downstream distributions might
    # still call "setup.py update".
    #
    # New entry point is tox, 'tox -eupdate'.
    description = "Fetch and update unicode code tables"
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        import sys
        import subprocess
        retcode = subprocess.Popen([
            sys.executable,
            _get_here(os.path.join('bin', 'update-tables.py'))]).wait()
        assert retcode == 0, ('non-zero exit code', retcode)


def main():
    """Setup.py entry point."""
    setuptools.setup(
        name='wcwidth',
        # NOTE: manually manage __version__ in wcwidth/__init__.py !
        version='0.2.5',
        description=(
            "Measures the displayed width of unicode strings in a terminal"),
        long_description=codecs.open(
            _get_here('README.rst'), 'rb', 'utf8').read(),
        author='Jeff Quast',
        author_email='contact@jeffquast.com',
        install_requires=('backports.functools-lru-cache>=1.2.1;'
                          'python_version < "3.2"'),
        license='MIT',
        packages=['wcwidth'],
        url='https://github.com/jquast/wcwidth',
        package_data={
            'wcwidth': ['*.json'],
            '': ['LICENSE', '*.rst'],
        },
        zip_safe=True,
        classifiers=[
            'Intended Audience :: Developers',
            'Natural Language :: English',
            'Development Status :: 5 - Production/Stable',
            'Environment :: Console',
            'License :: OSI Approved :: MIT License',
            'Operating System :: POSIX',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3.5',
            'Programming Language :: Python :: 3.6',
            'Programming Language :: Python :: 3.7',
            'Programming Language :: Python :: 3.8',
            'Topic :: Software Development :: Libraries',
            'Topic :: Software Development :: Localization',
            'Topic :: Software Development :: Internationalization',
            'Topic :: Terminals'
        ],
        keywords=[
            'cjk',
            'combining',
            'console',
            'eastasian',
            # the comma after 'emoji' was missing, which silently joined it
            # with the next item as 'emojiemulator'
            'emoji',
            'emulator',
            'terminal',
            'unicode',
            'wcswidth',
            'wcwidth',
            'xterm',
        ],
        cmdclass={'update': _SetupUpdate},
    )


if __name__ == '__main__':
    main()

1
tests/__init__.py Normal file

@ -0,0 +1 @@
"""This file intentionally left blank."""

154
tests/test_core.py Executable file

@ -0,0 +1,154 @@
# coding: utf-8
"""Core tests for wcwidth module."""
# 3rd party
import pkg_resources

# local
import wcwidth


def test_package_version():
    """wcwidth.__version__ is expected value."""
    # given,
    expected = pkg_resources.get_distribution('wcwidth').version

    # exercise,
    result = wcwidth.__version__

    # verify.
    assert result == expected


def test_hello_jp():
    u"""
    Width of Japanese phrase: コンニチハ, セカイ!

    Given a phrase of 5 and 3 Katakana ideographs, joined with
    3 English-ASCII punctuation characters, totaling 11, this
    phrase consumes 19 cells of a terminal emulator.
    """
    # given,
    phrase = u'コンニチハ, セカイ!'
    expect_length_each = (2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase)

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_wcswidth_substr():
    """
    Test wcswidth() optional 2nd parameter, ``n``.

    ``n`` determines at which position of the string
    to stop counting length.
    """
    # given,
    phrase = u'コンニチハ, セカイ!'
    end = 7
    expect_length_each = (2, 2, 2, 2, 2, 1, 1,)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_phrase = wcwidth.wcswidth(phrase, end)

    # verify.
    assert length_phrase == expect_length_phrase


def test_null_width_0():
    """NULL (0) reports width 0."""
    # given,
    phrase = u'abc\x00def'
    expect_length_each = (1, 1, 1, 0, 1, 1, 1)
    expect_length_phrase = sum(expect_length_each)

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_control_c0_width_negative_1():
    """CSI (Control sequence initiate) reports width -1 for ESC."""
    # given,
    phrase = u'\x1b[0m'
    expect_length_each = (-1, 1, 1, 1)
    expect_length_phrase = -1

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_width():
    """Simple test combining reports total width of 4."""
    # given,
    phrase = u'--\u05bf--'
    expect_length_each = (1, 1, 0, 1, 1)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_cafe():
    u"""Phrase cafe + COMBINING ACUTE ACCENT is café of length 4."""
    # given,
    phrase = u"cafe\u0301"
    expect_length_each = (1, 1, 1, 1, 0)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_enclosing():
    u"""CYRILLIC CAPITAL LETTER A + COMBINING CYRILLIC HUNDRED THOUSANDS SIGN is А҈ of length 1."""
    # given,
    phrase = u"\u0410\u0488"
    expect_length_each = (1, 0)
    expect_length_phrase = 1

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase


def test_combining_spacing():
    u"""Balinese kapal (ship) is ᬓᬨᬮ᭄ of length 4."""
    # given,
    phrase = u"\u1B13\u1B28\u1B2E\u1B44"
    expect_length_each = (1, 1, 1, 1)
    expect_length_phrase = 4

    # exercise,
    length_each = tuple(map(wcwidth.wcwidth, phrase))
    length_phrase = wcwidth.wcswidth(phrase, len(phrase))

    # verify.
    assert length_each == expect_length_each
    assert length_phrase == expect_length_phrase

184
tests/test_ucslevel.py Normal file

@ -0,0 +1,184 @@
# coding: utf-8
"""Unicode version level tests for wcwidth."""
# std imports
import json
import warnings

# 3rd party
import pytest
import pkg_resources

# local
import wcwidth


def test_latest():
    """wcwidth._wcmatch_version('latest') returns tail item."""
    # given,
    expected = wcwidth.list_versions()[-1]

    # exercise,
    result = wcwidth._wcmatch_version('latest')

    # verify.
    assert result == expected


def test_exact_410_str():
    """wcwidth._wcmatch_version('4.1.0') returns equal value (str)."""
    # given,
    given = expected = '4.1.0'

    # exercise,
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_exact_410_unicode():
    """wcwidth._wcmatch_version(u'4.1.0') returns equal value (unicode)."""
    # given,
    given = expected = u'4.1.0'

    # exercise,
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_505_str():
    """wcwidth._wcmatch_version('5.0.5') returns nearest '5.0.0'. (str)"""
    # given
    given, expected = '5.0.5', '5.0.0'

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_505_unicode():
    """wcwidth._wcmatch_version(u'5.0.5') returns nearest u'5.0.0'. (unicode)"""
    # given
    given, expected = u'5.0.5', u'5.0.0'

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_lowint40_str():
    """wcwidth._wcmatch_version('4.0') returns nearest '4.1.0'."""
    # given
    given, expected = '4.0', '4.1.0'
    warnings.resetwarnings()
    wcwidth._wcmatch_version.cache_clear()

    # exercise
    with pytest.warns(UserWarning):
        # warns that given version is lower than any available
        result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_lowint40_unicode():
    """wcwidth._wcmatch_version(u'4.0') returns nearest u'4.1.0'."""
    # given
    given, expected = u'4.0', u'4.1.0'
    warnings.resetwarnings()
    wcwidth._wcmatch_version.cache_clear()

    # exercise
    with pytest.warns(UserWarning):
        # warns that given version is lower than any available
        result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_800_str():
    """wcwidth._wcmatch_version('8') returns nearest '8.0.0'."""
    # given
    given, expected = '8', '8.0.0'

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_800_unicode():
    """wcwidth._wcmatch_version(u'8') returns nearest u'8.0.0'."""
    # given
    given, expected = u'8', u'8.0.0'

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_999_str():
    """wcwidth._wcmatch_version('999.0') returns nearest (latest)."""
    # given
    given, expected = '999.0', wcwidth.list_versions()[-1]

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nearest_999_unicode():
    """wcwidth._wcmatch_version(u'999.0') returns nearest (latest)."""
    # given
    given, expected = u'999.0', wcwidth.list_versions()[-1]

    # exercise
    result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nonint_unicode():
    """wcwidth._wcmatch_version(u'x.y.z') returns latest (unicode)."""
    # given
    given, expected = u'x.y.z', wcwidth.list_versions()[-1]
    warnings.resetwarnings()
    wcwidth._wcmatch_version.cache_clear()

    # exercise
    with pytest.warns(UserWarning):
        # warns that given version is not valid
        result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected


def test_nonint_str():
    """wcwidth._wcmatch_version('x.y.z') returns latest (str)."""
    # given
    given, expected = 'x.y.z', wcwidth.list_versions()[-1]
    warnings.resetwarnings()
    wcwidth._wcmatch_version.cache_clear()

    # exercise
    with pytest.warns(UserWarning):
        # warns that given version is not valid
        result = wcwidth._wcmatch_version(given)

    # verify.
    assert result == expected

306
wcwidth.egg-info/PKG-INFO Normal file

@ -0,0 +1,306 @@
Metadata-Version: 1.1
Name: wcwidth
Version: 0.2.5
Summary: Measures the displayed width of unicode strings in a terminal
Home-page: https://github.com/jquast/wcwidth
Author: Jeff Quast
Author-email: contact@jeffquast.com
License: MIT
Description: |pypi_downloads| |codecov| |license|
============
Introduction
============
This library is mainly for CLI programs that carefully produce output for
terminals, or that emulate a terminal themselves.
**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
there are categories of characters that *occupy 2 cells* (full-width), and
others that *occupy 0* cells (zero-width).
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*
Installation
------------
The stable version of this package is maintained on PyPI; install it using pip::
pip install wcwidth
Example
-------
**Problem**: given the following phrase (Japanese),
>>> text = u'コンニチハ'
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the `rjust` function, the
output length is wrong::
>>> print(len('コンニチハ'))
5
>>> print('コンニチハ'.rjust(20, '_'))
_______________コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this::
>>> def wc_rjust(text, length, padding=' '):
...     from wcwidth import wcswidth
...     return padding * max(0, (length - wcswidth(text))) + text
...
Our **Solution** uses wcswidth to determine the string length correctly::
>>> from wcwidth import wcswidth
>>> print(wcswidth('コンニチハ'))
10
>>> print(wc_rjust('コンニチハ', 20, '_'))
__________コンニチハ
Choosing a Version
------------------
Export an environment variable, ``UNICODE_VERSION``. This should be done by
*terminal emulators* or those developers experimenting with authoring one of
their own, from shell::
$ export UNICODE_VERSION=13.0
If unspecified, the latest version is used. If your Terminal Emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.
wcwidth, wcswidth
-----------------
Use function ``wcwidth()`` to determine the length of a *single unicode
character*, and ``wcswidth()`` to determine the length of a *string of
unicode characters*.
Briefly, return values of function ``wcwidth()`` are:
``-1``
Indeterminate (not printable).
``0``
Does not advance the cursor, such as NULL or Combining.
``2``
Characters of category East Asian Wide (W) or East Asian
Full-width (F) which are displayed using two terminal cells.
``1``
All others.
Function ``wcswidth()`` simply returns the sum of the ``wcwidth()`` values of
each character in a string, or ``-1`` if any character in the string is ``-1``.
Full API Documentation at http://wcwidth.readthedocs.org
==========
Developing
==========
Install wcwidth in editable mode::
pip install -e .
Execute unit tests using tox_::
tox
Regenerate python code tables from latest Unicode Specification data files::
tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. Just
be sure to first ``pip install -r requirements-develop.txt`` from this project's
main folder. For example, an interactive browser for testing::
./bin/wcwidth-browser.py
Uses
----
This library is used in:
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
Python.
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
interactive command lines in Python.
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
based on compositing 2d arrays of text.
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
and a command-line utility.
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
text.
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
animations.
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
UIs.
Other Languages
---------------
- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: `LD_PRELOAD` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C
History
-------
0.2.0 *2020-06-01*
* **Enhancement**: Unicode version may be selected by exporting the
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
* **Enhancement**:
API Documentation is published to readthedocs.org.
* **Updated** tables for *all* Unicode Specifications with files
published in a programmatically consumable format, versions 4.1.0
through 13.0.
0.1.9 *2020-03-22*
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
* **Updated** tables to Unicode Specification 13.0.0.
0.1.8 *2020-01-01*
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
0.1.7 *2016-07-01*
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
0.1.6 *2016-01-08 Production/Stable*
* ``LICENSE`` file now included with distribution.
0.1.5 *2015-09-13 Alpha*
* **Bugfix**:
Resolution of "combining_ character width" issue: most notably,
characters that previously returned -1 now often (correctly) return 0.
Resolved by `Philip Craig`_ via `PR #11`_.
* **Deprecated**:
The module path ``wcwidth.table_comb`` is no longer available,
it has been superseded by module path ``wcwidth.table_zero``.
0.1.4 *2014-11-20 Pre-Alpha*
* **Feature**: ``wcswidth()`` now determines printable length
for (most) combining_ characters. The developer's tool
`bin/wcwidth-browser.py`_ is improved to display combining_
characters when provided the ``--combining`` option
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
* **Feature**: added static analysis (prospector_) to testing
framework.
0.1.3 *2014-10-29 Pre-Alpha*
* **Bugfix**: 2nd parameter of wcswidth was not honored.
(`Thomas Ballinger`_, `PR #4`_).
0.1.2 *2014-10-28 Pre-Alpha*
* **Updated** tables to Unicode Specification 7.0.0.
(`Thomas Ballinger`_, `PR #3`_).
0.1.1 *2014-05-14 Pre-Alpha*
* Initial release to PyPI, based on Unicode Specification 6.3.0.
This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
*
* Permission to use, copy, modify, and distribute this software
* for any purpose and without fee is hereby granted. The author
* disclaims all warranties with regard to this software.
.. _`tox`: https://testrun.org/tox/latest/install.html
.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`selectel/pyte`: https://github.com/selectel/pyte
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
.. _`Avram Lubkin`: https://github.com/avylove
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
:alt: Downloads
:target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
:alt: codecov.io Code Coverage
:target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
:target: https://pypi.python.org/pypi/wcwidth/
:alt: MIT License
Keywords: cjk,combining,console,eastasian,emoji,emulator,terminal,unicode,wcswidth,wcwidth,xterm
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Localization
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Terminals


@ -0,0 +1,19 @@
LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
tests/__init__.py
tests/test_core.py
tests/test_ucslevel.py
wcwidth/__init__.py
wcwidth/table_wide.py
wcwidth/table_zero.py
wcwidth/unicode_versions.py
wcwidth/wcwidth.py
wcwidth.egg-info/PKG-INFO
wcwidth.egg-info/SOURCES.txt
wcwidth.egg-info/dependency_links.txt
wcwidth.egg-info/requires.txt
wcwidth.egg-info/top_level.txt
wcwidth.egg-info/zip-safe


@ -0,0 +1 @@


@ -0,0 +1,3 @@
[:python_version < "3.2"]
backports.functools-lru-cache>=1.2.1

@ -0,0 +1 @@
wcwidth

@ -0,0 +1 @@

37
wcwidth/__init__.py Normal file
@ -0,0 +1,37 @@
"""
wcwidth module.
https://github.com/jquast/wcwidth
"""
# re-export all functions & definitions, even private ones, from top-level
# module path, to allow for 'from wcwidth import _private_func'. Of course,
# user beware that any _private function may disappear or change signature at
# any future version.
# local
from .wcwidth import ZERO_WIDTH # noqa
from .wcwidth import (WIDE_EASTASIAN,
                      wcwidth,
                      wcswidth,
                      _bisearch,
                      list_versions,
                      _wcmatch_version,
                      _wcversion_value)
# The __all__ attribute defines the names exported by the statement
# 'from wcwidth import *', and also declares, "This is the public API".
__all__ = ('wcwidth', 'wcswidth', 'list_versions')
# I used to use a _get_package_version() function to use the `pkg_resources'
# module to parse the package version from our version.json file, but this blew
# some folks up, or more particularly, just the `xonsh' shell.
#
# Yikes! I always wanted to like xonsh and tried it many times but issues like
# these always bit me, too, so I can sympathize -- this version is now manually
# kept in sync with version.json to help them out. Shucks, this variable is just
# for legacy, from the days before 'pip freeze' was a thing.
#
# We also used pkg_resources to load unicode version tables from version.json,
# generated by bin/update-tables.py, but some environments are unable to
# import pkg_resources for one reason or another, yikes!
__version__ = '0.2.5'

1102
wcwidth/table_wide.py Normal file

File diff suppressed because it is too large

3910
wcwidth/table_zero.py Normal file

File diff suppressed because it is too large

@ -0,0 +1,35 @@
"""
Exports function list_versions() for unicode version level support.
This code generated by bin/update-tables.py on 2020-06-23 16:03:21.350604.
"""


def list_versions():
    """
    Return Unicode version levels supported by this module release.

    Any of the version strings returned may be used as keyword argument
    ``unicode_version`` to the ``wcwidth()`` family of functions.

    :returns: Supported Unicode version numbers in ascending sorted order.
    :rtype: tuple[str]
    """
    return (
        "4.1.0",
        "5.0.0",
        "5.1.0",
        "5.2.0",
        "6.0.0",
        "6.1.0",
        "6.2.0",
        "6.3.0",
        "7.0.0",
        "8.0.0",
        "9.0.0",
        "10.0.0",
        "11.0.0",
        "12.0.0",
        "12.1.0",
        "13.0.0",
    )
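
The version strings above are only in ascending order when compared numerically: as plain strings, ``"10.0.0"`` sorts before ``"9.0.0"``. The following is a minimal sketch of numeric comparison and nearest-lower matching, not the package's own matcher. The names ``SUPPORTED``, ``version_key`` and ``nearest_supported`` are hypothetical, and the sketch assumes well-formed dotted integers. Unlike the package's matcher, it orders an abbreviated string such as ``"8.0"`` before ``"8.0.0"``.

```python
from bisect import bisect_right

# Hypothetical subset of the version levels listed above, in ascending order.
SUPPORTED = ("4.1.0", "5.0.0", "9.0.0", "10.0.0", "13.0.0")


def version_key(ver_string):
    """Map a dotted version string to a tuple of ints: '9.0.0' -> (9, 0, 0)."""
    return tuple(int(part) for part in ver_string.split("."))


def nearest_supported(given, supported=SUPPORTED):
    """Return the highest supported level not above ``given``.

    Falls back to the earliest level when ``given`` is lower than all of them.
    """
    keys = [version_key(ver) for ver in supported]
    # bisect_right counts how many supported keys are <= the given key.
    idx = bisect_right(keys, version_key(given))
    return supported[max(idx - 1, 0)]
```

Comparing integer tuples rather than strings keeps two-digit major versions in the right place, which is why the lookup can rely on the list being sorted.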

375
wcwidth/wcwidth.py Normal file
@ -0,0 +1,375 @@
"""
This is a python implementation of wcwidth() and wcswidth().

https://github.com/jquast/wcwidth

from Markus Kuhn's C code, retrieved from:

    http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

This is an implementation of wcwidth() and wcswidth() (defined in
IEEE Std 1003.1-2001) for Unicode.

http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html

In fixed-width output devices, Latin characters all occupy a single
"cell" position of equal width, whereas ideographic CJK characters
occupy two such cells. Interoperability between terminal-line
applications and (teletype-style) character terminals using the
UTF-8 encoding requires agreement on which character should advance
the cursor by how many cell positions. No established formal
standards exist at present on which Unicode character shall occupy
how many cell positions on character terminals. These routines are
a first attempt at defining such behavior based on simple rules
applied to data provided by the Unicode Consortium.

For some graphical characters, the Unicode standard explicitly
defines a character-cell width via the definition of the East Asian
FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
In all these cases, there is no ambiguity about which width a
terminal shall use. For characters in the East Asian Ambiguous (A)
class, the width choice depends purely on a preference of backward
compatibility with either historic CJK or Western practice.
Choosing single-width for these characters is easy to justify as
the appropriate long-term solution, as the CJK practice of
displaying these characters as double-width comes from historic
implementation simplicity (8-bit encoded characters were displayed
single-width and 16-bit ones double-width, even for Greek,
Cyrillic, etc.) and not any typographic considerations.

Much less clear is the choice of width for the Not East Asian
(Neutral) class. Existing practice does not dictate a width for any
of these characters. It would nevertheless make sense
typographically to allocate two character cells to characters such
as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
represented adequately with a single-width glyph. The following
routines at present merely assign a single-cell width to all
neutral characters, in the interest of simplicity. This is not
entirely satisfactory and should be reconsidered before
establishing a formal standard in this area. At the moment, the
decision which Not East Asian (Neutral) characters should be
represented by double-width glyphs cannot yet be answered by
applying a simple rule from the Unicode database content. Setting
up a proper standard for the behavior of UTF-8 character terminals
will require a careful analysis not only of each Unicode character,
but also of each presentation form, something the author of these
routines has so far avoided doing.

http://www.unicode.org/unicode/reports/tr11/

Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
"""
from __future__ import division
# std imports
import os
import sys
import warnings
# local
from .table_wide import WIDE_EASTASIAN
from .table_zero import ZERO_WIDTH
from .unicode_versions import list_versions
try:
    from functools import lru_cache
except ImportError:
    # lru_cache was added in Python 3.2
    from backports.functools_lru_cache import lru_cache
# global cache
_UNICODE_CMPTABLE = None
_PY3 = (sys.version_info[0] >= 3)
# NOTE: created by hand, there isn't anything identifiable other than
# general Cf category code to identify these, and some characters in Cf
# category code are of non-zero width.
# Also includes some Cc, Mn, Zl, and Zp characters
ZERO_WIDTH_CF = set([
    0,       # Null (Cc)
    0x034F,  # Combining grapheme joiner (Mn)
    0x200B,  # Zero width space
    0x200C,  # Zero width non-joiner
    0x200D,  # Zero width joiner
    0x200E,  # Left-to-right mark
    0x200F,  # Right-to-left mark
    0x2028,  # Line separator (Zl)
    0x2029,  # Paragraph separator (Zp)
    0x202A,  # Left-to-right embedding
    0x202B,  # Right-to-left embedding
    0x202C,  # Pop directional formatting
    0x202D,  # Left-to-right override
    0x202E,  # Right-to-left override
    0x2060,  # Word joiner
    0x2061,  # Function application
    0x2062,  # Invisible times
    0x2063,  # Invisible separator
])


def _bisearch(ucs, table):
    """
    Auxiliary function for binary search in interval table.

    :arg int ucs: Ordinal value of unicode character.
    :arg list table: List of starting and ending ranges of ordinal values,
        in form of ``[(start, end), ...]``.
    :rtype: int
    :returns: 1 if ordinal value ucs is found within lookup table, else 0.
    """
    lbound = 0
    ubound = len(table) - 1

    if ucs < table[0][0] or ucs > table[ubound][1]:
        return 0
    while ubound >= lbound:
        mid = (lbound + ubound) // 2
        if ucs > table[mid][1]:
            lbound = mid + 1
        elif ucs < table[mid][0]:
            ubound = mid - 1
        else:
            return 1

    return 0


@lru_cache(maxsize=1000)
def wcwidth(wc, unicode_version='auto'):
    r"""
    Given one Unicode character, return its printable length on a terminal.

    :param str wc: A single Unicode character.
    :param str unicode_version: A Unicode version number, such as
        ``'6.0.0'``. The available version levels are returned by the
        companion function :func:`list_versions`.

        Any version string may be specified without error -- the nearest
        matching version is selected.  When ``auto`` (default), the value
        of the environment variable ``UNICODE_VERSION`` is used if it is
        defined, otherwise the highest supported Unicode version level.
    :return: The width, in cells, necessary to display the single Unicode
        character ``wc``.  Returns 0 if the ``wc`` argument has
        no printable effect on a terminal (such as NUL '\0'), -1 if ``wc`` is
        not printable, or has an indeterminate effect on the terminal, such as
        a control character.  Otherwise, the number of column positions the
        character occupies on a graphic terminal (1 or 2) is returned.
    :rtype: int

    The following have a column width of -1:

        - C0 control characters (``U+0001`` through ``U+001F``).

        - C1 control characters and DEL (``U+007F`` through ``U+009F``).

    The following have a column width of 0:

        - Non-spacing and enclosing combining characters (general
          category code Mn or Me in the Unicode database).

        - NULL (``U+0000``).

        - COMBINING GRAPHEME JOINER (``U+034F``).

        - ZERO WIDTH SPACE (``U+200B``) *through*
          RIGHT-TO-LEFT MARK (``U+200F``).

        - LINE SEPARATOR (``U+2028``) *and*
          PARAGRAPH SEPARATOR (``U+2029``).

        - LEFT-TO-RIGHT EMBEDDING (``U+202A``) *through*
          RIGHT-TO-LEFT OVERRIDE (``U+202E``).

        - WORD JOINER (``U+2060``) *through*
          INVISIBLE SEPARATOR (``U+2063``).

    The following have a column width of 1:

        - SOFT HYPHEN (``U+00AD``).

        - All remaining characters, including all printable ISO 8859-1
          and WGL4 characters, Unicode control characters, etc.

    The following have a column width of 2:

        - Spacing characters in the East Asian Wide (W) or East Asian
          Full-width (F) category as defined in Unicode Technical
          Report #11.

        - Some kinds of Emoji or symbols.
    """
    # Zero-width characters that were identified by hand; see the note on
    # ZERO_WIDTH_CF above.
    ucs = ord(wc)
    if ucs in ZERO_WIDTH_CF:
        return 0

    # C0/C1 control characters
    if ucs < 32 or 0x07F <= ucs < 0x0A0:
        return -1

    _unicode_version = _wcmatch_version(unicode_version)

    # combining characters with zero width
    if _bisearch(ucs, ZERO_WIDTH[_unicode_version]):
        return 0

    return 1 + _bisearch(ucs, WIDE_EASTASIAN[_unicode_version])


def wcswidth(pwcs, n=None, unicode_version='auto'):
    """
    Given a unicode string, return its printable length on a terminal.

    :param str pwcs: Measure width of given unicode string.
    :param int n: When ``n`` is None (default), return the width of the
        entire string, otherwise the width of the first ``n`` characters.
    :param str unicode_version: An explicit definition of the unicode version
        level to use for determination, may be ``auto`` (default), which uses
        the Environment Variable, ``UNICODE_VERSION`` if defined, or the latest
        available unicode version, otherwise.
    :rtype: int
    :returns: The width, in cells, necessary to display the first ``n``
        characters of the unicode string ``pwcs``.  Returns ``-1`` if
        a non-printable character is encountered.
    """
    # pylint: disable=C0103
    #         Invalid argument name "n"

    end = len(pwcs) if n is None else n
    idx = slice(0, end)
    width = 0
    for char in pwcs[idx]:
        wcw = wcwidth(char, unicode_version)
        if wcw < 0:
            return -1
        width += wcw
    return width


@lru_cache(maxsize=128)
def _wcversion_value(ver_string):
    """
    Integer-mapped value of given dotted version string.

    :param str ver_string: Unicode version string, of form ``n.n.n``.
    :rtype: tuple(int)
    :returns: tuple of integers, one per dotted component,
        ``tuple(int, [...])``.
    """
    retval = tuple(map(int, (ver_string.split('.'))))
    return retval


@lru_cache(maxsize=8)
def _wcmatch_version(given_version):
    """
    Return nearest matching supported Unicode version level.

    If an exact match is not found, the nearest lower supported version
    level is returned.  A warning is emitted only when the given value is
    not a valid version string, or does not exceed the earliest supported
    level.  For example, given supported levels ``4.1.0`` and ``5.0.0``,
    and a version string of ``4.9.9``, then ``4.1.0`` is selected and
    returned:

    >>> _wcmatch_version('4.9.9')
    '4.1.0'
    >>> _wcmatch_version('8.0')
    '8.0.0'
    >>> _wcmatch_version('1')
    '4.1.0'

    :param str given_version: given version for compare, may be ``auto``
        (default), to select Unicode Version from Environment Variable,
        ``UNICODE_VERSION``. If the environment variable is not set, then the
        latest is used.
    :rtype: str
    :returns: unicode string, or non-unicode ``str`` type for python 2
        when ``given_version`` is also type ``str``.
    """
    # Design note: the choice to return the same type that is given certainly
    # complicates it for python 2 str-type, but allows us to define an api
    # that uses 'string-type' for unicode version level definitions, so all of
    # our example code works with all versions of python. That, along with the
    # string-to-numeric and comparisons of earliest, latest, matching, or
    # nearest, greatly complicates this function.
    _return_str = not _PY3 and isinstance(given_version, str)

    if _return_str:
        unicode_versions = [ucs.encode() for ucs in list_versions()]
    else:
        unicode_versions = list_versions()
    latest_version = unicode_versions[-1]

    if given_version in (u'auto', 'auto'):
        given_version = os.environ.get(
            'UNICODE_VERSION',
            'latest' if not _return_str else latest_version.encode())

    if given_version in (u'latest', 'latest'):
        # default match, when given as 'latest', use the latest unicode
        # version specification level supported.
        return latest_version if not _return_str else latest_version.encode()

    if given_version in unicode_versions:
        # exact match, downstream has specified an explicit matching version
        # matching any value of list_versions().
        return given_version if not _return_str else given_version.encode()

    # The user's version is not supported by ours. We return the newest unicode
    # version level that we support below their given value.
    try:
        cmp_given = _wcversion_value(given_version)

    except ValueError:
        # submitted value raises ValueError in int(), warn and use latest.
        warnings.warn("UNICODE_VERSION value, {given_version!r}, is invalid. "
                      "Value should be in form of `integer[.]+', the latest "
                      "supported unicode version {latest_version!r} has been "
                      "inferred.".format(given_version=given_version,
                                         latest_version=latest_version))
        return latest_version if not _return_str else latest_version.encode()

    # given version is less than or equal to the earliest available version,
    # return earliest version.
    earliest_version = unicode_versions[0]
    cmp_earliest_version = _wcversion_value(earliest_version)

    if cmp_given <= cmp_earliest_version:
        # this probably isn't what you wanted, the oldest wcwidth.c you will
        # find in the wild is likely version 5 or 6, both of which we support,
        # but it's better than not saying anything at all.
        warnings.warn("UNICODE_VERSION value, {given_version!r}, is lower "
                      "than any available unicode version. Returning lowest "
                      "version level, {earliest_version!r}".format(
                          given_version=given_version,
                          earliest_version=earliest_version))
        return earliest_version if not _return_str else earliest_version.encode()

    # create list of versions which are less than or equal to given version,
    # and return the tail value, which is the highest level we may support,
    # or the latest value we support, when completely unmatched or higher
    # than any supported version.
    #
    # function will never complete, always returns.
    for idx, unicode_version in enumerate(unicode_versions):
        # look ahead to next value
        try:
            cmp_next_version = _wcversion_value(unicode_versions[idx + 1])
        except IndexError:
            # at end of list, return latest version
            return latest_version if not _return_str else latest_version.encode()

        # Maybe our given version has fewer parts, as in tuple(8, 0), than the
        # next compare version tuple(8, 0, 0). Test for an exact match by
        # comparison of only the leading dotted piece(s): (8, 0) == (8, 0).
        if cmp_given == cmp_next_version[:len(cmp_given)]:
            return unicode_versions[idx + 1]

        # Or, if any next value is greater than our given support level
        # version, return the current value in index.  Even though it must
        # be less than the given value, it's our closest possible match. That
        # is, 4.1 is returned for given 4.9.9, where 4.1 and 5.0 are available.
        if cmp_next_version > cmp_given:
            return unicode_version
    assert False, ("Code path unreachable", given_version, unicode_versions)
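
The lookup order used by ``wcwidth()`` and ``wcswidth()`` above (hand-listed zero-width characters, then control characters, then the zero-width table, then the wide table) can be tried in isolation with the sketch below. It is an illustration under stated assumptions, not the module itself. The tables ``ZERO_WIDTH_DEMO`` and ``WIDE_DEMO`` are hypothetical and heavily abbreviated, the names ``in_table``, ``demo_width`` and ``demo_string_width`` are made up for this example, and only NUL stands in for the full ``ZERO_WIDTH_CF`` set.

```python
# Hypothetical, heavily abbreviated tables of (start, end) ordinal ranges,
# sorted ascending. The real tables are generated per Unicode version.
ZERO_WIDTH_DEMO = ((0x0300, 0x036F),)   # combining diacritical marks
WIDE_DEMO = ((0x1100, 0x115F), (0x3041, 0x3096), (0x4E00, 0x9FFF))


def in_table(ucs, table):
    """Binary search: True when ``ucs`` lies inside any (start, end) range."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if ucs > end:
            lo = mid + 1
        elif ucs < start:
            hi = mid - 1
        else:
            return True
    return False


def demo_width(char):
    """Classify one character as -1, 0, 1 or 2 cells using the demo tables."""
    ucs = ord(char)
    if ucs == 0:
        # stand-in for the hand-made zero-width set
        return 0
    if ucs < 32 or 0x7F <= ucs < 0xA0:
        # C0/C1 control characters and DEL
        return -1
    if in_table(ucs, ZERO_WIDTH_DEMO):
        return 0
    return 2 if in_table(ucs, WIDE_DEMO) else 1


def demo_string_width(text):
    """Sum the cell widths, or return -1 if any character is non-printable."""
    widths = [demo_width(char) for char in text]
    return -1 if any(width < 0 for width in widths) else sum(widths)
```

Keeping the tables as sorted, non-overlapping ranges is what makes the binary search valid: each probe rules out every range on one side of the midpoint.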