mirror of https://gitee.com/openkylin/wcwidth.git
Import Upstream version 0.2.5
Commit 3cb230e352
@ -0,0 +1,27 @@
The MIT License (MIT)

Copyright (c) 2014 Jeff Quast <contact@jeffquast.com>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Markus Kuhn -- 2007-05-26 (Unicode 5.0)

Permission to use, copy, modify, and distribute this software
for any purpose and without fee is hereby granted. The author
disclaims all warranties with regard to this software.
|
|
@ -0,0 +1,2 @@
include LICENSE *.rst
recursive-include tests *.py
|
|
@ -0,0 +1,306 @@
Metadata-Version: 1.1
Name: wcwidth
Version: 0.2.5
Summary: Measures the displayed width of unicode strings in a terminal
Home-page: https://github.com/jquast/wcwidth
Author: Jeff Quast
Author-email: contact@jeffquast.com
License: MIT
Description: |pypi_downloads| |codecov| |license|
|
||||
|
||||
============
|
||||
Introduction
|
||||
============
|
||||
|
||||
This library is mainly for CLI programs that carefully produce output for
terminals, or that pretend to be a terminal emulator.

**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen (``1 character : 1 cell``). However,
there are categories of characters that *occupy 2 cells* (full-wide), and
others that *occupy 0* cells (zero-width).

**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
The stable version of this package is maintained on pypi, install using pip::
|
||||
|
||||
pip install wcwidth
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
**Problem**: given the following phrase (Japanese),
|
||||
|
||||
>>> text = u'コンニチハ'
|
||||
|
||||
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the ``rjust`` function, the
output is 25 cells wide rather than the requested 20::

    >>> print(len('コンニチハ'))
    5

    >>> print('コンニチハ'.rjust(20, '_'))
    _______________コンニチハ
|
||||
|
||||
By defining our own "rjust" function that uses wcwidth, we can correct this::
|
||||
|
||||
>>> def wc_rjust(text, length, padding=' '):
|
||||
... from wcwidth import wcswidth
|
||||
... return padding * max(0, (length - wcswidth(text))) + text
|
||||
...
|
||||
|
||||
Our **Solution** uses wcswidth to determine the string length correctly::
|
||||
|
||||
>>> from wcwidth import wcswidth
|
||||
>>> print(wcswidth('コンニチハ'))
|
||||
10
|
||||
|
||||
>>> print(wc_rjust('コンニチハ', 20, '_'))
|
||||
__________コンニチハ
|
||||
|
||||
|
||||
Choosing a Version
|
||||
------------------
|
||||
|
||||
Export an environment variable, ``UNICODE_VERSION``. This should be done by
|
||||
*terminal emulators* or those developers experimenting with authoring one of
|
||||
their own, from shell::
|
||||
|
||||
$ export UNICODE_VERSION=13.0
|
||||
|
||||
If unspecified, the latest version is used. If your Terminal Emulator does not
|
||||
export this variable, you can use the `jquast/ucs-detect`_ utility to
|
||||
automatically detect and export it to your shell.
|
||||
|
||||
wcwidth, wcswidth
|
||||
-----------------
|
||||
Use function ``wcwidth()`` to determine the length of a *single unicode
|
||||
character*, and ``wcswidth()`` to determine the length of many, a *string
|
||||
of unicode characters*.
|
||||
|
||||
Briefly, return values of function ``wcwidth()`` are:
|
||||
|
||||
``-1``
|
||||
Indeterminate (not printable).
|
||||
|
||||
``0``
|
||||
Does not advance the cursor, such as NULL or Combining.
|
||||
|
||||
``2``
|
||||
Characters of category East Asian Wide (W) or East Asian
|
||||
Full-width (F) which are displayed using two terminal cells.
|
||||
|
||||
``1``
|
||||
All others.
|
||||
|
||||
Function ``wcswidth()`` simply returns the sum of all values for each character
|
||||
along a string, or ``-1`` when it occurs anywhere along a string.
|
||||
|
||||
Full API Documentation at http://wcwidth.readthedocs.org
|
||||
|
||||
==========
|
||||
Developing
|
||||
==========
|
||||
|
||||
Install wcwidth in editable mode::

    pip install -e .

Execute unit tests using tox_::

    tox

Regenerate python code tables from latest Unicode Specification data files::

    tox -eupdate

Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. First
run ``pip install -r requirements-develop.txt`` from this project's main
folder. For example, an interactive browser for testing::

    ./bin/wcwidth-browser.py
|
||||
|
||||
Uses
|
||||
----
|
||||
|
||||
This library is used in:
|
||||
|
||||
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
|
||||
Python.
|
||||
|
||||
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
|
||||
interactive command lines in Python.
|
||||
|
||||
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
|
||||
|
||||
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
|
||||
based on compositing 2d arrays of text.
|
||||
|
||||
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
|
||||
|
||||
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
|
||||
and a command-line utility.
|
||||
|
||||
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
|
||||
text.
|
||||
|
||||
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
|
||||
animations.
|
||||
|
||||
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
|
||||
UIs.
|
||||
|
||||
Other Languages
|
||||
---------------
|
||||
|
||||
- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: `LD_PRELOAD` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C
|
||||
|
||||
History
|
||||
-------
|
||||
|
||||
0.2.0 *2020-06-01*
  * **Enhancement**: Unicode version may be selected by exporting the
    Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
    See the `jquast/ucs-detect`_ CLI utility for automatic detection.
  * **Enhancement**:
    API Documentation is published to readthedocs.org.
  * **Updated** tables for *all* Unicode Specifications with files
    published in a programmatically consumable format, versions 4.1.0
    through 13.0.
|
||||
|
||||
0.1.9 *2020-03-22*
|
||||
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
|
||||
* **Updated** tables to Unicode Specification 13.0.0.
|
||||
|
||||
0.1.8 *2020-01-01*
|
||||
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
|
||||
|
||||
0.1.7 *2016-07-01*
|
||||
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
|
||||
|
||||
0.1.6 *2016-01-08 Production/Stable*
|
||||
* ``LICENSE`` file now included with distribution.
|
||||
|
||||
0.1.5 *2015-09-13 Alpha*
|
||||
* **Bugfix**:
|
||||
Resolution of "combining_ character width" issue, most especially
|
||||
those that previously returned -1 now often (correctly) return 0.
|
||||
resolved by `Philip Craig`_ via `PR #11`_.
|
||||
* **Deprecated**:
|
||||
The module path ``wcwidth.table_comb`` is no longer available,
|
||||
it has been superseded by module path ``wcwidth.table_zero``.
|
||||
|
||||
0.1.4 *2014-11-20 Pre-Alpha*
|
||||
* **Feature**: ``wcswidth()`` now determines printable length
|
||||
for (most) combining_ characters. The developer's tool
|
||||
`bin/wcwidth-browser.py`_ is improved to display combining_
|
||||
characters when provided the ``--combining`` option
|
||||
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
|
||||
* **Feature**: added static analysis (prospector_) to testing
|
||||
framework.
|
||||
|
||||
0.1.3 *2014-10-29 Pre-Alpha*
|
||||
* **Bugfix**: 2nd parameter of wcswidth was not honored.
|
||||
(`Thomas Ballinger`_, `PR #4`_).
|
||||
|
||||
0.1.2 *2014-10-28 Pre-Alpha*
|
||||
* **Updated** tables to Unicode Specification 7.0.0.
|
||||
(`Thomas Ballinger`_, `PR #3`_).
|
||||
|
||||
0.1.1 *2014-05-14 Pre-Alpha*
|
||||
* Initial release to pypi, Based on Unicode Specification 6.3.0
|
||||
|
||||
This code was originally derived directly from C code of the same name,
|
||||
whose latest version is available at
|
||||
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
|
||||
|
||||
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
|
||||
*
|
||||
* Permission to use, copy, modify, and distribute this software
|
||||
* for any purpose and without fee is hereby granted. The author
|
||||
* disclaims all warranties with regard to this software.
|
||||
|
||||
.. _`tox`: https://testrun.org/tox/latest/install.html
|
||||
.. _`prospector`: https://github.com/landscapeio/prospector
|
||||
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
|
||||
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
|
||||
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
|
||||
.. _`Thomas Ballinger`: https://github.com/thomasballinger
|
||||
.. _`Leta Montopoli`: https://github.com/lmontopo
|
||||
.. _`Philip Craig`: https://github.com/philipc
|
||||
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
|
||||
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
|
||||
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
|
||||
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
|
||||
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
|
||||
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
|
||||
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
|
||||
.. _`jquast/blessed`: https://github.com/jquast/blessed
|
||||
.. _`selectel/pyte`: https://github.com/selectel/pyte
|
||||
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
|
||||
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
|
||||
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
|
||||
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
|
||||
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
|
||||
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
|
||||
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
|
||||
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
|
||||
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
|
||||
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
|
||||
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
|
||||
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
|
||||
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
|
||||
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
|
||||
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
|
||||
.. _`Avram Lubkin`: https://github.com/avylove
|
||||
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
|
||||
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
|
||||
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
|
||||
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
|
||||
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
|
||||
:alt: Downloads
|
||||
:target: https://pypi.org/project/wcwidth/
|
||||
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
|
||||
:alt: codecov.io Code Coverage
|
||||
:target: https://codecov.io/gh/jquast/wcwidth/
|
||||
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
|
||||
:target: https://pypi.python.org/pypi/wcwidth/
|
||||
:alt: MIT License
|
||||
|
||||
Keywords: cjk,combining,console,eastasian,emoji,emulator,terminal,unicode,wcswidth,wcwidth,xterm
|
||||
Platform: UNKNOWN
|
||||
Classifier: Intended Audience :: Developers
|
||||
Classifier: Natural Language :: English
|
||||
Classifier: Development Status :: 5 - Production/Stable
|
||||
Classifier: Environment :: Console
|
||||
Classifier: License :: OSI Approved :: MIT License
|
||||
Classifier: Operating System :: POSIX
|
||||
Classifier: Programming Language :: Python :: 2.7
|
||||
Classifier: Programming Language :: Python :: 3.5
|
||||
Classifier: Programming Language :: Python :: 3.6
|
||||
Classifier: Programming Language :: Python :: 3.7
|
||||
Classifier: Programming Language :: Python :: 3.8
|
||||
Classifier: Topic :: Software Development :: Libraries
|
||||
Classifier: Topic :: Software Development :: Localization
|
||||
Classifier: Topic :: Software Development :: Internationalization
|
||||
Classifier: Topic :: Terminals
|
|
@ -0,0 +1,280 @@
|
|||
|pypi_downloads| |codecov| |license|

============
Introduction
============

This library is mainly for CLI programs that carefully produce output for
terminals, or that pretend to be a terminal emulator.

**Problem Statement**: The printable length of *most* strings is equal to the
number of cells they occupy on the screen (``1 character : 1 cell``). However,
there are categories of characters that *occupy 2 cells* (full-wide), and
others that *occupy 0* cells (zero-width).

**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
functions precisely copy. *These functions return the number of cells a
unicode string is expected to occupy.*

Installation
------------

The stable version of this package is maintained on PyPI; install it using pip::

    pip install wcwidth

Example
-------

**Problem**: given the following phrase (Japanese),

    >>> text = u'コンニチハ'

Python **incorrectly** uses the *string length* of 5 codepoints rather than the
*printable length* of 10 cells, so that when using the ``rjust`` function, the
output is 25 cells wide rather than the requested 20::

    >>> print(len('コンニチハ'))
    5

    >>> print('コンニチハ'.rjust(20, '_'))
    _______________コンニチハ

By defining our own "rjust" function that uses wcwidth, we can correct this::

    >>> def wc_rjust(text, length, padding=' '):
    ...     from wcwidth import wcswidth
    ...     return padding * max(0, (length - wcswidth(text))) + text
    ...

Our **Solution** uses wcswidth to determine the string length correctly::

    >>> from wcwidth import wcswidth
    >>> print(wcswidth('コンニチハ'))
    10

    >>> print(wc_rjust('コンニチハ', 20, '_'))
    __________コンニチハ

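The same idea extends to left and center alignment. The following is a minimal sketch, not part of this library: ``wc_pad`` and its ``width_fn`` parameter are hypothetical names chosen for illustration, and the padding is assumed to be a single character that occupies one cell. Any width function can be passed in, for example ``wcswidth``; the sketch also rejects a negative width, which ``wcswidth`` reports for strings containing non-printable characters.

```python
def wc_pad(text, length, width_fn, align="right", padding=" "):
    """Pad ``text`` to ``length`` terminal cells, measured by ``width_fn``.

    ``width_fn`` is any callable returning the display width of a string,
    for example ``wcwidth.wcswidth``. ``padding`` is assumed to be one
    character that occupies exactly one cell.
    """
    width = width_fn(text)
    if width < 0:
        # a width function such as wcswidth() returns -1 when the string
        # contains a character of indeterminate (non-printable) width
        raise ValueError("text contains a character of indeterminate width")
    missing = max(0, length - width)
    if align == "right":
        return padding * missing + text
    if align == "left":
        return text + padding * missing
    if align == "center":
        left = missing // 2
        return padding * left + text + padding * (missing - left)
    raise ValueError("align must be 'left', 'right' or 'center'")
```

With the package installed, ``wc_pad(text, 20, wcswidth, 'center', '_')`` would be one way to call it.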
Choosing a Version
------------------

To select a Unicode version, export the environment variable
``UNICODE_VERSION``. This should be done by *terminal emulators*, or by
developers experimenting with authoring one of their own, from the shell::

    $ export UNICODE_VERSION=13.0

If unspecified, the latest version is used. If your Terminal Emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.

wcwidth, wcswidth
-----------------

Use function ``wcwidth()`` to determine the length of a *single unicode
character*, and ``wcswidth()`` to determine the length of many, a *string
of unicode characters*.

Briefly, return values of function ``wcwidth()`` are:

``-1``
  Indeterminate (not printable).

``0``
  Does not advance the cursor, such as NULL or Combining.

``2``
  Characters of category East Asian Wide (W) or East Asian
  Full-width (F) which are displayed using two terminal cells.

``1``
  All others.

Function ``wcswidth()`` returns the sum of the ``wcwidth()`` values of each
character in a string, or ``-1`` if any character in the string has a width
of ``-1``.

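The return-value rules above can be approximated with the standard library alone. The sketch below is not this library's implementation: ``approx_wcwidth`` and ``approx_wcswidth`` are hypothetical names, the tables come from whatever Unicode version the running Python interpreter ships (``UNICODE_VERSION`` is ignored), and zero-width handling is limited to NULL and the ``Mn``/``Me`` combining categories.

```python
import unicodedata


def approx_wcwidth(char):
    """Approximate the cell width of one character: -1, 0, 1 or 2."""
    codepoint = ord(char)
    if codepoint == 0:
        return 0  # NULL does not advance the cursor
    if codepoint < 32 or 0x7F <= codepoint < 0xA0:
        return -1  # C0/C1 control characters are not printable
    if unicodedata.category(char) in ("Mn", "Me"):
        return 0  # non-spacing and enclosing combining marks
    if unicodedata.east_asian_width(char) in ("W", "F"):
        return 2  # East Asian Wide or Full-width
    return 1  # all others


def approx_wcswidth(text):
    """Sum the per-character widths, or return -1 if any width is -1."""
    total = 0
    for char in text:
        width = approx_wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total
```

Returning early on the first ``-1`` mirrors the rule that a single non-printable character makes the width of the whole string indeterminate.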
Full API Documentation at http://wcwidth.readthedocs.org

==========
Developing
==========

Install wcwidth in editable mode::

    pip install -e .

Execute unit tests using tox_::

    tox

Regenerate python code tables from latest Unicode Specification data files::

    tox -eupdate

Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. First
run ``pip install -r requirements-develop.txt`` from this project's main
folder. For example, an interactive browser for testing::

    ./bin/wcwidth-browser.py

Uses
----

This library is used in:

- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
  Python.

- `jonathanslenders/python-prompt-toolkit`_: a library for building powerful
  interactive command lines in Python.

- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.

- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
  based on compositing 2d arrays of text.

- `selectel/pyte`_: Simple VTXXX-compatible Linux terminal emulator.

- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
  and a command-line utility.

- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
  text.

- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
  animations.

- `peterbrittain/asciimatics`_: Package to help people create full-screen text
  UIs.

Other Languages
---------------

- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- `joachimschmidt557/zig-wcwidth`: Zig
- `fumiyas/wcwidth-cjk`_: `LD_PRELOAD` override
- `joshuarubin/wcwidth9`: Unicode version 9 in C

History
-------

0.2.0 *2020-06-01*
  * **Enhancement**: Unicode version may be selected by exporting the
    Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
    See the `jquast/ucs-detect`_ CLI utility for automatic detection.
  * **Enhancement**:
    API Documentation is published to readthedocs.org.
  * **Updated** tables for *all* Unicode Specifications with files
    published in a programmatically consumable format, versions 4.1.0
    through 13.0.

0.1.9 *2020-03-22*
  * **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
  * **Updated** tables to Unicode Specification 13.0.0.

0.1.8 *2020-01-01*
  * **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).

0.1.7 *2016-07-01*
  * **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).

0.1.6 *2016-01-08 Production/Stable*
  * ``LICENSE`` file now included with distribution.

0.1.5 *2015-09-13 Alpha*
  * **Bugfix**:
    Resolved the combining_ character width issue: many characters that
    previously returned -1 now (correctly) return 0.
    Resolved by `Philip Craig`_ via `PR #11`_.
  * **Deprecated**:
    The module path ``wcwidth.table_comb`` is no longer available;
    it has been superseded by module path ``wcwidth.table_zero``.

0.1.4 *2014-11-20 Pre-Alpha*
  * **Feature**: ``wcswidth()`` now determines printable length
    for (most) combining_ characters. The developer's tool
    `bin/wcwidth-browser.py`_ is improved to display combining_
    characters when provided the ``--combining`` option
    (`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
  * **Feature**: added static analysis (prospector_) to testing
    framework.

0.1.3 *2014-10-29 Pre-Alpha*
  * **Bugfix**: 2nd parameter of wcswidth was not honored.
    (`Thomas Ballinger`_, `PR #4`_).

0.1.2 *2014-10-28 Pre-Alpha*
  * **Updated** tables to Unicode Specification 7.0.0.
    (`Thomas Ballinger`_, `PR #3`_).

0.1.1 *2014-05-14 Pre-Alpha*
  * Initial release to PyPI, based on Unicode Specification 6.3.0.

This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::

 * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
 *
 * Permission to use, copy, modify, and distribute this software
 * for any purpose and without fee is hereby granted. The author
 * disclaims all warranties with regard to this software.

.. _`tox`: https://testrun.org/tox/latest/install.html
.. _`prospector`: https://github.com/landscapeio/prospector
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _`Thomas Ballinger`: https://github.com/thomasballinger
.. _`Leta Montopoli`: https://github.com/lmontopo
.. _`Philip Craig`: https://github.com/philipc
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
.. _`jquast/blessed`: https://github.com/jquast/blessed
.. _`selectel/pyte`: https://github.com/selectel/pyte
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
.. _`Avram Lubkin`: https://github.com/avylove
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
   :alt: Downloads
   :target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
   :alt: codecov.io Code Coverage
   :target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
   :target: https://pypi.python.org/pypi/wcwidth/
   :alt: MIT License
|
|
@ -0,0 +1,47 @@
|
|||
#!/usr/bin/env python3
"""
Display new wide unicode point values, by version.

For example::

    "5.0.0": [
        12752,
        12753,
        12754,
        ...

Means that chr(12752) through chr(12754) are new WIDE values
for Unicode version 5.0.0, and were not WIDE values for the
previous version (4.1.0).
"""
# std imports
import sys
import json


def main():
    """List new WIDE characters at each unicode version."""
    from wcwidth import WIDE_EASTASIAN, _bisearch
    versions = list(WIDE_EASTASIAN.keys())
    results = {}
    # pair each version with its predecessor; the first version has none
    for previous_version, version in zip(versions, versions[1:]):
        previous_table = WIDE_EASTASIAN[previous_version]
        for start, end in WIDE_EASTASIAN[version]:
            # table ranges include their end value, so add 1 for range()
            for value in range(start, end + 1):
                if not _bisearch(value, previous_table):
                    results.setdefault(version, []).append(value)
                    if '--debug' in sys.argv:
                        print(f'version {version} has unicode character '
                              f'0x{value:05x} ({chr(value)}) but previous '
                              f'version, {previous_version} does not.',
                              file=sys.stderr)
    print(json.dumps(results, indent=4))


if __name__ == '__main__':
    main()
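
The comparison this script performs can be tried without the library's tables. The following is a minimal, self-contained sketch: ``in_ranges`` and ``new_values`` are hypothetical names, the sample tables used with them are made-up numbers, and each ``(start, end)`` pair is assumed to be inclusive of both ends.

```python
def in_ranges(value, table):
    """Binary-search a sorted table of inclusive (start, end) pairs."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        start, end = table[mid]
        if value > end:
            low = mid + 1
        elif value < start:
            high = mid - 1
        else:
            return True
    return False


def new_values(current, previous):
    """List code points covered by ``current`` but not by ``previous``."""
    return [value
            for start, end in current
            for value in range(start, end + 1)  # end is inclusive
            if not in_ranges(value, previous)]
```

For example, ``new_values([(10, 14)], [(10, 12)])`` lists the two values that only the newer table covers.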
|
|
@ -0,0 +1,42 @@
|
"""Workaround for https://github.com/codecov/codecov-python/issues/158."""

# std imports
import sys
import time

# 3rd party
import codecov

RETRIES = 5
TIMEOUT = 2


def main():
    """Run codecov up to RETRIES times. On the final attempt, let it exit normally."""

    # Make a copy of argv and make sure --required is in it
    args = sys.argv[1:]
    if '--required' not in args:
        args.append('--required')

    for num in range(1, RETRIES + 1):

        print('Running codecov attempt %d: ' % num)
        # On the last, let codecov handle the exit
        if num == RETRIES:
            codecov.main()

        try:
            codecov.main(*args)
        except SystemExit as err:
            # If there's no exit code, it was successful
            if err.code:
                time.sleep(TIMEOUT)
            else:
                sys.exit(err.code)
        else:
            break


if __name__ == '__main__':
    main()
|
|
@ -0,0 +1,331 @@
|
|||
#!/usr/bin/env python
"""
Update the python Unicode tables for wcwidth.

https://github.com/jquast/wcwidth
"""
from __future__ import print_function

# std imports
import os
import re
import glob
import json
import codecs
import string
import urllib
import datetime
import collections
import unicodedata

try:
    # py2
    from urllib2 import urlopen
except ImportError:
    # py3
    from urllib.request import urlopen

URL_UNICODE_DERIVED_AGE = 'file:///usr/share/unicode/DerivedAge.txt'
EXCLUDE_VERSIONS = ['2.0.0', '2.1.2', '3.0.0', '3.1.0', '3.2.0', '4.0.0']
PATH_UP = os.path.relpath(
    os.path.join(
        os.path.dirname(__file__),
        os.path.pardir))
PATH_DOCS = os.path.join(PATH_UP, 'docs')
PATH_DATA = os.path.join(PATH_UP, 'data')
PATH_CODE = os.path.join(PATH_UP, 'wcwidth')
FILE_RST = os.path.join(PATH_DOCS, 'unicode_version.rst')
FILE_PATCH_FROM = "release files:"
FILE_PATCH_TO = "======="


# use chr() for py3.x,
# unichr() for py2.x
try:
    _ = unichr(0)
except NameError as err:
    if err.args[0] == "name 'unichr' is not defined":
        # pylint: disable=C0103,W0622
        #         Invalid constant name "unichr" (col 8)
        #         Redefining built-in 'unichr' (col 8)
        unichr = chr
    else:
        raise


TableDef = collections.namedtuple('table', ['version', 'date', 'values'])


def main():
    """Update east-asian, combining and zero width tables."""
    versions = get_unicode_versions()
    do_east_asian(versions)
    do_zero_width(versions)
    do_rst_file_update()
    do_unicode_versions(versions)


def get_unicode_versions():
    """Fetch, determine, and return Unicode Versions for processing."""
    fname = os.path.join(PATH_DATA, 'DerivedAge.txt')
    do_retrieve(url=URL_UNICODE_DERIVED_AGE, fname=fname)
    pattern = re.compile(r'#.*assigned in Unicode ([0-9.]+)')
    versions = []
    with open(fname, 'r') as fin:
        for line in fin:
            if match := pattern.match(line):
                version = match.group(1)
                if version not in EXCLUDE_VERSIONS:
                    versions.append(version)
    # sort numerically by dotted components, so that 9.0.0 precedes 13.0.0
    versions.sort(key=lambda ver: list(map(int, ver.split('.'))))
    return versions


def do_rst_file_update():
    """Patch unicode_version.rst to reflect the data files used in release."""

    # read in,
    with codecs.open(FILE_RST, 'r', 'utf8') as fin:
        data_in = fin.read()

    # search for the beginning position,
    pos_begin = data_in.find(FILE_PATCH_FROM)
    assert pos_begin != -1, (pos_begin, FILE_PATCH_FROM)
    pos_begin += len(FILE_PATCH_FROM)
    data_out = data_in[:pos_begin] + '\n\n'

    # find all filenames with a version number in it,
    # sort filenames by name, then dotted number, ascending
    glob_pattern = os.path.join(PATH_DATA, '*[0-9]*.txt')
    filenames = glob.glob(glob_pattern)
    filenames.sort(key=lambda ver: [ver.split(
        '-')[0]] + list(map(int, ver.split('-')[-1][:-4].split('.'))))

    # copy file description as-is, formatted
    for fpath in filenames:
        if description := describe_file_header(fpath):
            data_out += f'\n{description}'

    # write.
    print(f"patching {FILE_RST} ..")
    with codecs.open(FILE_RST, 'w', 'utf8') as fout:
        fout.write(data_out)


def do_east_asian(versions):
|
||||
"""Fetch and update east-asian tables."""
|
||||
table = {}
|
||||
for version in versions:
|
||||
fin = os.path.join(PATH_DATA, 'EastAsianWidth-{version}.txt')
|
||||
fout = os.path.join(PATH_CODE, 'table_wide.py')
|
||||
url = ('file:///usr/share/unicode/EastAsianWidth.txt')
|
||||
try:
|
||||
do_retrieve(url=url.format(version=version),
|
||||
fname=fin.format(version=version))
|
||||
except urllib.error.HTTPError as err:
|
||||
if err.code != 404:
|
||||
raise
|
||||
else:
|
||||
table[version] = parse_east_asian(
|
||||
fname=fin.format(version=version),
|
||||
properties=(u'W', u'F',))
|
||||
do_write_table(fname=fout, variable='WIDE_EASTASIAN', table=table)
|
||||
|
||||
|
||||
def do_zero_width(versions):
|
||||
"""Fetch and update zero width tables."""
|
||||
table = {}
|
||||
fout = os.path.join(PATH_CODE, 'table_zero.py')
|
||||
for version in versions:
|
||||
fin = os.path.join(PATH_DATA, 'DerivedGeneralCategory-{version}.txt')
|
||||
url = ('file:///usr/share/unicode/extracted/DerivedGeneralCategory.txt')
|
||||
try:
|
||||
do_retrieve(url=url.format(version=version),
|
||||
fname=fin.format(version=version))
|
||||
except urllib.error.HTTPError as err:
|
||||
if err.code != 404:
|
||||
raise
|
||||
else:
|
||||
table[version] = parse_category(
|
||||
fname=fin.format(version=version),
|
||||
categories=('Me', 'Mn',))
|
||||
do_write_table(fname=fout, variable='ZERO_WIDTH', table=table)
|
||||
|
||||
|
||||
def make_table(values):
    """Return a tuple of inclusive (start, end) ranges for sorted values."""
    table = collections.deque()
    start, end = values[0], values[0]
    for num, value in enumerate(values):
        if num == 0:
            table.append((value, value,))
            continue
        start, end = table.pop()
        if end == value - 1:
            # value is adjacent to the last range, extend it
            table.append((start, value,))
        else:
            # gap found, keep the last range and begin a new one
            table.append((start, end,))
            table.append((value, value,))
    return tuple(table)
|
||||
|
||||
|
||||
def do_retrieve(url, fname):
|
||||
"""Retrieve given url to target filepath fname."""
|
||||
folder = os.path.dirname(fname)
|
||||
if not os.path.exists(folder):
|
||||
os.makedirs(folder)
|
||||
print(f"{folder}{os.path.sep} created.")
|
||||
if not os.path.exists(fname):
|
||||
try:
|
||||
with open(fname, 'wb') as fout:
|
||||
print(f"retrieving {url}: ", end='', flush=True)
|
||||
resp = urlopen(url)
|
||||
fout.write(resp.read())
|
||||
except BaseException:
|
||||
print('failed')
|
||||
os.unlink(fname)
|
||||
raise
|
||||
print(f"{fname} saved.")
|
||||
return fname
|
||||
|
||||
|
||||
def describe_file_header(fpath):
    """Return the first two header lines of fpath as reStructuredText."""
    header_2 = [line.lstrip('# ').rstrip() for line in
                codecs.open(fpath, 'r', 'utf8').readlines()[:2]]
    # fmt:
    #
    # ``EastAsianWidth-8.0.0.txt``
    # *2015-02-10, 21:00:00 GMT [KW, LI]*
    fmt = '``{0}``\n *{1}*\n'
    if len(header_2) == 0:
        return ''
    assert len(header_2) == 2, (fpath, header_2)
    return fmt.format(*header_2)
|
||||
|
||||
|
||||
def parse_east_asian(fname, properties=(u'W', u'F',)):
|
||||
"""Parse unicode east-asian width tables."""
|
||||
print(f'parsing {fname}: ', end='', flush=True)
|
||||
version, date, values = None, None, []
|
||||
for line in open(fname, 'rb'):
|
||||
uline = line.decode('utf-8')
|
||||
if version is None:
|
||||
version = uline.split(None, 1)[1].rstrip()
|
||||
continue
|
||||
if date is None:
|
||||
date = uline.split(':', 1)[1].rstrip()
|
||||
continue
|
||||
if uline.startswith('#') or not uline.lstrip():
|
||||
continue
|
||||
addrs, details = uline.split(';', 1)
|
||||
if any(details.startswith(property)
|
||||
for property in properties):
|
||||
start, stop = addrs, addrs
|
||||
if '..' in addrs:
|
||||
start, stop = addrs.split('..')
|
||||
values.extend(range(int(start, 16), int(stop, 16) + 1))
|
||||
print('ok')
|
||||
return TableDef(version, date, values)
|
||||
|
||||
|
||||
def parse_category(fname, categories):
|
||||
"""Parse unicode category tables."""
|
||||
print(f'parsing {fname}: ', end='', flush=True)
|
||||
version, date, values = None, None, []
|
||||
for line in open(fname, 'rb'):
|
||||
uline = line.decode('utf-8')
|
||||
if version is None:
|
||||
version = uline.split(None, 1)[1].rstrip()
|
||||
continue
|
||||
if date is None:
|
||||
date = uline.split(':', 1)[1].rstrip()
|
||||
continue
|
||||
if uline.startswith('#') or not uline.lstrip():
|
||||
continue
|
||||
addrs, details = uline.split(';', 1)
|
||||
addrs, details = addrs.rstrip(), details.lstrip()
|
||||
if any(details.startswith(f'{value} #')
|
||||
for value in categories):
|
||||
start, stop = addrs, addrs
|
||||
if '..' in addrs:
|
||||
start, stop = addrs.split('..')
|
||||
values.extend(range(int(start, 16), int(stop, 16) + 1))
|
||||
print('ok')
|
||||
return TableDef(version, date, sorted(values))
|
||||
|
||||
|
||||
def do_write_table(fname, variable, table):
|
||||
"""Write combining tables to filesystem as python code."""
|
||||
# pylint: disable=R0914
|
||||
# Too many local variables (19/15) (col 4)
|
||||
utc_now = datetime.datetime.utcnow()
|
||||
indent = ' ' * 8
|
||||
with open(fname, 'w') as fout:
|
||||
print(f"writing {fname} ... ", end='')
|
||||
fout.write(
|
||||
f'"""{variable.title()} table, created by bin/update-tables.py."""\n'
|
||||
f"{variable} = {{\n")
|
||||
|
||||
for version_key, version_table in table.items():
|
||||
if not version_table.values:
|
||||
continue
|
||||
fout.write(
|
||||
f"{indent[:-4]}'{version_key}': (\n"
|
||||
f"{indent}# Source: {version_table.version}\n"
|
||||
f"{indent}# Date: {version_table.date}\n"
|
||||
f"{indent}#")
|
||||
|
||||
for start, end in make_table(version_table.values):
|
||||
ucs_start, ucs_end = unichr(start), unichr(end)
|
||||
hex_start, hex_end = (f'0x{start:05x}', f'0x{end:05x}')
|
||||
try:
|
||||
name_start = string.capwords(unicodedata.name(ucs_start))
|
||||
except ValueError:
|
||||
name_start = u'(nil)'
|
||||
try:
|
||||
name_end = string.capwords(unicodedata.name(ucs_end))
|
||||
except ValueError:
|
||||
name_end = u'(nil)'
|
||||
fout.write(f'\n{indent}')
|
||||
comment_startpart = name_start[:24].rstrip()
|
||||
comment_endpart = name_end[:24].rstrip()
|
||||
fout.write(f'({hex_start}, {hex_end},),')
|
||||
fout.write(f' # {comment_startpart:24s}..{comment_endpart}')
|
||||
fout.write(f'\n{indent[:-4]}),\n')
|
||||
fout.write('}\n')
|
||||
print("complete.")
|
||||
|
||||
|
||||
def do_unicode_versions(versions):
|
||||
"""Write unicode_versions.py function list_versions()."""
|
||||
fname = os.path.join(PATH_CODE, 'unicode_versions.py')
|
||||
print(f"writing {fname} ... ", end='')
|
||||
|
||||
utc_now = datetime.datetime.utcnow()
|
||||
version_tuples_str = '\n '.join(
|
||||
f'"{ver}",' for ver in versions)
|
||||
with open(fname, 'w') as fp:
|
||||
fp.write(f"""\"\"\"
|
||||
Exports function list_versions() for unicode version level support.
|
||||
|
||||
This code generated by {__file__} on {utc_now}.
|
||||
\"\"\"
|
||||
|
||||
|
||||
def list_versions():
|
||||
\"\"\"
|
||||
Return Unicode version levels supported by this module release.
|
||||
|
||||
Any of the version strings returned may be used as keyword argument
|
||||
``unicode_version`` to the ``wcwidth()`` family of functions.
|
||||
|
||||
:returns: Supported Unicode version numbers in ascending sorted order.
|
||||
:rtype: tuple[str]
|
||||
\"\"\"
|
||||
return (
|
||||
{version_tuples_str}
|
||||
)
|
||||
""")
|
||||
print('done.')
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
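
The range-merging step performed by make_table() in the script above can be looked at in isolation. The following is a minimal sketch rather than the script's own code: the name `compress_ranges` is hypothetical, and it assumes the input is already sorted and free of duplicates, which is what the parsers are expected to supply.

```python
def compress_ranges(values):
    """Collapse sorted integers into inclusive (start, end) tuples."""
    ranges = []
    for value in values:
        if ranges and ranges[-1][1] == value - 1:
            # value directly follows the most recent range: extend it by one
            ranges[-1] = (ranges[-1][0], value)
        else:
            # first value, or a gap: begin a new single-value range
            ranges.append((value, value))
    return tuple(ranges)


if __name__ == '__main__':
    # three adjacent code points, one isolated, then two adjacent
    for start, end in compress_ranges([0x300, 0x301, 0x302, 0x483, 0x488, 0x489]):
        print(f'(0x{start:05x}, 0x{end:05x},),')
```

Unlike make_table(), this sketch also accepts an empty sequence and returns an empty tuple for it.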
|
|
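
Both parse_east_asian() and parse_category() above read data lines of the form `START..END;VALUE # comment` (or a single code point before the semicolon). As a hedged illustration of that line format only, the hypothetical helper `parse_range_line` below handles one line at a time. It compares the whole property token, whereas the script uses `startswith()`, so treat it as a sketch under that assumption and not as a drop-in replacement.

```python
def parse_range_line(line, properties=('W', 'F')):
    """Return the range of code points on a data line, or None if not wanted."""
    if line.startswith('#') or not line.strip():
        # comment or blank line
        return None
    addrs, details = line.split(';', 1)
    # the property value is the first token before any trailing comment
    fields = details.split('#', 1)[0].split()
    if not fields or fields[0] not in properties:
        return None
    # "1100..115F" is an inclusive range, "3000" is a single code point
    start, _, stop = addrs.strip().partition('..')
    stop = stop or start
    return range(int(start, 16), int(stop, 16) + 1)


if __name__ == '__main__':
    sample = '1100..115F;W     # Lo    [96] HANGUL CHOSEONG KIYEOK..'
    found = parse_range_line(sample)
    print(hex(found[0]), hex(found[-1]), len(found))
```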
@ -0,0 +1,706 @@
|
|||
#!/usr/bin/env python
|
||||
"""
|
||||
A terminal browser, similar to less(1) for testing printable width of unicode.
|
||||
|
||||
This displays the full range of unicode points for 1 or 2-character wide
|
||||
ideograms, with pipes ('|') that should always align for any terminal that
|
||||
supports utf-8.
|
||||
|
||||
Usage:
  ./bin/wcwidth-browser.py [--wide=<n>]
                           [--alignment=<str>]
                           [--combining]
                           [--help]

Options:
  --wide=<int>        Browse 1 or 2 character-wide cells. [default: 2]
  --alignment=<str>   Choose left or right alignment. [default: left]
  --combining         Use combining character generator.
  --help              Display usage
|
||||
"""
|
||||
# pylint: disable=C0103,W0622
|
||||
# Invalid constant name "echo"
|
||||
# Invalid constant name "flushout" (col 4)
|
||||
# Invalid module name "wcwidth-browser"
|
||||
from __future__ import division, print_function
|
||||
|
||||
# std imports
|
||||
import sys
|
||||
import signal
|
||||
import string
|
||||
import functools
|
||||
import unicodedata
|
||||
|
||||
# 3rd party
|
||||
import docopt
|
||||
import blessed
|
||||
|
||||
# local
|
||||
from wcwidth import ZERO_WIDTH, wcwidth, list_versions, _wcmatch_version
|
||||
|
||||
#: print function alias, does not end with line terminator.
|
||||
echo = functools.partial(print, end='')
|
||||
flushout = functools.partial(print, end='', flush=True)
|
||||
|
||||
#: printable length of highest unicode character description
|
||||
LIMIT_UCS = 0x3fffd
|
||||
UCS_PRINTLEN = len('{value:0x}'.format(value=LIMIT_UCS))
|
||||
|
||||
|
||||
def readline(term, width):
|
||||
"""A rudimentary readline implementation."""
|
||||
text = ''
|
||||
while True:
|
||||
inp = term.inkey()
|
||||
if inp.code == term.KEY_ENTER:
|
||||
break
|
||||
if inp.code == term.KEY_ESCAPE or inp == chr(3):
|
||||
text = None
|
||||
break
|
||||
if not inp.is_sequence and len(text) < width:
|
||||
text += inp
|
||||
echo(inp)
|
||||
flushout()
|
||||
elif inp.code in (term.KEY_BACKSPACE, term.KEY_DELETE):
|
||||
if text:
|
||||
text = text[:-1]
|
||||
echo('\b \b')
|
||||
flushout()
|
||||
return text
|
||||
|
||||
|
||||
class WcWideCharacterGenerator(object):
|
||||
"""Generator yields unicode characters of the given ``width``."""
|
||||
|
||||
# pylint: disable=R0903
|
||||
# Too few public methods (0/2)
|
||||
def __init__(self, width=2, unicode_version='auto'):
|
||||
"""
|
||||
Class constructor.
|
||||
|
||||
:param width: generate characters of given width.
|
||||
:param str unicode_version: Unicode Version for render.
|
||||
:type width: int
|
||||
"""
|
||||
self.characters = (
|
||||
chr(idx) for idx in range(LIMIT_UCS)
|
||||
if wcwidth(chr(idx), unicode_version=unicode_version) == width)
|
||||
|
||||
def __iter__(self):
|
||||
"""Special method called by iter()."""
|
||||
return self
|
||||
|
||||
def __next__(self):
|
||||
"""Special method called by next()."""
|
||||
while True:
|
||||
ucs = next(self.characters)
|
||||
try:
|
||||
name = string.capwords(unicodedata.name(ucs))
|
||||
except ValueError:
|
||||
continue
|
||||
return (ucs, name)
|
||||
|
||||
|
||||
class WcCombinedCharacterGenerator(object):
|
||||
"""Generator yields unicode characters with combining."""
|
||||
|
||||
# pylint: disable=R0903
|
||||
# Too few public methods (0/2)
|
||||
|
||||
def __init__(self, width=1):
|
||||
"""
|
||||
Class constructor.
|
||||
|
||||
:param int width: generate characters of given width.
|
||||
"""
|
||||
self.characters = []
|
||||
letters_o = ('o' * width)
|
||||
last_version = list_versions()[-1]
|
||||
for (begin, end) in ZERO_WIDTH[last_version]:
|
||||
for val in [_val for _val in
|
||||
range(begin, end + 1)
|
||||
if _val <= LIMIT_UCS]:
|
||||
self.characters.append(
|
||||
letters_o[:1] +
|
||||
chr(val) +
|
||||
letters_o[wcwidth(chr(val)) + 1:])
|
||||
self.characters.reverse()
|
||||
|
||||
def __iter__(self):
|
||||
"""Special method called by iter()."""
|
||||
return self
|
||||
|
||||
def __next__(self):
|
||||
"""
|
||||
Special method called by next().
|
||||
|
||||
:return: unicode character and name, as tuple.
|
||||
:rtype: tuple[unicode, unicode]
|
||||
:raises StopIteration: no more characters
|
||||
"""
|
||||
while True:
|
||||
if not self.characters:
|
||||
raise StopIteration
|
||||
ucs = self.characters.pop()
|
||||
try:
|
||||
name = string.capwords(unicodedata.name(ucs[1]))
|
||||
except ValueError:
|
||||
continue
|
||||
return (ucs, name)
|
||||
|
||||
# python 2.6 - 3.3 compatibility
|
||||
next = __next__
|
||||
|
||||
|
||||
class Style(object):
|
||||
"""Styling decorator class instance for terminal output."""
|
||||
|
||||
# pylint: disable=R0903
|
||||
# Too few public methods (0/2)
|
||||
@staticmethod
|
||||
def attr_major(text):
|
||||
"""non-stylized callable for "major" text, for non-ttys."""
|
||||
return text
|
||||
|
||||
@staticmethod
|
||||
def attr_minor(text):
|
||||
"""non-stylized callable for "minor" text, for non-ttys."""
|
||||
return text
|
||||
|
||||
delimiter = '|'
|
||||
continuation = ' $'
|
||||
header_hint = '-'
|
||||
header_fill = '='
|
||||
name_len = 10
|
||||
alignment = 'right'
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
"""
|
||||
Class constructor.
|
||||
|
||||
Any given keyword arguments are assigned to the class attribute of the same name.
|
||||
"""
|
||||
for key, val in kwargs.items():
|
||||
setattr(self, key, val)
|
||||
|
||||
|
||||
class Screen(object):
|
||||
"""Represents terminal style, data dimensions, and drawables."""
|
||||
|
||||
intro_msg_fmt = ('Delimiters ({delim}) should align, '
|
||||
'unicode version is {version}.')
|
||||
|
||||
def __init__(self, term, style, wide=2):
|
||||
"""Class constructor."""
|
||||
self.term = term
|
||||
self.style = style
|
||||
self.wide = wide
|
||||
|
||||
@property
|
||||
def header(self):
|
||||
"""Text of joined segments producing full heading."""
|
||||
return self.head_item * self.num_columns
|
||||
|
||||
@property
|
||||
def hint_width(self):
|
||||
"""Width of a column segment."""
|
||||
return sum((len(self.style.delimiter),
|
||||
self.wide,
|
||||
len(self.style.delimiter),
|
||||
len(' '),
|
||||
UCS_PRINTLEN + 2,
|
||||
len(' '),
|
||||
self.style.name_len,))
|
||||
|
||||
@property
|
||||
def head_item(self):
|
||||
"""Text of a single column heading."""
|
||||
delimiter = self.style.attr_minor(self.style.delimiter)
|
||||
hint = self.style.header_hint * self.wide
|
||||
heading = ('{delimiter}{hint}{delimiter}'
|
||||
.format(delimiter=delimiter, hint=hint))
|
||||
|
||||
def alignment(*args):
|
||||
if self.style.alignment == 'right':
|
||||
return self.term.rjust(*args)
|
||||
return self.term.ljust(*args)
|
||||
|
||||
txt = alignment(heading, self.hint_width, self.style.header_fill)
|
||||
return self.style.attr_major(txt)
|
||||
|
||||
def msg_intro(self, version):
|
||||
"""Introductory message displayed above heading."""
|
||||
return self.term.center(self.intro_msg_fmt.format(
|
||||
delim=self.style.attr_minor(self.style.delimiter),
|
||||
version=self.style.attr_minor(version))).rstrip()
|
||||
|
||||
@property
|
||||
def row_ends(self):
|
||||
"""Bottom of page."""
|
||||
return self.term.height - 1
|
||||
|
||||
@property
|
||||
def num_columns(self):
|
||||
"""Number of columns displayed."""
|
||||
if self.term.is_a_tty:
|
||||
return self.term.width // self.hint_width
|
||||
return 1
|
||||
|
||||
@property
|
||||
def num_rows(self):
|
||||
"""Number of rows displayed."""
|
||||
return self.row_ends - self.row_begins - 1
|
||||
|
||||
@property
|
||||
def row_begins(self):
|
||||
"""Top row displayed for content."""
|
||||
# pylint: disable=R0201
|
||||
# Method could be a function (col 4)
|
||||
return 2
|
||||
|
||||
@property
|
||||
def page_size(self):
|
||||
"""Number of unicode text displayed per page."""
|
||||
return self.num_rows * self.num_columns
|
||||
|
||||
|
||||
class Pager(object):
|
||||
"""A less(1)-like browser for browsing unicode characters."""
|
||||
# pylint: disable=too-many-instance-attributes
|
||||
|
||||
#: screen state for next draw method(s).
|
||||
STATE_CLEAN, STATE_DIRTY, STATE_REFRESH = 0, 1, 2
|
||||
|
||||
def __init__(self, term, screen, character_factory):
|
||||
"""
|
||||
Class constructor.
|
||||
|
||||
:param term: blessed Terminal class instance.
|
||||
:type term: blessed.Terminal
|
||||
:param screen: Screen class instance.
|
||||
:type screen: Screen
|
||||
:param character_factory: Character factory generator.
|
||||
:type character_factory: callable returning iterable.
|
||||
"""
|
||||
self.term = term
|
||||
self.screen = screen
|
||||
self.character_factory = character_factory
|
||||
self.unicode_version = 'auto'
|
||||
self.dirty = self.STATE_REFRESH
|
||||
self.last_page = 0
|
||||
self._page_data = list()
|
||||
|
||||
def on_resize(self, *args):
|
||||
"""Signal handler callback for SIGWINCH."""
|
||||
# pylint: disable=W0613
|
||||
# Unused argument 'args'
|
||||
assert self.term.width >= self.screen.hint_width, (
|
||||
'Screen too small {}, must be at least {}'.format(
|
||||
self.term.width, self.screen.hint_width))
|
||||
self._set_lastpage()
|
||||
self.dirty = self.STATE_REFRESH
|
||||
|
||||
def _set_lastpage(self):
|
||||
"""Calculate value of class attribute ``last_page``."""
|
||||
self.last_page = (len(self._page_data) - 1) // self.screen.page_size
|
||||
|
||||
def display_initialize(self):
|
||||
"""Display 'please wait' message while page data is initialized."""
|
||||
echo(self.term.home + self.term.clear)
|
||||
echo(self.term.move_y(self.term.height // 2))
|
||||
echo(self.term.center('Initializing page data ...').rstrip())
|
||||
flushout()
|
||||
|
||||
def initialize_page_data(self):
|
||||
"""Initialize the page data for the given screen."""
|
||||
# pylint: disable=attribute-defined-outside-init
|
||||
if self.term.is_a_tty:
|
||||
self.display_initialize()
|
||||
self.character_generator = self.character_factory(
|
||||
self.screen.wide)
|
||||
self._page_data = list()
|
||||
while True:
|
||||
try:
|
||||
self._page_data.append(next(self.character_generator))
|
||||
except StopIteration:
|
||||
break
|
||||
self._set_lastpage()
|
||||
|
||||
def page_data(self, idx, offset):
|
||||
"""
|
||||
Return character data for page of given index and offset.
|
||||
|
||||
:param idx: page index.
|
||||
:type idx: int
|
||||
:param offset: scrolling region offset of current page.
|
||||
:type offset: int
|
||||
:returns: list of tuples in form of ``(ucs, name)``
|
||||
:rtype: list[(unicode, unicode)]
|
||||
"""
|
||||
size = self.screen.page_size
|
||||
|
||||
while offset < 0 and idx:
|
||||
offset += size
|
||||
idx -= 1
|
||||
offset = max(0, offset)
|
||||
|
||||
while offset >= size:
|
||||
offset -= size
|
||||
idx += 1
|
||||
|
||||
if idx == self.last_page:
|
||||
offset = 0
|
||||
idx = min(max(0, idx), self.last_page)
|
||||
|
||||
start = (idx * self.screen.page_size) + offset
|
||||
end = start + self.screen.page_size
|
||||
return (idx, offset), self._page_data[start:end]
|
||||
|
||||
def _run_notty(self, writer):
|
||||
"""Pager run method for terminals that are not a tty."""
|
||||
page_idx = page_offset = 0
|
||||
while True:
|
||||
npage_idx, _ = self.draw(writer, page_idx + 1, page_offset)
|
||||
if npage_idx == self.last_page:
|
||||
# page displayed was last page, quit.
|
||||
break
|
||||
page_idx = npage_idx
|
||||
self.dirty = self.STATE_DIRTY
|
||||
|
||||
def _run_tty(self, writer, reader):
|
||||
"""Pager run method for terminals that are a tty."""
|
||||
# allow window-change signal to reflow screen
|
||||
signal.signal(signal.SIGWINCH, self.on_resize)
|
||||
|
||||
page_idx = page_offset = 0
|
||||
while True:
|
||||
if self.dirty:
|
||||
page_idx, page_offset = self.draw(writer,
|
||||
page_idx,
|
||||
page_offset)
|
||||
self.dirty = self.STATE_CLEAN
|
||||
inp = reader(timeout=0.25)
|
||||
if inp is not None:
|
||||
nxt, noff = self.process_keystroke(inp,
|
||||
page_idx,
|
||||
page_offset)
|
||||
if self.dirty:
|
||||
continue
|
||||
if not self.dirty:
|
||||
self.dirty = nxt != page_idx or noff != page_offset
|
||||
page_idx, page_offset = nxt, noff
|
||||
if page_idx == -1:
|
||||
return
|
||||
|
||||
def run(self, writer, reader):
|
||||
"""
|
||||
Pager entry point.
|
||||
|
||||
In interactive mode (terminal is a tty), run until
|
||||
``process_keystroke()`` detects quit keystroke ('q'). In
|
||||
non-interactive mode, exit after displaying all unicode points.
|
||||
|
||||
:param writer: callable writes to output stream, receiving unicode.
|
||||
:type writer: callable
|
||||
:param reader: callable reads keystrokes from input stream, sending
|
||||
instance of blessed.keyboard.Keystroke.
|
||||
:type reader: callable
|
||||
"""
|
||||
self.initialize_page_data()
|
||||
if not self.term.is_a_tty:
|
||||
self._run_notty(writer)
|
||||
else:
|
||||
self._run_tty(writer, reader)
|
||||
|
||||
def process_keystroke(self, inp, idx, offset):
|
||||
"""
|
||||
Process keystroke ``inp``, adjusting screen parameters.
|
||||
|
||||
:param inp: return value of blessed.Terminal.inkey().
|
||||
:type inp: blessed.keyboard.Keystroke
|
||||
:param idx: page index.
|
||||
:type idx: int
|
||||
:param offset: scrolling region offset of current page.
|
||||
:type offset: int
|
||||
:returns: tuple of next (idx, offset).
|
||||
:rtype: (int, int)
|
||||
"""
|
||||
if inp.lower() == 'q':
|
||||
# exit
|
||||
return (-1, -1)
|
||||
self._process_keystroke_commands(inp)
|
||||
idx, offset = self._process_keystroke_movement(inp, idx, offset)
|
||||
return idx, offset
|
||||
|
||||
def _process_keystroke_commands(self, inp):
|
||||
"""Process keystrokes that issue commands (side effects)."""
|
||||
if inp in ('1', '2') and self.screen.wide != int(inp):
|
||||
# change between 1 or 2-character wide mode.
|
||||
self.screen.wide = int(inp)
|
||||
self.initialize_page_data()
|
||||
self.on_resize(None, None)
|
||||
elif inp == 'c':
|
||||
# switch on/off combining characters
|
||||
self.character_factory = (
|
||||
WcWideCharacterGenerator
|
||||
if self.character_factory != WcWideCharacterGenerator
|
||||
else WcCombinedCharacterGenerator)
|
||||
self.initialize_page_data()
|
||||
self.on_resize(None, None)
|
||||
elif inp in ('_', '-'):
|
||||
# adjust name length -2
|
||||
nlen = max(1, self.screen.style.name_len - 2)
|
||||
if nlen != self.screen.style.name_len:
|
||||
self.screen.style.name_len = nlen
|
||||
self.on_resize(None, None)
|
||||
elif inp in ('+', '='):
|
||||
# adjust name length +2
|
||||
nlen = min(self.term.width - 8, self.screen.style.name_len + 2)
|
||||
if nlen != self.screen.style.name_len:
|
||||
self.screen.style.name_len = nlen
|
||||
self.on_resize(None, None)
|
||||
elif inp == 'v':
|
||||
with self.term.location(x=0, y=self.term.height - 2):
|
||||
print(self.term.clear_eos())
|
||||
input_selection_msg = (
|
||||
"--> Enter unicode version [{versions}] ("
|
||||
"current: {self.unicode_version}):".format(
|
||||
versions=', '.join(list_versions()),
|
||||
self=self))
|
||||
echo('\n'.join(self.term.wrap(input_selection_msg,
|
||||
subsequent_indent=' ')))
|
||||
echo(' ')
|
||||
flushout()
|
||||
inp = readline(self.term, width=max(map(len, list_versions())))
|
||||
if inp and inp.strip() and inp != self.unicode_version:
|
||||
# set new unicode version -- page data must be
|
||||
# re-initialized. Any version is legal, underlying
|
||||
# library performs best-match (with warnings)
|
||||
self.unicode_version = _wcmatch_version(inp)
|
||||
self.initialize_page_data()
|
||||
self.on_resize(None, None)
|
||||
|
||||
def _process_keystroke_movement(self, inp, idx, offset):
|
||||
"""Process keystrokes that adjust index and offset."""
|
||||
term = self.term
|
||||
# a little vi-inspired.
|
||||
if inp in ('y', 'k') or inp.code in (term.KEY_UP,):
|
||||
# scroll backward 1 line
|
||||
offset -= self.screen.num_columns
|
||||
elif inp in ('e', 'j') or inp.code in (term.KEY_ENTER,
|
||||
term.KEY_DOWN,):
|
||||
# scroll forward 1 line
|
||||
offset = offset + self.screen.num_columns
|
||||
elif inp in ('f', ' ') or inp.code in (term.KEY_PGDOWN,):
|
||||
# scroll forward 1 page
|
||||
idx += 1
|
||||
elif inp == 'b' or inp.code in (term.KEY_PGUP,):
|
||||
# scroll backward 1 page
|
||||
idx = max(0, idx - 1)
|
||||
elif inp == 'F' or inp.code in (term.KEY_SDOWN,):
|
||||
# scroll forward 10 pages
|
||||
idx = max(0, idx + 10)
|
||||
elif inp == 'B' or inp.code in (term.KEY_SUP,):
|
||||
# scroll backward 10 pages
|
||||
idx = max(0, idx - 10)
|
||||
elif inp.code == term.KEY_HOME:
|
||||
# top
|
||||
idx, offset = (0, 0)
|
||||
elif inp == 'G' or inp.code == term.KEY_END:
|
||||
# bottom
|
||||
idx, offset = (self.last_page, 0)
|
||||
elif inp == '\x0c':
|
||||
self.dirty = self.STATE_REFRESH
|
||||
return idx, offset
|
||||
|
||||
def draw(self, writer, idx, offset):
|
||||
"""
|
||||
Draw the current page view to ``writer``.
|
||||
|
||||
:param callable writer: callable writes to output stream, receiving unicode.
|
||||
:param int idx: current page index.
|
||||
:param int offset: scrolling region offset of current page.
|
||||
:returns: tuple of next (idx, offset).
|
||||
:rtype: (int, int)
|
||||
"""
|
||||
# as our screen can be resized while we're mid-calculation,
|
||||
# our self.dirty flag can become re-toggled; because we are
|
||||
# not re-flowing our pagination, we must begin over again.
|
||||
while self.dirty:
|
||||
self.draw_heading(writer)
|
||||
self.dirty = self.STATE_CLEAN
|
||||
(idx, offset), data = self.page_data(idx, offset)
|
||||
for txt in self.page_view(data):
|
||||
writer(txt)
|
||||
self.draw_status(writer, idx)
|
||||
flushout()
|
||||
return idx, offset
|
||||
|
||||
def draw_heading(self, writer):
|
||||
"""
|
||||
Conditionally redraw screen when ``dirty`` attribute is valued REFRESH.
|
||||
|
||||
When Pager attribute ``dirty`` is ``STATE_REFRESH``, cursor is moved
|
||||
to (0,0), screen is cleared, and heading is displayed.
|
||||
|
||||
:param callable writer: callable writes to output stream, receiving unicode.
|
||||
:return: True if class attribute ``dirty`` is ``STATE_REFRESH``.
|
||||
:rtype: bool
|
||||
"""
|
||||
if self.dirty == self.STATE_REFRESH:
|
||||
writer(''.join(
|
||||
(self.term.home, self.term.clear,
|
||||
self.screen.msg_intro(version=self.unicode_version), '\n',
|
||||
self.screen.header, '\n',)))
|
||||
return True
|
||||
return False
|
||||
|
||||
def draw_status(self, writer, idx):
|
||||
"""
|
||||
Conditionally draw status bar when output terminal is a tty.
|
||||
|
||||
:param callable writer: callable writes to output stream, receiving unicode.
|
||||
:param int idx: current page position index.
|
||||
"""
|
||||
if self.term.is_a_tty:
|
||||
writer(self.term.hide_cursor())
|
||||
style = self.screen.style
|
||||
writer(self.term.move(self.term.height - 1))
|
||||
if idx == self.last_page:
|
||||
last_end = '(END)'
|
||||
else:
|
||||
last_end = '/{0}'.format(self.last_page)
|
||||
txt = ('Page {idx}{last_end} - '
|
||||
'{q} to quit, [keys: {keyset}]'
|
||||
.format(idx=style.attr_minor('{0}'.format(idx)),
|
||||
last_end=style.attr_major(last_end),
|
||||
keyset=style.attr_major('kjfbvc12-='),
|
||||
q=style.attr_minor('q')))
|
||||
writer(self.term.center(txt).rstrip())
|
||||
|
||||
def page_view(self, data):
|
||||
"""
|
||||
Generator yields text to be displayed for the current unicode pageview.
|
||||
|
||||
:param list[(unicode, unicode)] data: The current page's data as tuple
|
||||
of ``(ucs, name)``.
|
||||
:returns: generator for full-page text for display
|
||||
"""
|
||||
if self.term.is_a_tty:
|
||||
yield self.term.move(self.screen.row_begins, 0)
|
||||
# sequence clears to end-of-line
|
||||
clear_eol = self.term.clear_eol
|
||||
# sequence clears to end-of-screen
|
||||
clear_eos = self.term.clear_eos
|
||||
|
||||
# track our current column and row, where column is
|
||||
# the whole segment of unicode value text, and draw
|
||||
# only self.screen.num_columns before end-of-line.
|
||||
#
|
||||
# use clear_eol at end of each row to erase over any
|
||||
# "ghosted" text, and clear_eos at end of screen to
|
||||
# clear the same, especially for the final page which
|
||||
# is often short.
|
||||
col = 0
|
||||
for ucs, name in data:
|
||||
val = self.text_entry(ucs, name)
|
||||
col += 1
|
||||
if col == self.screen.num_columns:
|
||||
col = 0
|
||||
if self.term.is_a_tty:
|
||||
val = ''.join((val, clear_eol, '\n'))
|
||||
else:
|
||||
val = ''.join((val.rstrip(), '\n'))
|
||||
yield val
|
||||
|
||||
if self.term.is_a_tty:
|
||||
yield ''.join((clear_eol, '\n', clear_eos))
|
||||
|
||||
def text_entry(self, ucs, name):
|
||||
"""
|
||||
Display a single column segment row describing ``(ucs, name)``.
|
||||
|
||||
:param str ucs: target unicode point character string.
|
||||
:param str name: name of unicode point.
|
||||
:return: formatted text for display.
|
||||
:rtype: unicode
|
||||
"""
|
||||
style = self.screen.style
|
||||
if len(name) > style.name_len:
|
||||
idx = max(0, style.name_len - len(style.continuation))
|
||||
name = ''.join((name[:idx], style.continuation if idx else ''))
|
||||
if style.alignment == 'right':
|
||||
fmt = ' '.join(('0x{val:0>{ucs_printlen}x}',
|
||||
'{name:<{name_len}s}',
|
||||
'{delimiter}{ucs}{delimiter}'
|
||||
))
|
||||
else:
|
||||
fmt = ' '.join(('{delimiter}{ucs}{delimiter}',
|
||||
'0x{val:0>{ucs_printlen}x}',
|
||||
'{name:<{name_len}s}'))
|
||||
delimiter = style.attr_minor(style.delimiter)
|
||||
if len(ucs) != 1:
|
||||
# determine display of combining characters
|
||||
val = ord(ucs[1])
|
||||
# a combining character displayed of any fg color
|
||||
# will reset the foreground character of the cell
|
||||
# combined with (iTerm2, OSX).
|
||||
disp_ucs = style.attr_major(ucs[0:2])
|
||||
if len(ucs) > 2:
|
||||
disp_ucs += ucs[2]
|
||||
else:
|
||||
# non-combining
|
||||
val = ord(ucs)
|
||||
disp_ucs = style.attr_major(ucs)
|
||||
|
||||
return fmt.format(name_len=style.name_len,
|
||||
ucs_printlen=UCS_PRINTLEN,
|
||||
delimiter=delimiter,
|
||||
name=name,
|
||||
ucs=disp_ucs,
|
||||
val=val)
|
||||
|
||||
|
||||
def validate_args(opts):
    """Validate and return options provided by docopt parsing."""
    if opts['--wide'] is None:
        opts['--wide'] = 2
    else:
        assert opts['--wide'] in ("1", "2"), opts['--wide']
    if opts['--alignment'] is None:
        opts['--alignment'] = 'left'
    else:
        assert opts['--alignment'] in ('left', 'right'), opts['--alignment']
    opts['--wide'] = int(opts['--wide'])
    opts['character_factory'] = WcWideCharacterGenerator
    if opts['--combining']:
        opts['character_factory'] = WcCombinedCharacterGenerator
    return opts
|
||||
|
||||
|
||||
def main(opts):
    """Program entry point."""
    term = blessed.Terminal()
    # plain style for terminals without colors; still honors --alignment
    style = Style(alignment=opts['--alignment'])

    # if the terminal supports colors, use a Style instance with some
    # standout colors (magenta, cyan).
    if term.number_of_colors:
        style = Style(attr_major=term.magenta,
                      attr_minor=term.bright_cyan,
                      alignment=opts['--alignment'])
    style.name_len = 10

    screen = Screen(term, style, wide=opts['--wide'])
    pager = Pager(term, screen, opts['character_factory'])

    with term.location(), term.cbreak(), \
            term.fullscreen(), term.hidden_cursor():
        pager.run(writer=echo, reader=term.inkey)
    return 0
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
sys.exit(main(validate_args(docopt.docopt(__doc__))))
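
The name-shortening rule inside Pager.text_entry() above is easier to check when pulled out of the formatting code. A minimal sketch follows, assuming the same defaults as the Style class (`name_len` of 10, continuation marker `' $'`); the function name `truncate_name` is hypothetical and not part of the browser.

```python
def truncate_name(name, name_len=10, continuation=' $'):
    """Shorten name to at most name_len characters, marking the cut."""
    if len(name) <= name_len:
        return name
    # leave room for the continuation marker; when there is no room
    # at all, the name is dropped entirely and no marker is shown
    idx = max(0, name_len - len(continuation))
    return name[:idx] + (continuation if idx else '')


if __name__ == '__main__':
    for sample in ('Space', 'Latin Small Letter A'):
        print(repr(truncate_name(sample)))
```

The result never exceeds `name_len`, so the `{name:<{name_len}s}` field used by text_entry() stays at a fixed width and the delimiters keep their alignment.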
|
|
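
Pager.page_data() in the browser above normalizes a (page index, offset) pair before slicing the character list. The sketch below restates that arithmetic as a standalone function so it can be tried without a terminal. The name `normalize_position` and its explicit `page_size` and `last_page` parameters are assumptions made for illustration; in the browser those values come from the Screen and Pager instances.

```python
def normalize_position(idx, offset, page_size, last_page):
    """Return (idx, offset) with offset folded into whole pages."""
    # a negative offset borrows from earlier pages, while any remain
    while offset < 0 and idx:
        offset += page_size
        idx -= 1
    offset = max(0, offset)
    # an offset of a full page or more carries into later pages
    while offset >= page_size:
        offset -= page_size
        idx += 1
    # the last page is always shown from its first row
    if idx == last_page:
        offset = 0
    idx = min(max(0, idx), last_page)
    return idx, offset


if __name__ == '__main__':
    data = list(range(110))            # stand-in for the (ucs, name) list
    size, last = 20, (len(data) - 1) // 20
    idx, offset = normalize_position(1, 25, size, last)
    start = idx * size + offset
    print(idx, offset, data[start:start + size][:3])
```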
@ -0,0 +1,138 @@
|
|||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
"""
|
||||
Manual tests comparing wcwidth.py to libc's wcwidth(3) and wcswidth(3).
|
||||
|
||||
https://github.com/jquast/wcwidth
|
||||
|
||||
This suite of tests compares the libc return values with the pure-python return
|
||||
values. Although wcwidth(3) is POSIX, its actual implementation may differ,
|
||||
so these tests are not guaranteed to be successful on all platforms, especially
|
||||
where wcwidth(3)/wcswidth(3) is out of date. This is especially true for many
|
||||
platforms -- usually conforming only to unicode specification 1.0 or 2.0.
|
||||
|
||||
This program accepts one optional command-line argument, the unicode version
|
||||
level for our library to use when comparing to libc.
|
||||
"""
|
||||
# pylint: disable=C0103
|
||||
# Invalid module name "wcwidth-libc-comparator"
|
||||
|
||||
# standard imports
|
||||
from __future__ import print_function
|
||||
|
||||
# std imports
|
||||
import sys
|
||||
import locale
|
||||
import warnings
|
||||
import ctypes.util
|
||||
import unicodedata
|
||||
|
||||
# local
|
||||
# local imports
|
||||
import wcwidth
|
||||
|
||||
|
||||
def is_named(ucs):
|
||||
"""
|
||||
Whether the unicode point ``ucs`` has a name.
|
||||
|
||||
:rtype: bool
|
||||
"""
|
||||
try:
|
||||
return bool(unicodedata.name(ucs))
|
||||
except ValueError:
|
||||
return False
|
||||
|
||||
|
||||
def is_not_combining(ucs):
|
||||
return not unicodedata.combining(ucs)
|
||||
|
||||
|
||||
def report_ucs_msg(ucs, wcwidth_libc, wcwidth_local):
|
||||
"""
|
||||
Return string report of character width differences.
|
||||
|
||||
:param ucs: unicode point.
|
||||
:type ucs: unicode
|
||||
:param wcwidth_libc: libc-wcwidth's reported character length.
|
||||
:type wcwidth_libc: int
|
||||
:param wcwidth_local: wcwidth's reported character length.
|
||||
:type wcwidth_local: int
|
||||
:rtype: unicode
|
||||
"""
|
||||
ucp = (ucs.encode('unicode_escape')[2:]
|
||||
.decode('ascii')
|
||||
.upper()
|
||||
.lstrip('0'))
|
||||
url = "http://codepoints.net/U+{}".format(ucp)
|
||||
name = unicodedata.name(ucs)
|
||||
return (u"libc,ours={},{} [--o{}o--] name={} val={} {}"
|
||||
" ".format(wcwidth_libc, wcwidth_local, ucs, name, ord(ucs), url))
|
||||
|
||||
|
||||
# use chr() for py3.x,
|
||||
# unichr() for py2.x
|
||||
try:
|
||||
_ = unichr(0)
|
||||
except NameError as err:
|
||||
if err.args[0] == "name 'unichr' is not defined":
|
||||
# pylint: disable=W0622
|
||||
# Redefining built-in 'unichr' (col 8)
|
||||
|
||||
unichr = chr
|
||||
else:
|
||||
raise
|
||||
|
||||
if sys.maxunicode < 1114111:
|
||||
warnings.warn('narrow Python build, only a small subset of '
|
||||
'characters may be tested.')
|
||||
|
||||
|
||||
def _is_equal_wcwidth(libc, ucs, unicode_version):
|
||||
w_libc = libc.wcwidth(ucs)
|
||||
w_local = wcwidth.wcwidth(ucs, unicode_version)
|
||||
assert w_libc == w_local, report_ucs_msg(ucs, w_libc, w_local)
|
||||
|
||||
|
||||
def main(using_locale=('en_US', 'UTF-8',)):
|
||||
"""
|
||||
Program entry point.
|
||||
|
||||
Load the entire Unicode table into memory, excluding those that:
|
||||
|
||||
- are not named (func unicodedata.name raises ValueError),
|
||||
- are combining characters.
|
||||
|
||||
Using ``locale``, for each unicode character string compare libc's
|
||||
wcwidth with local wcwidth.wcwidth() function; when they differ,
|
||||
report a detailed AssertionError to stdout.
|
||||
"""
|
||||
all_ucs = (ucs for ucs in
|
||||
[unichr(val) for val in range(sys.maxunicode)]
|
||||
if is_named(ucs) and is_not_combining(ucs))
|
||||
|
||||
libc_name = ctypes.util.find_library('c')
|
||||
if not libc_name:
|
||||
raise ImportError("Can't find C library.")
|
||||
|
||||
libc = ctypes.cdll.LoadLibrary(libc_name)
|
||||
libc.wcwidth.argtypes = [ctypes.c_wchar, ]
|
||||
libc.wcwidth.restype = ctypes.c_int
|
||||
|
||||
assert getattr(libc, 'wcwidth', None) is not None
|
||||
assert getattr(libc, 'wcswidth', None) is not None
|
||||
|
||||
locale.setlocale(locale.LC_ALL, using_locale)
|
||||
unicode_version = 'latest'
|
||||
if len(sys.argv) > 1:
|
||||
unicode_version = sys.argv[1]
|
||||
|
||||
for ucs in all_ucs:
|
||||
try:
|
||||
_is_equal_wcwidth(libc, ucs, unicode_version)
|
||||
except AssertionError as err:
|
||||
print(err)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
|
@ -0,0 +1,38 @@
|
|||
==========
|
||||
Public API
|
||||
==========
|
||||
|
||||
This package follows SEMVER_ rules for versioning. Therefore, for all of the
|
||||
given function signatures, at example version 1.1.1, you may use version
|
||||
dependency ``>=1.1.1,<2.0`` for forward compatibility of future wcwidth
|
||||
versions.
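As a rough illustration of what the ``>=1.1.1,<2.0`` range admits, the sketch below compares dotted version strings as tuples of integers. The helper names ``parse_version`` and ``satisfies`` are hypothetical and are not part of wcwidth; in practice this check is left to pip, and the sketch ignores pre-release tags.

```python
def parse_version(text):
    """Turn a dotted version such as '1.4.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split('.'))


def satisfies(version, lower='1.1.1', upper='2.0'):
    """Return True when lower <= version < upper (hypothetical helper)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == '__main__':
    for candidate in ('1.1.0', '1.1.1', '1.9.3', '2.0.0'):
        print(candidate, satisfies(candidate))
```

Any 1.x release at or above 1.1.1 falls inside the range, while the next major version falls outside it, which is the point at which SEMVER permits incompatible signature changes.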
|
||||
|
||||
.. autofunction:: wcwidth.wcwidth
|
||||
|
||||
.. autofunction:: wcwidth.wcswidth
|
||||
|
||||
.. autofunction:: wcwidth.list_versions
|
||||
|
||||
.. _SEMVER: https://semver.org
|
||||
|
||||
===========
|
||||
Private API
|
||||
===========
|
||||
|
||||
These functions should only be used for wcwidth development, and not used by
|
||||
dependent packages except with care and by use of frozen version dependency,
|
||||
as these functions may change names, signatures, or disappear entirely at any
|
||||
time in the future, and such changes are not reflected by SEMVER rules.
|
||||
|
||||
If a stable public API for any of the given functions is needed, please suggest a
|
||||
Pull Request!
|
||||
|
||||
.. autofunction:: wcwidth._bisearch
|
||||
|
||||
.. autofunction:: wcwidth._wcversion_value
|
||||
|
||||
.. autofunction:: wcwidth._wcmatch_version
|
||||
|
||||
.. autofunction:: wcwidth._get_package_version
|
|
@ -0,0 +1,178 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# wcwidth documentation build configuration file, created by
|
||||
# sphinx-quickstart on Fri Oct 20 15:18:02 2017.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its
|
||||
# containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#
|
||||
# import os
|
||||
# import sys
|
||||
# sys.path.insert(0, os.path.abspath('.'))
|
||||
|
||||
# local
|
||||
# 3rd-party imports
|
||||
import wcwidth
|
||||
|
||||
# -- General configuration ------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
#
|
||||
# needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = ['sphinx.ext.autodoc',
|
||||
'sphinx.ext.doctest',
|
||||
'sphinx.ext.intersphinx',
|
||||
'sphinx.ext.coverage',
|
||||
'sphinx.ext.viewcode']
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix(es) of source filenames.
|
||||
# You can specify multiple suffix as a list of string:
|
||||
#
|
||||
# source_suffix = ['.rst', '.md']
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = 'wcwidth'
|
||||
copyright = '2017, Jeff Quast'
|
||||
author = 'Jeff Quast'
|
||||
|
||||
# The version info for the project you're documenting, acts as replacement for
|
||||
# |version| and |release|, also used in various other places throughout the
|
||||
# built documents.
|
||||
#
|
||||
# The short X.Y version,
|
||||
# The full version, including alpha/beta/rc tags.
|
||||
release = version = wcwidth.__version__
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
#
|
||||
# This is also used if you do content translation via gettext catalogs.
|
||||
# Usually you set "language" from the command line for these cases.
|
||||
language = None
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
# These patterns also affect html_static_path and html_extra_path
|
||||
exclude_patterns = []
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# If true, `todo` and `todoList` produce output, else they produce nothing.
|
||||
todo_include_todos = False
|
||||
|
||||
|
||||
# -- Options for HTML output ----------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
#
|
||||
html_theme = 'alabaster'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
#
|
||||
# html_theme_options = {}
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
html_static_path = ['_static']
|
||||
|
||||
# Custom sidebar templates, must be a dictionary that maps document names
|
||||
# to template names.
|
||||
#
|
||||
# This is required for the alabaster theme
|
||||
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
|
||||
# html_sidebars = {
|
||||
# '**': [
|
||||
# 'about.html',
|
||||
# 'navigation.html',
|
||||
# 'relations.html', # needs 'show_related': True theme option to display
|
||||
# 'searchbox.html',
|
||||
# 'donate.html',
|
||||
# ]
|
||||
# }
|
||||
|
||||
|
||||
# -- Options for HTMLHelp output ------------------------------------------
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'wcwidthdoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output ---------------------------------------------
|
||||
|
||||
latex_elements = {
|
||||
# The paper size ('letterpaper' or 'a4paper').
|
||||
#
|
||||
# 'papersize': 'letterpaper',
|
||||
|
||||
# The font size ('10pt', '11pt' or '12pt').
|
||||
#
|
||||
# 'pointsize': '10pt',
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
#
|
||||
# 'preamble': '',
|
||||
|
||||
# Latex figure (float) alignment
|
||||
#
|
||||
# 'figure_align': 'htbp',
|
||||
}
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title,
|
||||
# author, documentclass [howto, manual, or own class]).
|
||||
latex_documents = [
|
||||
(master_doc, 'wcwidth.tex', 'wcwidth Documentation',
|
||||
'Jeff Quast', 'manual'),
|
||||
]
|
||||
|
||||
|
||||
# -- Options for manual page output ---------------------------------------
|
||||
|
||||
# One entry per manual page. List of tuples
|
||||
# (source start file, name, description, authors, manual section).
|
||||
man_pages = [
|
||||
(master_doc, 'wcwidth', 'wcwidth Documentation',
|
||||
[author], 1)
|
||||
]
|
||||
|
||||
|
||||
# -- Options for Texinfo output -------------------------------------------
|
||||
|
||||
# Grouping the document tree into Texinfo files. List of tuples
|
||||
# (source start file, target name, title, author,
|
||||
# dir menu entry, description, category)
|
||||
texinfo_documents = [
|
||||
(master_doc, 'wcwidth', 'wcwidth Documentation',
|
||||
author, 'wcwidth', 'One line description of project.',
|
||||
'Miscellaneous'),
|
||||
]
|
||||
|
||||
|
||||
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
|
|
@ -0,0 +1,15 @@
|
|||
wcwidth
|
||||
=======
|
||||
|
||||
.. toctree::
|
||||
|
||||
intro
|
||||
unicode_version
|
||||
api
|
||||
|
||||
Indices and tables
|
||||
------------------
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
|
@ -0,0 +1,280 @@
|
|||
|pypi_downloads| |codecov| |license|
|
||||
|
||||
============
|
||||
Introduction
|
||||
============
|
||||
|
||||
This library is mainly for CLI programs that carefully produce output for
|
||||
Terminals, or pretend to be an emulator.
|
||||
|
||||
**Problem Statement**: The printable length of *most* strings is equal to the
|
||||
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
|
||||
there are categories of characters that *occupy 2 cells* (full-wide), and
|
||||
others that *occupy 0* cells (zero-width).
|
||||
|
||||
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
|
||||
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
|
||||
functions precisely copy. *These functions return the number of cells a
|
||||
unicode string is expected to occupy.*
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
The stable version of this package is maintained on PyPI; install using pip::
|
||||
|
||||
pip install wcwidth
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
**Problem**: given the following phrase (Japanese),
|
||||
|
||||
>>> text = u'コンニチハ'
|
||||
|
||||
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
|
||||
*printable length* of 10 cells, so that when using the `rjust` function, the
|
||||
output length is wrong::
|
||||
|
||||
>>> print(len('コンニチハ'))
|
||||
5
|
||||
|
||||
>>> print('コンニチハ'.rjust(20, '_'))
|
||||
_______________コンニチハ
|
||||
|
||||
By defining our own "rjust" function that uses wcwidth, we can correct this::
|
||||
|
||||
>>> def wc_rjust(text, length, padding=' '):
|
||||
... from wcwidth import wcswidth
|
||||
... return padding * max(0, (length - wcswidth(text))) + text
|
||||
...
|
||||
|
||||
Our **Solution** uses wcswidth to determine the string length correctly::
|
||||
|
||||
>>> from wcwidth import wcswidth
|
||||
>>> print(wcswidth('コンニチハ'))
|
||||
10
|
||||
|
||||
>>> print(wc_rjust('コンニチハ', 20, '_'))
|
||||
__________コンニチハ
|
||||
|
||||
|
||||
Choosing a Version
|
||||
------------------
|
||||
|
||||
Export an environment variable, ``UNICODE_VERSION``. This should be done by
|
||||
*terminal emulators* or those developers experimenting with authoring one of
|
||||
their own, from shell::
|
||||
|
||||
$ export UNICODE_VERSION=13.0
|
||||
|
||||
If unspecified, the latest version is used. If your Terminal Emulator does not
|
||||
export this variable, you can use the `jquast/ucs-detect`_ utility to
|
||||
automatically detect and export it to your shell.
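A minimal sketch of how a program might read ``UNICODE_VERSION`` and settle on a table version is shown below. The function names and the list of supported versions are assumptions made for illustration. This is not the library's own matching code: it simply picks the newest supported version that is not newer than the requested one, and falls back to the oldest supported version otherwise.

```python
import os


def version_key(text):
    """Turn '13.0' or '6.3.0' into a comparable tuple of integers."""
    return tuple(int(part) for part in text.split('.'))


def pick_unicode_version(supported, environ=None):
    """Choose a version from ``supported`` based on UNICODE_VERSION.

    Hypothetical helper: a malformed value such as 'abc' raises ValueError.
    """
    environ = os.environ if environ is None else environ
    requested = environ.get('UNICODE_VERSION', 'latest')
    ordered = sorted(supported, key=version_key)
    if requested == 'latest':
        # If unspecified, the latest version is used.
        return ordered[-1]
    wanted = version_key(requested)
    # Compare only as many components as the request provides, so that
    # '13.0' matches the supported release '13.0.0'.
    candidates = [ver for ver in ordered
                  if version_key(ver)[:len(wanted)] <= wanted]
    return candidates[-1] if candidates else ordered[0]


if __name__ == '__main__':
    tables = ['4.1.0', '9.0.0', '12.1.0', '13.0.0']
    print(pick_unicode_version(tables))
```

Passing ``environ`` explicitly keeps the function easy to test without modifying the real process environment.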
|
||||
|
||||
wcwidth, wcswidth
|
||||
-----------------
|
||||
Use function ``wcwidth()`` to determine the length of a *single unicode
|
||||
character*, and ``wcswidth()`` to determine the length of many, a *string
|
||||
of unicode characters*.
|
||||
|
||||
Briefly, return values of function ``wcwidth()`` are:
|
||||
|
||||
``-1``
|
||||
Indeterminate (not printable).
|
||||
|
||||
``0``
|
||||
Does not advance the cursor, such as NULL or Combining.
|
||||
|
||||
``2``
|
||||
Characters of category East Asian Wide (W) or East Asian
|
||||
Full-width (F) which are displayed using two terminal cells.
|
||||
|
||||
``1``
|
||||
All others.
|
||||
|
||||
Function ``wcswidth()`` simply returns the sum of all values for each character
|
||||
along a string, or ``-1`` when it occurs anywhere along a string.
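The return-value rules above can be approximated with nothing but the standard library, as in the sketch below. It is an illustration only: ``approx_wcwidth`` and ``approx_wcswidth`` are hypothetical names, the category checks are a simplification, and the results depend on the Unicode version bundled with the Python interpreter rather than on this package's generated tables.

```python
import unicodedata


def approx_wcwidth(char):
    """Approximate the cell width of one character (simplified rules)."""
    codepoint = ord(char)
    if codepoint == 0:
        return 0        # NULL does not advance the cursor
    if codepoint < 32 or 0x7F <= codepoint < 0xA0:
        return -1       # C0/C1 control characters: indeterminate
    if unicodedata.category(char) in ('Mn', 'Me'):
        return 0        # non-spacing and enclosing (combining) marks
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2        # East Asian Wide or Full-width
    return 1            # all others


def approx_wcswidth(text):
    """Sum the widths of a string, or return -1 if any character is -1."""
    total = 0
    for char in text:
        width = approx_wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total


if __name__ == '__main__':
    for sample in (u'コンニチハ', u'cafe\u0301', u'\x1b[0m'):
        print(repr(sample), approx_wcswidth(sample))
```

The early return in ``approx_wcswidth`` mirrors the documented behaviour that a single ``-1`` anywhere in the string makes the whole result ``-1``.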
|
||||
|
||||
Full API Documentation at http://wcwidth.readthedocs.org
|
||||
|
||||
==========
|
||||
Developing
|
||||
==========
|
||||
|
||||
Install wcwidth in editable mode::
|
||||
|
||||
pip install -e .
|
||||
|
||||
Execute unit tests using tox_::
|
||||
|
||||
tox
|
||||
|
||||
Regenerate python code tables from latest Unicode Specification data files::
|
||||
|
||||
tox -eupdate
|
||||
|
||||
Supplementary tools for browsing and testing terminals for wide unicode
|
||||
characters are found in the `bin/`_ folder of this project's source code. Be sure
|
||||
to first ``pip install -r requirements-develop.txt`` from this project's main
|
||||
folder. For example, an interactive browser for testing::
|
||||
|
||||
./bin/wcwidth-browser.py
|
||||
|
||||
Uses
|
||||
----
|
||||
|
||||
This library is used in:
|
||||
|
||||
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
|
||||
Python.
|
||||
|
||||
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
|
||||
interactive command lines in Python.
|
||||
|
||||
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
|
||||
|
||||
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
|
||||
based on compositing 2d arrays of text.
|
||||
|
||||
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
|
||||
|
||||
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
|
||||
and a command-line utility.
|
||||
|
||||
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
|
||||
text.
|
||||
|
||||
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
|
||||
animations.
|
||||
|
||||
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
|
||||
UIs.
|
||||
|
||||
Other Languages
|
||||
---------------
|
||||
|
||||
- `timoxley/wcwidth`_: JavaScript
|
||||
- `janlelis/unicode-display_width`_: Ruby
|
||||
- `alecrabbit/php-wcwidth`_: PHP
|
||||
- `Text::CharWidth`_: Perl
|
||||
- `bluebear94/Terminal-WCWidth`_: Perl 6
|
||||
- `mattn/go-runewidth`_: Go
|
||||
- `emugel/wcwidth`_: Haxe
|
||||
- `aperezdc/lua-wcwidth`_: Lua
|
||||
- `joachimschmidt557/zig-wcwidth`: Zig
|
||||
- `fumiyas/wcwidth-cjk`_: ``LD_PRELOAD`` override
|
||||
- `joshuarubin/wcwidth9`: Unicode version 9 in C
|
||||
|
||||
History
|
||||
-------
|
||||
|
||||
0.2.0 *2020-06-01*
|
||||
* **Enhancement**: Unicode version may be selected by exporting the
|
||||
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
|
||||
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
|
||||
* **Enhancement**:
|
||||
API Documentation is published to readthedocs.org.
|
||||
* **Updated** tables for *all* Unicode Specifications with files
|
||||
published in a programmatically consumable format, versions 4.1.0
|
||||
through 13.0.
|
||||
|
||||
0.1.9 *2020-03-22*
|
||||
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
|
||||
* **Updated** tables to Unicode Specification 13.0.0.
|
||||
|
||||
0.1.8 *2020-01-01*
|
||||
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
|
||||
|
||||
0.1.7 *2016-07-01*
|
||||
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
|
||||
|
||||
0.1.6 *2016-01-08 Production/Stable*
|
||||
* ``LICENSE`` file now included with distribution.
|
||||
|
||||
0.1.5 *2015-09-13 Alpha*
|
||||
* **Bugfix**:
|
||||
Resolution of "combining_ character width" issue, most especially
|
||||
those that previously returned -1 now often (correctly) return 0.
|
||||
Resolved by `Philip Craig`_ via `PR #11`_.
|
||||
* **Deprecated**:
|
||||
The module path ``wcwidth.table_comb`` is no longer available,
|
||||
it has been superseded by module path ``wcwidth.table_zero``.
|
||||
|
||||
0.1.4 *2014-11-20 Pre-Alpha*
|
||||
* **Feature**: ``wcswidth()`` now determines printable length
|
||||
for (most) combining_ characters. The developer's tool
|
||||
`bin/wcwidth-browser.py`_ is improved to display combining_
|
||||
characters when provided the ``--combining`` option
|
||||
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
|
||||
* **Feature**: added static analysis (prospector_) to testing
|
||||
framework.
|
||||
|
||||
0.1.3 *2014-10-29 Pre-Alpha*
|
||||
* **Bugfix**: 2nd parameter of wcswidth was not honored.
|
||||
(`Thomas Ballinger`_, `PR #4`_).
|
||||
|
||||
0.1.2 *2014-10-28 Pre-Alpha*
|
||||
* **Updated** tables to Unicode Specification 7.0.0.
|
||||
(`Thomas Ballinger`_, `PR #3`_).
|
||||
|
||||
0.1.1 *2014-05-14 Pre-Alpha*
|
||||
* Initial release to pypi, Based on Unicode Specification 6.3.0
|
||||
|
||||
This code was originally derived directly from C code of the same name,
|
||||
whose latest version is available at
|
||||
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
|
||||
|
||||
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
|
||||
*
|
||||
* Permission to use, copy, modify, and distribute this software
|
||||
* for any purpose and without fee is hereby granted. The author
|
||||
* disclaims all warranties with regard to this software.
|
||||
|
||||
.. _`tox`: https://testrun.org/tox/latest/install.html
|
||||
.. _`prospector`: https://github.com/landscapeio/prospector
|
||||
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
|
||||
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
|
||||
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
|
||||
.. _`Thomas Ballinger`: https://github.com/thomasballinger
|
||||
.. _`Leta Montopoli`: https://github.com/lmontopo
|
||||
.. _`Philip Craig`: https://github.com/philipc
|
||||
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
|
||||
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
|
||||
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
|
||||
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
|
||||
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
|
||||
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
|
||||
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
|
||||
.. _`jquast/blessed`: https://github.com/jquast/blessed
|
||||
.. _`selectel/pyte`: https://github.com/selectel/pyte
|
||||
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
|
||||
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
|
||||
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
|
||||
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
|
||||
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
|
||||
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
|
||||
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
|
||||
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
|
||||
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
|
||||
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
|
||||
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
|
||||
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
|
||||
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
|
||||
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
|
||||
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
|
||||
.. _`Avram Lubkin`: https://github.com/avylove
|
||||
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
|
||||
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
|
||||
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
|
||||
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
|
||||
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
|
||||
:alt: Downloads
|
||||
:target: https://pypi.org/project/wcwidth/
|
||||
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
|
||||
:alt: codecov.io Code Coverage
|
||||
:target: https://codecov.io/gh/jquast/wcwidth/
|
||||
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
|
||||
:target: https://pypi.python.org/pypi/wcwidth/
|
||||
:alt: MIT License
|
|
@ -0,0 +1,4 @@
|
|||
Sphinx
|
||||
sphinx-paramlinks
|
||||
sphinx_rtd_theme
|
||||
sphinxcontrib-manpage
|
|
@ -0,0 +1,104 @@
|
|||
=====================
|
||||
Unicode release files
|
||||
=====================
|
||||
|
||||
This library aims to be forward-looking, portable, and most correct.
|
||||
The most current release of this API is based on the Unicode Standard
|
||||
release files:
|
||||
|
||||
|
||||
``DerivedGeneralCategory-4.1.0.txt``
|
||||
*Date: 2005-02-26, 02:35:50 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-5.0.0.txt``
|
||||
*Date: 2006-02-27, 23:41:27 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-5.1.0.txt``
|
||||
*Date: 2008-03-20, 17:54:57 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-5.2.0.txt``
|
||||
*Date: 2009-08-22, 04:58:21 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-6.0.0.txt``
|
||||
*Date: 2010-08-19, 00:48:09 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-6.1.0.txt``
|
||||
*Date: 2011-11-27, 05:10:22 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-6.2.0.txt``
|
||||
*Date: 2012-05-20, 00:42:34 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-6.3.0.txt``
|
||||
*Date: 2013-07-05, 14:08:45 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-7.0.0.txt``
|
||||
*Date: 2014-02-07, 18:42:12 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-8.0.0.txt``
|
||||
*Date: 2015-02-13, 13:47:11 GMT [MD]*
|
||||
|
||||
``DerivedGeneralCategory-9.0.0.txt``
|
||||
*Date: 2016-06-01, 10:34:26 GMT*
|
||||
|
||||
``DerivedGeneralCategory-10.0.0.txt``
|
||||
*Date: 2017-03-08, 08:41:49 GMT*
|
||||
|
||||
``DerivedGeneralCategory-11.0.0.txt``
|
||||
*Date: 2018-02-21, 05:34:04 GMT*
|
||||
|
||||
``DerivedGeneralCategory-12.0.0.txt``
|
||||
*Date: 2019-01-22, 08:18:28 GMT*
|
||||
|
||||
``DerivedGeneralCategory-12.1.0.txt``
|
||||
*Date: 2019-03-10, 10:53:08 GMT*
|
||||
|
||||
``DerivedGeneralCategory-13.0.0.txt``
|
||||
*Date: 2019-10-21, 14:30:32 GMT*
|
||||
|
||||
``EastAsianWidth-4.1.0.txt``
|
||||
*Date: 2005-03-17, 15:21:00 PST [KW]*
|
||||
|
||||
``EastAsianWidth-5.0.0.txt``
|
||||
*Date: 2006-02-15, 14:39:00 PST [KW]*
|
||||
|
||||
``EastAsianWidth-5.1.0.txt``
|
||||
*Date: 2008-03-20, 17:42:00 PDT [KW]*
|
||||
|
||||
``EastAsianWidth-5.2.0.txt``
|
||||
*Date: 2009-06-09, 17:47:00 PDT [KW]*
|
||||
|
||||
``EastAsianWidth-6.0.0.txt``
|
||||
*Date: 2010-08-17, 12:17:00 PDT [KW]*
|
||||
|
||||
``EastAsianWidth-6.1.0.txt``
|
||||
*Date: 2011-09-19, 18:46:00 GMT [KW]*
|
||||
|
||||
``EastAsianWidth-6.2.0.txt``
|
||||
*Date: 2012-05-15, 18:30:00 GMT [KW]*
|
||||
|
||||
``EastAsianWidth-6.3.0.txt``
|
||||
*Date: 2013-02-05, 20:09:00 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-7.0.0.txt``
|
||||
*Date: 2014-02-28, 23:15:00 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-8.0.0.txt``
|
||||
*Date: 2015-02-10, 21:00:00 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-9.0.0.txt``
|
||||
*Date: 2016-05-27, 17:00:00 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-10.0.0.txt``
|
||||
*Date: 2017-03-08, 02:00:00 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-11.0.0.txt``
|
||||
*Date: 2018-05-14, 09:41:59 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-12.0.0.txt``
|
||||
*Date: 2019-01-21, 14:12:58 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-12.1.0.txt``
|
||||
*Date: 2019-03-31, 22:01:58 GMT [KW, LI]*
|
||||
|
||||
``EastAsianWidth-13.0.0.txt``
|
||||
*Date: 2029-01-21, 18:14:00 GMT [KW, LI]*
|
|
@ -0,0 +1,10 @@
|
|||
[bdist_wheel]
|
||||
universal = 1
|
||||
|
||||
[metadata]
|
||||
license_file = LICENSE
|
||||
|
||||
[egg_info]
|
||||
tag_build =
|
||||
tag_date = 0
|
||||
|
|
@ -0,0 +1,99 @@
|
|||
#!/usr/bin/env python
|
||||
"""
|
||||
Setup.py distribution file for wcwidth.
|
||||
|
||||
https://github.com/jquast/wcwidth
|
||||
"""
|
||||
# std imports
|
||||
import os
|
||||
import codecs
|
||||
|
||||
# 3rd party
|
||||
import setuptools
|
||||
|
||||
|
||||
def _get_here(fname):
|
||||
return os.path.join(os.path.dirname(__file__), fname)
|
||||
|
||||
|
||||
class _SetupUpdate(setuptools.Command):
|
||||
# This is a compatibility shim; some downstream distributions might
|
||||
# still call "setup.py update".
|
||||
#
|
||||
# New entry point is tox, 'tox -eupdate'.
|
||||
description = "Fetch and update unicode code tables"
|
||||
user_options = []
|
||||
|
||||
def initialize_options(self):
|
||||
pass
|
||||
|
||||
def finalize_options(self):
|
||||
pass
|
||||
|
||||
def run(self):
|
||||
import sys
|
||||
import subprocess
|
||||
retcode = subprocess.Popen([
|
||||
sys.executable,
|
||||
_get_here(os.path.join('bin', 'update-tables.py'))]).wait()
|
||||
assert retcode == 0, ('non-zero exit code', retcode)
|
||||
|
||||
|
||||
def main():
|
||||
"""Setup.py entry point."""
|
||||
setuptools.setup(
|
||||
name='wcwidth',
|
||||
# NOTE: manually manage __version__ in wcwidth/__init__.py !
|
||||
version='0.2.5',
|
||||
description=(
|
||||
"Measures the displayed width of unicode strings in a terminal"),
|
||||
long_description=codecs.open(
|
||||
_get_here('README.rst'), 'rb', 'utf8').read(),
|
||||
author='Jeff Quast',
|
||||
author_email='contact@jeffquast.com',
|
||||
install_requires=('backports.functools-lru-cache>=1.2.1;'
|
||||
'python_version < "3.2"'),
|
||||
license='MIT',
|
||||
packages=['wcwidth'],
|
||||
url='https://github.com/jquast/wcwidth',
|
||||
package_data={
|
||||
'wcwidth': ['*.json'],
|
||||
'': ['LICENSE', '*.rst'],
|
||||
},
|
||||
zip_safe=True,
|
||||
classifiers=[
|
||||
'Intended Audience :: Developers',
|
||||
'Natural Language :: English',
|
||||
'Development Status :: 5 - Production/Stable',
|
||||
'Environment :: Console',
|
||||
'License :: OSI Approved :: MIT License',
|
||||
'Operating System :: POSIX',
|
||||
'Programming Language :: Python :: 2.7',
|
||||
'Programming Language :: Python :: 3.5',
|
||||
'Programming Language :: Python :: 3.6',
|
||||
'Programming Language :: Python :: 3.7',
|
||||
'Programming Language :: Python :: 3.8',
|
||||
'Topic :: Software Development :: Libraries',
|
||||
'Topic :: Software Development :: Localization',
|
||||
'Topic :: Software Development :: Internationalization',
|
||||
'Topic :: Terminals'
|
||||
],
|
||||
keywords=[
|
||||
'cjk',
|
||||
'combining',
|
||||
'console',
|
||||
'eastasian',
|
||||
'emoji',
|
||||
'emulator',
|
||||
'terminal',
|
||||
'unicode',
|
||||
'wcswidth',
|
||||
'wcwidth',
|
||||
'xterm',
|
||||
],
|
||||
cmdclass={'update': _SetupUpdate},
|
||||
)
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
|
@ -0,0 +1 @@
|
|||
"""This file intentionally left blank."""
|
|
@ -0,0 +1,154 @@
|
|||
# coding: utf-8
|
||||
"""Core tests for wcwidth module."""
|
||||
# 3rd party
|
||||
import pkg_resources
|
||||
|
||||
# local
|
||||
import wcwidth
|
||||
|
||||
|
||||
def test_package_version():
|
||||
"""wcwidth.__version__ is expected value."""
|
||||
# given,
|
||||
expected = pkg_resources.get_distribution('wcwidth').version
|
||||
|
||||
# exercise,
|
||||
result = wcwidth.__version__
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_hello_jp():
|
||||
u"""
|
||||
Width of Japanese phrase: コンニチハ, セカイ!
|
||||
|
||||
Given a phrase of 5 and 3 Katakana ideographs, joined with
|
||||
3 English-ASCII punctuation characters, totaling 11, this
|
||||
phrase consumes 19 cells of a terminal emulator.
|
||||
"""
|
||||
# given,
|
||||
phrase = u'コンニチハ, セカイ!'
|
||||
expect_length_each = (2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1)
|
||||
expect_length_phrase = sum(expect_length_each)
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase)
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_wcswidth_substr():
|
||||
"""
|
||||
Test wcswidth() optional 2nd parameter, ``n``.
|
||||
|
||||
``n`` determines at which position of the string
|
||||
to stop counting length.
|
||||
"""
|
||||
# given,
|
||||
phrase = u'コンニチハ, セカイ!'
|
||||
end = 7
|
||||
expect_length_each = (2, 2, 2, 2, 2, 1, 1,)
|
||||
expect_length_phrase = sum(expect_length_each)
|
||||
|
||||
# exercise,
|
||||
length_phrase = wcwidth.wcswidth(phrase, end)
|
||||
|
||||
# verify.
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_null_width_0():
|
||||
"""NULL (0) reports width 0."""
|
||||
# given,
|
||||
phrase = u'abc\x00def'
|
||||
expect_length_each = (1, 1, 1, 0, 1, 1, 1)
|
||||
expect_length_phrase = sum(expect_length_each)
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_control_c0_width_negative_1():
|
||||
"""CSI (Control sequence initiate) reports width -1 for ESC."""
|
||||
# given,
|
||||
phrase = u'\x1b[0m'
|
||||
expect_length_each = (-1, 1, 1, 1)
|
||||
expect_length_phrase = -1
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_combining_width():
|
||||
"""Simple test combining reports total width of 4."""
|
||||
# given,
|
||||
phrase = u'--\u05bf--'
|
||||
expect_length_each = (1, 1, 0, 1, 1)
|
||||
expect_length_phrase = 4
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_combining_cafe():
|
||||
u"""Phrase cafe + COMBINING ACUTE ACCENT is café of length 4."""
|
||||
phrase = u"cafe\u0301"
|
||||
expect_length_each = (1, 1, 1, 1, 0)
|
||||
expect_length_phrase = 4
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_combining_enclosing():
|
||||
u"""CYRILLIC CAPITAL LETTER A + COMBINING CYRILLIC HUNDRED THOUSANDS SIGN is А҈ of length 1."""
|
||||
phrase = u"\u0410\u0488"
|
||||
expect_length_each = (1, 0)
|
||||
expect_length_phrase = 1
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
||||
|
||||
|
||||
def test_combining_spacing():
|
||||
u"""Balinese kapal (ship) is ᬓᬨᬮ᭄ of length 4."""
|
||||
phrase = u"\u1B13\u1B28\u1B2E\u1B44"
|
||||
expect_length_each = (1, 1, 1, 1)
|
||||
expect_length_phrase = 4
|
||||
|
||||
# exercise,
|
||||
length_each = tuple(map(wcwidth.wcwidth, phrase))
|
||||
length_phrase = wcwidth.wcswidth(phrase, len(phrase))
|
||||
|
||||
# verify.
|
||||
assert length_each == expect_length_each
|
||||
assert length_phrase == expect_length_phrase
|
|
@ -0,0 +1,184 @@
|
|||
# coding: utf-8
|
||||
"""Unicode version level tests for wcwidth."""
|
||||
# std imports
|
||||
import json
|
||||
import warnings
|
||||
|
||||
# 3rd party
|
||||
import pytest
|
||||
import pkg_resources
|
||||
|
||||
# local
|
||||
import wcwidth
|
||||
|
||||
|
||||
def test_latest():
|
||||
"""wcwidth._wcmatch_version('latest') returns tail item."""
|
||||
# given,
|
||||
expected = wcwidth.list_versions()[-1]
|
||||
|
||||
# exercise,
|
||||
result = wcwidth._wcmatch_version('latest')
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_exact_410_str():
|
||||
"""wcwidth._wcmatch_version('4.1.0') returns equal value (str)."""
|
||||
# given,
|
||||
given = expected = '4.1.0'
|
||||
|
||||
# exercise,
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_exact_410_unicode():
|
||||
"""wcwidth._wcmatch_version(u'4.1.0') returns equal value (unicode)."""
|
||||
# given,
|
||||
given = expected = u'4.1.0'
|
||||
|
||||
# exercise,
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_505_str():
|
||||
"""wcwidth._wcmatch_version('5.0.5') returns nearest '5.0.0'. (str)"""
|
||||
# given
|
||||
given, expected = '5.0.5', '5.0.0'
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_505_unicode():
|
||||
"""wcwidth._wcmatch_version(u'5.0.5') returns nearest u'5.0.0'. (unicode)"""
|
||||
# given
|
||||
given, expected = u'5.0.5', u'5.0.0'
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_lowint40_str():
|
||||
"""wcwidth._wcmatch_version('4.0') returns nearest '4.1.0'."""
|
||||
# given
|
||||
given, expected = '4.0', '4.1.0'
|
||||
warnings.resetwarnings()
|
||||
wcwidth._wcmatch_version.cache_clear()
|
||||
|
||||
# exercise
|
||||
with pytest.warns(UserWarning):
|
||||
# warns that given version is lower than any available
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_lowint40_unicode():
|
||||
"""wcwidth._wcmatch_version(u'4.0') returns nearest u'4.1.0'."""
|
||||
# given
|
||||
given, expected = u'4.0', u'4.1.0'
|
||||
warnings.resetwarnings()
|
||||
wcwidth._wcmatch_version.cache_clear()
|
||||
|
||||
# exercise
|
||||
with pytest.warns(UserWarning):
|
||||
# warns that given version is lower than any available
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_800_str():
|
||||
"""wcwidth._wcmatch_version('8') returns nearest '8.0.0'."""
|
||||
# given
|
||||
given, expected = '8', '8.0.0'
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_800_unicode():
|
||||
"""wcwidth._wcmatch_version(u'8') returns nearest u'8.0.0'."""
|
||||
# given
|
||||
given, expected = u'8', u'8.0.0'
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_999_str():
|
||||
"""wcwidth._wcmatch_version('999.0') returns nearest (latest)."""
|
||||
# given
|
||||
given, expected = '999.0', wcwidth.list_versions()[-1]
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nearest_999_unicode():
|
||||
"""wcwidth._wcmatch_version(u'999.0') returns nearest (latest)."""
|
||||
# given
|
||||
given, expected = u'999.0', wcwidth.list_versions()[-1]
|
||||
|
||||
# exercise
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nonint_unicode():
|
||||
"""wcwidth._wcmatch_version(u'x.y.z') returns latest (unicode)."""
|
||||
# given
|
||||
given, expected = u'x.y.z', wcwidth.list_versions()[-1]
|
||||
warnings.resetwarnings()
|
||||
wcwidth._wcmatch_version.cache_clear()
|
||||
|
||||
# exercise
|
||||
with pytest.warns(UserWarning):
|
||||
# warns that given version is not valid
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
||||
|
||||
|
||||
def test_nonint_str():
|
||||
"""wcwidth._wcmatch_version('x.y.z') returns latest (str)."""
|
||||
# given
|
||||
given, expected = 'x.y.z', wcwidth.list_versions()[-1]
|
||||
warnings.resetwarnings()
|
||||
wcwidth._wcmatch_version.cache_clear()
|
||||
|
||||
# exercise
|
||||
with pytest.warns(UserWarning):
|
||||
# warns that given version is not valid
|
||||
result = wcwidth._wcmatch_version(given)
|
||||
|
||||
# verify.
|
||||
assert result == expected
|
|
@ -0,0 +1,306 @@
|
|||
Metadata-Version: 1.1
|
||||
Name: wcwidth
|
||||
Version: 0.2.5
|
||||
Summary: Measures the displayed width of unicode strings in a terminal
|
||||
Home-page: https://github.com/jquast/wcwidth
|
||||
Author: Jeff Quast
|
||||
Author-email: contact@jeffquast.com
|
||||
License: MIT
|
||||
Description: |pypi_downloads| |codecov| |license|
|
||||
|
||||
============
|
||||
Introduction
|
||||
============
|
||||
|
||||
This library is mainly for CLI programs that carefully produce output for
|
||||
Terminals, or pretend to be a terminal emulator.
|
||||
|
||||
**Problem Statement**: The printable length of *most* strings is equal to the
|
||||
number of cells they occupy on the screen, ``1 character : 1 cell``. However,
|
||||
there are categories of characters that *occupy 2 cells* (full-wide), and
|
||||
others that *occupy 0* cells (zero-width).
|
||||
|
||||
**Solution**: POSIX.1-2001 and POSIX.1-2008 conforming systems provide
|
||||
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this Python module's
|
||||
functions precisely copy. *These functions return the number of cells a
|
||||
unicode string is expected to occupy.*
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
The stable version of this package is maintained on PyPI; install it using pip::
|
||||
|
||||
pip install wcwidth
|
||||
|
||||
Example
|
||||
-------
|
||||
|
||||
**Problem**: given the following phrase (Japanese),
|
||||
|
||||
>>> text = u'コンニチハ'
|
||||
|
||||
Python **incorrectly** uses the *string length* of 5 codepoints rather than the
|
||||
*printable length* of 10 cells, so that when using the `rjust` function, the
|
||||
output length is wrong::
|
||||
|
||||
>>> print(len('コンニチハ'))
|
||||
5
|
||||
|
||||
>>> print('コンニチハ'.rjust(20, '_'))
|
||||
_______________コンニチハ
|
||||
|
||||
By defining our own "rjust" function that uses wcwidth, we can correct this::
|
||||
|
||||
>>> def wc_rjust(text, length, padding=' '):
|
||||
... from wcwidth import wcswidth
|
||||
... return padding * max(0, (length - wcswidth(text))) + text
|
||||
...
|
||||
|
||||
Our **Solution** uses wcswidth to determine the string length correctly::
|
||||
|
||||
>>> from wcwidth import wcswidth
|
||||
>>> print(wcswidth('コンニチハ'))
|
||||
10
|
||||
|
||||
>>> print(wc_rjust('コンニチハ', 20, '_'))
|
||||
__________コンニチハ
|
||||
|
||||
|
||||
Choosing a Version
|
||||
------------------
|
||||
|
||||
Export an environment variable, ``UNICODE_VERSION``. This should be done by
|
||||
*terminal emulators* or those developers experimenting with authoring one of
|
||||
their own, from shell::
|
||||
|
||||
$ export UNICODE_VERSION=13.0
|
||||
|
||||
If unspecified, the latest version is used. If your Terminal Emulator does not
|
||||
export this variable, you can use the `jquast/ucs-detect`_ utility to
|
||||
automatically detect and export it to your shell.
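A value such as ``13.0`` does not have to match a table name exactly. According to this release's test suite, the nearest supported version is selected: ``5.0.5`` resolves to ``5.0.0``, ``4.0`` to ``4.1.0``, and ``999.0`` to the latest. The following is only a sketch of that rule, not the library's ``_wcmatch_version()``. The names ``SUPPORTED``, ``nearest_version`` and ``resolve_version`` are hypothetical, and the warnings the library emits for invalid or too-low versions are left out.

```python
import os

# Version levels as returned by list_versions() in this release.
SUPPORTED = (
    "4.1.0", "5.0.0", "5.1.0", "5.2.0", "6.0.0", "6.1.0", "6.2.0", "6.3.0",
    "7.0.0", "8.0.0", "9.0.0", "10.0.0", "11.0.0", "12.0.0", "12.1.0",
    "13.0.0",
)


def _as_tuple(version):
    """Parse 'X.Y.Z' into (X, Y, Z), padding missing parts with zeros."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (3 - len(parts))


def nearest_version(requested, supported=SUPPORTED):
    """Hypothetical helper: pick the highest supported version <= requested."""
    latest = supported[-1]
    if requested == "latest":
        return latest
    try:
        wanted = _as_tuple(requested)
    except ValueError:
        # Not a dotted-integer version string: fall back to the latest.
        return latest
    # Requests below the lowest table fall back to the lowest table.
    best = supported[0]
    for candidate in supported:
        if _as_tuple(candidate) <= wanted:
            best = candidate
    return best


def resolve_version(environ=os.environ):
    """Hypothetical helper: read UNICODE_VERSION, defaulting to 'latest'."""
    return nearest_version(environ.get("UNICODE_VERSION", "latest"))
```

Passing a plain dictionary as ``environ`` makes the lookup easy to try without touching the real environment, for example ``resolve_version({"UNICODE_VERSION": "13.0"})``.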
|
||||
|
||||
wcwidth, wcswidth
|
||||
-----------------
|
||||
Use function ``wcwidth()`` to determine the length of a *single unicode
|
||||
character*, and ``wcswidth()`` to determine the length of many, a *string
|
||||
of unicode characters*.
|
||||
|
||||
Briefly, return values of function ``wcwidth()`` are:
|
||||
|
||||
``-1``
|
||||
Indeterminate (not printable).
|
||||
|
||||
``0``
|
||||
Does not advance the cursor, such as NULL or Combining.
|
||||
|
||||
``2``
|
||||
Characters of category East Asian Wide (W) or East Asian
|
||||
Full-width (F) which are displayed using two terminal cells.
|
||||
|
||||
``1``
|
||||
All others.
|
||||
|
||||
Function ``wcswidth()`` simply returns the sum of all values for each character
|
||||
along a string, or ``-1`` if any character in the string has a width of ``-1``.
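The four return values can be illustrated with the standard library alone. This is a rough approximation, not how this package works: wcwidth uses its own generated tables per Unicode version, while the sketch below relies on ``unicodedata``, whose Unicode version depends on the interpreter. The names ``approx_wcwidth`` and ``approx_wcswidth`` are hypothetical, and the hand-picked zero-width format characters (such as ZERO WIDTH SPACE) are not handled.

```python
import unicodedata


def approx_wcwidth(char):
    """Approximate cell width of one character: -1, 0, 1 or 2."""
    code = ord(char)
    if code == 0:
        # NULL does not advance the cursor.
        return 0
    if code < 32 or 0x7F <= code < 0xA0:
        # C0/C1 control characters and DEL: indeterminate.
        return -1
    if unicodedata.category(char) in ("Mn", "Me"):
        # Non-spacing and enclosing combining marks.
        return 0
    if unicodedata.east_asian_width(char) in ("W", "F"):
        # East Asian Wide or Fullwidth.
        return 2
    return 1


def approx_wcswidth(text):
    """Sum of per-character widths, or -1 if any character reports -1."""
    total = 0
    for char in text:
        width = approx_wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total
```

For real terminal output the package's own ``wcwidth()`` and ``wcswidth()`` remain the better choice, since they cover the special cases this sketch skips.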
|
||||
|
||||
Full API Documentation at http://wcwidth.readthedocs.org
|
||||
|
||||
==========
|
||||
Developing
|
||||
==========
|
||||
|
||||
Install wcwidth in editable mode::
|
||||
|
||||
pip install -e .
|
||||
|
||||
Execute unit tests using tox_::
|
||||
|
||||
tox
|
||||
|
||||
Regenerate python code tables from latest Unicode Specification data files::
|
||||
|
||||
tox -eupdate
|
||||
|
||||
Supplementary tools for browsing and testing terminals for wide unicode
|
||||
characters are found in the `bin/`_ of this project's source code. Just ensure
|
||||
to first ``pip install -r requirements-develop.txt`` from this project's main
|
||||
folder. For example, an interactive browser for testing::
|
||||
|
||||
./bin/wcwidth-browser.py
|
||||
|
||||
Uses
|
||||
----
|
||||
|
||||
This library is used in:
|
||||
|
||||
- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
|
||||
Python.
|
||||
|
||||
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
|
||||
interactive command lines in Python.
|
||||
|
||||
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
|
||||
|
||||
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
|
||||
based on compositing 2d arrays of text.
|
||||
|
||||
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
|
||||
|
||||
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
|
||||
and a command-line utility.
|
||||
|
||||
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
|
||||
text.
|
||||
|
||||
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
|
||||
animations.
|
||||
|
||||
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
|
||||
UIs.
|
||||
|
||||
Other Languages
|
||||
---------------
|
||||
|
||||
- `timoxley/wcwidth`_: JavaScript
|
||||
- `janlelis/unicode-display_width`_: Ruby
|
||||
- `alecrabbit/php-wcwidth`_: PHP
|
||||
- `Text::CharWidth`_: Perl
|
||||
- `bluebear94/Terminal-WCWidth`_: Perl 6
|
||||
- `mattn/go-runewidth`_: Go
|
||||
- `emugel/wcwidth`_: Haxe
|
||||
- `aperezdc/lua-wcwidth`_: Lua
|
||||
- `joachimschmidt557/zig-wcwidth`: Zig
|
||||
- `fumiyas/wcwidth-cjk`_: ``LD_PRELOAD`` override
|
||||
- `joshuarubin/wcwidth9`: Unicode version 9 in C
|
||||
|
||||
History
|
||||
-------
|
||||
|
||||
0.2.0 *2020-06-01*
|
||||
* **Enhancement**: Unicode version may be selected by exporting the
|
||||
Environment variable ``UNICODE_VERSION``, such as ``13.0``, or ``6.3.0``.
|
||||
See the `jquast/ucs-detect`_ CLI utility for automatic detection.
|
||||
* **Enhancement**:
|
||||
API Documentation is published to readthedocs.org.
|
||||
* **Updated** tables for *all* Unicode Specifications with files
|
||||
published in a programmatically consumable format, versions 4.1.0
|
||||
through 13.0.
|
||||
|
||||
0.1.9 *2020-03-22*
|
||||
* **Performance** optimization by `Avram Lubkin`_, `PR #35`_.
|
||||
* **Updated** tables to Unicode Specification 13.0.0.
|
||||
|
||||
0.1.8 *2020-01-01*
|
||||
* **Updated** tables to Unicode Specification 12.0.0. (`PR #30`_).
|
||||
|
||||
0.1.7 *2016-07-01*
|
||||
* **Updated** tables to Unicode Specification 9.0.0. (`PR #18`_).
|
||||
|
||||
0.1.6 *2016-01-08 Production/Stable*
|
||||
* ``LICENSE`` file now included with distribution.
|
||||
|
||||
0.1.5 *2015-09-13 Alpha*
|
||||
* **Bugfix**:
|
||||
Resolution of "combining_ character width" issue, most especially
|
||||
those that previously returned -1 now often (correctly) return 0.
|
||||
Resolved by `Philip Craig`_ via `PR #11`_.
|
||||
* **Deprecated**:
|
||||
The module path ``wcwidth.table_comb`` is no longer available,
|
||||
it has been superseded by module path ``wcwidth.table_zero``.
|
||||
|
||||
0.1.4 *2014-11-20 Pre-Alpha*
|
||||
* **Feature**: ``wcswidth()`` now determines printable length
|
||||
for (most) combining_ characters. The developer's tool
|
||||
`bin/wcwidth-browser.py`_ is improved to display combining_
|
||||
characters when provided the ``--combining`` option
|
||||
(`Thomas Ballinger`_ and `Leta Montopoli`_ `PR #5`_).
|
||||
* **Feature**: added static analysis (prospector_) to testing
|
||||
framework.
|
||||
|
||||
0.1.3 *2014-10-29 Pre-Alpha*
|
||||
* **Bugfix**: 2nd parameter of wcswidth was not honored.
|
||||
(`Thomas Ballinger`_, `PR #4`_).
|
||||
|
||||
0.1.2 *2014-10-28 Pre-Alpha*
|
||||
* **Updated** tables to Unicode Specification 7.0.0.
|
||||
(`Thomas Ballinger`_, `PR #3`_).
|
||||
|
||||
0.1.1 *2014-05-14 Pre-Alpha*
|
||||
* Initial release to PyPI, based on Unicode Specification 6.3.0.
|
||||
|
||||
This code was originally derived directly from C code of the same name,
|
||||
whose latest version is available at
|
||||
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
|
||||
|
||||
* Markus Kuhn -- 2007-05-26 (Unicode 5.0)
|
||||
*
|
||||
* Permission to use, copy, modify, and distribute this software
|
||||
* for any purpose and without fee is hereby granted. The author
|
||||
* disclaims all warranties with regard to this software.
|
||||
|
||||
.. _`tox`: https://testrun.org/tox/latest/install.html
|
||||
.. _`prospector`: https://github.com/landscapeio/prospector
|
||||
.. _`combining`: https://en.wikipedia.org/wiki/Combining_character
|
||||
.. _`bin/`: https://github.com/jquast/wcwidth/tree/master/bin
|
||||
.. _`bin/wcwidth-browser.py`: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
|
||||
.. _`Thomas Ballinger`: https://github.com/thomasballinger
|
||||
.. _`Leta Montopoli`: https://github.com/lmontopo
|
||||
.. _`Philip Craig`: https://github.com/philipc
|
||||
.. _`PR #3`: https://github.com/jquast/wcwidth/pull/3
|
||||
.. _`PR #4`: https://github.com/jquast/wcwidth/pull/4
|
||||
.. _`PR #5`: https://github.com/jquast/wcwidth/pull/5
|
||||
.. _`PR #11`: https://github.com/jquast/wcwidth/pull/11
|
||||
.. _`PR #18`: https://github.com/jquast/wcwidth/pull/18
|
||||
.. _`PR #30`: https://github.com/jquast/wcwidth/pull/30
|
||||
.. _`PR #35`: https://github.com/jquast/wcwidth/pull/35
|
||||
.. _`jquast/blessed`: https://github.com/jquast/blessed
|
||||
.. _`selectel/pyte`: https://github.com/selectel/pyte
|
||||
.. _`thomasballinger/curtsies`: https://github.com/thomasballinger/curtsies
|
||||
.. _`dbcli/pgcli`: https://github.com/dbcli/pgcli
|
||||
.. _`jonathanslenders/python-prompt-toolkit`: https://github.com/jonathanslenders/python-prompt-toolkit
|
||||
.. _`timoxley/wcwidth`: https://github.com/timoxley/wcwidth
|
||||
.. _`wcwidth(3)`: http://man7.org/linux/man-pages/man3/wcwidth.3.html
|
||||
.. _`wcswidth(3)`: http://man7.org/linux/man-pages/man3/wcswidth.3.html
|
||||
.. _`astanin/python-tabulate`: https://github.com/astanin/python-tabulate
|
||||
.. _`janlelis/unicode-display_width`: https://github.com/janlelis/unicode-display_width
|
||||
.. _`LuminosoInsight/python-ftfy`: https://github.com/LuminosoInsight/python-ftfy
|
||||
.. _`alecrabbit/php-wcwidth`: https://github.com/alecrabbit/php-wcwidth
|
||||
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
|
||||
.. _`bluebear94/Terminal-WCWidth`: https://github.com/bluebear94/Terminal-WCWidth
|
||||
.. _`mattn/go-runewidth`: https://github.com/mattn/go-runewidth
|
||||
.. _`emugel/wcwidth`: https://github.com/emugel/wcwidth
|
||||
.. _`jquast/ucs-detect`: https://github.com/jquast/ucs-detect
|
||||
.. _`Avram Lubkin`: https://github.com/avylove
|
||||
.. _`nbedos/termtosvg`: https://github.com/nbedos/termtosvg
|
||||
.. _`peterbrittain/asciimatics`: https://github.com/peterbrittain/asciimatics
|
||||
.. _`aperezdc/lua-wcwidth`: https://github.com/aperezdc/lua-wcwidth
|
||||
.. _`fumiyas/wcwidth-cjk`: https://github.com/fumiyas/wcwidth-cjk
|
||||
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
|
||||
:alt: Downloads
|
||||
:target: https://pypi.org/project/wcwidth/
|
||||
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
|
||||
:alt: codecov.io Code Coverage
|
||||
:target: https://codecov.io/gh/jquast/wcwidth/
|
||||
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
|
||||
:target: https://pypi.python.org/pypi/wcwidth/
|
||||
:alt: MIT License
|
||||
|
||||
Keywords: cjk,combining,console,eastasian,emoji,emulator,terminal,unicode,wcswidth,wcwidth,xterm
|
||||
Platform: UNKNOWN
|
||||
Classifier: Intended Audience :: Developers
|
||||
Classifier: Natural Language :: English
|
||||
Classifier: Development Status :: 5 - Production/Stable
|
||||
Classifier: Environment :: Console
|
||||
Classifier: License :: OSI Approved :: MIT License
|
||||
Classifier: Operating System :: POSIX
|
||||
Classifier: Programming Language :: Python :: 2.7
|
||||
Classifier: Programming Language :: Python :: 3.5
|
||||
Classifier: Programming Language :: Python :: 3.6
|
||||
Classifier: Programming Language :: Python :: 3.7
|
||||
Classifier: Programming Language :: Python :: 3.8
|
||||
Classifier: Topic :: Software Development :: Libraries
|
||||
Classifier: Topic :: Software Development :: Localization
|
||||
Classifier: Topic :: Software Development :: Internationalization
|
||||
Classifier: Topic :: Terminals
|
|
@ -0,0 +1,19 @@
|
|||
LICENSE
|
||||
MANIFEST.in
|
||||
README.rst
|
||||
setup.cfg
|
||||
setup.py
|
||||
tests/__init__.py
|
||||
tests/test_core.py
|
||||
tests/test_ucslevel.py
|
||||
wcwidth/__init__.py
|
||||
wcwidth/table_wide.py
|
||||
wcwidth/table_zero.py
|
||||
wcwidth/unicode_versions.py
|
||||
wcwidth/wcwidth.py
|
||||
wcwidth.egg-info/PKG-INFO
|
||||
wcwidth.egg-info/SOURCES.txt
|
||||
wcwidth.egg-info/dependency_links.txt
|
||||
wcwidth.egg-info/requires.txt
|
||||
wcwidth.egg-info/top_level.txt
|
||||
wcwidth.egg-info/zip-safe
|
|
@ -0,0 +1 @@
|
|||
|
|
@ -0,0 +1,3 @@
|
|||
|
||||
[:python_version < "3.2"]
|
||||
backports.functools-lru-cache>=1.2.1
|
|
@ -0,0 +1 @@
|
|||
wcwidth
|
|
@ -0,0 +1 @@
|
|||
|
|
@ -0,0 +1,37 @@
|
|||
"""
|
||||
wcwidth module.
|
||||
|
||||
https://github.com/jquast/wcwidth
|
||||
"""
|
||||
# re-export all functions & definitions, even private ones, from top-level
|
||||
# module path, to allow for 'from wcwidth import _private_func'. Of course,
|
||||
# user beware that any _private function may disappear or change signature at
|
||||
# any future version.
|
||||
|
||||
# local
|
||||
from .wcwidth import ZERO_WIDTH # noqa
|
||||
from .wcwidth import (WIDE_EASTASIAN,
|
||||
wcwidth,
|
||||
wcswidth,
|
||||
_bisearch,
|
||||
list_versions,
|
||||
_wcmatch_version,
|
||||
_wcversion_value)
|
||||
|
||||
# The __all__ attribute defines the items exported from statement,
|
||||
# 'from wcwidth import *', but also to say, "This is the public API".
|
||||
__all__ = ('wcwidth', 'wcswidth', 'list_versions')
|
||||
|
||||
# I used to use a _get_package_version() function to use the `pkg_resources'
|
||||
# module to parse the package version from our version.json file, but this blew
|
||||
# some folks up, or more particularly, just the `xonsh' shell.
|
||||
#
|
||||
# Yikes! I always wanted to like xonsh and tried it many times but issues like
|
||||
# these always bit me, too, so I can sympathize -- this version is now manually
|
||||
# kept in sync with version.json to help them out. Shucks, this variable is just
|
||||
# for legacy, from the days before 'pip freeze' was a thing.
|
||||
#
|
||||
# We also used pkg_resources to load unicode version tables from version.json,
|
||||
# generated by bin/update-tables.py, but some environments are unable to
|
||||
# import pkg_resources for one reason or another, yikes!
|
||||
__version__ = '0.2.5'
|
File diff suppressed because it is too large
File diff suppressed because it is too large
|
@ -0,0 +1,35 @@
|
|||
"""
|
||||
Exports function list_versions() for unicode version level support.
|
||||
|
||||
This code generated by bin/update-tables.py on 2020-06-23 16:03:21.350604.
|
||||
"""
|
||||
|
||||
|
||||
def list_versions():
|
||||
"""
|
||||
Return Unicode version levels supported by this module release.
|
||||
|
||||
Any of the version strings returned may be used as keyword argument
|
||||
``unicode_version`` to the ``wcwidth()`` family of functions.
|
||||
|
||||
:returns: Supported Unicode version numbers in ascending sorted order.
|
||||
:rtype: tuple[str]
|
||||
"""
|
||||
return (
|
||||
"4.1.0",
|
||||
"5.0.0",
|
||||
"5.1.0",
|
||||
"5.2.0",
|
||||
"6.0.0",
|
||||
"6.1.0",
|
||||
"6.2.0",
|
||||
"6.3.0",
|
||||
"7.0.0",
|
||||
"8.0.0",
|
||||
"9.0.0",
|
||||
"10.0.0",
|
||||
"11.0.0",
|
||||
"12.0.0",
|
||||
"12.1.0",
|
||||
"13.0.0",
|
||||
)
|
|
@ -0,0 +1,375 @@
|
|||
"""
|
||||
This is a python implementation of wcwidth() and wcswidth().
|
||||
|
||||
https://github.com/jquast/wcwidth
|
||||
|
||||
from Markus Kuhn's C code, retrieved from:
|
||||
|
||||
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
|
||||
|
||||
This is an implementation of wcwidth() and wcswidth() (defined in
|
||||
IEEE Std 1003.1-2001) for Unicode.
|
||||
|
||||
http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
|
||||
http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
|
||||
|
||||
In fixed-width output devices, Latin characters all occupy a single
|
||||
"cell" position of equal width, whereas ideographic CJK characters
|
||||
occupy two such cells. Interoperability between terminal-line
|
||||
applications and (teletype-style) character terminals using the
|
||||
UTF-8 encoding requires agreement on which character should advance
|
||||
the cursor by how many cell positions. No established formal
|
||||
standards exist at present on which Unicode character shall occupy
|
||||
how many cell positions on character terminals. These routines are
|
||||
a first attempt of defining such behavior based on simple rules
|
||||
applied to data provided by the Unicode Consortium.
|
||||
|
||||
For some graphical characters, the Unicode standard explicitly
|
||||
defines a character-cell width via the definition of the East Asian
|
||||
FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
|
||||
In all these cases, there is no ambiguity about which width a
|
||||
terminal shall use. For characters in the East Asian Ambiguous (A)
|
||||
class, the width choice depends purely on a preference of backward
|
||||
compatibility with either historic CJK or Western practice.
|
||||
Choosing single-width for these characters is easy to justify as
|
||||
the appropriate long-term solution, as the CJK practice of
|
||||
displaying these characters as double-width comes from historic
|
||||
implementation simplicity (8-bit encoded characters were displayed
|
||||
single-width and 16-bit ones double-width, even for Greek,
|
||||
Cyrillic, etc.) and not any typographic considerations.
|
||||
|
||||
Much less clear is the choice of width for the Not East Asian
|
||||
(Neutral) class. Existing practice does not dictate a width for any
|
||||
of these characters. It would nevertheless make sense
|
||||
typographically to allocate two character cells to characters such
|
||||
as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
|
||||
represented adequately with a single-width glyph. The following
|
||||
routines at present merely assign a single-cell width to all
|
||||
neutral characters, in the interest of simplicity. This is not
|
||||
entirely satisfactory and should be reconsidered before
|
||||
establishing a formal standard in this area. At the moment, the
|
||||
decision which Not East Asian (Neutral) characters should be
|
||||
represented by double-width glyphs cannot yet be answered by
|
||||
applying a simple rule from the Unicode database content. Setting
|
||||
up a proper standard for the behavior of UTF-8 character terminals
|
||||
will require a careful analysis not only of each Unicode character,
|
||||
but also of each presentation form, something the author of these
|
||||
routines has avoided to do so far.
|
||||
|
||||
http://www.unicode.org/unicode/reports/tr11/
|
||||
|
||||
Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
|
||||
"""
|
||||
from __future__ import division
|
||||
|
||||
# std imports
|
||||
import os
|
||||
import sys
|
||||
import warnings
|
||||
|
||||
# local
|
||||
from .table_wide import WIDE_EASTASIAN
|
||||
from .table_zero import ZERO_WIDTH
|
||||
from .unicode_versions import list_versions
|
||||
|
||||
try:
|
||||
from functools import lru_cache
|
||||
except ImportError:
|
||||
# lru_cache was added in Python 3.2
|
||||
from backports.functools_lru_cache import lru_cache
|
||||
|
||||
# global cache
|
||||
_UNICODE_CMPTABLE = None
|
||||
_PY3 = (sys.version_info[0] >= 3)
|
||||
|
||||
|
||||
# NOTE: created by hand, there isn't anything identifiable other than
|
||||
# general Cf category code to identify these, and some characters in Cf
|
||||
# category code are of non-zero width.
|
||||
# Also includes some Cc, Mn, Zl, and Zp characters
|
||||
ZERO_WIDTH_CF = set([
|
||||
0, # Null (Cc)
|
||||
0x034F, # Combining grapheme joiner (Mn)
|
||||
0x200B, # Zero width space
|
||||
0x200C, # Zero width non-joiner
|
||||
0x200D, # Zero width joiner
|
||||
0x200E, # Left-to-right mark
|
||||
0x200F, # Right-to-left mark
|
||||
0x2028, # Line separator (Zl)
|
||||
0x2029, # Paragraph separator (Zp)
|
||||
0x202A, # Left-to-right embedding
|
||||
0x202B, # Right-to-left embedding
|
||||
0x202C, # Pop directional formatting
|
||||
0x202D, # Left-to-right override
|
||||
0x202E, # Right-to-left override
|
||||
0x2060, # Word joiner
|
||||
0x2061, # Function application
|
||||
0x2062, # Invisible times
|
||||
0x2063, # Invisible separator
|
||||
])
|
||||
|
||||
|
||||
def _bisearch(ucs, table):
|
||||
"""
|
||||
Auxiliary function for binary search in interval table.
|
||||
|
||||
:arg int ucs: Ordinal value of unicode character.
|
||||
:arg list table: List of starting and ending ranges of ordinal values,
|
||||
in form of ``[(start, end), ...]``.
|
||||
:rtype: int
|
||||
:returns: 1 if ordinal value ucs is found within lookup table, else 0.
|
||||
"""
|
||||
lbound = 0
|
||||
ubound = len(table) - 1
|
||||
|
||||
if ucs < table[0][0] or ucs > table[ubound][1]:
|
||||
return 0
|
||||
while ubound >= lbound:
|
||||
mid = (lbound + ubound) // 2
|
||||
if ucs > table[mid][1]:
|
||||
lbound = mid + 1
|
||||
elif ucs < table[mid][0]:
|
||||
ubound = mid - 1
|
||||
else:
|
||||
return 1
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
@lru_cache(maxsize=1000)
|
||||
def wcwidth(wc, unicode_version='auto'):
|
||||
r"""
|
||||
Given one Unicode character, return its printable length on a terminal.
|
||||
|
||||
:param str wc: A single Unicode character.
|
||||
:param str unicode_version: A Unicode version number, such as
|
||||
``'6.0.0'``, the list of available version levels may be
|
||||
listed by pairing function :func:`list_versions`.
|
||||
|
||||
Any version string may be specified without error -- the nearest
|
||||
matching version is selected. When ``auto`` (default), the version is read
|
||||
from the ``UNICODE_VERSION`` environment variable, or ``latest`` if unset.
|
||||
:return: The width, in cells, necessary to display the Unicode
|
||||
character ``wc``. Returns 0 if the ``wc`` argument has
|
||||
no printable effect on a terminal (such as NUL '\0'), -1 if ``wc`` is
|
||||
not printable, or has an indeterminate effect on the terminal, such as
|
||||
a control character. Otherwise, the number of column positions the
|
||||
character occupies on a graphic terminal (1 or 2) is returned.
|
||||
:rtype: int
|
||||
|
||||
The following have a column width of -1:
|
||||
|
||||
- C0 control characters (U+001 through U+01F).
|
||||
|
||||
- C1 control characters and DEL (U+07F through U+09F).
|
||||
|
||||
The following have a column width of 0:
|
||||
|
||||
- Non-spacing and enclosing combining characters (general
|
||||
category code Mn or Me in the Unicode database).
|
||||
|
||||
- NULL (``U+0000``).
|
||||
|
||||
- COMBINING GRAPHEME JOINER (``U+034F``).
|
||||
|
||||
- ZERO WIDTH SPACE (``U+200B``) *through*
|
||||
RIGHT-TO-LEFT MARK (``U+200F``).
|
||||
|
||||
- LINE SEPARATOR (``U+2028``) *and*
|
||||
PARAGRAPH SEPARATOR (``U+2029``).
|
||||
|
||||
- LEFT-TO-RIGHT EMBEDDING (``U+202A``) *through*
|
||||
RIGHT-TO-LEFT OVERRIDE (``U+202E``).
|
||||
|
||||
- WORD JOINER (``U+2060``) *through*
|
||||
INVISIBLE SEPARATOR (``U+2063``).
|
||||
|
||||
The following have a column width of 1:
|
||||
|
||||
- SOFT HYPHEN (``U+00AD``).
|
||||
|
||||
- All remaining characters, including all printable ISO 8859-1
|
||||
and WGL4 characters, Unicode control characters, etc.
|
||||
|
||||
The following have a column width of 2:
|
||||
|
||||
- Spacing characters in the East Asian Wide (W) or East Asian
|
||||
Full-width (F) category as defined in Unicode Technical
|
||||
Report #11.
|
||||
|
||||
- Some kinds of Emoji or symbols.
|
||||
"""
|
||||
# NOTE: created by hand, there isn't anything identifiable other than
|
||||
# general Cf category code to identify these, and some characters in Cf
|
||||
# category code are of non-zero width.
|
||||
ucs = ord(wc)
|
||||
if ucs in ZERO_WIDTH_CF:
|
||||
return 0
|
||||
|
||||
# C0/C1 control characters
|
||||
if ucs < 32 or 0x07F <= ucs < 0x0A0:
|
||||
return -1
|
||||
|
||||
_unicode_version = _wcmatch_version(unicode_version)
|
||||
|
||||
# combining characters with zero width
|
||||
if _bisearch(ucs, ZERO_WIDTH[_unicode_version]):
|
||||
return 0
|
||||
|
||||
return 1 + _bisearch(ucs, WIDE_EASTASIAN[_unicode_version])
|
||||
|
||||
|
||||
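# Illustration only: a sketch of the decision order followed by wcwidth()
# above. It is an assumption-laden stand-in, not the module's implementation:
# it consults the standard library's unicodedata database instead of the
# versioned ZERO_WIDTH and WIDE_EASTASIAN tables, so results can differ from
# wcwidth() for some code points. ZERO_WIDTH_DEMO and approx_width() are
# hypothetical names; the set is built from the docstring's width-0 list.

```python
import unicodedata

# Code points the docstring lists as zero width regardless of category.
ZERO_WIDTH_DEMO = (
    {0x0000, 0x034F, 0x2028, 0x2029}
    | set(range(0x200B, 0x2010))   # U+200B through U+200F
    | set(range(0x202A, 0x202F))   # U+202A through U+202E
    | set(range(0x2060, 0x2064))   # U+2060 through U+2063
)


def approx_width(ch):
    """Approximate the cell width of one character: -1, 0, 1 or 2."""
    ucs = ord(ch)
    # 1. Hand-listed zero-width characters are checked first, so NUL is 0.
    if ucs in ZERO_WIDTH_DEMO:
        return 0
    # 2. Remaining C0 controls, DEL and C1 controls are not printable.
    if ucs < 32 or 0x7F <= ucs < 0xA0:
        return -1
    # 3. Non-spacing and enclosing combining marks occupy no cell.
    if unicodedata.category(ch) in ('Mn', 'Me'):
        return 0
    # 4. East Asian Wide or Fullwidth characters take two cells, others one.
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
```

# The order matters: testing the zero-width set before the control-character
# range is what lets NUL return 0 rather than -1.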
def wcswidth(pwcs, n=None, unicode_version='auto'):
|
||||
"""
|
||||
Given a unicode string, return its printable length on a terminal.
|
||||
|
||||
:param str pwcs: Measure width of given unicode string.
|
||||
:param int n: When ``n`` is None (default), return the length of the
|
||||
entire string, otherwise the width of the first ``n`` characters.
|
||||
:param str unicode_version: An explicit definition of the unicode version
|
||||
level to use for determination, may be ``auto`` (default), which uses
|
||||
the Environment Variable, ``UNICODE_VERSION`` if defined, or the latest
|
||||
available unicode version, otherwise.
|
||||
:rtype: int
|
||||
:returns: The width, in cells, necessary to display the first ``n``
|
||||
characters of the unicode string ``pwcs``. Returns ``-1`` if
|
||||
a non-printable character is encountered.
|
||||
"""
|
||||
# pylint: disable=C0103
|
||||
# Invalid argument name "n"
|
||||
|
||||
end = len(pwcs) if n is None else n
|
||||
idx = slice(0, end)
|
||||
width = 0
|
||||
for char in pwcs[idx]:
|
||||
wcw = wcwidth(char, unicode_version)
|
||||
if wcw < 0:
|
||||
return -1
|
||||
width += wcw
|
||||
return width
|
||||
|
||||
|
||||
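# Illustration only: a sketch of how wcswidth() above sums per-character
# widths and propagates -1. demo_char_width() is a hypothetical stand-in built
# on the standard library's unicodedata module (an assumption, not the
# module's tables), and demo_string_width() mirrors the loop in wcswidth().

```python
import unicodedata


def demo_char_width(ch):
    """Stand-in per-character width: -1, 0, 1 or 2."""
    ucs = ord(ch)
    if ucs == 0:
        return 0
    if ucs < 32 or 0x7F <= ucs < 0xA0:
        return -1
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def demo_string_width(text, n=None):
    """Sum widths of the first ``n`` characters, or of all when ``n`` is None."""
    width = 0
    for ch in text[:n]:          # text[:None] is the whole string
        w = demo_char_width(ch)
        if w < 0:
            return -1            # one non-printable character spoils the result
        width += w
    return width
```

# Because a single control character makes the whole result -1, callers that
# pad or truncate terminal output should check for a negative return value
# before using it in arithmetic.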
@lru_cache(maxsize=128)
|
||||
def _wcversion_value(ver_string):
|
||||
"""
|
||||
Integer-mapped value of given dotted version string.
|
||||
|
||||
:param str ver_string: Unicode version string, of form ``n.n.n``.
|
||||
:rtype: tuple(int)
|
||||
:returns: tuple of integers, one per dotted component, ``tuple(int, [...])``.
|
||||
"""
|
||||
retval = tuple(map(int, (ver_string.split('.'))))
|
||||
return retval
|
||||
|
||||
|
||||
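# Illustration only: why _wcversion_value() above converts version strings to
# integer tuples before they are compared. version_tuple() is a hypothetical
# name that repeats the same conversion without the cache.

```python
def version_tuple(ver_string):
    """Convert a dotted version string such as '9.0.0' to a tuple of ints."""
    return tuple(int(part) for part in ver_string.split('.'))


# Plain string comparison is lexicographic, so it misorders two-digit majors.
string_order_is_wrong = '10.0.0' < '9.0.0'

# Tuple comparison is numeric, component by component.
tuple_order_is_right = version_tuple('10.0.0') > version_tuple('9.0.0')

# A shorter tuple compares as smaller than a longer one sharing its prefix.
prefix_is_smaller = version_tuple('8.0') < version_tuple('8.0.0')
```

# A non-numeric component such as 'latest' raises ValueError in int(), which
# is the condition the except branch in _wcmatch_version() handles.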
@lru_cache(maxsize=8)
|
||||
def _wcmatch_version(given_version):
|
||||
"""
|
||||
Return nearest matching supported Unicode version level.
|
||||
|
||||
If an exact match is not determined, the nearest lowest version level is
|
||||
returned after a warning is emitted. For example, given supported levels
|
||||
``4.1.0`` and ``5.0.0``, and a version string of ``4.9.9``, then ``4.1.0``
|
||||
is selected and returned:
|
||||
|
||||
>>> _wcmatch_version('4.9.9')
|
||||
'4.1.0'
|
||||
>>> _wcmatch_version('8.0')
|
||||
'8.0.0'
|
||||
>>> _wcmatch_version('1')
|
||||
'4.1.0'
|
||||
|
||||
:param str given_version: given version for compare, may be ``auto``
|
||||
(default), to select Unicode Version from Environment Variable,
|
||||
``UNICODE_VERSION``. If the environment variable is not set, then the
|
||||
latest is used.
|
||||
:rtype: str
|
||||
:returns: unicode string, or non-unicode ``str`` type for python 2
|
||||
when given ``version`` is also type ``str``.
|
||||
"""
|
||||
# Design note: the choice to return the same type that is given certainly
|
||||
# complicates it for python 2 str-type, but allows us to define an api that
|
||||
# uses 'string-type' for unicode version level definitions, so all of our
|
||||
# example code works with all versions of python. That, along with the
|
||||
# string-to-numeric and comparisons of earliest, latest, matching, or
|
||||
# nearest, greatly complicates this function.
|
||||
_return_str = not _PY3 and isinstance(given_version, str)
|
||||
|
||||
if _return_str:
|
||||
unicode_versions = [ucs.encode() for ucs in list_versions()]
|
||||
else:
|
||||
unicode_versions = list_versions()
|
||||
latest_version = unicode_versions[-1]
|
||||
|
||||
if given_version in (u'auto', 'auto'):
|
||||
given_version = os.environ.get(
|
||||
'UNICODE_VERSION',
|
||||
'latest' if not _return_str else latest_version.encode())
|
||||
|
||||
if given_version in (u'latest', 'latest'):
|
||||
# default match, when given as 'latest', use the latest unicode
|
||||
# version specification level supported.
|
||||
return latest_version if not _return_str else latest_version.encode()
|
||||
|
||||
if given_version in unicode_versions:
|
||||
# exact match, downstream has specified an explicit matching version
|
||||
# matching any value of list_versions().
|
||||
return given_version if not _return_str else given_version.encode()
|
||||
|
||||
# The user's version is not supported by ours. We return the newest unicode
|
||||
# version level that we support below their given value.
|
||||
try:
|
||||
cmp_given = _wcversion_value(given_version)
|
||||
|
||||
except ValueError:
|
||||
# submitted value raises ValueError in int(), warn and use latest.
|
||||
warnings.warn("UNICODE_VERSION value, {given_version!r}, is invalid. "
|
||||
"Value should be in form of `integer[.]+', the latest "
|
||||
"supported unicode version {latest_version!r} has been "
|
||||
"inferred.".format(given_version=given_version,
|
||||
latest_version=latest_version))
|
||||
return latest_version if not _return_str else latest_version.encode()
|
||||
|
||||
# given version is less than any available version, return earliest
|
||||
# version.
|
||||
earliest_version = unicode_versions[0]
|
||||
cmp_earliest_version = _wcversion_value(earliest_version)
|
||||
|
||||
if cmp_given <= cmp_earliest_version:
|
||||
# this probably isn't what you wanted, the oldest wcwidth.c you will
|
||||
# find in the wild is likely version 5 or 6, both of which we support,
|
||||
# but it's better than not saying anything at all.
|
||||
warnings.warn("UNICODE_VERSION value, {given_version!r}, is lower "
|
||||
"than any available unicode version. Returning lowest "
|
||||
"version level, {earliest_version!r}".format(
|
||||
given_version=given_version,
|
||||
earliest_version=earliest_version))
|
||||
return earliest_version if not _return_str else earliest_version.encode()
|
||||
|
||||
# create list of versions which are less than or equal to given version,
|
||||
# and return the tail value, which is the highest level we may support,
|
||||
# or the latest value we support, when completely unmatched or higher
|
||||
# than any supported version.
|
||||
#
|
||||
# the loop never runs to completion; it always returns from within.
|
||||
for idx, unicode_version in enumerate(unicode_versions):
|
||||
# look ahead to next value
|
||||
try:
|
||||
cmp_next_version = _wcversion_value(unicode_versions[idx + 1])
|
||||
except IndexError:
|
||||
# at end of list, return latest version
|
||||
return latest_version if not _return_str else latest_version.encode()
|
||||
|
||||
# Maybe our given version has less parts, as in tuple(8, 0), than the
|
||||
# next compare version tuple(8, 0, 0). Test for an exact match by
|
||||
# comparison of only the leading dotted piece(s): (8, 0) == (8, 0).
|
||||
if cmp_given == cmp_next_version[:len(cmp_given)]:
|
||||
return unicode_versions[idx + 1]
|
||||
|
||||
# Or, if any next value is greater than our given support level
|
||||
# version, return the current value in index. Even though it must
|
||||
# be less than the given value, it's our closest possible match. That
|
||||
# is, 4.1 is returned for given 4.9.9, where 4.1 and 5.0 are available.
|
||||
if cmp_next_version > cmp_given:
|
||||
return unicode_version
|
||||
assert False, ("Code path unreachable", given_version, unicode_versions)
|
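# Illustration only: a simplified sketch of the nearest-version matching in
# _wcmatch_version() above. It leaves out the 'auto' and 'latest' keywords,
# the warnings, and the python 2 str handling. DEMO_VERSIONS and
# nearest_version() are hypothetical; the real list comes from list_versions().

```python
DEMO_VERSIONS = ['4.1.0', '5.0.0', '8.0.0', '13.0.0']  # assumed, ascending


def _as_tuple(ver_string):
    return tuple(int(part) for part in ver_string.split('.'))


def nearest_version(given, versions=DEMO_VERSIONS):
    """Return the supported version nearest to, and not above, ``given``."""
    if given in versions:
        return given                      # exact match
    try:
        cmp_given = _as_tuple(given)
    except ValueError:
        return versions[-1]               # unparsable: fall back to latest
    if cmp_given <= _as_tuple(versions[0]):
        return versions[0]                # older than anything supported
    for idx, current in enumerate(versions[:-1]):
        cmp_next = _as_tuple(versions[idx + 1])
        # '8.0' matches '8.0.0' by comparing only the leading components.
        if cmp_given == cmp_next[:len(cmp_given)]:
            return versions[idx + 1]
        # The next level is too new, so the current one is the closest.
        if cmp_next > cmp_given:
            return current
    return versions[-1]                   # newer than anything supported
```

# With this demo list, '4.9.9' resolves to '4.1.0', '8.0' to '8.0.0', and '1'
# to '4.1.0', matching the doctest examples in the docstring above.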