mirror of https://github.com/python/cpython.git
[3.11] gh-108590: Improve sqlite3 docs on encoding issues and how to handle those (GH-108699) (#111325)
Add a guide for how to handle non-UTF-8 text encodings.
Link to that guide from the 'text_factory' docs.
(cherry picked from commit 1262e41842)
Co-authored-by: Erlend E. Aasland <erlend@python.org>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Corvin <corvin@corvin.dev>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
parent fc9a5ef1a8
commit 07664c9ddb
@@ -1029,6 +1029,10 @@ Connection objects
                  f.write('%s\n' % line)
          con.close()
 
+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
 
    .. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)
 
@@ -1095,6 +1099,10 @@ Connection objects
 
       .. versionadded:: 3.7
 
+      .. seealso::
+
+         :ref:`sqlite3-howto-encoding`
+
    .. method:: getlimit(category, /)
 
       Get a connection runtime limit.
@@ -1253,39 +1261,8 @@ Connection objects
       and returns a text representation of it.
       The callable is invoked for SQLite values with the ``TEXT`` data type.
       By default, this attribute is set to :class:`str`.
-      If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.
-
-      Example:
-
-      .. testcode::
-
-         con = sqlite3.connect(":memory:")
-         cur = con.cursor()
-
-         AUSTRIA = "Österreich"
-
-         # by default, rows are returned as str
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert row[0] == AUSTRIA
-
-         # but we can make sqlite3 always return bytestrings ...
-         con.text_factory = bytes
-         cur.execute("SELECT ?", (AUSTRIA,))
-         row = cur.fetchone()
-         assert type(row[0]) is bytes
-         # the bytestrings will be encoded in UTF-8, unless you stored garbage in the
-         # database ...
-         assert row[0] == AUSTRIA.encode("utf-8")
-
-         # we can also implement a custom text_factory ...
-         # here we implement one that appends "foo" to all strings
-         con.text_factory = lambda x: x.decode("utf-8") + "foo"
-         cur.execute("SELECT ?", ("bar",))
-         row = cur.fetchone()
-         assert row[0] == "barfoo"
-
-         con.close()
+
+      See :ref:`sqlite3-howto-encoding` for more details.
 
    .. attribute:: total_changes
 
@@ -1423,7 +1400,6 @@ Cursor objects
              COMMIT;
          """)
 
-
    .. method:: fetchone()
 
       If :attr:`~Cursor.row_factory` is ``None``,
@@ -2369,6 +2345,47 @@ With some adjustments, the above recipe can be adapted to use a
 instead of a :class:`~collections.namedtuple`.
 
 
+.. _sqlite3-howto-encoding:
+
+How to handle non-UTF-8 text encodings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
+with the ``TEXT`` data type.
+This works well for UTF-8 encoded text, but it might fail for other encodings
+and invalid UTF-8.
+You can use a custom :attr:`~Connection.text_factory` to handle such cases.
+
+Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
+columns with the ``TEXT`` data type containing non-UTF-8 encodings,
+or even arbitrary data.
+To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
+encoded text, for example a table of Czech-English dictionary entries.
+Assuming we now have a :class:`Connection` instance :py:data:`!con`
+connected to this database,
+we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, encoding="latin2")
+
+For invalid UTF-8 or arbitrary data stored in ``TEXT`` table columns,
+you can use the following technique, borrowed from the :ref:`unicode-howto`:
+
+.. testcode::
+
+   con.text_factory = lambda data: str(data, errors="surrogateescape")
+
+.. note::
+
+   The :mod:`!sqlite3` module API does not support strings
+   containing surrogates.
+
+.. seealso::
+
+   :ref:`unicode-howto`
+
+
 .. _sqlite3-explanation:
 
 Explanation
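The two ``text_factory`` recipes added by this commit can be tried end to end with a short script. The following is only a sketch and is not part of the commit. The in-memory database, the sample byte strings, and the helper ``fetch_text`` are assumptions made for illustration. It relies on SQLite's ``CAST(... AS TEXT)`` to produce a ``TEXT`` value from raw bytes without re-encoding them, which imitates a database written by an application that did not use UTF-8.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical sample data: a Czech word encoded as Latin-2, and a byte
# string that is not valid UTF-8.
czech_word = "žluťoučký"
latin2_bytes = czech_word.encode("latin2")
arbitrary_bytes = b"caf\xe9 \xff"


def fetch_text(raw):
    """Read raw bytes back as a TEXT value through the current text_factory."""
    # CAST(... AS TEXT) relabels the blob as TEXT and keeps its bytes as they are.
    return con.execute("SELECT CAST(? AS TEXT)", (raw,)).fetchone()[0]


# 1. Default text_factory (str): the Latin-2 bytes are not valid UTF-8.
try:
    fetch_text(latin2_bytes)
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# 2. Latin-2 recipe from the guide.
con.text_factory = lambda data: str(data, encoding="latin2")
decoded = fetch_text(latin2_bytes)

# 3. surrogateescape recipe from the guide: undecodable bytes become lone
#    surrogates, and encoding with the same error handler restores them.
con.text_factory = lambda data: str(data, errors="surrogateescape")
escaped = fetch_text(arbitrary_bytes)
round_tripped = escaped.encode("utf-8", errors="surrogateescape")

# 4. As the note says, strings containing surrogates cannot be passed back
#    to sqlite3 as parameters.
try:
    con.execute("SELECT ?", (escaped,))
    surrogates_rejected = False
except UnicodeEncodeError:
    surrogates_rejected = True

con.close()
```

In this sketch the Latin-2 factory recovers the original word, while the ``surrogateescape`` factory yields a string that is mainly useful for recovering the original bytes, since it cannot be bound as a query parameter again.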