Commit Graph

294 Commits

Author SHA1 Message Date
Pablo Galindo Salgado ad47c7d926
[3.10] gh-99581: Fix a buffer overflow in the tokenizer when copying lines that fill the available buffer (GH-99605). (#99630) 2022-11-20 22:30:15 +00:00
Miss Islington (bot) d357f71f46
gh-96678: Fix UB of null pointer arithmetic (GH-96782)
Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 81e36f350b)

Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
2022-09-13 08:03:51 -07:00
Miss Islington (bot) b6af933716
gh-96611: Fix error message for invalid UTF-8 in mid-multiline string (GH-96623)
(cherry picked from commit 05692c67c5)

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
2022-09-06 16:36:03 -07:00
Pablo Galindo Salgado 697e78ca05
[3.10] gh-94360: Fix a tokenizer crash when reading encoded files with syntax errors from stdin (GH-94386) (GH-94574)
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>

(cherry picked from commit 36fcde61ba)
2022-07-05 20:14:28 +02:00
Miss Islington (bot) f20ac2ed07
bpo-46820: Fix a SyntaxError in a numeric literal followed by "not in" (GH-31479) (GH-31493)
Fix parsing a numeric literal immediately (without spaces) followed by
"not in" keywords, like in "1not in x". Now the parser only emits
a warning, not a syntax error.
(cherry picked from commit 090e5c4b94)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2022-02-22 12:00:50 +02:00
Pablo Galindo Salgado 5b58db7529
[3.10] bpo-46521: Fix codeop to use a new partial-input mode of the parser (GH-31010). (GH-31213)
(cherry picked from commit 69e10976b2)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-02-08 12:25:15 +00:00
Miss Islington (bot) 91e8889044
bpo-14916: use specified tokenizer fd for file input (GH-31006)
@pablogsal, sorry i failed to rebase to main, so i recreated https://github.com/python/cpython/pull/22190GH-issuecomment-1024633392

> PyRun_InteractiveOne\*() functions allow to explicitily set fd instead of stdin.
but stdin was hardcoded in readline call.

> This patch does not fix target file for prompt unlike original bpo one : prompt fd is unrelated to tokenizer source which could be read only. It is more of a bugfix regarding the docs :  actual documentation say "prompt the user" so one would expect prompt to go on stdout not a file for both PyRun_InteractiveOne\*() and PyRun_InteractiveLoop\*().

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 89b13042fc)

Co-authored-by: Paul m. p. P <mail.peny@free.fr>
2022-02-03 15:32:22 -08:00
Pablo Galindo Salgado 3fc8b74ace
[3.10] bpo-46091: Correctly calculate indentation levels for whitespace lines with continuation characters (GH-30130). (GH-30898)
(cherry picked from commit a0efc0c196)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2022-01-25 22:33:57 +00:00
Miss Islington (bot) 94483f1e3c
bpo-46054: Fix parsing error when parsing non-utf8 characters in source files (GH-30068) (GH-30069)
(cherry picked from commit 4325a766f5)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-12-12 16:52:49 +00:00
Pablo Galindo Salgado 07cf66fd03
[3.10] Ensure the str member of the tokenizer is always initialised (GH-29681). (GH-29683)
(cherry picked from commit 4f006a789a)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-11-21 04:15:22 +00:00
Miss Islington (bot) bf26a6da7a
bpo-45738: Fix computation of error location for invalid continuation (GH-29550)
characters in the parser
(cherry picked from commit 25835c518a)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-11-13 17:30:03 -08:00
Miss Islington (bot) d8ca47c943
bpo-45562: Ensure all tokenizer debug messages are printed to stderr (GH-29270)
(cherry picked from commit cdc7a58277)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-10-29 10:21:15 -07:00
Miss Islington (bot) 038f452308
bpo-45562: Print tokenizer debug messages to stderr (GH-29250) (GH-29252)
(cherry picked from commit 10bbd41ba8)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-10-27 22:45:43 +01:00
Miss Islington (bot) cadf06eab7
bpo-45574: fix warning about `print_escape` being unused (GH-29172) (#29176)
It used to be like this:
<img width="1232" alt="Снимок экрана 2021-10-22 в 23 07 40" src="https://user-images.githubusercontent.com/4660275/138516608-fef6ec01-a96a-40f4-81ef-52265b0f536b.png">

Quick `grep` tells that it is just used in one place under `Py_DEBUG`: f6e8b80d20/Parser/tokenizer.cGH-L1047-L1051
<img width="752" alt="Снимок экрана 2021-10-22 в 23 08 09" src="https://user-images.githubusercontent.com/4660275/138516684-ea503136-1e92-48a5-95bb-419e190d5866.png">

I am not sure, but it also looks like a private thing, it should not affect other users.

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 4bc5473a42)

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
2021-10-23 14:35:48 +01:00
Miss Islington (bot) ae78ffdc93
bpo-45562: Only show debug output from the parser in debug builds (GH-29140) (#29149)
(cherry picked from commit 86dfb55d2e)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-10-22 11:14:47 +01:00
Miss Islington (bot) f7f1c26423
Update URLs in comments and metadata to use HTTPS (GH-27458) (GH-27478)
(cherry picked from commit be42c06bb0)

Co-authored-by: Noah Kantrowitz <noah@coderanger.net>
2021-07-30 16:25:28 +02:00
Miss Islington (bot) 2a722d4fab
bpo-44317: Improve tokenizer errors with more informative locations (GH-26555) (GH-27079)
(cherry picked from commit f24777c2b3)

Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
2021-07-10 01:47:33 +01:00
Miss Islington (bot) d03f342a83
bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) (GH-26695)
Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit a342cc5891)
2021-06-12 21:27:02 +01:00
Miss Islington (bot) eeefa7f6c0
bpo-43833: Emit warnings for numeric literals followed by keyword (GH-25466)
Emit a deprecation warning if the numeric literal is immediately followed by
one of keywords: and, else, for, if, in, is, or. Raise a syntax error with
more informative message if it is immediately followed by other keyword or
identifier.

Automerge-Triggered-By: GH:pablogsal
(cherry picked from commit 2ea6d89028)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2021-06-08 16:52:23 -07:00
Miss Islington (bot) 1fb6b9e91d
bpo-44201: Avoid side effects of "invalid_*" rules in the REPL (GH-26298) (GH-26313)
When the parser does a second pass to check for errors, these rules can
have some small side-effects as they may advance the parser more than
the point reached in the first pass. This can cause the tokenizer to ask
for extra tokens in interactive mode causing the tokenizer to show the
prompt instead of failing instantly.

To avoid this, add a new mode to the tokenizer that is activated in the
second pass and deactivates asking for new tokens when the interactive
line is finished. As the parsing should have reached the last line in
the first pass, the second pass should not need to ask for more tokens.

(cherry picked from commit bd7476dae3)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-05-22 23:23:26 +01:00
Pablo Galindo 92a02c1f7e
Fix tokenizer error when raw decoding null bytes (GH-25080) 2021-03-30 00:24:49 +01:00
Pablo Galindo 261a452a13
bpo-25643: Refactor the C tokenizer into smaller, logical units (GH-25050) 2021-03-28 23:48:05 +01:00
Pablo Galindo cd8dcbc851
bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763) 2021-03-14 04:38:40 +01:00
Batuhan Taskaya a698d52c39
bpo-40176: Improve error messages for unclosed string literals (GH-19346)
Automerge-Triggered-By: GH:isidentical
2021-01-20 13:38:47 -08:00
Pablo Galindo ae7d3cd980
bpo-42864: Fix compiler warning in the tokenizer with the new paren stack for column numbers (GH-24266) 2021-01-20 12:53:52 +00:00
Pablo Galindo d6d6371447
bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) 2021-01-19 23:59:33 +00:00
Lysandros Nikolaou e5fe509054
bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140)
When trying to extract the error line for the error message there
are two distinct cases:

1. The input comes from a file, which means that we can extract the
   error line by using `PyErr_ProgramTextObject` and which we already
   do.
2. The input does not come from a file, at which point we need to get
   the source code from the tokenizer:
   * If the tokenizer's current line number is the same with the line
     of the error, we get the line from `tok->buf` and we're ready.
   * Else, we can extract the error line from the source code in the
     following two ways:
     * If the input comes from a string we have all the input
       in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we
       do not have access to the previous line. That's why a new
       field `tok->stdin_content` is added which holds the whole input for the
       current (multiline) statement or expression. We can then extract the
       error line from `tok->stdin_content` like we do in the string case above.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-01-14 21:36:30 +00:00
Victor Stinner 00d7abd7ef
bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)
No longer use deprecated aliases to functions:

* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()

Modify also the PyMem_DEL() macro to use directly PyMem_Free().
2020-12-01 09:56:42 +01:00
Victor Stinner e822e37946
bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889)
On Windows, #include "pyerrors.h" no longer defines "snprintf" and
"vsnprintf" macros.

PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable
behavior.

Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf()
calls with PyOS_vsnprintf().
2020-06-15 21:59:47 +02:00
Lysandros Nikolaou 896f4cf63f
bpo-40847: Consider a line with only a LINECONT a blank line (GH-20769)
A line with only a line continuation character should be considered
a blank line at tokenizer level so that only a single NEWLINE token
gets emitted. The old parser was working around the issue, but the
new parser threw a `SyntaxError` for valid input. For example,
an empty line following a line continuation character was interpreted
as a `SyntaxError`.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-11 00:56:08 +01:00
Ammar Askar a2bbedc8b1
Fix peg_generator compiler warnings under MSVC (GH-20405) 2020-05-26 05:33:35 +01:00
Serhiy Storchaka 74ea6b5a75
bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) 2020-05-12 12:42:04 +03:00
Lysandros Nikolaou 846d8b28ab
bpo-40246: Revert reporting of invalid string prefixes (GH-19888)
Due to backwards compatibility concerns regarding keywords immediately followed by a string without whitespace between them (like in `bg="#d00" if clear else"#fca"`) will fail to parse,
commit 41d5b94af4 has to be reverted.
2020-05-04 12:32:18 +01:00
Pablo Galindo 11a7f158ef
bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619)
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
2020-04-21 01:53:04 +01:00
Lysandros Nikolaou 41d5b94af4
bpo-40246: Report a better error message for invalid string prefixes (GH-19476) 2020-04-12 19:21:00 +01:00
Victor Stinner 87d3b9db4a
bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157) 2020-03-25 19:27:36 +01:00
Victor Stinner 9e5d30cc99
bpo-39882: Py_FatalError() logs the function name (GH-18819)
The Py_FatalError() function is replaced with a macro which logs
automatically the name of the current function, unless the
Py_LIMITED_API macro is defined.

Changes:

* Add _Py_FatalErrorFunc() function.
* Remove the function name from the message of Py_FatalError() calls
  which included the function name.
* Update tests.
2020-03-07 00:54:20 +01:00
Andy Lester 384f3c536d
closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600)
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each creates a
        new char * for an allocate string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
2020-02-27 18:44:52 -08:00
Serhiy Storchaka 0cc6b5e559
bpo-39219: Fix SyntaxError attributes in the tokenizer. (GH-17828)
* Always set the text attribute.
* Correct the offset attribute for non-ascii sources.
2020-02-12 12:17:00 +02:00
Victor Stinner f3e7ea5b8c
bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the
string is not ready.
2020-02-11 14:29:33 +01:00
Pablo Galindo 5ec91f78d5
bpo-39209: Manage correctly multi-line tokens in interactive mode (GH-17860) 2020-01-06 15:59:09 +00:00
Batuhan Taşkaya 109fc2792a bpo-38673: dont switch to ps2 if the line starts with comment or whitespace (GH-17421)
https://bugs.python.org/issue38673
2019-12-08 20:36:27 -08:00
Hansraj Das 69f37bcb28 Indent code inside if block. (GH-15284)
Without indendation, seems like strcpy line is parallel to `if` condition.
2019-08-15 09:19:07 -07:00
Anthony Sottile 5b94f3578c Fix `SyntaxError` indicator printing too many spaces for multi-line strings (GH-14433) 2019-07-29 14:59:13 +01:00
Michael J. Sullivan d8a82e2897 bpo-36878: Only allow text after `# type: ignore` if first character ASCII (GH-13504)
This disallows things like `# type: ignoreé`, which seems wrong.

Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).


https://bugs.python.org/issue36878
2019-05-22 13:43:36 -07:00
Michael J. Sullivan 933e1509ec bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479)
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
2019-05-22 15:54:20 +01:00
Anthony Sottile abea73bf4a bpo-2180: Treat line continuation at EOF as a `SyntaxError` (GH-13401)
This makes the parser consistent with the tokenize module (already the case
in `pypy`).

sample
------

```python
x = 5\
```

before
------

```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```

after
-----

```console
$ ./python t.py
  File "t.py", line 3
    x = 5\

         ^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```



https://bugs.python.org/issue2180
2019-05-18 11:27:16 -07:00
Michael J. Sullivan d8320ecb86 bpo-36878: Allow extra text after `# type: ignore` comments (GH-13238)
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
2019-05-11 19:17:24 +01:00
Pablo Galindo f2cf1e3e28
bpo-36623: Clean parser headers and include files (GH-12253)
After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.
2019-04-13 17:05:14 +01:00
Zackery Spytz cda139d1de bpo-36459: Fix a possible double PyMem_FREE() due to tokenizer.c's tok_nextc() (12601)
Remove the PyMem_FREE() call added in cb90c89.  The buffer will be
freed when PyTokenizer_Free() is called on the tokenizer state.
2019-03-28 15:53:00 +02:00