gh-107369: optimize textwrap.indent() by methane · Pull Request #107374 · python/cpython

methane · 2023-07-28T05:58:20Z

indent()-ing Objects/unicodeobject.c (15332 lines) is about 25% faster.

Issue: Optimize textwrap.indent() #107369

eendebakpt

Looks good! Using str.split for the predicate instead of line.strip might change something for input that is not str, but I think this is ok.

serhiy-storchaka

lstrip is faster for non-indented lines.

I wonder whether the following variants could be faster for some inputs, and for how wide a category of inputs.

def predicate(line):
    return line and (not line[0].isspace() or line.lstrip())

or

predicate = re.compile(r'\S').search
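
One way to check that these candidates agree on truthiness is to run them over a few sample lines. This is only a sketch: the names first_char_predicate, regex_predicate and samples are made up for illustration, and str.lstrip stands in for the predicate under discussion.

```python
import re

def first_char_predicate(line):
    # Fast path: a non-space first character needs no stripping at all.
    return line and (not line[0].isspace() or line.lstrip())

regex_predicate = re.compile(r'\S').search

# Hypothetical sample lines: plain, indented, whitespace-only, and empty.
samples = ["foo\n", "    foo\n", "   \n", "\n", "\tbar", ""]

for line in samples:
    results = (
        bool(str.lstrip(line)),
        bool(first_char_predicate(line)),
        bool(regex_predicate(line)),
    )
    # All three should give the same truth value for every sample.
    assert len(set(results)) == 1, (line, results)
```

The candidates return different objects (a string, a bool, or a match object), so only their truth values are compared.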

methane · 2023-07-28T16:51:29Z

_has_nonspace = re.compile(r'\S').search as a module-level global, with predicate = _has_nonspace -- 3.5ms
str.rstrip = 1.95ms
str.lstrip = 2.03ms
lambda x: not x.isspace() = 2.07ms

Since we use splitlines(keepends=True), we can use just not x.isspace(). (It is guaranteed that no line is empty: "".splitlines(keepends=True) == [] and "foo\n".splitlines(True) == ['foo\n'].)
But it is a bit tricky and has relatively high cognitive load.
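
The guarantee can be checked directly. The sketch below uses made-up sample texts and only shows that splitlines(keepends=True) keeps a blank line as its line ending rather than as an empty string.

```python
# Hypothetical inputs covering empty text, blank lines, and \r\n endings.
samples = ["", "foo\n", "foo\n\nbar", "\n\n", "a\r\nb\r\n"]

for text in samples:
    lines = text.splitlines(keepends=True)
    # A blank line survives as "\n" (or "\r\n"), never as "".
    assert all(line != "" for line in lines), (text, lines)

blank_line_pieces = "foo\n\nbar".splitlines(keepends=True)
assert blank_line_pieces == ['foo\n', '\n', 'bar']
```

Without keepends, the blank line in the middle would come back as "", which is the case the isspace() check cannot handle on its own.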

In the case of unicodeobject.c, rstrip is a bit faster. But that may be because most lines are already indented.

So I chose str.lstrip here, as Serhiy suggested.

serhiy-storchaka · 2023-07-28T18:52:30Z

Now that you mention it, I can see that using isspace() is the most obvious way to do this. Why did I not see it earlier?

We want to test whether the line has any non-space character. bool(line.strip()) is actually a tricky way: we strip the spaces from the line, and if the remainder is not an empty string, then the original line has non-space characters too. not line.isspace() is a straightforward way: it asks the opposite question (does the line contain only space characters?) and negates the result.

Algorithmically, isspace() looks preferable, because it does not create a string. But in practice it may not matter in common cases. Did you compare the variants with different inputs? For example, Misc/NEWS.d/3.8.0a1.rst may show a very different result.
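
As a small illustration of the two formulations, the sketch below compares them on a handful of made-up, non-empty lines. The variable names are hypothetical.

```python
# Hypothetical non-empty lines, as produced by splitlines(True).
lines = ["foo\n", "    foo\n", "   \n", "\n", "\t \r\n", "x"]

for line in lines:
    via_strip = bool(line.strip())    # computes a stripped string, then tests it
    via_isspace = not line.isspace()  # only scans characters and returns a bool
    assert via_strip == via_isspace, line
```

The two agree for every non-empty line. They differ only for the empty string.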

Lib/textwrap.py

methane · 2023-07-29T02:18:13Z

Now that you mention it, I can see that using isspace() is the most obvious way to do this. Why did I not see it earlier?

Because "".isspace() is False. We need to guarantee that "" is not used here.
x and not x.isspace() would be a bit more obvious, but a little slower.
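
The edge case is easy to reproduce. In the sketch below, unguarded and guarded are illustrative names for the two predicate forms mentioned above.

```python
def unguarded(line):
    return not line.isspace()

def guarded(line):
    # Returns "" (falsy) for an empty line instead of True.
    return line and not line.isspace()

empty_is_space = "".isspace()         # False: there is no character to test
unguarded_on_empty = unguarded("")    # True, so "" would count as a line to prefix
guarded_on_empty = bool(guarded(""))  # False, matching bool("".strip())

assert empty_is_space is False
assert unguarded_on_empty is True
assert guarded_on_empty is False
```

The unguarded form is therefore only safe when the caller can rule out empty strings, as splitlines(keepends=True) does.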

Algorithmically, isspace() looks preferable, because it does not create a string. But in practice it may not matter in common cases. Did you compare the variants with different inputs? For example, Misc/NEWS.d/3.8.0a1.rst may show a very different result.

lstrip() is slow when every line has a long indent. But Misc/NEWS.d/3.8.0a1.rst has almost no indentation.

With 4c6a46a and https://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb

> ./python.exe bench_indent.py Misc/NEWS.d/3.8.0a1.rst
filename='Misc/NEWS.d/3.8.0a1.rst' 8978 lines.
                   lstrip: 0.736msec
          not x.isspace(): 0.877msec
    x and not x.isspace(): 0.929msec

> ./python.exe bench_indent.py Objects/unicodeobject.c
filename='Objects/unicodeobject.c' 15332 lines.
                   lstrip: 1.812msec
          not x.isspace(): 1.877msec
    x and not x.isspace(): 1.970msec

If I add text = textwrap.indent(text, " "*32) before the benchmark:

> ./python.exe bench_indent.py Objects/unicodeobject.c
filename='Objects/unicodeobject.c' 15332 lines.
                   lstrip: 2.259msec
          not x.isspace(): 2.356msec
    x and not x.isspace(): 2.437msec
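
For readers who want to try a comparison like this without the gist, here is a minimal timeit sketch. It is not the bench_indent.py script from the gist: the synthetic input, the indent_with helper, and the repeat counts are all assumptions chosen for illustration, and the absolute numbers will differ from those above.

```python
import timeit

# Synthetic input standing in for a real source file (an assumption):
# mixed indentation, with every fifth line blank.
text = "".join(
    ("    " * (i % 4)) + f"line {i}\n" if i % 5 else "\n"
    for i in range(10_000)
)

def indent_with(predicate, text, prefix="    "):
    # Predicate-driven indent loop; a stand-in, not the stdlib code.
    out = []
    for line in text.splitlines(True):
        if predicate(line):
            out.append(prefix)
        out.append(line)
    return "".join(out)

candidates = {
    "lstrip": str.lstrip,
    "not x.isspace()": lambda x: not x.isspace(),
    "x and not x.isspace()": lambda x: x and not x.isspace(),
}

results = {}
for name, predicate in candidates.items():
    # Take the best of several repeats to reduce noise from other processes.
    timings = timeit.repeat(lambda: indent_with(predicate, text), number=5, repeat=3)
    results[name] = min(timings) / 5
    print(f"{name:>25}: {results[name] * 1000:.3f}msec")
```

Which variant wins depends on the input, as the runs above suggest, so it is worth trying texts with both short and long indents.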

methane · 2023-07-29T02:46:45Z

To maximize performance, we can stop using a lambda by inlining the default check:

    if predicate is None:
        for line in text.splitlines(True):
            if not line.isspace():
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)
    else:
        for line in text.splitlines(True):
            if predicate(line):
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)

filename='Objects/unicodeobject.c' 15332 lines.
                     None: 1.604msec
                   lstrip: 1.826msec
          not x.isspace(): 1.883msec
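
Wrapped into a complete function, the branch above might look like the following. This is a sketch under the assumption that the surrounding function only collects and joins the pieces; indent_sketch is a hypothetical name, and the code is checked against textwrap.indent rather than claimed to be the merged implementation.

```python
import textwrap

def indent_sketch(text, prefix, predicate=None):
    prefixed_lines = []
    if predicate is None:
        # Default case inlined: no per-line function call.
        # splitlines(True) never yields "", so isspace() alone is enough.
        for line in text.splitlines(True):
            if not line.isspace():
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)
    else:
        for line in text.splitlines(True):
            if predicate(line):
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)
    return "".join(prefixed_lines)

# Hypothetical sample with an empty line and a whitespace-only line.
sample = "def f():\n    return 1\n\n   \nprint(f())"
assert indent_sketch(sample, "    ") == textwrap.indent(sample, "    ")

def always(line):
    return True

assert indent_sketch(sample, "# ", always) == textwrap.indent(sample, "# ", always)
```

Duplicating the loop trades a little repetition for avoiding a Python-level call on every line in the default case.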

serhiy-storchaka

Thank you for your research, Inada-san. I leave it up to you whether to use lstrip or isspace here. It does not really matter in most cases.

picnixz · 2023-07-29T12:01:19Z

For very long texts, I think changing

prefixed_lines = []
for line in text.splitlines(True):
    if not line.isspace():
        prefixed_lines.append(prefix)
    prefixed_lines.append(line)

into the following may improve the overall performance

prefixed_lines = []
append_line = prefixed_lines.append
for line in text.splitlines(True):
    if not line.isspace():
        append_line(prefix)
    append_line(line)

EDIT: After more careful benchmarking, this does not seem to bring any further improvement. However, not using a lambda function seems to be better.

methane added 2 commits July 28, 2023 13:36

optimize textwrap.indent()

94ab051

Add NEWS

8c5896c

bedevere-bot mentioned this pull request Jul 28, 2023

Optimize textwrap.indent() #107369

Closed

bedevere-bot added the awaiting core review label Jul 28, 2023

methane added the performance and stdlib labels Jul 28, 2023

Add what's new entry

6ee731c

eendebakpt approved these changes Jul 28, 2023

serhiy-storchaka reviewed Jul 28, 2023

Use lstrip instead of strip

fad98a2

eendebakpt reviewed Jul 28, 2023

avoid temporary tuple.

4c6a46a

methane added 2 commits July 29, 2023 12:34

use str.isspace instead of lstrip

5e60878

add comment about splitlines(True)

16e3dbd

serhiy-storchaka approved these changes Jul 29, 2023

bedevere-bot added awaiting merge and removed awaiting core review labels Jul 29, 2023

25% -> 30%

734fd01

methane enabled auto-merge (squash) July 29, 2023 06:03

methane merged commit 37551c9 into python:main Jul 29, 2023

methane deleted the opt-textwrap-indent branch July 29, 2023 06:37

bedevere-bot removed the awaiting merge label Jul 29, 2023

This was referenced Jul 29, 2023

Optimize textwrap.indent a bit more. #107424

Closed

gh-107424: avoid using lambda functions in textwrap.indent() #107426

Closed

methane merged 8 commits into python:main from methane:opt-textwrap-indent
