gh-107369: optimize textwrap.indent() #107374
eendebakpt
left a comment
Looks good! Using str.split for the predicate instead of line.strip might change something for input that is not str, but I think this is ok.
serhiy-storchaka
left a comment
lstrip is faster for non-indented lines.
I wonder whether the following variants can be faster for some input, and for how wide a category of input.

    def predicate(line):
        return line and (not line[0].isspace() or line.lstrip())

or

    predicate = re.compile(r'\S').search

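The candidate predicates discussed in this thread only matter for their truthiness, so it can help to check that they agree before timing them. The following is a minimal sketch of such a check; the helper names (`lstrip_pred`, `guarded_pred`, `isspace_pred`, `agree`) and the sample lines are assumptions made for illustration and do not come from the PR.

```python
import re

# Candidate predicates: each returns a truthy value when the line
# contains at least one non-whitespace character.
def lstrip_pred(line):
    return line.lstrip()

def guarded_pred(line):
    # Skip the lstrip() call when the first character is already non-space.
    return line and (not line[0].isspace() or line.lstrip())

regex_pred = re.compile(r'\S').search

def isspace_pred(line):
    # Differs from the others only for the empty string, which
    # str.splitlines(True) never produces.
    return not line.isspace()

PREDICATES = (lstrip_pred, guarded_pred, regex_pred, isspace_pred)

def agree(line):
    """Return True when every predicate has the same truthiness for `line`."""
    return len({bool(pred(line)) for pred in PREDICATES}) == 1

# Hypothetical sample lines; the empty string is deliberately excluded.
samples = ["foo\n", "    foo\n", "\n", "   \n", "\t\tbar", "x"]
assert all(agree(line) for line in samples)
```

Once the predicates are known to agree on realistic input, the choice between them comes down to speed on the expected mix of indented and non-indented lines.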
Since we use In case of unicodeobject.c, rstrip is a bit faster. But that may be because most lines are already indented. So I chose str.lstrip here, as Serhiy suggested.
Now that you mention it, I can see that using We want to test whether the line has any non-space character. Algorithmically,
Because
lstrip() is slow when every line has a long indent. But With 4c6a46a and https://gist.github.com/methane/5c6153c564d9508199a81c48d33161eb If I add
To maximize performance, we can stop using lambda by...:
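One way to read this suggestion is to branch once on whether a predicate was supplied, so that the default path never makes a Python-level function call per line. The sketch below illustrates that idea; `indent_sketch` is a hypothetical name, and the code is not claimed to be the committed implementation.

```python
import textwrap

def indent_sketch(text, prefix, predicate=None):
    """Illustrative re-implementation; not the standard library source."""
    prefixed_lines = []
    if predicate is None:
        # Default path: no lambda and no per-line predicate call.
        # splitlines(True) never yields an empty string, so
        # `not line.isspace()` is enough to detect a non-blank line.
        for line in text.splitlines(True):
            if not line.isspace():
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)
    else:
        # Caller-supplied predicate: call it once per line.
        for line in text.splitlines(True):
            if predicate(line):
                prefixed_lines.append(prefix)
            prefixed_lines.append(line)
    return "".join(prefixed_lines)

sample = "def f():\n    return 1\n\n   \nlast line"
assert indent_sketch(sample, "> ") == textwrap.indent(sample, "> ")
```

The cost of this layout is a duplicated loop body, which is the usual trade-off for removing an indirect call from a hot loop.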
serhiy-storchaka
left a comment
Thank you for your research, Inada-san. Which to use here, lstrip or isspace, I leave up to you. It does not really matter in most cases.
For very long texts, I think changing

    prefixed_lines = []
    for line in text.splitlines(True):
        if not line.isspace():
            prefixed_lines.append(prefix)
        prefixed_lines.append(line)

into the following may improve the overall performance:

    prefixed_lines = []
    append_line = prefixed_lines.append
    for line in text.splitlines(True):
        if not line.isspace():
            append_line(prefix)
        append_line(line)

EDIT: After more careful benchmarking, this does not seem to bring further improvement. However, not using a lambda function seems to be better.
indent()-ing Objects/unicodeobject.c (15332 lines) is about 25% faster.
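A comparison of this kind can be approximated with `timeit`. The harness below is a rough sketch under stated assumptions: `indent_lambda` is loosely modeled on a predicate-plus-generator approach, `indent_inline` on the loop quoted in this thread, and the synthetic input stands in for a real source file. The timings it prints depend on the machine and Python version and are not expected to reproduce the 25% figure.

```python
import timeit

def indent_lambda(text, prefix):
    # Assumed baseline: a per-line predicate call inside a generator.
    predicate = lambda line: line.strip()
    return "".join(
        prefix + line if predicate(line) else line
        for line in text.splitlines(True)
    )

def indent_inline(text, prefix):
    # Inline check with no per-line Python function call.
    out = []
    for line in text.splitlines(True):
        if not line.isspace():
            out.append(prefix)
        out.append(line)
    return "".join(out)

# Synthetic input (an assumption): indented lines with occasional blanks.
text = ("    x = 1\n" * 3 + "\n") * 500

# Both variants must produce identical output before timing means anything.
assert indent_lambda(text, "  ") == indent_inline(text, "  ")

t_lambda = timeit.timeit(lambda: indent_lambda(text, "  "), number=20)
t_inline = timeit.timeit(lambda: indent_inline(text, "  "), number=20)
print(f"lambda: {t_lambda:.4f}s  inline: {t_inline:.4f}s")
```

Running the same harness against a real file, for example by reading it with `open(...).read()`, gives numbers closer to the case reported here.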