gh-86519: Add prefixmatch APIs to the re module #31137
gpshead wants to merge 19 commits into python:main
These alleviate common confusion around what "match" means, as Python differs from other popular languages in its use of the term as an API name. The original "match" names are NOT being deprecated. Source tooling such as linters is expected to suggest using prefixmatch instead of match to improve code health and reduce the cognitive burden of understanding the intent when reading code. See the documentation changes within this PR for a better description.
554bb41 to 54c77ca
`match = prefixmatch`
54c77ca to 149f6e4
## Summary

The example for [FURB167](https://docs.astral.sh/ruff/rules/regex-flag-alias/#regex-flag-alias-furb167) has `re.match` with an `^` anchor to match the start of the string:

```python
if re.match("^hello", "hello world", re.I):
```

But `re.match` already implicitly matches the start of the string: https://docs.python.org/3/library/re.html#search-vs-match

Let's change the example to `re.search` so the anchor isn't redundant. (The anchor's actually irrelevant to the example for this rule about long or short flag names.)

(Aside: There's a discussion about adding `re.prefixmatch` and [soft] deprecating `re.match` because of the confusion around it: https://discuss.python.org/t/add-re-prefixmatch-deprecate-re-match/105927, python/cpython#86519, python/cpython#31137.)

## Test Plan

1. Create feature branch
2. Push to my fork to run CI
3. Realise feature branches are disabled for forks in Ruff CI
4. Merge feature branch to my `main`
5. Push that
6. Be happy I did, because it failed because I missed something
7. Fixup, pushup
8. Passes [🎉](https://github.com/hugovk/ruff/actions/runs/21524112749)
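The redundant anchor described in this Ruff PR can be checked with a minimal sketch that uses only the long-standing `re.match` and `re.search` functions. The sample string and variable names are made up for illustration.

```python
import re

text = "hello world"

# re.match only ever tries the pattern at position 0, so "^" adds nothing.
with_anchor = re.match("^hello", text, re.I)
without_anchor = re.match("hello", text, re.I)
assert with_anchor.span() == without_anchor.span() == (0, 5)

# re.search scans the whole string, so there the anchor does real work.
anchored_search = re.search("^world", text)
plain_search = re.search("world", text)
assert anchored_search is None
assert plain_search.span() == (6, 11)

# re.match fails when the pattern only occurs later in the string.
late_match = re.match("world", text)
assert late_match is None
```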
Resolved conflicts:
- Doc/whatsnew/3.14.rst: Used main's version (3.14 is released)
- Lib/re/__init__.py: Removed __version__ (removed in main), updated docstring to reference 3.15 instead of 3.14

Added prefixmatch What's New entry to Doc/whatsnew/3.15.rst since the feature is now targeting Python 3.15.
- Change "25 years" to "30 years" to reflect actual time
- Replace speculative "this decade, if ever" / "7 years" language with clear statement that we will never remove the original match name

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix traceback to include ^ anchor matching the pair pattern definition
- Add \A anchor to one example as a teaching hint for readers
- Update card game examples to demonstrate search/match/prefixmatch mix
- Add explanatory paragraph about match and prefixmatch being identical
- Rename compiled regex variables to use _re suffix (valid_hand_re, pair_re)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using first_name/last_name patterns promotes the myth that names have simple, universal structures. Replace with:
- "killer rabbit" with adjective/animal groups
- "Norwegian Blue, pining for the fjords" for unlabeled groups
hauntsaninja
left a comment
Great, thank you for making this happen!
Did we actually reach a consensus on DPO for that feature to be merged? I thought there were some issues, especially with the fact that
Yes, I think the 24 hearts on Greg's https://discuss.python.org/t/add-re-prefixmatch-deprecate-re-match/105927/20 is a pretty good sign. The only other posts with double-digit hearts are also in support: the OP (10), a docs suggestion from Tim (14), a name suggestion (
Whatever the third-party |
hugovk
left a comment
Thanks!
Non-blocking docs suggestion:
Use prefixmatch pretty much throughout, except to mention match is the alias (plus version changed etc.), which you've mostly already done here.
We can also update other places like the regex HOWTO to prefer the new name, but this can also be in a followup.
Bike shed colour:
I'm fine with either prefixmatch or startswith.
Pro for prefixmatch: it keeps the "match" part, which might help remind people when they see match: "oh, wait, is this the lopsided prefixmatch or fullmatch? I'd better double check the docs and use one of those descriptive names or instead search with explicit anchors."
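As a rough illustration of the "descriptive names or explicit anchors" choice mentioned above, the sketch below compares `re.match` and `re.fullmatch` with approximately equivalent anchored `re.search` calls. The pattern, sample strings, and variable names are assumptions chosen for the example, and the anchored forms are approximations rather than definitions.

```python
import re

pattern = r"\d+"
text = "123abc"

# re.match anchors only at the start: roughly search with a leading \A.
prefix = re.match(pattern, text)
prefix_via_search = re.search(r"\A(?:" + pattern + ")", text)
assert prefix.group() == prefix_via_search.group() == "123"

# re.fullmatch anchors at both ends: roughly search with \A ... \Z.
full = re.fullmatch(pattern, text)
full_via_search = re.search(r"\A(?:" + pattern + r")\Z", text)
assert full is None and full_via_search is None

# An unanchored search finds the digits anywhere; re.match does not.
anywhere = re.search(pattern, "abc123")
not_at_start = re.match(pattern, "abc123")
assert anywhere.group() == "123"
assert not_at_start is None
```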
Soft deprecation:
We're discouraging in docs and not planning on removing. Sounds like soft deprecation. I'd be fine with soft deprecating too, but that could also be another discussion once this is done (if anyone has energy for that!).
I'm similarly inclined to stick with lots of what look like good comments on the docs here and thoughts on how to document this to digest from the discuss thread. I'll work on getting the doc edits in and see if it looks like it needs more review after that.
Apply suggestions from hugovk and hauntsaninja reviews:
- List prefixmatch before match in function/method signatures
- Use prefixmatch exclusively in examples, remove redundant match duplicates
- Remove interchangeable paragraph (covered by prefixmatch-vs-match section)
- Rename section to "search() vs. prefixmatch()"
- Add .. note:: directive for MULTILINE caveat
- Fix time-sensitive wording ("very recent Python", "never")
- Fix alphabetical ordering in whatsnew/3.15.rst
- Fix comment grammar in Lib/re/__init__.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
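The exact wording of the MULTILINE note lives in the PR's documentation changes and is not reproduced here. The behaviour such a caveat refers to is already documented for `re.match` and can be sketched as follows, with a made-up three-line string:

```python
import re

text = "A\nB\nX"

# Even with re.MULTILINE, re.match only tries position 0 of the string,
# so "^" cannot reach the start of a later line.
match_result = re.match("^X", text, re.MULTILINE)
assert match_result is None

# re.search with "^" in MULTILINE mode matches at the start of any line.
search_result = re.search("^X", text, re.MULTILINE)
assert search_result.span() == (4, 5)
```

Since the PR describes `prefixmatch` as another name for `match`, the same caveat would presumably apply to it.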
Replace the copy_prefixmatch_method_def_to_match() runtime memcpy with static struct initializers that reuse the Clinic-generated prefixmatch parser directly. This avoids duplicating argument parsing boilerplate while keeping everything initialized at compile time. Add Py_DEBUG assertions in sre_exec to verify the match and prefixmatch method table entries remain identical except for ml_name.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "Removed" heading underline was exactly 7 '=' characters, which triggers the check-merge-conflict pre-commit hook as a false positive.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor test_match_getitem to use a helper method instead of a for loop, keeping the assertion lines at their original indentation to preserve git line attribution history.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds `prefixmatch` APIs to the re module as an alternate name for our long-existing `match` APIs to help alleviate a common Python confusion for those coming from other languages' regular expression libraries.

These alleviate common confusion around what "match" means, as Python differs from other popular languages' regex libraries in its use of the term as an API name. The original `match` names are NOT being deprecated. Source tooling like linters, IDEs, and LLMs could suggest using `prefixmatch` instead of `match` to improve code health and reduce the cognitive burden of understanding the intent of code when configured for a modern minimum Python version.

See the documentation changes within this PR for a better description.
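For code that must still run on Python versions without the new name, one possible approach is a small fallback. This is a sketch only: it assumes `re.prefixmatch` and `Pattern.prefixmatch` exist as described in this PR, and the module-level `prefixmatch` name, `word_re`, and the `leading_word` helper are hypothetical names introduced for illustration.

```python
import re

# Hypothetical shim: use re.prefixmatch where available, else fall back to
# re.match, which this PR describes as the same API under its original name.
prefixmatch = getattr(re, "prefixmatch", re.match)

# The same idea for a compiled pattern's method.
word_re = re.compile(r"\w+")
word_prefixmatch = getattr(word_re, "prefixmatch", word_re.match)


def leading_word(text):
    """Return the word at the very start of text, or None if there is none."""
    m = word_prefixmatch(text)
    return m.group() if m else None


assert leading_word("spam eggs") == "spam"
assert leading_word("  spam") is None
assert prefixmatch(r"\d+", "42 is the answer").group() == "42"
```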
Documentation Preview: https://cpython-previews--31137.org.readthedocs.build/en/31137/