gh-116738: Make _suggestions module thread-safe#140321
Conversation
| Py_ssize_t size = PyList_Size(candidates); | ||
| for (Py_ssize_t i = 0; i < size; ++i) { | ||
| PyObject *elem = PyList_GetItem(candidates, i); | ||
| PyObject *elem = PyList_GET_ITEM(candidates, i); |
There was a problem hiding this comment.
How about using PyList_GetItemRef with this change.
There was a problem hiding this comment.
@corona10, Thank you so much for your comment!
Just to make sure I understand your suggestion correctly, did you mean:
- To use
PyList_GetItemRef()instead ofPyList_GET_ITEM()within the currentcritical_section? - Or to remove the
critical_sectionand usePyList_GetItemRef()in both loops I mentioned in the description?
Regarding 1: I replaced PyList_GetItem() with PyList_GET_ITEM() because, in the with-GIL build, it’s already assumed that PyList_GetItem() cannot fail, so the returned value isn't checked. Within the critical_section, the assumption is that the candidates list cannot be mutated, and PyList_GET_ITEM() avoids extra atomic operations.
Regarding 2: I think this could be implemented with PyList_GetItemRef() and no critical_section, as long as errors returned from PyList_GetItemRef() are properly handled. However, in my opinion, using a critical_section makes the code a bit easier to read and reason about, since the candidates list cannot be mutated in this context. This means we don’t need to consider update cases between blocks or iterations.
What do you think, would it be better to use PyList_GetItemRef() here?
There was a problem hiding this comment.
@yoney
Hmm, even if you use PyList_GET_ITEM, it won’t check whether the returned value is NULL, so it won’t actually fix the UB you mentioned. What do you think?
cpython/Include/cpython/listobject.h
Line 40 in f11ec6e
There was a problem hiding this comment.
@corona10
Sorry, my previous comment wasn't clear before. I was referring to cases where PyList_GetItem() could fail due to PyList_Check() or valid_index(), particularly in the free-threading build. I was trying to explain why we need a critical_section.
Lines 384 to 397 in f11ec6e
These properties are checked in _suggestions__generate_suggestions_impl() earlier, before calling PyList_GetItem(), and the GIL or the critical_section "protects" them. That's why I replaced PyList_GetItem() with PyList_GET_ITEM(). However, the actual element of the list could still be NULL. I had assumed this wasn’t possible, but should we check if the element is NULL?
There was a problem hiding this comment.
Now I get it, looks good to me. Thank you for explain
|
Added a critical section to protect the
candidateslist in the free-threading build, addressing two possible issues below:PyList_GetItem()could fail and returnNULL. Passing it toPyUnicode_Check()results in UB in the loop below.cpython/Modules/_suggestions.c
Lines 31 to 33 in bedaea0
_Py_CalculateSuggestions().cpython/Python/suggestions.c
Lines 150 to 153 in bedaea0
cc: @mpage @colesbury
Note: I considered placing the critical section inside
_Py_CalculateSuggestions(). However, due to the loop in_suggestions__generate_suggestions_impl(), the critical section needs to be earlier. I also checked other places where_Py_CalculateSuggestions()is called, and all of them use local list (candidates) variables in C code, so they are safe.