
modified get_stop_words(), preventing being changed from outside.#31

Merged
Alir3z4 merged 2 commits into Alir3z4:master from yyanhan:master
Oct 27, 2025


@yyanhan yyanhan commented Mar 12, 2024

Dear Alir3z4,

I used this repo for the work at my previous company, and I found one issue with the function get_stop_words():

If we store the returned list in a variable and then modify that list, for example:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')

then the list returned by later calls to get_stop_words() is changed as well:

'harrypotter' in get_stop_words('en')   # True

This leads to wrong results when get_stop_words('en') is called repeatedly, for example:

en_stop_words_again = get_stop_words('en')
'harrypotter' in en_stop_words_again    # True

Users can of course work around this with copy.deepcopy(get_stop_words('en')), but they may not realize that this is necessary.
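
For a flat list of strings, a shallow copy protects the original just as well as `copy.deepcopy`, because strings are immutable. A small illustration using a made-up word list (not the real stop-word data):

```python
import copy

words = ["a", "an", "the"]           # made-up sample list

shallow = words[:]                   # new list holding the same string objects
deep = copy.deepcopy(words)          # new list as well, built with more work

shallow.append("harrypotter")
deep.append("hermione")

print(words)                         # ['a', 'an', 'the']
print(shallow is words, deep is words)   # False False
```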

Thus I added a copy inside get_stop_words(), namely:

replacing:
    return stop_words

with:
    return stop_words[:]

As a result, the returned list can be modified without affecting later calls:

en_stop_words = get_stop_words('en')
en_stop_words.append('harrypotter')
en_stop_words_again = get_stop_words('en')

'harrypotter' in en_stop_words              # True
'harrypotter' in get_stop_words('en')     # False
'harrypotter' in en_stop_words_again    # False

And I have tested the performance before and after, see:

I hope this PR can make it better!

Best
Han

Co-authored-by: David Miró <davidmirom@hotmail.com>
@Alir3z4 Alir3z4 merged commit 2b71bd5 into Alir3z4:master Oct 27, 2025