Decode error AND Add catalan language to LANGUAGE_MAPPING #5

Merged
Alir3z4 merged 3 commits into Alir3z4:master from dmiro:master
Jan 30, 2015

@dmiro dmiro commented Jan 27, 2015

1. Add Catalan language to LANGUAGE_MAPPING.
I previously added the Catalan stop-words file to the "stop-words" project.

2. Decode error

stop_words = [line.strip().decode('utf-8')
             for line in language_file.readlines()]

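strip() returns a copy of the string with leading and trailing whitespace characters removed.
But when it is called on the raw line before decoding and the line contains non-ASCII characters, the decode() that follows can fail with a UnicodeDecodeError
(e.g. UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 34: unexpected end of data).
The "unexpected end of data" part suggests that strip() removed the last byte of a multi-byte UTF-8 character, presumably because that byte was treated as whitespace.

The workaround is to reorder the calls so that the line is decoded first and stripped afterwards: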

stop_words = [line.decode('utf-8').strip()
             for line in language_file.readlines()]
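
For illustration, here is a minimal Python 3 sketch of the decode-then-strip order. The helper name `load_stop_words` and the sample words are hypothetical and not taken from the library. The explicit `strip(b" \t\r\n\xa0")` call only simulates a byte-level strip that treats 0xa0 as whitespace, which is an assumption about how the truncation can arise.

```python
import io


def load_stop_words(language_file):
    """Decode each raw line as UTF-8 first, then strip whitespace."""
    return [line.decode("utf-8").strip() for line in language_file]


# Hypothetical sample data: "allà" ends in the bytes 0xc3 0xa0.
sample = io.BytesIO("però\nallà\n".encode("utf-8"))
stop_words = load_stop_words(sample)

# Simulated failure: stripping bytes first, with 0xa0 counted as
# whitespace (an assumption), cuts the final character in half.
raw = "allà\n".encode("utf-8")
truncated = raw.strip(b" \t\r\n\xa0")

try:
    truncated.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    # The lone lead byte 0xc3 is left without its continuation byte.
    decode_failed = True
```

Decoding first means strip() operates on whole characters rather than on individual bytes, so a multi-byte character cannot be split.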

dmiro added 3 commits January 27, 2015 15:15
Alir3z4 added a commit that referenced this pull request Jan 30, 2015
Decode error AND Add catalan language to LANGUAGE_MAPPING

Thanks @dmiro
@Alir3z4 Alir3z4 merged commit 8323629 into Alir3z4:master Jan 30, 2015

Alir3z4 commented Jan 30, 2015

@dmiro good job.
I've merged both patches, one in each repo.

I'll update the submodule in this repo and release soon.

Thanks ;)
