Add MMLSPARK_PYSPARK_CORES to specify CPU core count for PySpark #577
ghost wants to merge 2 commits into microsoft:master
…ailable for PySpark
Looks good, @zulli73. If you add a line in the docs, I'll merge!
Changed MMLSPARK_PYSPARK_CORES to MMLSPARK_PYSPARK_THREADS to better reflect its purpose. Added documentation.
I've added documentation. I added a whole new section covering all environment variables because I felt it didn't fit into any of the existing parts of the documentation. Moreover, I thought about adding it to the example
drdarshan left a comment:
Thank you @zulli73! This looks good to me.
Hello @zulli73, if you don't mind, please resolve the conflict and I'll trigger the merge. Thank you for your contribution!
Hi @drdarshan.
Short story: Has this pull request become obsolete? When I search for "local[", all results use "local[*]", which suggests that the latest version on master may already use all CPU cores.
Long story: I'd happily fix the merge conflicts, but I have trouble understanding the change that caused this conflict (d34f9d1): the file I modified was removed, and it's not obvious to me why it became obsolete. It seems that no new Docker image has been pushed since that change, so I can't easily check whether Spark utilizes all available CPU cores as of that commit.
Finally: I couldn't find the docs for building the Docker image myself/locally.
This is just a POC to get early feedback.
I run mmlspark locally on my notebook and found that only 2 of my 6 CPU cores were used when calculating Pi with PySpark, with code as below. I couldn't find an easy out-of-the-box mechanism to tweak this behavior. Therefore, I thought it'd be nice to make this configurable through environment variables so that users can set it during container creation. Thus, this pull request.
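To make the idea concrete, here is a minimal sketch of how an environment variable could be turned into a Spark local-mode master URL. It is an illustration only, not the code in this pull request: the helper name `spark_master_from_env` is hypothetical, the variable name follows the rename to MMLSPARK_PYSPARK_THREADS mentioned in this thread, and falling back to `local[*]` (all cores) when the variable is unset is an assumption.

```python
import os


def spark_master_from_env(var_name="MMLSPARK_PYSPARK_THREADS", environ=None):
    """Build a Spark local-mode master URL such as 'local[4]' or 'local[*]'.

    Hypothetical helper for illustration; not part of mmlspark.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var_name, "").strip()

    # Assumption: an unset or empty variable, or "*", means "use all cores".
    if not raw or raw == "*":
        return "local[*]"

    threads = int(raw)  # raises ValueError for non-numeric input
    if threads < 1:
        raise ValueError(f"{var_name} must be a positive integer, got {raw!r}")
    return f"local[{threads}]"


if __name__ == "__main__":
    # The result could then be passed to PySpark, for example:
    #   SparkSession.builder.master(spark_master_from_env()).getOrCreate()
    print(spark_master_from_env())
```

With a helper like this, the thread count could be chosen when the container is created by passing the variable with `docker run -e MMLSPARK_PYSPARK_THREADS=6 ...` (image name and other arguments omitted).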
Please let me know whether you like this feature. If you do, I'll extend the documentation accordingly.
For reference, here's the code I run