
Add MMLSPARK_PYSPARK_CORES to specify CPU core count for PySpark#577

Open
ghost wants to merge 2 commits into microsoft:master from Software-Natives-OSS:master

Conversation

@ghost

@ghost ghost commented Jun 1, 2019

This is just a POC to get early feedback.

I ran mmlspark locally on my notebook and noticed that only 2 of my 6 CPU cores were used when calculating Pi with PySpark, using the code below. I couldn't find an easy out-of-the-box mechanism to tweak this behavior, so I thought it would be nice to make it configurable through env-vars, letting users tweak it during container creation. Hence this pull request.

Please give me feedback on whether you like this feature. If you do, I'll extend the documentation accordingly.

For reference, here's the code I ran:

import random

num_samples = 100000000

def inside(p):
    # Sample a random point in the unit square; keep it if it falls inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

# Monte Carlo estimate: the fraction of points inside the quarter circle approaches pi/4.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4.0 * count / num_samples
print(pi)
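
For context, the number of worker threads in Spark local mode is set by the master URL, which is the knob this PR makes configurable. A minimal sketch of a standalone session (not part of this PR's diff; the app name is illustrative):

from pyspark import SparkConf, SparkContext

# "local[N]" runs Spark with N worker threads; "local[*]" uses one per logical core.
conf = SparkConf().setAppName("pi-estimate").setMaster("local[6]")
sc = SparkContext(conf=conf)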

@ghost ghost changed the title from "Add MMLSPARK_PYSPARK_CORES to CPU core count for PySpark" to "Add MMLSPARK_PYSPARK_CORES to specify CPU core count for PySpark" on Jun 1, 2019
@mhamilton723
Contributor

Looks good @zulli73! If you add a line in the docs, I'll merge!

Changed MMLSPARK_PYSPARK_CORES to MMLSPARK_PYSPARK_THREADS to better reflect its purpose. Added documentation.
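
For illustration, a minimal sketch of how such a variable could be consumed when building the master URL (the actual wiring in this PR may differ; the fallback to "*" is an assumption of this sketch):

import os
from pyspark import SparkConf, SparkContext

# Hypothetical wiring: default to "*" (all logical cores) when the variable is unset.
threads = os.environ.get("MMLSPARK_PYSPARK_THREADS", "*")
conf = SparkConf().setMaster("local[{}]".format(threads))
sc = SparkContext(conf=conf)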
@msftclas

msftclas commented Jun 16, 2019

CLA assistant check
All CLA requirements met.

@ghost
Copy link
Author

ghost commented Jun 16, 2019

I've added documentation.

I added a whole new section covering all environment variables because I felt it didn't fit into any of the existing parts of the documentation. I also thought about adding it to the example docker run command, but I didn't want to make that example more complicated than necessary. Hence the new section.

Contributor

@drdarshan drdarshan left a comment


Thank you @zulli73! This looks good to me.

@drdarshan
Contributor

Hello @zulli73, if you don't mind, please resolve the conflict and I'll trigger the merge. Thank you for your contribution!

@ghost
Author

ghost commented Aug 12, 2019

Hi @drdarshan.

Short story: has this pull request become obsolete? Searching the codebase for "local[", all results now use "local[*]", which suggests that the latest version on master may already use all CPU cores.

Long story: I'd happily fix the merge conflicts, but I have trouble understanding the change that caused them, d34f9d1: the file I modified was removed, and it's not obvious to me why it became obsolete. It also seems that no new Docker image has been pushed since that change, so I can't easily check whether Spark utilizes all available CPU cores as of that commit. Finally, I couldn't find the docs for building the Docker image myself/locally.
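
(Aside: a quick way to check this from a running session, using standard SparkContext attributes:)

# sc.master shows the master URL the context was created with;
# in local mode, sc.defaultParallelism equals the number of worker threads.
print(sc.master)              # e.g. "local[*]"
print(sc.defaultParallelism)  # e.g. 6 on a 6-core machine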

@mhamilton723 mhamilton723 self-requested a review as a code owner November 17, 2022 12:40