src: set UV_THREADPOOL_SIZE based on available parallelism#61533
src: set UV_THREADPOOL_SIZE based on available parallelism#61533RafaelGSS wants to merge 2 commits intonodejs:mainfrom
Conversation
When UV_THREADPOOL_SIZE is not set, Node.js will auto-size it based on uv_available_parallelism(), with a minimum of 4 and a maximum of 1024.
|
Review requested:
|
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #61533 +/- ##
=======================================
Coverage 89.76% 89.77%
=======================================
Files 672 672
Lines 203809 203817 +8
Branches 39189 39188 -1
=======================================
+ Hits 182949 182972 +23
+ Misses 13179 13168 -11
+ Partials 7681 7677 -4
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
LGTM.
Though I think there is some confusion here regarding parallelism vs concurrency. It can be meaningful to have more io dispatch threads than logical cpus. The task of the uv thread pool (in the case of fs) is to dispatch io and wait for results. So if your hardware can do 128 io requests in parallel you only need to dispatch 128 io requests concurrently which you can do on e.g. 16 logical cpus / available parallelism as well. In the opposite end, if your io hardware only has 128 parallel requests, than having more than 128 worker threads is not meaningful.
The whole thing with the "right" number of threads is super complicated with the current architecture. If we used ioring instead for fs then available parallelism would be the perfect amount.
That being said, I think available parallelism is a good guess but far from being a guaranteed optimal.
| int rc = uv_os_getenv("UV_THREADPOOL_SIZE", buf, &buf_size); | ||
| if (rc == UV_ENOENT) { | ||
| unsigned int parallelism = uv_available_parallelism(); | ||
| unsigned int threadpool_size = std::min(std::max(4u, parallelism), 1024u); |
There was a problem hiding this comment.
Is it worth just deferring the maximum to libuv here, since it already caps to MAX_THREADPOOL_SIZE, rather than hardcoding it a second time?
There was a problem hiding this comment.
If libuv changes that in a semver-minor version, it would affect Node.js. We could defer to them, but having a lower number in the MAX_THREADPOOL_SIZE sounds "safer".
There was a problem hiding this comment.
IMO, it's better to rely on the MAX_THREADPOOL_SIZE const in libuv, coz Node as a whole depends entirely on libuv for io/parallelism, and this number can only go up ⬆️ anyway, so there's nothing to worry about from a safety perspective 🦺
Ethan-Arrowood
left a comment
There was a problem hiding this comment.
I'm generally in favor of this change. Though I appreciate more research on if this is better for overall performance. I did something similar for picking a more sensible Node.js test_runner parallelization number for resource intensive tests. You're welcome to evaluate/use/reference my process and code here: https://github.com/Ethan-Arrowood/node-test-runner-parallelization-analysis
|
Which overhead (e.g. memory) is expected per uv thread? |
|
Note: Under most optimal conditions, the value 🔢 shall never exceed the following formula: |
|
I took some time today, and I did the following measurement: Benchmark Results: PR vs Node.js v24.13.0This report compares the performance of the PR Build against the Baseline ( Machine Configurations (lscpu)
1. 32GB Machine (16 vCPUs)PR Threadpool: 16 | Baseline Threadpool: 4
2. 8GB Machine (4 vCPUs)PR Threadpool: 4 | Baseline Threadpool: 4
3. 2GB Machine (2 vCPUs)PR Threadpool: 4 | Baseline Threadpool: 4*
If you have dedicated machines, feel free to run: curl -sL https://raw-githubusercontent-com-gh.computerqwq.top/RafaelGSS/uv-threadpool-benchmark/main/benchmark.sh | bash |
|
Didn't go so swell: |
|
Which OS? @ronag |
When UV_THREADPOOL_SIZE is not set, Node.js will auto-size it based on uv_available_parallelism(), with a minimum of 4 and a maximum of 1024.
Refs: nodejs/performance#193
I will still run some benchmarks to see its real impact. Adding blocked label for now.
cc: @ronag