Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection#3573

Open
peterschmidt85 wants to merge 2 commits into master from resources-gpu-default

peterschmidt85 commented Feb 13, 2026

Motivation

Previously, ResourcesSpec.gpu defaulted to None, making GPU the only resource without an explicit default (CPU defaults to 2.., memory to 8GB.., disk to 100GB..). This caused the GPU field to be hidden in the CLI plan output. Additionally, the NVIDIA vendor default was silently applied deep in the validation code regardless of whether the user set a custom image.

Changes

  • Set ResourcesSpec.gpu default to GPUSpec(count=0..) instead of None, aligning GPU with other resource defaults
  • Default the GPU vendor to nvidia only when the default CUDA image is used (that is, neither a custom image nor docker is set). With a custom image, any vendor is allowed
  • Server-side vendor inference via set_gpu_vendor_default() with full backward compatibility across old/new CLI and server combinations
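The vendor rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `GPURequest` and `apply_vendor_default` are hypothetical names, and the real `set_gpu_vendor_default()` may have a different signature and handle more cases (for example, GPU names such as A100).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GPURequest:
    """Hypothetical stand-in for a GPU spec; not dstack's GPUSpec."""

    vendor: Optional[str] = None  # None means "any vendor"
    count: str = "0.."  # range string, as shown in the plan output


def apply_vendor_default(
    gpu: GPURequest, image: Optional[str] = None, docker: bool = False
) -> GPURequest:
    """Fill in vendor=nvidia only when the default CUDA image would be used."""
    if gpu.vendor is not None:
        # An explicit vendor is never overridden.
        return gpu
    uses_default_image = image is None and not docker
    if uses_default_image:
        return GPURequest(vendor="nvidia", count=gpu.count)
    # Custom image or docker: leave the vendor open.
    return gpu
```

With these assumptions, a request with no image resolves to nvidia, while the same request with a custom image or docker keeps the vendor unset.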

Before / After

# Before (no GPU, no image)
Resources  cpu=2.. mem=8GB.. disk=100GB..

# After
Resources  cpu=2.. mem=8GB.. disk=100GB.. gpu=nvidia:0..

# Before (gpu: 1, no image)
Resources  cpu=2.. mem=8GB.. disk=100GB.. gpu:1

# After
Resources  cpu=2.. mem=8GB.. disk=100GB.. gpu=nvidia:1

# Before (gpu: 1, custom image)
Resources  cpu=2.. mem=8GB.. disk=100GB.. gpu:1

# After - any vendor allowed since user provides their own image
Resources  cpu=2.. mem=8GB.. disk=100GB.. gpu=1
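For illustration, the "After" lines can be reproduced with a small formatter. This is a sketch only: `format_gpu` and `format_resources` are hypothetical helpers, not dstack's pretty-printing code, and the rule that a GPU name takes precedence over the vendor in the label is an assumption read off the display values in this description.

```python
from typing import Optional


def format_gpu(count: str, vendor: Optional[str] = None, name: Optional[str] = None) -> str:
    """Render the gpu field; assumes a GPU name is shown instead of the vendor."""
    label = name or vendor
    return f"gpu={label}:{count}" if label else f"gpu={count}"


def format_resources(cpu: str, memory: str, disk: str, gpu: str) -> str:
    """Render one plan line in the layout shown in the examples above."""
    return f"Resources  cpu={cpu} mem={memory} disk={disk} {gpu}"


# No GPU and no image: the vendor default applies, so nvidia is shown.
print(format_resources("2..", "8GB..", "100GB..", format_gpu("0..", vendor="nvidia")))
# gpu: 1 with a custom image: no vendor is shown.
print(format_resources("2..", "8GB..", "100GB..", format_gpu("1")))
```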

Backward compatibility

Tested across all CLI/server combinations (new CLI 0.21-dev, old CLI 0.20.9, new server, old server on sky.dstack.ai). The tests used the GCP backend, which has both NVIDIA and TPU offers.

dstack apply

| Row | Config | CLI | Server | Display | TPU included? |
|-----|--------|-----|--------|---------|---------------|
| 1 | No GPU, no image | New | New | gpu=nvidia:0.. | No |
| 2 | No GPU, custom image | New | New | gpu=0.. | Yes |
| 3 | No GPU, docker=true | New | New | gpu=0.. | Yes |
| 4 | gpu: 1, no image | New | New | gpu=nvidia:1 | No |
| 5 | gpu: 1, custom image | New | New | gpu=1 | Yes |
| 6 | gpu: 1, docker=true | New | New | gpu=1 | Yes |
| 7 | gpu: A100 | New | New | gpu=A100:1.. | No |
| 8 | gpu: MI300X, no image | New | New | Error: image required | N/A |
| 9 | gpu: MI300X, image | New | New | gpu=MI300X:1.. | No |
| 10 | gpu: 1..4, no image | New | New | gpu=nvidia:1..4 | No |
| 11 | gpu: nvidia:1 | New | New | gpu=nvidia:1 | No |
| 12 | gpu: amd:1, image | New | New | gpu=amd:1 | No |
| 13 | No GPU, no image | Old 0.20.9 | New | (no gpu shown) | Yes |
| 14 | gpu: 1, no image | Old 0.20.9 | New | gpu:1 | No |
| 15 | gpu: 1, custom image | Old 0.20.9 | New | gpu:1 | No |
| 16 | gpu: A100 | Old 0.20.9 | New | A100:1.. | No |
| 17 | No GPU, no image | New | Old (sky) | gpu=0.. | Yes |
| 18 | gpu: 1, no image | New | Old (sky) | gpu=1 | No |
| 19 | gpu: 1, custom image | New | Old (sky) | gpu=1 | No |

dstack offer

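| Row | Config | CLI | Server | Display | TPU included? |
|-----|--------|-----|--------|---------|---------------|
| 20 | --gpu 1 | New | New | gpu=1 | Yes |
| 21 | --gpu nvidia:1 | New | New | gpu=nvidia:1 | No |
| 22 | --gpu tpu:1 | New | New | gpu=google:1 | Yes (TPU only) |
| 23 | --gpu 1 | New | Old (sky) | gpu=1 | Yes |
| 24 | default | Old 0.20.9 | New | (no gpu shown) | Yes |
| 25 | --gpu 1 | Old 0.20.9 | New | gpu:1 | No |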

Key observations

  • No regressions in any old CLI + old server combination
  • Row 15: old CLI always sets vendor=nvidia regardless of image (pre-existing behavior)
  • Row 19: old server always infers nvidia regardless of image (pre-existing behavior)

- Default to NVIDIA only if user has no image
- Keep backward compatibility with old/new server/CLI
- Make `dstack offer` consistent with `dstack apply`

Co-authored-by: Cursor <cursoragent@cursor.com>
peterschmidt85 changed the title from "Set explicit GPU default (0..) in ResourcesSpec and minor improvements in resource pretty-printing" to "Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection" on Feb 13, 2026
@peterschmidt85

TODOs

Docs updates:

  • Update gpu property description in concepts pages (dev-environments.md, tasks.md, services.md) and protips.md to reflect:
    • No gpu specified → defaults to 0.. (any vendor)
    • No count specified → defaults to 1..
    • Vendor defaults to nvidia only when no custom image is set; with custom image, any vendor is allowed
  • Update reference schema docs accordingly
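The count defaults that the docs need to describe can be summarized in a short sketch. The function `resolve_gpu_count` is hypothetical and only models the two rules listed above; it does not parse real `gpu` values and is not dstack's implementation.

```python
from typing import Optional


def resolve_gpu_count(gpu_specified: bool, count: Optional[str] = None) -> str:
    """Return the effective GPU count range under the rules listed above."""
    if not gpu_specified:
        # No gpu property at all: zero or more GPUs.
        return "0.."
    if count is None:
        # gpu given without a count (for example, only a name): one or more.
        return "1.."
    # An explicit count or range is kept as written.
    return count
```

Under these assumptions, omitting `gpu` gives `0..`, a value such as `A100` with no count gives `1..`, and an explicit range such as `1..4` is left unchanged.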

Future scope (outside this PR):

  • Consider making TPU an explicit opt-in for GCP backend, since TPU requires specific setup and silently offering TPU instances may lead to unexpected behavior
