Kubernetes v1.36 Beta: Dynamically Adjusting Pod Resources for Suspended Jobs
Introduction
Kubernetes v1.36 promotes to beta a feature that allows modifying container resource requests and limits in the pod template of a suspended Job. First released as alpha in v1.35, it lets queue controllers and cluster administrators adjust CPU, memory, GPU, and extended resource specifications on a Job while it remains suspended, before it starts or resumes execution. Because resource adjustments no longer require recreating the Job, the Job's identity and history survive changes made in response to shifting cluster conditions.
Why Mutable Pod Resources Matter
Batch and machine learning workloads often face uncertain resource requirements at Job creation time. The optimal allocation depends on real-time cluster capacity, queue priorities, and the availability of specialized hardware such as GPUs. Previously, once a Job's pod template resource fields were set, they were immutable: any change required deleting and recreating the entire Job, which discarded its metadata, status, and history. For queue controllers like Kueue, this was a significant limitation.
With the new beta feature, queue controllers can now:
- Adjust resource allocations for suspended Jobs based on current cluster load.
- Avoid losing Job metadata or history when scaling resources.
- Enable CronJob instances to run with reduced resources under heavy load instead of failing entirely.
Consider a machine learning training Job initially requesting 4 GPUs:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

A queue controller evaluating cluster capacity might discover only 2 GPUs available. With this feature, it can update the Job’s resource requests before resuming:
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
  labels:
    app.kubernetes.io/name: trainer
spec:
  suspend: true
  template:
    metadata:
      annotations:
        kubernetes.io/description: "ML training, ID abcd123"
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
          limits:
            cpu: "4"
            memory: "16Gi"
            example-hardware-vendor.com/gpu: "2"
      restartPolicy: Never

Once updated, the controller resumes the Job by setting spec.suspend to false, and new Pods are created with the adjusted resource specifications.
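The two-step flow described here (change resources while suspended, then resume) can be sketched as two patch bodies. This is only an illustration, not how Kueue or any particular controller does it: the helper names build_resource_patch and build_resume_patch are hypothetical, and the bodies assume a strategic merge patch, which matches containers by name. Sending the patches to the API server is left out.

```python
import json


def build_resource_patch(container_name, resources):
    """Build a patch body that sets requests and limits for one container.

    Hypothetical helper: assumes requests and limits should be identical,
    as they are in the manifests above.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Strategic merge patches match list items by "name".
                            "name": container_name,
                            "resources": {
                                "requests": dict(resources),
                                "limits": dict(resources),
                            },
                        }
                    ]
                }
            }
        }
    }


def build_resume_patch():
    """Build the patch body that resumes the Job (hypothetical helper)."""
    return {"spec": {"suspend": False}}


# Values taken from the reduced manifest above.
reduced = {
    "cpu": "4",
    "memory": "16Gi",
    "example-hardware-vendor.com/gpu": "2",
}

resource_patch = build_resource_patch("trainer", reduced)
resume_patch = build_resume_patch()

# Step 1: apply while spec.suspend is still true.
print(json.dumps(resource_patch, indent=2))
# Step 2: apply only after step 1 has been accepted.
print(json.dumps(resume_patch))
```

A body like the first one could be applied with a tool such as kubectl patch using the strategic patch type. The order matters: the resource change has to be accepted while the Job is still suspended.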
How It Works
Under the hood, the Kubernetes API server relaxes the immutability constraint on pod template resource fields, but only for suspended Jobs. No new API types were introduced; the existing Job and pod template structures stay the same, and only the validation logic that rejects template updates is relaxed for this case.
Implementation Details
When a Job is suspended (spec.suspend: true), the API server now allows updates to the requests and limits under spec.template.spec.containers[*].resources. Because the change is made before the Job resumes, Pods created afterwards use the updated resource profile. As a beta feature, it is enabled by default in v1.36, so no feature gate has to be set explicitly to use it.
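The rule described above can be modeled in a few lines. The sketch below is a simplified mental model, not the API server's actual validation code: the function names are hypothetical, Jobs are plain dictionaries, and the model considers only container resources, ignoring any other template fields the API server may permit to change.

```python
import copy


def strip_resources(template):
    """Return a copy of a pod template with every container's resources removed."""
    stripped = copy.deepcopy(template)
    for container in stripped["spec"].get("containers", []):
        container.pop("resources", None)
    return stripped


def template_update_allowed(old_job, new_job):
    """Simplified model: a template change passes only if the Job is
    suspended and nothing except container resources differs."""
    old_template = old_job["spec"]["template"]
    new_template = new_job["spec"]["template"]
    if old_template == new_template:
        return True  # nothing changed, nothing to validate
    if not old_job["spec"].get("suspend", False):
        return False  # a running Job keeps its template immutable
    return strip_resources(old_template) == strip_resources(new_template)


suspended = {
    "spec": {
        "suspend": True,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "trainer",
                        "image": "example-registry.example.com/training",
                        "resources": {"requests": {"cpu": "8"}},
                    }
                ]
            }
        },
    }
}

# Same Job with a smaller CPU request.
resized = copy.deepcopy(suspended)
resized["spec"]["template"]["spec"]["containers"][0]["resources"]["requests"]["cpu"] = "4"

# Same Job, but not suspended.
running = copy.deepcopy(suspended)
running["spec"]["suspend"] = False

# Same Job with a different image, which is not a resource field.
reimaged = copy.deepcopy(suspended)
reimaged["spec"]["template"]["spec"]["containers"][0]["image"] = "other-image"

print(template_update_allowed(suspended, resized))   # True
print(template_update_allowed(running, resized))     # False
print(template_update_allowed(suspended, reimaged))  # False
```

In this model, comparing the templates with resources stripped out is what keeps the relaxation narrow: any difference outside the resources fields still fails.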
Practical Benefits
- No Job recreation: Adjust resources without losing Job history or associated metadata.
- Graceful degradation: CronJob instances can continue running with reduced resources instead of failing under load.
- Better scheduling: Queue controllers can optimize resource utilization across the cluster in real time.
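As a rough illustration of the graceful degradation point, a controller might shrink a Job's resource profile in proportion to the GPUs it can actually obtain. The policy below is purely an assumption made for this sketch (proportional scaling, whole CPU cores, whole GiB of memory); the function scale_profile and the profile layout are hypothetical and do not come from Kubernetes or Kueue.

```python
from fractions import Fraction

# Extended resource name used in the manifests above.
GPU_KEY = "example-hardware-vendor.com/gpu"


def scale_profile(profile, available_gpus):
    """Scale a resource profile down to the GPUs that are available.

    profile holds integers: {"cpu": cores, "memory_gib": GiB, "gpus": count}.
    Returns a resources mapping with Kubernetes-style string quantities,
    or None when no GPU can be granted (the Job should stay suspended).
    """
    wanted = profile["gpus"]
    granted = min(wanted, available_gpus)
    if granted <= 0:
        return None
    ratio = Fraction(granted, wanted)
    # Round down, but never below one core or one GiB.
    cpu = max(1, int(profile["cpu"] * ratio))
    memory_gib = max(1, int(profile["memory_gib"] * ratio))
    return {
        "cpu": str(cpu),
        "memory": f"{memory_gib}Gi",
        GPU_KEY: str(granted),
    }


original = {"cpu": 8, "memory_gib": 32, "gpus": 4}

print(scale_profile(original, available_gpus=2))
print(scale_profile(original, available_gpus=10))
print(scale_profile(original, available_gpus=0))
```

With 2 of the 4 requested GPUs available, this policy yields the same reduced values as the second manifest earlier in the article (4 CPUs, 16Gi, 2 GPUs). A real controller would also need to decide whether the workload can make progress at the smaller size.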
This enhancement is particularly valuable for batch processing, ML training pipelines, and any environment where resource demands fluctuate. For more details, refer to the Kubernetes Job documentation.
Conclusion
Mutable pod resources for suspended Jobs, beta in Kubernetes v1.36, let controllers adjust a Job's resource requests and limits without recreating it. This preserves Job metadata and history and removes the delete-and-recreate step that resource changes previously required. Operators and developers running batch or ML workloads should evaluate this capability to simplify their resource orchestration strategies.