Kubernetes 1.36: Revolutionizing Workload-Aware Scheduling with 6 Key Advancements

Kubernetes 1.36 introduces a wave of enhancements for workload-aware scheduling, specifically targeting AI/ML and batch jobs. Building on the foundation laid in version 1.35, this release cleanly separates API concerns, introduces atomic processing via PodGroups, and brings topology awareness and preemption to the scheduler. These improvements streamline scheduling for complex workloads, reduce overhead, and pave the way for future scaling. Below, we explore the six most impactful features in detail.

1. Decoupled Workload and PodGroup APIs

The most significant architectural change in v1.36 is the separation of the Workload and PodGroup APIs. Previously, in v1.35, runtime states were embedded within the Workload resource. Now, the Workload acts solely as a static template, while the PodGroup manages dynamic runtime data. This decoupling enhances performance and scalability by enabling per-replica sharding of status updates. The scheduler no longer needs to parse the Workload object; it reads only the PodGroup, which contains all necessary scheduling information.

For example, a Job controller defines a Workload whose podGroupTemplates include a workers template configured for gang scheduling with a minCount of 4. Controllers then stamp out independent PodGroup instances from those templates. This simplification reduces scheduler overhead and makes the system more robust for large-scale deployments.
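The split described above can be sketched roughly as follows. Note that this is illustrative only: the Workload and PodGroup APIs are still evolving, so the API group, versions, and field names shown here are assumptions, not the final schema.

```yaml
# Illustrative sketch -- field names are assumptions, not the final API.
# The Workload is a static template; it carries no runtime state.
apiVersion: scheduling.k8s.io/v1alpha1
kind: Workload
metadata:
  name: training-job
spec:
  podGroupTemplates:
  - name: workers
    gangPolicy:
      minCount: 4        # all 4 worker pods must be schedulable together
---
# A runtime instance stamped out by the controller. The scheduler reads
# only this object; dynamic status lives here, not on the Workload.
apiVersion: scheduling.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: training-job-workers-0
spec:
  workloadRef:
    name: training-job
  minCount: 4
status:
  phase: Pending
```

Because each PodGroup carries its own status, status updates can be sharded per replica instead of contending on a single Workload object.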

2. Atomic PodGroup Scheduling Cycle

The kube-scheduler now features a dedicated PodGroup scheduling cycle, enabling atomic processing of entire workload groups. Instead of scheduling pods one by one, the scheduler considers all pods in a PodGroup holistically. This is crucial for gang scheduling where all pods must run simultaneously (e.g., in distributed training). The new cycle ensures that either all pods in the group are scheduled together or none are, preventing partial resource allocation that could stall jobs.

This atomic approach also lays the groundwork for future enhancements like advanced batching and resource optimization. In practice, it means faster scheduling decisions and more reliable execution for batch jobs, especially those requiring strict pod co-location or a coordinated, all-at-once start.

3. Topology-Aware Scheduling

Kubernetes 1.36 introduces the first iteration of topology-aware scheduling for PodGroups. This feature helps optimize workload placement by considering hardware topology (e.g., NUMA nodes, CPU sockets, or GPU proximity). For AI/ML workloads that demand low-latency inter-pod communication, placing pods on the same node or rack reduces network hops and improves performance.

The scheduler now respects topology constraints defined in the PodGroup spec, such as preferred zones or node affinity. While still in early stages, this capability enables operators to fine-tune scheduling for latency-sensitive applications. Future releases will expand topology awareness to more advanced scenarios like chain scheduling or resource isolation.
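A topology constraint on a PodGroup might be expressed along these lines. As above, this is a sketch under the assumption that constraints are declared in the PodGroup spec; the actual field names may differ in the released API.

```yaml
# Illustrative sketch -- topology constraint fields are assumptions.
apiVersion: scheduling.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: training-job-workers-0
spec:
  minCount: 4
  topologyConstraint:
    # Prefer placing the whole gang within one zone to reduce
    # network hops between latency-sensitive worker pods.
    preferredTopologyKey: topology.kubernetes.io/zone
```

The topology key here reuses the well-known node label convention (topology.kubernetes.io/zone), so operators could target zones, racks, or finer-grained domains by labeling nodes accordingly.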

4. Workload-Aware Preemption

Preemption in Kubernetes has traditionally been based on pod priority, often causing unnecessary disruptions for batch workloads. v1.36 introduces workload-aware preemption, which considers the entire PodGroup when deciding which pods to evict. The scheduler now preempts pods in a way that minimizes impact on gang-scheduled jobs—for example, avoiding breaking a group unless absolutely necessary.

This improvement reduces the risk of partial preemption that could tear apart a running job. Instead, the scheduler can choose to preempt entire groups or lower-priority workloads that don't interfere with critical batch processes. The result is more stable and predictable scheduling for high-value AI/ML jobs running alongside other workloads.

5. Dynamic Resource Allocation for PodGroups

ResourceClaim support for workloads unlocks Dynamic Resource Allocation (DRA) within PodGroups. This means PodGroups can now request specialized hardware (e.g., GPUs, FPGAs, or NICs) in a declarative manner. The scheduler works with the DRA mechanism to allocate these resources atomically to all pods in the group, ensuring consistency.

For example, a training job can specify a ResourceClaim for 4 GPUs, and the scheduler ensures all workers in the PodGroup receive those GPUs before starting. This eliminates resource mismatch issues common in large-scale distributed training. It also simplifies cluster management by abstracting hardware details away from pod definitions.
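The ResourceClaim in that example might look like the following. The resource.k8s.io schema has changed across recent releases, so treat the version and field layout as approximate; the device class name is a placeholder.

```yaml
# Approximate sketch of a DRA ResourceClaim; field layout may differ
# by release, and gpu.example.com is a placeholder device class.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: training-gpus
spec:
  devices:
    requests:
    - name: gpus
      deviceClassName: gpu.example.com
      allocationMode: ExactCount
      count: 4          # all 4 GPUs are allocated before any pod starts
```

Pods in the PodGroup would then reference this claim, and the scheduler would admit the group only once the full allocation succeeds.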

6. First Phase of Job Controller Integration

To demonstrate real-world readiness, v1.36 ships the first phase of integration between the Kubernetes Job controller and the new Workload/PodGroup APIs. Jobs can now leverage the new scheduling improvements natively, without custom controllers. This means any existing Job-based workload (e.g., batch processing, ML training) can automatically benefit from atomic scheduling, topology awareness, and preemption.

The integration is backward-compatible, so users don't need to rewrite their job definitions. Over time, the Job controller will adopt more advanced features like dynamic scaling and intelligent retry logic based on PodGroup states. This milestone reduces friction for teams migrating to updated scheduling paradigms.
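Because the integration is backward-compatible, an ordinary Job definition is enough; under the new model the Job controller itself would derive the Workload and PodGroup objects. A minimal sketch (the image is a placeholder):

```yaml
# A standard Job; no new fields are required. Under the integration,
# the Job controller can translate parallelism: 4 into a PodGroup
# with minCount: 4, as in the earlier example.
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-train
spec:
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/trainer:latest   # placeholder image
```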

Kubernetes 1.36 marks a pivotal step toward smarter, workload-aware scheduling. By decoupling APIs, introducing atomic cycles, and adding topology/preemption smarts, the platform becomes far more capable for AI/ML and batch workloads. Cluster administrators should evaluate these features to optimize performance and reliability. Future releases will build on this foundation, promising even greater efficiency and ease of use.
