Mastering Kubernetes Scaling: Server-Side Shard-Based List and Watch in v1.36
As Kubernetes clusters expand beyond tens of thousands of nodes, traditional controllers that monitor high-cardinality resources like Pods encounter a scalability bottleneck. In previous setups, each replica of a horizontally scaled controller had to process the entire event stream from the API server, decoding every object even if it was irrelevant, consuming significant CPU, memory, and network resources—and scaling out only multiplied the cost. Kubernetes v1.36 addresses this with an alpha feature (KEP-5866) that introduces server-side sharded list and watch, allowing the API server to filter events at the source so each replica receives only the data slice it owns.
What problem does server-side sharded list and watch solve?
Traditional client-side sharding, used by tools like kube-state-metrics, assigns each controller replica a portion of the key space but still streams the full event list from the API server. Every replica deserializes and processes every event, discarding most of them, leading to wasted CPU, memory, and bandwidth. The network overhead scales linearly with the number of replicas, not with the useful data size. This inefficiency becomes critical in large clusters where event streams are enormous. Server-side sharding moves filtering into the API server, so only matching events are transmitted. This drastically reduces per‑replica resource usage, enables true linear scaling, and cuts network traffic, making it feasible to run many controller replicas without prohibitive overhead.
How does the API server know which events to send to each replica?
Clients (controller replicas) specify a hash range in the shardSelector field of ListOptions. They provide a start and end hash value (e.g., '0x0000000000000000' to '0x8000000000000000'). The API server computes a deterministic 64‑bit FNV‑1a hash of a specified field, such as object.metadata.uid or object.metadata.namespace, for each resource object. It then returns only those objects whose hash falls within the range [start, end). This filtering applies both to initial list responses and to subsequent watch event streams. Because the hash is deterministic and consistent across all API server instances, this approach works reliably in multi‑master deployments.
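To make the filtering concrete, here is a small Go sketch of the decision described above: hash the chosen field value with 64-bit FNV-1a (the standard library's hash/fnv) and test membership in the half-open range [start, end). This is illustrative only; the actual apiserver code may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardHash computes the deterministic 64-bit FNV-1a hash of a field value
// such as object.metadata.uid.
func shardHash(fieldValue string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(fieldValue))
	return h.Sum64()
}

// inShard reports whether the hash falls within the half-open range [start, end).
func inShard(hash, start, end uint64) bool {
	return hash >= start && hash < end
}

func main() {
	uid := "9b3f2c1e-5a77-4d2b-8f1a-0c6d4e9a1b23" // example Pod UID
	h := shardHash(uid)
	fmt.Printf("hash=0x%016x, lower half=%v\n",
		h, inShard(h, 0x0000000000000000, 0x8000000000000000))
}
```

Because the hash depends only on the field value, every API server computes the same answer for the same object.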
What fields can be used for hash‑based sharding?
Currently, the feature supports two field paths: object.metadata.uid and object.metadata.namespace. The uid path spreads objects across the whole cluster and is ideal for controllers that handle any Pod regardless of namespace; since every object's UID is unique by construction, the resulting distribution is effectively uniform. The namespace path hashes the namespace name, so all objects in a namespace land in the same shard, which suits controllers that manage per-namespace resources. The hash function is 64-bit FNV-1a, and ranges expressed with hexadecimal endpoints can cover the entire hash space (0 through 2^64 - 1), as sketched below.
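For example, the following sketch carves the full space into N equal, contiguous half-open ranges with hexadecimal endpoints. The shardBounds helper is hypothetical, and math/big avoids overflow at 2^64.

```go
package main

import (
	"fmt"
	"math/big"
)

// shardBounds returns hexadecimal endpoints for shard index out of count,
// splitting [0, 2^64) into equal, contiguous half-open ranges.
func shardBounds(index, count uint64) (start, end string) {
	full := new(big.Int).Lsh(big.NewInt(1), 64) // 2^64
	lo := new(big.Int).Div(new(big.Int).Mul(full, new(big.Int).SetUint64(index)), new(big.Int).SetUint64(count))
	hi := new(big.Int).Div(new(big.Int).Mul(full, new(big.Int).SetUint64(index+1)), new(big.Int).SetUint64(count))
	if hi.Cmp(full) == 0 {
		// How the very top of the space is written in the selector syntax is
		// an alpha API detail; this sketch clamps to the maximum 64-bit value.
		hi = new(big.Int).Sub(full, big.NewInt(1))
	}
	return fmt.Sprintf("0x%016x", lo), fmt.Sprintf("0x%016x", hi)
}

func main() {
	for i := uint64(0); i < 4; i++ {
		s, e := shardBounds(i, 4)
		fmt.Printf("shard %d: [%s, %s)\n", i, s, e)
	}
}
```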
How can I use server‑side sharded watches in my controller?
Controllers typically use informers to list and watch resources. To shard, each replica injects a shardSelector into its informer's list options via WithTweakListOptions. For example, to assign replica 0 the lower half of the keyspace, you’d use: "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')". Replica 1 would use the complementary range. There’s no built‑in mechanism to automatically distribute ranges across replicas—you’ll need to implement your own assignment logic, typically through environment variables or a configuration system that hands each pod a unique shard index and total shard count.
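A minimal sketch of that assignment logic follows, assuming a hypothetical SHARD_INDEX environment variable (derived, for instance, from a StatefulSet ordinal). The two selector strings are the complementary halves from the example above; how the inclusive top of the hash space is expressed in the selector syntax is an alpha API detail.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// SHARD_INDEX is a hypothetical convention; nothing in the feature
	// assigns ranges for you.
	index, err := strconv.Atoi(os.Getenv("SHARD_INDEX"))
	if err != nil || index < 0 || index > 1 {
		panic("SHARD_INDEX must be 0 or 1 in this two-replica sketch")
	}

	selectors := []string{
		// Replica 0: lower half of the UID hash space.
		"shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')",
		// Replica 1: upper half; the end is clamped to the maximum 64-bit
		// value, since expressing 2^64 itself is an open API detail.
		"shardRange(object.metadata.uid, '0x8000000000000000', '0xffffffffffffffff')",
	}
	fmt.Println("this replica's shardSelector:", selectors[index])
}
```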
What code changes are needed to adopt this feature?
First, ensure your cluster runs Kubernetes v1.36 or later and has the ServerSideShardedListWatch feature gate enabled (the feature is alpha). Then, in your controller code, import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" and "k8s.io/client-go/informers". When creating the shared informer factory, use informers.WithTweakListOptions to set the ShardSelector for each replica's informer. For example: informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod, informers.WithTweakListOptions(func(opts *metav1.ListOptions) { opts.ShardSelector = shardSelector })). The rest of the controller logic remains unchanged, because informers handle the filtered events transparently.
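Putting it together, a sketch of the informer wiring under the assumptions above. The ShardSelector field is the alpha API as described in this article, so the exact field name and Go type may differ once the feature lands.

```go
package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location; in-cluster config works too.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// This replica owns the lower half of the UID hash space (see above for
	// how a deployment might assign ranges per replica).
	shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute, // resync period
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.ShardSelector = shardSelector // assumed alpha field (KEP-5866)
		}),
	)

	podInformer := factory.Core().V1().Pods().Informer()
	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Register event handlers on podInformer as usual; the controller logic
	// is unchanged because only this replica's shard arrives.
	_ = podInformer
}
```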
Does server‑side sharding work with multiple API server replicas?
Yes, because the hash computation (FNV‑1a) is deterministic and produces the same result on every API server instance. When a client sends a watch request with a shardSelector, any API server can fulfill it and will filter events identically. This ensures that sharded watches are safe and consistent in clusters with multiple API server replicas behind a load balancer. However, note that the feature is still alpha in v1.36, so it may not be enabled by default and could have performance implications or other limitations.
What are the benefits over client‑side sharding?
Client-side sharding reduces the processing burden on each replica only after the data arrives: every replica still downloads the entire event stream, deserializes all objects, and then discards most of them, wasting network bandwidth, CPU, and memory in proportion to the replica count. Server-side sharding eliminates this waste by having the API server send only the relevant events. As a result, network traffic per replica is proportional to its shard size rather than to the total resource count, and deserialization CPU costs drop because fewer objects are transmitted. This makes horizontal scaling truly efficient, allowing controllers to handle much larger clusters without hitting resource ceilings.