
· 6 min read

Karmada is an open multi-cloud and multi-cluster container orchestration engine designed to help users deploy and operate business applications in a multi-cloud environment. With its compatibility with the native Kubernetes API, Karmada can smoothly migrate single-cluster workloads while still maintaining coordination with the surrounding Kubernetes ecosystem tools.

Karmada v1.14 has been released. This version includes the following new features:

  • Introduces federated ResourceQuota enforcement for multi-tenant resource governance scenarios.
  • Adds customized taint management to eliminate implicit cluster failover migration.
  • Continues to improve the Karmada Operator.
  • Significantly improves the performance of the Karmada controllers.

Overview of New Features

Federated ResourceQuota Enhancement

In multi-tenant cloud infrastructures, quota management is crucial for ensuring fair resource allocation and preventing overuse. Especially in multi-cloud and multi-cluster environments, fragmented quota systems often lead to difficulties in resource monitoring and management silos. Therefore, implementing cross-cluster federated quota management has become a key factor in improving resource governance efficiency.

Previously, Karmada allocated global quotas to member clusters via FederatedResourceQuota, with each cluster enforcing quotas locally. This version enhances federated quota management by introducing a control-plane global quota checking mechanism, enabling global resource quota validation directly at the control plane.

This feature is particularly suitable for:

  • Scenarios where you need to track resource consumption and limits from a unified location without worrying about cluster-level allocations.
  • Cases where you want to prevent over-submission of tasks by validating quota limits.

Note: This feature is currently in Alpha stage and requires enabling the FederatedQuotaEnforcement Feature Gate to use.
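
Feature gates in Karmada are enabled by passing a --feature-gates flag to the relevant control-plane component. The snippet below is an illustrative sketch only: which components need the FederatedQuotaEnforcement gate (and the exact binary path) should be confirmed in the feature documentation; the component name here is an assumption.

# Illustrative sketch: add the feature-gate flag to the container args of the
# relevant Karmada control-plane component(s). Component choice is assumed here;
# check the Federated ResourceQuota docs for which components require the gate.
spec:
  template:
    spec:
      containers:
      - name: karmada-controller-manager   # assumed target component
        command:
        - /bin/karmada-controller-manager
        - --feature-gates=FederatedQuotaEnforcement=true
        # ...existing flags unchanged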

To set an overall CPU limit of 100, you can define it as follows:

apiVersion: policy.karmada.io/v1alpha1
kind: FederatedResourceQuota
metadata:
  name: team-foo
  namespace: team-foo
spec:
  overall:
    cpu: 100

Once applied, Karmada will start monitoring and enforcing the CPU resource limit in the team-foo namespace. If you deploy a new Deployment requiring 20 CPUs, the FederatedResourceQuota status will update as follows:

spec:
  overall:
    cpu: 100
status:
  overall:
    cpu: 100
  overallUsed:
    cpu: 20

If your resource request exceeds the 100 CPU limit, the resource will not be scheduled to your member clusters.
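
For reference, the 20 CPUs of usage shown above could come from a Deployment like the following hypothetical example (the workload name and image are illustrative): 10 replicas, each requesting 2 CPUs, in the team-foo namespace.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app           # hypothetical workload name
  namespace: team-foo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx       # placeholder image
        resources:
          requests:
            cpu: "2"       # 10 replicas x 2 CPUs = 20 CPUs counted against the quota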

For detailed usage of this feature, refer to the feature documentation: Federated ResourceQuota.

Customized Taint Management

In versions prior to v1.14, when users enabled failover, the system would automatically add a taint with the NoExecute effect to a cluster upon detecting an abnormal health status, triggering migration of all resources on that cluster.

In this version, we conducted a comprehensive review of potential migration triggers in the system. All implicit cluster failover behaviors have been eliminated, and explicit constraints for cluster failure mechanisms have been introduced. This enables unified management of resource migrations triggered by cluster failures, further enhancing system stability and predictability.

Cluster failure conditions are determined by evaluating the status conditions of the faulty cluster object and applying taints accordingly, a process referred to as "Taint Cluster By Conditions." This version introduces a new API, ClusterTaintPolicy, which allows users to define custom rules that add specific taints to target clusters when predefined cluster status conditions are met.

Cluster Taint Management

For more complex cluster failure judgment scenarios, users can directly implement a custom "cluster taint controller" to control how taints are added or removed from cluster objects.

ClusterTaintPolicy is a cluster-scoped resource. Here's a simple example demonstrating its usage:

apiVersion: policy.karmada.io/v1alpha1
kind: ClusterTaintPolicy
metadata:
  name: detect-cluster-notready
spec:
  targetClusters:
    clusterNames:
    - member1
    - member2
  addOnConditions:
  - conditionType: Ready
    operator: NotIn
    statusValues:
    - "True"
  - conditionType: NetworkAvailable
    operator: NotIn
    statusValues:
    - "True"
  removeOnConditions:
  - conditionType: Ready
    operator: In
    statusValues:
    - "True"
  - conditionType: NetworkAvailable
    operator: In
    statusValues:
    - "True"
  taints:
  - key: not-ready
    effect: NoSchedule
  - key: not-ready
    effect: NoExecute

The example above describes a ClusterTaintPolicy resource that targets the member1 and member2 clusters. When both the Ready and NetworkAvailable status conditions of a target cluster have values other than True, the taints {not-ready:NoSchedule} and {not-ready:NoExecute} are added to that cluster. When both conditions have the value True, those taints are removed.
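
If some workloads should tolerate these taints for a grace period before being evicted, their PropagationPolicy can declare cluster tolerations. The following is a minimal sketch only, assuming the not-ready taints defined above; the policy and Deployment names are hypothetical.

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: example-propagation      # hypothetical policy name
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: example-app            # hypothetical workload name
  placement:
    clusterTolerations:
    - key: not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300     # keep resources on the tainted cluster for 5 minutes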

For detailed usage of this feature, refer to the feature documentation: Cluster Taint Management.

Karmada Operator Continuous Enhancement

This version continues to enhance the Karmada Operator with the following new features:

  • Support for configuring leaf certificate validity periods.
  • Support for pausing reconciliation of the Karmada control plane.
  • Support for configuring feature gates for the karmada-webhook component.
  • Support for specifying loadBalancerClass for the karmada-apiserver component to select a specific load-balancing implementation.
  • Introduction of the karmada_build_info metric to expose build information, along with a set of runtime metrics.

These improvements make the karmada-operator more flexible and customizable, enhancing the reliability and stability of the entire Karmada system.

Performance Optimization of the Karmada Controllers

Since the release of v1.13, Karmada adopters have spontaneously organized efforts to optimize Karmada's performance. A stable and ongoing performance optimization team, SIG-Scalability, has now been established, dedicated to improving Karmada's performance and stability. We thank all participants for their efforts. If you're interested, you're welcome to join at any time.

In this version, Karmada has achieved significant performance improvements, particularly in the karmada-controller-manager component. To validate these improvements, the following test setup was implemented:

The test setup included 5000 Deployments, each paired with a corresponding PropagationPolicy that scheduled them to two member clusters. Each Deployment also depended on a unique ConfigMap, which was propagated to the same cluster as the Deployment. These resources were created while the karmada-controller-manager component was offline, meaning Karmada performed their initial synchronization during the test. The test results are as follows:

  • Cold start time (clearing the work queue) was reduced from approximately 7 minutes to about 4 minutes, a 45% improvement.
  • Resource detector: Maximum average processing time decreased from 391 ms to 180 ms (54% improvement).
  • Dependency distributor: Maximum average processing time decreased from 378 ms to 216 ms (43% improvement).
  • Execution controller: Maximum average processing time decreased from 505 ms to 248 ms (50% improvement).

In addition to faster processing speeds, resource consumption was significantly reduced:

  • CPU usage decreased from 4-7.5 cores to 1.8-2.4 cores (40%-65% reduction).
  • Peak memory usage decreased from 1.9 GB to 1.47 GB (22% reduction).

These results demonstrate that Karmada controller performance has been greatly enhanced in the v1.14 release. Moving forward, we will continue systematic performance optimizations for controllers and schedulers.

For detailed test reports, refer to [Performance] Overview of performance improvements for v1.14.

Acknowledging Our Contributors

The Karmada v1.14 release includes 271 code commits from 30 contributors. We would like to extend our sincere gratitude to all the contributors:

@Arhell, @baiyutang, @chaosi-zju, @CharlesQQ, @dongjiang1989, @everpeace, @husnialhamdani, @ikaven1024, @jabellard, @liangyuanpeng, @likakuli, @LivingCcj, @liwang0513, @MdSayemkhan, @mohamedawnallah, @mojojoji, @mszacillo, @my-git9, @Pratham-B-Parlecha, @RainbowMango, @rajsinghtech, @seanlaii, @tangzhongren, @tiansuo114, @vie-serendipity, @warjiang, @whosefriendA, @XiShanYongYe-Chang, @zach593, @zhzhuang-zju

karmada v1.14 contributors

· 2 min read

Community post cross-posted on the OSTIF blog and CNCF blog

OSTIF is proud to share the results of our security audit of Karmada. Karmada is an open source Kubernetes orchestration system for running cloud-native applications seamlessly across different clouds and clusters. With the help of Shielder and the Cloud Native Computing Foundation (CNCF), this project offers users improved open, multi-cloud, multi-cluster Kubernetes management.

Audit Process:

While Karmada is part of the Kubernetes ecosystem and therefore uses Kubernetes libraries and implementations, the focus of this particular work was on the overall security health of Karmada's custom implementations and its third-party dependencies. Karmada relies on multiple components, CLI tools, and add-ons to extend standard Kubernetes features, and these can be customized from deployment to deployment. This makes Karmada's attack scenarios complex, so it was necessary to perform scoped threat modelling in order to evaluate potential attack surfaces. Using this custom threat model and a combination of manual, tooling-assisted, and dynamic review, Shielder identified six findings with security impact on the project.

Audit Results:

  • 6 Findings
    • 1 High, 1 Medium, 2 Low, 2 Informational
  • Recommendations for Future Efforts
  • Proposal for Long-term Improvements to Overall Security

The Karmada maintainer team worked quickly and in tandem with Shielder to resolve and fix the reported issues. Their work on behalf of the project was meticulous and mindful of users as well as relevant third-party dependencies and projects. They published necessary advisories and alerted users as to the impact and resolution of this audit. OSTIF wishes them the best of luck on their journey to graduated status with the CNCF.

Thank you to the individuals and groups that made this engagement possible:

  • Karmada maintainers and community: especially Kevin Wang, Hongcai Ren, and Zhuang Zhang
  • Shielder: Abdel Adim “Smaury” Oisfi, Pietro Tirenna, Davide Silvetti
  • The Cloud Native Computing Foundation

References:

  1. CNCF (Announcing the results of the Karmada security audit): https://www.cncf.io/blog/2025/01/16/announcing-the-results-of-the-karmada-security-audit/
  2. Audit Report: https://ostif.org/wp-content/uploads/2025/01/OSTIF-Karmada-Report-PT-v1.1.pdf
  3. Shielder: https://www.shielder.com/blog/2025/01/karmada-security-audit/

· 6 min read

Karmada is an open multi-cloud and multi-cluster container orchestration engine designed to help users deploy and operate business applications in a multi-cloud environment. With its compatibility with the native Kubernetes API, Karmada can smoothly migrate single-cluster workloads while still maintaining coordination with the surrounding Kubernetes ecosystem tools.

This version includes the following new features:

  • Supports cross-cluster rolling upgrades of federated workloads, making the user's version release process more flexible and controllable.
  • karmadactl adds multiple operational capabilities, providing a unique multi-cluster operations experience.
  • Provides standardized generation semantics for federated workloads, giving release (CD) systems a reliable reference.
  • Karmada Operator supports custom CRD download strategies, making offline deployment more flexible.

Cross-Cluster Rolling Upgrade of Federated Workloads

In the newly released v1.11, Karmada adds cross-cluster rolling upgrades for federated workloads. This feature is particularly suitable for workloads deployed across multiple clusters, allowing users to adopt more flexible and controllable rolling upgrade strategies when releasing new versions of their workloads. Users can finely control the upgrade process to ensure a smooth transition for each cluster during the upgrade, minimizing the impact on the production environment. This feature not only enhances the user experience but also provides more flexibility and reliability for complex multi-cluster management.

Below is an example to demonstrate how to perform a rolling upgrade on federated workloads:

Assume the user has already propagated a Deployment to three member clusters (ClusterA, ClusterB, and ClusterC) through a PropagationPolicy:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
  placement:
    clusterAffinity:
      clusterNames:
      - ClusterA
      - ClusterB
      - ClusterC

rollout step 0

At this point, the version of the Deployment is v1. To upgrade the Deployment resource version to v2, you can perform the following steps in sequence.

Firstly, configure the PropagationPolicy to temporarily halt the propagation of resources to ClusterA and ClusterB, so that the deployment changes will only occur in ClusterC:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  #...
  suspension:
    dispatchingOnClusters:
      clusterNames:
      - ClusterA
      - ClusterB

rollout step 1

Then, update the PropagationPolicy resource to allow the system to synchronize the new version of the resources to the ClusterB cluster:

  suspension:
    dispatchingOnClusters:
      clusterNames:
      - ClusterA

rollout step 2

Finally, remove the suspension field from the PropagationPolicy resource to allow the system to synchronize the new version of the resources to the ClusterA cluster:

rollout step 3
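
For completeness, after this final step the PropagationPolicy no longer carries a suspension field, so dispatching resumes for all three clusters. It is simply back to its original form:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
  placement:
    clusterAffinity:
      clusterNames:
      - ClusterA
      - ClusterB
      - ClusterC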

From the example above, we can see that by using the cross-cluster rolling upgrade capability of federated workloads, the new version of the workload can be rolled out cluster by cluster, and precise control can be achieved.

Additionally, this feature can be applied to other scenarios. In particular, when resources are frequently updated because the Karmada control plane and member clusters compete for control of a resource, suspending the synchronization of resources to member clusters can help developers quickly identify the issue.

Enhancements to karmadactl

In this version, the Karmada community has focused on enhancing karmadactl to provide a better multi-cluster operations experience, thereby reducing users' reliance on kubectl.

A More Extensive Command Set

Karmadactl now supports a richer command set including create, patch, delete, label, annotate, edit, attach, top node, api-resources, and explain. These commands allow users to perform more operations on resources either on the Karmada control plane or member clusters.

Enhanced Functionality

Karmadactl introduces the --operation-scope parameter to control the scope of command operations. With this new parameter, commands such as get, describe, exec, and explain can flexibly switch between cluster perspectives to operate on resources in the Karmada control plane or member clusters.

More Detailed Command Output Information

The output of the karmadactl get cluster command now includes additional details such as the cluster object's Zones, Region, Provider, API-Endpoint, and Proxy-URL.

Through these capability enhancements, the operational experience with karmadactl has been improved. New features and more detailed information about karmadactl can be accessed using karmadactl --help.

Standardization of Federation Workload Generation Semantics

In this version, Karmada has standardized the generation semantics of workloads at the federation level. This update provides a reliable reference for release systems, enhancing the accuracy of cross-cluster deployments. By standardizing generation semantics, Karmada simplifies the release process and ensures consistent tracking of workload status, making it easier to manage and monitor applications across multiple clusters.

The specifics of the standardization are as follows: the observedGeneration value in the status of the federated workload is set to its own .metadata.generation value only when the state of resources distributed to all member clusters satisfies status.observedGeneration >= metadata.generation. This ensures that the corresponding controllers in each member cluster have completed processing of the workload. This move aligns the generation semantics at the federation level with those of Kubernetes clusters, allowing users to more conveniently migrate single-cluster applications to a multi-cluster setup.
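
As a concrete illustration of this rule (with hypothetical values), the following sketch shows a federated Deployment as observed on the Karmada control plane:

# Illustrative only: a federated Deployment as seen on the Karmada control plane.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  generation: 2          # the spec has been updated; generation bumped to 2
status:
  observedGeneration: 2  # set to 2 only after every member cluster reports
                         # status.observedGeneration >= 2 for its propagated copy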

The following resources have been adapted in this version:

  • GroupVersion: apps/v1 Kind: Deployment, DaemonSet, StatefulSet
  • GroupVersion: apps.kruise.io/v1alpha1 Kind: CloneSet, DaemonSet
  • GroupVersion: apps.kruise.io/v1beta1 Kind: StatefulSet
  • GroupVersion: helm.toolkit.fluxcd.io/v2beta1 Kind: HelmRelease
  • GroupVersion: kustomize.toolkit.fluxcd.io/v1 Kind: Kustomization
  • GroupVersion: source.toolkit.fluxcd.io/v1 Kind: GitRepository
  • GroupVersion: source.toolkit.fluxcd.io/v1beta2 Kind: Bucket, HelmChart, HelmRepository, OCIRepository

If you need to adapt more resources (including CRDs), you can provide feedback to the Karmada community or extend using the Resource Interpreter.

Karmada Operator Supports Custom CRD Download Strategies

CRD (Custom Resource Definition) resources are key prerequisite resources used by the Karmada Operator to configure new Karmada instances. These CRD resources contain critical API definitions for the Karmada system, such as PropagationPolicy, ResourceBinding, and Work.

In version v1.11, the Karmada Operator supports custom CRD download strategies. With this feature, users can specify the download path for CRD resources and define additional download strategies, providing a more flexible offline deployment method.

For a detailed description of this feature, refer to the proposal: Custom CRD Download Strategy Support for Karmada Operator.

Acknowledging Our Contributors

The Karmada v1.11 release includes 223 code commits from 36 contributors. We would like to extend our sincere gratitude to all the contributors:

karmada v1.11 contributors

· 18 min read

Abstract

Cloud native implementations, growing in scale and complexity, are challenging organizations to efficiently and reliably manage large-scale resource pools to meet growing demands. Players in the cloud field have attempted to scale out single clusters by customizing native Kubernetes components, which complicated single-cluster operations and maintenance and obscured cluster upgrade paths, among other problems. This is where multi-cluster technologies come into play: they can scale resource pools horizontally without invasively modifying each single cluster, while reducing O&M costs.

The popularity of Karmada is now drawing users' attention to Karmada's scalability and deployment at scale. Therefore, we launched a large-scale test on Karmada to obtain baseline performance metrics for Karmada managing multiple Kubernetes clusters. For multi-cluster systems such as Karmada, the size of a single cluster is not the factor that limits scalability. On that account, we referred to the standard configurations of large-scale Kubernetes clusters and real-world implementations, and tested Karmada managing 100 Kubernetes clusters (each containing 5k nodes and 20k pods) at the same time. Limited by the environment and tooling, this test was not designed to stress test Karmada, but to reflect typical multi-cluster production scenarios. The test results show that Karmada can stably support 100 large-scale clusters with 500,000 nodes connected at the same time, running more than 2 million pods.

This article will introduce the metrics used in the test, how to conduct large-scale testing, and how we realize massive connection of nodes and clusters.

Background

Cloud computing is entering a new stage featuring multicloud and distributed clouds. As surveyed by Flexera, a well-known analyst company, more than 93% of enterprises are using services from multiple cloud vendors at the same time. Single Kubernetes clusters, limited by their capacity and fault recovery capabilities, cannot run services as distributed as desired, especially when an organization wants to go global. A hybrid cloud or multi-public-cloud architecture helps avoid vendor lock-in and optimize costs. Karmada users are also demanding large-scale node and application management in their multi-cluster deployments.

· 11 min read

Karmada is an open multi-cloud and multi-cluster container orchestration engine designed to help users deploy and operate business applications in a multi-cloud environment. With its compatibility with the native Kubernetes API, Karmada can smoothly migrate single-cluster workloads while still maintaining coordination with the surrounding Kubernetes ecosystem tools.

In the newly released v1.3, Karmada has redesigned cross-cluster failover for applications, implementing a taint-based failure eviction mechanism and providing a smooth failover process that effectively keeps services continuous (uninterrupted) during migration.

New features in this version:

  • Added a multi-cluster resource proxy, which lets platform teams operate workloads deployed across multiple clusters in the same way they would access a single cluster, without being aware of the underlying clusters;
  • Added cluster resource modeling: with a custom cluster resource model, the scheduler can schedule resources more precisely (see the sketch after this list);
  • Added the ability to register Pull-mode clusters with Bootstrap tokens, which not only simplifies the cluster registration process but also makes permission control more convenient;
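
As an example of the cluster resource modeling mentioned above, a custom resource model can be declared on the Cluster object. The sketch below is illustrative only: the member cluster name, grades, and ranges are assumptions, not values from this release announcement.

apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: member1                # hypothetical member cluster
spec:
  resourceModels:
  - grade: 0                   # small nodes: <1 CPU, <4Gi memory
    ranges:
    - name: cpu
      min: "0"
      max: "1"
    - name: memory
      min: "0"
      max: 4Gi
  - grade: 1                   # medium nodes: 1-2 CPUs, 4-16Gi memory
    ranges:
    - name: cpu
      min: "1"
      max: "2"
    - name: memory
      min: 4Gi
      max: 16Gi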

In addition, based on feedback from users running Karmada in production, this version includes many performance optimizations that greatly reduce CPU and memory consumption during operation. A detailed performance test report will be published later.

As with previous releases, v1.3 remains compatible with earlier versions, so users of previous versions can still upgrade smoothly.

· 9 min read

In terms of multi-cluster management, Industrial and Commercial Bank of China (ICBC) found an efficient new approach: Karmada. At KubeCon 2021, Kevin Wang from Huawei Cloud and Shen Yifan from ICBC shared how they managed it.