Configure the autoscaling feature
Autoscaling enables your application to handle varying loads efficiently by dynamically adjusting the number of replicas. Depending on your needs, you can choose between:
- HPA (Horizontal Pod Autoscaler): Standard resource-based autoscaling.
- Keda (Kubernetes-based Event Driven Autoscaler): Advanced autoscaling based on external or event-driven metrics.
Autoscale with HPA (Horizontal Pod Autoscaler)
HPA allows you to scale your application based on standard resource metrics, such as CPU or memory utilization. Below is an example of configuring an HPA resource:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```
Find more guidance at horizontal-pod-autoscale.
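As a quick check (a minimal sketch, assuming the manifest above is saved as `hpa.yaml` and `sample-deployment` already exists in the current namespace), you can apply the resource and watch the autoscaler with kubectl:

```sh
# Apply the HPA manifest (hpa.yaml is an assumed file name)
kubectl apply -f hpa.yaml

# Watch current vs. target CPU utilization and the replica count
kubectl get hpa sample-autoscaler --watch

# Show scaling events and conditions in detail
kubectl describe hpa sample-autoscaler
```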
More metrics with Keda
Keda takes autoscaling a step further by enabling event-driven scaling based on custom metrics, such as queue length, database events, or Prometheus queries. Below is an example configuration using Prometheus as a trigger:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sample-keda-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.k8saas-system.svc.cluster.local.:9090
        metricName: nginx_ingress_requests
        threshold: '30'
        query: |
          sum(rate(nginx_ingress_controller_requests{namespace="sample-namespace",service="sample-service",status!~"[4-5].*"}[2m]))
        activationThreshold: '5'
```
This PromQL query calculates the rate of successful requests to the NGINX Ingress Controller
for a specific service within the sample-namespace namespace.
It measures requests over the past 2 minutes, excluding those with 4xx and 5xx HTTP status codes.
The sum of rates across all matching time series is compared to the threshold. If the threshold is met, the trigger fires.
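Before relying on the query inside the ScaledObject, it can help to confirm that it returns data. As a hedged sketch, assuming the Prometheus service and namespace from the `serverAddress` above, you can port-forward Prometheus and run the same query through its HTTP API:

```sh
# Port-forward the Prometheus service referenced in serverAddress
kubectl -n k8saas-system port-forward svc/kube-prometheus-stack-prometheus 9090:9090

# In another terminal, run the same PromQL query against the Prometheus API
curl -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(nginx_ingress_controller_requests{namespace="sample-namespace",service="sample-service",status!~"[4-5].*"}[2m]))'
```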
Find more guidance at Keda scaled object specifications.
Note: Make sure to switch the documentation version to your Keda version.
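Once the ScaledObject is applied, you can verify that Keda has picked it up. A minimal sketch, using the resource names from the example above (Keda normally creates a managed HPA named `keda-hpa-<scaledobject-name>`):

```sh
# Check that the ScaledObject is ready and active
kubectl get scaledobject sample-keda-scaler

# Inspect the trigger status and any error events
kubectl describe scaledobject sample-keda-scaler

# Keda manages an HPA on your behalf; metrics and replica counts show up here
kubectl get hpa keda-hpa-sample-keda-scaler
```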
Validate integration
To confirm that autoscaling works as expected, follow these steps:
- Load Testing: Use a load testing tool (such as k6 or Locust) to generate traffic and exercise your application’s ability to scale; a quick in-cluster alternative is sketched below.
- Check Pod Status: Run the following command to verify whether additional pods have been created to handle the load:
```sh
kubectl get pods
```
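If you prefer a quick in-cluster smoke test instead of a full k6 or Locust run, a minimal sketch is to generate traffic from a throwaway pod and watch the autoscaler react (the `sample-service` name and the `app=sample-deployment` label are assumptions based on the examples above):

```sh
# Generate continuous traffic from inside the cluster (service name is an assumption)
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://sample-service; done"

# In another terminal, watch replicas being added as the load increases
kubectl get pods -l app=sample-deployment --watch
kubectl get hpa --watch
```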