Configure the autoscaling feature

Autoscaling enables your application to handle varying loads efficiently by dynamically adjusting the number of replicas. Depending on your needs, you can choose between the Kubernetes Horizontal Pod Autoscaler (HPA) for standard resource metrics and Keda for event-driven scaling on custom metrics.

Autoscale with HPA (Horizontal Pod Autoscaler)

HPA allows you to scale your application based on standard resource metrics, such as CPU or memory utilization. Below is an example of configuring an HPA resource:

hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

Find more guidance in the Kubernetes horizontal-pod-autoscale documentation.
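The scaling decision HPA makes follows its documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A minimal Python sketch (the function name is illustrative; the defaults mirror the min/max and target from hpa.yaml above):

```python
import math

def hpa_desired_replicas(current_replicas, current_utilization, target_utilization=75,
                         min_replicas=1, max_replicas=4):
    """Core HPA formula: desired = ceil(current * (metric / target)),
    clamped to the configured minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired_replicas(2, 150))  # 2 pods at 150% CPU -> scale up to 4
print(hpa_desired_replicas(4, 30))   # 4 pods at 30% CPU  -> scale down to 2
```

This also shows why `averageUtilization: 75` leaves headroom: the controller only adds replicas once average CPU across pods exceeds 75% of the requested amount.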

Autoscale with Keda

Keda takes autoscaling a step further by enabling event-driven scaling based on custom metrics, such as queue length, database events, or Prometheus queries. Below is an example configuration using Prometheus as a trigger:

scaled-object.yaml
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sample-keda-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-deployment
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.k8saas-system.svc.cluster.local.:9090
        metricName: nginx_ingress_requests
        threshold: '30'
        query: |
          sum(rate(nginx_ingress_controller_requests{namespace="sample-namespace",service="sample-service",status!~"[4-5].*"}[2m]))
        activationThreshold: '5'
```

This PromQL query calculates the rate of successful requests to the NGINX Ingress Controller for a specific service in the sample-namespace namespace. It measures the per-second request rate over a 2-minute window, excluding requests with 4xx and 5xx HTTP status codes, and sums the rates across all matching time series. Keda compares that sum against the threshold and fires the trigger when it is met.
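For a Prometheus trigger like this one, Keda derives the target replica count by dividing the observed metric value by the threshold. A rough sketch of that decision (a simplification with illustrative names; the real controller also handles polling intervals, cooldown periods, and the hand-off to HPA, and the activation threshold primarily governs scaling from/to zero):

```python
import math

def keda_target_replicas(metric_value, threshold=30, activation_threshold=5,
                         min_replicas=2, max_replicas=10):
    """Approximate Keda scaling: below the activation threshold the scaler
    is inactive; when active, replicas ~= ceil(metric / threshold),
    clamped to minReplicaCount/maxReplicaCount."""
    if metric_value <= activation_threshold:
        return min_replicas  # inactive: stay at the configured floor
    desired = math.ceil(metric_value / threshold)
    return max(min_replicas, min(max_replicas, desired))

print(keda_target_replicas(90))   # ~90 req/s -> 3 replicas
print(keda_target_replicas(400))  # high load -> clamped at maxReplicaCount, 10
```

With the values from scaled-object.yaml above, each replica is expected to absorb roughly 30 requests per second before another one is added.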

Find more guidance at Keda scaled object specifications.

Note: Make sure to switch the documentation version to your Keda version.

To confirm that autoscaling works as expected, follow these steps:

  1. Load Testing: Use a load testing tool (such as k6 or Locust) to generate traffic and test your application’s ability to scale.
  2. Check Pod Status: Run the following command to verify that additional pods have been created to handle the load:

```shell
kubectl get pods
```
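If a dedicated tool such as k6 or Locust is not at hand, step 1 can be approximated with a small stdlib-only Python load generator. This is a minimal sketch: the URL, request count, and concurrency below are placeholders to point at your own service endpoint.

```python
import concurrent.futures
import urllib.request

def generate_load(url, total_requests, concurrency=10):
    """Send `total_requests` GET requests to `url` using a thread pool
    and return a histogram of the HTTP status codes received."""
    def hit(_):
        with urllib.request.urlopen(url) as resp:
            return resp.status

    counts = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(hit, range(total_requests)):
            counts[status] = counts.get(status, 0) + 1
    return counts

# Example (hypothetical endpoint):
# generate_load("http://sample-service.sample-namespace/", 1000, concurrency=20)
```

While the generator runs, watch `kubectl get pods` (or `kubectl get hpa`) in another terminal to see replicas being added.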