Version: 3.1

Automatically rebalance Portworx storage pools

You can use Autopilot to rebalance Portworx storage pools automatically when they begin to run out of space.

Autopilot monitors the metrics in your cluster (for example, via Prometheus) and detects conditions that require rebalancing of existing volumes in the cluster.

Prerequisites

Portworx version: Autopilot uses Portworx APIs to rebalance storage pools, which are available only in Portworx 2.6.0 and above
Autopilot version: 1.3.0 and above

Example

The following example Autopilot rule rebalances all storage pools that meet either of the following conditions:

Pool's provisioned space is over 120%

Pool's used space is over 70%

```
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: pool-rebalance-absolute
spec:
  conditions:
    requiredMatches: 1
    expressions:
    - key: 100 * (px_pool_stats_provisioned_bytes/ on (pool) px_pool_stats_total_bytes)
      operator: Gt
      values:
        - "120"
    - key: 100 * (px_pool_stats_used_bytes/ on (pool) px_pool_stats_total_bytes)
      operator: Gt
      values:
        - "70"
  actions:
    - name: "openstorage.io.action.storagepool/rebalance"
```

The AutopilotRule spec consists of two important sections: conditions and actions.

The conditions section establishes the threshold criteria that dictate when the rule must perform its action. In this example, the criteria contain two expressions:

100 * (px_pool_stats_provisioned_bytes/ on (pool) px_pool_stats_total_bytes) is a Prometheus query that gives a storage pool's provisioned space percentage.
- The Gt operator checks whether the value of the metric is greater than 120%.
100 * (px_pool_stats_used_bytes/ on (pool) px_pool_stats_total_bytes) is a Prometheus query that gives a storage pool's used space percentage.
- The Gt operator checks whether the value of the metric is greater than 70%.
requiredMatches indicates that only one of the expressions needs to match for the conditions to be considered met.

The actions section specifies what action Portworx performs when the conditions are met. The action name here is the Storage Pool rebalance action.
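To make the requiredMatches semantics concrete, here is a minimal Python sketch of how a rule with two Gt expressions could be evaluated against one pool's metrics. It is an illustration of the behavior described above, not Autopilot's actual implementation. The names Expression, rule_matches, provisioned_pct, and used_pct are hypothetical and chosen only for readability.

```python
from dataclasses import dataclass


@dataclass
class Expression:
    """One Gt expression from the rule (hypothetical model, not an Autopilot type)."""
    metric: str       # illustrative label for the Prometheus query result
    threshold: float  # the number from the expression's `values` list


def rule_matches(metrics, expressions, required_matches):
    """Return True when at least `required_matches` Gt expressions hold."""
    matched = sum(1 for e in expressions if metrics[e.metric] > e.threshold)
    return matched >= required_matches


# Thresholds taken from the example rule: provisioned > 120, used > 70.
expressions = [
    Expression("provisioned_pct", 120.0),
    Expression("used_pct", 70.0),
]

# Made-up sample values for two pools.
over_provisioned_pool = {"provisioned_pct": 200.0, "used_pct": 12.0}
idle_pool = {"provisioned_pct": 0.0, "used_pct": 12.0}

print(rule_matches(over_provisioned_pool, expressions, 1))  # True
print(rule_matches(idle_pool, expressions, 1))              # False
```

With requiredMatches set to 2 instead of 1, the over-provisioned pool in this sketch would no longer match, because only one of its two expressions exceeds its threshold.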

Perform the following steps to deploy this example:

Create specs

note

Other rebalance rules: If you have other AutopilotRules in the cluster for pool rebalance, Portworx by Pure Storage recommends you delete them for this test. This will make it easier to confirm that the rule in this example was triggered.
TESTING ONLY: The specs below cause all volumes to initially land on a single Portworx node. This allows you to test the rebalance rule later on, and rebalance the volumes across all nodes.

Application and PVC specs

First, create the storage and application spec files:

Identify the ID of a single Portworx node in the cluster.
List the cluster nodes and pick one. This example uses the first node in the list, 073ae0c7-d5e8-4c6c-982e-75339f2ada81.
```
PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl cluster provision-status --output-type wide
```
```
NODE                                    NODE STATUS     POOL                                            POOL STATUS     IO_PRIORITY     SIZE    AVAILABLE       USED (MEAN-DIFF %)      PROVISIONED (MEAN-DIFF %)       ZONE    REGION  RACK
073ae0c7-d5e8-4c6c-982e-75339f2ada81    Up              0 ( e24c4dbe-4f80-48db-8a1d-17ae2e459fcc )      Online          HIGH            30 GiB  26 GiB          3.5 GiB ( +0 % )        0 B ( +0 % )                    AZ1     default default
6eec1f0a-2679-41a7-a541-bc5f9dec52d9    Up              0 ( 4a8ec973-219b-48da-b0a1-b3f45e843789 )      Online          HIGH            30 GiB  26 GiB          3.5 GiB ( +0 % )        0 B ( +0 % )                    AZ1     default default
a53dbf82-faca-40a2-a7bc-3f1c397f1516    Up              0 ( 44fbba64-4b10-4fa4-974b-b6dcf491ed11 )      Online          HIGH            30 GiB  26 GiB          3.5 GiB ( +0 % )        0 B ( +0 % )                    AZ1     default default
a9cfa4ec-cf49-49f5-bdde-72cf2818e808    Up              0 ( 46ede4d0-a1ec-4758-9823-23293fd82f61 )      Online          HIGH            30 GiB  26 GiB          3.5 GiB ( +0 % )        0 B ( +0 % )                    AZ1     default default
```
Know how used and provisioned mean difference percentages are calculated!
The mean difference shows how far each pool's usage deviates from the overall cluster usage, which helps with capacity planning and resource allocation.
- In the USED (MEAN-DIFF %) column: the first value indicates the amount of space currently being used in a specific pool. The second value, known as the mean difference percentage, represents how the pool's used space percentage deviates from the average used space percentage across the entire cluster.
- In the PROVISIONED (MEAN-DIFF %) column: the first value shows the space allocated (provisioned) in the pool, while the second value highlights the deviation of this pool's provisioned space percentage from the cluster average.
To understand this better, let's consider an example calculation of the used space mean difference for three different pools:
- Pool-1: Total size: 100G, Used space: 60G
- Pool-2: Total size: 200G, Used space: 90G
- Pool-3: Total size: 300G, Used space: 90G
First the used space percentage for each pool is calculated:
Pool-1: 60G/100G = 60%, Pool-2: 90G/200G = 45%, Pool-3: 90G/300G = 30%
Next, the cluster-wide mean percentage of used space is calculated by dividing the sum of used space from all pools by the sum of their total sizes.
(60G+90G+90G) / (100G+200G+300G) = 240G / 600G = 40%
Finally, the MEAN-DIFF % for each pool is calculated by taking the difference between its individual used space percentage and the cluster-wide mean percentage:
- Pool-1: 60% - 40% = +20%
- Pool-2: 45% - 40% = +5%
- Pool-3: 30% - 40% = -10%
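The calculation above can be reproduced with a short Python sketch. It assumes the same three pools and sizes as the worked example; the function name mean_diff_percent is hypothetical, and the code only mirrors the arithmetic described here rather than how pxctl computes the column internally.

```python
# Pool name -> (total size in GB, used space in GB), from the example above.
pools = {
    "Pool-1": (100, 60),
    "Pool-2": (200, 90),
    "Pool-3": (300, 90),
}


def mean_diff_percent(pools):
    """Return each pool's used-space percentage minus the cluster-wide mean."""
    total_size = sum(size for size, _ in pools.values())
    total_used = sum(used for _, used in pools.values())
    # The cluster mean is total used over total size, not the average of per-pool percentages.
    cluster_mean = 100 * total_used / total_size
    return {
        name: 100 * used / size - cluster_mean
        for name, (size, used) in pools.items()
    }


diffs = mean_diff_percent(pools)
for name, diff in diffs.items():
    print(f"{name}: {diff:+.0f}%")
# Pool-1: +20%
# Pool-2: +5%
# Pool-3: -10%
```

Note that the cluster-wide mean (40%) is weighted by pool size, so it differs from the plain average of the three per-pool percentages (45%).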

Create postgres-sc.yaml and place the following content inside it:
```
##### Portworx storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: postgres-pgbench-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "1"
  nodes: "073ae0c7-d5e8-4c6c-982e-75339f2ada81"
allowVolumeExpansion: true
```
note
Notice how the nodes parameter pins the volumes from this StorageClass to initially land only on 073ae0c7-d5e8-4c6c-982e-75339f2ada81. Use this for testing only, and change the value to suit your environment.

Create postgres-vol.yaml and place the following content inside it.

```
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pgbench-data
  labels:
    app: postgres
spec:
  storageClassName: postgres-pgbench-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
```

You will not deploy any application pod using this PVC. This tutorial only demonstrates rebalancing the pools.

Create the StorageClass and create 2 PVCs in 2 unique namespaces.
In the cluster used in this example, each node has a 30Gi pool, so creating two 30Gi volumes on a single node raises its provisioned space percentage to 200%. This triggers the rebalance rule.
Update the PVC size in the storage field of the spec above to match the pool sizes in your cluster.
```
kubectl apply -f postgres-sc.yaml

for i in {1..2}; do
  kubectl create ns pg$i || true
  kubectl apply -f postgres-vol.yaml -n pg$i
done
```
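The 200% figure mentioned before these commands follows from simple arithmetic, sketched below in Python. The sketch assumes repl is "1" (as in the StorageClass above), so each volume is counted once against the pool, and it ignores any internal overhead. The function name provisioned_percent is hypothetical.

```python
def provisioned_percent(volume_sizes_gib, pool_size_gib):
    """Provisioned space as a percentage of pool size (assumes one replica per volume)."""
    return 100 * sum(volume_sizes_gib) / pool_size_gib


# Two 30Gi volumes pinned to a single 30Gi pool, as in this example.
pct = provisioned_percent([30, 30], 30)
print(pct)        # 200.0
print(pct > 120)  # True: exceeds the rule's provisioned-space threshold
```

You can substitute your own pool size and PVC sizes to check that the test volumes push the pool past the 120% threshold in the rule.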
Wait until all PVCs are bound and confirm that one pool has all the volumes.
The output from the following commands should show all PVCs as bound:
```
kubectl get pvc -n pg1
kubectl get pvc -n pg2
```
The output from the following command should show that the provisioned space for the pool on the Portworx node you selected earlier has gone up, because all the volumes were created there. You will see this in the PROVISIONED column of the output:
```
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl cluster provision-status --output-type wide
```

AutopilotRule spec

Once you've created the PVCs, you can create an AutopilotRule to rebalance the pools.

Create a YAML spec for the autopilot rule named autopilotrule-pool-rebalance-example.yaml and place the following content inside it:

```
apiVersion: autopilot.libopenstorage.org/v1alpha1
kind: AutopilotRule
metadata:
  name: pool-rebalance-absolute
spec:
  conditions:
    requiredMatches: 1
    expressions:
    - key: 100 * (px_pool_stats_provisioned_bytes/ on (pool) px_pool_stats_total_bytes)
      operator: Gt
      values:
        - "120"
    - key: 100 * (px_pool_stats_used_bytes/ on (pool) px_pool_stats_total_bytes)
      operator: Gt
      values:
        - "70"
  actions:
    - name: "openstorage.io.action.storagepool/rebalance"
```

Apply the rule:

```
kubectl apply -f autopilotrule-pool-rebalance-example.yaml
```

Monitor

Now that you've created the rule, Autopilot will detect that one pool is over-provisioned and will start rebalancing the volumes across the pools.

Enter the following command to retrieve all the events generated for the pool-rebalance rule:

```
kubectl get events --field-selector involvedObject.kind=AutopilotRule,involvedObject.name=pool-rebalance-absolute --all-namespaces --sort-by .lastTimestamp
```

You should see events showing that the rule has triggered. About 30 seconds later, the rebalance actions will begin.

Once you see that actions have begun on the pools, you can use pxctl to check the cluster provision status again:

```
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl cluster provision-status --output-type wide
```

The output should now show that the provisioned space is balanced and spread evenly across all your pools. You will see this in the PROVISIONED column of the output.
