Failback an application
Failback is the process of moving the application and its data back to the source cluster once the source cluster is restored and operational again.
Once your unhealthy Kubernetes cluster is back up and running, the Portworx nodes in that cluster will not immediately rejoin the cluster. They will stay in the Out of Quorum state until you explicitly activate this cluster domain.
After this domain is marked as Active, you can fail back the applications.
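Before you activate the source cluster domain, you can check the current domain status from the destination cluster. The output below is only an illustrative sketch based on the domain names used on this page; the exact sync state shown for each domain depends on your environment. The key point is that us-east-1a appears under the INACTIVE column until you activate it:
storkctl get clusterdomainsstatus
NAME            LOCAL-DOMAIN   ACTIVE                INACTIVE              CREATED
px-dr-cluster   us-east-1a     us-east-1b (InSync)   us-east-1a (InSync)   29 Nov 22 22:09 UTC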
The following considerations are used in the examples on this page. Update them to the appropriate values for your environment:
- Source cluster: the Kubernetes cluster that went down and where your applications were originally running. The cluster domain for this source cluster is us-east-1a.
- Destination cluster: the Kubernetes cluster where the applications were failed over. The cluster domain for this destination cluster is us-east-1b.
- The Zookeeper application, which was failed over to the destination cluster, is being failed back to the source cluster.
Reactivate your source cluster domain
Once your source cluster is operational, perform the following steps from your destination cluster to activate your source cluster domain:
Run the following command to activate the source cluster domain:
storkctl activate clusterdomain us-east-1a
Cluster Domain activate operation started successfully for us-east-1a
Verify that the source cluster domain is activated:
storkctl get clusterdomainsstatus
NAME LOCAL-DOMAIN ACTIVE INACTIVE CREATED
px-dr-cluster us-east-1a us-east-1a (InSync), us-east-1b (InSync) 29 Nov 22 22:09 UTC
Reverse sync your clusters
If the destination cluster has been running applications for some time, it is possible that the state of your application on the destination cluster differs from your source cluster. This is due to the creation of new resources or changes in data within stateful applications on the destination cluster.
It is recommended to perform one migration from the destination cluster to your source cluster before failing back your applications, so that your original source cluster has the most up-to-date application state.
Because both of your clusters are now accessible, follow these instructions to configure a reverse migration schedule:
Create a schedule policy on your destination cluster using the instructions in the Create a schedule policy section.
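For reference, a minimal interval-based schedule policy might look like the following sketch. The name reverse-failback-policy and the 30-minute interval are example values only; create the policy with whatever name and interval suit your environment and reference that name in the migration schedule below:
apiVersion: stork.libopenstorage.org/v1alpha1
kind: SchedulePolicy
metadata:
  name: reverse-failback-policy
policy:
  interval:
    intervalMinutes: 30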
Copy and paste the following spec into a file called reverse-migrationschedule.yaml on your destination cluster:
apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: reversemigrationschedule
  namespace: <migrationnamespace>
spec:
  template:
    spec:
      clusterPair: <your-clusterpair-name>
      includeResources: true
      startApplications: false
      includeVolumes: false
      namespaces:
      - <app-namespace1>
      - <app-namespace2>
  schedulePolicyName: <your-schedule-policy>
  suspend: false
Apply your reverse-migrationschedule.yaml on your destination cluster:
kubectl apply -f reverse-migrationschedule.yaml
Verify that at least one migration cycle has completed successfully:
storkctl get migration -n <migrationnamespace>
NAME CLUSTERPAIR STAGE STATUS VOLUMES RESOURCES CREATED ELAPSED TOTAL BYTES TRANSFERRED
reversemigrationschedule-interval-2023-02-01-201747 <your-remote-clusterpair> Final Successful 0/0 4/4 01 Feb 23 20:17 UTC Volumes () Resources (21.71709746s) 0
Suspend the reverse migration schedule:
storkctl suspend migrationschedule reversemigrationschedule -n <migrationnamespace>
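To confirm that the reverse schedule is now suspended, you can list the migration schedules on the destination cluster. The output below is only a sketch; the SUSPEND column should read true for reversemigrationschedule:
storkctl get migrationschedule -n <migrationnamespace>
NAME                        POLICYNAME               CLUSTERPAIR               SUSPEND   LAST-SUCCESS-TIME     LAST-SUCCESS-DURATION
reversemigrationschedule    <your-schedule-policy>   <your-clusterpair-name>   true      01 Feb 23 20:17 UTC   21s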
Stop the application on the destination cluster
Stop the applications from running by changing the replica count of your deployments and statefulsets to 0:
storkctl deactivate migration -n <migrationnamespace>
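If you want to confirm that the applications are scaled down on the destination cluster, you can check the workloads in the application namespace. The zookeeper namespace below matches the example on this page, and the output is only a sketch; the resources listed depend on your application:
kubectl get deployments,statefulsets -n zookeeper
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/zk   0/0     0            0           18h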
Restart the application on the source cluster
After you have stopped the applications on the destination cluster, start the applications on the source cluster by restoring the replica counts:
storkctl activate migration -n <migrationnamespace>
Verify that your application pods and associated resources have been migrated:
kubectl get all -n zookeeper
NAME READY STATUS RESTARTS AGE
pod/zk-544ffcc474-6gx64 1/1 Running 0 18h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/zk-service ClusterIP 10.233.22.60 <none> 3306/TCP 18h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/zk 1/1 1 1 18h
NAME DESIRED CURRENT READY AGE
replicaset.apps/zk-544ffcc474 1 1 1 18h
Run the following command on the source cluster to resume the original migration schedule:
storkctl resume migrationschedule migrationschedule -n <migrationnamespace>
MigrationSchedule migrationschedule resumed successfully
Verify that the migration schedule is active:
storkctl get migrationschedule -n <migrationnamespace>
NAME POLICYNAME CLUSTERPAIR SUSPEND LAST-SUCCESS-TIME LAST-SUCCESS-DURATION
migrationschedule <your-schedule-policy> <your-clusterpair-name> false 01 Dec 23 22:25 UTC 10s
The false value for the SUSPEND field shows that the migration schedule for your policy is active on the source cluster. Hence, your application has successfully failed back to your source cluster.