Advanced procedures
This topic provides information about troubleshooting and manually recovering data in PDS.
Troubleshoot diverged GTIDs in MySQL
In most cases, the MySQL data service in PDS handles pod crashes and outages automatically. For example, instances can fail over and rejoin the cluster automatically on reboot. In some cases, however, a pod cannot reboot the cluster after an outage and keeps failing with the following error:
The instance `instance-a` has an incompatible Global Transaction Identifier (GTID) set with the seed instance `instance-b` (GTIDs diverged). If you wish to proceed, the `force` option must be explicitly set.
This means that the instances cannot agree on which one should become the new primary, because the data on those instances has diverged.
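Two instances have diverged when neither instance's GTID set is a subset of the other's: each has transactions the other lacks. The following is a minimal sketch of that test, assuming untagged GTID sets in the normalized form MySQL reports (for example, the value of @@GLOBAL.gtid_executed). On a live server, MySQL's built-in GTID_SUBSET() function performs the same check; the helper names gtid_subset and gtids_diverged are hypothetical.

```shell
# gtid_subset SET_A SET_B
# Succeeds if every GTID in SET_A also appears in SET_B.
# Assumption: untagged, normalized sets such as "uuid1:1-5:7-9,uuid2:1-3".
gtid_subset() {
  GTID_A=$1 GTID_B=$2 awk 'BEGIN {
    a = tolower(ENVIRON["GTID_A"]); b = tolower(ENVIRON["GTID_B"])
    # MySQL may print the set across several lines; drop all whitespace.
    gsub(/[ \t\r\n]/, "", a); gsub(/[ \t\r\n]/, "", b)

    # Index every interval of SET_B by its source UUID.
    nb = split(b, entries, ",")
    for (i = 1; i <= nb; i++) {
      m = split(entries[i], parts, ":")
      for (j = 2; j <= m; j++) {
        if (split(parts[j], r, "-") == 1) r[2] = r[1]
        n = ++cnt[parts[1]]
        lo[parts[1], n] = r[1] + 0
        hi[parts[1], n] = r[2] + 0
      }
    }

    # Each interval of SET_A must fit inside one interval of SET_B. This is
    # valid because MySQL reports GTID sets with adjacent intervals merged.
    na = split(a, entries, ",")
    for (i = 1; i <= na; i++) {
      m = split(entries[i], parts, ":")
      for (j = 2; j <= m; j++) {
        if (split(parts[j], r, "-") == 1) r[2] = r[1]
        found = 0
        for (k = 1; k <= cnt[parts[1]]; k++)
          if (lo[parts[1], k] <= r[1] + 0 && r[2] + 0 <= hi[parts[1], k]) { found = 1; break }
        if (!found) exit 1
      }
    }
    exit 0
  }'
}

# gtids_diverged SET_A SET_B
# Succeeds if neither set contains the other.
gtids_diverged() {
  ! gtid_subset "$1" "$2" && ! gtid_subset "$2" "$1"
}
```

For example, gtids_diverged "$gtids_a" "$gtids_b" && echo "diverged" reports divergence for two sets copied from the instances.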
To troubleshoot this issue:
Review the GTIDs in the binary logs of the instances and choose the instance that contains the latest or most appropriate changes to continue with. You can inspect the transactions on each instance by opening a shell into the mysql container of its pod and using MySQL tools such as mysql and mysqlbinlog.
Once you have selected the instance to use as the seed, force a reboot of the cluster by running the following commands inside the mysql container of the selected pod, where $password is the password of the innodb-config user:
seed_instance=$(hostname -f)
mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- dba reboot-cluster-from-complete-outage --force --primary=$seed_instance:3306
Check the cluster status and wait for the cluster to recover:
mysqlsh --host=$seed_instance --user=innodb-config --password=$password -- cluster status
If the cluster does not become healthy, or if some instances do not come online, remove the failing instances:
mysqlsh ... -- cluster remove-instance <other_instance>
Then add them back, using clone as the recovery method:
mysqlsh ... -- cluster add-instance <other_instance> --recoveryMethod=clone
See restoring and rebooting a cluster for more information.
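To compare instances before choosing the seed, it can help to print the executed GTID set of every MySQL pod side by side. The following is a minimal sketch that runs the mysql client in the mysql container of each pod. MYSQL_USER and MYSQL_PASSWORD are hypothetical variables for an account that can read @@GLOBAL.gtid_executed, and the namespace and pod names you pass are placeholders.

```shell
# show_gtids NAMESPACE POD...
# Prints "<pod>: <gtid_executed>" for each pod.
# Assumption: MYSQL_USER and MYSQL_PASSWORD are set by the caller.
show_gtids() (
  ns=$1
  shift
  for pod in "$@"; do
    printf '%s: ' "$pod"
    # -N omits the column header; -B prints plain batch output.
    kubectl exec -n "$ns" "$pod" -c mysql -- \
      mysql -N -B -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" \
      -e 'SELECT @@GLOBAL.gtid_executed'
  done
)
```

For example, show_gtids <NAMESPACE> <POD-0> <POD-1> <POD-2> prints one line per instance, which you can then review or compare pairwise.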
Recover Cassandra pods from corrupt commit logs
After deploying the Cassandra data service, rebooting the worker nodes can leave the Cassandra pods unable to come up and form the cluster because of corrupt commit logs. In that case, the pod logs show an error such as:
cassandra ERROR 18:22:11 Exiting due to error while processing commit log during initialization.
cassandra org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 23031717 in Next section at 23028485 in CommitLog-7-1676881531203.log
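The name of the corrupt file appears at the end of the CommitLogReadException line. The following is a minimal sketch that extracts it from log output; the function name corrupt_commitlogs is hypothetical, and it assumes the file name keeps the CommitLog-<N>-<timestamp>.log pattern shown above.

```shell
# corrupt_commitlogs
# Reads Cassandra log output on stdin and prints each distinct commit log
# file named in a CommitLogReadException line.
corrupt_commitlogs() {
  sed -n 's/.*CommitLogReadException.*\(CommitLog-[0-9][0-9]*-[0-9][0-9]*\.log\).*/\1/p' |
    sort -u
}
```

Because the pod keeps restarting, reading the logs of the previous container instance is usually what you want, for example: kubectl logs <POD-NAME> -n <NAMESPACE> --previous | corrupt_commitlogs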
To recover Cassandra pods from the corrupt commit logs:
Scale the pds-deployment-controller-manager deployment to 0.
Edit the Cassandra statefulset by adding the following line to the Cassandra container entry under spec.template.spec.containers:
command: ["/bin/sleep", "3650d"]
Note: The statefulset name is identical to the deployment name in the PDS UI.
Delete all Cassandra pods and wait for pod 0 to start running. Pod 0 starts running but never becomes ready, because its container now runs the sleep command instead of Cassandra.
Shell into pod 0 and delete the corrupt commit log named in the error message. For example:
rm /srv/pds/data/commitlog/CommitLog-7-1676881531203.log
Exit the shell of pod 0 and scale the pds-deployment-controller-manager deployment back to 1.
Wait approximately 30 seconds for the deployment operator to update the statefulset.
Delete Cassandra pod 0.
All Cassandra nodes should come up successfully. However, it is recommended to shell back into one of the Cassandra pods and run nodetool repair, so that any writes lost with the deleted commit log are restored from the other replicas.
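The manual steps above can also be scripted. The following is a minimal sketch, not a supported PDS tool: recover_cassandra is a hypothetical function, and it assumes that the Cassandra container is the first entry (index 0) in the pod template, that you know the namespace where pds-deployment-controller-manager runs, and that your kubectl supports kubectl wait --for=jsonpath. It waits for the Running phase rather than readiness, because pod 0 never becomes ready while it only sleeps.

```shell
# recover_cassandra NAMESPACE STATEFULSET CONTROLLER_NAMESPACE COMMITLOG_FILE
# Hypothetical wrapper around the manual recovery steps; all arguments are placeholders.
recover_cassandra() (
  set -e
  ns=$1
  sts=$2
  ctrl_ns=$3
  commitlog=$4

  # Stop the PDS controller so it does not revert the statefulset edit.
  kubectl scale deployment pds-deployment-controller-manager -n "$ctrl_ns" --replicas=0

  # Make the container sleep instead of starting Cassandra.
  # Assumption: the Cassandra container is containers[0].
  kubectl patch statefulset "$sts" -n "$ns" --type=json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/command","value":["/bin/sleep","3650d"]}]'

  # Delete every Cassandra pod; statefulset pods are named <statefulset>-<ordinal>.
  replicas=$(kubectl get statefulset "$sts" -n "$ns" -o jsonpath='{.spec.replicas}')
  pods=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pods="$pods $sts-$i"
    i=$((i + 1))
  done
  # $pods is intentionally unquoted so each pod name becomes a separate argument.
  kubectl delete pod $pods -n "$ns"

  # Pod 0 reaches Running but never Ready, so wait on the phase.
  kubectl wait "pod/$sts-0" -n "$ns" --for=jsonpath='{.status.phase}'=Running --timeout=10m

  # Remove the corrupt commit log inside pod 0.
  kubectl exec "$sts-0" -n "$ns" -- rm "/srv/pds/data/commitlog/$commitlog"

  # Restart the controller, give it time to update the statefulset, then restart pod 0.
  kubectl scale deployment pds-deployment-controller-manager -n "$ctrl_ns" --replicas=1
  sleep 30
  kubectl delete pod "$sts-0" -n "$ns"
)
```

Running nodetool repair afterwards is still a separate, manual step.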
Update Kubernetes secret after changing the pds password
If you change the password for the pds user, you must also update the corresponding Kubernetes secret for the deployment. To base64-encode the new password and update the Kubernetes secret:
Get the Kubernetes secret for the Couchbase data service:
kubectl get secrets -n <namespace-where-the-Couchbase-data-service-is-deployed>
Encode your new administrator password in base64. Use echo -n so that the trailing newline is not encoded as part of the password:
echo -n <the-updated-password1> | base64
Update the Kubernetes secret with the new base64-encoded administrator password. For example:
kubectl get secret cb-rke-qichff-creds -n cb -o json | jq '.data["password"]="UGFzc3dvcmQx"' | kubectl apply -f -
The command returns:
secret/cb-rke-qichff-creds configured
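The two steps can be combined into shell functions. The following is a minimal sketch: encode_password and update_secret_password are hypothetical names, the secret name and namespace are placeholders, and jq must be installed. printf '%s' is used instead of echo so that no trailing newline is encoded, and tr removes the line wrapping that some base64 implementations add to long output.

```shell
# encode_password PASSWORD
# Prints the base64 encoding of PASSWORD without an encoded trailing newline.
encode_password() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# update_secret_password SECRET NAMESPACE NEW_PASSWORD
# Same kubectl/jq pipeline as above, with the encoded value passed to jq as a
# variable instead of being pasted into the filter.
update_secret_password() (
  encoded=$(encode_password "$3")
  kubectl get secret "$1" -n "$2" -o json |
    jq --arg pw "$encoded" '.data["password"] = $pw' |
    kubectl apply -f -
)
```

For example, encode_password Password1 prints UGFzc3dvcmQx.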
Update the pds password in the cqlshrc file for Cassandra pods
If you change the password for the pds user, you must also update the cqlshrc file on every Cassandra pod:
Open a shell in each Cassandra pod:
kubectl exec -it -n <NAMESPACE> <POD-NAME> -- bash
Open the cqlshrc file:
vi ./cassandra/cqlshrc
Replace the password of the default user pds with the new password.
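Editing the file by hand on every pod is repetitive. The following is a minimal sketch that makes the same change without an interactive editor. It assumes that cqlshrc uses the standard [authentication] section with a password line, and that ./cassandra/cqlshrc resolves against the same working directory for kubectl exec as for the interactive shell above. The function names set_cqlshrc_password and update_pod_cqlshrc are hypothetical.

```shell
# set_cqlshrc_password FILE NEW_PASSWORD
# Rewrites the password line of the [authentication] section in FILE.
set_cqlshrc_password() (
  tmp="$1.tmp.$$"
  # Pass the password through the environment so awk does not interpret
  # backslashes or other special characters in it.
  if NEW_PW=$2 awk '
    /^[ \t]*\[/ { in_auth = ($0 ~ /^[ \t]*\[authentication\][ \t]*$/) }
    in_auth && /^[ \t]*password[ \t]*[=:]/ { print "password = " ENVIRON["NEW_PW"]; next }
    { print }
  ' "$1" > "$tmp"; then
    mv "$tmp" "$1"
  else
    rm -f "$tmp"
    exit 1
  fi
)

# update_pod_cqlshrc NAMESPACE POD NEW_PASSWORD
# Copies ./cassandra/cqlshrc out of the pod, edits it locally, and writes it back.
update_pod_cqlshrc() (
  copy=$(mktemp) || exit 1
  kubectl exec -n "$1" "$2" -- cat ./cassandra/cqlshrc > "$copy" &&
    set_cqlshrc_password "$copy" "$3" &&
    kubectl exec -i -n "$1" "$2" -- sh -c 'cat > ./cassandra/cqlshrc' < "$copy"
  status=$?
  rm -f "$copy"
  exit "$status"
)
```

Run update_pod_cqlshrc once for each Cassandra pod, for example in a loop over the pod names.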