
2-Node HA ADM (on-prem): DB sync errors, but neither appliance GUI shows a Deployment option under Settings.



I've gone round and round on this one: a pre-prod ADM pair [Release 13.1.27.62] in high availability mode with a floating IP.

 

It started with database sync errors: "Database streaming channel is broken between HA nodes" (CTX239798).

 

After upgrading to 13.1.27.62, the problem now is that neither ADM appliance shows the Deployment option in the GUI. The former primary is where we see the error. (Screenshot)

 

I'm stumped.  

 

This is on both appliances, so I ran the steps listed in CTX239798, although the article states:

Starting from ADM on-prem 13.0 71.x release, if database streaming between the nodes in an HA deployment fails, now you can click the Sync Database tab under System > Deployment > High Availability Deployment in the ADM GUI, to restore the database.

 

{GUI - Deployment is gone from the GUI}

 

So...

Alternatively if you are on version less than 13.0 71.x, then use the following procedure:

 

If ADM version is 12.1-59.x, 13.0-64.x or later:

# nohup sh /mps/scripts/pgsql/join_streaming_replication.sh SecondaryIP PrimaryIP > /var/mps/log/join_streaming_replication_console.log 2>&1 &

 

Monitor the output of the above command in /var/mps/log:

# tail -f /var/mps/log/join_streaming_replication_console.log
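For anyone following along, here is a minimal sketch of how I'd keep the IP order straight (secondary first, then primary, going by the article's placeholder names). The function name `build_join_cmd` and the example addresses are my own placeholders, not anything from the article, and the helper only prints the command so it can be reviewed before running it by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of ADM): prints the replication-join command
# quoted from CTX239798 without executing it, so the argument order can be
# checked first. Order follows the article: SecondaryIP, then PrimaryIP.
build_join_cmd() {
    secondary_ip="$1"
    primary_ip="$2"
    if [ -z "$secondary_ip" ] || [ -z "$primary_ip" ]; then
        echo "usage: build_join_cmd SecondaryIP PrimaryIP" >&2
        return 1
    fi
    log=/var/mps/log/join_streaming_replication_console.log
    echo "nohup sh /mps/scripts/pgsql/join_streaming_replication.sh $secondary_ip $primary_ip > $log 2>&1 &"
}

# Example with documentation-range placeholder addresses (not real appliance IPs):
build_join_cmd 192.0.2.12 192.0.2.11
```

The printed line can then be pasted into the appliance shell, followed by the `tail -f` above to watch the log.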

 

Wait for a few hours and confirm if the HA channel is UP by running the command on Secondary:

# ps -ax | grep -i wal

 

You should see this line to confirm if the channel is UP

??  Ss     0:14.14 postgres: wal receiver process   streaming
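To avoid eyeballing the `ps` output every time, the check can be scripted. This is only a sketch: the function name `wal_receiver_streaming` is my own, and it assumes the process title looks like the line quoted above ("postgres: wal receiver process ... streaming"), which may differ on other PostgreSQL builds.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ax` output on stdin and succeeds only if a
# "postgres: wal receiver process" line in the "streaming" state is present.
# The first grep also filters out the `grep -i wal` process itself.
wal_receiver_streaming() {
    grep -i 'postgres: wal receiver process' | grep -qi 'streaming'
}

# Intended to be run on the secondary node:
if ps -ax | wal_receiver_streaming; then
    echo "HA channel is UP"
else
    echo "wal receiver not streaming"
fi
```

In my case this would report the receiver as not streaming, which matches what I saw by hand.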

 

- No luck-

 

I logged back in to the GUI, and the Deployment and HA options are still gone.

 

I'm not concerned about the data at this point, as it is pre-prod. I'm looking for a way to reset these appliances to default. Is that possible?

 

1. I rebooted both appliances and reset the nsroot and nsrecover passwords back to default.

2. I shut down the secondary, re-ran the deployment script, and chose standalone.

(* Deployment / HA is gone from the GUI)

3. I rebooted the primary and still see the DB synchronization error despite setting it to standalone, with no option for HA or Deployment. By that measure it looks like a standalone.

4. I shut down the secondary and tried the same - no luck.

 

Next, I tried to redeploy these as primary and secondary, and now I get an error. Before the upgrade, this worked without a hitch.

 

So, I have two ADM appliances running the latest update, and both look as if they are standalone because there are no options for deployment or HA management.

 

Yet on the original primary appliance I still get the HA-related errors, with no option to sync or break HA.

 

I've looked for options to hard-reset the appliances but can't find anything.

 

It's not a matter of a simple reinstall, because both of these were provisioned from the OVA file by a separate team.

 

If rebuilding from scratch is the only option, then so be it. But I wanted to ask whether there are other options.

Errors-and-no-deployment-2022-08-29_09-36-40.png

no-deploy-2022-08-29_13-45-41.png
