HA stops at every midnight and peer disk becomes standalone
- Post
- January 28, 2020 at 3:47 am
We found that HA stops and the peer disk becomes standalone every night just after 12:00 AM, even though we have no cron job scheduled at that hour. Here are our logs. Please help us find a way to prevent this from happening every night.
Jan 28 00:00:44 cs-voice01 Filesystem(HA_Filesystem)[1266]: ERROR: Couldn't unmount /mnt; trying cleanup with TERM
Jan 28 00:00:49 cs-voice01 kernel: drbd drbd0 cs-voice02.frontiir.net: error receiving P_STATE, e: -5 l: 0!
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
- Replies
- January 28, 2020 at 4:53 am
- January 29, 2020 at 9:58 am
- February 3, 2020 at 3:47 pm
You can create the following script to resolve the split-brain on DRBD:
cat > /usr/local/bin/drbdsplit << EOF
#!/bin/bash
set -e
# Authors: Rodrigo Cuadra
# with Collaboration of Jose Miguel Rivera
# 2019/11/06
# Support: rcuadra@aplitel.com
#
drbdadm secondary drbd0
drbdadm disconnect drbd0
drbdadm -- --discard-my-data connect drbd0
ssh root@$ip_slave "drbdadm connect drbd0"
echo "Disk Status"
drbdadm status
EOF
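chmod +x /usr/local/bin/drbdsplit
Here the variable “$ip_slave” is the IP address of the opposite server. Because the heredoc delimiter EOF is unquoted, the shell expands $ip_slave when the file is created, so set it in your shell before running the command above, or replace it with the peer's IP address.
This script must be created on both servers.
To automate this process, edit the file “/etc/drbd.conf” and add the following on both servers: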
resource resourcename {
    [snip]
    handlers {
        split-brain "/usr/local/bin/drbdsplit";
    }
    [snip]
}
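Before running the recovery script by hand, it may help to confirm that the resource really is in the StandAlone connection state. The following is only a minimal sketch: the helper name needs_recovery is hypothetical, and the sketch assumes the state string comes from `drbdadm cstate drbd0` on the node in question.

```shell
#!/bin/bash
# Hypothetical helper (not part of VitalPBX or DRBD): decide whether a
# DRBD connection state string indicates the node has dropped to StandAlone.
needs_recovery() {
    case "$1" in
        StandAlone) return 0 ;;   # no connection attempts are being made
        *)          return 1 ;;   # Connected, Connecting, WFConnection, ...
    esac
}

# Example wiring, left commented out so the sketch runs without DRBD installed:
# state="$(drbdadm cstate drbd0)"
# if needs_recovery "$state"; then
#     /usr/local/bin/drbdsplit
# fi
```

Keeping the check separate from the recovery script means the data-discarding step only runs when the node has actually dropped its connection.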
- February 10, 2020 at 10:43 am
Thank you very much for your help. We will deploy the script accordingly. We also found that DRBD was being disconnected by a firewall rule reload that runs at 00:00 every day. The job is defined in /etc/cron.d/vpbx, which contains the two entries below. After we disabled the second entry, the problem stopped happening.
* * * * * root /usr/share/ombutel/scripts/update_tc >/dev/null 2>&1
0 0 * * * root /usr/share/ombutel/scripts/build_firewall_blacklists >/dev/null 2>&1
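For anyone chasing a similar midnight failure, a small filter can list the cron entries that fire at exactly 00:00. This is a rough sketch rather than a complete audit: the function name midnight_jobs is hypothetical, and it only matches entries whose minute and hour fields are literally 0, so shortcuts such as @daily or per-user crontabs are not covered.

```shell
#!/bin/bash
# Hypothetical helper: print cron entries scheduled for 00:00.
# Reads the files given as arguments, or standard input if none are given.
midnight_jobs() {
    # Skip comment lines, then keep lines whose first two fields
    # (minute and hour) are both the literal string "0".
    awk '!/^[[:space:]]*#/ && $1 == "0" && $2 == "0"' "$@"
}

# Example usage on a real system (commented out so the sketch has no side effects):
# midnight_jobs /etc/crontab /etc/cron.d/*
```

Run against the two entries above, this would print only the build_firewall_blacklists line, since the update_tc entry runs every minute rather than specifically at midnight.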