HA stops every midnight and peer disk becomes StandAlone

  • Post
    yazarlwin
    Participant

    We found that HA stops and the peer disk becomes StandAlone every night after 12:00 AM, though we don't have any cron job running at that hour. Here are our logs. Please let us know how we can prevent this from happening.

    Jan 28 00:00:44 cs-voice01 Filesystem(HA_Filesystem)[1266]: ERROR: Couldn't unmount /mnt; trying cleanup with TERM
    Jan 28 00:00:49 cs-voice01 kernel: drbd drbd0 cs-voice02.frontiir.net: error receiving P_STATE, e: -5 l: 0!
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: unable to get cluster status from crm_mon ]
    Jan 28 08:53:49 cs-voice01 lrmd[2632]: notice: Database_Resources_start_0:7271:stderr [ Error: cluster is not available on this node ]
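
    For diagnosing this when it recurs, the cluster and disk state can be checked from either node. A minimal sketch, assuming the Pacemaker and DRBD tools already referenced in this thread and the resource name drbd0 from the logs:

    # One-shot snapshot of Pacemaker cluster status
    crm_mon -1
    # DRBD connection state and role; "StandAlone" here confirms the split
    drbdadm status drbd0
    # On DRBD 8.x the same information is available in /proc/drbd
    cat /proc/drbd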

  • Replies
    oromero31
    Participant

    Are you using a virtual machine? VMware?

    yazarlwin
    Participant

    @oromero31

    We are using physical hardware: two Supermicro 1U servers.


    You may create the following script to resolve a DRBD split-brain:

    cat > /usr/local/bin/drbdsplit << 'EOF'
    #!/bin/bash
    set -e
    # Authors: Rodrigo Cuadra
    # with collaboration of Jose Miguel Rivera
    # 2019/11/06
    # Support: rcuadra@aplitel.com
    #
    # Set ip_slave to the IP of the opposite server before using the script
    ip_slave="<peer-ip>"
    # Demote this node and discard its copy of the data
    drbdadm secondary drbd0
    drbdadm disconnect drbd0
    drbdadm -- --discard-my-data connect drbd0
    # Ask the peer to reconnect so resynchronization starts
    ssh root@$ip_slave "drbdadm connect drbd0"
    echo "Disk Status"
    drbdadm status
    EOF
    chmod +x /usr/local/bin/drbdsplit

    The variable $ip_slave is the IP of the opposite server; set it at the top of the script on each node.

    This script must be created on both servers.
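
    Note that the script runs ssh to the peer non-interactively, so passwordless (key-based) root SSH between the two servers is required. A minimal setup sketch, assuming root logins are allowed between the nodes; replace <peer-ip> with the other server's address and run this on both servers:

    # Generate a key without a passphrase and copy it to the peer
    ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
    ssh-copy-id root@<peer-ip>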

    To automate this process, you can edit the file /etc/drbd.conf and add the following on both servers:

    resource resourcename {
        [snip]
        handlers {
            split-brain "/usr/local/bin/drbdsplit";
        }
        [snip]
    }
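
    After editing the file, the new handler can be applied without restarting DRBD. A sketch, assuming the resource is named drbd0 as in the logs above (run on both servers):

    # Re-read the configuration and apply it to the running resource
    drbdadm adjust drbd0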
    yazarlwin
    Participant

    Thank you very much for your help. We will deploy the script accordingly. We also found that DRBD was disconnected by a firewall rule reload at 00:00 every night, driven by /etc/cron.d/vpbx. After we disabled the second entry, the problem stopped happening.

    * * * * * root /usr/share/ombutel/scripts/update_tc >/dev/null 2>&1
    0 0 * * * root /usr/share/ombutel/scripts/build_firewall_blacklists >/dev/null 2>&1
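
    For anyone else hitting this: rather than deleting the entry, the midnight job can be disabled by commenting it out in /etc/cron.d/vpbx, e.g.:

    * * * * * root /usr/share/ombutel/scripts/update_tc >/dev/null 2>&1
    #0 0 * * * root /usr/share/ombutel/scripts/build_firewall_blacklists >/dev/null 2>&1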
