Mikrotik Failover Netwatch
Over the years I have used many different methods of failover from primary to secondary, sometimes tertiary, WAN links on Mikrotik devices. Along with manual routing table entries, I have always relied on scripts of some sort that are triggered when one of the WAN links goes down. I have had varying success with this approach. One of the biggest problems I have had when switching from a primary WAN to a secondary WAN is that the registration of VoIP phones seems to hang. The PBX keeps the registration open for the old WAN IP while the phones are now establishing connections from the new WAN IP.
The solution I have recently discovered on Mikrotik devices is Netwatch. You can find Netwatch within the Mikrotik Tools section. I believe Netwatch is available on all licence levels of Mikrotik RouterOS.
The principle of Netwatch is very simple. It pings an IP address of your choice at a user-defined interval. When that ping fails (again, the failure threshold is user-defined), it runs the commands in the Netwatch DOWN section. Likewise, when the link comes back up (the ping succeeds within the user-defined threshold), the commands in the Netwatch UP section are run.
/ip route set [find where comment="WAN1"] distance=5
/ip firewall connection remove [find dst-address="123.123.248.8:5060"]
The above commands are in the UP command section.
/ip route set [find where comment="WAN1"] distance=15
/ip firewall connection remove [find dst-address="123.123.248.8:5060"]
The above commands are in the DOWN command section.
As you can see, when the link is up the WAN1 route is set to a distance of 5 which makes it the primary route. WAN2 should have a route distance of 10 in this example. When the link is DOWN then the WAN1 route distance is set to 15, at which point WAN2 with a distance of 10 becomes the primary route.
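For reference, the two default routes this relies on might look something like the following. This is only a sketch: the gateway addresses 192.0.2.1 and 198.51.100.1 are placeholders made up for illustration, so substitute your own. The comment on the WAN1 route is what the find in the Netwatch commands matches against.

```
# Hypothetical gateways from documentation address ranges; replace with your own
/ip route add dst-address=0.0.0.0/0 gateway=192.0.2.1 distance=5 comment="WAN1"
/ip route add dst-address=0.0.0.0/0 gateway=198.51.100.1 distance=10 comment="WAN2"
# List the default routes to check which one is currently active
/ip route print where dst-address=0.0.0.0/0
```

Running the print command before and after a failover is a quick way to confirm that the distance change has actually moved traffic to the other route.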
The second command in each of the UP and DOWN list of commands is the magic sauce. This is the command that clears any existing connections to the PBX server (in this case the example IP of 123.123.248.8:5060) which forces the SIP handsets to re-register. Interestingly, in most cases the SIP handsets don’t drop existing calls but they do re-register with the PBX which is exactly what I need.
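Before relying on the remove, it can help to see which connections it would match. A minimal sketch, using the same example PBX address as above. Note that the ~ operator here does a pattern match rather than the exact comparison used in the remove commands, which is an assumption on my part that a looser match is acceptable for a quick check.

```
# Show tracked connections whose destination matches the example PBX address and SIP port
/ip firewall connection print where dst-address~"123.123.248.8:5060"
```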
I set the Netwatch host to 1.1.1.1, with an interval of 10 seconds and a timeout of 999ms. This means Netwatch pings 1.1.1.1 every 10 seconds, and if no reply arrives within 999ms it treats the host as down and runs the DOWN commands. Once replies arrive within 999ms again, it runs the UP commands. The commands run when the state changes, not on every ping. With time I may tweak this, as realistically a ping response time of greater than 100ms is likely evidence of a disrupted link.
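Putting the pieces together, the whole Netwatch entry could be created from the terminal with something like the following. Treat it as a sketch rather than a definitive configuration: the comment text is made up for illustration, and the UP and DOWN commands are simply the ones shown earlier, joined with semicolons and with their inner quotes escaped. It also assumes that the ping to 1.1.1.1 always leaves through WAN1 (for instance via a static host route), which the setup described here does not spell out.

```
# Netwatch entry using the host, interval and timeout described above
# (the comment text is a hypothetical label)
/tool netwatch add host=1.1.1.1 interval=10s timeout=999ms comment="WAN1 failover" \
    up-script="/ip route set [find where comment=\"WAN1\"] distance=5; /ip firewall connection remove [find dst-address=\"123.123.248.8:5060\"]" \
    down-script="/ip route set [find where comment=\"WAN1\"] distance=15; /ip firewall connection remove [find dst-address=\"123.123.248.8:5060\"]"
# Check the current state of the watched host
/tool netwatch print
```

Keeping both scripts in the one entry means the route change and the connection flush always happen together on each state change.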