ZFS and Apple Time Machine, a perfect team

July 31st, 2012

So lately I’ve been thinking about my backup strategy on my Mac. From previous posts you might know I’ve built my OpenIndiana ZFS FileServer. Well, I just created a volume and decided to put 300GB to good use as a Time Machine target for my Mac. There is a brilliant guide on how to do it here, and I suggest you all take a look (thanks for the awesome guide, Marco).
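For the curious, the rough shape of it is sketched below. The pool name (“tank”), the AFP share via netatalk and the exact Mac-side commands are my assumptions here, so treat this as a starting point and follow Marco’s guide for the details.

# On the OpenIndiana box: a dedicated filesystem for backups, capped at 300GB
zfs create -o quota=300G tank/timemachine
# ...then share tank/timemachine over AFP with netatalk so the Mac can mount it

# On the Mac: let Time Machine see unsupported (network) volumes,
# then point it at the mounted share (tmutil ships with 10.7 and later)
defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1
sudo tmutil setdestination /Volumes/timemachine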

Monitoring SRX Chassis Cluster

July 9th, 2012

Just finishing off a few things at work this week.  We’ve got a few sites around the place where we have HA internet powered by two Juniper SRX100s.  The two SRX100s operate in a chassis cluster and peer with our ISP using BGP across both active/passive devices.

This script is a little Nagios check script that I wrote to hook into our in-house Nagios monitoring platform.  It makes sure the chassis cluster has not failed over and is not operating in a degraded state, and that there are two BGP peers connected.

NOTE:  I was aiming for simplicity in this setup. If you’ve got a bigger environment or require instant notifications, you might wish to set up SNMP traps instead.

#!/bin/bash

# Bash script to check the status of an SRX chassis cluster.
#  Works by SSHing into the cluster to run "show chassis cluster status", and by
#  SNMP walking the BGP peer table to make sure both peers are in the Established state.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

clusterAddress=$1
privateKey=$2

# The SSH username was mangled on the original page; "nagios" here is an
# assumption - use whatever read-only account you have configured on the SRX.
clusterStatus=$(ssh -i "$privateKey" nagios@"$clusterAddress" "show chassis cluster status")

declare -i primaryCount
declare -i secondaryCount
declare -i failoverCount
declare -i activeBgpPeers

# bgpPeerState (.1.3.6.1.2.1.15.3.1.2) value 6 means Established
activeBgpPeers=$(snmpwalk -Os -c public -v 1 "$clusterAddress" .1.3.6.1.2.1.15.3.1.2 | grep -c "INTEGER: 6")
primaryCount=$(echo "$clusterStatus" | grep -c primary)
secondaryCount=$(echo "$clusterStatus" | grep -c secondary)
failoverCount=$(echo "$clusterStatus" | grep -c "Failover count: 0")

if [ $primaryCount -ne 2 ]
then
        echo "CRITICAL: did not find two primary redundancy groups"
        echo "$clusterStatus"
        exit $STATE_CRITICAL
fi

if [ $secondaryCount -ne 2 ]
then
        echo "CRITICAL: did not find two secondary redundancy groups"
        echo "$clusterStatus"
        exit $STATE_CRITICAL
fi

if [ $failoverCount -ne 2 ]
then
        echo "WARNING: SRX has failed over on a redundancy group"
        echo "$clusterStatus"
        exit $STATE_WARNING
fi

if [ $activeBgpPeers -ne 2 ]
then
        echo "CRITICAL: did not find 2 established BGP peers"
        exit $STATE_CRITICAL
fi

echo "OK: chassis cluster healthy, 2 established BGP peers"
echo "$clusterStatus"
exit $STATE_OK


Easily accessing GeoIP restricted sites in your network.

June 3rd, 2012

We all know the problem: some sites are restricted to certain countries based on the IP address you’re using to view them.  When trying to access them from overseas, some solutions are HTTP proxies, SOCKS proxies and the like.  The problem I have with all of these is that they’re annoying to set up every time you want to view the site, and I don’t want to have to do that for all my devices (iPad, computer, etc.).

This solution will tunnel only the sites you want over a VPN connection to be NAT’d out the other end.

First we want to set up OpenVPN on the remote host by issuing “openvpn --genkey --secret static.key”, then creating the server config file at /etc/openvpn/server.conf:

# Network
port 1194
proto tcp-server
dev tun
ifconfig 10.0.6.1 10.0.6.2 

# Crypto
secret /etc/openvpn/static.key
comp-lzo
keepalive 10 120

# Security
persist-key
persist-tun

# Logging
status openvpn-status.log

One thing we need to do on the server is set up NAT on the outbound address, to make sure all traffic that passes through our VPS destined for the internet looks like it’s coming from that US IP address:

# Previously initiated and accepted exchanges bypass rule checking
# Allow unlimited outbound traffic
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A OUTPUT -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT

iptables -t nat -A POSTROUTING -s 10.0.6.0/24 -o eth0 -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward

On the client side, things look very similar:

remote vps.server.com 1194 tcp-client
persist-key
comp-lzo no
redirect-gateway def1
nobind
persist-tun
secret secret.key
dev tun
ifconfig 10.0.6.2 10.0.6.1

On the client, I just also MASQUERADE’d data through (I know, double NAT’ing, but it’s easier than adjusting the routing table on the VPS).
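For reference, the client-side rule is just a mirror of the server one, something along these lines (tun0 being the OpenVPN tunnel interface):

# On the home router (VPN client): NAT anything leaving via the tunnel so the
# VPS only ever sees traffic sourced from 10.0.6.2
iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward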

Now for the magic to happen. I just replaced the DNS server on my home router with a little Python script (<3 Twisted framework) to proxy requests, adding a route into the tunnel for IPs that get returned by sites in my list (in this case I've chosen two fictitious sites, HooLoo.com and pandoor.com).

#!/usr/bin/python

from twisted.internet.protocol import Factory, Protocol
from twisted.internet import reactor
from twisted.names import dns
from twisted.names import client, server
import subprocess
from sys import exit
from os import fork

tunnelDomains = [
  'pandoor.com',
  'hooloo.com'
]
dnsServers = [('203.12.160.35', 53), ('203.12.160.36', 53)]

# Daemonise by forking and letting the parent exit
try:
  pid = fork()
  if pid > 0:
    exit(0)
except OSError, e:
  exit(1)


def tunnel(address):
  # Route this single host via the VPN tunnel interface
  subprocess.call(["route", "add", address + "/32", "tun0"])

class SpelDnsReolver(client.Resolver):
  def filterAnswers(self, message):
    if message.trunc:
      # Truncated response: retry the query over TCP
      return self.queryTCP(message.queries).addCallback(self.filterAnswers)
    else:
      # If the queried name is on our tunnel list, add a host route for every A record returned
      for d in tunnelDomains:
        if str(message.queries[0].name).endswith(d):
          for answer in message.answers:
            if answer.type == 1:  # A record
              tunnel(answer.payload.dottedQuad())

    return (message.answers, message.authority, message.additional)

verbosity = 0
resolver = SpelDnsReolver(servers=dnsServers)
f = server.DNSServerFactory(clients=[resolver], verbose=verbosity)
p = dns.DNSDatagramProtocol(f)
f.noisy = p.noisy = verbosity

reactor.listenUDP(53, p)
reactor.listenTCP(53, f)
reactor.run()
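A quick way to sanity-check the whole thing (the 192.168.1.1 router address is just an assumption for your home LAN):

# Resolve one of the listed sites through the proxy script...
dig @192.168.1.1 hooloo.com +short
# ...then confirm a /32 host route via the tunnel appeared for the returned address
ip route | grep tun0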

High Availability WordPress LAMP Stack – Part 2

April 13th, 2012

Setting up the Software Stack

This article is the second in a series (see part 1 here).  Please see HA Network for the first part, which covers setting up the network topology to be highly available.

It’s all well and good having a redundant network design, but putting web servers and the like on our hypervisors doesn’t make them redundant.  If one of our servers fails, all virtual machines on that server will die.  Looking at our previous network design, we can see that a failure of a web server or database server would cause a service outage.

In this example, I’m going to talk about a few pretty cool pieces of software that you can use to make a highly available (HA) service stack for hosting Apache web sites from a MySQL data store.  This post isn’t going to cover all the ins and outs of how to accomplish this, but hopefully it will answer some questions you might have when setting out on this task, as well as point you to the appropriate articles to help you achieve what you’re after.

Nginx load balancing proxy

Let’s first look at an example like the previous setup.  Take a simple network where we have a web server and a client.  The A record for www.example.com points to our web server at 192.168.0.10 (of course we’d use NAT on the firewalls in my example to expose it to the internet).

From my previous post, we know the network path to reach our web server on 192.168.0.10 is highly redundant, but if the server itself dies (or, in the case of a virtual machine, the physical host it runs on), we can no longer serve web pages.

To solve this problem, we’re going to create multiple web servers. The A record for www.example.com no longer points to a web server itself, but to a proxy server (I recommend Nginx). Now you can see that if either web server dies, our proxy server is able to hand requests to the other server. This method is also recommended because it spreads load away from a single server: if your web site gets more popular, we can just start scaling out, adding more web servers into the mix to handle the load. Now, I know what you’re thinking, and yes, we have just moved the single point of failure from the web server to the proxy server, but please read on to find out how to protect that host from failure.
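To make that concrete, a minimal Nginx proxy config might look something like the following. The backend addresses (192.168.0.11 and .12) and the config path are assumptions for this example; the real thing would also want health checks, timeouts and so on.

# A minimal load-balancing vhost on the proxy server
cat > /etc/nginx/conf.d/www.example.com.conf <<'EOF'
upstream example_backend {
    server 192.168.0.11;
    server 192.168.0.12;
}
server {
    listen 80;
    server_name www.example.com;
    location / {
        proxy_pass http://example_backend;
        proxy_set_header Host $host;
    }
}
EOF
/etc/init.d/nginx reload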

Heartbeat, keep your servers beating

In the previous example, we had a server that was a single point of failure.  If the proxy server died, then our web site would go down.  To plan against the failure of this machine, you can set up a tool such as Heartbeat.  An example works like the following: your proxy server above is running the Nginx daemon handling your clients’ requests, but it does so using a virtual network adaptor where its IP address, 192.168.0.10, sits.  You put a second server into the mix with the same Nginx daemon and the same configuration, but it’s not running or serving anything; this is known as our slave server.  The slave server exchanges heartbeat messages with the master server.  In the event that the master stops responding to the slave’s heartbeat messages, the slave assumes the master is down, creates a virtual adaptor with the working IP address (192.168.0.10 in our example) and brings up the master’s services.
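As a rough sketch (Linux-HA v1 style configuration; the node names proxy1/proxy2 and the timers are assumptions), the two files on each proxy might look like this. You’ll also need a matching /etc/ha.d/authkeys on both nodes.

# /etc/ha.d/ha.cf - identical on both proxies
cat > /etc/ha.d/ha.cf <<'EOF'
keepalive 2
deadtime 30
bcast eth0
auto_failback on
node proxy1
node proxy2
EOF

# /etc/ha.d/haresources - proxy1 normally owns the floating IP and runs nginx
cat > /etc/ha.d/haresources <<'EOF'
proxy1 IPaddr::192.168.0.10/24/eth0 nginx
EOF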

I use Heartbeat extensively throughout my HA configurations, not only for allowing services to be taken up by a slave in the event of a master failure, but also where I have clusters (for example, a MySQL cluster). When the slave assumes the master’s IP address, it’s handy to have a little script/service here to stop any replication and assume the master’s role.

ProTip: Managing configuration across your servers when you start creating multiple instances for redundancy can get out of hand very quickly. Sometimes you change a configuration on one server and forget to do it on another. I suggest using a tool like Puppet to manage the configuration on your servers for you.

ProTip: From experience, I’ve had situations in failover testing where the failure of a core switch caused Heartbeat to stop receiving heartbeats and fail the servers over, even though the physical and virtual machines themselves were fine. If you’re running the (non-rapid) Spanning Tree Protocol (STP) on your network, I suggest making the Heartbeat timeout about 45 seconds. This should be enough time to allow for STP convergence before it assumes its partner is dead.

NFS File Store

In the example above, we’ve got multiple web servers. As I’ve said, this setup could be used to host an HA WordPress site. The problem is that when content on the file system changes (an uploaded image or theme, a WordPress upgrade, etc.), it will no longer be in sync with the other web servers. For us, hosting the WordPress installation from an NFS mount point worked fine, which raises the question of how to make this NFS server highly available.

Just like the previous example, we’re going to use Heartbeat to make sure we’ve got a master and slave, and that when the master fails, the slave will start hosting the NFS services. There’s only one more added piece of complexity here: if the master fails, the slave has none of the data that was stored on the master. To get around this, I’m using a tool called DRBD. DRBD allows a block device to be created and synced across multiple hosts. When you write to this device, the data is replicated on the slave too. That way, when the master dies and the slave takes its role, it will have all the data that existed previously on the master. A good tutorial to set this up can be found HERE
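To give you an idea of what DRBD involves (the hostnames nfs1/nfs2, backing disk /dev/sdb1 and addresses are assumptions here; the linked tutorial is the authority), a resource definition and first sync look roughly like:

# /etc/drbd.d/r0.res - identical on both NFS hosts
cat > /etc/drbd.d/r0.res <<'EOF'
resource r0 {
    protocol C;                      # fully synchronous replication
    on nfs1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.21:7788;
        meta-disk internal;
    }
    on nfs2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.22:7788;
        meta-disk internal;
    }
}
EOF

drbdadm create-md r0 && drbdadm up r0              # run on both nodes
drbdadm -- --overwrite-data-of-peer primary r0     # on the initial master only
mkfs.ext4 /dev/drbd0 && mount /dev/drbd0 /export   # then export /export over NFS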

ProTip: Once again from experience, when everything is not working as expected, having the slave promoted can be a horrible thing. If the data has not been replicating for some reason over the past few months and the slave gets promoted with data that’s a few months old, it can be a horrible, horrible thing. I suggest running a tool like Nagios on your stack to monitor EVERYTHING you can think of. A good way to check whether your DRBD servers are in sync is to look for the term UpToDate/UpToDate in /proc/drbd.
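A dirt-simple Nagios-style check for that is just a grep; something like:

#!/bin/bash
# Critical unless /proc/drbd reports the resource as UpToDate on both sides
if grep -q "UpToDate/UpToDate" /proc/drbd; then
        echo "OK: DRBD in sync"
        exit 0
else
        echo "CRITICAL: DRBD not in sync"
        exit 2
fi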

MySQL Cluster

There are two ways of running your MySQL server with high redundancy. One simple method, using tools you’ve already used, is to have the MySQL data store running over DRBD and have Heartbeat keeping it in check. This is pretty simple to set up and I’m running it on one of my sites without a problem.

The problem with running MySQL in this sort of setup is scale. Putting your web servers behind a load balancing proxy is a good first step in allowing you to slot more servers into your solution and start to scale out. Once the bottleneck moves to your MySQL server, running a single active/passive pair over DRBD won’t scale out, only up (more expensive, faster hardware).

The second time I had to set this up, I chose to run my servers in a MySQL cluster using master/slave replication. This means that whenever a transaction is committed on the master, it is sent to the slaves to commit as well. The advantage is that it allows you to spread SELECT queries among your slaves instead of running everything on your one server.
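Before you start pointing SELECT traffic at a slave, it’s worth checking that replication is actually healthy; a quick sketch:

# Confirm the slave is replicating before sending it read traffic
mysql -e "SHOW SLAVE STATUS\G" | egrep "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
# Expect: Slave_IO_Running: Yes, Slave_SQL_Running: Yes, Seconds_Behind_Master: 0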

Word for the wise

Now that we have a highly available network in place with highly available system services, we can sleep better knowing that if a system outage were to occur, our insurance policy is in place: in about 50 seconds a network issue can converge and services can be automatically moved over to hot standby machines. This does not, however, give you an excuse not to have a backup strategy in place.

If you’re doing backups as an afterthought, I can recommend setting up an OpenIndiana machine running ZFS. Set it up like I have: off site, with rotating snapshots. At work we do nightly backups, and for us it’s as simple as doing a database dump, then rsync’ing everything over to this remote ZFS machine to get snapshotted. I feel safe knowing I can access old data from six weeks ago (my nightly snapshot retention period).
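The nightly job itself is nothing fancy; a sketch of what ours boils down to (the host name, paths and the pfexec call on the OpenIndiana side are assumptions):

#!/bin/bash
# Dump the databases, push everything to the ZFS box, then snapshot it
mysqldump --all-databases | gzip > /var/backups/db-$(date +%F).sql.gz
rsync -a /var/backups /var/www backup@zfsbox:/tank/backups/
ssh backup@zfsbox "pfexec zfs snapshot tank/backups@$(date +%F)"
# old snapshots get pruned separately to keep roughly six weeks of history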

What’s to come?

If you’ve got this far, thanks for reading, and I hope this post pointed you in the right direction to help build highly available services. I think more and more in today’s world people are expecting IT infrastructure to be always up and available. If you don’t have standby servers available, sooner or later you’re bound to have unhappy customers. I’ve always wanted to build a disaster recovery site with a highly available database, so that in the event of an entire site failing (think natural disasters, people errors and the like), services can fail over to another physical disaster recovery (DR) location.

High Availability WordPress LAMP Stack.

April 13th, 2012

Introduction

In one of my last little tasks at work, I was asked to eliminate single points of failure in the software and hardware stack without spending a fortune on hardware or software licenses. During the process of ensuring high availability (HA), I realised that many small companies might have similar needs, but with more pressing tasks, limited man hours and no post that talks about all the issues and solutions in one place, many companies and organisations tend to leave single points of failure in place, living with the chance that they’re not going to fail any time soon.

I’ve wanted to write this blog post for a while. After you’ve finished reading it, you should have the knowledge to eliminate the single points of failure in hosting a WordPress website. While I’ve chosen WordPress for the demonstration, the concepts will work with any Apache/MySQL LAMP stack software. I’ll run you through it in two parts. The first part covers setting up the physical hosts and topology (using Juniper EX2200 switches, SRX100 border firewalls and the free ESXi hypervisor for the software stack). The second part covers setting up the software stack to deliver our LAMP stack in a highly redundant fashion.  What I won’t be doing is providing complete configuration examples.  Instead, please consider this post an overview to help enlighten you and link you to more specific information to help you set this up in your own environment.

This blog post is split up into two parts: this first post in the series talks about setting up the network infrastructure, the second talks about setting up the software stack.

Part 1 (physical topology)
- Network overview
- Setting up dual switches
- Setting up ESXi network cards
- Setting up SRX border firewalls

Part 2 (software stack)
- Nginx load balancing proxy
- Web servers
- NFS file server
- MySQL cluster

Part 1, Physical Topology

If you’re reading this, I’m guessing your current network topology looks something like mine used to.  You have a single internet connection, single router/firewall, single switch and a bunch of hosts hanging from that switch.  In the event of a system failure, your system administrator (me in this case) will have to hop in a cab and rush to the server room to fix the problem.

The goal with this tutorial is to attempt to help your administrators sleep at night.  We will eliminate every single point of failure such that in the event of a system outage/failure, the system can self recover with at most a minute of unscheduled down time.

Physical Switches

We’ll be replacing the single switch (in my case, an unmanaged old gigabit switch) with a pair of managed switches.  Because our bandwidth requirements at this site weren’t terribly demanding (a simple database server, a few web and mail servers and the like), a single gigabit Ethernet link to each host was all that we required.  If you’re in the same boat I was, I can suggest a pair of Juniper EX2200’s.  If, however, you’re going to be pumping more bandwidth-intensive applications through your network (and thus require more than gigabit Ethernet connections to the hosts), or you need more than a single VLAN and inter-VLAN routing (that’s all outside the scope of this tutorial), I can strongly recommend you start looking at the EX4200 model switches (set up in a virtual chassis), which can do all your highly available layer 3 IP routing and support multi-gigabit Ethernet to your hosts by spanning an Ethernet channel across both physical switches.

Active/Passive Switch Design?

Ok, so this heading is a lie, but let me explain.  In my switch design with the EX2200’s, I’m using an aggregated Ethernet (802.3ad) EtherChannel between my two switches. I’ve opted to use 4 physical ports on my 24 port switches (ge-0/0/20 to ge-0/0/23) to give me a 4Gbit/s backbone between them. Obviously with gigabit Ethernet right through the network this isn’t much bandwidth, so the idea is to keep as little data travelling over that link as possible (hopefully network broadcasts only!)

First, the following configuration configures the aggregate Ethernet link between the switches:

chassis {
    aggregated-devices {
        ethernet {
            device-count 1;
        }
    }
}

[interfaces]
    ge-0/0/20 {
        ether-options {
            802.3ad ae0;
        }
    }
    ge-0/0/21 {
        ether-options {
            802.3ad ae0;
        }
    }
    ge-0/0/22 {
        ether-options {
            802.3ad ae0;
        }
    }
    ge-0/0/23 {
        ether-options {
            802.3ad ae0;
        }
    }
    ae0 {
        aggregated-ether-options {
            lacp {
                passive;
            }
        }
        unit 0 {
            family ethernet-switching {
                port-mode trunk;
                vlan {
                    members all;
                }
            }
        }
    }

The only thing you have to watch out for is that on at least one switch you set the LACP mode to active (the config above uses passive).

Setting up ESXi Network

In my environment, I’m using the free version of ESXi 4.1. I understand that you may wish to connect Linux hosts directly. If you’re directly connecting Linux hosts, I recommend you look at creating a bonding adaptor.  Though without the more expensive EX4200’s, we’re best sticking with an active-backup setup (mode=1).
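For a directly connected Linux host, an active-backup bond can be thrown together something like this (the interface names and address are assumptions; make it permanent via your distro’s network configuration rather than running it by hand):

# Active-backup (mode 1) bonding sketch for a Linux host with two NICs
modprobe bonding mode=active-backup miimon=100
ifenslave bond0 eth0 eth1
ip addr add 192.168.0.30/24 dev bond0
ip link set bond0 up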

For the ESXi configuration, it’s pretty simple. We connect two NICs to the switches: a primary NIC to the primary switch, and a secondary NIC to the secondary switch.

The steps for configuring the vSwitch (virtual switch) are pretty simple. Virtual machines on these ESXi hosts won’t have to do anything special once this setup has been done on the physical machines; they’ll just take advantage of our HA network topology.

First, ensure that we have two NICs set up on our vSwitch.

Next, look at the following configuration properties I’ve set in the NIC teaming settings.  This basic configuration will ensure that when the link on vmnic0 (our active switch) is available, it’ll be used.  When the link becomes unavailable, traffic will fail over to vmnic1.
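If you prefer the shell over the vSphere Client, the uplinks themselves can be attached from Tech Support Mode with something like the commands below (the failover order and explicit-failover policy are still easiest to set in the client’s NIC teaming dialog):

# Attach both physical NICs to the vSwitch, then list it to confirm
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
esxcfg-vswitch -l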

Setting up border firewalls

The last step in our highly available infrastructure is our border firewalls. Please bear with me on this section; it is the most complicated and there are a few technologies introduced. If there is something I don’t explain completely, please feel free to leave a comment below and I’ll try to explain it better.  I expect you’ll have to jump back and forth between Wikipedia and this article to fully understand what we’re doing. Explaining the concepts behind BGP and autonomous systems is beyond the scope of this article.

In my role, I replaced an active/backup GNU-based firewall solution (backup being: run down to the data centre as fast as possible and swap the cables over) with two Juniper SRX100s configured in an active/passive chassis cluster (see the SRX100 high availability deployment guide at http://kb.juniper.net/InfoCenter/index?page=content&id=KB15669). We had to get a second internet connection installed, then make sure that any one link (or a firewall itself) can die and the system will still self recover. There are two major tasks that have to happen to make this gateway highly available. Firstly, our internal network hosts will be using one of these firewalls as their default gateway. If the link to the primary switch, or the primary firewall itself, should die, we still want the network hosts to be able to reach their gateway. We achieve this with “redundant Ethernet” (reth) interfaces on the cluster. If you’ve come from the Cisco networking world, picture something like VRRP for the inside hosts: if the main link fails, the MAC and IP address float over to the other physical port.

Let me give a bit more detail on our new example network topology here so you can gain a better idea of how these settings work.

For the internal gateway address to fail over to the other switch should your link off Firewall1 die, you’ll want to make the following configuration:

chassis {
    cluster {
        reth-count 2;
        redundancy-group 0 {
            node 0 priority 100;
            node 1 priority 1;
        }
        redundancy-group 1 {
            node 0 priority 254;
            node 1 priority 1;
            preempt;
            interface-monitor {
                fe-0/0/1 weight 255;
                fe-1/0/1 weight 255;
            }
        }
    }
}
interfaces {
    fe-0/0/1 {
            fastether-options {
                redundant-parent reth1;
            }
    }
    fe-1/0/1 {
            fastether-options {
                redundant-parent reth1;
            }
    }
}

You can see redundancy-group 1 is monitoring the local interfaces going back to the switches.

For the dual WAN links I won’t go into much detail, but you’ll want to ask your ISP for a second internet connection. Some providers offer a cheap link that they only charge you for once you start flowing data over it (sometimes called a shadow link). This is perfect, as you can flow all your traffic through your primary internet connection, then on failure move your traffic through the secondary. If you wanted complete redundancy, you could apply for a provider-independent subnet (it has to be at least a /24 to be advertised in the global BGP tables) and your own ASN. This would let you use two different internet service providers.

In my case, I’m creating a redundant connection using the same ISP, so I’ve asked for a private ASN to be allocated (see http://en.wikipedia.org/wiki/Autonomous_System_(Internet) ).

For a small network such as ours (especially using the small, base-level SRXs) you’ll want to ask your provider to advertise only the default route to you. In turn, you’ll advertise your network’s address space on both connections. In the event that a link dies, the BGP peer on the other end no longer receives updates from you and will no longer attempt to route via it.

The following configuration extract shows how we’d configure our SRX firewalls to peer with our ISP’s routers. Things to note: our primary link (the one on the left) has a lower metric-out than the shadow link, meaning a lower MED attribute is sent to our ISP and thus inbound traffic will, by preference, use the main connection. The preference values under neighbor determine which connection we prefer for outbound traffic.

routing-options {
    autonomous-system 64512;
}

protocols {
    bgp {
        group ISP {
            metric-out 50;
            local-address 1.1.1.2;
            import ISP-in;
            export ISP-out;
            neighbor 1.1.1.1 {
                preference 170;
                peer-as 123;
            }
        }
        group ISP-shadow {
            metric-out 100;
            local-address 1.1.1.6;
            import ISP-in;
            export ISP-out;
            neighbor 1.1.1.5 {
                preference 180;
                peer-as 123;
            }
        }
    }
}
policy-options {
    policy-statement ISP-in {
        term default-in {
            from {
                route-filter 0.0.0.0/0 exact;
            }
            then accept;
        }
        term block {
            then reject;
        }
    }
    policy-statement ISP-out {
        term tnziPublic {
            from {
                protocol direct;
                route-filter 2.2.2.0/26 exact;
            }
            then accept;
        }
    }
}