Showing posts with label SLAP. Show all posts

Tuesday, 11 March 2008

Putting SLAP to use - Dynamic Config

Now that I've got the specification of the SLAP protocol out of the way, I'd like to describe how we're using it and the benefits we are seeing. The first is the dynamic configuration of service endpoints.

In a service like ours we have a series of nodes that process messages and apply rules to manipulate and route them. If we want to change the location for a service, we have to first notify all the upstream services of the change of location.

In a planned scenario this can be laborious. All services that may make use of the service have to be reconfigured, often requiring a restart. It is possible to have these services monitor a config file for changes to avoid a restart but this doesn't remove the need to manually go through each service config and make the necessary changes. This becomes increasingly complex in a distributed system spanning many machines and networks.

The unplanned scenario is even less desirable. There are many reasons why a service can stop and this will generally be at the most inopportune times. Having to manually reconfigure services in response to a failure is not acceptable in a high availability scenario like ours.

The other problem with the unplanned scenario is that the client service has to deal with the unavailability at the point it's attempting to consume the service. If this is a network service, for example, the client could be waiting for a network timeout before it recognises that the service is not accepting requests. It then has to fire-fight, cleaning up because it has concluded that the service is unavailable.

Far better for it to be told categorically that the service is unavailable. It can then queue requests, or do whatever else has been coded as appropriate, while it waits to be informed that the service is operational again.
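
As a rough illustration of that behaviour, here is a minimal Python sketch. The `BufferingClient` class and the caller-supplied `send` function are hypothetical names invented for this example, not part of SLAP; the sketch simply holds requests while the service is reported unavailable and flushes them, oldest first, once it is reported operational again.

```python
from collections import deque
from typing import Callable, Deque


class BufferingClient:
    """Hypothetical client that queues requests while a service is down."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                    # caller-supplied transport function
        self._available = False              # assume unavailable until told otherwise
        self._pending: Deque[str] = deque()  # requests waiting for the service

    def on_state_change(self, available: bool) -> None:
        """Called when the client is told the service went up or down."""
        self._available = available
        if available:
            # Flush anything queued during the outage, oldest first.
            while self._pending:
                self._send(self._pending.popleft())

    def submit(self, request: str) -> None:
        """Send immediately if the service is up, otherwise queue."""
        if self._available:
            self._send(request)
        else:
            self._pending.append(request)

    def pending_count(self) -> int:
        """Number of requests currently held back."""
        return len(self._pending)
```

Queueing is only one possible policy; a real client might equally reject requests or fail over to another instance.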

It is possible to use separate load-balancers to handle this kind of outage, but they cost money, need configuring and draw power, and the solution they bring is by no means dynamic. I'll actually discuss load-balancing using SLAP in a subsequent post.

In the SLAP world, services are not configured to use fixed endpoints like sip:ems3.prod1.esendex.com:8067 but rather are bound to well known service names, e.g. slap:smsrouter. The actual endpoints of the service are advertised in ANNOUNCE messages so that clients can maintain a record of the current state of the services they need to consume.

If a client service wants to make use of an smsrouter service, it checks the state in its local service state table before sending the request to the correct service URI. This information is kept up to date by the service. Most will announce their state on a periodic basis, but I'd also consider it good practice for the service to send an ANNOUNCE when its state changes, perhaps when it's under load or is shutting down, to ensure all clients are kept up to date.
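
A local service state table of this kind could be sketched in Python roughly as follows. The names `ServiceRecord` and `ServiceStateTable`, the rule that only Green and Amber services are usable, and the 30 second staleness threshold are all assumptions made for the example; none of them is defined by SLAP.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceRecord:
    state: str        # "Green", "Amber", "Red" or "Blue"
    uri: str          # endpoint advertised in the last ANNOUNCE
    last_seen: float  # timestamp of the last ANNOUNCE


class ServiceStateTable:
    """Hypothetical local table holding the latest ANNOUNCE per service name."""

    def __init__(self, max_age_seconds: float = 30.0) -> None:
        self._records: Dict[str, ServiceRecord] = {}
        self._max_age = max_age_seconds  # assumed staleness threshold

    def update(self, name: str, state: str, uri: str,
               now: Optional[float] = None) -> None:
        """Record the contents of an ANNOUNCE for the named service."""
        now = time.time() if now is None else now
        self._records[name] = ServiceRecord(state, uri, now)

    def resolve(self, name: str, now: Optional[float] = None) -> Optional[str]:
        """Return the URI to send to, or None if the service is not usable."""
        record = self._records.get(name)
        if record is None:
            return None  # never announced
        now = time.time() if now is None else now
        if now - record.last_seen > self._max_age:
            return None  # announcements stopped arriving; treat as unknown
        if record.state in ("Green", "Amber"):
            return record.uri
        return None      # Red or Blue
```

A client would call `update` for every ANNOUNCE it hears and `resolve("smsrouter")` just before sending a request.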

The development team at Esendex have also found it very useful when building and debugging services, as Jonathan describes in SLAP, my service’s up!. This was an unforeseen benefit, but another one that has cut out a lot of the hassle with debugging services. The guys can step through the code, find the issue, write the unit test and rebuild very quickly, which gets us to market far quicker.

More on load balancing soon.

Wednesday, 6 February 2008

Other SLAPs

Seems I'm not the only one who sees the comedy value in the use of SLAP as a protocol name:

Great minds...

Wednesday, 30 January 2008

SLAP - LOCATE Message

The LOCATE message is used by service clients to discover the existence and state of services on the network. Sent as a UDP broadcast, it requests information about a specific service.

Upon receipt of a LOCATE, the relevant service is expected to respond immediately by broadcasting an ANNOUNCE message to enable the client to understand the current state of the service.

LOCATE [service name]

Where

service name    A name for the service, well known within the system domain

A typical LOCATE message would look like this.

LOCATE SMSCProxy 

A client need not send LOCATE messages at all if the interval between the service's ANNOUNCE broadcasts is sufficiently small. In many situations, though, waiting for the next scheduled broadcast is unacceptable, and the client will trigger an ANNOUNCE broadcast by sending a LOCATE message.
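
For illustration, building, recognising and broadcasting a LOCATE message might look like this in Python. The port number, the ASCII encoding, the use of the 255.255.255.255 broadcast address and the function names are all assumptions; the posts do not specify any of them.

```python
import socket
from typing import Optional

SLAP_PORT = 5000  # assumption: the posts do not state a port number


def build_locate(service_name: str) -> bytes:
    """Encode a LOCATE message for the given service name."""
    if not service_name or any(ch.isspace() for ch in service_name):
        raise ValueError("service name must be a single non-empty token")
    return f"LOCATE {service_name}".encode("ascii")


def parse_locate(datagram: bytes) -> Optional[str]:
    """Return the requested service name, or None if this is not a LOCATE."""
    parts = datagram.decode("ascii", errors="replace").split()
    if len(parts) == 2 and parts[0] == "LOCATE":
        return parts[1]
    return None


def broadcast_locate(service_name: str, port: int = SLAP_PORT) -> None:
    """Send the LOCATE as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on a UDP socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_locate(service_name), ("255.255.255.255", port))
```

A client starting up could call `broadcast_locate("SMSCProxy")` and then wait for the matching ANNOUNCE rather than for the next periodic one.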

Having both LOCATE and ANNOUNCE messages also has the side benefit of allowing the protocol to have a comedy name.

Friday, 25 January 2008

SLAP - ANNOUNCE Message

The ANNOUNCE message is the core message in SLAP. Sent as a UDP broadcast, it announces information about a service to the network. Its purpose is to inform interested clients about the current state of the service.

ANNOUNCE [service name] [state (Green|Amber|Red|Blue)] [service uri]
<[optional attributes (name=value)]>

Where

service name          A name for the service, well known within the system domain
state                 The current state of the service, represented by one of 4 values: Green, Amber, Red or Blue (see below)
service uri           Unique network address for the service. Can be any format that the client would recognise.
optional attributes   An optional series of attribute value pairs separated by new lines.

Service States

Service states are used in SLAP to give clients a simple, at-a-glance indication of a service's condition.

State   Description
Green   Service is fully operational and available
Amber   Service is operational but under load
Red     Service is not operational
Blue    Service state is unknown

If more information is required, then the service would include this in the optional attributes of the message. The content of these attributes is not defined and is open to individual system implementations.

For example, information on current service performance (transactions per second, queue size, memory utilisation, etc.) could be included to allow the client to make utilisation decisions.
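
One way a client might act on such attributes is sketched below in Python. Everything here is an assumption made for illustration: the `choose_instance` function, the `STATE_RANK` ordering that prefers Green over Amber, the dictionary layout, and the `QueueSize` attribute name, which SLAP does not define.

```python
from typing import Dict, List, Optional

# Lower rank is preferred; Red and Blue are never selected.
STATE_RANK = {"Green": 0, "Amber": 1}


def choose_instance(announcements: List[Dict]) -> Optional[Dict]:
    """Pick the preferred usable instance from already-parsed ANNOUNCE data.

    Each item is assumed to look like:
      {"state": "Green", "uri": "...", "attributes": {"QueueSize": "12"}}
    "QueueSize" is a made-up attribute name used only for illustration.
    """
    usable = [a for a in announcements if a["state"] in STATE_RANK]
    if not usable:
        return None

    def sort_key(a: Dict):
        # A missing or non-numeric QueueSize sorts last within its state.
        raw = a.get("attributes", {}).get("QueueSize", "")
        queue = int(raw) if raw.isdigit() else float("inf")
        return (STATE_RANK[a["state"]], queue)

    return min(usable, key=sort_key)
```

The state is compared first and the queue size only breaks ties, so an Amber instance with a short queue never beats a Green one.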

A typical ANNOUNCE message would look like this.

ANNOUNCE SMSCProxy Green node1.company.com:1234
Route=Operator1
Binds=3
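
A message like the one above could be parsed along these lines in Python. This is a sketch, not a reference implementation: the `State`, `Announce` and `parse_announce` names are invented for the example, and it assumes that the service name and URI contain no whitespace and that each attribute sits on its own `name=value` line.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class State(Enum):
    GREEN = "Green"  # fully operational and available
    AMBER = "Amber"  # operational but under load
    RED = "Red"      # not operational
    BLUE = "Blue"    # state unknown


@dataclass
class Announce:
    service_name: str
    state: State
    service_uri: str
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_announce(text: str) -> Announce:
    """Parse an ANNOUNCE message: a header line plus name=value lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty message")
    header = lines[0].split()
    if len(header) != 4 or header[0] != "ANNOUNCE":
        raise ValueError(f"not an ANNOUNCE header: {lines[0]!r}")
    _, name, state, uri = header
    attributes: Dict[str, str] = {}
    for line in lines[1:]:
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed attribute: {line!r}")
        attributes[key.strip()] = value.strip()
    # State(...) raises ValueError for anything other than the four states.
    return Announce(name, State(state), uri, attributes)
```

Attribute values are kept as strings because their meaning is left to each system.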

Generally these messages would be broadcast at regular intervals to keep clients updated with service information. There are times when a client needs to actively interrogate the state of a service, for instance on startup, for which the LOCATE message is defined. More of that in the next post on SLAP.

Monday, 21 January 2008

SLAP, an introduction

Part of the reason I've been so quiet on the blogging front of late is that I've got carried away with a bit of development spiking. Nothing that goes straight into production, I hasten to add. I'm too far away from our development processes to be trusted with access to the live source tree, and anyway Nicholas (Development Manager) probably wouldn't let me.

I am in the enviable position of being able to create prototypes, test things out and hand them over to the professionals to make them a reality. It's one of these projects that became SLAP (Service Location and Announcement Protocol).

One of the key technical challenges facing an organisation like ours is scaling our applications out to meet demand while at the same time keeping everything highly available.

Our architecture comprises a series of software agents that pass messages between each other, routing and manipulating traffic based on a series of rules. For example:

  • An outbound bulk SMS will head straight to an SMS Router for it to decide, based on the state of a set of Mobile Carrier connections, which is the best current route.
  • A premium SMS message will be passed to a Subscription checking agent to confirm that the subscriber is valid, and then possibly on to a network lookup agent to resolve the destination network before heading on to the SMS Router for onward submission via the correct network for billing purposes.

As I'm sure you can imagine, running multiple instances of these agents across multiple servers, and having them know the state of the agents around them so messages get passed around reliably, is no mean feat. Should a Mobile Operator connection become unavailable or we lose a server, the agents should be able to adapt to that.

One option is to use hardware load balancers, as we do in front of our web servers, but this quickly becomes very expensive, and very costly to maintain, as you end up with multiple layers of load balancing. It's also very hard to dynamically configure the network of agents to cope with peak periods of traffic.

After reading Scalable Internet Architectures I was inspired to come up with a simpler, and cheaper, solution. SLAP was born: an alternative, software approach that is already delivering huge benefits to us.

Nodes broadcast one of two message types, ANNOUNCE and LOCATE, over UDP, and other nodes on the network listen for them. The listening nodes can be application monitoring agents or clients wishing to consume services offered by the broadcasting node.
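
The listening side might be sketched in Python as below. The port number, the ASCII decoding and the `message_type` and `listen` names are assumptions for the example; the posts only say that the two message types travel as UDP broadcasts.

```python
import socket
from typing import Iterator, Tuple

SLAP_PORT = 5000  # assumption: no port is given in the posts


def message_type(datagram: bytes) -> str:
    """Return "ANNOUNCE", "LOCATE" or "UNKNOWN" based on the first token."""
    tokens = datagram.decode("ascii", errors="replace").split(maxsplit=1)
    if tokens and tokens[0] in ("ANNOUNCE", "LOCATE"):
        return tokens[0]
    return "UNKNOWN"


def listen(port: int = SLAP_PORT) -> Iterator[Tuple[str, bytes]]:
    """Yield (message type, raw datagram) for each SLAP broadcast received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Intended to let several listeners share the port on one machine;
        # the exact behaviour of this option varies by operating system.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        while True:
            datagram, _sender = sock.recvfrom(65535)
            kind = message_type(datagram)
            if kind != "UNKNOWN":
                yield kind, datagram
```

A monitoring agent would iterate over `listen()` and log every ANNOUNCE, while a service would watch for LOCATE messages naming it and reply with its own ANNOUNCE.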

Its simplicity hides a power that has allowed us to architect our system to configure and repair itself in the event of failures. In subsequent posts I'll describe the specifics of the protocol and our experiences using it.