Showing posts with label IT Management. Show all posts

Tuesday, 10 June 2008

Security in anonymity

Amazon have had some recent outages (Bots to blame for Amazon.com outages? | The Register) which raise a spectre of doubt for me over the viability of cloud computing services.

These services allow smaller organisations to benefit from the large investments in computing power made by a small number of organisations (Amazon, Google, etc) and to buy IT infrastructure as a service.

The concern for me is that they provide a very, very big and obvious target for people of nefarious intent to aim at. Global brands like Amazon represent a prize for hackers and it would seem that if one goes down, everyone goes down. Their S3 services were affected as well, taking down all the customer services that rely on them.

Doing it for yourself, on your own servers gives you the protection of the crowd in much the same way as being in a shoal of fish provides protection to the individual members from predators. Of course, you have to do a certain amount of self protection but being far less likely to be a target makes that a lot cheaper.

The counter argument is that these large cloud services organisations have the resources to hire the best defensive minds in the business to protect their investment, a level you couldn't possibly afford or justify on your own. The problem is that these people are defending against the best attacking minds in the business because they're protecting the biggest prize. It's a classic arms race.

Another argument is that these services allow you to scale your service rapidly in the face of unprecedented demand. You never know, your service might become de rigueur and in the Internet world that means millions of hits. 'You only get one chance'.

Entrepreneurs lap this one up, being the eternal optimists that we have to be to survive the start-up trials. Of course my service is 'the one'; I'm just waiting for the market to realise and then I'll be ready. Looking at it a little more pragmatically, this is a million to one shot.

What's more likely, should your business succeed, is that growth may well accelerate, but not by more than you can cope with. The trick is to architect your service so you can rapidly scale it; then it just becomes an issue of cash and hardware. If you've got your business model straight then this shouldn't be a problem.

So, home grown for me. It's not hard or expensive these days with so many open source services and stiff competition in the ISP market; and I haven't even got onto the perils of locking your pride and joy to a proprietary app hosting architecture.

Tuesday, 9 October 2007

The problem of dependency

Regular readers will know that we've just launched our new web site. There are some teething issues: in certain, thankfully infrequent, situations the CMS we're using doesn't like presenting the page to the site visitor.

My team are in close contact with EPiServer to get it sorted but it's leaving us feeling a little out of control. It doesn't sit well with the control freakery I've cultivated.

One of the key tenets of the Esendex Messaging Systems that has developed over the years is doing it ourselves. We've developed, much to the surprise of many people we interconnect with, our own protocol implementations to interact with their systems. On top of that are our own messaging systems, underpinning our customer applications. Our mantra has become:

DIY to get it right

I appreciate this might sound a little like I'm blowing our own trumpet and you're probably right. We're actually very proud of what we've done and it's given us a strong reliable platform that our customers can rely on.

Enter the third party CMS. We're using it because it is very functional, does far more than the in-house system we had built, and gives us the control we need. However, when there is a problem we're paralysed, frantically trying to debug it on behalf of the supplier to get it sorted. It just doesn't sit right with us.

I have no doubt this was the right decision. The EPiServer platform is a cut above anything else we evaluated and gives us so many more features. We've just got to accept that sometimes we can't fix it.

Wednesday, 3 October 2007

Another fantastic example of third party support

I came in this morning to another fantastic example of support. This was from the company who provide one of our systems management tools. We contacted them after we found a problem while testing an upgrade. Bear in mind we pay for this software and for support as well.

I wanted to let you know that this issue has been escalated to development, who are working on a fix for a release in the near future. When this release is made available, I’ll be happy to update you with download information.

Lucky we don't want to use it anytime soon then.

Tuesday, 2 October 2007

New Ecommerce Payment Processor Required

Aaaaarrrrgggghhhhhh!

We use HSBC for payment processing on our website. It has made sense from a commercial point of view to use the service provided by our bank. We have so many bank accounts in different countries and currencies, anything to keep it simple.

Tonight I've had enough.

The support has always been atrocious, a non-technical helpdesk trying to support an integration product by script rather than getting access to people who know what they're talking about. If it goes down, we just have to lump it.

This evening the PAS processing, for Visa and Mastercard secondary validation, was broken.

We phoned the helpdesk.

After holding for half an hour we got through to someone, only to be told:

The technical department has gone home, could we email in detail of the problem so someone could look at it in the morning.

So sorry customers, especially in Australia, you can't charge up your accounts because our payment provider can't be arsed to sort the problem.

So that's been the final straw, time to search for someone else to provide our payment processing.

I'll keep you posted on our search.

Thursday, 23 August 2007

Has the age of spin really ended?

I came across VoIP Watch recently, and this post, Grand Central Numbers Post, about Grand Central's post Number Changes, caught my attention. The author, Andy, has also made a number of posts about the 'Great Skype Out', as he described it.

Andy waxes lyrical about the founders of Grand Central and their open and honest approach to the issues they were facing with certain numbers they were providing to customers. He compares this favourably with Skype's recent announcements concerning the sign-in issue they experienced. As he makes quite clear in his post, the principals of Grand Central are friends and clients, but was their post really that open and honest? Is it part of a new wave in customer communication?

Grand Central basically seem to have passed the blame quite conveniently onto a supplier. In my book that's not necessarily being open and honest about internal issues but more passing the buck. How do we know this wasn't spin, stretching the truth, or otherwise dressing up the situation?

Skype, on the other hand, did seem to have an internal issue. As a developer and service provider myself, I know that issues like this do indeed materialise from time to time. Sometimes they're irritating, other times they can be terminal. That's the nature of complex systems.

I suspect that Skype's honesty issue was more about protecting their IP. I can imagine that running a telephony service supporting millions of users on top of a network infrastructure you don't control requires some pretty intelligent programming. Something worth fiercely protecting.

Grand Central are also guilty of slapping a BETA tag on their service. This seems to be not only very fashionable of late but has the added benefit of allowing companies to absolve themselves of all responsibility for reliability. Should Grand Central come out of BETA, I wonder if the tone and delivery would be the same.

When it comes down to it, customers just want systems and services that work. Whether you adopt an honest John approach or spin the issues to the hilt, the customer will only remember whether the service was there when they wanted to use it.

Thursday, 26 July 2007

Power efficiency with virtual servers

Kevin, our Systems Manager, has been evaluating some interesting technology that should not only make his job easier, but also save some power as well.

VMware provide a range of server virtualisation products that allow you to run multiple operating system instances on one instance of hardware.

There are a number of servers sitting in our production network performing essential roles (Windows Domain Control, DNS Services, monitoring, etc) that do not result in heavy hardware utilisation. From a utilisation point of view the services could easily run on one server. The problem with these services, however, is that they're not necessarily good bedfellows and are recommended to run on separate boxes. These services also need to be resilient, so 2 servers are required for each service.

Enter VMware. Their technology sits between the hardware and the operating system, unlike Windows Virtual Server, which is an operating system service. This means any OS can be installed side by side, ideal in our environment where we do have some Linux servers in place.

If the technology passes our evaluation, the 6 servers that were running 3 services in a resilient architecture can become 2, with 3 virtual servers on each box and each 'side' running a copy of the other.

So we need less hardware, which also means less power (see Power is the currency). Instead of 6 servers consuming power just to sit around and do very little, 2 servers are running optimally, reducing the consumption by at least 50%.
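As a rough sanity check on that figure, here is a minimal Python sketch of the arithmetic. The wattages are made-up placeholders rather than measurements from our servers, and the function name is purely illustrative. The assumption is that a box hosting three virtual servers works harder, and so draws more power, than a box sitting nearly idle.

```python
def consolidation_saving(old_count, old_watts, new_count, new_watts):
    """Return the fractional power saving from swapping one set of servers for another."""
    before = old_count * old_watts
    after = new_count * new_watts
    return 1 - after / before

# Hypothetical figures: 6 lightly loaded boxes at 200 W each versus
# 2 busier virtualisation hosts at 300 W each.
saving = consolidation_saving(6, 200, 2, 300)
print(f"{saving:.0%}")
```

With these placeholder numbers the saving is one half, even though each remaining server draws 50% more than before. If the hosts drew no more than the old boxes, the saving would be two thirds.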

Thursday, 19 July 2007

Business Continuity, always expect the unexpected

When discussions turn to business continuity, talk generally turns to server failures, power outages, fires and the like. However, there is always the unexpected.

The centre of Nottingham, where our office is located, was without mains water today (Taps Run Dry In City). It seems a leak had developed but Severn Trent Water couldn't find it.

This seemed quite funny at first. We have water bubblers for drinking water anyway and people were happy to hold on, imagining it to be a temporary situation. By the time lunchtime arrived there was still no sign of a fix, so we had to give people the option of going home.

Fortunately, most people stayed on so the show went on and business carried on as usual. We are lucky in that most people, certainly in sales, operations and support, can do their jobs from home in a crisis situation, but it's not ideal. It's another scenario to be considered in our contingency planning.

Another example was given to me by a friend who is responsible for his company's IT infrastructure, including their datacentre. In the recent floods, their datacentre flooded. The cable void under the floor was filling with water at an alarming rate.

Suddenly he was faced with questions like "Where do I hire a pump?", "How big a pump do I need?", "Where do I pump the water to?", "How quickly can IT engineers bail water?". Questions you don't want to be asking for the first time on a Saturday afternoon when your servers are on the verge of a bath.

Preparation, redundancy and contingency planning are key to surviving most incidents. Unfortunately the unexpected laughs in the face of planning, so having people with the ability to think effectively on their feet is probably as important.

Wednesday, 18 July 2007

Moore's Law in action

We recently replaced the front-end application servers that support our Web SMS and SMS API services as part of a longer term infrastructure refresh programme. We've essentially replaced them with their current equivalents and the difference is astounding.

Where our last servers were averaging 70-90% CPU utilisation, these new ones rarely get above 20%. In addition, they draw less power than the servers they've replaced, handy given Power is the Currency for datacentre charging these days.

Moore's law tells us that processing power doubles roughly every 2 years. It seems he had a point.
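As a back-of-envelope check, and only a sketch, the drop in utilisation can be turned into an equivalent number of years of doubling. It assumes the workload is unchanged and that CPU utilisation falls in proportion to processing power, both of which are simplifications. The 80% figure is simply the midpoint of the 70-90% range, and the function name is illustrative.

```python
import math

def implied_years(old_utilisation, new_utilisation, doubling_period_years=2.0):
    """Years of steady doubling needed to explain a drop in CPU utilisation."""
    speedup = old_utilisation / new_utilisation
    return math.log2(speedup) * doubling_period_years

# Midpoint of the old 70-90% range against the new 20% ceiling.
print(implied_years(80, 20))
```

Going from 80% to 20% is a fourfold improvement, which is two doublings, or four years' worth at one doubling every 2 years.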

Monday, 9 July 2007

Service Availability - oh the irony

I only posted about this last Friday and wouldn't you know we had an outage this morning. It was brief and was more of a slow-down than an outage but timing is everything.

We are redeveloping our web site at the moment and will be including a section to contain reports on outages and availability. However, until then I feel I've let the genie out of the bottle and should follow through with an explanation of what caused the outage.

In short we brought onboard a set of new front-end application servers and our load-balancers did not behave as we expected. They decided to ignore all-bar-one of the new boxes, giving all traffic to that one box. The new machines are faster than the old ones, but not that much faster.
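One way to catch this sort of surprise early is to look at how requests are actually spread across the servers straight after a change. The Python sketch below is illustrative only: the function name, the server names and the 50% threshold are assumptions made up for the example, not a description of our load-balancers or monitoring.

```python
from collections import Counter

def traffic_report(expected_backends, request_log, max_share=0.5):
    """Return (idle, overloaded) backends given a log of which backend served each request."""
    counts = Counter(request_log)
    total = sum(counts.values())
    # Backends that should be in the pool but served nothing at all.
    idle = [b for b in expected_backends if counts[b] == 0]
    # Backends taking more than their allowed share of the traffic.
    overloaded = [b for b in expected_backends
                  if total and counts[b] / total > max_share]
    return idle, overloaded

# Hypothetical log in which one new box takes every request.
idle, overloaded = traffic_report(["new1", "new2", "new3"], ["new1"] * 10)
print(idle, overloaded)
```

A check like this, run a minute or two after new servers join the pool, would flag both the ignored boxes and the one doing all the work.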

It was picked up very quickly and resolved within 3-4 minutes. We're updating our procedures to make sure this doesn't happen again. Sorry if you were affected.

Friday, 6 July 2007

Service Availability

As a service provider we have to make sure our service is available for our customers to use whenever they want to use it.

When we opened Esendex Australia a couple of years ago, we did so confident in the knowledge that we were running a 24/365 system. Unfortunately that confidence was a little misplaced.

The service was running but we'd got in the habit of running all those little system maintenance jobs in the early hours when our UK customers were generally in bed or had low volume requirements. Not so those pesky Australians, they insisted on using the system during their office hours! They expected a very responsive system and weren't always getting it.

We soon moved things around and everyone is happy, but providing 24/365 availability does require a cultural shift in an organisation's approach.

The development team are used to this as an approach: the architecture of the various service components that handle our message processing and routing is designed to be updated live without impacting service.

Our DBA (DataBase Administrator), on the other hand, has an especially tough job keeping the databases optimised while also keeping them constantly online, or at least that's what he tells me ;).

Earlier this year we realised we needed an external monitoring service to give us a customer's eye view of our service. We have always had an internal monitoring system running all the time, alerting the relevant people. But it is an internal system, and the danger is you make assumptions that are not correct for customers.

We settled on Alertra as they seemed to provide both the breadth of monitoring points and the depth of protocol monitoring we needed. Their alerting system also seemed pretty reliable.

We've set up monitoring on our key service access points, and thanks to Alertra's rather nifty Public Uptime Statistics I can share the current status with you now.

Pretty good results, though not necessarily the 100% we were hoping for. It turns out we were really thankful for the external monitoring because we did have an outage that our internal monitoring didn't catch.

We host our own DNS (Domain Name Service) servers and it turns out we had an issue with the configuration. So while our service was happily alive and our internal monitoring was happily reporting all was good, some of our customers couldn't find our servers.

We've now added monitoring of our DNS servers so that base is covered.
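For the curious, the simplest form of such a check just asks whether a hostname resolves at all. This is a minimal Python sketch and not what Alertra or our own monitoring actually runs: the function name and hostname are made up, and it goes through whatever resolver the local machine uses, whereas a proper external check would query from outside the network.

```python
import socket

def dns_resolves(hostname, resolver=socket.gethostbyname):
    """Return True if the hostname resolves to an address, False if lookup fails."""
    try:
        resolver(hostname)
        return True
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return False

# Hypothetical usage; swap in a hostname your customers actually use.
if not dns_resolves("www.example.com"):
    print("DNS lookup failed, raise an alert")
```

The resolver is passed in as a parameter so the check can be exercised without touching the network, which is handy for testing the alerting path itself.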

It has shown us that there is no room for complacency and that a service truly is the sum of its parts.

Monday, 11 June 2007

Training in the art of extreme programming

One of the challenges we face when new developers start with us is introducing them to the wonders of Extreme Programming (XP).

For the un-initiated, XP is a set of development principles and methodologies first formalised by Kent Beck in his book Extreme Programming Explained.

This approach to development has dramatically reduced our defect count so when we release new features into our system they just work. This has served to increase confidence in our service not only with our customers but also internally. Our sales and customer support teams know that they are unlikely to be fire-fighting on the front line while the technical teams are fixing the latest release. I'm not saying nothing ever goes wrong, but it is most definitely the exception.

All this has been achieved while maintaining the agility of the development team. They can respond quickly to requirements or enhancements, within weeks rather than months. Given the complexity of our system now (multiple message formats, routing features, multiple languages, multi-currency billing, real-time monitoring, etc), this is great news for the business.

We can add new services and features without breaking the existing system.

However, this isn't possible without a very different development approach. Pair Programming is probably the most obviously different element: development is completed by two people working side-by-side on the same computer. But Test Driven Development, The Planning Game and the other practices can all be quite daunting for someone new to the approach.

We have a couple of new developers starting with us in July so this is particularly pertinent. Previously I have thrust a copy of Extreme Programming Explained into the hands of the new guys and then immersed them in the process straight away. Nicholas, one of the senior members of the development team, has found an alternative in the book Extreme Programming Adventures in C#. He highly recommends it as a book that explains how as well as why in the context of C# development rather than something more abstract.

So I'm off to buy some copies, will let you know how we get on.

Wednesday, 9 May 2007

Dell server power consumption

Had a comment on my post Power is the currency asking if I knew how many amps a Dell PowerEdge 860 would draw on average. Funnily enough I did, thanks to the Dell Datacenter Capacity Planner.

It's a useful tool that allows you to virtually build your rack(s) and estimate the heat generated and power consumed. Perfect for these power sensitive times.

Tuesday, 20 March 2007

Power is the currency

We host our SMS Service applications in one of the Global Switch hosting facilities in London. These facilities are super-secure, highly available and historically were priced based on the space you occupied.

Times have changed. We are at contract renewal time and the currency is now the amp. The processing power of servers has got denser and denser, ie more work can be done in a smaller volume. The introduction of blade servers now provides units that are 7U high and can hold 20 dual core processors and 20 drives, giving awesome computing power but also awesome power consumption and heat output.

This gives the data centre operators a real issue. Where before 12 amps was enough for a rack and the temperature management system was more than adequate to cope, it is now very easy to be pulling 20 amps per rack with a standard set of servers. This creates power density and heat dissipation problems that the original setup wasn't designed for. There is a fervent amount of upgrading of environmental systems at these facilities, but with that comes a very different price tag.

The flipside is that cramming servers into racks is no longer the problem to solve: the chance of finding a rack that will take the load is remote, or very expensive.

So what does that mean for us? Our SMS server applications are based on a multi-component, distributed architecture enabling us to scale out by just adding more servers. We're currently refreshing our set of application servers and the original plan was to go for blade arrays to keep the space utilisation down. This change in rules has meant we can look at good old fashioned pizza box servers like the Dell PowerEdge 860. It's cheap (relatively), powerful (can have quad-core processors) and best of all doesn't draw much power. We can also space them out in the racks quite nicely to keep them running cool and optimal.
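When the amp is the currency, the planning question becomes how many servers a rack's power allowance will carry rather than how many will physically fit. The Python sketch below shows the arithmetic. The 12 and 20 amp rack figures come from the numbers above, but the 1.5 amps per server and the 25% headroom are placeholder assumptions, not measured values for the PowerEdge 860, and the function name is illustrative.

```python
def servers_per_rack(rack_amps, amps_per_server, headroom=0.25):
    """Whole servers that fit in a rack's power allowance, keeping some headroom spare."""
    usable_amps = rack_amps * (1 - headroom)
    return int(usable_amps // amps_per_server)

# Assumed draw of 1.5 A per server; substitute a measured or planner figure.
for rack_amps in (12, 20):
    print(rack_amps, "amp rack:", servers_per_rack(rack_amps, 1.5), "servers")
```

With those assumptions a 12 amp rack carries 6 servers and a 20 amp rack carries 10, which is why low-draw servers spaced out across racks can work out better than a dense blade array.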