If your website experienced a surge in traffic, would it be able to cope? Or would consumers be left out in the cold, seeking solace from your competitors? Piers Ford looks at how contingency planning can help mitigate a sudden system overload.
Nothing exposes the fundamental need for technology to work like the traffic surges that periodically hit customer interfaces across the online travel industry.
Just ask BAA, which suffered the double whammy of a major terrorist alert in the summer of 2006, followed by a pre-Christmas fog blanket, both of which grounded hundreds of flights and prompted an overwhelming onslaught of hits that left its websites barely functioning.
BAA’s remedial action – a temporary switch to text-only pages – was hardly impressive but it did demonstrate the challenge the authority shares with airlines and travel agents when it comes to contingency planning for site and booking system availability.
BA had similar problems during the Gate Gourmet catering strike. And fog-bound passengers sent home to rebook cancelled flights last Christmas found many airline websites simply hadn’t caught up with the situation quickly enough.
As Tony Walsh, development director at LateRooms.com, explains, big brands and services are protected to an extent by their market dominance.
“If BAA or EasyJet fails to deliver adequate online services during a particular crisis, customers will feel aggrieved,” he says. “They’ll call it names and get very angry. But they’ll use it again the next time they need it. For a company like us, customers are more flighty. If we fail to deliver, they’ll go somewhere else. So, the ability to maintain web-based services through peak demand is a make or break situation.”
The catch, particularly for smaller organisations, is that contingency planning requires investment in servers that may be redundant for much of the time. Walsh says that while LateRooms’ traffic – three million unique visits per month – tends to be steady, previous experience has proved the need to accommodate surges.
“When we were still relatively new, our visitor levels were more variable,” he says. “If the Sunday Times ran an editorial piece about us, we’d see a traffic spike and the server might grind to a halt. Today we’re at a level where a peak is going to be a matter of a few thousand extra hits, and our database and web servers can cope with that. Our front-end servers are load-balanced and we over-egg them all. And our contingency planning is based on our ability to use all the computers in the company – the IT department only represents a quarter of them – so that if we had to, we could increase capacity by six or seven times at the flick of a switch.”
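Walsh does not say how LateRooms balances its front-end servers, so the following is only a minimal sketch of one common approach, round-robin rotation, with a hook for bringing extra machines into the pool during a surge. The class name, method names and server labels are all hypothetical.

```python
from itertools import cycle
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands out servers in strict rotation (illustrative only)."""

    def __init__(self, servers: Iterable[str]) -> None:
        self._servers: List[str] = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(self._servers)

    def next_server(self) -> str:
        """Return the server that should take the next request."""
        return next(self._rotation)

    def add_servers(self, extra: Iterable[str]) -> None:
        """Enlarge the pool and restart the rotation from the first server."""
        self._servers.extend(extra)
        self._rotation = cycle(self._servers)


if __name__ == "__main__":
    pool = RoundRobinBalancer(["web1", "web2"])
    print([pool.next_server() for _ in range(4)])
    pool.add_servers(["office-pc-1"])  # hypothetical spare machine
    print([pool.next_server() for _ in range(3)])
```

Round-robin ignores how busy each machine is. A production balancer would normally also weigh server health and current load.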
Seasonal spikes and variations make capacity planning a notoriously difficult science for airlines and online travel agencies, so it isn’t surprising that many of them now use global distribution systems to operate their booking and reservation systems, effectively offloading the stress to suppliers such as Amadeus, which says it is well-equipped to mitigate system overload.
“The search process alone is extremely resource intensive,” explains Gilles Mascaras, head of online and leisure at Amadeus. “For each actual sale on the website, 100 people are shopping, so when you have a peak it really stresses the system. We supply 100,000 travel agencies worldwide, and most of the big airlines use us for their reservation processes. But we size our systems to cope with anticipated worldwide demand, which is 25% of our capacity. So we have a buffer. If there’s a local sudden peak due to the weather, a specific event or a promotion, we can easily absorb it. And even when there’s a worldwide event, there will be available capacity because of global time differences.”
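The arithmetic behind those figures is simple enough to sketch. The snippet below is an illustration only: the function names are hypothetical and the booking count in the usage example is invented. It shows that sizing anticipated demand at 25% of capacity leaves room for four times that demand, and that 100 shoppers per sale turns a modest number of bookings into a large search load.

```python
def headroom_factor(anticipated_share: float) -> float:
    """How many times anticipated demand fits into total capacity.

    anticipated_share is anticipated demand as a fraction of capacity,
    for example 0.25 for the 25% figure quoted above.
    """
    if not 0 < anticipated_share <= 1:
        raise ValueError("anticipated_share must be in the range (0, 1]")
    return 1.0 / anticipated_share


def shoppers_for_sales(sales: int, shoppers_per_sale: int = 100) -> int:
    """Number of people shopping implied by a given number of sales."""
    if sales < 0 or shoppers_per_sale < 1:
        raise ValueError("sales must be >= 0 and shoppers_per_sale >= 1")
    return sales * shoppers_per_sale


if __name__ == "__main__":
    print(headroom_factor(0.25))    # 4.0
    print(shoppers_for_sales(500))  # 50000 (500 sales is an invented input)
```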
Mascaras says that since Amadeus’ migration to open systems, it can accommodate higher volumes of search activity simply by adding server capacity, which is considerably less expensive than it was a decade ago, when online demand began to take off in the travel sector.
Even anticipated surges can be a burden for smaller outfits. Andy Firth, technical director at Holidaylettings.co.uk, says that it’s impossible to gauge exactly when a seasonal or daily peak might ‘break’ a site. His solution is to anticipate 10 times more traffic than the current average. And because the site allows its 19,000 holiday homeowners to post up to 16 images each, it is also important to cache the most popular search results and frequently accessed photographs in memory, in case of emergency.
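Firth does not describe how the caching is implemented. As a rough sketch of the general idea, the class below keeps a fixed number of search results in memory and discards the least recently used entry when it fills up. Least-recently-used eviction is a stand-in for "most popular" here, and every name in the snippet is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """In-memory cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_compute(self, key: Hashable, compute: Callable[[Hashable], Any]) -> Any:
        """Return the cached value for key, computing and storing it on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = compute(key)
        self._items[key] = value
        if len(self._items) > self._max_entries:
            self._items.popitem(last=False)  # drop the least recently used
        return value


if __name__ == "__main__":
    def slow_search(query: str) -> str:
        # Placeholder for an expensive database search.
        return f"results for {query}"

    cache = LRUCache(max_entries=2)
    print(cache.get_or_compute("praia da luz", slow_search))
    print(cache.get_or_compute("praia da luz", slow_search))  # served from memory
```

Searches that are repeated often stay in memory, so during a surge the database sees only the queries that are new or rarely made.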
“Our main traffic surge is the ‘Boxing Day effect’, when everyone’s thinking switches from Christmas to the family holiday,” he says. “This is an annual trend that we have experienced for a number of years. We will surge from having 50,000 visitors per day to 100,000 visitors per day. There are big daily peaks too. For instance, at 8pm we have twice as much traffic as at 10am.
“To test how our server will perform under high-load conditions, we simulate high-user traffic using a website profiling tool. Our core search facilities are designed to scale well under increasing loads. So, if visitor numbers double, the processing required to service those searches rises less steeply.”
But unexpected peaks affect the site as well. Most poignantly, in the days following Madeleine McCann’s disappearance in Portugal last May, the site experienced a mini-surge as people searched for images of holiday properties and their surrounds in Praia da Luz.
One way to remove the costly overhead of redundancy and spare capacity, of course, is to consider hosted services. The utility computing model advocated by service providers such as Xcalibre is gaining currency in the industry, according to chief executive Tony Lucas, because it allows travel site owners to scale their resources up and down, as and when required.
“It’s an alternative to solving capacity problems by just throwing money at them,” he says. “For example, if an organisation such as BAA wanted to avoid a situation where its site was broken for four days, it could build us into its contingency plan. We would store copies of the operating system and images for a nominal monthly fee, and its preconfigured space on our servers – which have up to 64GB of memory, segregated into chunks – could be turned on in minutes when necessary, boosting its capacity by several hundred percent on a pay-as-you-go process.”
One potential drawback of hosting is that travel sites sharing server space with each other, or with partners subject to similar seasonal traffic flows, might experience surges at the same time, neutralising the model’s benefits. But Lucas says careful industry tracking and the scale of the servers he envisages would make simultaneous conflicting peaks unlikely.
“This is one of the key points about the online travel sector,” he explains. “There is more variable demand than in most industries, which means that unless you have a completely static requirement, you’re either under-utilising and over-paying for equipment, or you’re struggling during peak periods. Utility computing is the only way to solve that.”
Travel agency Ebookers.com takes an internally collaborative approach to contingency planning, which includes tight integration with the company’s call centres.
Reviews are always held before planned traffic surges to understand the risks to the agency’s technology platform, and standard operating procedures are written for the service operations centre, outlining triggers and actions to ensure that the platform isn’t destabilised by the surge.
“Collaboration and planning between the operations, marketing and technology teams allows us to have an edge in anticipating traffic surges – online and offline – as a result of campaigns, product launches, competitor actions, seasonality or related events in the industry,” says director of technology Ilya Borisov.
“Our platform is monitored 24/7 by the service operations centre, which is able to detect traffic abnormalities at the onset of the event and take appropriate action via previously agreed procedures, and alert the business. We also use geographically distributed call centres, and call volumes are monitored round the clock. The operations team is able to make adjustments to call centre staff and call flows in real time, to adjust to the surge.”
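Borisov does not spell out what those triggers look like. One simple possibility, sketched here purely as an assumption, is to compare each traffic reading with a rolling average of recent readings and raise a flag when it exceeds an agreed multiple. The class name, window size and trigger ratio below are hypothetical.

```python
from collections import deque
from statistics import mean


class SurgeDetector:
    """Flags readings that exceed a multiple of the recent average."""

    def __init__(self, window: int = 12, trigger_ratio: float = 2.0) -> None:
        if window < 1 or trigger_ratio <= 1:
            raise ValueError("window must be >= 1 and trigger_ratio > 1")
        self._recent: deque = deque(maxlen=window)
        self._trigger_ratio = trigger_ratio

    def observe(self, requests_per_minute: float) -> bool:
        """Record a reading and return True if it looks like a surge."""
        surge = False
        # Only judge readings once a full window of history exists.
        if len(self._recent) == self._recent.maxlen:
            baseline = mean(self._recent)
            surge = baseline > 0 and requests_per_minute > self._trigger_ratio * baseline
        if not surge:
            # Surge readings are kept out of the baseline so that one
            # spike does not raise the threshold for the next.
            self._recent.append(requests_per_minute)
        return surge


if __name__ == "__main__":
    detector = SurgeDetector(window=3, trigger_ratio=2.0)
    for reading in [100, 100, 100, 150, 400, 120]:
        print(reading, detector.observe(reading))
```

In practice the flag would feed whatever actions the agreed procedures specify, such as adding capacity or alerting the business.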
The Ebookers.com platform is modular, allowing for additional capacity to be deployed rapidly, and most of the application components enable rapid capacity redistribution to alleviate the strain of a surge.
“Capacity and performance are routinely reviewed and appropriate actions are taken to ensure adequate headroom exists across the entire technology platform,” says Borisov.