We’ve all heard the stories. A website crashes because of a spike in user demand. For the media this is the classic egg-on-face IT insider story: someone hadn’t deployed network resources wisely and when visitor numbers soared, things went bad quickly.
But in all the finger-pointing, the really significant story, i.e., the future, was missed.
Here’s that future:
What if you could tap the user experience to shape and even determine how your network will run? Not after the fact, but in real time?
Let’s use an analogy to something consumer facing: the contact centre. In the contact centre world, you know that when something goes wrong with a network, you are going to hear about it. The calls and emails that flood in with complaints make the users the canaries in the coal mine when it comes to performance problems. Actually, they are more than canaries, because canaries only warn. These calls and emails tell you what has gone wrong and, frequently, what you need to do to make things right again.
The contact centre, or at least the very best type of customer service, recognises that this feedback needs to shape the business. In a sense, it’s the “Lean Startup” model. You don’t worry too much about getting the whole structure right from the beginning. You shape it as necessary in response to the input you get, and the data you collect, from those encountering that structure.
So let’s move to networks. What if networks, in a sense, had a user-facing nervous system? In other words, what if they could “feel” and adapt to the user’s “touch”? It might seem that networks should always have been built this way, but traditional networks were built and run the other way around. They were built to function from the inside, overestimating the value of technical data while underestimating the value of user experience data.
In the old model, we measured the pieces, meaning the individual servers and links, and monitored them independently of each other as IT teams stuck to their individual silos. To continue the nervous system analogy a little longer, these nervous systems were not only internal, they were isolated. The thinking was that if the pieces were happy, then the application must be okay. That’s been proven wrong. Just because the individual pieces are healthy does not mean that the users are happy. In other words, it’s the service, not the servers, that we need to look at.
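The gap between healthy pieces and happy users can be shown with a small sketch. It is purely illustrative: the hop names, the per-element limit of 100 ms, the end-to-end limit of 500 ms, and the assumption that hops run one after another are all hypothetical and not drawn from any particular product.

```python
# Hypothetical thresholds, chosen only for illustration.
ELEMENT_LIMIT_MS = 100.0   # per-server alarm threshold (element monitoring)
SERVICE_LIMIT_MS = 500.0   # what the user will tolerate end to end (service monitoring)


def element_alerts(hop_latencies_ms):
    """Return the hops that would raise an alarm under per-element monitoring."""
    return [name for name, ms in hop_latencies_ms.items() if ms > ELEMENT_LIMIT_MS]


def service_ok(hop_latencies_ms):
    """Judge the service as the user sees it: total time across all hops.

    Assumes the hops are traversed sequentially, so their latencies add up.
    """
    return sum(hop_latencies_ms.values()) <= SERVICE_LIMIT_MS


# Eight hypothetical hops, each comfortably under its own limit.
hops = {f"hop-{i}": 80.0 for i in range(1, 9)}

print(element_alerts(hops))  # no element is in alarm
print(service_ok(hops))      # yet the page takes 640 ms end to end
```

Every element passes its own check, yet the service as a whole misses its target, which is the situation element-by-element monitoring cannot see.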
And that is exactly what is beginning to happen – and it is happening very fast.
There is a major shift underway from monitoring elements toward service monitoring, since the services and applications are really the thing you’re delivering. You’re not delivering a Web server or a LAN link; you’re delivering an application that crosses dozens of links and maybe hundreds of servers. We are talking about applications that, from the user’s point of view, are totally disconnected from the complex infrastructure that supports them. People simply don’t care about the silos any more than they care about how a television or radio transmission reaches them. Transmission static equals poor user experience equals failure. End of story.
The new approach asks questions such as: Are the consumers of the service happy? Are they getting the performance they need? What are their usage habits telling us? Those are your key measurements. User satisfaction has to be the primary way of looking at and evaluating an application. This is user-experience monitoring, and its natural companion is what is known as “transaction tracing.”
When user-experience monitoring detects problems, transaction tracing helps solve them. Say your company has an e-commerce portal with an application that runs across 150 servers. What appears to the user as just one page may require the work of 30 different elements. Servers talk to other servers, which talk to the cloud, which talk to offsite services, and so on.
Transaction tracing answers questions such as: Which servers did the page connect to? Was the front-end server slowing down the transaction, or was it a fifth-tier back-end server causing the problem?
Transaction tracing lets you trace that spider web of activity back through the data centre. Once you learn that the user’s shopping cart experience was not good, you can rapidly localise the source of the problem to a particular server, even down to the offending block of code. Developers are folded into this process in a kind of cross-tier transaction tracing.
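To make the idea of localising a slow transaction concrete, here is a minimal sketch. It is not how any specific APM product works: the `Span` record, the server names, and the timings are hypothetical, and it assumes each call's children run one after another, so a span's own work is its duration minus the time spent waiting on its children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One hop of a traced transaction (hypothetical record layout)."""
    span_id: str
    parent_id: Optional[str]   # None for the user-facing request
    server: str
    duration_ms: float         # wall-clock time including downstream calls


def self_times(spans):
    """Time each span spent on its own work, excluding its child calls.

    Assumes child calls are sequential, so their durations can be subtracted.
    """
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def slowest_server(spans):
    """Return (server, self_time_ms) for the span that did the most work itself."""
    own = self_times(spans)
    by_id = {s.span_id: s for s in spans}
    worst = max(own, key=own.get)
    return by_id[worst].server, own[worst]


# A hypothetical slow shopping-cart page: the front end looks slow (950 ms),
# but almost all of that time is spent waiting on a back-end tier.
trace = [
    Span("a", None, "web-frontend", 950.0),
    Span("b", "a", "cart-service", 900.0),
    Span("c", "b", "pricing-service", 60.0),
    Span("d", "b", "inventory-db", 800.0),
]

print(slowest_server(trace))
```

In this made-up trace the front-end server reports the longest total time, but subtracting downstream calls points to the back-end database as the tier actually doing the slow work.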
The silos cease to matter as developers become wedded to the production and troubleshooting process. Even under a full production load you can trace every user transaction, so developers can come back five days, a week, two weeks later and see how a problem started. Is the spike periodic or is it continuous? They can interrogate the precise characteristics of how the code executes.
Not surprisingly, this shift is being propelled by the freedom that big data is offering us. Historically, you cherry-picked what data got stored in archives, via triggers and selective rules. You kept just one per cent of the data and threw away the rest. That made sense due to storage limitations, but working from such a compromised data set greatly degraded the quality of conclusions you could draw from it.
Big data has been transformative. Tools such as Riverbed’s OPNET APM solutions use big data techniques to record, store, index, and archive every transaction. That lets you deal with complexity, which is the key to shifting from the traditional IT business focused on technology and machines to the emerging epoch focused on users and applications. And from where we sit, that’s the future of our industry.
Damien Murphy is a Systems Engineering Manager (ANZ) at Riverbed Technology; Russ Elsner is senior director of product management at Riverbed Technology and a 14-year veteran of OPNET.