Convergence of IT and Network Cloud Solutions: If you had to choose a member to lose

There is a relatively new trend in the Telco industry: merging IT and Network Cloud solutions, both at the infrastructure and at the organizational level. In the first wave of Network “cloudification” this was taboo, for reasons that were more or less reasonable. But for true cloud believers this convergence is a natural movement. Sooner or later it is going to happen, as it already has in countless other industries.

Once both IT and Network workloads are under the same organizational perimeter and running on the same cloud solutions, something really curious happens: a kind of philosophical debate arises about which class of workloads matters most.

Network people are convinced their workloads are more relevant, of course! But traditional IT staff do not see it so clearly. In fact, both groups have suffered terrifying experiences with the unavailability of critical workloads running on cloud infrastructure. So if they had to choose whether Network or IT workloads should fail, now that they are all in the “same basket”, the answer is not always obvious.

It is very human to think that the things you understand and that are close to you are the most relevant. One example: nobody would spontaneously choose to lose the roof of their house rather than suffer a long disruption of the food supply chain in their state, especially if they do not have the faintest idea how this supply chain works or what the consequences of losing it would be. If you think about it twice, you would probably choose to lose the roof, move temporarily to a relative’s house, and avoid dying of starvation. But let’s admit it is not so easy to reach that conclusion and make that decision.

The same problem appears in these Network vs IT debates about workload relevance. Most people do not have a clear understanding of the business behind those “remote” workloads that have never been part of their daily routines. Therefore, you need to understand both environments quite well.

The first step is to define what IT and Network workloads are, and that is not an easy task. In fact, there have been big standardization efforts in both the IT and Telco industries¹ dedicated to defining a complete map of the processes and systems that IT and Telco companies need in order to work satisfactorily.

These standard models are quite complex, at least for my simple mind. Although I studied them some time ago and almost understood them, in my daily work, and in that of most of my colleagues, we use a rougher classification. In it, the processes and associated systems supporting the company fall into three main categories, the 2nd and 3rd traditionally being considered the IT “world”. They are:

Network
OSS – Operations Support Systems
BSS – Business Support Systems

Network has always been the easiest for me to define. Behind the Network systems are the Network Functions that enable the communication services our customers enjoy. These can be voice minutes, data channels with internet connection, leased lines,… whatever. These services are enabled by radio access, wireline access, transport elements, core network elements and many other boxes.

Many of them are even defined by standardization bodies that provide an enormous list of strange names: GERAN, E-UTRAN, FTTH, DWDM, MPLS backbone, MSC, PSTN node, HSS, SGSN, GGSN, S-GW, P-GW, BRAS, CG-NAT,… Really sick, isn’t it? The good thing is that there are specifications and/or well-defined market products that allow you to understand them quite well.

OSS are usually tightly attached to specific Network Elements, while also forming a common umbrella over all of them. In a nutshell, OSS are the applications and systems that allow you to operate the Network. That means collecting performance indicators, detecting failures, and giving access to logs and tools to diagnose and fix problems. Luckily, if your OSS is advanced, it can solve the problem by itself with self-healing and even machine learning techniques. If not, at least it serves to wake up a couple of smart people during the night who are able to solve the crisis.

OSS are not as precisely defined as the Network, but essentially they tend to be a hierarchical arrangement of elements. There is a first level of Element Managers, each specialized in a subgroup of Network nodes. On top of these there are transversal systems that collect, organize and present the information. These include the central Alarm Managers, Ticketing applications, Machine Learning and autonomous decision systems, etc. Remember those fancy consoles you see in movies or on the news, with a big map of the country full of colored lines and dots: that is the OSS!

BSS is probably the most customized and heterogeneous layer. It includes a wide range of applications, such as:

CRM – The point of contact with your customers, traditional phone contact centers, online channels, WhatsApp business account back-office, chat bots,…
Fulfillment – Your customers and providers place orders, claims,… hundreds of different flows that need to be executed, coordinating departments and interacting with other IT apps.
Billing – Really important, isn’t it?
Human Resources – To manage your staff and, above all, pay the payroll on time
And many others

BSS are usually a very big mesh that, having been built up over the history of the company, only a few people are able to decipher. A really complex world.

Now we have a clear² classification of Network and IT systems that are, at least in a reasonable percentage, running as workloads on a cloud solution in your portfolio. So let’s do an exercise: imagine we have a DEFCON 1 emergency and we need to sacrifice one group. Which one?

Ok, great exercise, but unless your security and risk evaluation teams have been asleep since 2010, not even the worst internal sabotage could tear down the whole Network, the entire OSS or the complete BSS. We need to be more precise if we want to be realistic.

Done: let’s suppose we are terribly unlucky and the incident is one of the worst-case scenarios in each world. This would be my guess for these worst-case scenarios:

Network: The common database node holding all our customer network profiles is gone, so the HLR, HSS, UDM and other related entities are down. Your entire mobile access and some of the wireline services are down. A big share of your customers are unable to communicate. Be prepared for reputational damage in the media and a wave of claims seeking compensation.
OSS: The central performance and alarm collection system is affected. You are completely blind. If something happens, you are not going to detect it. And if someone detects it for you, i.e. your angry customers, you are going to face a tough diagnostic task to identify the root cause and solve it.
BSS: Your fulfillment system is not working, at least the part that serves most of the new sales. Every minute the situation persists, the company is leaving behind thousands of € and giving a very bad image to the new customers it has captured with a lot of effort.

Now we have enough data to make an informed decision. If I had to choose, which member would I lose? This is my choice, in order of preference:

OSS – Alarm Collecting: The reason is simple: if nothing happens in the meantime, we are fine. It would take two big failures coinciding for serious problems to begin. Of course it is not pleasant to be blind for a long time, but we can always cross our fingers.
BSS – Fulfillment for new sales: Things become serious. The sales channels, CRM, financial departments, the CEO... half of your colleagues are thinking about you. You have almost stopped the company!
Network – Common Database: If you thought there was no worse scenario than the previous one, here it is. Now it is not just your company but a significant piece of the country that has stopped!³

Of course, this is a theoretical exercise. Operational incidents do not let you choose. But when designing and building cloud solutions for your workloads, it is important to keep these kinds of thoughts in mind.

Anyway, the most important thing is to excel at designing and choosing cloud solutions. And even so, always think about the most perverse failure scenarios and be prepared for them: if something can fail, it will fail; it’s only a matter of time or luck.

Let’s be careful out there!

¹ The most relevant being ITIL and eTOM.
² Unfortunately, there are always gray areas, and it is not always easy to tag a system using these three simple labels.
³ I was a direct witness of this exact incident more than a decade ago. For a terrifying 48 hours, a Tier 1 operator was unable to give access to millions of 2G/3G customers. Being in one of the war rooms was really impactful: I saw the roof of the house flying.

2 responses to “Convergence of IT and Network Cloud Solutions: If you had to choose a member to lose”

muneti

May 28, 2024 at 12:43 pm

I broadly agree with the prioritization of disaster scenarios presented. I would have chosen the same order of priority. However, I believe that the worst-case disaster scenario for OSS is not the loss of the alarm process, but rather the failure in service activation and provisioning. If services are not activated, they cannot be billed, and revenue is impacted… resulting in less income for the company!

Great job, Miguel!

1. Miguel Angel Barrera Ruano
  
  May 29, 2024 at 12:28 am
  
  Thanks so much for your comment!
  You are completely right: losing the activation/provisioning layer is far worse than having an issue with the main alarm collecting system. The thing is (and this is probably a gray case) I tend to place the activation subsystem in the southbound of the BSS, interacting directly with the Network northbound interfaces. Maybe the general consensus is on your side and I am biased by my specific work experience. I’ll try to do some broader research and add the proper reference if needed!
  Very good comment! Thanks again!
  