If Communication Service Providers (CSPs) want to realize true business agility, they need to focus on automating every aspect of their processes and tools.
With the arrival of 5G, CSPs are starting to see a huge number of new use cases where their connectivity services are the key enablers. But what truly matters when transforming from a Communication Service Provider (CSP) into a Digital Service Provider (DSP)? In this blog post, we’ll take a deep dive to find out.
When you are offering new services to your customers, the performance of your value streams, and how quickly they can adapt to new processes, is vital. Optimizing your value streams is the most painful part of transforming from CSP to DSP. To make this transformation as easy as possible, it is important to understand the essential role of automation.
Source: © Scaled Agile, Inc. https://www.scaledagileframework.com/value-streams/
The first problem is usually low visibility, which causes long feedback loops. To achieve business agility for your value streams, you need to have automated assets and processes in place. That means you and your organization need DevOps practices, tools, and competencies. It is about returning to first principles and challenging old ways of working. Lead times and bottlenecks need to be visible in real time so that you are not making decisions blind.
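To make the idea of visible lead times and bottlenecks more concrete, here is a minimal sketch in Python. The stage names and durations are invented purely for illustration and are not taken from any real value stream mapping; the `Stage` class and the helper functions are hypothetical. It sums the elapsed time per stage into a lead time, points at the slowest stage, and reports how much of the lead time is active work rather than waiting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a value stream (hypothetical example data)."""
    name: str
    active_days: float   # time spent actually working on the item
    waiting_days: float  # time the item sits idle in queues or hand-overs

    @property
    def total_days(self) -> float:
        return self.active_days + self.waiting_days


def lead_time(stages: list[Stage]) -> float:
    """Total elapsed time across all stages, in days."""
    return sum(stage.total_days for stage in stages)


def bottleneck(stages: list[Stage]) -> Stage:
    """The stage that contributes the most elapsed time."""
    return max(stages, key=lambda stage: stage.total_days)


def flow_efficiency(stages: list[Stage]) -> float:
    """Share of the lead time that is active work rather than waiting."""
    total = lead_time(stages)
    return sum(s.active_days for s in stages) / total if total else 0.0


# Illustrative numbers only, not measurements.
value_stream = [
    Stage("ordering", active_days=1, waiting_days=14),
    Stage("software delivery", active_days=1, waiting_days=4),
    Stage("acceptance", active_days=20, waiting_days=40),
    Stage("roll-out", active_days=5, waiting_days=5),
]

if __name__ == "__main__":
    print(f"lead time: {lead_time(value_stream)} days")
    print(f"bottleneck: {bottleneck(value_stream).name}")
    print(f"flow efficiency: {flow_efficiency(value_stream):.0%}")
```

In practice the stage data would come from your ticketing, ordering, and pipeline tools rather than a hard-coded list, so that the same figures can be refreshed continuously on a dashboard.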
We’ll now take a close look at the specific aspects of automation that will help you to transform from being a CSP with a 6-month cycle time to a DSP with a 5-day cycle time.
Business process automation
CSPs are not as fortunate as Over The Top (OTT) players, who are able to deploy to production multiple times each day. CSPs work with multiple vendors, who in turn provide tailored products to multiple CSPs. In this scenario, the software package needs to be tailored for the given CSP based on license agreements, entitlements, and legal policies such as export regulations. This commercial step is typically handled in ERP systems outside the reach of developers and might take weeks due to manual ordering and data processing. In a digitalized value chain this activity should also be automated, typically with Robotic Process Automation (RPA) tools such as Robot Framework. RPA can help to automate routine operations and quickly connect different systems where no API exists.
To achieve the required speed, telco service vendors should invest in IT and business process automation, covering the financial process as well. The good news is that some of our CSP clients mention innovative solution development on their vendor side, achieving a promising 6-minute cycle time from order to actual software delivery!
Software ingestion and on-boarding
The march of digitalization in SW supply chains is pushing the demand for automation. However, completely automated file transfers over the public internet make delivered software assets more susceptible to various attacks, and the interface opened for these transfers makes network intrusions possible.
Proper network hardening must be implemented as a preventative first step, but much more than this needs to be done to enhance security. Continuous monitoring of every subsystem must be executed according to defined policies so that unauthorized access, misuse, or modification can be detected immediately. As vulnerabilities are constantly being uncovered (and even exploited), continuous vulnerability scanning of any software component - regardless of whether it’s part of the running network or a delivered SW asset - is a must. There are a number of tools available with frequently updated databases that provide complete scanning and will even act upon anything that violates defined policies. For security practices, we can always refer to the ISF Standard of Good Practice for Information Security.
To secure the delivered SW assets, CSPs must automate the authenticity and integrity validation of these assets. Integrity validation can most easily be implemented by comparing a checksum generated from the received asset with the checksum provided by the vendor. Once that is implemented, the risk of “man in the middle” attacks or any malicious software ingestion is already significantly decreased. As a follow-up measure, authenticity validation should be put in place by verifying the digital signature of the assets. The next step should involve checking not just the files but also their attributes, because any change in them could also signal a possible attack.
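The checksum comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes the vendor publishes a SHA-256 checksum as a hex string over a trusted channel, and the function names are hypothetical. Signature verification is left out because it depends on the signing scheme the vendor uses.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def integrity_ok(path: Path, vendor_checksum: str) -> bool:
    """Return True if the received file matches the checksum supplied by the vendor.

    Assumption: vendor_checksum is a SHA-256 hex string obtained separately
    from the file itself (for example from the vendor's delivery manifest).
    """
    expected = vendor_checksum.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A pipeline step would call `integrity_ok` for every delivered file and stop the ingestion as soon as one check fails. Note that the checksum only protects against tampering if it reaches you through a channel the attacker cannot modify, which is why the signature check remains necessary as the next measure.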
Compliance objectives are well defined by many national regulatory bodies and vulnerabilities are constantly updated in various databases. These serve as a good basis for complete automation to secure and scan any newly delivered SW assets.
We still see many examples where these kinds of security measures are not part of the SW value stream and are merely an afterthought solely for the security team. This approach to security is not scalable and definitely hurts the time-to-market aspect of the value chain. Poorly integrated security could cost you in many ways, including total cost of ownership for any value chain, delayed revenue, or lost business due to a successful attack.
Infrastructure automation
Infrastructure automation, also known as provisioning or configuration management, is about automating the creation, installation, operation, and maintenance of the operating environment. With 5G, automation is a must because infrastructure variation can increase significantly with the use of public and/or hybrid clouds, edge clouds, VMs, and container infrastructure, and because the demand for available infrastructure is driven by the running software applications. Imagine having a massively scalable 5G application that can’t cope with the required transactions because there are no available resources, and you have to call an engineer to make the required configurations.
SDN and SD-WAN are key components of infrastructure automation. With programmable policies, the provisioning of networking resources can easily adapt to the changing needs of the application or the business.
For infrastructure automation, Ansible and Kubernetes are usually the tools of choice. Ansible provides network and infrastructure automation capabilities supporting a vast number of physical devices, virtualization solutions, and operating systems, and with Ansible Tower, infrastructure management can be centralized. Kubernetes operators are essential to manage and meter your Kubernetes applications. With operators, you can implement your engineering processes and engineering knowledge as executables, providing a time- and cost-effective foundation for an agile network.
Acceptance automation
Deployment and integration of a new VNF or CNF is the first step in acceptance. As we have seen in the poll responses during a recent Light Reading webinar, VNF or Network Service (NS) onboarding and their lifecycle management are still the most time-consuming activities for CSPs. If we take a closer look at the ETSI-defined NFV MANO architecture, we can see that each defined network function already has automated process steps available from telecom vendor portfolios, but that still does not add up to an end-to-end automated solution.
Execution of process steps is delegated to different functions, such as VNFM or NFVO, but these elements do not communicate with each other on all aspects of deployment or, more broadly, lifecycle automation. ETSI GS NFV SOL016 is, to a certain extent, aiming to address this shortcoming, but as of right now we cannot be sure if SOL016 will be delivered as part of Release 4 of ETSI ISG NFV. ONAP, the de facto standard, has taken a somewhat different approach compared to ETSI and puts more emphasis on end-to-end automation, but many of its process steps, e.g. SW package management, version control, and integration, still rely on manual processes.
Complete deployment and other lifecycle management procedures should rely on business process automation and event-driven systems, where the engineering force deals with the design of policies and procedures rather than their execution. Such closed-loop automation systems must be vendor agnostic, but ready to handle multi-vendor operability. These automation solutions can be built with well-known automation servers or CI tooling (e.g. Jenkins or Bamboo), where the triggers and steps can be implemented in transferable, easily reproducible scripts. Of course, such automation should keep track of a vast amount of data and metrics, which should be continuously monitored and acted upon.
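The split between designing policies and executing them can be illustrated with a small event-driven sketch in Python. Everything here is hypothetical: the event kinds, the `PolicyEngine` class, and the actions are made up for illustration and do not correspond to any ETSI, ONAP, or CI tool API. Engineers register which actions belong to which event, and the engine runs them without human input, including a reaction to a monitored KPI event that closes the loop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """Something that happened in the value chain (hypothetical structure)."""
    kind: str
    payload: dict


# An action receives the event and returns a short log line.
Action = Callable[[Event], str]


@dataclass
class PolicyEngine:
    """Maps event kinds to the ordered actions that engineers have designed."""
    policies: dict[str, list[Action]] = field(default_factory=dict)

    def on(self, kind: str, action: Action) -> None:
        """Register an action to run whenever an event of this kind arrives."""
        self.policies.setdefault(kind, []).append(action)

    def handle(self, event: Event) -> list[str]:
        """Run every registered action for the event, in registration order."""
        return [action(event) for action in self.policies.get(event.kind, [])]


# Placeholder actions; real ones would call the pipeline, VNFM, NFVO, and so on.
def verify_package(event: Event) -> str:
    return f"verified {event.payload['package']}"


def deploy_to_acceptance(event: Event) -> str:
    return f"deployed {event.payload['package']} to acceptance"


def roll_back(event: Event) -> str:
    return f"rolled back {event.payload['package']}"


engine = PolicyEngine()
engine.on("package_delivered", verify_package)
engine.on("package_delivered", deploy_to_acceptance)
engine.on("kpi_degraded", roll_back)  # monitoring feeds back into the loop

if __name__ == "__main__":
    for line in engine.handle(Event("package_delivered", {"package": "vnf-a-1.2"})):
        print(line)
```

In a CI server such as Jenkins or Bamboo, the same idea is expressed with triggers and pipeline steps instead of Python functions; the point of the sketch is only that the policy definition is data that engineers maintain, while execution is automatic.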
Automating the deployment is a key element of any value chain automation because multiple deployments in acceptance can be very time-consuming. A closed-loop automation system in an agile network should execute this crucial step without any human input.
How to have a continuous view of quality?
Having standardized tooling and well-defined test stages on the pipeline is not enough to achieve transparency. Information on quality shall be available in close to real-time and easily accessible. On top of test execution results and bug status, monitoring the relevant service and product KPIs, and representing them on a real-time dashboard, enables automation to achieve the benefits of continuous quality assurance. Roll-out decisions shall be made automatically based on quality gating rules using all available information.
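As a rough illustration of automated gating, the following Python sketch combines test results, bug status, and a service KPI into one roll-out decision. The field names, the example KPI, and the thresholds are all assumptions made for this example; real gating rules would be agreed per service and fed from your own test and monitoring systems.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualitySnapshot:
    """Latest quality data for a release candidate (field names are hypothetical)."""
    test_pass_rate: float           # 0.0 to 1.0, from test execution results
    open_critical_bugs: int         # from the bug tracker
    call_setup_success_rate: float  # example service KPI, 0.0 to 1.0


@dataclass(frozen=True)
class GateRule:
    """A named condition that must hold before roll-out may continue."""
    name: str
    passes: Callable[[QualitySnapshot], bool]


# Placeholder thresholds, chosen only for the example.
DEFAULT_RULES = [
    GateRule("all tests pass", lambda s: s.test_pass_rate >= 1.0),
    GateRule("no critical bugs", lambda s: s.open_critical_bugs == 0),
    GateRule("KPI not degraded", lambda s: s.call_setup_success_rate >= 0.99),
]


def evaluate_gate(
    snapshot: QualitySnapshot, rules: list[GateRule] = DEFAULT_RULES
) -> tuple[bool, list[str]]:
    """Return (go, failed_rule_names); go is True only if every rule passes."""
    failed = [rule.name for rule in rules if not rule.passes(snapshot)]
    return (not failed, failed)


if __name__ == "__main__":
    go, failed = evaluate_gate(QualitySnapshot(0.98, 2, 0.995))
    print("roll out" if go else f"blocked by: {', '.join(failed)}")
```

Returning the names of the failed rules, not just a yes or no, keeps the decision transparent: the same list can be shown on the real-time dashboard next to the KPIs that triggered it.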
To distinguish between different test goals we propose to identify different test stages on the pipeline with standardized tooling and automation for efficiency and transparency. Concepts described in infrastructure automation, deployment automation, and configuration automation can be easily reused while setting up test environments for different test stages. The best practice is to handle the testing-related infrastructure, including test harnesses and test environments, the same way as other VNFs.
In a typical telco scenario acceptance of an application takes months (one value stream mapping exercise with a client showed that the acceptance process cycle was around 170 days). This is to make sure that the latest giant release of the application does not break any existing functionality, or cause any KPI degradation on the live system, which would result in the decline of end-user experience and possible customer churn.
Acceptance Test-Driven Development and Behaviour Driven Development, where the test scenarios are defined upfront in collaboration between vendor and CSP, are the best practices to use for Functional Acceptance. Using keyword-driven test automation frameworks, like Robot Framework, helps to efficiently automate these tests upfront and makes maintenance and documentation easier. The keywords developed serve as reusable assets for both the vendor and the CSP and can reduce the cost of automation dramatically. Furthermore, the ETSI standard DGR/NFV-TST006 on DevOps suggests that any VNF shall also include the test cases needed to test the given VNF functionality.
Non-functional Acceptance shall focus on many areas like performance, load, capacity, security, and reliability of the NFVI, VNFs, and NSs. These testing activities require different test approaches, test automation, and test tooling, but all need to be an integral part of the DevOps pipeline.
Well-focused, automated acceptance testing that is built into a multistage DevOps pipeline with full transparency and automated gating decisions can dramatically speed up the acceptance process necessary for frequent delivery.
Production roll-out
Once acceptance of a new application is completed you should roll it out to production. Of course, the roll-out should be seamless and ideally should not cause any disturbance or downtime in the network.
A well-chosen deployment strategy is the first step to achieve near-zero downtime roll-out. Whether you decide to go with Red-Black or Blue-Green deployment, Canary releasing, or a simple rolling update, automating these actions is key to success. This is especially true when hundreds or thousands of instances are being deployed or updated. The roll-out itself can easily build upon the previously discussed infrastructure and deployment automation. However, extra steps need to be included for diverting transactions or changing the operational state of the applications. Once roll-out is completed the application probably needs to be scaled to be able to cope with the transaction numbers. In all these cases you can rely on available portfolio items from different vendors, such as any element management system, or NOC application. These solutions won’t provide complete end-to-end automation, but there are various playbooks and operators that can create automated and repeatable solutions to optimize roll-out time.
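To show what automating one of these strategies might look like, here is a minimal Python sketch of a canary release loop. It is a simplification under stated assumptions: the traffic percentages are arbitrary, `is_healthy` stands in for whatever KPI check your monitoring provides, and the actual traffic diversion (which would go through your load balancer, service mesh, or element management system) is represented only by a comment.

```python
from typing import Callable, Sequence


def canary_rollout(
    steps: Sequence[int], is_healthy: Callable[[int], bool]
) -> tuple[bool, int]:
    """Shift traffic to the new version step by step.

    steps: traffic percentages for the new version, strictly increasing, ending at 100.
    is_healthy: hypothetical health check, called after each traffic shift.
    Returns (completed, final_percent); final_percent is 0 after a rollback.
    """
    if not steps or list(steps) != sorted(set(steps)) or steps[-1] != 100:
        raise ValueError("steps must be strictly increasing and end at 100")

    for percent in steps:
        # A real pipeline would divert `percent` of the transactions here.
        if not is_healthy(percent):
            # Roll back: all traffic returns to the previous version.
            return False, 0
    return True, 100


if __name__ == "__main__":
    # Simulated check that starts failing once half of the traffic is shifted.
    print(canary_rollout([5, 25, 50, 100], lambda percent: percent < 50))
```

The same loop structure applies to a rolling update if the steps are read as batches of instances instead of traffic percentages; what matters is that the health check and the rollback are part of the automation rather than a manual decision.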
Production automation
Generally speaking, there are many good telecom vendors providing production automation solutions for Network or Service Operations Centers. However, we still see a lack of data collection and data-driven decision-making in operations.
Typical operational metrics and KPIs are being monitored and collected, but in most cases, the user data and user behavior data is not utilized and hardly any machine learning or deep learning solution is used to create preventative and proactive actions.
Using a properly selected Customer Data Platform with an optimized data lake provides a goldmine of information for network optimization and operation. Once this data is available, the use of ML is the next logical step so that the networks can self-optimize according to end-user needs. In 5G, where the network density is significantly higher, constant monitoring of all layers (infrastructure, application, and application performance) and optimization is key for operational and financial success.
Conclusion: Automation drives your new DSP journey
As you can see, there is a lot to do and much to adopt. With automation, you should start with the low-hanging fruit and gradually increase the level of automation until you transition all the way to a true DSP. Ultimately, this comes down to the people, culture, and how you manage the organizational transformation. Start by enabling people who know the technology and its ability to adapt to the new use cases.
You need new competencies, but that doesn't necessarily mean that change requires more people. Automation-based methods and practices encourage your domain experts to concentrate on the technology and its capabilities instead of manual and repetitive tasks. And when people see the benefits of these automation practices at different levels of the organization, they always demand more.
Published: Mar 27, 2020
Updated: Mar 18, 2024