Here we wrap things up with conclusions and opinions on the topics discussed. We consolidate the big questions around NFV from the earlier chapters and offer our views on a path forward.
Throughout this book, we have tried to bring the reader along on the evolution of thinking about Network Function Virtualization and in the process instill an understanding of both its potential problems (as realized today) and potential future benefits for network operators.
We have introduced the basic problems in service creation that provided the impetus for a change in paradigm. We moved from a monolithic physical hardware service solution, through aggregates of appliances, to the proposition that service functions could be disaggregated and chained together like a collection of Lego blocks. This latter state is what is commonly referred to today as Network Function Virtualization, and the corresponding connection of these virtualized functions as Service Function Chaining.
We have discussed the involvement of various groups in defining the conversation, technical details, applicable Standards, and best practices around NFV.
We have taken a look at some of the solution components defined by operators and implemented by vendors through the lens of an open source project, specifically the big ETSI architectural blocks of NFVI and MANO.
We have described the evolution of virtualization techniques, target hardware, and solution packaging: an ever-changing landscape.
By the time of the printing of this book, the world will have already moved forward in the evolutionary process of NFV by some four-to-six months. Given the rate of change we have described and our numerous observations on avenues for further evolution, our conclusion has to be that the material we surveyed above is by no means the end of the road for this topic.
But if this is not the end of the road for NFV, where do we go from here? We could start by working on the current NFV model, because it does not seem to make sense economically (yet).
In this chapter, we will summarize the major points from the past, paint a brief picture of the present and set a course for the future. Our final chapter is not the end of a story. Like SDN before it, NFV has a long way to go.
In our first chapter, we defined NFV as a concept that builds on some of the basic and now salient concepts of SDN and the evolution of virtualization technology to realize network functions on commodity (COTS) hardware platforms. These include control/data plane separation, logical centralization, controllers, application awareness, and application intent control.
We also said NFV was about service creation and that its true value would be realized through a combination of SFC and OSS modernization (the latter reemphasized in Chapter 6: MANO: Management, Orchestration, OSS, and Service Assurance).
From a business perspective, NFV is more than these things. NFV is an exercise in massive organizational change as much as technology.
We are in a phase of NFV that is destined to change.
The starting point for NFV—the ETSI mandated architecture—is a familiar story to anyone from the telco world. They can easily draw comparisons with the specification of IMS.1
It is the habit of telco engineers to specify a solution in such a way that there are many ambiguous interfaces that are impossible to extract and implement singly. So, vendors ultimately implement the functionality in one big integrated bundle. We already see some customers drifting that way.
Because of, or in spite of, its appeal and necessity, NFV exists in a currently chaotic environment. ETSI is continually working on its model, trying to distance it from the fates of models like IMS. Will additional interface models fix fundamental problems?
The implied open source focus of the participants has led to a confusing number of initiatives to expand, define, and develop NFV. Everyone (and we mean everyone) is trying to take the beach, so to speak, and declare that their approach—open source or otherwise—can completely and utterly address the ETSI base architecture in the best way. The promises and the reality are still a little different—quite a bit in some cases.
There are positive signs. NSH is showing the first signs of being adopted (a service is no longer a single function) and some providers are now talking more about microservices and complex virtual network functions (VNFs) (particularly those with mobile-oriented use cases).
But, as we have hinted throughout the latter part of the book, the question is whether we are following the right model. We are three years into PoCs and early implementations and it’s still not clear.
One of our reviewers posed the question differently:
Web scale providers are executing over-the-top (OTT) services without the strictures of the ETSI architecture and have been for years. Why are we bound to this architecture if it’s not clear that it’s a winner?
Stepping back a bit, it is important to note that one of the often-cited premises of NFV was a massive reduction of OPEX and CAPEX costs around network services.
Certainly, there will be an architecture that delivers on the goals of NFV, for which costs and savings cross the threshold of our expectations and needs. But it is difficult to pinpoint the horizon at which NFV CAPEX and OPEX savings combine—against the balance of the cost of transition, declining costs of existing solutions, and increased OPEX spend on security and NG-OSS—to deliver on that premise.
We have referenced repeatedly the troubling lack of a comprehensive and compelling financial model. We have talked about the cost of infrastructure (including new OPEX in the form of increased ongoing power, orchestration, and security costs) that will be incurred to implement NFV-based services and the implied reductions in operations expense (OPEX).
The simple equation below might help with some basic reasoning around such a model.
TC_non-virt >> TC_NFV

where TC_NFV is the virtualized case, and

Total Cost (TC) = cost of CAPEX + cost of OPEX over the service lifetime
The equation above illustrates a hard reality around virtualization, regardless of the mechanisms described earlier in Chapter 5, The NFV Infrastructure Management; Chapter 7, The Virtualization Layer—Performance, Packaging, and NFV; and Chapter 8, NFV Infrastructure—Hardware Evolution and Testing, and regardless of whether the virtualized component is considered “aggregated” or “disaggregated.” What it says, simply, is that the total cost of both acquiring and operating a service built from virtualized components must be quite a bit less than that of the equivalent nonvirtualized version. This implies that it can be made more profitable too, but that depends on the business and its operations in question, so we will not consider that for the moment.
Thinking about the equation above, one will observe that the complexity of the TC_NFV component, and the obscurity of existing service-related OPEX under the existing methodology, make it hard to state definitively that the equation holds—that NFV “works.”
To start with, there are potentially many more moving software parts in the NFV go-to-market operational configuration. For a nonvirtualized physical device—say a popular routing platform (with its attendant orchestration)—the operational cost considerations include the traditional EMS/NMS and orchestration (our MANO/OSS layer) used to commission, manage, and decommission services. A comparable aggregated virtualized solution must include all of that, plus a virtualization layer (the hypervisor, perhaps less so if a container is used in a pure container environment, but virtualization is not “free”), a VNF and infrastructure lifecycle management system (our VIM, eg, VMware or OpenStack), and service chaining, resource optimization, and security components in order to operate effectively at scale.
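To make that concrete, consider a back-of-the-envelope sketch of the comparison. Every figure below is an invented placeholder, not a vendor quote or a measured cost; only the shape of the arithmetic matters:

```python
# Hypothetical back-of-the-envelope TCO comparison for one service over a
# 5-year lifetime. Every figure is an invented placeholder for illustration;
# substitute your own quotes and operational data.

LIFETIME_YEARS = 5

# Nonvirtualized: a dedicated physical platform plus its EMS/NMS/orchestration.
tc_non_virt = (
    250_000                                # CAPEX: dedicated service platform
    + (40_000 + 10_000) * LIFETIME_YEARS   # OPEX/yr: operations staff + power/space
)

# NFV: COTS compute plus every additional layer named above.
tc_nfv = (
    80_000                                 # CAPEX: COTS servers, NICs, switch ports
    + 30_000                               # CAPEX: the VNF licenses themselves
    + (25_000                              # OPEX/yr: MANO/OSS and EMS/NMS integration
       + 15_000                            # OPEX/yr: VIM and virtualization layer
       + 10_000                            # OPEX/yr: chaining, optimization, security
       + 12_000) * LIFETIME_YEARS          # OPEX/yr: power/cooling for the NFVI
)

print(f"TC_non-virt: ${tc_non_virt:,}  TC_NFV: ${tc_nfv:,}")
print(f"NFV cheaper by {1 - tc_nfv / tc_non_virt:.0%}")
```

Under these invented numbers the virtualized service comes out only modestly cheaper, far from the ">>" the equation demands once every layer is counted, which is precisely the point that follows.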
While the cost of the actual VNF might be cheaper than the physical equivalent, or at least be consumable dynamically on an as-needed basis, the total cost of ownership (TCO) currently may NOT be running away from that of the existing physical model.
Granted, there are encouraging signs (for the CAPEX parts of the sum) that value engineering is already occurring in the platform and in the optimization of the virtualization aspects of the original ETSI NFV architecture.2 Beyond the continual tick-tock of Intel hardware delivery increasing per-socket density and incorporating some of the I/O-centric offload technology, this is shown in the increasing focus on containers (smaller footprints), ukernels (packaging), and new networking paradigms (fd.io).
However, as we have pointed out in several chapters, the real difference in the virtualization strategy and resulting economics for NFV versus general “cloud” is the load type—NFV applications are predominantly network I/O centric. If we reduce the equation solely to a measure of network service throughput—for example, a 10 Gbps “equivalent” for a virtualized service—eliminating the potential overhead of virtualization, will even the simplification TC_non-virt CAPEX >> TC_NFV CAPEX be compelling?
With a “your mileage may vary” caveat (your discounts for both the virtualized and nonvirtualized platform basis of the equation), the answer TODAY probably will be “no”—particularly if your virtualization platform is of the variety (highly available, five 9s, fully redundant) being peddled for Telco DC.3
Assuming that 10 Gbps/core is achievable throughput for your service function, with no restriction on PCI bus throughput and mapping of adapter port to that core,4 what are you paying per core compared to the same 10 Gbps on an established vendor’s “service edge” platform these days?
Keep in mind that the design of most virtual service connectivity can also carry some implicit connectivity overhead that is not always present in the nonvirtualized variant. In the simplest scenarios, we may have added the cost of an additional Ethernet port per 10 Gbps flow equivalent. For example, instead of entering the platform on a downstream port and exiting via an upstream equivalent (typical of edge and aggregation), the customer connection now must detour somewhere in the path through at least one additional port dedicated to the virtualization platform. These “unloaded/commodity” ports are expected to be cheaper than those of the vendor service edge platform, but they are part of the overall cost of infrastructure and should be counted.
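A minimal sketch of that per-10 Gbps comparison, again with invented prices (including a deliberately pessimistic multiplier standing in for the five-9s, fully redundant “Telco DC” platform mentioned above), might look like this:

```python
# Hypothetical per-10Gbps CAPEX comparison. All prices are invented
# placeholders, not quotes; only the shape of the calculation matters.

vendor_edge_per_10g = 8_000   # established vendor "service edge", per 10G of capacity

# NFV side, assuming 10 Gbps/core is actually achievable for the VNF:
server_cost = 16_000          # COTS server (CPU, memory, chassis)
usable_cores = 8              # cores left for VNFs after host/virtualization overhead
telco_dc_multiplier = 3.0     # assumed five-9s, fully redundant "Telco DC" tax
port_cost = 900               # each "commodity" 10G Ethernet port
extra_ports = 1               # the detour port per 10 Gbps flow described above

nfv_per_10g = ((server_cost / usable_cores) * telco_dc_multiplier
               + port_cost * (1 + extra_ports))

print(f"vendor edge: ${vendor_edge_per_10g:,}/10G   NFV: ${nfv_per_10g:,.0f}/10G")
# With these placeholders the two land within sight of each other, which is
# why the answer TODAY is probably "no" once redundancy is priced in.
```

Change any one assumption (core density, the redundancy multiplier, your discounts) and the inequality flips, which is exactly the “your mileage may vary” caveat above.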
A potential to draw ROI closer to existing network platforms may lie in building a network I/O “race car” using additional hardware. At that point, we would posit that you have then changed “NFV” to be a “white box appliance” play—and that may be truly what lies under the current generation of NFV. But it is not exactly the same model of COTS compute as applied to massively scaled data centers.
And, if you are really going to do the math on the CAPEX ROI part fairly, remember two things about the “old way of doing things”:
First, vendors of existing network service platforms have not stopped innovating.
In this landscape both orchestration for and the pricing of traditional physical network functions (PNFs) have not been stagnant since the emphasis on NFV began. (Whether the NFV focus can be credited for this or normal “learning curve” economics are taking place is a separate conversation.)
Another way of putting this is that the revolution in orchestration was going to happen ANYWAY. Several vendors had already stepped into this void and the push for model-based services and APIs will only accelerate the pace.
The continued evolution of PNFs and their management can push out the horizon at which both the agility and the cost-effectiveness of the virtualized counterparts undercut traditional offerings—even a self-constructed white-box appliance.
Second, and as a further complication, for most of the parties involved—both vendors and operators—NFV is not their only operational focus. NFV is not a cliff; it is a transition.
Like SDN, NFV will not occur in a vacuum. Both vendors and operators will continue to produce, consume, and support products to support existing services until customers are migrated—creating a very “long tail” for both. And there are additional carrying costs for everyone implied in that “long tail.” How do we quantify THOSE?
The truth today is that we seem heavily dependent on OPEX reduction for the current NFV architecture and vision for services to work economically.
The OPEX savings COULD be tantalizing. In our experience, most operators are not very forthcoming about the exact allocation of operations costs, but the public record can sketch the potential. For example, looking at Verizon’s 2014 year-end public financial report,5 the cost of services and sales was almost US$50 billion (this part includes the lion’s share of service OPEX). Selling, general and administrative expense was US$41 billion. Aggregate customer care costs, which include billing and service provisioning, are allocated between the two buckets. This was not ALL of their yearly operational expense, but it is probably the part most directly related to service. By comparison, total capital expense was only US$17.2 billion.
With that in mind, for many operators an OPEX:Revenue ratio reduction of even one percent can mean hundreds of millions (if not billions) of dollars of profitability. For most, tangible results will probably be realized through a reduction in headcount as an expression of efficiency.
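The arithmetic behind that claim is simple (the revenue figure here is an assumed round number for illustration, not taken from the report cited above):

```python
# Scale of a one-point OPEX:Revenue ratio improvement for a large carrier.
# The revenue figure is an illustrative assumption, not a reported number.
annual_revenue = 120e9        # assume roughly US$120B in annual revenue
ratio_improvement = 0.01      # one percentage point off the OPEX:Revenue ratio

print(f"Annual savings: ${annual_revenue * ratio_improvement / 1e9:.1f}B")
# -> Annual savings: $1.2B, ie, "hundreds of millions (if not billions)"
```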
Still, how much of that OPEX is NFV supposed to save exactly?
Given the uncertainty of potential savings, is there anything else we can do to optimize the cost of NFV solutions? Or do we just wave our hands and say it does not matter because the potential OPEX savings is so large?
While the potential for a degree of agility is present in the current realization of NFV and there has been some increase in competition due to NFV (which is bound to be of benefit to the provider), there has been no fundamental change in service architectures themselves.
The ability to change the service architecture beyond replacing the physical with the virtual is the first order change required for success.
There is still a lot of functional duplication between VNFs and we still cobble them together to make services, like we used to cobble together dedicated appliances.
So, there is an opportunity for further optimization in the approach to NFV in front of operators—to change from emulation to new paradigms.
For some services, providers need to think like an OTT even though it may seem unnatural to do so.
NFV may push operators to the cusp of the network becoming “transport centric” with some services going completely OTT—which may not be a bad thing IF it is the operator themselves adopting the OTT ethos. And that is the fundamental lesson for the next NFV phase for operators:
This does not leave the network operator out in the cold. First, there is a profitable future for value-engineered transport and that CAN theoretically be leveraged to add some “binding power” (think optimized paths) to OTT services. Second, the operators can adopt the OTT ethos in their own offerings (or sell through some other OTT party’s solution).6 Third, there is a great deal of automation/OPEX reduction either way.
Of course, along the way, we will have to get a grasp on some of the troubling unknowns in NFV (outside of the economic model itself). Chief among these is the question of what security compromises will be made in the NFV environment.
We need to figure out whether everything can and should be offloaded to COTS compute, and what a hybrid VNF/PNF solution architecture and its economic model look like.
If we really care about the CAPEX component of the solution we need to figure out whether the complex VNFs that cannot be approached as an OTT service (and are just now coming to market) might evolve to be combined VNF/PNF deployments.
Take the NFV service creation model to a new level—think PaaS.
The ultimate new paradigm may be to think of NFV and service creation as a PaaS problem (as introduced in Chapter 5: The NFV Infrastructure Management). Reduce the functionality required into microservices and use development and deployment frameworks that allow more flexible combining, extension, and innovation in creating new services. With these changes comes the opportunity to address the currently very long production-consumption-deployment cycle that hinders agility in the current model (and is perpetuated by simply converting monolithic physical service appliances into monolithic virtual service appliances).
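As a thought experiment, and with entirely hypothetical names and interfaces, service creation in that PaaS model starts to look like ordinary software composition rather than appliance chaining:

```python
# A hypothetical sketch of service creation as a PaaS problem: the service is
# composed from small single-purpose functions instead of chained monolithic
# (virtual) appliances. All names and interfaces here are invented.

from dataclasses import dataclass
from typing import Callable, List

Packet = dict  # stand-in for a real packet/flow abstraction

@dataclass
class Microservice:
    name: str
    handler: Callable[[Packet], Packet]

def firewall(p: Packet) -> Packet:
    p.setdefault("verdicts", []).append("fw:allow")
    return p

def nat(p: Packet) -> Packet:
    p["src"] = "203.0.113.1"   # translated source (RFC 5737 example range)
    return p

def meter(p: Packet) -> Packet:
    p["billed"] = True
    return p

def compose(chain: List[Microservice]) -> Callable[[Packet], Packet]:
    """Assemble a service from reusable parts; replacing or extending one
    part is a redeploy of a small unit, not of a monolithic appliance."""
    def service(p: Packet) -> Packet:
        for ms in chain:
            p = ms.handler(p)
        return p
    return service

# "Business broadband" assembled from a catalog rather than bought as a box:
service = compose([Microservice("fw", firewall),
                   Microservice("nat", nat),
                   Microservice("meter", meter)])
print(service({"src": "10.0.0.5", "dst": "198.51.100.7"}))
```

The toy code matters less than the development model it implies: small units, independent deployment, and composition, which is the opposite of converting one monolith into another.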
Throughout the book we have illustrated different SDO approaches to NFV, particularly major contributions (or potential contributions) from ETSI and the IETF.
In ETSI, we saw an example of “forum seeking” behavior by a user group which resulted in a study group but no Standards.
With the IETF, we saw the slow and somewhat restrained response of an SDO with a recognized mandate that applies to pieces of the NFV puzzle. The result is focused and not as sweeping architecturally, but important to NFV.
We could continue from here to explore more Standards bodies that are now “studying” or “developing standards” for NFV, but that will not add much more to the conversation. As with SDN and cloud computing before NFV (and probably the “Internet of Things”—the next breaking technology wave), many other SDO bodies (eg, the TMF, ITU, ONF, MEF, BBF) exhibit common behaviors regarding rapidly emerging technologies. They generally start work late after relevance to the industry is established, often straying from their original mandate or purpose to form study groups and pursue Standards in these areas.
You could argue that the restraint and less-than-agile process of the IETF probably contributed to the establishment of the ETSI forum in the first place.
As we pointed out in Chapter 1, Network Function Virtualization, this has been discussed publicly and can hopefully be addressed through structural and evolutionary changes in the IETF and partnerships with Open Source projects relevant to new technology.
While interoperability with legacy systems is paramount to enable a smooth transition to NFV-driven services, traditional standards are no longer the only avenue to achieve interoperability between the management systems or functional parts of complex architectural specifications like NFV.
A bloom of API generation and publication (some using modeling mechanisms described earlier) is attempting to address the gap between legacy and new systems. APIs are also a proposed binder between competing and complementary Open Source projects pursuing solutions in the SDN and NFV space.
While the ability to create, publish, and maintain an API is a GOOD thing, too many APIs and the inability to agree on a standard API definition for a function can lead to additional costs.
For example, it can create a market for API brokerage (eg, the Enterprise Service Bus alluded to in the silo-creating, multidomain approach to resource control mentioned in Chapter 6: MANO: Management, Orchestration, OSS, and Service Assurance). Not surprisingly, an open organization has sprung up to try and insert some neutrality into the API description format (https://openapis.org/).
Open source can be a powerful tool for collaboration and for unlocking the stranglehold on captive markets. One advance in the general software engineering space that has left a lasting mark on how NFV can potentially achieve its goals is the growing trend of implementation in highly collaborative environments, in many cases multivendor open source environments. This is impacting how NFV components are standardized, packaged, and brought to market, where network operators now have many choices in how to deploy these services.
While open source has accelerated the pace of innovation in NFV, and open source has been an objective of NFV proponents, it has not been without drawbacks. The open source environment related to NFV was described earlier as “chaotic” because:
• Older projects like OpenStack have started to spread their focus as they attempt to address more desired functionality. Some of the older projects have slowed in their rate of innovation. They can begin to suffer some of the same organizational drawbacks that are pointed out in SDOs.
• In response, new projects are starting up with a more specific focus on individual areas of functionality related to NFV (eg, the fd.io project mentioned in Chapter 5: The NFV Infrastructure Management). So, while an open source framework may endure, not all of its projects might. Over time, the community or the market will move to fill gaps.
• This is exacerbated by a general tendency to “open wash” existing products or generate new projects with questionable governance. This will “muddy the water,” as some companies and individuals jockey to control the “open” conversation or particular pieces of functionality (normally through a tightly regulated API).
• A misconception plagues those unfamiliar with open source—that open source is “free.” This is normally coupled with the idea that there is no way to add value to the functionality.
Without doubt, a great deal of the NFV architecture will be open source. But until open source projects offer the full functionality of vendor integrated and maintained products, there will be legitimate areas in which the operator might purchase “value-added” functionality (eg, orchestration in the MANO space).
The integration and maintenance of architectures with many parts, some vendor provided and some open source, presents a great complexity problem for service providers venturing into NFV. This is only compounded by the potential security problems we discussed in Chapter 8, NFV Infrastructure—Hardware Evolution and Testing.
In Chapter 1, Network Function Virtualization, we mentioned a discontinuity between “readiness” and “willingness” in the NFV target customer base. Unless they can “shift” to becoming leaders in their open source projects of choice, that is, self-supporting in this environment, this sort of integration and its ongoing maintenance will have to be obtained by paying a trusted partner.
When we are doing the economic model for NFV, we have to honestly address the systems integration costs and ongoing maintenance costs of the NFV solution.
Because NFV is a cultural change, organizations are approaching it in different ways, often reflecting their internal hierarchies and political realities related to service creation and delivery. Across the spectrum of network operators, different organizations end up leading the NFV purchase decision (and architecture). This has produced three basic NFV consumption/purchasing patterns:
• Use-case-led consumption driven by individual business verticals within carriers (eg, consumer, wireless, broadband access, security). This top-down approach tends to pick vendor-provided turnkey solutions (that may or may not have open source components). Since the early ETSI NFV architecture work was heavily driven by a large number of use-cases, this orientation is not unexpected. Further, it allows network operators to absorb NFV without necessarily crossing organizational boundaries. Ultimately, consolidation of infrastructure may make this purchasing pattern fade. This pattern does lead to the architecture (and its problems) described in Chapter 6, MANO: Management, Orchestration, OSS, and Service Assurance—effectively “punting” the stitching of domains to a higher level orchestrator and potentially recreating the OSS proliferation of the past.
• Infrastructure-focused purchasing, led by network and data center infrastructure teams, is becoming more common (potentially overtaking use-case driven purchasing behaviors). Traditional operator engineering relates very well to hardware speeds-and-feeds, infrastructure tasks, and control paradigms. These teams’ first NFV decisions revolve around choosing a common VIM—platforms, virtualization strategy, and SDN controller. VNFs and orchestration (potentially selected on a use-case basis as previously described) have to integrate with those components. Here the approach and focus are much more aligned with the “NFV economics come from DC economics” driver originally propelling NFV. This particular purchasing center is often diverted toward “telco data center,” which has different requirements, focus, and costs than the traditional, high-volume IaaS data center (driving the network I/O focus covered in this chapter and also in Chapter 8: NFV Infrastructure—Hardware Evolution and Testing).
• Orchestration-focused consumption is something of an outlier, but some providers have started with a use case and consolidated around the orchestration system involved, driving all future purchases. Some providers have also made relatively recent investments in more modern orchestration from vendors to automate and drive end-to-end network services (MPLS pseudowire or MEF Ethernet-based services). This purchase would ideally become the high-level orchestrator for their NFV strategy. Here, the provider network management and OSS groups first select a MANO solution with which VNF vendors and service solutions have to integrate. This can be problematic if the orchestration system selected is not open and/or does not provide the tooling for the customer (or their agents) to easily add support for other vendors’ products or interface with other systems (ie, emulating the behavior of older OSS systems).
We are not implying that there is any right/wrong way to begin with NFV. Rather, that every organization will have a different focus/anchor for their architecture depending on their individual internal power structures. These strategies might all be transients as the NFV model and focus change.
Automation and a software-driven approach to services CAN lead to new efficiencies that allow huge organizations to trim down—and that is the plan. A good view of this was presented in a recent NY Times article7 (its subject, AT&T, is not alone in their views).
One of the thrusts of NFV was to enable a new vendor ecosystem in which solutions are produced by software and equipment vendors in a more agile and rapidly developed and deployed model. The idea of making releases of relatively major functions available at a pace of weeks or months and not years has played an important role in the NFV vision.
To this end, traditional service providers need to adopt and accept these new delivery models in order to play in this evolving world. In fact, they need to become part of the new delivery model and begin doing some software development of their own. This directly challenges the status quo in their operations departments by forcing network operations people to evolve their skill sets to learn DevOps and even some mild software engineering. The mantra of their future is clear: change or be changed.
As to the genesis of a new ecosystem, at ONUG 2014, a panel group of investment bankers (three technology investors for mainstream banks and one venture capital representative) was asked if the stated direction of the AT&T Domain 2.0 publication8 (relatively new at the time) would lead to opportunities for startups—a new vendor ecosystem. Except for a caveated response by the venture capital representative (if THEY are not bullish, you know there is a problem), the rest of the panel was negative. They cited service provider habits in procurement and vendor management (long cycles and heavy resource demands) as negative environments for fostering a startup. Tellingly, they preferred to back products that appealed to Enterprise and exploit any overlap with service provider sales—and this was echoed by the products being discussed at the meeting (overlay VPN tools for enterprise and enterprise data center infrastructure tools).
The implication is that the organizational changes required for such an NFV vision go well beyond operations and engineering.
Even though we are over three years into the NFV cycle, NFV is at a very early point in its realization. Unlike the SDN wave that preceded it, NFV is potentially on a much steeper adoption curve. The anticipation is that IF NFV does not succeed for the companies that have publicly declared transcendence via NFV, the financial results for shareholders could be calamitous. Although SDN and NFV are often coupled in conversation and analysis, as we have stated several times throughout the chapters, NFV is a MUST for traditional network operators.9
Operators face tremendous pressures from two sources—retaining or expanding customers/business and reducing costs.
For the former, OTT service offerings from competitors in wireline services and a rapid embrace of MVNO architectures in wireless threaten to upend existing hierarchies and relationships.
The often cloud-based OTT offerings can “cherry pick” important large enterprises through their self-serve, multi-operator, customer managed service models (with strange parallels to how CLECs threatened incumbent telcos years ago).
The MVNO architecture slices the network to create a wholesale/retail split such that the transport/infrastructure provider no longer has direct access to the customer.
In both scenarios, the infrastructure provider is commoditized, disintermediated, or disintegrated.
For the latter, investor pressures to reduce costs, streamline operations and become more agile at introducing new services (this also plays into customer retention) mean changes in both labor force and the traditional approaches to providing services.
Unlike our prior book on SDN, we do not feel compelled to convince a readership of the value and place of NFV10—it is obvious. Nor do we feel compelled to encourage the reader to experiment, because experiments abound.
Both of us are bullish on the future of NFV; maybe not on the ETSI architecture and implementations of today, but definitely on the need for operators to become more agile, break their OSS bonds and evolve both as value engineered transport providers and service providers.
The catastrophic scenario in NFV is that OTT vendors take all services away from telcos over time. While the ubiquity of outsourcing the Enterprise to the OTT vendors (eg, AWS) is debatable (see articles that imply much of their growth is due to startups—ie, venture capital driven revenue vs “true” Enterprise), the trend to do so cannot be ignored. These companies are much more adroit in a software world and are slowly becoming credible business partners to the Enterprise. For some OTTs, their back-end analytics capabilities can potentially allow them to derive even greater value from network service offerings by increasing the value of other products (eg, advertising).
The biggest question is whether the NFV road we are on ultimately leads to the agility required of traditional operators in time to avert that outcome.
NFV is definitely another paradigm shift in the networking industry—the second in the past six years. The rapid succession of these changes is an indicator that networks are important not only for what we used them for in the past, but for what we envision for the future.
Stay tuned…