I learned much of what I know today about systems operations from Bill, a head of network engineering whom I hired at a startup about 15 years ago. We had already separated the systems used for development and testing from production, matured our source code controls, and implemented some automation around testing and deployment. What Bill taught me is that we needed to separate developers from production environments because keeping production secure, stable, and scalable required a different mindset, tools, and practices.
He was right. After removing access, instituting some basic change control practices, and improving monitoring of our systems, we saw better stability and performance. We also saw better developer productivity, largely because developers were now more focused on development tasks and less on operations. There was some grumbling about procedures and some finger-pointing when things went wrong, but as the CTO I accepted most of it since the teams behaved well most of the time, and there were measurable improvements in both “Dev” and “Ops.”
This step of separating duties between operations and development is one many startups must execute in order to scale. But when I joined The McGraw-Hill Companies as CIO at Businessweek, I needed to make the transition in the opposite direction. Everything was operations or support, and there was very little actual development going on.
It’s easy to see how this happens. A business team develops a case for a new capability, and an investment is made to implement it. A new CRM, CMS, or other platform is installed, and a team goes in to configure it for business purposes. The project funds the installation and development from the initial rollout through several improvements and upgrades, but at some point, business leaders tire of continuing to invest in the practice and platform. They cut staffing and oversight for the platform, and in the best of situations, they leave a skeleton group around to provide support.
And what is support? Well, most enterprises will have a service desk responding to incidents and requests. This team handles the intake and is usually pretty good at responding when there is a system issue or need. Maybe a user needs access, or a system is unresponsive. In small operational teams, they may also handle a few recurring engineering tasks such as patching, performing application upgrades, monitoring backups, testing disaster recovery, and ensuring that resources such as network, storage, and compute are meeting business needs. But as soon as there is an issue with the underlying applications, addressing it is often beyond the skills of the service desk professionals, and a developer or specialist is needed to resolve it. These responsibilities are what many organizations call support. But that’s not all that application support teams typically handle.
When enterprise systems are rolled out, care is taken to service the infrastructure and the underlying business processes. This often leaves many administrative areas as “support tasks,” which require a human to intervene and follow steps to complete them. Examples include processing data feeds, adding or removing users, or modifying data when something new arrives or something changes. This not only happens with enterprise systems but is even more likely to happen with homegrown applications. In the best of situations, these administrative tasks are handled by tools and workflows, but sadly many systems are deployed without these provisions, and administration requires technical skills to script a procedure, log on to a production system to kick off a job, or access the database to modify data.
So, when you step into any organization that has a mix of enterprise and proprietary applications, you see common patterns in how they are managed. Most of the IT team is in support mode. In the best of situations, they have few incidents and requests coming in and have sufficient technical skills to do some development and make basic improvements to the underlying applications. Where that hasn’t happened, these applications fall behind in basic lifecycle management and are either near or past their end of life. When that happens, the IT staff is completely paralyzed about making changes or performing upgrades.
That’s how applications and systems truly degrade to legacy status. Between a lack of resources and skills, lost knowledge, and poorly documented testing practices, the entire organization develops a rational fear of performing upgrades. When an upgrade is finally commissioned for a legacy system, there’s a reasonable chance that it will disrupt business users and create a complete lack of confidence in investing in new technologies or partnering on improving business processes.
Now here’s the key question: How are you supposed to execute a transformational effort when all your IT staff is barely supporting the existing applications?
Let’s look at the two preceding scenarios and see how we can bring them together. In the first scenario, we have a startup whose business is starting to scale and that needs to grow and separate out operational responsibilities. The goal is to make sure that applications perform well, that the infrastructure can scale, and that developers are freed up to focus on the customer and product enhancements.
The second scenario evolved over time, and virtually all the staff is on the operational side, to the extent that very few customer-driven or strategic improvements are occurring. In this scenario, development responsibilities need to be separated out from support-related tasks.
These two scenarios, one likely happening in small startups and the other in larger, more established organizations, have a common organizational solution. Figure 3-1 represents a target end state organization and responsibilities.
Figure 3-1. Defining DevOps practices that maximize agility and operational responsiveness
Development teams take a prioritized list of epics as their primary input and are responsible for delivering customer and strategic improvements through agile sprints and releases. With respect to operations, their goal is to hand over releases that adhere to technical standards, deploy easily, introduce few operational issues, and include updates to any operational procedure.
Operations teams are procedural and are driven by incidents and requests escalated by users, by recurring and standardized operational processes, or by operational projects that get prioritized. Their goal with respect to development teams is to free them of operational issues. When there are issues that require fixes to the applications, they escalate these to the development team as defects.
Quality assurance responsibilities and tasks enable this separation of duties. Releases need to be certified as meeting business requirements and technical standards. When defects are raised, QA’s job is to triage them, determine severity, and ideally suggest a course of remediation to development teams, who then need a workflow to add them to the development backlog.
There are times when these teams must collaborate. Collaboration is often required when there is a high-severity issue affecting customers. Many IT departments will convene a “war room” to bring together individuals with the right knowledge and skills to triage and resolve the issue. Another is when changes require steps and skills from both teams, such as architecture changes or higher-risk infrastructure upgrades.
This structure can be contracted or expanded to fit organizations at either end of the size spectrum. Larger enterprises are more likely to have this separation in place already, and the strategic challenge is getting funding, resources, and practices to establish development teams that can perform ongoing improvements. But even the smallest of startups can adopt these practices. Even when a new startup has a single CTO, separating development activities from operational responsibilities enables scaling the operation as the business grows.
While I am showing multisprint releases, this separation is still required for teams that have the tools, skills, and practices to perform continuous delivery. In those situations, the testing, deployment, and operational practices are automated to such an extent that development can push very frequent releases with very low operational impact.
Let’s look at the organizational responsibilities in more detail.
The development team should operate as the technology extension of the business. This means providing expertise in enterprise applications, specifically in how business users can be more productive, drive data-driven decisions, connect with customers, or make supply chains more efficient. It means spending their time learning user needs, researching solutions, and building and enhancing applications. Ideally, developers ought to be spending 70–80% of their time working with users on new capabilities and only 20–30% on “support.”
I’m using “support” as a catchphrase for time spent on anything that is not development toward building or improving applications. It can be chasing an operational incident or diagnosing a defect. It can also be upgrading versions of anything from an enterprise application to a JavaScript library used in an application.
It can also be anything technical to make the development team more efficient. It can be investing time in configuring version control, automating builds, or scripting deployments. It can be time spent researching technology tools. All this time, while very critical to an efficiently running development program, is time balanced against direct efforts to improve the business.
More than just a matter of how developers spend their time, responding to noncritical operational issues is a major distraction from their development activities. Picture developers with their heads down, headphones on, intensely banging away at multiple screens, being interrupted several times while coding a critical algorithm to answer user requests escalated from the operations team. It not only disrupts the concentration required to engineer efficient, defect-free code; it also affects their psyche. Developers who anticipate these distractions have no choice but to lower their agile commitments to leave bandwidth to respond to operational needs and manage through the distractions. In extreme situations where the distractions are significant, they can degrade the working environment to the extent that top developers may seek to leave the organization.
While development teams optimize for speed and agility, they may compromise stability and quality. Sometimes this happens because of business pressure to release upgrades quickly, because QA is underfunded and can’t adequately test everything, or because the team is new to a technology and can’t anticipate exactly how it will perform in a production environment. It may also happen if there isn’t sufficient infrastructure to perform different types of testing or if the data available in testing environments does not sufficiently match the variety, velocity, or volume of data in the production setting. However this happens, it is the operations team that often feels the pain of remediating what the development team released, and their goal is to minimize the impact to end users.
While development is optimizing for speed and may compromise reliability, operations teams have the opposite charter and look to ensure the performance and reliability of the environment as their top priority. Their practices, such as incident management and change management, align with this charter and aim to protect the business from disruption. Disruption can be outages that affect customers and business operations but also include incidents that hamper organization productivity.
The by-product of protecting the organization is often rigidity. Examples include infrequent time windows for change management or lengthy lead times to deploy new infrastructure. Operations teams may also have a very limited, IT-centric view of service levels when they define uptime at a system or application performance level rather than as a business success metric.
Why do operations teams become so rigid with their practices, and what’s the impetus to get them to become more agile? The rigidity starts with the nature of their work of protecting the enterprise, which requires them to develop standard operating procedures. They have change management practices, but their culture evolves to be less flexible because of the number of times they have been burned over the years managing bad deployments and defective applications.
It’s not just the pace of business and application development that has changed; it’s also the technologies. Instead of managing racks, servers, networks, storage arrays, and appliances, many organizations are moving more workloads to Cloud environments. Instead of maturing Information Technology Infrastructure Library (ITIL) practices and learning how to diagnose infrastructure, operations teams must manage vendors, keep up with growing security needs, and support more flexibility around computing devices. In addition to providing more business and data services related to the enterprise applications, there are now more customer-facing applications with even higher demands for reliability.
So, operations departments still have the same charter of protecting the enterprise but are now facing the need to do so faster, with new technologies, with more personalization, to greater business impact, against a wider variety of security threats, within the requirements of greater regulatory controls, and at lower cost.
DevOps is about the culture, collaborative practices, and automation that align development and operations teams so that they have a single mindset on improving customer experiences, responding faster to business needs, and ensuring that innovation is balanced with security and operational needs. For development teams, that usually means standardizing platforms, following an agile development process, and participating in operationally driven initiatives. For operations teams, it means targeting better stability, reducing costs, and improving responsiveness.
“DevOps” is a recent industry term used to describe the organizational structure, practices, and culture needed to enable the services businesses require now that reliable, continuously improving technology is a competitive necessity for the organization and its customers. Most experts in the industry agree on many of the core DevOps practices, which center around infrastructure configuration standards, automation, testing, and monitoring. Everyone agrees that DevOps requires cultural and mindset changes in IT. But not everyone agrees on the organizational structure and how responsibilities are assigned between development and operational teams.
Some stress merging development and operational people and responsibilities so that one DevOps team carries both. This has some cultural advantages, as developers and engineers are forced to respond as a team to both operational and business needs. Some argue that this structure is more efficient, especially in smaller shops that can’t easily dedicate resources to development or operations independently. Many also argue that when you standardize on Cloud infrastructure and automate everything, there is less need for separate people with different skills and responsibilities.
While this may be possible, it’s a difficult hill to climb for organizations carrying legacy systems and lacking highly skilled engineers who can implement all the standards and automation. Even when it is possible, I still believe that most organizations driving digital need to separate first, achieve their digital business agendas, and then perhaps look to merge once their IT practices have matured. It’s from this perspective that I will define DevOps and how transformational organizations need to implement it.
You can understand how DevOps evolved by looking at what happens when an organization, big or small, begins to execute an agile development practice.
An agile development team is humming along, maturing its practice, putting out regularly scheduled application releases, improving quality, and making business users happy. Then, something happens. Maybe the infrastructure needs an upgrade or needs to scale. Maybe there is a difficult-to-diagnose performance issue. Or maybe the business and development teams are looking to deploy a new enterprise application to support a new capability.
Operations teams are confronted with a wave of change that they historically haven’t been accustomed to supporting. They go back to classic operational procedures that were designed to manage data centers with heterogeneous computing environments. They attempt to apply them to the new requirements of supporting frequent agile-driven application releases, enterprise applications that now run in the Cloud, or sophisticated transactions that involve API connections to multiple environments.
Confronted with these new challenges, two behaviors emerge. First, many in operations hold to the procedures that have worked for them in the past and try to force these new, demanding channels of activity into existing practices. It often doesn’t work because existing practices were never designed for the speed of change or for the flexibility of different computing environments and workloads. The net effect is that their resistance slows down development teams and the business while operations adjusts its practices and skills.
The other thing that happens is that some on the development team attempt to take over some of the operational responsibilities. This is facilitated by the availability of Cloud infrastructure that can be purchased with a credit card and a few clicks. Beyond procurement, developers may have the better skill sets to fully operationalize these environments, since most of the configuration can be done with web tools and scripting. Need to configure load balancers? It’s a few clicks. Need to automate the elasticity of the front-end web servers? No problem; just develop scripts with the appropriate business rules and connect to the Cloud service provider’s APIs. If the operations team can’t get the developers what they need when they need it, skilled developers can sometimes challenge them by taking on these responsibilities.
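To make the scripting scenario concrete, here is a minimal sketch of elasticity automation against a Cloud provider’s API. It assumes an AWS environment with the boto3 SDK; the Auto Scaling group name and the capacity numbers are hypothetical.

```python
# Minimal sketch: adjusting front-end web tier elasticity through a Cloud
# provider's API. Assumes AWS credentials are configured and an Auto Scaling
# group named "frontend-web" (hypothetical) already exists.
import boto3

autoscaling = boto3.client("autoscaling")

def scale_frontend(min_size: int, max_size: int, desired: int) -> None:
    """Apply a simple business rule to the web tier's capacity."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName="frontend-web",
        MinSize=min_size,
        MaxSize=max_size,
        DesiredCapacity=desired,
    )

# Example business rule: widen capacity ahead of an expected traffic spike.
scale_frontend(min_size=2, max_size=10, desired=4)
```

In practice, rules like this are usually wired to monitoring metrics rather than run by hand, which is exactly the kind of operational detail that determines whether developers or operations should own the script.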
When these scenarios occur, the question is whether the operations teams are ready to partner with development teams on the timing, complexity, or scale of the task at hand. When there is a mismatch in culture, values, methodology, timing, skills, or operational practices, a classic Dev vs. Ops clash arises. Finger-pointing, bottlenecks, anger, or isolation may form between the development and operational teams.
It’s not just business- or development-driven needs and issues that can drive this rift. Perhaps developers are pushing changes and breaking operational procedures. Maybe there is a mismatch between an application and some required systems upgrade that’s holding back operations from maintaining an environment. Or maybe development isn’t as good as it thinks at releasing defect-free, secure, high-reliability applications that perform well, and operations is left holding the bag responding to outages.
DevOps aims to reduce this conflict. It attempts to educate developers on operational responsibilities and operations teams on how to serve business needs smarter and faster. When there is a shared understanding and a better alignment on priorities and process between the development and operations team, a more customer-centric DevOps culture emerges.
The basic technical concept in DevOps is that, as you automate more of the interactions with the infrastructure across building, testing, deploying, and monitoring, you can remove many operational defects and better align development and operations processes. From a practice point of view, the big questions involve which tools to use and to what degree to invest in developing any single DevOps practice area.
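As a toy illustration of that concept, the following sketch chains build, test, and deploy steps so that a failure at any stage halts the release before defects reach operations. The commands shown (make, pytest, and a deploy script) are placeholders for whatever tooling a team actually uses.

```python
# Illustrative only: a pipeline that automates the build-test-deploy handoff.
import subprocess
import sys

PIPELINE = [
    ("build", ["make", "build"]),
    ("test", ["pytest", "--maxfail=1"]),
    ("deploy", ["./deploy.sh", "staging"]),
]

for stage, command in PIPELINE:
    print(f"running stage: {stage}")
    result = subprocess.run(command)
    if result.returncode != 0:
        # A failed stage stops the release before defects reach operations.
        sys.exit(f"stage '{stage}' failed; halting release")

print("release candidate promoted")
```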
When I’ve discussed DevOps with colleagues, other CIOs, and DevOps practitioners, the big divide is who “owns” DevOps. Does the DevOps starting spark come from developers encroaching into operational responsibilities? Is it operations engineers who standardize configurations and push developers to align their development and release practices? Or is it better to reorganize into a single DevOps team that collectively owns this practice and its maturity?
I’m not sure if there is a consensus best practice on this question, but it certainly is one of the more heated topics around DevOps. While all will acknowledge that DevOps requires a culture change, there is some debate on who should take ownership and how to align the IT organization to achieve business and operational benefits.
DevOps practices are designed to make development more reliable and operations more agile and nimble, effectively helping each side with its weakness. For development, practices such as automating test cases, scripting application deployments, and standardizing application builds all aim to improve the quality and reliability of handoffs from development into operations. For operations, standardizing system configurations, automating server changes, and scripting Cloud operations all help automate operational tasks and enable the team to be more agile with infrastructure. Therein lies the transformation: more specifically, a better balance between agility and stability.
DevOps practices are most often associated with the following technologies and services:
Configuration management tools store infrastructure configurations in a database and help automate setup and integration. Configuring new servers, installing applications, and adding storage are all operations that can be automated. Container services enable building, shipping, and running applications on different types of environments.
Code deployment tools integrated with configuration management know what environments exist, provide visuals on deployed applications, and enable automated mechanisms to deploy new application versions or back them out if there are issues.
Testing tools make it easier to develop regression tests and automate them to run during application builds or deployments.
Change management practices formalize procedures for versioning and backing out changes to computing environments and applications.
Monitoring systems automate data capture from many disparate systems and provide tools to diagnose root causes of complex performance or stability issues.
As organizations invest more in application development, these practices become more important for enabling predictable and stable releases. Configuration management enables scaling the infrastructure. Code deployment automation enables easy changes and back-outs when required. Automated testing allows quality validations to occur repeatedly and quickly. Change management formalizes controls that aid in diagnosing production issues. Monitoring helps alert staff when an application or system isn’t performing as expected.
Most of this is enabled in software tools or in scripting, especially in Cloud environments. An inevitable problem for many organizations is defining who owns these practices.
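To illustrate what software-driven configuration management means, here is a small, hypothetical sketch: the desired state is declared as data, and applying it is idempotent, so running it repeatedly converges on the same result instead of duplicating work. Tools like Puppet and Chef provide far richer models of this same idea.

```python
# Hypothetical sketch of declarative, idempotent configuration management.
import os

DESIRED_STATE = {
    "directories": ["/tmp/app/releases", "/tmp/app/logs"],
    "config_files": {"/tmp/app/app.conf": "port=8080\nworkers=4\n"},
}

def apply(state: dict) -> None:
    for path in state["directories"]:
        os.makedirs(path, exist_ok=True)  # no-op if the directory exists
    for path, content in state["config_files"].items():
        # Rewrite a file only when it drifts from the desired content.
        current = open(path).read() if os.path.exists(path) else None
        if current != content:
            with open(path, "w") as f:
                f.write(content)

apply(DESIRED_STATE)  # safe to run as many times as you like
```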
In the emergence of DevOps tools and practices, some agile development teams have taken the initiative and gone beyond their core responsibilities into system and operational domains. Some argue that because technologies like Docker, Chef, and Puppet enable software-driven configuration management and scripted system deployments, it now falls under development responsibilities to program virtual environments and automate their scalability.
Recall that I target developers to spend at least 70% of their time on development efforts that directly improve business capabilities. That means they devote only 30% of their time to operational and support tasks like responding to incidents, addressing application lifecycle issues, resolving technical debt, and improving development processes. These are critically important, but keeping up with these responsibilities is a tall order and difficult to fulfill within their 30%.
Developers, especially ones being asked to drive digital, simply do not have the time to take on additional operational responsibilities. They should not be involved in configuring servers or automating their configurations. If they are going to participate in a DevOps transformation, they should support their operations teams by focusing on the development practices of DevOps, such as standardizing their application builds, automating testing, or strengthening their version control practices.
Agile development teams should be self-organizing, but if there is a need to take on DevOps practices, they may elect to put this work on their own backlog. A product owner who senses infrastructure or operational gaps may support them. If there is a strong cultural divide with the operations team, if there is weak governance on procuring Cloud infrastructure, or if there aren’t practices to ensure that development priorities are routed to the appropriate technical functions, then agile teams may cross the line and take on operational practices.
I doubt agile teams would like their product owners to start designing databases or developing technical standards because they have lost confidence in their development teams (though some try to). Development teams need to show similar respect toward, and collaboration with, their operations teams regarding operational responsibilities.
If operations teams aren’t getting it done and DevOps is a strategic priority for business transformation, then the technology leaders need to work out the details of roles, responsibilities, skills, and platforms. If the CIO accepts developers handling operational responsibilities, then it might create longer-term animosities or produce conflicts with fulfilling business priorities.
My perspective is that most of the DevOps practice areas should be owned by operational teams. The ones that are tied to application development, such as version control, testing, and automating builds, are exceptions and should be owned by the development teams especially if continuous delivery is a priority. Here’s some additional rationale why operations teams should be stepping up:
The largest benefit of Cloud environments falls to enterprises that manage thousands of server instances and petabytes of data, where standardization and automation provide significant cost advantages. New tools like Docker, Puppet, and Chef are designed to automate configuration management at these extremes, but it is Ops that should learn the technology and take on the responsibility of configuring them.
Agile development teams have already aligned their efforts to business-driven priorities, but their frustration is when operations or the infrastructure can’t support frequent changes. Perhaps development needs to spin up new environments to test an upgrade or evaluate a new platform, and it takes too long to configure. Maybe releases need to be scheduled more frequently, but the deployment steps are too complicated. Operations should collaborate with development to define application, configuration, and other changes that will enable them to perform these activities at more demanding service levels.
The increase in Cloud instances and applications, along with higher expected service levels from customers and stakeholders, requires an updated approach to monitoring applications and collecting performance data. Applications need many more monitors, real-time alerts, and performance trending over longer time frames. Operations teams need to invest in their skills to respond to this greater need.
Security is only going to get more business critical and more difficult to perform. Because operations is on the front lines protecting security, it is in a better position to drive infrastructure and configuration standards that help simplify the disparity of assets needing secure configurations and support.
Server loads are increasingly dynamic, especially when many applications must handle variable loads driven by global, Big Data, or mobility computing needs. Operations teams are already versed in handling load balancers, application clusters, storage area networks, software-defined networks, and other technologies to dynamically scale system resources in the data center. Cloud technologies offer these capabilities and more, but the tools and methods for configuring them are different. Cloud and automation tools might require more “coding” but should not be outside of operations’ ability to learn.
Operations teams and engineers should seize the day and take responsibility for these improvements and transformations. When agile development teams elect to tackle these challenges, they are doing the overall business a disservice by focusing on operational needs rather than business improvement drivers.
A CIO should recognize when one team needs support and another team is overrunning them. If an operations team is struggling with the new DevOps practices and tools, the CIO needs to step in and pace the program accordingly. A CIO driving a DevOps transformation should look to invest in the following areas:
Defining roles and responsibilities so Dev and Ops people understand who owns what practices and where collaboration is required
Bringing in a coach who adapts to the CIO’s and organization’s governance to help define and transition the culture
Recognizing and addressing skill gaps by bringing in experts, investing in training, and providing sufficient time for practicing new skills
Considering outsourcing partners to address some of the DevOps skill gaps or to provide managed services around Cloud infrastructure
Resolving the potential conflicts on platform and technology selections by engaging both development and operations and creating a selection process
Defining reasonable scopes and goals especially on transformation initiatives and articulating a timeline
The CIO owns the DevOps transformation. Transformation requires a senior leader to sponsor the investment, prioritize the effort, drive the culture, and market the wins.
But regarding development and operations collaborating through a transformation, the CIO needs to step in and be very specific about which challenges and opportunities to address. In what areas does development need to be more stable? To what business benefit and to what extent does operations need to improve monitoring? When instituting a configuration management tool, what are operations’ primary objectives, and where must development contribute? The CIO needs to lead these discussions, set priorities, and secure the collaboration required to make the transformation successful.
A DevOps transition for an existing business or enterprise is not trivial. There are several strategic challenges for large businesses going through a DevOps transition.
The first is financial. Most of IT operations today is wrapped in sunk infrastructure costs and operational practices developed over significant amounts of time. Many organizations simply can’t afford the dual costs of running legacy infrastructure while developing Cloud infrastructure and new DevOps practices. There is also the shift of what were capital infrastructure investments in data centers, servers, storage, and licenses to Cloud, SaaS, and services that are usually sold through subscriptions charged to operational budgets.
The second is talent and skills. Cloud engineering, automation, and software-driven configuration management are all new skills for IT operations teams that have been running data centers of heterogeneous platforms and workloads. Simply giving them Cloud infrastructure and new tools is not sufficient to achieve the automation needed for continuous delivery.
The third issue is cultural. Even once you get some alignment on DevOps roles and responsibilities, platforms, and priorities, redefining practice areas so that they drive positive business impact remains a challenge.
Therefore, DevOps must be managed as a transformational program; otherwise, it will likely fall short of expectations. It can be even worse if the CIO succeeds only in driving a partial transition in technologies, leaving too many legacy systems and practices unaddressed. The CIO also must address how business users perceive the transformation and work with the new services. For example, if business users skip the service desk and walk right up to the development team for their immediate needs, then a key objective of the transformation, freeing up development resources, will fail or fall short.
My recommendation to CIOs is to build up this transformational program slowly. Start with a few practice areas and get some success leveraging Cloud, developing automation, or advancing monitoring. Pace what is added to the program based on achieving business-impacting success: drive harder when you achieve it, and slow down if success requires more time or investment to materialize. In these early experiments, avoid programs that require material investments, training, or organizational changes.
At some point, the cultural conflicts will emerge. You might see debates on platforms, arguments on who owns what practices, conflicts over priorities, or stress from trying to drive both agile and DevOps programs. It’s at this point that the CIO needs to articulate DevOps as a transformational program and lay out a vision and charter to the staff. They need to be aligned on strategic drivers and then asked to participate in defined priorities and articulated goals.
The vision and charter should be modest at first because the CIO hasn’t yet brought business leaders on board with the program. Having some defined success on early programs is exactly where the CIO needs to start, but the CIO then needs to define the business details of the program, showing the benefits, priorities, investments, organizational change, roadmap, and financial impact. Failing to position this transformation for business leaders means the CIO is likely to get cut off as soon as there is business impact, organizational conflict, or significant investment.
Here are some things the CIO should consider preparing to present the transformation:
Focus on a description of benefits aligned with business drivers, especially demonstrating opportunities for business growth or other areas where regulation is going to require investment.
Define priorities in business terms, ideally with meaningful business metrics. Minimize discussion of how these translate to technical priorities.
Be very clear and transparent about the organizational impact, especially within IT: investing in skills, hiring resources, and possibly reorganizing the team. You’re likely going to need help from HR and other leaders to execute this change.
Make sure you present a realistic roadmap. It needs to be reasonably detailed for the first six months, but the full scope should be represented.
Get help articulating a financial model showing current versus future state costs and transition investments. Make sure your colleagues understand that this is a model that will need to be updated as more decisions are quantified during the transformation.
Be specific on how communications will be handled during the program and when the leadership team can expect updates.
With two practice transformations now defined—agile largely for development groups and DevOps largely for operational teams—let’s now look at key aspects of technology, platforms, and architecture that drive digital.
Developing extendable technology platforms is what makes agile businesses nimble. For simplicity, I’m going to call a platform any single technology or stack of technologies designed to be leveraged in more than one product, application, or business process. Since “product” and “business process” are somewhat fungible terms, what I mean is more than one application developed for more than one target user group.
It’s easy to see the economics of why establishing reusable platforms fares better than adopting an optimal platform for every business case. Consider all the startup costs of selecting a technology, physical installation, configuration, integration, training, and developing support processes, all overhead above and beyond developing applications. Even if the technology is SaaS or Cloud deployed, there are still startup costs to research the technology, negotiate on price, evaluate service levels, pilot its use, and share knowledge.
What is more difficult to understand or estimate is the benefit that accrues when individuals and teams develop a competence with the platform. In addition to better productivity, the team develops the expertise to guide the business on capability, feasibility, and innovation in applying it in new contexts. Strong teams will develop reusable components and services that promote the speed and quality of future applications.
Finding extendable platforms has been the role of CTOs, application architects, and enterprise architects. Individual systems are considered “building blocks” of the architecture that can be assembled efficiently toward new business benefits. We like to think, under idealized conditions, that platforms will work as advertised, applications and system interfaces will perform as expected, developers will be able to leverage the capabilities in standard ways, and business sponsors will leverage the capabilities and reduce their appetite for customized solutions.
But as technologists, we all know it isn’t that easy. First, it’s hard to get any new platform selected, installed, leveraged for a first business case, and then structured so that it can be reused in new business cases. In many situations, IT teams fumble through the initial project, sometimes because they don’t fully understand the platform and how to use it properly, other times because they’ve been oversold by the vendor on its capabilities, and sometimes because the business sponsors push the team to customize the platform’s configuration beyond its best capabilities. Likely all three happen, and by the time the project is done, there’s enough spilled blood among the technologists who implemented it, the vendor, and the business sponsors that there is little appetite to pursue new business cases.
A second issue is that business leaders, regardless of what IT assets already exist in the enterprise, go out and select technologies that “best” fit their needs. This is one flavor of “Rogue IT,” because business leaders often do this without any support from IT, and when things go awry, they either blame IT for the problems or push IT to manage the issues with a problematic vendor that IT had no voice in selecting.
In some cases, these business leaders may not be rogue and will partner with IT to select platforms. But their motivation remains to pick something that suits their business needs first, with secondary business needs a distant afterthought. The problem remains that a single “point” solution is being selected for a business need, not a business platform that is intended to be the basis of transformation. Every time a business leader selects a new solution, it’s one more asset the enterprise needs to pay for and support and one less that will help propel standards.
A third problem involves the platforms themselves. The most versatile platforms are programming environments like Java, PHP, JavaScript, and Microsoft .Net, which offer complete programming flexibility at the cost of assembling tools, maturing development practices, and selecting third-party components to help accelerate development. Your other option is to select higher-level platforms that provide an out-of-the-box solution for a specific business need, with sufficient configuration capability that they can be tailored to business needs without custom programming. Examples include CRM, ERP, BPM, data integration, content management, and business intelligence platforms, which all deliver customer- and internal-facing workflows and reporting. The most capable platforms then offer APIs, SDKs, app stores, and full development environments, enabling technology teams either to integrate across platforms or to engineer customized capabilities.
The problem, of course, is that virtually all technology platforms are now marketing these extendable capabilities but are not all equally capable. Are they easy to learn? Are they sufficiently flexible? Do they offer defined testing capabilities? Do the vendors provide adequate support to developers? When there is a platform upgrade, do they ensure that applications connected to their development interfaces continue to work without modification? Do they provide sufficient operational logging or diagnostic information? Are they transparent with performance and security considerations?
The bottom line is that programmable platforms are a maturing space, and not all of them perform equally. It takes a lot of learning, prototyping, and investment to discover what really works, what’s maturing, and what is hype.
The last step is getting the developers on board with the skills and practices to make architecture and platform practices a reality. Some developers can code but lack the engineering experience to know how to structure code for reuse. Other developers are versatile and fast but want to develop things “their way” without considering other people’s code or defined architectures. Some developers are conservative and want to stick with platforms they know or ones from specific vendors; others love to shop and tinker with the latest coding tools.
What CIOs and IT leaders need are developers who buy into the architecture and development practices, have the skills to develop off the selected platforms, and have sufficient business acumen to recommend appropriate solutions. They need platforms that enable superstars to deliver amazing innovative solutions but that also let average developers build, enhance, and support applications. It’s a difficult intersection of skills, and it takes some time to cultivate this mindset.
Given the constraints, investment, and patience it takes to establish technology platforms along with their importance to the transformation effort, it is critical that a high level of care is taken in their selection. There are several criteria and considerations technology leaders should leverage when undergoing a platform selection or review. I call these agile platforms, and they have the following properties:
Delivers a user experience that people love—Reviews of digital businesses and technologies start with an analysis of user experience across multiple sales channels (omnichannel) and devices (multimode and especially mobile). This is for good reason: if the experience sucks, if it doesn’t simplify life for end users, if the value proposition is irrelevant, if the data and experience are not ubiquitous across devices, if the user interface isn’t intuitive, or if it exhibits other poor usability characteristics, then the platform will fail to achieve the desired transformation goals.
Fast and easy to learn—It lets average developers learn and be productive with short ramp-up time. It has a combination of tools and documentation that lets the whole team start using it quickly.
Built on standards—It’s easier to find developers with skills in “standard” platforms and harder to find them for niche vendor platforms with proprietary methodologies. Also, consider that many platforms die, and if you need to migrate, you will want the code to be largely portable.
Has an open, extendable architecture—No vendor can enable all the desired capabilities, and sometimes the platform needs to be extended for proprietary needs. Also, the platform doesn’t live in its own ecosystem; it probably needs to be integrated with other services, applications, and data sources. Does it have a published API that’s easy to understand and leverage? Can it be developed and deployed in Cloud or virtual environments? Is there a healthy ecosystem of development partners, plugins, and other components?
Well-defined performance/scalability—Because no one wants a platform that requires lots of servers for low levels of activity. “Well-defined” means that the platform’s architects have documented, benchmarked, and delivered tools to help developers know the performance and scalability boundaries of the platform. It should have defined approaches to monitor capacity.
“Easy” to install, configure, and administer—I’d like the operations team to own the production environments and developers to be hands-off in them. Agile platforms need the basic tools for administrators to perform their functions easily and quickly.
Must support Big Data—If it’s easy to learn, then a development team should be able to prototype applications easily. But if it can’t scale to handle larger volumes of data, then these prototype applications may be no more than “tinker ware,” that is, nice applications that can’t be deployed to end users or customers.
Easy diagnostics and troubleshooting—If the development or operations team needs to call the platform’s Technical Support, then that’s a big red flag. Good error messages, diagnostic tools, and useful log messages are all minimal requirements.
Well-defined security model—Security capabilities often need to be evaluated prior to any prototyping, so the model and capabilities need to be well-defined and easy to leverage.
Has embraced self-service capabilities—“Enterprise” applications enable configuration and customization largely by providing application development and administrative tools to the IT staff. Today’s SaaS platforms are better measured by how much of this configuration can be accomplished easily and correctly (without complications or quality issues) by nontechnical business users. Many self-service applications allow users to customize look and feel, publish forms, configure dashboards and reports, enable simple workflows, or perform common data operations. The most sophisticated platforms are fully self-service and enable “citizen developers” through simple visual interfaces, low-code tools, or entire desktop applications fully designed for business self-service.
Has achieved a critical mass of fanatical developers and users—At minimum, users and developers should love the technology and be promoting it on social media. The vendor’s conferences should have huge attendance that grows every year. Chances are that if the social sentiment around a technology is strong, it’s a significant endorsement that the technology is easy, powerful, versatile, secure, and stable. Developers and users do not socially endorse platforms too easily or have the budgets to attend conferences regularly, so when you see thousands of daily tweets, tens of thousands attending the conference, and subreddits with frequent and positive feedback, it’s a sign that this is an easy-to-use platform that has made others successful in a wide variety of opportunities.
Why do I list these criteria as business driven and not technically driven? If you ask the vendors, existing customers, or third-party analysts to evaluate the size, financial strength, or customer satisfaction of the technology, then you’re likely to get an incomplete and possibly biased picture of the platform’s long-term viability. When you can measure a growing ecosystem of passionate supporters, then you can benchmark this endorsement against competitive options. That’s one reason why leading platform providers like Google, Apple, Amazon, Salesforce, Twitter, and Facebook all work hard to cultivate large and supportive developer communities.
Demonstrates a strong commitment to data portability and quality—Technologies can create, process, and deliver data, but they often don’t do all three in isolation. Digital platforms should enable growth, and that should drive a slew of data and quality considerations. What usage data is the platform collecting, and how is it used to improve the user experience? Does it have well-used APIs and other tools to get data in and out? Does the platform provide sufficient functional, regression, and data quality testing tools? What tools, reports, and guidelines are provided to ensure that business-driven configurations do not yield performance issues? What is the performance and cost impact of using the platform with Big Data and increasing volume or velocity of data? What flexibility does the technology have to configure data security and enable auditing? This is a partial list and needs to be tailored to the type of platform and how it will be used.
Participates in the digital ecosystem—It’s one thing to have an API, but is the technology already integrated out of the box with other technologies? Can you easily enable single sign-on with Azure Active Directory, Okta, or a competitive platform? Is it already configured in a data integration or IFTTT-style platform like Zapier, Informatica Cloud, SnapLogic, or similar? Can your self-service BI platform like Tableau or Qlik automatically connect to it as a data source? What app stores is the technology plugged into? I’m not talking just about mobile applications on iOS or Android; what about the Salesforce, AWS, or Azure marketplaces?
What’s very difficult to manage—and what can make me irate as a CIO—is the selection or review of new technologies that overlap with platforms and capabilities that already exist in the enterprise without considering reusing existing platforms or calculating the future costs to consolidate technologies.
Technology media and industry analysts enable this issue by developing new and sometimes proprietary terminology for technologies that have common capabilities. For example, are you buying a CRM, a sales automation platform, a 360-degree customer experience platform, or tools for marketing automation? These tools and platforms have some overlapping capabilities that can be competitive strengths or commodity offerings, but selecting multiple tools leads to complexities.
The best way to avoid this is to develop reference models that document a technological future state that aligns with strategic priorities. Reference models are often useful to document platforms, capabilities, reference data, data flows, and enterprise workflows. Then, when someone wants a “new solution,” the organization is in a better place to review existing capabilities or identify primary requirements for any new technology selections.
Which platforms drive digital transformation is a difficult question to answer holistically since it depends on industry, business need, and what platforms already exist in the enterprise. That being said, some new platforms and requirements become more important when taking on a digital transformation. I’m going to highlight these considerations here.
First and foremost, digital transformation requires rethinking and developing holistic, omnichannel customer experiences. Front office platforms are designed to deliver customer-facing capabilities, some sold as products, others as business services within a nontechnical product offering. They need to be flexible and robust because they must evolve into mature interfaces with personalized user experiences.
The exact nature of platforms used to deliver customer experiences will vary considerably depending on the nature of the application. There are content management systems, business intelligence platforms, search engines, portals, e-commerce catalogs, and other front office platforms that can be leveraged for customer-facing applications. There are development platforms and frameworks used to engineer proprietary applications.
But certain principles become critical for these platforms to support:
Multidevice and browser—Any front office framework today needs to support mobile, tablet, and other interfaces without adding a lot more development effort. Ideally, these should be responsive to handle device and browser specifics. In many business contexts, front office applications need to be “mobile first,” designed for users expecting to access information from mobile as often as, if not more often than, from the desktop.
Separation of content, presentation, and logic—Front office applications require some form of separation to enable multiple developers to understand and extend the code base. This is usually done by separating the code that interfaces with data sources, the code that handles the business rules, and the code that controls the presentation and user interface. A common framework that supports this separation is model-view-controller (MVC), and most web development languages (Java, .Net, JavaScript, PHP, etc.) have several frameworks to choose from; a minimal sketch follows this list.
Modular to support reuse and extensibility—Modularity should enable you to plug in third-party components and configure them or enable you to develop your own components. This option can be very powerful, especially on open source platforms or ones with component stores, offering businesses the option to buy and configure rather than build their own.
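Here is the deliberately tiny MVC sketch promised above, with hypothetical names; real web frameworks provide far richer versions of each layer, but the separation of data access, business rules, and presentation is the same.

```python
# A toy model-view-controller separation (names are hypothetical).
class ArticleModel:
    """Model: interfaces with the data source."""
    def __init__(self) -> None:
        self._articles = {1: "agile platforms for digital transformation"}

    def get_title(self, article_id: int) -> str:
        return self._articles[article_id]

class ArticleView:
    """View: controls presentation only; no business logic."""
    def render(self, title: str) -> str:
        return f"<h1>{title}</h1>"

class ArticleController:
    """Controller: applies business rules and connects model to view."""
    def __init__(self, model: ArticleModel, view: ArticleView) -> None:
        self.model, self.view = model, view

    def show(self, article_id: int) -> str:
        title = self.model.get_title(article_id).title()  # a business rule
        return self.view.render(title)

print(ArticleController(ArticleModel(), ArticleView()).show(1))
```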
If you’re leveraging a front office development platform to develop these applications (as opposed to a development framework), then you’ll want to use the following to develop an evaluation checklist.
Business tools for configuration—A big part of the business benefit of front office platforms is the extent to which they enable business users to self-support applications. This can be anything from adding users and setting entitlements to publishing content, developing data visualizations, and publishing reports. The easier and more flexible these tools are, the less you have to develop on your own.
Extensibility—APIs and plugin frameworks for more advanced automation and integration are needed to enable developers when customizing something is truly required.
Modularity—Should enable centralizing code and configurations for reuse and enable multiple developers to work on separate modules simultaneously.
Self-documenting capabilities—Once developers configure the application, the business tools used in configuration become buried in the application and hard to review when cross-training new developers or explaining to users how the system functions. Tools should extract basic documentation such as metadata on the application’s configuration, business rules, formulas, styles, and parameters.
Version control—At minimum, this should enable developers to edit the application without impacting production usage. It should then support automated deployment, rollback, and assigning version numbers. Ideally, the platform should interface with version control repositories and save all business configurations and code there.
Developer pool—At some point you might need to custom-develop something, at which point the availability of talent becomes critical.
Availability of third-party platforms—Front office platforms should have an ecosystem beyond just third-party modules and components. Ideally, they have a marketplace where you can identify components and a published list of approved service providers.
Interface with testing frameworks—One of the bigger issues when working with these frameworks is finding methods and tools to test the application. This is very important for tools that enable customized applications like BI and CMS systems where organizations are likely going to need to automate regression tests or conduct performance tests.
Test as user in role—This is a key need, so that developers can simulate the experience of end users with different roles, entitlements, or preferences.
Profiling—Tools should have built-in capabilities to help developers identify and diagnose performance issues while they are developing the application.
Single sign-on interfaces—Should enable organizations to provide single sign-on for applications developed.
Encryption—Should be standard for any data transmissions and storage. Ideally, vendors should provide capabilities to interface with private keys and to mask data when moved to development or test environments.
Caching—Enables higher-performance delivery to end users by storing computed data, content, or visualizations in memory for a defined time (see the sketch after this checklist).
Load balancing—Ideally has options to leverage elastic Cloud capabilities.
Disaster recovery—Should enable replicating data and cutting over to backup environments.
Logging and diagnostic information—These tools are needed by administrators to diagnose production issues.
Metrics—On usage by end users: not just the basics of who logged in when, but also what parts of the application they utilized. Ideally, they should offer plugin capabilities with third-party analytic platforms.
Auditing—Should log changes made by end users, provide tools to identify who changed what and when, and ideally offer tools to roll back changes.
Exporting—Many development tools lock in developers and prevent exporting the code needed to rebuild the application in a different environment. Toolmakers should have more confidence here: even if they offered this capability, most developers would never use it, but it’s a big selling point to IT leaders who have concerns about lock-in.
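To make the caching capability concrete, here is a minimal sketch in Python of a time-bounded, in-memory cache. The decorator name ttl_cache and the render_dashboard example are hypothetical, written only to illustrate serving computed results from memory for a defined time:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Cache a function's results in memory for a fixed time window."""
    def decorator(func):
        store = {}  # maps call arguments -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry and entry[0] > now:
                return entry[1]              # still fresh: serve from memory
            value = func(*args)              # missing or expired: recompute
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=300)
def render_dashboard(region):
    # Stand-in for an expensive query or visualization build
    return f"dashboard for {region}"
```

Real platforms add eviction, size limits, and invalidation on top of this, but the expiry check is the core idea.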
What are some examples of front office development platforms?
Content management platforms that enable authoring, editing, and publishing content, such as WordPress, Drupal, and Joomla
Business intelligence tools that enable developing dashboards and charts, such as Tableau and Qlik
E-commerce platforms that enable publishing product catalogs and selling products
Mobile development platforms, specifically designed to develop mobile applications that run on multiple devices and operating systems
Data management platforms are responsible for aggregating and cleansing data, aligning it with master data, storing it, making it searchable, and providing transaction and export services. Here’s how these break down:
ETL (extract, transform, and load) platforms automate data aggregation and loading into data warehouses and other repositories (a minimal sketch follows this list). They are often bundled with data quality tools that enable profiling data for quality issues, identifying duplicates, defining automation rules to cleanse data, and providing workflow to handle exception cases.
Master data management tools manage databases of “golden records” for entities such as accounts and contacts that are core to the business and used in multiple applications.
Data preparation is the latest technology used for easily loading, cleansing, and in some cases analyzing data stored in data lakes and Big Data stores.
Data virtualization tools are often combined with business intelligence and other reporting tools to enable defining data marts and simplifying access to data scientists or other end users.
Web service platforms and service-oriented architectures enable creating reusable data access patterns and business services.
API platforms provide security and other services around APIs that are exposed to internal or client use.
IoT and data stream processing are designed to process ongoing streams of data in real time, whereas ETL platforms are designed largely to batch process data loads on fixed schedules.
Artificial intelligence and machine learning services are new capabilities that applications can plug into for higher-level pattern recognition and personalized services. Some are designed to process human inputs such as voice and images. Machine intelligence services are designed to process large amounts of data in order to find complex patterns and summarize and personalize output. Finally, other services are designed to mimic or automate human decision making. As most organizations will not have the technical capabilities to develop proprietary AI solutions, technology providers such as IBM, Google, and Microsoft are opening their AI capabilities through APIs and other data services.
Blockchain services are for organizations that manage digital or physical assets and provide transaction services for them. Blockchain technologies provide a digital ledger that securely tracks both the assets and their transactions across multiple parties.
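As a concrete illustration of the ETL pattern referenced above, here is a minimal sketch in Python. The file names and the dedupe-on-normalized-email rule are illustrative assumptions standing in for real sources, targets, and data quality rules:

```python
import csv

def extract(path):
    """Extract: read raw rows from a source file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize values and drop duplicates (a basic quality rule)."""
    seen, clean = set(), []
    for row in rows:
        email = row.get("email", "").strip().lower()
        if email and email not in seen:   # dedupe on the normalized key
            seen.add(email)
            clean.append({**row, "email": email})
    return clean

def load(rows, path):
    """Load: write cleansed rows to the target repository (a file here)."""
    if rows:
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)

# A real platform schedules this pipeline and routes exceptions to a workflow.
load(transform(extract("contacts_raw.csv")), "contacts_clean.csv")
```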
Are traditional enterprise platforms like CRM, marketing, financial, and HR the same old thing, or are the requirements and capabilities different to support digital business?
The simple answer is that while many of the core capabilities are the same as they were a decade ago, digital creates new and accelerated needs in these platforms. Here are some examples:
While CRM is often thought of as a sales automation platform, in the last decade it has taken on far greater importance as the leading platform to bring customer and prospect information and activities to the greater organization. The potential is to make the entire enterprise more customer driven and data driven when defining customer segments, prioritizing prospects, establishing customer engagement protocols, and matching customers to products.
Coupled with CRM systems are marketing automation systems and tools that aid marketers in all aspects of marketing, from branding and capturing social and other customer-360 feedback to prioritizing digital marketing activities and nurturing leads into prospects. The need to compete for the digital consumer has become so important over the last five years that CMO technology budgets rival the CIO’s, and the number of marketing technology products has doubled every year for the last couple of years.2
While financial and ERP systems continue to support the financial capabilities of the enterprise, the capabilities and integration have extended significantly beyond accounting needs. Many digital businesses are subscription driven, a financial model that can be challenging to implement in traditional financial systems. To compete at digital speed and intelligence, more department leaders need access to manage their budget and understand operating performance by linking financial with sales or operational metrics. To enable executives to realign business strategy, make intelligent investments, or pivot their operations, they need access to more real-time financial analytics that enable them to perform both top-down and bottom-up analytics.
HR platforms have gained a lot more capability over the last five years as payroll and benefits have become digital capabilities and talent management has become a critical business activity. In addition, enterprise collaboration platforms that enable employees to share information and develop networks have helped larger organizations improve productivity, develop cross-functional teams to take on new projects, and retain top talent.
Operational platforms are the specific technologies used by one or more departments to enable their operational responsibilities. They include business process management platforms that enable workflows, knowledge management tools, and communication and collaboration tools. Some may be very specific to the type of work being done, and others are more generic platforms that have been programmed or configured for a specific need. They can be industry-specific systems, such as CAD engines in construction, electronic health record systems in health care, or manufacturing systems. Technically, even the application development tools used in IT fall into this category.
There are a few platforms and classes of platforms that are often critical in digital transformation:
Agile portfolio management, ideation, or innovation platforms aim to engage the entire organization to solicit ideas for new products, operational improvements, and other investments that drive innovation, growth, or cost efficiencies. These platforms also serve as a central tool to communicate the status of active initiatives and enable individuals to step up and get involved.
Collaboration platforms include everything from voice, video, and conferencing to “social” collaboration tools, wikis, project meeting spaces, and document management systems. The exact tool selected is less important than getting employees to use them regularly in meaningful business collaboration that drives results.
Crowdsourcing is a way to elicit outside help using digital tools. In digital transformation, these tools can help get access to the resources to do many repetitive data-oriented jobs such as data cleansing, monitoring, and editing.
Citizen development may be an attribute of a platform or a platform in itself that enables business users to self-service the development of their own applications. This empowerment enables teams to be more efficient and data driven.
Data science tools include analytic modeling tools, data-wrangling tools, and data visualization. Together they form the toolkit used by data scientists for discovering insights from data.
These technologies all lie at the intersection of business and IT and are the foundations of digital business practices. I will cover more about these platforms for engaging with business in Chapters Four through Seven.
While relational/SQL engines are still appropriate for many applications, architects have other options today, ranging from document stores aimed at storing semistructured metadata and content to NoSQL parallel-processing engines like Hadoop, columnar databases, key-value stores, and graph databases. These Big Data platforms are designed to handle massive volumes of data, a high velocity of new and changing data, or a variety of structured, semistructured, and unstructured data types.
Databases are at the heart of running digital enterprises, so selecting the appropriate databases, establishing the infrastructure, identifying talent to work on these new platforms, and demonstrating business value from the investment are significant challenges. In essence, you are taking approximately three decades of built-up capabilities, expertise, and legacy investments in SQL databases and making a bold statement that the organization needs an additional data engine to compete digitally.
Implementing Big Data technologies is no small task, and IT leaders can get consumed with selection criteria and the overhead of establishing the capability. It’s easy to lose sight of the business needs and opportunities while investing the time to develop the infrastructure, skills, and data integration. Yet for many businesses where data is the future, this is one of the more critical investments.
I will be dedicating a whole chapter to the data-driven organization, which speaks to practices to enable the organization with smarter data practices. This covers the role of data scientists, how IT should upgrade its data services, and how to enable the collaboration required to give the business a competitive edge with data. For now, let’s turn to the underlying technologies that have emerged under Big Data.
For those of you who are mystified by all the Big Data technologies, I hope this brief history of how they emerged helps you understand them better.
Twenty years ago, if you were selecting a database, you were largely choosing among three options: Oracle, Microsoft, or open source. Under open source, you were probably looking at either MySQL (now owned by Oracle) or PostgreSQL. The new “use case” beyond enterprise computing was using these databases as the “back end” for web applications. The challenges 20 years ago were largely about performance and high availability.
Media companies in the late 1990s had what was at the time a unique challenge in storing and searching content ranging from short blog posts to long-form content and eventually whole books. Back then, you had one of two architectural choices: either store the content as a CLOB in an RDBMS and limit your functionality to basic keyword searching, or use a “search engine” that indexed the full content and offered full-text search capabilities. Search engine technologies became very competitive into the 2000s until a breed of databases based on document storage presented a new technical opportunity. These document stores enabled storing and searching content out of a single repository designed to store documents rather than rows and columns of data. They also enabled new search paradigms that simplified query construction by using key/value pairs or by offering more expressive searches based on XML and XQuery. Document stores are now one of the options to handle the “variety” of Big Data, as they can store and search a variety of formats such as straight text, XML, various document formats (MS Word, PDF, etc.), and digital assets (images, videos, etc.).
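To make the document store concept concrete, here is a toy sketch in Python, not any vendor’s API: documents are schemaless dictionaries, and queries are the simple key/value matches described above:

```python
class DocumentStore:
    """A toy in-memory document store: no schema, no rows and columns."""
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        self.docs.append(doc)

    def find(self, **criteria):
        """Return documents whose fields match every key/value criterion."""
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

store = DocumentStore()
store.insert({"type": "blog", "author": "pat", "body": "A short post..."})
store.insert({"type": "book", "author": "pat", "title": "A long-form work"})

# Key/value query: match fields directly, no SQL or joins required.
print(store.find(author="pat", type="blog"))
```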
Now if you were processing a lot of numerical data back in the 1990s, you had a different set of challenges. Databases were relatively slow back then and performed exponentially worse on larger data sets. If you had a larger data set, your best option was to replicate, partition, or shard the database onto multiple servers and then find a way to load-balance queries across them. It took some engineering innovation at the larger web companies to execute and demonstrate horizontal scalability at the database level and handle large volumes of data, but there were many ways to accomplish the task.
The main limitation of this approach to scalability is that it largely worked for static, slow-moving, low-velocity data. Any data that changed needed to be stored in a “master” database and then replicated to the partitions before it was available for all search activity. The higher the velocity, the greater the technical challenge of keeping databases in sync with the changes.
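Here is a toy sketch in Python of the partition-and-replicate pattern just described. The shard names are hypothetical; the comments mark exactly where replication lag bites high-velocity data:

```python
import hashlib
import random

# Two hypothetical shards, each with a write primary and read replicas.
SHARDS = [
    {"primary": "db-shard0", "replicas": ["db-shard0-r1", "db-shard0-r2"]},
    {"primary": "db-shard1", "replicas": ["db-shard1-r1", "db-shard1-r2"]},
]

def shard_for(key):
    """Route a record to a shard by hashing its key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def write(key, record):
    shard = shard_for(key)
    # Writes go to the "master" primary; replicas catch up asynchronously,
    # which is the sync problem for fast-changing data.
    print(f"write {key} -> {shard['primary']}")

def read(key):
    shard = shard_for(key)
    # Reads load-balance across replicas and may see stale data
    # until replication completes.
    print(f"read {key} <- {random.choice(shard['replicas'])}")

write("user:42", {"name": "Ada"})
read("user:42")
```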
The approach was most challenging in at least two important use cases. First, companies doing analytics with enterprise BI platforms wanted to do higher levels of computation on data sets that needed updating more frequently. Beyond just handling higher volumes and velocities of data, the computations required were often more than BI platforms or the underlying databases could handle. If a computation was too complex for either of these technologies, engineers often elected to compute it in batch as part of the data integration pipeline. This gave engineers more options on how to scale the computations without affecting end user performance, but it meant that users had to wait for data refreshes when batch processing took time to complete.
The second place where database partitioning became challenging was when online social networks emerged and e-commerce sites required higher levels of personalization. Social networks and e-commerce sites collect and store lots of user information that needs to be leveraged in near real time to produce a personalized experience for both the end user and their “friends” in the network. The engineering challenge is significant because social databases perform much more write activity, real-time processing, and far more complex queries to implement personalization.
So it’s no surprise that the emergence of Big Data technologies came from Google and Yahoo, since both companies faced the challenges in the early 2000s of handling both the volume and the velocity of these workloads. Hadoop and MapReduce technologies emerged as the answer, and their innovation was largely about simplifying the work required to parallel-process queries across thousands of computing nodes. This, of course, was made possible by the emergence of Cloud computing and other methods to virtualize servers. Developers could then dial up all the computing resources they needed and run queries on Hadoop without having to understand the communication, storage, and synchronization complexities behind parallel computing.
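Here is a minimal illustration of the MapReduce programming model, the classic word count, written in plain Python rather than Hadoop’s actual API. It shows the map, shuffle, and reduce phases that Hadoop runs in parallel across nodes so developers don’t have to:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: each node emits (key, value) pairs from its slice of the data."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped):
    """Shuffle: group every emitted value by key across all nodes."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: each node aggregates the values for its share of the keys."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data is big", "data moves fast"]
counts = reduce_phase(shuffle(map(map_phase, documents)))
print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```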
Hadoop caught on, both in media coverage and in results. The media loved the Big Data concept and showcased examples of companies providing innovative solutions. This success led to demand from more companies looking to gain some advantage from these technologies and grow beyond their conservative RDBMS data architectures. This demand enabled the emergence of other new technologies and vendors in the “Big Data Landscape.”3
Like any emerging technology experiencing significant demand, there was a significant talent gap, especially in the late 2000s, in people who could help enterprises and businesses take advantage of Hadoop. A whole class of companies, products, and services emerged to help businesses manage Hadoop infrastructure, map business problems to Hadoop implementations, or further simplify Hadoop’s programming environment.
In addition, once entrepreneurs and data scientists recognized that there was a market beyond traditional RDBMS, additional Big Data platforms emerged with capabilities to match different workloads. Columnar databases, in-memory data stores, data stream processing, graph databases, and other engines emerged to solve different query, performance, and scalability challenges.
Beyond that, an entire ecosystem of analytics, application, and other data service products emerged that either embed or extend Big Data technologies. Data integration technologies were extended to move data into and out of these platforms. Content management technologies interfaced with document stores. Analytic and BI platforms were upgraded to handle Big Data platforms as back-end database technologies. Capabilities like machine learning and artificial intelligence became more mainstream. Some enterprise systems moved off their classic RDBMS back ends to more powerful and faster columnar databases.
Fast-forward to today, and the Big Data Landscape4 is huge, consisting of 600-plus products and services. Today, there is the added challenge of not only implementing the technology and demonstrating competitive business value but also of selecting the most appropriate technologies up front.
Developers tend to be passionate about the platforms that they are most knowledgeable about and have had the most success with. Fifteen years ago, when there were fewer mainstream platforms, there were significant debates on Java versus .Net versus ColdFusion or MySQL versus Oracle versus Microsoft. It was rare to find someone who was technically proficient and objective about these platforms, and in most situations, one could substitute one platform for the other, leverage its strengths, and compensate for its weaknesses. The scope for selecting a platform was largely related to application development, maintenance, scalability, and performance.
I maintain that databases today are a different and changed domain. Their importance far exceeds internal and external application development needs: Database platforms are the foundation for most corporations’ Big Data, analytics, and data-driven practices. The schemas developed will likely be connected to other databases and leveraged for many needs beyond their original purposes. They will most definitely be maintained and extended by developers that didn’t author them, and ideally they will be interfaced by business users leveraging self-service tools to perform analytics.
Figure 3-2 is a simple visual that may help steer you in the right direction. In thinking about this transformation, here are some simple questions and guidelines on where to focus efforts (a short sketch following the list distills them):
Figure 3-2. Optimizing Big Data platforms based on data requirements and priorities
How big is your data? The bigger the data, the more likely you will need to look at infrastructure to store, process, and manage larger data sets. You’re more likely to select mature database technologies backed by service providers that can help you scale and manage them properly in either public or private Clouds.
How fast is the data changing? If your business derives value from presenting results faster for direct revenue, real-time decision making, or competitive advantage, you will likely need algorithms that churn data directly to drive other systems or decisions. If the intelligence required primarily looks at a lot of recently created data, then you’re more likely to be looking at data streaming databases.
How complex is your data? Complexity comes in many forms. It may be unstructured data, data that has rich relational metadata requiring subject matter expertise, or data that’s sparse and has other quality issues. If it’s unstructured, then you might want to review document stores for this data. If it’s semistructured or sparse data, then a document store might still work, but a columnar database may be a more versatile option. If you have a taxonomy or metadata and you have to store and process more information about the relationships between elements, then you might want to look at graph databases.
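As promised above, here is a short sketch in Python distilling these questions. The thresholds, labels, and the function name candidate_platforms are illustrative assumptions, a starting point for discussion rather than a formal selection methodology:

```python
def candidate_platforms(volume_tb, velocity, structure):
    """Suggest platform families from rough data traits.

    volume_tb: approximate data size in terabytes
    velocity:  'batch' or 'streaming'
    structure: 'structured', 'semistructured', 'unstructured', or 'graph'
    """
    candidates = []
    if velocity == "streaming":
        candidates.append("data stream processing")
    if structure == "unstructured":
        candidates.append("document store")
    elif structure == "semistructured":
        candidates += ["document store", "columnar database"]
    elif structure == "graph":
        candidates.append("graph database")
    else:
        candidates.append("relational/SQL database")
    if volume_tb > 100:  # an arbitrary cutoff for "big"
        candidates.append("parallel-processing engine (e.g., Hadoop)")
    return candidates

print(candidate_platforms(volume_tb=250, velocity="streaming",
                          structure="semistructured"))
```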
But most organizations’ data won’t fall neatly into one bucket, and you’re more likely to have several use cases. In addition, you should experiment with applying these technologies to see what works best with your types of data, applications, and analytics.
To determine the appropriate technologies, many larger organizations either developed Big Data labs or partnered with service providers to leverage their labs. The thinking behind the labs is to establish the infrastructure to run experiments, centralize the engineering expertise, and let business units leverage the lab to perform proofs of concept. The idea is to prove out both the business case and the underlying technologies.
The other extreme is startups, where the CTO makes some quick decisions on the best platforms for the underlying business model and quickly builds up the environment. Startups that select optimal or near-optimal database platforms may achieve an advantage over their competitors, and this choice is ultimately one of the success factors for these businesses.
This leaves a very fat middle of companies that must make some hard choices. These companies may not have the funding or skills to develop labs that lack near-term ROI. They may also not have the ability to attract superstar talent that can come into the organization, work within its cultural boundaries to experiment, select the appropriate technologies, and demonstrate competitive value within an ambitious time frame. These companies are more likely to be laggards when it comes to implementing new Big Data technologies.
Which path is more likely to be the wrong decision? Remember that digital transformation in an industry happens fast. Once the transformation is in progress, you might have as little as ten years to transform into a competitive digital business. If processing larger amounts of complex data is required to be competitive, and it more than likely is, the business needs to make some bets on new technologies and accept that “learned failures” are a better option than standing on the sidelines.
If midtier companies can’t easily fund labs or attract top talent, how can they best select and experiment with Big Data technologies? They should do several things that are within reach.
First, you need to free up the top talent in your organization so that they can work on emerging opportunities and experiment with new technologies. As part of this program, you’re going to have to give them significant training and time to prototype so that they become better versed in new technologies and how they operate. You also must consider how to reward and retain them, because the last thing you want is to see them jump ship once they have training in these sought-after technologies.
Second, sign up with research partners that can provide both out-of-the-box research on the underlying technologies and expert advice. You’re not going to make the organization smarter by just “doing it yourself,” and it’s often faster to buy basic knowledge than to build it up internally.
Third, tackle the culture issues up front that may become obstacles to investment and experimentation. It is likely that you have people in the organization who resist change or prefer the status quo and do not want to get involved in the risks of new technologies and practices. The best way to manage this is to identify new opportunities on the edge of your corporation’s core competencies. Try to find use cases that don’t disrupt existing business or challenge the status quo directly and use them to demonstrate new possibilities and hopefully quick wins.
Next, get some outside help that can bring in appropriate experience. You might not be able to fund a lab, but you’re most likely going to need outside help to either select, experiment, or implement these new technologies. Even the act of shopping for a partner will help develop points of view on the appropriate technologies. Today, there are many options for midtier companies, including smaller service providers and boutique or industry-specific firms that might make suitable partners.
Finally, selectively hire some new talent. You might not be able to afford or attract top talent, but you can certainly find individuals who have expertise and ideally success with Big Data technologies. Find someone who’s ready for a bigger challenge and ensure that he or she is a good cultural fit for the organization, but make sure you give the new addition enough time and room to be successful.
Supporting Big Data technologies is a lot more complex than just having developers successfully using them and the infrastructure in place to support the workload. Think of all the operational responsibilities that come with existing relational databases, including disaster recovery, data backups, archiving, performance tuning, scaling, and security. These operational considerations will be needed for every Big Data platform selected and used in production. What’s worse is that the tools to perform these operations are often far more primitive than what’s available in the far more mature RDBMS platforms, and the expertise is harder to attain. Managing the operations related to Big Data platforms can be time-consuming, especially in the early years, and it’s not trivial getting existing DBAs skilled in relational databases to learn the new technologies and support them on top of their existing responsibilities.
So pick your technologies carefully, as most organizations will not be able to support multiple platforms. That means you need a group thinking about the big strategic picture beyond what works in the lab today or where developers have had some success completing proofs of concept. You need to cherry-pick platforms and partners that have long-term viability, as well as proven business, development, and operational capabilities to back up your selection. Finally, you should think strategically about a roadmap of new use cases to make sure that Big Data platforms become strategic and provide sufficient business value so that the operational considerations and costs are justified.
The variety, disparity, and volume of platforms can be overwhelming, and it is easy for IT organizations to get lost reviewing options or get sidetracked by going too deeply into any single platform. It’s easy for technologists to debate platforms endlessly or get stuck on whether to support existing technologies or select new ones. Once platforms are in place, there may be debates over which ones to leverage to solve different challenges. How do you avoid getting lost in these debates?
Develop a small team that will begin proposing a reference architecture. This should be a simple communication tool that has a couple of components to convey to your team the past, current state, and future direction of technology.
The reference architecture should start with one or more conceptual diagrams of the system architecture. I like color-coding components showing “legacy,” meaning technologies that you’d like to phase out over time; “current,” showing technologies that are accepted in the reference architecture and will continue to be utilized in new investments; and “future,” showing technologies that you plan to add into the ecosystem.
This diagram should be conceptual and stay high level so that it can be shown to any business leader who cares about the technology architecture. Figure 3-3 shows an example for a “self-service BI” platform. It includes (A) replacing a legacy BI tool with a modernized tool that enables business users to create their own analytics, (B) connecting the BI tool directly with existing enterprise systems, and (C) instituting a new technology for integrating other data sources. The diagram shows what is legacy, such as spreadsheets that will be replaced with dashboards; what’s existing, such as the CRM and ERP systems; and new systems and data, like the self-service BI tool and social data feeds that are being introduced.
Figure 3-3. Reference architecture on modernizing data integration and analytics technologies
The second diagram should be a high-level table documenting some basic information on the current and legacy technology components. At minimum, I like stating the name, the version, key applications or processes where it is utilized, main capabilities used, future capabilities that will be explored, and, if known, when the next major upgrade is scheduled. For legacy technologies, this might highlight timing and high-level plans to migrate. You can use an ITIL configuration item standard5 for cataloging existing assets or, better yet, leverage a configuration management tool. One note: these standards and tools are designed for IT and need a simplified presentation when shared with business leaders.
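As a sketch of what one row in this table might capture, here is an illustrative catalog entry in Python. The field names and values are assumptions for illustration, not an ITIL configuration item standard:

```python
# One illustrative row of the current/legacy component table.
component = {
    "name": "self-service BI tool",
    "version": "10.2",
    "status": "current",             # legacy | current | future
    "key_applications": ["sales dashboards", "marketing analytics"],
    "capabilities_used": ["data visualization", "self-service reports"],
    "capabilities_to_explore": ["embedded analytics"],
    "next_major_upgrade": "Q3",      # if known
}
```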
I like to have a separate table for future technologies documenting what problems they will solve, the vendors being considered, and the status of the selection process.
Finally, I like showing a roadmap in years, quarters, or months illustrating what technologies are being shut down, what new ones are being enabled, and what technologies are being upgraded. Figure 3-4 shows a more detailed roadmap illustrating the introduction of the two new platforms and the shutdown of two others shown in Figure 3-3.
Figure 3-4. Example technology roadmap detailed with major transitions and milestones
The roadmaps you present should fit on one page and should have enough detail for the audience. If the one shown in Figure 3-4 is too detailed for an executive audience, you can show a rolled-up version like the one in Figure 3-5.
Figure 3-5. High-level technology roadmap
When developing roadmaps, I recommend using a tool rather than spreadsheets or presentations. Ideally, the roadmap should be pulled directly from your agile or portfolio management tools. Even when presenting to executives, I think it’s important to acclimate them to presentation materials extracted from tools rather than investing the time to hand-build diagrams.
Some purists will argue that displaying long-term roadmaps is “not agile.” Keep in mind that the near-term roadmaps should be completed using the agile estimation methods described earlier, but it’s still important to include longer-term forecasted milestones. The full roadmap is a communication tool and should be used to convey the full vision.
The reference architecture should help tell the story of what digital technologies will become strategic, what will be phased out over time, and what will likely be added depending on whether there is sufficient business need. It is a picture to align the organization starting with the IT team and extending to the greater organization. A technical roadmap provides clarity on how to leverage existing platforms and capabilities with “big picture” knowledge of the technical landscape and what is likely to get implemented into the future.
The CIO should update this document with some regularity, showing what’s been done, what’s in progress, what’s in near-term planning, and what’s changed. The latter is important because today’s technology and competitive landscape changes quickly, and future plans should adapt to these conditions.
Developing the reference architecture is a great opportunity to engage the top architects in the IT group. Most in the group inherited technology and architecture decisions made by their predecessors and must manage these legacy decisions and systems. Planning the architecture roadmap is key to aligning and motivating an IT organization that will always prefer looking at “blue-sky” technology opportunities over today’s operational reality. It’s a key exercise for the CIO and IT leaders to change the IT culture.
What I have described in the last two chapters is the foundation of the new IT Organization. New practices like agile and DevOps along with a modern view of architecture will lead technology teams to deliver on transformational business practices and capabilities.
But practices and technologies will not get you there without considering the people and the culture. Culture is what brings people together to collaborate and solve problems without creating too much stress. It’s what gets people motivated to work harder when there is a business need to go the extra mile. It’s what protects the IT organization when it is facing execution issues and unhappy stakeholders. It creates safety when individuals need confidence that they can ask for help and get it without retribution. It celebrates small and big wins. Transformation is a journey, and culture is what makes the ride worthwhile, fun, challenging, and rewarding.
Great IT culture is personified by the perks and drive portrayed in startups and successful Internet companies. These companies go out of their way to make technologists happy, excited, intellectually stimulated, rewarded, and competitive but not stressed. They design open workspaces and encourage open dialogue, collaboration, mental breaks, and fun. Some provide meals and other perks to make it easier for employees to spend more time in the office solving problems.
While these steps don’t by themselves create a strong culture, they certainly make it easier for IT organizations and their employees to establish one.
Unfortunately, the environment and perks seen in these companies are not the norm and not the heritage of many companies. Many IT organizations reside in company cultures that are hard on teams viewed as either cost centers or service organizations to the primary revenue-generating departments.
IT puts in long hours to get things done. They are beaten up for the 0.1% uptime they miss, for the 5% of scope that fails to meet a minority of stakeholder expectations, for the poor user experience from one third-party application in an edge use case for a subset of users, by users who are unhappy with a technology selection and would prefer selecting their own, by the CFO whenever she needs to find a budget reduction, by legal when the data provided by IT for discovery takes too long to produce and doesn’t tell the story they need to prove, by sales for anything that is perceived as blocking them from getting their bonus, or by the CEO for not having the latest innovation he read about on an airplane. I could go on.
As I was writing this section, I received a text from a colleague CIO struggling, like other CIOs, with cultural issues:
My issue and why I have been pushed so close [to burnout] is that my industry as a whole is still coming to terms w technology and my Mgmt doesn’t care at all about my group, or to spend money on me and openly communicates that to others in the same breath they openly communicate all the IT things they want or are broken or lacking . . . the spiral is destructive for me and I have never once been told I have done a good job at anything, not that I need a pat on the back or recognition I’m not in it for that but a little appreciation for the 80 hour work weeks for over a decade would be nice lol.
To which I replied:
I face the same issue. Many orgs don’t know how to treat people. Period. It’s worse when you are considered a cost center or services org. But the stress is the conflict created when you’re challenging that assumption by having a stake and contribution in success or growth. Most orgs, leaders, and people aren’t ready to treat a greater part of the org equally.
This divide can be a significant issue for CIOs charged with transformation. You can’t easily plow through years of transformational activities if the organization throws you under the bus for small misses. Negative behavior completely undermines the change management practices required to make transformation successful. How do you go to business users and ask them to do something better, more valuable, faster, but very different from what was done in the past, when the CIO and her staff have to defend their decisions, practices, and execution at every step?
My colleague points out that it’s a downward spiral. The more value the IT organization is required to contribute to help the business grow, be more competitive, or transform, the more likely they are simultaneously creating a target on their back. Change-resistant individuals will take their opportunity to fire at this target, and that forces the CIO to defend. The issue is particularly challenging at the top, where many leaders who have spent their careers getting to that seat don’t want to share the spotlight or the rewards when they are available.
The added challenge comes when this stress percolates down to the IT staff. It’s fair to say that CIOs must develop a “thick skin” to handle cultural, change management, and transformational challenges. That is not likely to be the case for the CIO’s lieutenants or the staff. The CIO is charged with getting the IT organization to ramp up execution, partner with the business, deliver innovation, and sponsor transformation. The CIO can make this successful in year one, when expectations are low, challenges are fewer, and IT is simply asked to step up and deliver. But as time progresses, negative culture can impede progress, demoralize staff, increase stress, and likely lead to turnover.
While these are all company and business cultural challenges, I have found the best way to counter them is by working on the culture within IT. The CIO doesn’t command a large enough sword to enact top-down cultural changes in one swing. That will take time, require demonstrating business success from transformational programs, and need investment in relationship building to gain supporters. But the CIO and IT leaders can drive a lot of change within the technology department, and that can often influence broader changes.
Your team’s culture is unique. It has developed over time based on the nature of the business, the industry and market conditions you compete in, the geographies and locale of where you operate, the mission of the IT team and what is expected of them, how it hires and rewards individuals, and many other factors.
But I like to think of culture in terms of what is important for successful partnership, collaboration, and execution. If the team is aligned around similar values and principles, it develops a culture around them. Some of the values I think are important are tied to key practices like agile and DevOps, where the main cultural challenges are how teams handle issues that cross between operations and development. Some cultural elements should be tied to business expectations, like the team’s ability to interact with business leaders (business acumen) and the team’s responsiveness to issues and requests. Finally, some should be aspirational, like the ability to deliver innovative solutions or to develop appropriate KPIs, metrics, and dashboards in order to be data driven. Figure 3-6 depicts some of the values I attempt to instill in the IT organization.
Once you have some principles established, here are some behaviors I look for that are signs of a strong IT culture:
1. Agile teams get things done first, worry about process mechanics later. When introducing agile practices, or even when agile has been used for some time and the team is looking to mature its practice, I find that strong teams will think agile, focus on execution first, and address the mechanics of the process second. For example, a new agile team should get its backlog going first and commit, and worry about story points and how to handle unfinished stories at the end of a sprint later. A mature agile organization, with multiple teams prioritizing stories, figures out the communication mechanics between teams during the sprint and formalizes communication practices later.
Figure 3-6. IT culture and the “digital” mindset that drives collaboration and results
2. Agile teams know how to use the business’s products so that they can see where and how to make improvements. User interface, workflow, and access to insightful, actionable data are so important to successful customer-facing application design and business system workflows that the technologists working on them should step into their users’ shoes and experience the technology for themselves.
3. Individuals are hungry to learn more and take the initiative to train themselves. Sure, I can get individuals and a team formal training, but that’s not where it starts. A strong IT team prefers rolling up its sleeves, experimenting first, and asking for training once they know the basics.
4. It’s not always the business’s fault. Agile teams might blame the product owner for overpromising, and everyone has something critical to say about the business strategy, but strong IT teams will think through how to improve their own practices before blaming or being critical of other business functions.
5. Speaking openly about where they suck and need to improve is important. I want to hear about technical debt from software architects. Operations teams need to look at metrics and decide where improvements can have business impact. Quality assurance teams need to own up when their testing is inadequate. Most importantly, teams need to speak up when they recognize a problem because a problem well stated is half solved.
6. Agile teams share information on how things work, document process, and educate colleagues on how to fix things. IT teams that hoard information, overcomplicate things so that business users don’t understand how they work, or make it impossible for their colleagues to enhance their technical implementations are impeding organizational growth and scalability.
7. Agile teams want to get involved in understanding business priorities and challenges. They ask the product owner questions on the priorities or want to understand how their efforts are contributing to growth or efficiencies. They’ll seek out multiple solutions to a problem and debate which ones provide optimal solutions. They’ll listen to marching orders but seek to participate in the debate on what and how the business should move an agenda forward. They are strategic agile thinkers and learn how to innovate.
8. Agile teams leverage data and ask questions. To become smarter about their own operations, they will collect meaningful data, convert it to metrics, look for trends, and prioritize improvements. They will learn to ask good questions about data and take other steps to help the organization become data driven.
9. There is an interest in spending time together outside of the office and having fun. One of my teams threw summer family picnics and assembled everyone for potluck Diwali celebrations. Good teams want to spend time together, celebrate wins, appreciate individual interests, and be human.
10. Agile teams are responsive when there are operational issues. It’s one thing to have all the systems monitored, ITIL practices to handle incidents, playbooks to recover from an issue, and strong technical skills to research root causes. These are all things strong operations teams work toward, but the best ones also develop a good bedside manner, are responsive to even the most difficult users, find ways to communicate using language everyone understands, and make sure business teams receive regular updates when there are problems.
11. Great IT teams refuse to fail, are willing to take on challenges, and find ways to make their initiatives succeed. They are relentless about hitting deadlines. They seek to keep things simple. They promote enthusiasm and heal downers and doubters.
What practices should transformational leaders focus on to improve IT culture and address bad behaviors? Here are three:
1. Drive collaboration and communication behaviors. Leaders should ensure teams are working well with their business colleagues on opportunities, solutions, and innovation. To eradicate the dividing line between “requirements” and “implementation,” the number of people in the technology team who need to be able to communicate and collaborate with stakeholders is significantly larger than in decades past. Improving communication is an easy starting point to get developers, system administrators, engineers, and business analysts better prepared to collaborate. Here are some negative behaviors that transformational leaders need to course-correct:
Mentally says no, or actually says no, to an idea or request without listening to the business need and priority
Communicates in technical jargon or overly complicates explanations back to a business user when responding to a question
Creates blockers and waits for requirements without engaging the stakeholder or proactively proposing opportunities, problems that need to be addressed, and solutions
Fails to regularly communicate status when resolving a critical issue
2. Evangelize actions that drive efficiency. It isn’t going to be easy to justify investments to fix manual work or to dedicate application development releases to fixing bad software. Both issues need to be avoided before they are created and then fixed incrementally over time. That’s why addressing behaviors like the following is important:
Fails to automate tasks and leaves significant manual steps
Creates technical debt but does not itemize it in the backlog to be fixed
Doesn’t provide documentation, training, or other knowledge transfer to ensure that colleagues can support the technologies he or she developed
3. Ensure technology decisions are data and quality driven. IT is the hub of so much operational data, yet it often lacks the discipline to review this data, develop metrics, and leverage them in decisions on priorities, problems, and solutions. If IT isn’t using its own data, then how can it be part of a data-driven organization? In addition, if IT isn’t paying attention to quality or security, then how can it deliver reliable and safe solutions? These behaviors underscore issues that need to be course-corrected:
Chases rabbits when trying to diagnose performance issues without reviewing logs and metrics to guide efforts to areas of higher importance
Overengineers solutions and inflates estimates
Treats quality assurance, testing, and security as an afterthought and doesn’t bake these paradigms into designs
Doesn’t implement unit tests, performance evaluations, or other test-driven practices to ensure that code is reliable and functioning properly
Are you ready to lead a cultural transformation?