Monitoring

The monolithic world has a few advantages of its own. Monitoring and logging is one of those areas where things are easier compared to microservices. The sheer number of microservices across which an enterprise system might be spread can be mind-boggling.

As discussed in Chapter 1, An Introduction to Microservices, in the Prerequisites for a microservice architecture section, an organization should be prepared for the profound change. The monitoring framework was one of the key requirements for this.

Unlike a monolithic architecture, monitoring is very much required from the very beginning in a microservice-based architecture. There is a wide range of reasons why monitoring can be categorized:

Health: We need to preemptively know when a service failure is imminent. Key parameters, such as CPU and memory utilization, along with other metadata, could be a precursor to either the impending failure or just a flaw in the service that needs to be fixed. Just imagine an insurance company's rate engine getting overloaded and going out of service or even performing slowly when a few hundred field executives try to share the cost with potential clients. Nobody likes to wait these days.
Availability: There might be a situation when the service may not perform extensive calculations, but the bare availability of the service itself might be crucial to the entire system. In such a scenario, I remember relying upon pings to listeners that would wait for a few minutes before shooting out emails to the system administrators. It worked for monoliths with one or two services to be monitored. However, with microservices, much more metadata comes into the picture.
Performance: For platforms receiving high footfall, such as banking and e-commerce, availability alone does not deliver the service required. Considering the number of people converging at their platforms in very short spans, ranging from a few minutes to even tens of seconds, performance is not a luxury anymore. You need to know how the system is responding by means of data, such as concurrent users being served, and compare that with the health parameters in the background. This might provide an e-commerce platform with the ability to decide whether upgrades are required before the upcoming holiday season. For more sales, you need to serve a higher number of people.
Security: In any system, you can plan resilience only up to a specific level. No matter how well designed a system is, there will be thresholds beyond which the system will falter, which can result in a domino effect. However, having a thoughtfully designed security system in place could easily avert DoS and SQL Injection attacks. This would really matter from system-to-system when dealing with microservices. So think ahead and think carefully when setting up trust levels between your microservices. The default strategy that I have seen people utilizing is securing the endpoints with microservices. However, covering this aspect increases your system's security and is worth spending some time on.
Auditing: Healthcare, financing, and banking are a few of the domains that have the strictest compliance standards concerning associated services. And it is pretty much the same the world over. Depending upon the kind of compliance you are dealing with, you might have a requirement to keep the data for a specific period of time as a record, keep the data in a specific format to be shared with regulatory authorities, or even sync with systems provided by the authority. Taxation systems could be another example here. With a distributed architecture, you don't want to risk losing the data recordset related to even a single transaction since that would amount to a compliance failure.
Troubleshooting system failures: This, I bet, will be a favorite for a long time to come of anybody who is getting started with microservices. I remember the initial days when I used to try troubleshooting a scenario involving two Windows services. I never thought of recommending a similar design again. But the times have changed and so has the technology.

When providing a service to other clients, monitoring becomes all the more important. In today's competitive world, an SLA would be part of any deal and has a cost associated with it, in the event of both success and failure. Ever wondered how easily we assumed that the Microsoft Azure SLA would stand true come what may? I have grown so used to it that the queries from clients worried about cloud resource availability are answered with a flat reply of 99.9 per cent uptime without even the blink of an eye.

So unless you can be confident of agreeing an SLA with your clients when providing a service, they can't count on it to promise the same SLA going forward. As a matter of fact, no SLA might mean that your services are probably not stable enough to provide one.