
Fault and performance monitoring of cloud environments is not straightforward, but done properly, it can provide critical information on what is not performing and why. Doing it "properly" involves monitoring each critical system in the cloud using simultaneous internal, external and appliance-like monitoring instances. Real-time comparative analysis of the collected data enables speedy notification of not only the type of fault or performance issue, but also its location within the cloud. Outside of fault conditions, it also provides ongoing, reportable assurance that performance and configuration specifications are being maintained.
The key to obtaining vital point-of-failure data is to monitor each critical component in the cloud environment. For example, consider the following relatively common entry-level cloud configuration:
Even with this simplest form of cloud topology, it's clear that the standard approach of monitoring only the "website" cannot differentiate between a failure in one or more of the servers involved, their network connections or the cloud's outgoing connections.
To monitor this environment, we monitor both web servers (A and B), the Load Balancer and the SQL Server. Remmon's comparative analysis of performance and returned data for the Load Balancer and Web Servers A and B enables potential capacity issues or failures to be identified before a user perceives an actual fault. It also provides advance warning of synchronisation issues between multi-VM backends (such as configuration or content differences between Web Server A and B).
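As a rough illustration of this kind of comparison, the following Python sketch takes one probe result per component and reports where a problem appears to sit. It is not Remmon's actual analysis: the `ProbeResult` type, the `compare` function, the target names and the 500 ms slowness threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    """One observation of one component (hypothetical structure)."""
    target: str
    ok: bool
    latency_ms: float
    content_hash: Optional[str] = None  # fingerprint of the returned data


def compare(lb: ProbeResult, web_a: ProbeResult, web_b: ProbeResult,
            sql: ProbeResult, slow_ms: float = 500.0) -> List[str]:
    """Compare per-component results and list likely problem locations."""
    findings: List[str] = []
    web = [web_a, web_b]
    down = [w for w in web if not w.ok]
    up = [w for w in web if w.ok]

    if not sql.ok:
        findings.append(f"{sql.target}: failing")
    for w in down:
        findings.append(f"{w.target}: failing")
    # Both backends answer but the balancer does not: the fault is in front of them.
    if not lb.ok and not down:
        findings.append(f"{lb.target}: failing while both web servers respond")
    # The site still looks fine from outside, but redundancy is gone.
    if lb.ok and len(down) == 1:
        findings.append("capacity risk: one web server is carrying all traffic")
    for w in up:
        if w.latency_ms > slow_ms:
            findings.append(f"{w.target}: slow ({w.latency_ms:.0f} ms)")
    # Healthy backends that return different data suggest a sync problem.
    if len(up) == 2 and web_a.content_hash != web_b.content_hash:
        findings.append("sync issue: Web Server A and B return different content")
    return findings
```

With a healthy Load Balancer, a healthy Web Server A and a failing Web Server B, `compare` reports both the failing server and the capacity risk, even though a check of the "website" alone would still pass.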
Our cloud monitoring solution incorporates data collected from four sources:
All Remmon monitoring instances support HTTP(S), SQL, ICMP (ping), DNS and TCP-level interactions with target systems. A modular probe framework means application-specific, multi-step or logic-based monitoring can be accommodated via extension.
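Remmon's probe framework is not described in detail here, so the following Python sketch only illustrates the general idea of a modular, extensible probe registry. The names `ProbeOutcome`, `PROBES`, `register` and `run_probe` are hypothetical; only the TCP check uses a real call, the standard library's `socket.create_connection`.

```python
import socket
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ProbeOutcome:
    ok: bool
    latency_ms: float
    detail: str = ""


# Registry mapping a probe kind (e.g. "tcp") to the function that implements it.
PROBES: Dict[str, Callable[..., ProbeOutcome]] = {}


def register(kind: str):
    """Decorator that adds a probe implementation to the registry."""
    def decorator(fn: Callable[..., ProbeOutcome]):
        PROBES[kind] = fn
        return fn
    return decorator


@register("tcp")
def tcp_probe(host: str, port: int, timeout: float = 3.0) -> ProbeOutcome:
    """TCP-level check: can a connection be opened within the timeout?"""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok, detail = True, "connected"
    except OSError as exc:  # refused, timed out, unresolvable host, ...
        ok, detail = False, str(exc)
    return ProbeOutcome(ok, (time.monotonic() - start) * 1000.0, detail)


def run_probe(kind: str, **params) -> ProbeOutcome:
    """Look up a probe by kind and run it; raises KeyError for unknown kinds."""
    return PROBES[kind](**params)
```

In this design, an application-specific or multi-step check is just another function decorated with `@register("some-kind")`, so new probe types can be added without changing the code that schedules and runs them.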
Comparison of the data collected gives Remmon the analytical power to narrow down points of failure or inadequate performance. While simpler monitoring tools might provide fault or failure warnings, Remmon's approach of monitoring everything, combined with continual comparison and analysis, provides a complete solution that can identify the actual point of failure in a complex multi-tiered cloud environment.
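To make the idea of comparing vantage points concrete, here is a minimal Python sketch that combines an internal and an external observation of the same target. The function name, the two-source simplification and the wording of the conclusions are assumptions for illustration, not Remmon's actual logic.

```python
def locate_by_vantage(internal_ok: bool, external_ok: bool) -> str:
    """Infer a likely fault location from two views of the same target."""
    if internal_ok and external_ok:
        return "healthy"
    if internal_ok:
        # Reachable from inside the cloud but not from outside.
        return "fault likely on the cloud's outgoing connections"
    if external_ok:
        # Reachable from outside but not from the internal instance.
        return "fault likely on the internal network path"
    return "fault likely in the target system itself"
```

Neither observation alone can separate a failed server from a failed connection; the location only emerges once the two are compared.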