Investigate Transient Performance Anomalies
This recommendation is based on the observation of short-lived anomalies in the performance of your app. Often these anomalies are benign and caused by expected events, such as a database backup or a code deployment. However, they may also reflect an instability in the app that could ripple to other parts of the system and that should be addressed with software or deployment changes. We recommend investigating these episodes to understand their cause and impact. Consider defining policies to receive alerts on similar incidents in the future.
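The detection method behind this recommendation is not described here. As a rough illustration only, the Python sketch below flags short runs of points that deviate strongly from a trailing baseline. The function name `find_transient_anomalies` and its parameters `window` (baseline length), `threshold` (a z-score cutoff), and `max_len` (the longest run still counted as short-lived) are hypothetical assumptions, not the product's actual algorithm or settings.

```python
from statistics import mean, stdev


def find_transient_anomalies(values, window=20, threshold=3.0, max_len=5):
    """Return inclusive (start, end) index pairs of short anomalous runs.

    Hypothetical sketch: the baseline is the last `window` points judged
    normal, and a point is anomalous when its z-score exceeds `threshold`.
    Runs longer than `max_len` are treated as long-lived changes, and a run
    still open at the end of the data is not reported.
    """
    normal = list(values[:window])  # assume the first `window` points are normal
    episodes = []
    run_start = None
    for i in range(window, len(values)):
        baseline = normal[-window:]
        mu, sd = mean(baseline), stdev(baseline)
        x = values[i]
        if sd == 0:
            anomalous = x != mu  # flat baseline: any deviation stands out
        else:
            anomalous = abs(x - mu) / sd > threshold
        if anomalous:
            if run_start is None:
                run_start = i
        else:
            normal.append(x)  # only normal points update the baseline
            if run_start is not None:
                if i - run_start <= max_len:
                    episodes.append((run_start, i - 1))
                run_start = None
    return episodes


# A steady signal with a three-sample spike at indices 30-32.
signal = [10 + i % 2 for i in range(60)]
signal[30:33] = [50, 50, 50]
print(find_transient_anomalies(signal))  # [(30, 32)]
```

Because anomalous points never enter the baseline, a brief spike does not inflate it. A deviation that never ends, such as a permanent level shift, is not reported here, since it is a long-lived change rather than a transient one.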
Understand Long-lived Performance Changes
This recommendation is based on the observation of long-lived changes in the performance of your app. These persistent deviations from historical behavior may be an expected result of changes in software or infrastructure, or they may be unexpected side effects of such changes. They may also be caused by changes external to the system, such as increased latency in a third-party service. We recommend that you investigate these changes and address any issues that could propagate to other parts of the system. Also, remember to update any policies that you have configured for these metrics.
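As an illustrative sketch, assuming a simple comparison of windows, a long-lived change can be approximated by comparing the mean of a recent window with the mean of a historical window. The function `detect_level_shift` and its 20% `rel_threshold` are hypothetical choices for this example; the product's real detection logic and thresholds are not documented here.

```python
from statistics import mean


def detect_level_shift(history, recent, rel_threshold=0.2):
    """Return the relative change of `recent` versus `history`, or None.

    Hypothetical sketch: a change counts as a shift only when the recent
    mean differs from the historical mean by more than `rel_threshold`
    (0.2 means 20%).
    """
    base = mean(history)
    if base == 0:
        raise ValueError("historical mean is zero; relative change is undefined")
    change = (mean(recent) - base) / base
    return change if abs(change) > rel_threshold else None


# Example: a latency metric that moved from about 100 ms to about 130 ms.
print(detect_level_shift([100] * 50, [130] * 50))
```

Comparing whole windows rather than single points is what separates a sustained change from a momentary spike; both window lengths would need tuning for the metric in question.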
Reconfigure Unbalanced Clusters
This recommendation is based on the observation of variation in behavior across the members of a cluster. Because members of a cluster are expected to behave uniformly, this variation can indicate a problem in the cluster. We recommend checking that your clusters are configured properly. Confirm that all members have the same software (and the same versions) installed, run the same services, and run on equivalent resources; that load is balanced evenly across them; and that no member is running out of resources.
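One way to picture the cluster check, assuming each member reports a comparable metric (for example, CPU utilization), is to compare every member with the cluster median. The function `unbalanced_members`, the member names, and the 25% `tolerance` below are hypothetical; this is a sketch, not the product's actual method.

```python
from statistics import median


def unbalanced_members(metric_by_member, tolerance=0.25):
    """Return the sorted names of members that deviate from the cluster median.

    Hypothetical sketch: a member is flagged when its metric differs from the
    median by more than `tolerance` as a fraction of the median.
    """
    med = median(metric_by_member.values())
    if med == 0:
        # With a zero median, any nonzero member is an outlier.
        return sorted(m for m, v in metric_by_member.items() if v != 0)
    return sorted(
        m for m, v in metric_by_member.items() if abs(v - med) / med > tolerance
    )


cpu_percent = {"web-1": 40, "web-2": 42, "web-3": 38, "web-4": 85}
print(unbalanced_members(cpu_percent))  # ['web-4']
```

The median is used rather than the mean so that a single outlying member does not pull the reference value toward itself.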
Replace Contended Instances
This recommendation is based on the observation that some instances are exhibiting signs of contention with other instances on the same physical host. Consider replacing these contended instances. Newly deployed instances may be placed on different physical hardware that exhibits less contention and therefore provides more consistent performance.
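The specific signs of contention used for this recommendation are not listed here. One widely used signal on virtualized Linux hosts is CPU steal time: time during which a virtual CPU was ready to run but the hypervisor was serving another guest. Assuming steal time as the signal, the sketch below computes the steal percentage between two snapshots of the aggregate `cpu` line in `/proc/stat`. The function names `parse_cpu_line` and `steal_percent` are hypothetical.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of Linux /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(f) for f in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal column")
    return counters


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two snapshots.

    Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    guest and guest_nice are already included in user and nice, so only the
    first eight columns are summed to avoid double counting.
    """
    delta = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(delta)
    return 100.0 * delta[7] / total if total > 0 else 0.0


t0 = parse_cpu_line("cpu 1000 0 500 8000 0 0 0 500 0 0")
t1 = parse_cpu_line("cpu 1600 0 800 8900 0 0 0 700 0 0")
print(steal_percent(t0, t1))  # 10.0
```

A persistently high steal percentage suggests that neighbors on the same host are competing for CPU, which is the kind of contention that replacing the instance may relieve.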
Upsize Busy Instances
This recommendation is based on the observation that certain instances are fully utilized much of the time. We sample CPU utilization periodically and compute the percentage of samples that show the instance as busy. We consider an instance "busy" or "fully utilized" when the system reports that its CPU is idle only a tiny fraction of the time. We may also observe that some instances show signs of being throttled, which indicates that the maximum CPU available to them may be limited. Consider redeploying these instances on larger hardware.
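The sampling scheme described above can be sketched as follows. The cutoff for "idle only a tiny fraction of the time" (`idle_threshold`, 5% here) and the share of busy samples that counts as "much of the time" (`min_busy_fraction`, 50% here) are hypothetical values chosen for illustration; the actual thresholds are not stated in this document, and throttling detection is not modeled.

```python
def busy_fraction(idle_percent_samples, idle_threshold=5.0):
    """Fraction of CPU samples in which the instance counts as busy.

    A sample is busy when the reported idle percentage is below
    `idle_threshold` (a hypothetical cutoff).
    """
    if not idle_percent_samples:
        raise ValueError("no samples")
    busy = sum(1 for idle in idle_percent_samples if idle < idle_threshold)
    return busy / len(idle_percent_samples)


def is_busy_instance(idle_percent_samples, idle_threshold=5.0, min_busy_fraction=0.5):
    """True when at least `min_busy_fraction` of samples are busy (hypothetical rule)."""
    return busy_fraction(idle_percent_samples, idle_threshold) >= min_busy_fraction


samples = [1.0, 2.5, 40.0, 0.5, 3.0, 60.0, 4.9, 2.0]  # CPU idle % per sample
print(busy_fraction(samples))     # 0.75
print(is_busy_instance(samples))  # True
```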
Prevent Resource Exhaustion
This recommendation is based on our prediction that, at current usage trends, some of your resources will soon be exhausted. The impact of resource exhaustion varies. Running out of memory can cause services to crash. Running out of disk space can render nodes unusable. Running out of disk space on a database can lead to catastrophic system failure. We recommend that you address these issues before they cause further problems.
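The prediction method is not specified here. Assuming a simple linear trend, the sketch below fits usage against time by ordinary least squares and extrapolates to the resource's capacity. The function `hours_until_exhaustion` and its units (time in hours, with usage and capacity in the same unit, such as GB) are illustrative assumptions.

```python
def hours_until_exhaustion(times_h, used, capacity):
    """Hours from the last sample until usage is projected to reach capacity.

    Hypothetical sketch: fits used = intercept + slope * t by ordinary least
    squares. Returns None when usage is flat or shrinking. Assumes `times_h`
    is sorted in ascending order.
    """
    n = len(times_h)
    if n < 2 or n != len(used):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_h) / n
    mean_u = sum(used) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times_h, used))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no growth, so no projected exhaustion
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - times_h[-1])


# Disk usage growing by 2 GB per hour on a 100 GB volume.
print(hours_until_exhaustion([0, 1, 2, 3, 4], [50, 52, 54, 56, 58], 100))  # 21.0
```

A linear fit is the simplest choice; real usage often grows in steps or daily cycles, so a forecast used for alerting would likely need a more robust model.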
Please reach out to us at firstname.lastname@example.org if you have questions or feedback about our insights feature.