Bytewax is implementing its first metrics collection system. Metrics will be collected from a variety of backend services, written in multiple languages and deployed in a Kubernetes cluster. These metrics will include information about API requests, internal metrics for backend services, and metrics about the pods deployed in our Kubernetes cluster.
In addition to internal metrics, we will be collecting metrics from user applications that we deploy into our Kubernetes cluster.
Internally, we will use metrics to monitor the health of our systems and to respond to incidents as they occur.
Externally, we would like to offer our customers the ability to see metrics about their deployed applications and to assess performance issues with their own code.
- What storage solutions are appropriate for the system described above? How would you compare them in terms of cost and complexity? What factors in this domain make a storage solution more or less appropriate for this application?
- How would metrics be published to the system described above? In what format should our metrics be? How would customer metrics be published to this system? How would a reporting system be able to answer questions about multiple hosts running the same service?
- How would we use these metrics as a foundation for an observability system? How would you design a monitoring system to alert us to issues in our infrastructure and our customer’s infrastructure?
When answering the questions above, keep the following considerations in mind:
- What factors are important when designing for this aspect of the system?
- What questions would you like to ask stakeholders to help you design a more appropriate solution?
- If there are multiple solutions that might be appropriate, how would you compare and contrast them?
- Are there specific technologies that we should consider for parts of the solution we build?
- If you suggest that we use a commercially available solution to this problem, please explain why, how that product meets our needs, and how we would approach integrating it into our infrastructure.