clarifying needs && decisions to make
introduction#
i’ve already talked about nagios && centreon before, && as far as my understanding went, i was confident talking about them
but then, someone asked me for help choosing an appropriate monitoring solution for his needs, && i wasn’t able to do so
because in my understanding, i had put aside the aspects to consider when deploying a monitoring solution: what makes people choose one solution over another
i wanted to come back to him fully understanding his needs, restrictions && wants, to provide an appropriate solution
this article aims to help you choose an appropriate monitoring solution, based on studying your needs, resources && infrastructure
why need monitoring?#
one of the most important points to clarify is: why do you need monitoring?
asking yourself this question will help you choose what type of monitoring will suit you the most
⚠️ do not confuse why you need to monitor w/ what you need to monitor
this part is the foundation of your monitoring implementation: you won’t need to change your monitoring solution in the future if you plan from the start your monitoring usecases && its purposes, now && in the future
what to monitor#
once you’ve clarified the reasons you need to do monitoring, now or in the future, it’s time to define its scope by clarifying what to monitor
this part seems the most obvious, as it is for smaller projects, but it’s always good to write down the scope of your actions
mainly, monitoring is used to keep an eye on services, hardware, uptime or whatever needs to be watched repeatedly && automatically
but taking the time to think about what you’d need to monitor, the possible special cases && the unnecessary parts, seems worthwhile to me
think about what kind of metrics you want to monitor, because it’ll define what kind of monitoring solution suits them
that can lead to more advanced usecases, not obvious from the start, like service discovery, logging connections, activities…
making clear what you want to monitor will also considerably narrow down the solutions that fit your needs
define the type of metrics you want to monitor, e.g. for website uptime, please don’t go w/ big solutions, a simple uptime kuma is more than enough…
that could also lead to using more than one solution, because none of them checks all your requirements: take time to think && write down, once, what you need or will need to monitor
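to give an idea of how small a website uptime check really is, here is a minimal sketch in python, using only the standard library. it is not how uptime kuma works internally, just an illustration of the scope: the name `is_up`, the 5 seconds timeout && the `opener` parameter are my own assumptions for the example

```python
import urllib.error
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """return True if the url answers w/ a 2xx or 3xx status before the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError, OSError):
        # dns failure, refused connection, timeout, http 4xx/5xx:
        # for an uptime check, all of these count as "down"
        return False
```

calling `is_up("https://example.com")` (a placeholder url) from a cron job already covers the "is my site answering" need. the `opener` parameter is only there so the check can be tried w/o touching the network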
monitoring != statistics#
an observation i’d like to make is the misconception that monitoring is statistics
they both deal w/ analysis && data, but it’s important to distinguish them as fields w/ different focuses && responsibilities
monitoring is to make sure that something is working as planned at a given time, statistics is to make sure that thing is evolving well
you can do data analytics over monitored metrics, but not the other way around, just as you don’t hire a data analyst to do monitoring stuff ._.
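the difference can be sketched w/ a few lines of python: the monitoring question looks at one value at a given time, the statistics question looks at how the values evolve. the function names, the 90% limit && the disk usage figures are all made up for the example

```python
def is_healthy(value, limit):
    """monitoring: is the thing within its limit right now?"""
    return value < limit


def slope(samples):
    """statistics: least-squares slope of evenly spaced samples (units per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# hypothetical disk usage in percent, one sample per day
disk_usage = [61.0, 63.0, 65.0, 67.0, 69.0]

print(is_healthy(disk_usage[-1], limit=90.0))  # True: nothing to alert on today
print(slope(disk_usage))                       # 2.0: the disk grows by 2 points per day
```

the first answer is what a monitoring solution alerts on, the second one is what you’d plan capacity w/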
targeted audience#
depending on the targeted audience of your monitoring, non-obvious restrictions could show up
in the case where you monitor multiple sites from various companies, you’d maybe need to give them access to their monitored metrics
i could have also named this part “the difference between the personal && the production use of monitoring”
if the targeted audience is you, you don’t need to make that much effort to understand what you’d want from your monitoring, i guess…
for production use, you’d have to consider more aspects that may lead you to more corporate solutions for availability, scalability, ease of data access, graphical interfaces…
for personal use, the criticality of your solution sometimes isn’t something to consider
support needed#
do you have time to maintain the integrity of your monitoring solution? especially if it’s big
do you need someone to talk to in order to fix bugs quickly, so you don’t have to go through forums || community chats to investigate - if a solution was ever pointed out there…
a serious deployment could also include a support subscription from the vendor to debug || troubleshoot its solution quickly - for production use
hidden cost#
the people maintaining the solution in your team can be a hidden cost: it takes their time to troubleshoot, debug updates, do maintenance…
sometimes they also need to train people on how to use or quickly troubleshoot it, write documentation etc.
for me, the biggest cost is the time taken by the people responsible for the solution: the time spent && the time that will be spent
other than people, hidden costs can show up later from the solution itself: ease of moving data “sh** i can’t dump the database…”, the storage space consumed…
i consider ongoing maintenance, potential scaling problems, risk analysis for serious production, repairability && flexibility as costs - the higher they are, the more complexity of use you pay as a hidden cost
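to make the weight of people’s time visible, here is a rough back-of-the-envelope sketch in python. every figure in it (hours, hourly rate, license price) is invented for the example, && the formula deliberately ignores storage, training && migration costs

```python
def yearly_cost(license_per_year, hours_per_month, hourly_rate):
    """rough yearly cost: what you pay the vendor plus what people's time costs."""
    people_cost = hours_per_month * 12 * hourly_rate
    return license_per_year + people_cost


# all figures below are invented for the example
free_but_hands_on = yearly_cost(license_per_year=0, hours_per_month=10, hourly_rate=50)
paid_but_low_touch = yearly_cost(license_per_year=3000, hours_per_month=2, hourly_rate=50)

print(free_but_hands_on)   # 6000
print(paid_but_low_touch)  # 4200
```

w/ these made-up numbers, the “free” solution ends up being the more expensive one, only because of the time spent on it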
overall budget#
do i really need to go into details for this one?…