# Prometheus Metrics
Nautobot supports optionally exposing native Prometheus metrics from the application. Prometheus is a popular time-series metrics platform used for monitoring.
## Configuration
Metrics are not exposed by default. Metric exposition can be toggled with the `METRICS_ENABLED` configuration setting; when enabled, metrics are exposed at the `/metrics` HTTP endpoint, e.g. `https://nautobot.local/metrics`.
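A minimal sketch of enabling this in `nautobot_config.py` (only the metrics-related line is shown; the rest of your configuration is assumed to be in place):

```python
# nautobot_config.py -- a minimal sketch; only the metrics-related setting is shown
METRICS_ENABLED = True
```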
In addition to the `METRICS_ENABLED` setting, database and/or caching metrics can also be enabled by changing the database engine and/or caching backends from `django.db.backends` / `django_redis.cache` to `django_prometheus.db.backends` / `django_prometheus.cache.backends.redis`:
```python
DATABASES = {
    "default": {
        # Other settings...
        "ENGINE": "django_prometheus.db.backends.postgresql",  # use "django_prometheus.db.backends.mysql" with MySQL
    }
}

# Other settings...

CACHES = {
    "default": {
        # Other settings...
        "BACKEND": "django_prometheus.cache.backends.redis.RedisCache",
    }
}
```
Added in version 2.2.1
If the `/metrics` endpoint is not performant or some of its metrics are not required, you can disable the metrics of specific apps with the `METRICS_DISABLED_APPS` configuration setting, as shown in the sketch below.
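A sketch of disabling metrics for individual apps (the app labels listed here are illustrative; substitute the labels of the apps you actually want to exclude):

```python
# nautobot_config.py -- a sketch; the app labels below are illustrative examples
METRICS_DISABLED_APPS = [
    "dcim",
    "ipam",
]
```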
For more information, see the django-prometheus docs.
## Authentication
Added in version 2.1.5
By default, viewing metrics does not require authentication. Authentication can be toggled with the `METRICS_AUTHENTICATION` configuration setting. If set to `True`, viewing metrics requires the user to be logged in or to use an API token. See REST API Authentication for more details on API authentication.
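A sketch of requiring authentication, again in `nautobot_config.py` (scrapers such as Telegraf will then need to send an API token, as in the sample below):

```python
# nautobot_config.py -- a sketch; clients must then authenticate, e.g. with an API token
METRICS_AUTHENTICATION = True
```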
### Sample Telegraf configuration
```toml
[[inputs.prometheus]]
  urls = ["http://localhost/metrics"]
  metric_version = 2
  http_headers = {"Authorization" = "Token 0123456789abcdef0123456789abcdef01234567"}
```
## Metric Types
Nautobot makes use of the django-prometheus library to export a number of different types of metrics, including:
- Per model insert, update, and delete counters
- Per view request counters
- Per view request latency histograms
- Request body size histograms
- Response body size histograms
- Response code counters
- Database connection, execution, and error counters
- Cache hit, miss, and invalidation counters
- Django middleware latency histograms
- Other Django related metadata metrics
Additionally, a number of metrics are specific to Nautobot:
| Name | Description | Type | Exposed By |
|---|---|---|---|
| `health_check_database_info` | Result of the last database health check | Gauge | Web Server |
| `health_check_redis_backend_info` | Result of the last Redis health check | Gauge | Web Server |
| `nautobot_app_metrics_processing_ms` | The time it took to collect custom app metrics from all installed apps | Gauge | Web Server |
| `nautobot_worker_started_jobs` | The number of jobs that were started | Counter | Worker |
| `nautobot_worker_finished_jobs` | The number of jobs that finished (includes a status label) | Counter | Worker |
| `nautobot_worker_exception_jobs` | The number of jobs that raised an exception (includes an exception type label) | Counter | Worker |
| `nautobot_worker_singleton_conflict` | The number of jobs that encountered a closed singleton lock | Counter | Worker |
Note
Due to the multitude of possible deployment scenarios (web server and worker co-hosted on the same machine or not, different possible entrypoint commands for each context), some metrics listed for one component may also be present on the other. It is up to the operator to account for this when working with the resulting metrics.
These metrics let you, for example, identify the failure/exception rates of individual jobs. Note that all of these metrics are per instance, so you need to perform aggregations in your visualizations in order to get a complete picture if you are running multiple web servers and/or workers.
For the exhaustive list of exposed metrics, visit the `/metrics` endpoint on your Nautobot instance. For further information about the different metric types, see the relevant Prometheus documentation.
## Multi Processing Notes
When deploying Nautobot in a multi-process manner (e.g. running multiple uWSGI workers), the Prometheus client library requires a shared directory in which to collect metrics from all worker processes. To configure this, first create or designate a local directory to which the worker processes have read and write access, and then configure your WSGI service (e.g. uWSGI) to expose this path as the `prometheus_multiproc_dir` environment variable.
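A sketch of how this might look in a uWSGI configuration file (the directory path is an illustrative assumption; any path readable and writable by all worker processes will do):

```ini
; uwsgi.ini -- a sketch; the directory path below is illustrative
; the directory must exist and be writable by all uWSGI worker processes
env = prometheus_multiproc_dir=/var/tmp/nautobot_metrics
```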
Warning
If having accurate long-term metrics in a multi-process environment is crucial to your deployment, it's recommended that you use uWSGI instead of gunicorn. The issue lies in the way gunicorn tracks worker processes (compared to uWSGI), which affects the management of the metrics files created by the above configuration. If you're using Nautobot with gunicorn in a containerized environment following the one-process-per-container methodology, you will likely not need to change to uWSGI. More details can be found in issue #3779.