Prometheus

While the Telegraf portion of Nautobot NetObs can include many tags as part of metrics, some metrics, such as interface metrics, are hard to enrich.

The NetObs app can generate 3 different kinds of metrics:

Info
Count
Gauge

Info Metrics

Prometheus info metrics allow Nautobot users to enrich their metrics without increasing the cardinality of those metrics.

A typical metric generated from the Telegraf models may look like:

interface_oper_status{device="rtr01.mke01", location="mke", interface="1/1/1"} 1

When visualizing or alerting, this information is somewhat lacking. We don't know whether that interface is important enough that someone should be paged when it goes down, or whether it connects to a laptop that someone may have unplugged to take home for the night.

Prometheus info metrics always have a value of 1, so they can be multiplied with other metrics to attach their labels without changing the other metric's value.

A sample info metric may look like:

sot_interface_info{device="rtr01.mke01", interface="1/1/1", status="Active", type="QSFP28", enabled="True", far_end_device="rtr01.osh01", far_end_interface="1/2/1", description="to rtr01.osh01 1/2/1", cid="123abc", circuit_provider="telco", ip_address="2001:db8:c001:d00d::1", mtu="9600", role="transit"} 1

Note

Not all of these labels are applicable to every network; this example just shows what is possible.
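To make the exposition format concrete, here is a minimal Python sketch that renders a label set as a single info-metric line like the sample above. The helpers `render_info_metric` and `escape_label_value` are hypothetical and not part of NetObs; the sketch only applies the Prometheus text-format escaping rules for label values (backslash, double quote, newline) and always emits the value 1.

```python
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info_metric(name: str, labels: Dict[str, str]) -> str:
    """Render one info-metric sample; info metrics always carry the value 1."""
    pairs = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels.items())
    return f"{name}{{{pairs}}} 1"


line = render_info_metric(
    "sot_interface_info",
    {"device": "rtr01.mke01", "interface": "1/1/1", "role": "transit"},
)
print(line)  # sot_interface_info{device="rtr01.mke01",interface="1/1/1",role="transit"} 1
```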

In PromQL, you can then join and filter on the metrics.

The following sample joins the metrics to create an interface oper-down alert for only active interfaces that have a role of transit, peering, or core.

interface_oper_status * on(device,interface) group_right() sot_interface_info{status="Active", role=~"(transit|peering|core)"} < 1
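The effect of that query can be sketched in plain Python. The series below are hypothetical sample data (the extra interfaces, their roles, and the value 0 for a down interface are made up for illustration). Because every info series has the value 1, multiplying by it keeps the oper-status value while `group_right()` carries over the Nautobot labels; the selector filters and the `< 1` comparison then drop everything except down interfaces with the selected roles.

```python
import re

# Hypothetical scraped series: (labels, value). In this sketch 1 = up, 0 = down.
oper_status = [
    ({"device": "rtr01.mke01", "location": "mke", "interface": "1/1/1"}, 0.0),
    ({"device": "rtr01.mke01", "location": "mke", "interface": "1/1/2"}, 0.0),
    ({"device": "rtr01.mke01", "location": "mke", "interface": "1/1/3"}, 1.0),
]
interface_info = [
    ({"device": "rtr01.mke01", "interface": "1/1/1", "status": "Active", "role": "transit"}, 1.0),
    ({"device": "rtr01.mke01", "interface": "1/1/2", "status": "Active", "role": "access"}, 1.0),
    ({"device": "rtr01.mke01", "interface": "1/1/3", "status": "Active", "role": "core"}, 1.0),
]

ROLE_RE = re.compile(r"(transit|peering|core)")


def oper_down_alerts(left, right):
    """Mimic: left * on(device, interface) group_right() right{...} < 1."""
    # Index the left side by the on(...) matching labels.
    by_key = {(lbl["device"], lbl["interface"]): val for lbl, val in left}
    results = []
    for labels, info_value in right:
        # Selector filters: status="Active", role=~"(transit|peering|core)".
        # PromQL regex matchers are fully anchored, hence fullmatch().
        if labels["status"] != "Active" or not ROLE_RE.fullmatch(labels["role"]):
            continue
        key = (labels["device"], labels["interface"])
        if key not in by_key:
            continue
        value = by_key[key] * info_value
        # group_right(): the result keeps the right-hand (info) labels.
        if value < 1:
            results.append((labels, value))
    return results


for labels, value in oper_down_alerts(oper_status, interface_info):
    print(labels["interface"], labels["role"], value)  # prints: 1/1/1 transit 0.0
```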

Count Metrics

The count metrics generated by Nautobot are not typical Prometheus counters; instead, they are counts of objects in the Nautobot database. They are set up as gauges because the numbers can go up or down.

Keep the labels defined for a count metric to a minimum, because a metric is generated for every possible combination of label values. For instance, to count how many devices of each device type exist, the label should be just device_type=device_type__name, not device_type=device_type__name,manufacturer=device_type__manufacturer__name. The second form produces many impossible manufacturer/device type combinations (e.g., a MikroTik 7750 SR-12 should never exist).
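A quick way to see why extra labels hurt: the sketch below uses made-up inventory data and counts how many series would be emitted if, as described above, one series is produced for every combination of label values. The label names mirror the example; the data and the `series_count` helper are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical inventory: (manufacturer, device type) for each device.
devices = [
    ("Nokia", "7750 SR-12"),
    ("Nokia", "7750 SR-12"),
    ("Nokia", "7250 IXR-e"),
    ("MikroTik", "CCR2004"),
]


def series_count(label_values):
    """Number of series when every combination of label values is emitted."""
    return math.prod(len(values) for values in label_values)


manufacturers = {m for m, _ in devices}
device_types = {t for _, t in devices}

# One label (device_type): one series per device type.
print(series_count([device_types]))                 # 3
# Two labels: the full cross product, including impossible pairs.
print(series_count([manufacturers, device_types]))  # 6

# Only pairs that actually exist have real counts; the rest would all be zero.
actual = Counter(devices)
impossible = [pair for pair in product(manufacturers, device_types) if pair not in actual]
print(len(impossible))  # 3, e.g. ("MikroTik", "7750 SR-12")
```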

Gauge Metrics

Gauge metrics use data as it exists in Nautobot, or data that can be mapped to a number. Right now, gauge metrics are limited to Device and Interface objects. Some possible gauge metrics are Device Status, Interface MTU, Interface Circuit Maintenance Planned Status, Device Location Latitude, or Device Redundancy Group Priority.

Enum values allow a string to be converted into a gauge value. This is useful for converting statuses (Active, Planned, etc.) into a float for the gauge metric (e.g., 1.0 or 0.0).

Config

Several settings in Constance configure Prometheus info metrics. To edit them, go to the NetObs dashboard and click Config.

Name	Description	Default
METRIC INCLUDE STATS	Whether to include statistics about how long it takes to generate the metric output.	`False`
METRIC PREFIX	What to prepend the metrics with.	`sot_`
METRIC CACHE TIME	How long to cache the metric output. The default of 0 disables caching; setting a non-zero value can improve performance.	`0`
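The sketch below illustrates how the prefix and cache settings could interact when the metric output is built. It is a hypothetical model, not the NetObs implementation: `MetricOutput` and its `generate` callback are made-up names, and the cache time is treated as seconds, which is an assumption because the unit is not stated in the table.

```python
import time
from typing import Callable, List, Optional


class MetricOutput:
    """Hypothetical model of the METRIC PREFIX and METRIC CACHE TIME settings."""

    def __init__(
        self,
        generate: Callable[[], List[str]],
        prefix: str = "sot_",
        cache_time: float = 0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.generate = generate      # returns unprefixed sample lines
        self.prefix = prefix          # METRIC PREFIX
        self.cache_time = cache_time  # METRIC CACHE TIME (assumed seconds; 0 disables)
        self.clock = clock
        self._cached: Optional[str] = None
        self._cached_at = 0.0

    def render(self) -> str:
        now = self.clock()
        fresh = self._cached is not None and now - self._cached_at < self.cache_time
        if self.cache_time > 0 and fresh:
            return self._cached  # serve the cached output
        output = "\n".join(self.prefix + line for line in self.generate())
        if self.cache_time > 0:
            self._cached, self._cached_at = output, now
        return output


# Usage with a fake clock so the caching is easy to follow.
calls = []


def fake_generate() -> List[str]:
    calls.append(1)
    return ['interface_info{device="rtr01.mke01",interface="1/1/1"} 1']


now = [0.0]
output = MetricOutput(fake_generate, cache_time=60, clock=lambda: now[0])
print(output.render())  # sot_interface_info{device="rtr01.mke01",interface="1/1/1"} 1
now[0] = 30.0
output.render()         # within the cache time: served from cache
now[0] = 61.0
output.render()         # expired: regenerated
print(len(calls))       # 2
```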

Prometheus Models

Metrics

Prometheus metrics are the base model for generating metrics.

Gauge metrics are currently limited to devices and interfaces. A metric can also define a default float value; any device or interface metric uses that as its default value.

A field can also be specified to be used as part of the Prometheus Metric Sync job.

If you set a Dynamic Group, info and count metrics are generated only for members of that dynamic group.

Metric Labels

These are the labels that get added to the metric. A label can be a field directly on the model or a nested field.

To use a nested field, put double underscores (__) between each field name in the path, for example device_type__manufacturer__name.

Callable methods, such as get_cable_peer() on an interface, are also supported. For example, the provider of a circuit attached to an interface would be get_cable_peer__circuit__provider__name.
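The double-underscore lookup can be sketched as a small resolver that follows one attribute per path segment and calls the attribute when it is callable. This illustrates the idea under assumptions and is not the NetObs code: `resolve_label` is a hypothetical name, and the `SimpleNamespace` objects stand in for Nautobot model instances.

```python
from types import SimpleNamespace


def resolve_label(obj, path: str):
    """Follow a double-underscore path such as 'device_type__manufacturer__name'."""
    current = obj
    for part in path.split("__"):
        if current is None:
            return None              # a missing relation ends the lookup
        current = getattr(current, part)
        if callable(current):
            current = current()      # e.g. interface.get_cable_peer()
    return current


# Stand-ins for Nautobot objects (hypothetical data).
provider = SimpleNamespace(name="telco")
circuit_termination = SimpleNamespace(circuit=SimpleNamespace(provider=provider))
interface = SimpleNamespace(
    name="1/1/1",
    device=SimpleNamespace(
        device_type=SimpleNamespace(manufacturer=SimpleNamespace(name="Nokia"))
    ),
    get_cable_peer=lambda: circuit_termination,
)

print(resolve_label(interface.device, "device_type__manufacturer__name"))  # Nokia
print(resolve_label(interface, "get_cable_peer__circuit__provider__name"))  # telco
```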

Metric Value Mappers

When a gauge metric uses a field, that field is not always an integer or float. Metric Value Mappers convert text into a float.

A common example is a metric whose field is set to status__name; you could create the following value mappers to map each field value to a metric value.

Field Value	Metric Value
Active	1.0
Planned	2.0
Retired	900.0
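Conceptually, a value mapper is a lookup table from field value to float. The sketch below shows that idea using the table above. `map_gauge_value` is a hypothetical helper, and falling back to the metric's default float value for unmapped text is an assumption about the behavior, not something the documentation states.

```python
from typing import Dict

# Value mappers from the table above: field value -> metric value.
VALUE_MAPPERS: Dict[str, float] = {"Active": 1.0, "Planned": 2.0, "Retired": 900.0}


def map_gauge_value(field_value, mappers: Dict[str, float], default: float = 0.0) -> float:
    """Convert a field value into a gauge float.

    Numbers pass through unchanged; text (and booleans, so "True"/"False" can be
    mapped) is looked up in the mapper table. Unmapped text falls back to the
    metric's default value (an assumption made for this sketch).
    """
    if isinstance(field_value, (int, float)) and not isinstance(field_value, bool):
        return float(field_value)
    return mappers.get(str(field_value), default)


print(map_gauge_value("Active", VALUE_MAPPERS))                         # 1.0
print(map_gauge_value("Decommissioning", VALUE_MAPPERS, default=-1.0))  # -1.0
print(map_gauge_value(9600, VALUE_MAPPERS))                             # 9600.0 (e.g. an MTU)
```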

Accessing Metrics

Once metrics are created, they can be viewed from a couple of different endpoints. For testing and validation, you can view them in a web browser. For production use, a Telegraf agent can scrape them with the Prometheus input plugin.

View all metrics

To see all the metrics from all the Metric models, you can view /api/plugins/netobs/metrics/.

Only metrics that have a status of Active will be displayed here.

View a specific metric

In some cases, you may want to scrape metrics at different polling intervals or test a new info metric. Individual info metrics can be accessed by their label.

Using the interface example from above, the URL endpoint would be /api/plugins/netobs/metrics/interface/.
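For a quick check outside the browser, the sketch below builds requests for both endpoints with Python's standard library. The host and token are placeholders copied from the sample Telegraf configuration below, and the `Token` authorization header is assumed to work the same way as in that configuration. `metrics_request` is a hypothetical helper; the request is only constructed here, and `urllib.request.urlopen` would send it.

```python
from typing import Optional
from urllib.request import Request

BASE_URL = "http://nautobot:8080/api/plugins/netobs/metrics/"  # placeholder host
TOKEN = "0123456789abcdef0123456789abcdef01234567"             # placeholder token


def metrics_request(metric_label: Optional[str] = None) -> Request:
    """Build a GET request for all metrics, or for one metric by its label."""
    url = BASE_URL if metric_label is None else f"{BASE_URL}{metric_label}/"
    return Request(url, headers={"Authorization": f"Token {TOKEN}"})


all_metrics = metrics_request()
interface_metrics = metrics_request("interface")
print(interface_metrics.full_url)  # http://nautobot:8080/api/plugins/netobs/metrics/interface/

# To actually fetch the text exposition output (requires a reachable Nautobot):
# from urllib.request import urlopen
# with urlopen(interface_metrics, timeout=20) as resp:
#     print(resp.read().decode())
```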

Sample Telegraf Configuration

Below is a sample Telegraf configuration to pull the API metrics.

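[[inputs.prometheus]]
  ## NetObs endpoint for all Active metrics; append a metric label (e.g. interface/) to scrape one metric
  urls = ["http://nautobot:8080/api/plugins/netobs/metrics/"]
  ## Use Telegraf's version 2 Prometheus metric format
  metric_version = 2
  ## Nautobot API token (placeholder value; replace with a real token)
  http_headers = {"Authorization" = "Token 0123456789abcdef0123456789abcdef01234567"}
  ## Scrape every 120 seconds and give up on a request after 20 seconds
  interval = "120s"
  timeout = "20s"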