
NTC Telemetry Solution

Opinionated Setup

This page describes the NTC opinionated installation in general terms.

The NTC Telemetry solution automates the creation of Telegraf configurations and pushes them to a Git repository. FluxCD watches that repository and deploys changes based on its contents.

Components

  • Nautobot


    Nautobot provides the data for the Telegraf configurations, Jobs that generate the configuration, relationships for viewing the configuration associated with each device, and the push of the configuration to Git.


    Responsibility: Client with escalation to NTC Support

  • Templated Configuration


    Templates that render Telegraf configuration from JSON data, making it possible to generate large configurations quickly.


    Responsibility: Client with escalation to NTC Support

  • GitOps for scalable deployment


    Because the solution uses GitOps, the entire configuration has a versioned history in Git.


    Responsibility: NTC

  • Kubernetes


    A Kubernetes backend that allows the solution to scale out.


    Responsibility: Client, with over-the-shoulder support from NTC. No proactive upgrades are performed.

  • FluxCD


    Updates the environment based on the Git repository.


    Responsibility: NTC

  • Telegraf


    Telegraf containers are deployed by FluxCD into the proper Kubernetes environment.

  • Vault


    The solution uses an instance of HashiCorp Vault to store secrets, which Kubernetes then accesses. Per HashiCorp licensing requirements, this Vault instance cannot be shared for use outside of the system.


    Responsibility: NTC
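The templating idea can be illustrated with a minimal sketch. It assumes a hypothetical JSON shape (`agent`, `ping_targets`, `count`) and renders a section for Telegraf's `inputs.ping` plugin; the solution's real templates may use a different engine and data layout. Plain string formatting keeps the example to the standard library.

```python
import json

# Hypothetical JSON data, shaped the way a generation Job might supply it.
DATA = """
{
  "agent": "telegraf-atl-icmp",
  "ping_targets": ["192.0.2.11", "192.0.2.10"],
  "count": 3
}
"""


def render_ping_config(data: dict) -> str:
    """Render a Telegraf [[inputs.ping]] section from plain data."""
    # Sorting keeps the output stable; json.dumps emits a double-quoted
    # list of strings that is also a valid TOML array.
    urls = json.dumps(sorted(data["ping_targets"]))
    lines = [
        f"# Generated for agent {data['agent']}; do not edit by hand",
        "[[inputs.ping]]",
        f"  urls = {urls}",
        f"  count = {int(data['count'])}",
    ]
    return "\n".join(lines) + "\n"


config = render_ping_config(json.loads(DATA))
print(config)
```

Sorting the targets keeps the rendered file byte-for-byte identical for the same data, which matters in a GitOps flow: reordered but otherwise identical input does not produce a new commit for FluxCD to deploy.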

Workflow

A general flow involves the following components.

Figure: All Components

The first portion of the flow, shown on the left within Nautobot, is documented on this page. This workflow assumes that you need to add a new Telegraf plugin; skip the first step when working with an existing setup.

The second half of the workflow, within Kubernetes and FluxCD, is automated by the system and deploys whenever the Git repository changes.
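FluxCD acts only when the repository receives a new revision. The following is a minimal sketch of how a push step could decide what to commit, so that unchanged data does not create empty commits. It assumes a hypothetical layout of one rendered file per agent under `telegraf/`; it illustrates the idea and is not the solution's actual push logic.

```python
def plan_commit(rendered: dict, in_repo: dict) -> tuple:
    """Compare freshly rendered configs with the files already committed.

    Returns (changed, removed): paths to add or update, and committed paths
    that are no longer rendered. Both lists are sorted for stable output.
    """
    changed = sorted(p for p, text in rendered.items() if in_repo.get(p) != text)
    removed = sorted(p for p in in_repo if p not in rendered)
    return changed, removed


# Hypothetical file contents for three agents.
rendered = {
    "telegraf/atl-icmp.conf": 'urls = ["192.0.2.10"]\n',
    "telegraf/nyc-icmp.conf": 'urls = ["198.51.100.7"]\n',
}
in_repo = {
    "telegraf/atl-icmp.conf": 'urls = ["192.0.2.10"]\n',
    "telegraf/nyc-icmp.conf": 'urls = ["198.51.100.5"]\n',
    "telegraf/old-icmp.conf": 'urls = ["203.0.113.1"]\n',
}

changed, removed = plan_commit(rendered, in_repo)
print(changed)  # ['telegraf/nyc-icmp.conf']
print(removed)  # ['telegraf/old-icmp.conf']
if not changed and not removed:
    print("nothing to commit; FluxCD sees no new revision")
```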

Common Activities

Adding a device to be monitored

Step 1: Add the device to Nautobot
Step 2: Set the attributes the device needs to match the appropriate Nautobot Dynamic Group
Step 3: Run the create agents job
Step 4: Run the Add Agent to Agent Group job
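The Jobs in these steps are normally run from the Nautobot UI. If you want to script them, Nautobot exposes a REST endpoint for running a Job by its ID (`POST /api/extras/jobs/<id>/run/`). The sketch below only builds the requests in the documented order and does not send them. The server URL, token, and Job UUIDs are hypothetical placeholders, and any input fields the Jobs expect are assumed to be empty here.

```python
import json
import urllib.request
from typing import Optional

NAUTOBOT_URL = "https://nautobot.example.com"  # hypothetical server
API_TOKEN = "replace-me"  # hypothetical; never hard-code a real token


def build_job_run_request(job_id: str, data: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to Nautobot's Job-run REST endpoint."""
    body = json.dumps({"data": data or {}}).encode("utf-8")
    return urllib.request.Request(
        f"{NAUTOBOT_URL}/api/extras/jobs/{job_id}/run/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {API_TOKEN}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Hypothetical Job UUIDs; look up the real ones under Jobs in the Nautobot UI.
CREATE_AGENTS_JOB = "00000000-0000-0000-0000-000000000001"
ADD_TO_GROUP_JOB = "00000000-0000-0000-0000-000000000002"

# Build in the documented order: create agents first, then add them to a group.
requests_to_send = [build_job_run_request(j) for j in (CREATE_AGENTS_JOB, ADD_TO_GROUP_JOB)]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually submit the Job
```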

Working with Nautobot Dynamic Groups

When you add a new device to be monitored, the associated Dynamic Groups determine what monitoring applies to it. For example, suppose a Dynamic Group is defined with the following filter:

{
    "site": [
        "atl"
    ],
    "status": [
        "active"
    ],
    "tag": [
        "monitoring-icmp"
    ]
}

To match this filter, a device must be located at the site atl, have a status of active, and have the tag monitoring-icmp applied. The device's primary IP address is then monitored via ICMP regardless of device type, model, or any other attribute, as long as the device is active, located at atl, and has the appropriate tag.
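To make the matching rule concrete, here is a simplified model in Python. It assumes the usual Nautobot filter behavior of combining different keys with AND and the values within one key's list with OR; Nautobot evaluates Dynamic Groups with its own filter sets, so this illustrates the rule rather than reproducing its implementation. The device records are plain dictionaries with hypothetical names and models.

```python
# The filter from the example above.
GROUP_FILTER = {
    "site": ["atl"],
    "status": ["active"],
    "tag": ["monitoring-icmp"],
}


def matches(device: dict, group_filter: dict) -> bool:
    """Every key must match (AND); within a key, any listed value is enough (OR)."""
    for field, allowed in group_filter.items():
        value = device.get(field)
        # Tags are multi-valued; single-valued fields are wrapped in a set.
        values = set(value) if isinstance(value, (list, tuple, set)) else {value}
        if not values & set(allowed):
            return False
    return True


devices = [
    {"name": "atl-rtr-01", "site": "atl", "status": "active", "model": "router-x",
     "tag": ["monitoring-icmp"]},
    {"name": "atl-sw-07", "site": "atl", "status": "active", "model": "switch-y",
     "tag": ["monitoring-icmp", "core"]},
    {"name": "atl-fw-02", "site": "atl", "status": "planned", "model": "fw-z",
     "tag": ["monitoring-icmp"]},
    {"name": "nyc-rtr-01", "site": "nyc", "status": "active", "model": "router-x",
     "tag": ["monitoring-icmp"]},
]

monitored = [d["name"] for d in devices if matches(d, GROUP_FILTER)]
print(monitored)  # ['atl-rtr-01', 'atl-sw-07']
```

The two matching devices have different models, while the planned firewall and the nyc router drop out because they fail the status and site keys respectively.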

Dynamic Groups are built from the filters available within Nautobot; see the Nautobot documentation for details.

Dynamic groups are found under Organization -> Dynamic Groups within the Nautobot UI.

Adding a New Device with a New Secret

Secrets are created manually in the GitOps repository. Each new secret credential requires its own file so that the secret can be loaded into the Kubernetes environment.

How to View Container Logs

Container logs are sent to the Loki instance, where they are available under the namespace label network-metrics. To investigate a device that is having problems, navigate to the device in Nautobot and select the Telegraf tab, which shows the base container name. Then open the Loki Explore page within Grafana and start typing:

{namespace="network-metrics", pod=~"{{ telegraf_instance_name }}"}

Type the pod name rather than pasting it in, and rely on autocompletion to fill it in.

The resulting logs should give insight into what is or is not happening with that Telegraf instance.
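For scripted access outside Grafana, the same selector can be built in code. The following is a minimal sketch in which the Loki address `LOKI_URL` and the instance name `telegraf-atl-icmp` are hypothetical, and instance names are assumed to contain only lowercase letters, digits, and hyphens (so they need no regex escaping). LogQL regex matchers (`=~`) must match the whole label value, so when you start from the base container name rather than an autocompleted pod name, the sketch appends `.*` to match any suffix Kubernetes adds to the pod name. It targets Loki's `/loki/api/v1/query_range` HTTP endpoint.

```python
import re
import urllib.parse

LOKI_URL = "http://loki.example.com:3100"  # hypothetical Loki address

# Assumption: names use lowercase letters, digits, and hyphens only.
_NAME = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def telegraf_log_selector(base_name: str, namespace: str = "network-metrics") -> str:
    """Build a LogQL stream selector for a Telegraf instance's pods."""
    if not _NAME.fullmatch(base_name):
        raise ValueError(f"unexpected instance name: {base_name!r}")
    # "=~" is fully anchored, so ".*" lets the base name match suffixed pod names.
    return f'{{namespace="{namespace}", pod=~"{base_name}.*"}}'


def query_range_url(selector: str, limit: int = 100) -> str:
    """URL for Loki's query_range endpoint with the selector URL-encoded."""
    params = urllib.parse.urlencode({"query": selector, "limit": limit})
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


selector = telegraf_log_selector("telegraf-atl-icmp")
print(selector)  # {namespace="network-metrics", pod=~"telegraf-atl-icmp.*"}
print(query_range_url(selector))
```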