Observe COS for robotics
COS for robotics provides valuable insights into a fleet of devices. Perhaps more importantly, it also alerts the fleet operators when something malfunctions. As such, COS for robotics is a critical piece of infrastructure. While it is resilient, it can still fail, and a failure disrupts the monitoring of the fleet.
For this reason, COS for robotics is itself observable, using most of the tools already used to observe the robot fleet. While COS for robotics could observe itself, this would be of little use during a large outage, since the monitoring would go down together with the stack it monitors. Instead, we recommend deploying a separate COS Lite stack in production whose responsibility is to monitor COS for robotics.
We assume hereafter that both COS for robotics and COS Lite are deployed and set up. Deploying COS Lite is very similar to deploying COS for robotics, and a tutorial can be found on the COS Lite documentation website. We also assume that the COS Lite deployment includes distributed tracing with Tempo as well as the Blackbox Exporter for probing endpoints.
Note
There are multiple strategies for the topology of this dual deployment. For the sake of this how-to, we assume that COS Lite and COS for robotics live in their respective models and that both are controlled by the same Juju controller. Make sure to read Juju’s documentation about cross-model relations, as we rely on this feature and it may affect the very bootstrapping of the controllers depending on the topology.
Prelude
For good measure, let us start by making sure that everything is fine. We first check the COS for robotics stack:
$ juju status --model robcos-model
Model         Controller         Cloud/Region        Version  SLA          Timestamp
robcos-model  robcos-controller  microk8s/localhost  3.6.4    unsupported  14:39:49Z

App                      Version  Status  Scale  Charm                        Channel        Rev  Address         Exposed  Message
alertmanager             0.27.0   active  1      alertmanager-k8s             latest/stable  156  10.152.183.200  no
catalogue                         active  1      catalogue-k8s                latest/stable  81   10.152.183.111  no
cos-registration-server           active  1      cos-registration-server-k8s  latest/edge    8    10.152.183.235  no
foxglove-studio                   active  1      foxglove-studio-k8s          latest/edge    1    10.152.183.96   no
grafana                  9.5.3    active  1      grafana-k8s                  latest/stable  139  10.152.183.182  no
loki                     2.9.6    active  1      loki-k8s                     latest/stable  187  10.152.183.140  no
prometheus               2.52.0   active  1      prometheus-k8s               latest/stable  232  10.152.183.148  no
ros2bag-fileserver                active  1      ros2bag-fileserver-k8s       latest/edge    3    10.152.183.83   no
traefik                  2.11.0   active  1      traefik-k8s                  latest/stable  234  10.152.183.242  no       Serving at 100.83.155.248

Unit                        Workload  Agent  Address       Ports  Message
alertmanager/0*             active    idle   10.1.105.131
catalogue/0*                active    idle   10.1.105.135
cos-registration-server/0*  active    idle   10.1.105.167
foxglove-studio/0*          active    idle   10.1.105.173
grafana/0*                  active    idle   10.1.105.149
loki/0*                     active    idle   10.1.105.136
prometheus/0*               active    idle   10.1.105.155
ros2bag-fileserver/0*       active    idle   10.1.105.139
traefik/0*                  active    idle   10.1.105.133         Serving at 100.83.155.248

Offer    Application  Charm        Rev  Connected  Endpoint       Interface      Role
traefik  traefik      traefik-k8s  234  1/1        traefik-route  traefik_route  provider
The COS for robotics stack is ready. What about the COS Lite stack?
$ juju status --model cos-model
Model      Controller         Cloud/Region       Version  SLA          Timestamp
cos-model  robcos-controller  cos-k8s/localhost  3.6.4    unsupported  09:55:08Z

App           Version                Status  Scale  Charm                  Channel        Rev  Address         Exposed  Message
alertmanager  0.27.0                 active  1      alertmanager-k8s       latest/stable  156  10.152.183.87   no
blackbox      0.24.0                 active  1      blackbox-exporter-k8s  latest/stable  25   10.152.183.83   no
catalogue                            active  1      catalogue-k8s          latest/stable  81   10.152.183.144  no
grafana       9.5.3                  active  1      grafana-k8s            latest/stable  139  10.152.183.247  no
loki          2.9.6                  active  1      loki-k8s               latest/stable  187  10.152.183.62   no
minio         res:oci-image@220b31a  active  1      minio                  ckf-1.9/edge   419  10.152.183.64   no
prometheus    2.52.0                 active  1      prometheus-k8s         latest/stable  232  10.152.183.50   no
s3                                   active  1      s3-integrator          latest/edge    145  10.152.183.29   no
tempo                                active  1      tempo-coordinator-k8s  latest/edge    73   10.152.183.237  no       metrics-generator disabled. Add a relation over send-remote-write
tempo-worker  2.7.1                  active  1      tempo-worker-k8s       latest/edge    53   10.152.183.161  no       metrics-generator disabled. No prometheus remote-write relation configured on the coordinator
traefik       2.11.0                 active  1      traefik-k8s            latest/stable  234  10.152.183.73   no       Serving at 100.83.164.181

Unit             Workload  Agent  Address       Ports          Message
alertmanager/0*  active    idle   10.1.128.139
blackbox/0*      active    idle   10.1.128.133
catalogue/0*     active    idle   10.1.128.141
grafana/0*       active    idle   10.1.128.155
loki/0*          active    idle   10.1.128.137
minio/0*         active    idle   10.1.128.132  9000-9001/TCP
prometheus/0*    active    idle   10.1.128.138
s3/0*            active    idle   10.1.128.145
tempo-worker/0*  active    idle   10.1.128.134                 metrics-generator disabled. No prometheus remote-write relation configured on the coordinator
tempo/0*         active    idle   10.1.128.140                 metrics-generator disabled. Add a relation over send-remote-write
traefik/0*       active    idle   10.1.128.191                 Serving at 100.83.164.181
Alright, we’re all set up and can move on to relating the stacks.
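When this check is repeated often, scanning the tables by eye gets tedious. The following is a minimal sketch, not part of the original setup, that reads the tabular `juju status` output and prints every unit whose workload is not active. The function name `non_active_units` is hypothetical, and the parsing assumes the section headers shown above (Model, App, Unit, Offer and so on) with blank lines between sections.

```shell
#!/bin/sh
# Sketch: list units whose workload status is not "active".
# Reads the tabular output of `juju status` on standard input.
# Assumption: the Unit section starts with a row whose first field is "Unit"
# and ends at a blank line or at the next section header.
non_active_units() {
  awk '
    $1 == "Unit" { in_units = 1; next }
    NF == 0 || $1 ~ /^(Model|App|SAAS|Offer|Integration)$/ { in_units = 0 }
    in_units && $2 != "active" { print $1, $2 }
  '
}

# Usage (requires a Juju client, hence left as a comment):
#   for model in robcos-model cos-model; do
#     juju status --model "$model" | non_active_units
#   done
```

An empty output means every unit in the inspected model reports an active workload.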
Deploy the Grafana agent
With both COS for robotics and COS Lite deployed in their respective models, we must now ‘relate’ them through Juju relations.
Since the stacks live in separate models, we must establish so-called cross-model relations. This is a two-step process: first, we expose (‘offer’) some applications from one model to the other; second, we relate the applications as we normally would with Juju.
To ease the setup, we deploy the Grafana agent in the COS for robotics model. This lets us connect our applications in COS for robotics to the COS Lite backends (Grafana, Prometheus, Loki and Tempo) in a simpler manner, as we will see later on. Not only does this simplify the deployment, it also offers more flexibility when modifying the overall deployment topology. This setup is depicted in the following diagram:
---
config:
  fontFamily: ubuntu
  theme: dark
---
graph LR;
  subgraph Cloud 1 [COS for Robotics]
    A(Grafana)
    B(Prometheus)
    C(Foxglove Studio)
    K(COS registration server)
    L(rosbag server)
    D(...)
    E(Grafana Agent)
    A --> E
    B --> E
    D --> E
    K --> E
    L --> E
    C --> E
  end
  subgraph Cloud 2 [COS Lite]
    F(Grafana)
    H(Prometheus)
    G(Loki)
    I(Tempo)
    J(...)
  end
  E --> F
  E --> H
  E --> G
  E --> I
A bi-model deployment of COS Lite observing COS for robotics.
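To deploy the agent, issue the command below. The application is named grafana-agent explicitly, since the relation commands that follow refer to it by that name:
juju deploy grafana-agent-k8s grafana-agent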
Relating to the agent
Now that the agent is deployed, we can connect all the observability endpoints to it. And there are quite a few of them:
# alertmanager
juju relate alertmanager:grafana-dashboard grafana-agent:grafana-dashboards-consumer
juju relate alertmanager:self-metrics-endpoint grafana-agent:metrics-endpoint
juju relate alertmanager:tracing grafana-agent:tracing-provider
# catalogue
juju relate catalogue:tracing grafana-agent:tracing-provider
# cos-registration-server
juju relate cos-registration-server:logging grafana-agent:logging-provider
juju relate cos-registration-server:tracing grafana-agent:tracing-provider
juju relate cos-registration-server:grafana-dashboard grafana-agent:grafana-dashboards-consumer
# foxglove-studio
juju relate foxglove-studio:logging grafana-agent:logging-provider
juju relate foxglove-studio:tracing grafana-agent:tracing-provider
juju relate foxglove-studio:grafana-dashboard grafana-agent:grafana-dashboards-consumer
# grafana
juju relate grafana:charm-tracing grafana-agent:tracing-provider
juju relate grafana:workload-tracing grafana-agent:tracing-provider
juju relate grafana:metrics-endpoint grafana-agent:metrics-endpoint
# loki
juju relate loki:metrics-endpoint grafana-agent:metrics-endpoint
juju relate loki:grafana-dashboard grafana-agent:grafana-dashboards-consumer
juju relate loki:charm-tracing grafana-agent:tracing-provider
juju relate loki:workload-tracing grafana-agent:tracing-provider
# prometheus
juju relate prometheus:self-metrics-endpoint grafana-agent:metrics-endpoint
juju relate prometheus:grafana-dashboard grafana-agent:grafana-dashboards-consumer
juju relate prometheus:charm-tracing grafana-agent:tracing-provider
juju relate prometheus:workload-tracing grafana-agent:tracing-provider
# traefik
juju relate traefik:metrics-endpoint grafana-agent:metrics-endpoint
juju relate traefik:grafana-dashboard grafana-agent:grafana-dashboards-consumer
juju relate traefik:charm-tracing grafana-agent:tracing-provider
juju relate traefik:workload-tracing grafana-agent:tracing-provider
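Typing this many relations by hand is error-prone. As a minimal sketch, assuming the application names shown in the status output above and the agent application name grafana-agent, the same list can be kept as a small table and turned into commands by a loop. The helper names `relations` and `relate_all` and the `RUN` switch are hypothetical; by default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: generate the agent relations from a compact table.
# Prints the commands by default; set RUN=1 to execute them with juju.
set -eu

AGENT="grafana-agent"   # assumed application name of the Grafana agent

# One relation per line: <application>:<endpoint> <agent endpoint>
relations() {
  cat <<'EOF'
alertmanager:grafana-dashboard grafana-dashboards-consumer
alertmanager:self-metrics-endpoint metrics-endpoint
alertmanager:tracing tracing-provider
catalogue:tracing tracing-provider
cos-registration-server:logging logging-provider
cos-registration-server:tracing tracing-provider
cos-registration-server:grafana-dashboard grafana-dashboards-consumer
foxglove-studio:logging logging-provider
foxglove-studio:tracing tracing-provider
foxglove-studio:grafana-dashboard grafana-dashboards-consumer
grafana:charm-tracing tracing-provider
grafana:workload-tracing tracing-provider
grafana:metrics-endpoint metrics-endpoint
loki:metrics-endpoint metrics-endpoint
loki:grafana-dashboard grafana-dashboards-consumer
loki:charm-tracing tracing-provider
loki:workload-tracing tracing-provider
prometheus:self-metrics-endpoint metrics-endpoint
prometheus:grafana-dashboard grafana-dashboards-consumer
prometheus:charm-tracing tracing-provider
prometheus:workload-tracing tracing-provider
traefik:metrics-endpoint metrics-endpoint
traefik:grafana-dashboard grafana-dashboards-consumer
traefik:charm-tracing tracing-provider
traefik:workload-tracing tracing-provider
EOF
}

relate_all() {
  relations | while read -r src agent_endpoint; do
    if [ "${RUN:-0}" = "1" ]; then
      # </dev/null keeps juju from consuming the rest of the table
      juju relate "$src" "$AGENT:$agent_endpoint" </dev/null
    else
      echo "juju relate $src $AGENT:$agent_endpoint"
    fi
  done
}

relate_all
```

Keeping the relations in one table makes it easier to spot a missing endpoint or to add an application later.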
With all the relations established to the agent, we can now move on to exposing the COS Lite endpoints to the COS for robotics model in order to connect them.
Making an offer
As we mentioned earlier, the first step is to issue ‘offers’. To do so, issue the commands:
juju offer cos-model.grafana:grafana-dashboard cos-grafana
juju offer cos-model.loki:logging cos-loki
juju offer cos-model.prometheus:receive-remote-write cos-prometheus
juju offer cos-model.tempo:tracing cos-tempo
juju offer cos-model.blackbox:probes cos-blackbox
These create the offers from the cos-model model, where COS Lite lives, to be consumed in another model.
They are then ‘consumed’ in the COS for robotics model with:
juju consume cos-model.cos-grafana
juju consume cos-model.cos-loki
juju consume cos-model.cos-prometheus
juju consume cos-model.cos-tempo
juju consume cos-model.cos-blackbox
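Each offer and its matching consume command share the same offer name, so both lists can be derived from one table. The sketch below only prints the commands, under the assumption that the model is named cos-model as above; the helper names `offers`, `print_offer_commands` and `print_consume_commands` are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the offer and consume commands from a single table.
# Only prints the commands; review them before running them.
set -eu

COS_MODEL="cos-model"   # model hosting COS Lite, as assumed in this how-to

# One offer per line: <application> <endpoint> <offer name>
offers() {
  cat <<'EOF'
grafana grafana-dashboard cos-grafana
loki logging cos-loki
prometheus receive-remote-write cos-prometheus
tempo tracing cos-tempo
blackbox probes cos-blackbox
EOF
}

print_offer_commands() {
  offers | while read -r app endpoint offer; do
    echo "juju offer $COS_MODEL.$app:$endpoint $offer"
  done
}

print_consume_commands() {
  # app and endpoint are read only to reach the third column
  offers | while read -r app endpoint offer; do
    echo "juju consume $COS_MODEL.$offer"
  done
}

print_offer_commands
print_consume_commands
```

The consume commands are meant to be run with the COS for robotics model selected, exactly as with the hand-written commands above.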
Once consumed, they appear in the juju status output as SAAS entries:
$ juju status
Model         Controller         Cloud/Region        Version  SLA          Timestamp
robcos-model  robcos-controller  microk8s/localhost  3.6.4    unsupported  12:44:19Z

SAAS            Status  Store              URL
cos-blackbox    active  robcos-controller  admin/cos-model.cos-blackbox
cos-grafana     active  robcos-controller  admin/cos-model.cos-grafana
cos-loki        active  robcos-controller  admin/cos-model.cos-loki
cos-prometheus  active  robcos-controller  admin/cos-model.cos-prometheus
cos-tempo       active  robcos-controller  admin/cos-model.cos-tempo
...
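To confirm that no offer was skipped, the SAAS section can be checked mechanically. This is a rough sketch that parses the tabular output shown above; the function names `list_saas` and `check_saas` are hypothetical, and the parsing assumes that the SAAS section starts with a header row whose first field is "SAAS" and ends at a blank line or the next section header.

```shell
#!/bin/sh
# Sketch: verify that the expected SAAS entries are present and active.

# Print "<name> <status>" for every row of the SAAS section read from stdin.
list_saas() {
  awk '
    $1 == "SAAS" { in_saas = 1; next }
    NF == 0 || $1 ~ /^(Model|App|Unit|Offer|Integration)$/ { in_saas = 0 }
    in_saas && NF >= 2 { print $1, $2 }
  '
}

# Read `juju status` text on stdin and check each offer name given as argument.
# Returns non-zero if at least one entry is missing or not active.
check_saas() {
  status_text=$(cat)
  rc=0
  for name in "$@"; do
    if ! printf '%s\n' "$status_text" | list_saas | grep -qx "$name active"; then
      echo "missing or not active: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (requires a Juju client, hence left as a comment):
#   juju status --model robcos-model | \
#     check_saas cos-blackbox cos-grafana cos-loki cos-prometheus cos-tempo
```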
Relating to COS Lite
With the COS Lite endpoints now available in the COS for robotics model, all we have to do is to relate them to the Grafana agent.
We proceed with:
juju relate grafana-agent:grafana-dashboards-provider cos-grafana:grafana-dashboard
juju relate grafana-agent:logging-consumer cos-loki:logging
juju relate grafana-agent:send-remote-write cos-prometheus:receive-remote-write
juju relate grafana-agent:tracing cos-tempo:tracing
We also relate our applications to Blackbox:
juju relate cos-registration-server:probes cos-blackbox:probes
juju relate foxglove-studio:probes cos-blackbox:probes
Voila.
The COS for robotics stack is now observable by COS Lite.