Tools to monitor your validator

Hello friends! As a continuation of my article about setting up Alerts, I think it is worth giving a general overview of the software you will need to keep your validator running at all times.

Special thanks to @p1xel32 for contributing important parts :+1:

Here I will show how monitoring is set up with three utilities (Prometheus, Grafana Cloud, and Node Exporter), in five parts:

  • 1st Part - Prometheus
  • 2nd Part - Grafana Cloud
  • 3rd Part - Node Exporter
  • 4th Part - Dashboard setup
  • 5th Part - Conclusion

Before we start: deploy a new server and connect to it.

Update packages:

sudo apt update && sudo apt upgrade -y

Install a text editor:

apt install nano

Prometheus:

1.1 Create a dedicated user and group for Prometheus on your server

groupadd --system prometheus

useradd -s /sbin/nologin --system -g prometheus prometheus
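
If you rerun these steps, `groupadd` and `useradd` fail when the account already exists. Here is a minimal sketch of a guarded version, assuming a Linux system where `getent` and `id` are available. The helper names `group_exists`, `user_exists`, and `ensure_prometheus_account` are hypothetical and not part of any tool:

```shell
# Hypothetical helpers: report whether a group or user is already defined.
group_exists() { getent group "$1" >/dev/null 2>&1; }
user_exists()  { id -u "$1" >/dev/null 2>&1; }

# Create the system group and user only when they are missing (run as root).
ensure_prometheus_account() {
    group_exists prometheus || groupadd --system prometheus
    user_exists prometheus  || useradd -s /sbin/nologin --system -g prometheus prometheus
}

# Usage (as root):
#   ensure_prometheus_account
```

Nothing runs until you call `ensure_prometheus_account`, so the definitions are safe to paste into a setup script.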

1.2 Download Prometheus (this guide uses v2.42.0; check the releases page for a newer version)

wget https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz
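
Before unpacking, it can be worth checking the download against the checksum published with the release. This is a minimal sketch, assuming `sha256sum` from GNU coreutils is installed and that you copy the expected hash yourself from the `sha256sums.txt` file on the release page. `verify_sha256` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got ${actual:-nothing})" >&2
        return 1
    fi
}

# Usage (paste the real hash in place of the placeholder):
#   verify_sha256 prometheus-2.42.0.linux-amd64.tar.gz "<hash from sha256sums.txt>"
```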

1.3 Extract the archive

tar -xvf prometheus*.tar.gz

1.4 Change into the extracted directory

cd prometheus-2.42.0.linux-amd64

1.5 Create the required directories

mkdir /etc/prometheus

mkdir /var/lib/prometheus

1.6 Move the required files into place

mv prometheus.yml /etc/prometheus/prometheus.yml

mv consoles/ console_libraries/ /etc/prometheus/

mv prometheus promtool /usr/local/bin/
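
After moving the binaries, a quick check that both are reachable on `PATH` can save a confusing systemd failure later. This sketch uses only the shell builtin `command -v`; `require_cmds` is a hypothetical helper:

```shell
# Hypothetical helper: succeed only if every named command is found on PATH.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   require_cmds prometheus promtool && prometheus --version
```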

1.7 Create a systemd service file

nano /etc/systemd/system/prometheus.service

Add the following lines:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=
SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target
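
Flags copied from a web page sometimes arrive with a typographic en dash instead of `--`, and Prometheus will not recognise them as flags. This small sketch detects that in a saved unit file; `has_en_dash` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if FILE contains an en dash (U+2013).
has_en_dash() {
    grep -q '–' "$1"
}

# Usage:
#   if has_en_dash /etc/systemd/system/prometheus.service; then
#       echo "replace the en dashes with -- before starting the service" >&2
#   fi
```

`systemd-analyze verify /etc/systemd/system/prometheus.service` is another way to catch syntax problems in the unit file itself.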

1.8 Save and close the file, then set proper ownership and permissions on the Prometheus directories

chown -R prometheus:prometheus /etc/prometheus/

chmod -R 775 /etc/prometheus/

chown -R prometheus:prometheus /var/lib/prometheus/
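
If the service later fails with a permission error, the ownership of these two directories is the first thing to check. This is a minimal sketch, assuming GNU `stat` (standard on Linux); `owned_by` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if PATH is owned by the given user:group.
# Relies on GNU stat's -c option.
owned_by() {
    local path="$1" want="$2" have
    have="$(stat -c '%U:%G' "$path")" || return 1
    [ "$have" = "$want" ]
}

# Usage:
#   owned_by /etc/prometheus prometheus:prometheus && echo "config dir ok"
#   owned_by /var/lib/prometheus prometheus:prometheus && echo "data dir ok"
```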

Grafana Cloud:

2. Create a free Grafana Cloud account and API key at grafana.com

2.1 Head over to your Grafana Cloud Portal and select Send Metrics in the Prometheus section. If you scroll up, you should see the API Key section.

Click on Generate now and create an API Key with the Role MetricsPublisher.

Copy the Prometheus config and save it locally. The url and username are unique to each user.

In both snippets, set the password to your API key.

2.2 Edit the Prometheus config: change the url, username, and password

nano /etc/prometheus/prometheus.yml

Replace these five values with your own (origin_prometheus, url, username, password, and the job_name with its exporter targets):

# Sample config for Prometheus.

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'example'
    origin_prometheus: AnyName

remote_write:
  - url: <YOUR_REMOTE_WRITE_URL> # paste the url from your Grafana Cloud Prometheus config
    basic_auth:
      username: 77777
      password: AOHSDJASHDKASDUhkasjdhauKSADHausdhaskj

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: exporter
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: AnyName
    static_configs:
      - targets: ['localhost:9101']
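
A single indentation mistake in `prometheus.yml` stops Prometheus from starting, so it can help to validate the file first. `promtool`, which was moved to `/usr/local/bin` in step 1.6, has a `check config` subcommand for this. Here is a small wrapper as a sketch; `check_prometheus_config` is a hypothetical name:

```shell
# Hypothetical wrapper around `promtool check config`.
# Defaults to the config path used in this guide.
check_prometheus_config() {
    local cfg="${1:-/etc/prometheus/prometheus.yml}"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    promtool check config "$cfg"
}

# Usage:
#   check_prometheus_config && echo "config looks valid"
```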

2.3 Start Prometheus:

systemctl daemon-reload

systemctl start prometheus

systemctl enable prometheus
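
Prometheus can take a few seconds to come up after `systemctl start`. This is a sketch of a generic retry helper that can poll the built-in `/-/healthy` endpoint on the listen address from the service file. `retry` is a hypothetical helper, and using `curl` is an assumption about your system:

```shell
# Hypothetical helper: run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed tries.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage: try the health endpoint 10 times, 2 seconds apart.
#   retry 10 2 curl -fsS http://localhost:9090/-/healthy
```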

3. Next, go to the server where your node is installed and install Node Exporter:

3.1 Download node_exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz

3.2 Extract the archive

tar -xvzf node_exporter-1.5.0.linux-amd64.tar.gz

3.3 Move the extracted directory to /etc/prometheus/

mv node_exporter-1.5.0.linux-amd64 /etc/prometheus/node_exporter

3.4 Set proper ownership

chown -R prometheus:prometheus /etc/prometheus/node_exporter

3.5 Create a systemd service file

nano /etc/systemd/system/node_exporter.service

Add the following lines:

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/etc/prometheus/node_exporter/node_exporter

[Install]
WantedBy=default.target

3.6 Start Node Exporter

systemctl daemon-reload

systemctl start node_exporter

systemctl enable node_exporter
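
Node Exporter serves its metrics as plain text on port 9100 (the `exporter` target in the Prometheus config), so a quick look at that endpoint confirms it is running. This sketch counts the `node_` samples in exposition text read from standard input. `count_node_samples` is a hypothetical helper, and using `curl` to fetch the page is an assumption about your system:

```shell
# Hypothetical helper: count sample lines that start with node_.
# Comment lines (# HELP / # TYPE) start with '#', so they are not counted.
count_node_samples() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the exit status clean.
    grep -c '^node_' || true
}

# Usage:
#   curl -s http://localhost:9100/metrics | count_node_samples
```

A count of zero, or a connection error from `curl`, suggests the service is not running or is listening on a different port.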

4. Dashboard setup

Now go to grafana.net → dashboard → import dashboard and import the dashboard you want. You can also import a Node Exporter dashboard with detailed server info, for example dashboard ID 11074.

In that dashboard you can also add any statistic about your node that Prometheus has collected.

Useful commands:

Check status

systemctl status prometheus

systemctl status node_exporter

Stop and disable Prometheus and Node Exporter

systemctl stop prometheus && systemctl disable prometheus

systemctl stop node_exporter && systemctl disable node_exporter

That’s all you need to monitor your node. Please remember that alerts are an important part as well, since you need to react instantly to what is happening in the logs.
I hope this guide helped you understand which tools you need to stay aware of your validator's health. Enjoy your day!

Thank you for this wonderful and helpful guide.

Long and enjoyable read

Great post, thank you for taking the time to put this together :grinning:

I love this!
It is so informative

Legend :saluting_face:
Thanks for sharing mate

Is Grafana Cloud web-based, or does it run on one’s own system?

You need to install it on the server.

Good info about Prometheus!

Thank you forum

Thank you for the awesome information!

I’ve already done some dev stuff with a module; maybe I’ll monitor a validator next :thinking:

yup thanks for sharing

Nodes are somewhat hard to manage, good thing to have this tool

Thank you for information. :blush:

Thanks for sharing

thank you for sharing this!