Tools to monitor your validator

cryptomolot · February 18, 2024, 1:38pm

Hello friends! As a continuation of my article about setting up Alerts, I think it is worth giving a general overview of the software you will need to keep your validator running at all times.

Special thanks to @p1xel32 for contributing important parts.

Here I will show how monitoring is set up using 3 utilities (Prometheus, Grafana Cloud and Node Exporter):

1st Part - Prometheus
2nd Part - Grafana Cloud
3rd Part - Node Exporter
4th Part - Dashboard setting up
5th Part - Conclusion

Before we start: deploy a new server and connect to it. Most commands below need root privileges, so run them as root or prefix them with sudo.

Update packages:

sudo apt update && sudo apt upgrade -y

Install a text editor:

apt install nano

Prometheus:

1.1 Create a dedicated user and group for Prometheus on your server

groupadd --system prometheus

useradd -s /sbin/nologin --system -g prometheus prometheus

1.2 Download Prometheus (this guide uses v2.42.0; check the releases page if you want a newer version)

wget https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz
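If you want to install a different release, the download link above follows a fixed pattern. Here is a minimal sketch that builds it from a version number. The helper name prometheus_url and the PROM_VERSION variable are my own and not part of Prometheus; the sketch also assumes the release naming scheme stays the same as for v2.42.0.

```shell
# Hypothetical helper: print the GitHub release URL for a Prometheus version.
# $1 = version without the leading "v", $2 = platform (defaults to linux-amd64).
prometheus_url() {
    version="$1"
    arch="${2:-linux-amd64}"
    printf 'https://github.com/prometheus/prometheus/releases/download/v%s/prometheus-%s.%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

PROM_VERSION="2.42.0"   # the version used in this guide
prometheus_url "$PROM_VERSION"

# To download it on the server:
#   wget "$(prometheus_url "$PROM_VERSION")"
```

Remember that the directory name in step 1.4 changes with the version as well.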

1.3 extract

tar -xvf prometheus*.tar.gz

1.4 change the directory to the extracted directory

cd prometheus-2.42.0.linux-amd64

1.5 create some required directories

mkdir /etc/prometheus

mkdir /var/lib/prometheus

1.6 move the required files

mv prometheus.yml /etc/prometheus/prometheus.yml

mv consoles/ console_libraries/ /etc/prometheus/

mv prometheus promtool /usr/local/bin/

1.7 create a systemd service file

nano /etc/systemd/system/prometheus.service

add lines:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target

1.8 Save and close the file, then set proper ownership and permissions on the Prometheus directories

chown -R prometheus:prometheus /etc/prometheus/

chmod -R 775 /etc/prometheus/

chown -R prometheus:prometheus /var/lib/prometheus/

Grafana Cloud:

2 Create an account and API keys on the free Grafana Cloud service

Grafana Cloud: grafana.com

2.1 Head over to your Grafana Cloud Portal and select Send Metrics on Prometheus. If you scroll up on that page, you should see the API Key section.

Click on Generate now and create an API Key with the Role MetricsPublisher.

Copy the Prometheus config and save it locally. The url and username are unique to each user.

In both snippets, fill in the password field with your API key.


2.2 Edit the Prometheus config: change the url, username and password


nano /etc/prometheus/prometheus.yml

Replace these 5 values with your own (origin_prometheus, url, username, password, and the job_name and targets of your exporter jobs):

# Sample config for Prometheus.

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Labels attached when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'example'
    origin_prometheus: AnyName

remote_write:
  - url: https://prometheus-prod-12-prod-us-central-4.grafana.net/api/prom/push
    basic_auth:
      username: 77777
      password: AOHSDJASHDKASDUhkasjdhauKSADHausdhaskj

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label job=<job_name> to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: exporter

    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: AnyName
    static_configs:
      - targets: ['localhost:9101']
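YAML is sensitive to indentation, so it can be worth validating the edited file before starting the service. Below is a small sketch using promtool, which was moved to /usr/local/bin in step 1.6. The wrapper function check_prometheus_config is my own name, not part of Prometheus; it only adds a check that promtool is actually on the PATH.

```shell
# Hypothetical wrapper around "promtool check config".
# $1 = path to the config file (defaults to the path used in this guide).
check_prometheus_config() {
    config="${1:-/etc/prometheus/prometheus.yml}"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 127
    fi
    # Exits non-zero if the file has syntax errors or unknown fields.
    promtool check config "$config"
}

# Run on the server after editing the config:
#   check_prometheus_config
```

If promtool reports an error, fix the line it names before moving on to the next step.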

2.3 Run prometheus:

systemctl daemon-reload

systemctl start prometheus

systemctl enable prometheus
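Prometheus can take a few seconds to become ready after it starts. The sketch below is one way to wait for it: a generic retry helper (the name retry is mine) that can be combined with curl and the /-/ready endpoint on the 9090 port configured above. It assumes curl is installed on the server.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
# $1 = max attempts, $2 = seconds to sleep between attempts, rest = command.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt is coming.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Wait up to about 20 seconds for Prometheus to report ready:
#   retry 10 2 curl -fsS http://localhost:9090/-/ready
```

If the command still fails after all attempts, systemctl status prometheus and journalctl -u prometheus usually show why.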

Next, go to the server where your node is installed and install Node Exporter:

3.1 Install and Configure node_exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
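Since this binary will run on your validator server, you may want to verify the download first. Here is a minimal sketch, assuming sha256sum is available (it is part of coreutils on most Linux systems). The function name verify_sha256 is my own; the expected hash is something you copy by hand from the sha256sums.txt file published on the release page.

```shell
# Hypothetical helper: compare a file's SHA-256 hash with an expected value.
# $1 = file path, $2 = expected hash (hex). Returns 0 on match, 1 otherwise.
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}

# Usage on the server (replace <hash> with the value from sha256sums.txt):
#   verify_sha256 node_exporter-1.5.0.linux-amd64.tar.gz "<hash>" && echo "checksum OK"
```

The same check works for the Prometheus archive downloaded in step 1.2.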

3.2 extract

tar -xvzf node_exporter-1.5.0.linux-amd64.tar.gz

3.3 move the extracted directory to /etc/prometheus/

mv node_exporter-1.5.0.linux-amd64 /etc/prometheus/node_exporter

3.4 set proper ownership

chown -R prometheus:prometheus /etc/prometheus/node_exporter

3.5 create a systemd service file

nano /etc/systemd/system/node_exporter.service

Add the following lines:

[Unit]

Description=Node Exporter

Wants=network-online.target

After=network-online.target

[Service]

User=prometheus

ExecStart=/etc/prometheus/node_exporter/node_exporter

[Install]

WantedBy=default.target

3.6 Run Node exporter

systemctl daemon-reload

systemctl start node_exporter

systemctl enable node_exporter
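To confirm that Node Exporter is really serving data, you can read a single metric from its /metrics page on port 9100. The sketch below is a small parser for the Prometheus text format; the function name metric_value is my own, and it only handles metrics without labels, such as node_load1. It assumes curl and awk are installed.

```shell
# Hypothetical helper: print the value of a label-less metric.
# Reads Prometheus text-format metrics on stdin; $1 = metric name.
# Returns non-zero if the metric is not present.
metric_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    '
}

# Usage on the server:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

Lines starting with # (HELP and TYPE comments) are skipped automatically because their first field never equals the metric name.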

Dashboard setting up


Now go to grafana.net → Dashboards → Import dashboard and import the dashboard you want. You can also import a Node Exporter dashboard with detailed server information, for example dashboard ID 11074.

In that dashboard you can also add panels for any statistic about your node that Prometheus has collected.

Useful commands:

Check status

systemctl status prometheus

systemctl status node_exporter

Stop and disable Prometheus and Node Exporter

systemctl stop prometheus && systemctl disable prometheus

systemctl stop node_exporter && systemctl disable node_exporter

That's all you need to monitor your node. Please remember that alerts are an important part as well, since you need to react immediately to what shows up in the logs.
I hope this guide helped you understand which tools you need to keep track of your validator's health. Enjoy your day!

Ssimo7 · February 18, 2024, 8:03pm

Thank you for this wonderful and helpful guide.

asadaly1901 · February 18, 2024, 10:11pm

Long and enjoyable read

cuttlekid · February 19, 2024, 8:02pm

Great post, thank you for taking the time to put it together

FAK142 · February 19, 2024, 8:13pm

I love this!
It is so informative

Samrichy · February 20, 2024, 3:00am

Legend
Thanks for sharing mate

symbolizm39 · February 20, 2024, 4:12am

Grafana cloud is web based? or runs on one’s system?

cryptomolot · March 8, 2024, 5:34pm

You need to install it on the server.

yfhmeta · March 23, 2024, 8:26am

Good info about Prometheus!

topnft99144 · March 23, 2024, 12:22pm

Thank you forum

fantom · March 25, 2024, 12:51pm

Thank you for the awesome information!

michelleobomber · March 30, 2024, 7:09pm

I've already done some dev stuff with a module, maybe I'll monitor a validator next

captainhairy31 · April 3, 2024, 2:20am

yup thanks for sharing

schulx · April 3, 2024, 8:39pm

Nodes are somewhat hard to manage, good thing to have this tool

skazka20001 · April 5, 2024, 12:23pm

Thank you for information.

TonySquire · April 6, 2024, 12:58am

Thanks for sharing

bembeles · April 7, 2024, 6:33am

thank you for sharing this!
