Troubleshooting
This page contains a handful of tips for solving problems with your validator node
If your node was underperforming, chances are you were suspended.
First of all, to check whether you are suspended, simply go to the Suspensions tab (note that the link is for testnet, for mainnet please use the analogous tab on azero.dev) and use the search bar to see if your validator is there:
If you confirm you have been suspended, there are two things you should do:
Locate and fix the source of your issues. Please feel free to contact the Aleph Zero team on Discord if you have trouble locating the issue.
Having session keys in your keystore is crucial to being a successful validator. Without them your node won't be producing blocks and won't be taking part in the consensus. Read on to find out how to check if you actually have your session keys and how to restore them in case something went wrong.
If you are using the Aleph Node Runner, this will be checked automatically on each subsequent run of the node if you provide your stash account address to the --stash_account flag. The scripts will also help you generate new keys and set them for your account. That said, you might still find it useful to follow the procedure below, as it can give you a deeper understanding of the matter.
First, you need to go to https://test.azero.dev/#/chainstate and run the session::nextKeys command to find your keys (for Aura and for AlephBFT, which are used for block production and finalization, respectively).
This particular query is submitted using the green 'plus' sign to the right of the text fields.
The output should look like this:
Now, you need to construct your public session key by gluing the two together: the aura key needs to be pasted in its entirety (including 0x) and the aleph key without the leading 0x. Copy the result, as we'll need it in the next step.
Example
For the following output:
{
aura: 0x11111111111111111111111111111111
aleph: 0x22222222222222222222222222222222
}
The resulting public session key would be:
0x1111111111111111111111111111111122222222222222222222222222222222
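The gluing step above can also be sketched in a few lines of Python. This is only an illustration and not part of the official tooling; the function name build_session_key is hypothetical, and the keys are the placeholder values from the example.

```python
def build_session_key(aura: str, aleph: str) -> str:
    """Glue the aura key (with its 0x prefix) to the aleph key (without it)."""
    for name, key in (("aura", aura), ("aleph", aleph)):
        if not key.startswith("0x"):
            raise ValueError(f"{name} key must start with 0x")
    # Keep the aura key whole and drop only the leading 0x of the aleph key.
    return aura + aleph[2:]


if __name__ == "__main__":
    # Placeholder keys copied from the example output above.
    print(build_session_key(
        "0x11111111111111111111111111111111",
        "0x22222222222222222222222222222222",
    ))
```

Validating the 0x prefix up front catches the most likely copy-and-paste mistake, which is pasting a key that has already been stripped.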
Now we need to issue an RPC call to check if there exists a private session key matching this public key. Important: this needs to be run against your local node. If you are using the wallet, you need to make sure you switch to your local node as described in the section about generating your keys.
You need to go to Developer -> RPC Calls and choose the author::hasSessionKeys call. In the sessionKeys argument, you need to paste the previously constructed public key, like below:
If the output you get is true, you are all set. If you get false, you'll need to follow the next section on generating your session keys again.
You will need to run this command on your machine:
Remember to replace <YOUR NODE'S SESSION KEY> with the value you copied in the previous step 🙂
If you see true in this output, everything is okay. If you see false, you'll need to follow the next section.
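For readers who prefer scripting the check, here is a minimal Python sketch of the same hasSessionKeys query sent as a JSON-RPC request. It assumes the RPC method is exposed under the name author_hasSessionKeys (the usual Substrate JSON-RPC spelling) and that your local node serves HTTP RPC at http://localhost:9933; both the URL and the function names are assumptions, so adjust them to your setup.

```python
import json
import urllib.request

# Assumption: HTTP RPC endpoint of your local node; change it to match your setup.
LOCAL_RPC_URL = "http://localhost:9933"


def build_has_session_keys_request(session_key: str) -> dict:
    """Build the JSON-RPC payload asking whether the keystore holds these keys."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "author_hasSessionKeys",
        "params": [session_key],
    }


def has_session_keys(session_key: str, url: str = LOCAL_RPC_URL) -> bool:
    """Send the request to the node at `url` and return its true/false answer."""
    payload = json.dumps(build_has_session_keys_request(session_key)).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if "error" in reply:
        # The node rejected the call, e.g. because the method is not allowed.
        raise RuntimeError(reply["error"])
    return bool(reply["result"])
```

Calling has_session_keys with the public session key constructed earlier should return the same true or false value that the wallet shows.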
As this is described in the "Making your node validate" section, we will only provide a brief reminder:
You need to go to the Developer -> RPC Calls section, call (on your local node!) the author::rotateKeys method and copy the result.
You need to go to Network -> Staking -> Accounts, click on the "Set session keys" button and, in the modal, paste the key you got from rotateKeys.
And, just like that, you have your session keys again!
In order to successfully take part in the consensus, your node not only needs to be able to make outgoing connections to its peers but also accept incoming ones. As this process is a little more cumbersome, we created a handy Python script to do it for you. You will need Python >= 3.2 and the requests package (pip3 install requests).
You can copy it, save it to a file like check_peers.py and run it:
If in the output you see in_peers with a value greater than zero, you're good to go.
Otherwise, you can try the following:
Make sure your ports 30333 and 30343 are open: this means making sure that they are not blocked by any firewall and are not hidden behind NAT (which probably comes down to getting a public, fixed IP).
If you are sure you've done the above and the in_peers value is still 0, don't panic! It may be the inherent non-determinism of peer-to-peer communication. Waiting some time (say, 24h) for the network state to stabilize might help.
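To test whether the two ports accept incoming TCP connections, a small helper like the one below may be useful. It is a sketch separate from the check_peers.py script, and the function name port_is_reachable is hypothetical. It only tells you something meaningful when run from a machine outside your own network, against your node's public address.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, calling port_is_reachable with your public IP or DNS name and the ports 30333 and 30343 in turn should return True for both. A False result points at a firewall, NAT or port-mapping problem rather than at the node itself.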
Another reason your node is not taking part in consensus might be that it is not accepting connections from other Validator nodes on port 30343. This might be caused by providing a wrong IP/DNS argument to the Node.
When your node does not take part in the current session of consensus, this status log will report the following message:
This is fine and you should not worry about that.
If your Node is going to be in a session, this status log will change and start reporting your connections with other Validators. The start of a session can be found in the logs by searching for the initialization message. An example of such a log is:
where NodeIndex is your node's index in the committee, so it might be different from the one in the example. This log indicates the initialization of AlephBFT; after that, your node will start taking part in consensus.
Once you have found a time at which your node is taking part in consensus, a typical healthy Clique Network status log for a session with four Validators looks like this:
The total number of incoming and outgoing connections should correspond to the number of Validators in the current session minus one. For every peer Validator, the connection type is randomly set either to incoming or outgoing with a 50/50 chance. Therefore, the exact number of expected incoming connections is not known in advance and may vary between sessions.
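The rule of thumb above is simple arithmetic and can be written down as a quick sanity check. The function names are hypothetical, and the counts have to be read off the status log by hand.

```python
def expected_connections(validators_in_session: int) -> int:
    """One connection per other Validator in the session."""
    return validators_in_session - 1


def connection_count_matches(
    incoming: int, outgoing: int, validators_in_session: int
) -> bool:
    """Check that incoming plus outgoing covers every other Validator."""
    return incoming + outgoing == expected_connections(validators_in_session)
```

For a session with four Validators, any split that adds up to three (for example one incoming and two outgoing) passes the check, which reflects the fact that only the total is predictable.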
In case your node is healthy but there are some minor problems with the network, your node can start reporting other logs like:
This reports other types of connections, or missing connections. As long as only single connections are missing, you probably don't have to worry about that, as this may be the result of a peer's misconfiguration.
A common problem that might appear if you do not have port 30343 open is that no one can connect to you. Then the status log will look like the following:
This can mean two things:
your public validator address is not available for other Validators to connect to,
mapping of your ports is incorrect and your node does not listen on port 30343.
Please note that even though your node might have established all outgoing connections, this still means that it is misconfigured!
On the other hand, having no connections at all probably indicates a general problem with your Internet connection.
In order for your node to function correctly, this needs to be fixed. To do that, make sure that your validator address and port (--ip/--dns argument or VALIDATOR_PUBLIC_ADDRESS environment variable) are not blocked by any firewall and are not hidden behind NAT (which probably comes down to getting a public, fixed IP). You can also verify that your node is set up to listen on the correct port (VALIDATOR_PORT environment variable, by default set to 30343).
After your suspension period is over, click the "Validate" button again, as described in the section.
This can be diagnosed by searching the Node logs. You can access your logs as described in the section. A good idea is to investigate the logs from the last 24 hours of your node running. When looking at the logs, you are interested in the status logs of the Clique Network, which is used to connect the Validators directly with each other. Those are the logs that start with: Clique Network status.
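If your logs are saved to a file, a short Python filter can pull out just the status lines. This is a sketch: the function name clique_status_lines is hypothetical, and it matches the marker anywhere in the line on the assumption that each log line may carry a timestamp or other prefix before the message.

```python
from typing import Iterable, Iterator

# Marker quoted in the text above.
STATUS_MARKER = "Clique Network status"


def clique_status_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that contain the Clique Network status marker."""
    for line in lines:
        if STATUS_MARKER in line:
            yield line.rstrip("\n")
```

Because the function accepts any iterable of strings, you can pass it an open file object directly and print what it yields.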