Troubleshooting

This page contains a handful of tips for solving problems with your validator node.

Suspension

If your node was underperforming, chances are you were suspended.

First of all, to check whether you are suspended, go to the Suspensions tab (note that the link is for testnet; for mainnet, please use the analogous tab on azero.dev) and use the search bar to see if your validator is there.

If you confirm you have been suspended, there are two things you should do:

  1. Locate and fix the source of your issues. Please feel free to contact the Aleph Zero team on Discord if you have trouble locating the issue.

Session keys

Having session keys in your keystore is crucial to being a successful validator. Without them your node won't be producing blocks and won't be taking part in the consensus. Read on to find out how to check if you actually have your session keys and how to restore them in case something went wrong.

If you are using the Aleph Node Runner, this will be checked automatically on each subsequent run of the node if you provide your stash account address to the --stash_account flag. The scripts will also help you generate new keys and set them for your account. That said, you might still consider following the procedure below useful, as it can give you a deeper understanding of the matter.

Is my node running with session keys?

First, you need to go to https://test.azero.dev/#/chainstate and run the session::nextKeys command to find your keys (for Aura and for AlephBFT, which are used for block production and finalization, respectively).

This particular query is submitted using the green 'plus' sign to the right of the text fields.

The output should look like this:

{
  aura: 0x32...
  aleph: 0x51...
}

Now, you need to construct your public session key by gluing the two together: the aura key needs to be pasted in its entirety (including 0x) and the aleph key without the leading 0x. Copy the result, as we'll need it in the next step.

Example

For the following output:

{
  aura: 0x11111111111111111111111111111111
  aleph: 0x22222222222222222222222222222222
}

The resulting public session key would be:

0x1111111111111111111111111111111122222222222222222222222222222222
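The gluing step can also be sketched in a few lines of Python. This is only an illustration: build_session_key is a hypothetical helper name, not part of any Aleph Zero tooling, and it does nothing beyond the concatenation described above plus a basic sanity check of the inputs.

```python
import string


def build_session_key(aura: str, aleph: str) -> str:
    """Join the aura key (with its 0x prefix) and the aleph key (without it)."""
    for name, key in (("aura", aura), ("aleph", aleph)):
        body = key[2:]
        # Both inputs are expected in the form printed by session::nextKeys.
        if not key.startswith("0x") or not body:
            raise ValueError(f"{name} key must be a 0x-prefixed hex string")
        if any(char not in string.hexdigits for char in body):
            raise ValueError(f"{name} key contains non-hex characters")
    return aura + aleph[2:]
```

Calling it with the two keys from the example returns the same combined key as shown there.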

Now we need to issue an RPC call to check if there exists a private session key matching this public key. Important: this needs to be run against your local node. If you are using the wallet, you need to make sure you switch to your local node as described in the section about generating your keys.

Using the wallet

You need to go to Developer -> RPC Calls and choose the author::hasSessionKeys call. In the sessionKeys argument, paste the previously constructed public key.

If the output you get is true, you are all set. If you get false, you'll need to follow the next section on generating your session keys again.

Using the command line

You will need to run this command on your machine:

curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "author_hasSessionKeys", "params":["<YOUR_NODE_SESSION_KEY>"]}' localhost:9944

Remember to replace <YOUR_NODE_SESSION_KEY> with the value you copied in the previous step 🙂

If you see true in this output, everything is okay. If you see false, you'll need to follow the next section.
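If you prefer Python to curl, the same check can be sketched using only the standard library. The function names below are hypothetical; the endpoint and the author_hasSessionKeys method are the ones used in the curl command above. As with curl, this has to be run against your local node.

```python
import json
import urllib.request

LOCAL_RPC = "http://localhost:9944"  # same local endpoint as in the curl command


def has_session_keys_payload(session_key: str) -> dict:
    """Build the JSON-RPC request body for author_hasSessionKeys."""
    return {
        "id": 1,
        "jsonrpc": "2.0",
        "method": "author_hasSessionKeys",
        "params": [session_key],
    }


def has_session_keys(session_key: str, url: str = LOCAL_RPC) -> bool:
    """Ask the node at `url` whether its keystore holds the matching private keys."""
    request = urllib.request.Request(
        url,
        data=json.dumps(has_session_keys_payload(session_key)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if "error" in reply:
        # Surface RPC-level failures instead of treating them as "false".
        raise RuntimeError(f"RPC error: {reply['error']}")
    return bool(reply["result"])
```

Passing the public session key you constructed earlier to has_session_keys should return True when the keys are present and False when they are not.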

Generating your session keys

As this is described in the "Making your node validate" section, we will only provide a brief reminder:

  1. You need to go to the Developer -> RPC Calls section, call (on your local node!) the author::rotateKeys method and copy the result.

  2. You need to go to Network -> Staking -> Accounts, click on the "Set session keys" button and, in the modal, paste the key you got from rotateKeys.

And, just like that, you have your session keys again!

Accepting inbound connections

In order to successfully take part in the consensus, your node not only needs to be able to make outgoing connections to its peers but also accept incoming ones. As checking this is a little more cumbersome, we created a handy Python script to do it for you. You will need Python >= 3.6 (the script uses f-strings) and the requests package (pip3 install requests).

import requests

LOCAL_RPC = "http://localhost:9944"


def analyze_peers(status):
    """Count connected peers by the direction in which the connection was opened."""
    out_peers = 0  # connections our node dialed
    in_peers = 0   # connections other nodes opened to us
    unknown = 0
    connected = status['result']['connectedPeers']

    for peer in connected.values():
        try:
            endpoint = peer['endpoint']
            if 'dialing' in endpoint:
                out_peers += 1
            elif 'listening' in endpoint:
                in_peers += 1
            else:
                unknown += 1
        except (KeyError, TypeError):
            # Peer entry without a usable 'endpoint' field.
            unknown += 1
    print(f"out_peers: {out_peers}, in_peers: {in_peers}, unknown: {unknown}")


def network_status(addr):
    r = requests.post(addr, json={"jsonrpc":"2.0","id":"1","method":"system_unstable_networkState","params":[]})
    res_json = r.json()
    return res_json


if __name__ == '__main__':
    analyze_peers(network_status(LOCAL_RPC))

You can copy it, save it to a file like check_peers.py and run it:

python3 check_peers.py

If in the output you see in_peers with a value greater than zero, you're good to go.

Otherwise, you can try the following:

  1. Make sure your ports 30333 and 30343 are open: this means making sure that they're not blocked by any firewall and also making sure that they're not hidden behind NAT (which probably comes down to getting a public, fixed IP).

  2. If you are sure you've done the above and the in_peers value is still 0, don't panic! It may be the inherent non-determinism of peer-to-peer communication. Waiting some time (say, 24h) for the network state to stabilize might help.
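A quick way to test the first point is to try opening a TCP connection to each port from a machine outside your network. The sketch below assumes both ports accept plain TCP connections. The helper name port_accepts_connections is hypothetical, and a successful connection only shows that something is listening on the port, not that the node behind it is healthy.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # The connection is closed again as soon as it has been opened.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, from another machine you could call port_accepts_connections with your node's public address and each of the ports 30333 and 30343. Running it on the node itself against localhost says nothing about firewalls or NAT.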

Accepting consensus connections

Another reason your node is not taking part in consensus might be that it is not accepting connections from other Validator nodes on port 30343. This might be caused by providing a wrong IP/DNS argument to the node.

Healthy node, not participating in the current session

When your node does not take part in the current session of consensus, the status log will report the following message:

Clique Network status: not maintaining any connections;

This is fine and you should not worry about that.

Healthy node, participating in the current session

If your node is going to take part in a session, the status log will change and start reporting your connections with other Validators. You can find the start of a session in the logs by searching for the initialization message. An example of such a log is:

AlephBFT-member: NodeIndex(0) Starting a new session.

where NodeIndex is your node's index in the committee, so it might differ from the one in the example. This log indicates the initialization of AlephBFT; after it appears, your node will start taking part in consensus.

Once you have found a time at which your node is taking part in consensus, a typical healthy Clique Network status log for a session with four Validators looks like this:

Clique Network status: expecting 2 incoming connections; have - 2 [5FrU…6yKJGgEt, 5GYJ…NgNcvREH]; attempting 1 outgoing connections; have - 1 [5DMy…7Y4teh9M];

The total number of incoming and outgoing connections should correspond to the number of Validators in the current session minus one. For every peer Validator, the connection type is randomly set either to incoming or outgoing with a 50/50 chance. Therefore, the exact number of expected incoming connections is not known in advance and may vary between sessions.

In case your node is healthy but there are some minor problems with the network, your node can start reporting other logs like:

Clique Network status: expecting 2 incoming connections; have - 1 [5FrU…6yKJGgEt]; missing - 1 [5GYJ…NgNcvREH]; attempting 1 outgoing connections; have - 1 [5DMy…7Y4teh9M];

These logs report other types of connections, or missing connections. As long as only single connections are missing, you probably don't have to worry, as this may be the result of a peer's misconfiguration.

Misconfigured node

A common problem that might appear if you do not have port 30343 open is that no one can connect to you. Then the status log will look like the following:

Clique Network status: expecting 2 incoming connections; WARNING! No incoming peers even though we expected them, maybe connecting to us is impossible; missing - 2 [5FrU…6yKJGgEt, 5GYJ…NgNcvREH]; attempting 1 outgoing connections; have - 1 [5DMy…7Y4teh9M];     

This can mean two things:

  • your public validator address is not available for other Validators to connect to,

  • mapping of your ports is incorrect and your node does not listen on port 30343.

Please note that even though your node might have established all outgoing connections, this still means that it is misconfigured!

On the other hand, having no connections at all probably indicates a general problem with your Internet connection.

For your node to function correctly, this needs to be fixed. To do that, make sure that your validator address and port (--ip/--dns argument or VALIDATOR_PUBLIC_ADDRESS environment variable) are not blocked by any firewall and are not hidden behind NAT (which probably comes down to getting a public, fixed IP). You can also verify that your node is set up to listen on the correct port (VALIDATOR_PORT environment variable, set to 30343 by default).
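If you want to scan your logs for these cases automatically, the status lines shown in this section can be summarized with a small parser. This is a rough sketch: the line format is inferred only from the examples on this page and may differ between node versions, and all function names are hypothetical.

```python
import re

NOT_IN_SESSION = "not maintaining any connections"


def _first_int(pattern, text):
    """Return the first captured integer for `pattern` in `text`, or 0 if absent."""
    match = re.search(pattern, text)
    return int(match.group(1)) if match else 0


def parse_clique_status(line):
    """Extract connection counts from one status line; None when not in a session."""
    if NOT_IN_SESSION in line:
        return None
    # Everything before "attempting" describes incoming connections,
    # everything after it describes outgoing ones.
    incoming, _, outgoing = line.partition("attempting")
    return {
        "incoming_expected": _first_int(r"expecting (\d+) incoming", incoming),
        "incoming_have": _first_int(r"have - (\d+)", incoming),
        "outgoing_expected": _first_int(r"(\d+) outgoing", outgoing),
        "outgoing_have": _first_int(r"have - (\d+)", outgoing),
    }


def classify(line):
    """Map a status line to one of the situations described in this section."""
    status = parse_clique_status(line)
    if status is None:
        return "not in session"
    if status["incoming_expected"] > 0 and status["incoming_have"] == 0:
        return "no incoming connections"
    missing = (status["incoming_expected"] - status["incoming_have"]
               + status["outgoing_expected"] - status["outgoing_have"])
    return "healthy" if missing == 0 else "some connections missing"


def matches_committee_size(status, validators):
    """Check that expected connections equal the number of Validators minus one."""
    expected = status["incoming_expected"] + status["outgoing_expected"]
    return expected == validators - 1
```

A result of "no incoming connections" corresponds to the misconfigured case above, while "some connections missing" is usually harmless as long as only single connections are affected.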
