
Troubleshoot

This resource is designed to help you efficiently diagnose and resolve common issues encountered while interacting with the Shibarium PoS network as a validator.

Bor

Bor is unable to connect to peers

Bor stops importing new blocks, with logs displaying messages similar to the following:

Aug 19 13:33:35 shibarium-shibarium-validator-backup-4 bor[124475]: INFO [08-19|13:33:35.123] Looking for peers                        peercount=0 tried=0 static=7
Aug 19 13:33:36 shibarium-shibarium-validator-backup-4 bor[124475]: INFO [08-19|13:33:36.916] Whitelisting milestone deferred err="chain out of sync"
Aug 19 13:33:48 shibarium-shibarium-validator-backup-4 bor[124475]: INFO [08-19|13:33:48.916] Whitelisting milestone deferred err="chain out of sync"
Solution
  • Increase the maximum peer count (maxpeers) to 200.
  • Add the bootnodes to both the static nodes and trusted nodes lists.
  • If this doesn't resolve the issue, try adding the peers manually using the IPC console.
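To confirm that Bor really has no peers before changing its configuration, the log lines shown above can be checked with a small script. This is a minimal sketch: the function names are hypothetical, and it assumes the "Looking for peers ... peercount=N" log format shown in the sample above.

```shell
# Print the peercount value from each "Looking for peers" log line on stdin.
peer_counts() {
    sed -n 's/.*Looking for peers.*peercount=\([0-9][0-9]*\).*/\1/p'
}

# Succeed only if the most recent "Looking for peers" line reports zero peers.
has_no_peers() {
    last_count=$(peer_counts | tail -n 1)
    [ "$last_count" = "0" ]
}

# Example usage on a node (not run here):
#   journalctl -u bor -n 200 --no-pager | has_no_peers && echo "Bor has no peers"
#
# Adding a peer manually from the IPC console (assumption: the standard
# admin.addPeer console call; replace the placeholders with a real enode URL):
#   bor attach <path-to-bor.ipc>
#   > admin.addPeer("enode://<node-id>@<ip>:<port>")
```

The check only looks at the latest matching line, so a node that had zero peers earlier but has since connected is not flagged.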

Error: Bad block/Invalid Merkle

A bad block or invalid Merkle root error occurs when the Heimdall and Bor layers are out of sync. Heimdall, the consensus layer of the Shibarium PoS chain, directs Bor to create blocks. A bad block error occurs when Bor moves ahead and creates a block that Heimdall has not directed. The resulting block hash is invalid, which in turn produces an invalid Merkle root.

Solution 1

Restart the Bor service using the following command:

sudo service bor restart

A restart of the Bor service usually resolves the problem, because it causes Bor to reconnect to Heimdall, resume syncing, and create blocks correctly.

If restarting the Bor service does not fix the problem, try the next option.

Solution 2

Make the following checks:

  • Check if your Heimdall and REST servers are running. The Heimdall service might have stopped, causing the bad block issue on Bor.

  • Check the Heimdall logs first using the following command:

    journalctl -u heimdalld -f
  • Confirm that the logs show no errors and that the service is working correctly.

  • Restart any services that are not running. This should cause Bor to resolve the problem automatically.

If restarting both the Bor and Heimdall services doesn't solve the problem, it could be that Bor is stuck on a particular block.

Solution 3

Find the bad block in the Bor logs.

  • Check the Bor logs with this command:

    journalctl -u bor -f

    The bad block is typically displayed in the logs.

  • Note the bad block number.

  • Convert the block number to a hexadecimal number.

info

Use a decimal-to-hexadecimal converter to convert the block number.

  • Roll back the chain by a few hundred blocks. To do this, set Bor to the right block height using the debug.setHead() function:

    bor attach ./.bor/data/bor.ipc
    > debug.setHead("0xE92570")

The debug.setHead() function sets the chain tip to the given block height, so Bor resyncs from that earlier block.

The command should return null upon successful execution. Once it does, resume monitoring Bor to verify that the chain progresses beyond the previously problematic block.
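The decimal-to-hexadecimal conversion and the rollback target can also be computed in the shell. This is a minimal sketch: the function names are hypothetical, and the default rollback depth of 300 blocks is only an illustration of "a few hundred blocks".

```shell
# Convert a decimal block number to the 0x-prefixed hex form used by debug.setHead().
to_hex() {
    printf '0x%X\n' "$1"
}

# Print the hex block number a given depth (default 300) below the bad block.
rollback_target() {
    bad_block=$1
    depth=${2:-300}
    to_hex $((bad_block - depth))
}

# Example usage:
#   to_hex 15279472            # the decimal form of 0xE92570
#   rollback_target 15279472   # 300 blocks earlier, in hex
```

Pass the printed value, in quotes, to debug.setHead() in the Bor console.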

If none of these solutions work for you, please contact the Shibarium Support team immediately.

Issue: Bor synchronization is slow

If Bor synchronization is slow, it may be caused by one or more of the following factors:

  • The node is running on a fork: at some point, block production forked from a different block, and this has affected subsequent block production.
  • The machine is not running at optimum levels and may have insufficient resources. Check the following:
    • IOPS
      • IOPS stands for input/output operations per second.
      • Read rates are usually higher than write rates.
      • The recommended IOPS value is 6000.
    • Processing power
      • The processor should have 8 or 16 cores.
      • RAM: 32 GB is the minimum; 64 GB is recommended.
      • Block import should exceed 2 blocks per second.
    • Node sync rate should be 15-20 blocks every 8 seconds.
Solution

Since the issue is likely due to insufficient hardware resources, consider upgrading to double the current specs.
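Before upgrading hardware, it can help to measure the actual import rate against the guideline of more than 2 blocks per second. The sketch below works from two block-height samples taken a known number of seconds apart. The function names are hypothetical, and how you obtain the heights (for example, eth.blockNumber in the Bor IPC console) is left to you.

```shell
# Number of blocks imported between two height samples.
blocks_imported() {
    echo $(( $2 - $1 ))
}

# Succeed if more than 2 blocks per second were imported on average.
# Arguments: start height, end height, elapsed seconds.
# Compares blocks against 2 * seconds to avoid integer-division rounding.
import_rate_ok() {
    [ "$(blocks_imported "$1" "$2")" -gt $(( 2 * $3 )) ]
}

# Example usage:
#   import_rate_ok 1000 1025 10 && echo "import rate is above 2 blocks/s"
```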

Validator Bor is stuck on a block for a long time

This implies that the Bor service on your sentry node is also stuck because your validator gets information from your sentry.

Solution
  • Check the Bor logs on your sentry and confirm that it is operating normally.
  • Restart the Bor service on your sentry node and, at the same time, restart the Bor service on your validator.

Retrying again in 5 seconds to fetch data from Heimdall path=bor/span/1

This Bor log means that Bor cannot fetch data from Heimdall. Heimdall is likely out of sync and therefore lacks the data that Bor needs.

Solution

The recommended approach is to clear the historical data from both Heimdall and Bor, then resync using a snapshot.

Verify the following:

  1. Are Heimdall logs normal, or do they show any errors?
  2. Confirm Heimdall is fully synced by running:

    curl localhost:26657/status
  3. Check whether Heimdall is connected to other peers:

    curl localhost:26657/net_info | jq .result.n_peers

If there are no peers, verify that the seeds or persistent peers are correctly configured on Heimdall, and ensure that port 26656 is open.
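On machines where jq is not installed, the peer count can be extracted with sed instead. This is a rough sketch with hypothetical function names. It assumes the /net_info response contains an "n_peers" field whose value may or may not be quoted.

```shell
# Extract n_peers from the JSON returned by /net_info (read from stdin).
n_peers() {
    sed -n 's/.*"n_peers"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if the response on stdin reports at least one peer.
heimdall_has_peers() {
    peer_total=$(n_peers)
    [ "${peer_total:-0}" -gt 0 ]
}

# Example usage on a node (not run here):
#   curl -s localhost:26657/net_info | heimdall_has_peers || echo "Heimdall has no peers"
```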

etherbase missing: etherbase must be explicitly specified

To fix this issue, add the signer address used for mining to the miner.etherbase setting in the config.toml file.
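A quick way to confirm that the setting is present is to search the config file for it. The sketch below assumes the address is written as etherbase = "0x..." with 40 hexadecimal digits, and has_etherbase is a hypothetical helper name.

```shell
# Succeed if the given config file sets etherbase to a 0x-prefixed, 40-hex-digit address.
has_etherbase() {
    grep -Eq '^[[:space:]]*etherbase[[:space:]]*=[[:space:]]*"0x[0-9a-fA-F]{40}"' "$1"
}

# Example usage on a node (not run here):
#   has_etherbase /var/lib/bor/config.toml || echo "etherbase is not set"
```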

Error: Failed to unlock account (0x…) No key for given address or file

This error occurs because the path to the password.txt file is incorrect. Follow the steps below to resolve this issue.

Solution
  1. Kill the Bor process.
  2. Copy the Bor keystore file to: /var/lib/bor/keystore/
  3. Copy the password.txt file to: /var/lib/bor/password.txt
  4. Ensure that the bor user has permission to access the password.txt file by running: sudo chown -R bor:nogroup /var/lib/bor/
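After copying the files, a short check can confirm that they are where Bor expects them. This is a minimal sketch with a hypothetical function name. It only tests that the keystore directory is non-empty and that password.txt is present, non-empty, and readable by the current user, so run it as the bor user to test that user's access.

```shell
# Check a Bor home directory (default /var/lib/bor) for a keystore file and a
# readable, non-empty password.txt. Prints what is missing; returns 0 if all is well.
check_bor_key_files() {
    bor_home=${1:-/var/lib/bor}
    rc=0
    if [ -z "$(ls -A "$bor_home/keystore" 2>/dev/null)" ]; then
        echo "no keystore file in $bor_home/keystore"
        rc=1
    fi
    if [ ! -s "$bor_home/password.txt" ] || [ ! -r "$bor_home/password.txt" ]; then
        echo "password.txt is missing, empty, or unreadable in $bor_home"
        rc=1
    fi
    return $rc
}

# Example usage on a node (not run here):
#   check_bor_key_files /var/lib/bor && echo "key files look fine"
```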

Steps to prune the node

Follow the steps below to prune your node:

  1. Check your Bor data size before pruning using the following command:

    du -sh /var/lib/bor/data
  2. Stop Bor.

    sudo service bor stop
  3. Start a tmux session so that pruning keeps running on the remote machine even if your SSH connection is reset.

  4. Start pruning.

    sudo bor snapshot prune-state --datadir /var/lib/bor/data

    The --datadir value must point to your Bor data directory, which is /var/lib/bor/data in the default layout.

  5. Once pruning is complete, success logs and details are displayed. Then start Bor again using:

    sudo service bor start
  6. Check your Bor data size after pruning using:

    du -sh /var/lib/bor/data
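To compare the sizes from the first and last steps numerically, the measurements can be taken in kilobytes and turned into a percentage. This is a small sketch with hypothetical helper names. Pass your own Bor data directory as the argument.

```shell
# Size of a directory in kilobytes, as reported by du.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Integer percentage of space reclaimed, given the sizes before and after pruning.
reclaimed_percent() {
    before_kb=$1
    after_kb=$2
    echo $(( (before_kb - after_kb) * 100 / before_kb ))
}

# Example usage on a node (not run here):
#   before=$(dir_size_kb <bor-data-directory>)
#   ... stop Bor, prune, start Bor ...
#   after=$(dir_size_kb <bor-data-directory>)
#   echo "reclaimed $(reclaimed_percent "$before" "$after")%"
```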

Heimdall

Log: Error dialing seed/Looking for peers or stopping peer for error/Dialing failed

This log is expected when you first start Heimdall, as it takes some time to find and connect to peers. If the issue persists, check the following:

  • Verify that your Heimdall node is configured with the latest seeds as listed in the node setup documentation.

If the error persists after updating to the latest seeds or confirming that you are using the correct ones, follow these steps:

  1. Increase max_num_inbound_peers and max_num_outbound_peers in /var/lib/heimdall/config/config.toml:

    max_num_inbound_peers = 300
    max_num_outbound_peers = 100
  2. Restart the heimdalld service so that the new limits take effect:

    sudo service heimdalld restart
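The two peer limits can also be inspected and updated from the shell rather than in an editor. This is a sketch under assumptions: the helper names are hypothetical, and each key is expected on its own line in the form key = number, as in the snippet above. Back up config.toml before changing it.

```shell
# Print the integer assigned to a key in a config file (first match only).
# Arguments: key, file.
peer_limit() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$2" | head -n 1
}

# Replace the value assigned to a key, keeping the file's owner and permissions.
# Arguments: key, new value, file.
set_peer_limit() {
    key=$1
    value=$2
    file=$3
    tmp_file=$(mktemp)
    sed "s/^\([[:space:]]*$key[[:space:]]*=[[:space:]]*\).*/\1$value/" "$file" > "$tmp_file" \
        && cat "$tmp_file" > "$file"
    rm -f "$tmp_file"
}

# Example usage on a node (not run here, needs write access to the file):
#   set_peer_limit max_num_inbound_peers 300 /var/lib/heimdall/config/config.toml
#   set_peer_limit max_num_outbound_peers 100 /var/lib/heimdall/config/config.toml
```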

Issue: Validator Heimdall is unable to connect to peers

This typically means that your sentry Heimdall is running into issues.

Solution
  • Check your sentry Heimdall to ensure that the service is running properly.
  • If the service is stopped, restarting it on your sentry node should resolve the issue.
  • After addressing any issues with your sentry, restart the Heimdall service on your validator as well.

Technical FAQ

1. Are the private keys the same for Heimdall and the Bor keystore?

Yes, the same private key is used to generate the validator keys and the Bor keystore. Use the private key of the wallet ETH address where your Shibarium testnet tokens are stored.

2. List of Common Commands

WIP.

3. Default Directories

  • Heimdall genesis file: /var/lib/heimdall/config/genesis.json
  • Heimdall-config.toml file: /var/lib/heimdall/config/heimdall-config.toml
  • Heimdall config.toml file: /var/lib/heimdall/config/config.toml
  • Heimdall data directory: /var/lib/heimdall/data/
  • Bor config.toml file: /var/lib/bor/config.toml
  • Bor data directory: /var/lib/bor/data/bor/chaindata

4. From where do I create the API key?

You can create one at https://infura.io/register. Once you have set up your account and project, make sure that you copy the API key for Sepolia and not Shibarium.

Shibarium is selected by default.

5. How do I delete remnants of Heimdall and Bor?

Run the following commands to delete the remnants of Heimdall and Bor from your machines.

For the Linux package, run: $ sudo dpkg -r bor

And delete the Bor directory using: $ sudo rm -rf /var/lib/bor

For binaries, run: $ sudo rm -rf /var/lib/bor

And then run: $ sudo rm -rf /var/lib/heimdall

6. How many validators can be active concurrently?

Under the current limit, a maximum of 105 validators can be active at any given time. It's important to note that active validators are primarily those with high uptime, while participants with significant downtime may be removed.

7. How much should I stake?

A minimum stake of 10,000 BONE tokens is required. We recommend setting a Heimdall fee of 10 BONE.

8. Which private key should I add when I generate the validator key?

Use the private key of the wallet ETH address where your Shibarium testnet tokens are stored. You can complete the setup with one public-private key pair tied to the address submitted on the form.

9. Is there a way to know if Heimdall is synced?

You can run the following command to check it:

$ curl http://localhost:26657/status

Check the value of the catching_up flag. If it is false, the node is fully synced.
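The flag can also be checked in a script, for example from a monitoring job. This is a minimal sketch: heimdall_synced is a hypothetical name, and the check simply looks for "catching_up": false in the response text rather than parsing the JSON.

```shell
# Succeed if the /status JSON on stdin reports catching_up as false.
heimdall_synced() {
    grep -Eq '"catching_up"[[:space:]]*:[[:space:]]*false'
}

# Example usage on a node (not run here):
#   curl -s http://localhost:26657/status | heimdall_synced && echo "Heimdall is synced"
```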

10. Which file do I add the API key in?

Once you have created the API key, you need to add it to the heimdall-config.toml file.

11. How to check if the correct signer address is used for validator setup?

To check the signer address, run the following command on the validator node:

heimdalld show-account

12. Error: Failed to unlock account (0x...) No key for given address or file

This error occurs because the path to the password.txt file is incorrect. Follow the steps below to fix it:

  1. Copy the Bor keystore file to /var/lib/bor/keystore
  2. Copy password.txt to /var/lib/bor/
  3. Make sure you have added the correct address in /var/lib/bor/config.toml.
  4. Ensure that the priv_validator_key.json and UTC-<time>-<address> files have the relevant permissions:
    • For priv_validator_key.json, run: sudo chown -R heimdall:nogroup /var/lib/heimdall/config/priv_validator_key.json
    • For the UTC-<time>-<address> file, run: sudo chown -R bor:nogroup /var/lib/bor/data/keystore/UTC-<time>-<address>

13. My node is not signing any checkpoints

Try the following solutions:

  1. Start by updating the bor_rpc_url parameter in the validator's heimdall-config.toml file to point to an external node RPC provider, then restart the services. This change helps to avoid missing checkpoints.
info

While this change is in place, the node will not mine blocks. Once the issue is fixed, revert the change so that the node returns to normal functionality.

  2. Verify that the Heimdall service is running normally on both your sentry and validator nodes. If the service has stopped unexpectedly or is encountering errors, restart it and check whether it resumes normal operation.
  3. Check your Bor service logs for any errors or signs of abrupt halting. Try restarting your Bor service to resolve the issue.
  4. If these steps don't resolve the issue, please contact our support team and share the relevant logs for further assistance.

14. Consequences of a validator missing checkpoints

  • Economic impact
  • Loss of reputation as a reliable validator
  • Missed node rewards for delegators
  • Repeatedly missing checkpoints can lead to grace period one and two, followed by final notice and removal from the network.