Core (full node) troubleshooting

Goal

This article guides you through troubleshooting Hathor core (full node). Each of the following sections provides a resolution for a problem you may encounter while operating and using Hathor full node.

Error initializing node

Situation

You ran the command to start the full node and received the following error message:

[error][hathor.manager] Error initializing node. The last time you executed your full node it wasn't stopped correctly. The storage is not reliable anymore and, because of that, you must run a full verification or remove your storage and do a full sync.

Cause

This happens when the full node's database (namely, the ledger) is in an unreliable state. As the message explains, this occurs when the full node does not shut down properly, e.g., the process was abruptly killed or the machine was turned off.

Solution

Restart your full node from an empty database. This solution entails an initial synchronization of the ledger from the genesis block, which naturally takes a long time. However, you can expedite it by using a snapshot (of the ledger).

If you want to carry out this solution using a snapshot, see Slow initial synchronization.

If you want to carry out this solution without using a snapshot, follow this procedure:

  1. Start a shell session.
  2. Change the working directory to <absolute_path_hathor_full_node>/data, replacing the <absolute_path_hathor_full_node> placeholder with the directory where you installed your full node.
  3. Remove all content of the data subdirectory, as shown in the example after this procedure.
  4. Restart your full node as usual.
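For example, on a Linux or macOS host, a minimal sketch of steps 2 and 3, assuming the full node is already stopped and was installed at <absolute_path_hathor_full_node>:

# Go to the full node's data subdirectory.
cd <absolute_path_hathor_full_node>/data
# Remove all of its content (double-check the current directory before running this).
rm -rf ./*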
warning

If you are using the ledger events produced by your full node, your event-consuming client application will need to discard all events and reprocess them from scratch.

The ledger events generated by the full node are specific to a database instance. Therefore, whenever you restart from an empty database, the events previously delivered to the full node's event-consuming client applications are no longer valid.

Peer discovery failure

Situation

Your full node is initializing but stalls during the peer discovery phase. It logs one or more of the following error messages:

[error][hathor.p2p.peer_discovery.dns] errback extra={'dns_seed_lookup_text', 'alpha.nano-testnet.hathor.network'} result=<twisted.python.failure.Failure builtins.OSError: [Errno 65] No route to host>

Cause

The full node is failing to perform the DNS query. This is a known issue in Hathor Core that occurs only in very specific environments, typically related to OS and home LAN configurations of personal machines.

The DNS provides a set of peer IPs that serves as the seed for peer discovery. Without this seed, the full node is unable to connect to the network, as it does not know any of its peers.

Workaround

Until this issue is resolved, you can manually work around it by querying the network's DNS and providing the retrieved IPs to Hathor core as seed for peer discovery. To carry out this workaround, follow this procedure:

  1. Select the DNS corresponding to the Hathor Network instance you want to connect to:

    • Mainnet: mainnet.hathor.network
    • Testnet: golf.testnet.hathor.network
    • Nano-testnet: alpha.nano-testnet.hathor.network
  2. Start a shell session.

  3. Query the DNS of the selected network. For example:

    dig TXT golf.testnet.hathor.network

    In ANSWER SECTION you will obtain the IPs to be used. For example:

    ...
    ;; ANSWER SECTION:
    golf.testnet.hathor.network. 60 IN TXT "tcp://18.156.174.211:40403/?id=6d6d72156f20d294c6677a8963ebe70df66b5beaf12773c16de250f8275fb6c5"
    golf.testnet.hathor.network. 60 IN TXT "tcp://18.199.240.217:40403/?id=e4466f8e05e93dc7b077af3807830bee296936772033b73ee32da59e5400d8fd"
    golf.testnet.hathor.network. 60 IN TXT "tcp://34.230.30.110:40403/?id=ffcc778abd0cf1be33062bbdaa48f9909e2e5a2947390efc72070c38c0505e69"
    ...
  4. Restart your full node, using the --bootstrap option/environment variable once for each IP you want to provide as a seed. For example:

poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index --bootstrap tcp://18.156.174.211:40403 --bootstrap tcp://18.199.240.217:40403 --bootstrap tcp://34.230.30.110:40403

Note that a single IP is sufficient as a seed for peer discovery, but providing multiple IPs increases the chance that the full node finds an available peer. Additionally, the --bootstrap option/environment variable must receive exactly one argument per use. Therefore, you must repeat it as many times as the number of IPs you want to provide.

tip

Instead of manually querying the DNS, you can embed the query directly into the command used to start your full node. For example:

HATHOR_DNS=golf.testnet.hathor.network; poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index $(dig TXT $HATHOR_DNS +short | sed 's/"/--bootstrap /' | sed 's/"//')

Unable to connect to mainnet

Situation

You have just started a full node, and it is attempting to connect to its peers on mainnet. It then begins logging one or more of the following warning messages:

[warning][hathor.p2p.protocol] remote error payload=Blocked (by <peer_id_of_your_full_node>). Get in touch with Hathor team. peer_id=None remote=<IP_of_some_peer>:40403

Diagnosis

Send an HTTP API request to check the status of the full node. For example:

curl -X GET http://localhost:8080/v1a/status/ | jq .connections

In the API response, look for the connections object. If its connected_peers, handshaking_peers, and connecting_peers properties are all empty arrays, your full node is unable to connect to any other peer, which means it is not connected to the network.

status HTTP API response
{
  "connected_peers": [],
  "handshaking_peers": [],
  "connecting_peers": []
},
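A quick way to run this check from the shell, assuming jq is installed and the status API is exposed on port 8080 as in the example above, is to count the peers across the three arrays; an output of 0 means your full node is not connected to any peer:

curl -s http://localhost:8080/v1a/status/ | jq '.connections | (.connected_peers + .handshaking_peers + .connecting_peers) | length'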

Cause

At the moment, Hathor Network mainnet operates with a whitelist — i.e., only peers whose id is in the whitelist are able to connect to the network. The warning messages your full node received mean that one or more peers rejected the connection because your full node's peer_id is not in the whitelist.
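If you need to look up your full node's peer_id, for example, to request that it be added to the whitelist, it is reported by the status API, typically as the id property of the server object. A sketch, assuming the status API is exposed on port 8080:

curl -s http://localhost:8080/v1a/status/ | jq .server.id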

Solution

See How to connect Hathor full node to mainnet.

Slow initial synchronization

Situation

You have successfully connected your full node to Hathor Network, but syncing with its peers is taking too much time.

Cause

The initial synchronization is a process that happens when you deploy a new full node, or when you restart a full node after some period offline. In the case of a new deployment, the full node needs to sync the entire ledger from the genesis block. In the case of a full node that was offline for a long time, it needs to sync the ledger from the point where it stopped.

In either case, this process naturally takes a long time (hours), because the full node must download and validate all transactions and blocks in the ledger.

As of February 2024, syncing from genesis block takes on average 10 hours for Hathor Network testnet and 24 hours for mainnet. As time passes and the ledger grows, the time required for initial syncing tends to increase.

Workaround

To expedite this process, you can bootstrap your full node from a snapshot. Snapshots allow nodes to rapidly catch up with the network. The trade-off is that your full node relies on the snapshot to build its ledger, rather than performing the entire validation process on its own.

To use this workaround, see How to bootstrap from a snapshot.

To learn more about snapshots, see Snapshot in the encyclopedia.

Connection failure

Situation

Your full node is currently operational and is logging one or more of the following warning messages:

[warning][hathor.p2p.protocol] remote error payload=Connection rejected. peer_id=None remote=<IP_of_some_peer>:40403

This means that a peer responded by rejecting the connection.

[warning][hathor.p2p.manager] connection failure endpoint=tcp://<IP_of_some_peer>:40403 failure=User timeout caused connection failure.

This means that a peer did not respond to the connection request.

[warning][hathor.p2p.protocol] Connection closed for idle timeout. peer_id=None remote=<IP_of_some_peer>:54254

This means that an established connection was closed because it remained idle for too long.

Connection failures are a normal aspect of a full node's ongoing operation. As long as your full node remains well-connected to the network, these messages should not be a cause for concern.

Diagnosis

To determine if your full node is well-connected to the network, send an HTTP API request to check its status. For example:

curl -X GET http://localhost:8080/v1a/status/ | jq .connections

In the API response, look for the connections object. Count how many objects the connected_peers property has:

status HTTP API response
{
  "connected_peers": [
    {
      "id": "<connected_peer_id_1>",
      ...
    },
    {
      "id": "<connected_peer_id_2>",
      ...
    },
    ...
    {
      "id": "<connected_peer_id_n>",
      ...
    }
  ],
  ...
},

To be considered well-connected, a full node should average around 20 connections on mainnet, or 5 to 10 on testnet.
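For example, assuming jq is installed and the status API is exposed on port 8080, you can count the connected peers directly:

curl -s http://localhost:8080/v1a/status/ | jq '.connections.connected_peers | length'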

HTTP 503: service unavailable

Situation

You sent an HTTP API request to the full node and received the following status message as response: Server Error 503: Service Unavailable.

Diagnosis

Ensure that the Server Error 503: Service Unavailable status message originates from the full node itself, not from a reverse proxy.
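If your full node sits behind a reverse proxy, one way to check is to send the same request directly to the full node from its host, bypassing the proxy. A sketch, assuming the full node's HTTP API listens on port 8080 (the port set with --status) and <failing_endpoint> is a placeholder for the endpoint that returned the error:

curl -i http://localhost:8080/v1a/<failing_endpoint>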

Cause

If the full node itself is responding with status code 503, this means that it was started without the wallet-index parameter. As a result, it cannot process requests that depend on this parameter to function properly.

Solution

Restart your full node with the wallet-index option/environment variable.

If you installed your full node from source code, restart it using the --wallet-index option. For example:

poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index

Unresponsive full node

Situation

Your full node was responding normally to your API requests but then suddenly became unresponsive. This typically manifests as one or more of the following error messages:

  • request timed out
  • connection timed out
  • connection reset by peer
  • unable to connect to the server

Diagnosis

Check the host to ensure the full node is still up and running. If it is, the unresponsiveness might indicate that your full node is experiencing high CPU usage. See the section High CPU usage of this article.
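For example, on a Linux or macOS host, a quick sketch to check whether the full node process is still running and how loaded the host is:

# Look for the full node process (the grep line itself may also appear in the output).
ps aux | grep -i hathor
# Show the host's load averages.
uptime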

High CPU usage

Situation

Your full node is presenting one or more of the following symptoms:

  • It suddenly becomes unresponsive to API requests.
  • It suddenly rejects all new connections with other peers.
  • It suddenly drops established connections with its peers.

Diagnosis

When these symptoms appear together, they indicate that your full node is experiencing high CPU usage, which means zero or near-zero CPU idle time. Use a utility such as top, htop, vmstat, or mpstat to confirm high CPU usage on the full node's host.
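For example, a sketch on a Linux host (mpstat is part of the sysstat package); an idle (id) column at or near 0 confirms the lack of CPU idle time:

# Sample overall CPU usage every 2 seconds, 5 times; the id column is the idle percentage.
vmstat 2 5
# Same sampling, broken down per core.
mpstat -P ALL 2 5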

Causes

There are two well-established causes for high CPU usage in a full node:

  1. Using version 1 of the synchronization algorithm.
  2. Using addresses with a high number of transactions.

Synchronization is the process by which all nodes of a blockchain network maintain the same copy of the ledger. The first version of the synchronization algorithm implemented in Hathor protocol may consume a lot of CPU time when the full node is connected to a high number of peers in the network. To solve this problem, Hathor protocol was updated with a new version of the synchronization algorithm (version 2), which has been the default since Hathor core v0.59.0.

Processing API requests related to addresses with a high number of transactions consumes a significant amount of a full node's CPU time. Some use cases involve many such addresses and may require the full node to process multiple requests related to them simultaneously. This can lead to high CPU usage in the use case's full node.

Resolutions

Resolution for cause 1 (sync algorithm v1)

If you are running a full node with Hathor core v0.58.0 or earlier, update it to v0.59.0 or later. See How to upgrade Hathor full node.
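If you are unsure which version your full node is running, you can inspect the server object of the status API response, which reports the application version. A sketch, assuming the status API is exposed on port 8080 (the exact field name may vary across releases):

curl -s http://localhost:8080/v1a/status/ | jq .server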

Resolution for cause 2 (addresses with high number of transactions)

If you have already upgraded Hathor core to v0.59.0 or later and are still experiencing high CPU usage, chances are that the problem is related to responding to API requests involving addresses with a high number of transactions, e.g., calculating the balance or history of such addresses. If this is the case for your full node, the resolution may vary depending on your use case. Send a message to the #development channel on Hathor Discord server for assistance from Hathor team and community members.

I still need help

If this article does not address your problem, or if the provided instructions were insufficient, send a message to the #development channel on Hathor Discord server for assistance from Hathor team and community members.

What's next?