Core (full node) troubleshooting
Goal
This article guides you through troubleshooting Hathor core (full node). Each of the following sections provides a resolution to a problem you may encounter while operating and using Hathor full node.
Error initializing node
Situation
You ran the command to start the full node and received the following error message:
[error][hathor.manager] Error initializing node. The last time you executed your full node it wasn't stopped correctly. The storage is not reliable anymore and, because of that, you must run a full verification or remove your storage and do a full sync.
Cause
This happens when the full node's database (namely, the ledger) is in an unreliable state. As explained by the message, this occurs when the full node does not shut down properly, e.g., when the process is abruptly killed or the machine is turned off.
Solutions
There are two possible solutions:
- Execute a full verification of the database.
- Restart from an empty database.
Solution (1) is a process that takes time. Solution (2) entails the initial synchronization of the ledger from the genesis block, which takes even longer. However, solution (2) can be expedited by using a snapshot of the ledger, which makes it by far the fastest approach.
Therefore, we recommend restarting from an empty database and using a snapshot, combining solution (2) with the procedure in the Slow initial synchronization section of this article.
Solution 1: execute a full verification of the database
- Source code
- Docker container
- Docker compose
If you installed your full node from source code, start it using the --x-full-verification option. For example:
poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index --x-full-verification
If you installed your full node as a Docker container, start the container using --x-full-verification as an option of the run_node subcommand. For example:
docker run \
-it -p 8080:8080 -v <absolute_path_hathor_full_node>/data:/data \
hathornetwork/hathor-core \
run_node --status 8080 --testnet --data /data --wallet-index --x-full-verification
If you installed your full node using Docker compose, add the HATHOR_X_FULL_VERIFICATION=true environment variable to the docker-compose.yml file. For example:
services:
  hathor-core:
    image: hathornetwork/hathor-core
    command: run_node
    ports:
      - "8080:8080"
      - "8081:8081"
    volumes:
      - <absolute_path_hathor_full_node>/data:/data
    environment:
      - HATHOR_STATUS=8080
      - HATHOR_STRATUM=8081
      - HATHOR_TESTNET=true
      - HATHOR_DATA=/data
      - HATHOR_WALLET_INDEX=true
      - HATHOR_CACHE=true
      - HATHOR_CACHE_SIZE=100000
      - HATHOR_X_FULL_VERIFICATION=true
...
The next time you start your full node, it will execute a full verification of the database. This is necessary only once. Therefore, be sure to remove the full verification option/environment variable from your deployment configuration afterward. Otherwise, the full node will execute the full verification process every time it restarts, even if the database is already in a reliable state.
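For example, if you installed your full node from source code, subsequent restarts use the same command shown above, without the full verification option:
curl_example=unused # (placeholder removed)
poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index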
If the full verification of the database fails, use solution 2.
Solution 2: restart from an empty database
If you want to carry out this solution using a snapshot, see Slow initial synchronization.
If you want to carry out this solution without using a snapshot, follow this procedure (a command sketch follows the list):
- Start a shell session.
- Change the working directory to <absolute_path_hathor_full_node>/data, replacing the <absolute_path_hathor_full_node> placeholder with the directory where you installed your full node.
- Remove all content of the data subdirectory.
- Restart your full node as usual.
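As a sketch, these steps correspond to the following shell commands, assuming the full node process is already stopped. Replace the placeholder with your installation directory:
cd <absolute_path_hathor_full_node>/data
# Remove all content of the data subdirectory
# (double-check the working directory before running rm -rf)
rm -rf ./*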
If you are using the event subsystem of your full node, your event-consuming client application will need to discard all events and reprocess them from scratch.
The events generated by the event subsystem of your full node are specific to a database instance. Therefore, whenever you restart from an empty database, the events logged in event-consuming client applications of the full node will no longer be valid.
Unable to connect to mainnet
Situation
You have just started a full node, and it is attempting to connect to its peers on mainnet. It then begins to receive one or more of the following warning messages:
[warning][hathor.p2p.protocol] remote error payload=Blocked (by <peer_id_of_your_full_node>). Get in touch with Hathor team. peer_id=None remote=<IP_of_some_peer>:40403
Diagnosis
Send an HTTP API request to check the status of the full node. For example:
curl -X GET http://localhost:8080/v1a/status/ | jq .connections
In the API response, look for the connections object. If its connected_peers, handshaking_peers, and connecting_peers properties are all empty arrays, your full node is unable to connect to any other peer (which means it is not connected to the network):
{
  "connected_peers": [],
  "handshaking_peers": [],
  "connecting_peers": []
},
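To check all three properties at a glance, you can count each array with jq. For example:
curl -sX GET http://localhost:8080/v1a/status/ \
  | jq '.connections | {connected: (.connected_peers | length), handshaking: (.handshaking_peers | length), connecting: (.connecting_peers | length)}'
If all three counts are zero, your full node is not connected to the network.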
Cause
At the moment, Hathor Network mainnet operates with a whitelist — i.e., only peers whose id is in the whitelist are able to connect to the network. The warning message(s) your full node received means that one or more peers rejected the connection because your full node's peer_id is not in the whitelist.
Solution
See How to connect Hathor full node to mainnet.
Slow initial synchronization
Situation
You have successfully connected your full node to Hathor Network, but syncing with its peers is taking too much time.
Cause
The initial synchronization is a process that happens when you deploy a new full node, or when you restart a full node after some period offline. In the case of a new deployment, it will need to sync the entire ledger from the genesis block. In the case of restarting a full node that was offline for a long time, it will need to sync the ledger from the point where it stopped.
In either case, this process naturally takes a long time (hours), because the full node must download and validate all transactions and blocks in the ledger.
As of February 2024, syncing from genesis block takes on average 10 hours for Hathor Network testnet and 24 hours for mainnet. As time passes and the ledger grows, the time required for initial syncing tends to increase.
Workaround
To expedite this process, you can bootstrap your full node from a snapshot. Snapshots allow nodes to rapidly catch up with the network. The trade-off is that your full node will rely on the snapshot to create its ledger, rather than performing the entire validation process on its own.
To use this workaround, see How to bootstrap from a snapshot.
To know more about snapshots, see the Snapshot entry in the encyclopedia.
Connection failure
Situation
Your full node is currently operational and logs one or more of the following warning messages:
[warning][hathor.p2p.protocol] remote error payload=Connection rejected. peer_id=None remote=<IP_of_some_peer>:40403
This means that a peer responded by rejecting the connection.
[warning][hathor.p2p.manager] connection failure endpoint=tcp://<IP_of_some_peer>:40403 failure=User timeout caused connection failure.
This means that a peer did not respond to the connection request.
[warning][hathor.p2p.protocol] Connection closed for idle timeout. peer_id=None remote=<IP_of_some_peer>:54254
This means that an established connection was closed because it remained idle for too long.
Connection failures are a normal aspect of a full node's ongoing operation. As long as your full node remains well-connected to the network, these messages should not be a cause for concern.
Diagnosis
To determine if your full node is well-connected to the network, send an HTTP API request to check its status. For example:
curl -X GET http://localhost:8080/v1a/status/ | jq .connections
In the API response, look for the connections object and count how many objects the connected_peers property has:
{
  "connected_peers": [
    {
      "id": "<connected_peer_id_1>",
      ...
    },
    {
      "id": "<connected_peer_id_2>",
      ...
    },
    ...
    {
      "id": "<connected_peer_id_n>",
      ...
    }
  ],
  ...
},
To be considered well-connected, a full node should average around 20 connections on mainnet, or 5 to 10 on testnet.
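For example, the following command prints the number of peers your full node is currently connected to:
curl -sX GET http://localhost:8080/v1a/status/ | jq '.connections.connected_peers | length'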
HTTP 503: service unavailable
Situation
You sent an HTTP API request to the full node and received the following status message as response: Server Error 503: Service Unavailable.
Diagnosis
Ensure that the Server Error 503: Service Unavailable status message originates from the full node itself, not from a reverse proxy.
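One way to check this is to re-send the failing request directly to the full node's API port, bypassing the reverse proxy. For example, assuming the full node listens on port 8080, and using a placeholder for whichever endpoint failed:
# The -i option prints the response headers, which usually reveal
# whether the 503 came from the full node or from the proxy in front of it.
curl -i http://localhost:8080/v1a/<failing_endpoint>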
Cause
If the full node itself is responding with a status code 503, this means that it was started without the wallet-index parameter. As a result, it cannot process requests that depend on this parameter for proper functioning.
Solution
Restart your full node with the wallet-index option/environment variable.
- Source code
- Docker container
- Docker compose
If you installed your full node from source code, restart it using the --wallet-index option. For example:
poetry run hathor-cli run_node --status 8080 --testnet --data ../data --wallet-index
If you installed your full node as a Docker container, restart it using --wallet-index as an option of the run_node subcommand. For example:
docker run \
-it -p 8080:8080 -v <absolute_path_hathor_full_node>/data:/data \
hathornetwork/hathor-core \
run_node --status 8080 --testnet --data /data --wallet-index
If you installed your full node using Docker compose, add the HATHOR_WALLET_INDEX=true environment variable to the docker-compose.yml file. For example:
services:
  hathor-core:
    image: hathornetwork/hathor-core
    command: run_node
    ports:
      - "8080:8080"
      - "8081:8081"
    volumes:
      - <absolute_path_hathor_full_node>/data:/data
    environment:
      - HATHOR_STATUS=8080
      - HATHOR_STRATUM=8081
      - HATHOR_TESTNET=true
      - HATHOR_DATA=/data
      - HATHOR_WALLET_INDEX=true
      - HATHOR_CACHE=true
      - HATHOR_CACHE_SIZE=100000
...
Unresponsive full node
Situation
Your full node was responding normally to your API requests but suddenly became unresponsive. This typically manifests with one or more of the following error messages:
- request timed out
- connection timed out
- connection reset by peer
- unable to connect to the server
Diagnosis
Check the host to ensure the full node is still up and running. If it is, the unresponsiveness might indicate that your full node is experiencing high CPU usage. See the High CPU usage section of this article.
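Depending on your installation method, one of the following commands can confirm that the full node process or container is still running. These filters are illustrative; adjust them to your deployment:
# Source code installation: look for the running hathor process
pgrep -af hathor
# Docker installation: list containers created from the hathor-core image
docker ps --filter ancestor=hathornetwork/hathor-core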
High CPU usage
Situation
Your full node is presenting one or more of the following symptoms:
- It suddenly becomes unresponsive to API requests.
- It suddenly rejects all new connections with other peers.
- It suddenly drops established connections with its peers.
Diagnosis
When these symptoms appear together, they indicate that your full node is experiencing high CPU usage, that is, zero or near-zero CPU idle time. Use a utility such as top, htop, vmstat, or mpstat to confirm high CPU usage on the full node's host.
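For example, with vmstat you can watch the CPU idle percentage over time:
# Print CPU statistics every 5 seconds; the 'id' column is the
# CPU idle percentage. Values at or near zero confirm high CPU usage.
vmstat 5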
Causes
There are two well-established causes for high CPU usage in a full node:
- Using version 1 of the synchronization algorithm.
- Using addresses with a high number of transactions.
Synchronization is the process by which all nodes of a blockchain network maintain the same copy of the ledger. The first version of the synchronization algorithm implemented in the Hathor protocol may consume a lot of CPU time when the full node is connected to a high number of peers in the network. To solve this problem, the Hathor protocol was updated with a new version of the synchronization algorithm (version 2), which has been the default since Hathor core v0.59.0.
Processing API requests related to addresses with a high number of transactions consumes a significant amount of a full node's CPU time. Some use cases may involve many such addresses and may require the full node to process multiple requests related to them simultaneously. This can lead to high CPU usage in the use case's full node.
Resolutions
Resolution for cause 1 (sync algorithm v1)
If you are running a full node with Hathor core v0.58.0 or earlier, update it to v0.59.0 or later. See How to upgrade Hathor full node.
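To check which version your full node is currently running, you can query its version endpoint. For example, assuming your full node exposes its API on port 8080:
curl -X GET http://localhost:8080/v1a/version/ | jq .version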
Resolution for cause 2 (addresses with high number of transactions)
If you have already upgraded Hathor core to v0.59.0 or later and are still experiencing high CPU usage, chances are that the problem is related to responding to API requests involving addresses with a high number of transactions, e.g., calculating the balance or history of such addresses. If this is the case for your full node, the resolution varies depending on your use case. Send a message to the #development channel on the Hathor Discord server for assistance from the Hathor team and community members.
I still need help
If this article does not address your problem, or if the provided instructions were insufficient, send a message to the #development channel on the Hathor Discord server for assistance from the Hathor team and community members.
What's next?
- Hathor full node configuration: to customize or refine the installation setup.
- Hathor full node pathway: to learn how to operate this application.