-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Background
We have set up LND with a Postgres database, and it has a dedicated CPU with about 4GB of RAM requested and available for itself. The database is populated and it's not like it's not responding but the responses are extremely slow. In our cluster, we also have lndmon running for the metrics exporting but clearly, it has way too many "restarts" on kubernetes because it is failing and timing out:
NAME READY STATUS RESTARTS AGE
bitcoind-0 2/2 Running 0 2d21h
lnd1-0 3/3 Running 7 2d22h
lnd1-lndmon-78997ccdf4-7plz9 1/1 Running 746 2d21h
We had opened a channel to this LND instance and because of this erratic behaviour of not being able to respond properly all the time, the channel had gotten force-closed as well.
Logs from LND:
$ kubectl logs lnd1-0 -c lnd | grep ERR | tail
2022-07-04 06:03:34.203 [ERR] DISC: unable to validate channel update short_chan_id=759840600204771329: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 06:12:06.942 [ERR] DISC: unable to validate channel update short_chan_id=797497774017609729: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 06:12:40.412 [ERR] RPCS: [/routerrpc.Router/SubscribeHtlcEvents]: context canceled
2022-07-04 06:20:44.384 [ERR] DISC: unable to validate channel update short_chan_id=779182109372448769: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 06:22:35.041 [ERR] RPCS: [/routerrpc.Router/SubscribeHtlcEvents]: context canceled
2022-07-04 06:27:20.468 [ERR] RPCS: [/routerrpc.Router/SubscribeHtlcEvents]: context canceled
2022-07-04 06:47:02.180 [ERR] DISC: unable to validate channel update short_chan_id=763321653983117313: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 07:01:22.169 [ERR] DISC: unable to validate channel update short_chan_id=799154738011242497: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 07:07:03.159 [ERR] DISC: unable to validate channel update short_chan_id=814529209106300929: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
2022-07-04 07:09:51.814 [ERR] DISC: unable to validate channel update short_chan_id=758370553249398784: failed to connect to `host=10.36.0.12 user=lnd_user database=lnd_db`: server error (FATAL: remaining connection slots are reserved for non-replication superuser connections (SQLSTATE 53300))
Logs from lndmon:
$ k logs lnd1-lndmon-78997ccdf4-7plz9
2022-07-04 06:28:32.725 [INF] LNDMON: Starting Prometheus exporter...
2022-07-04 06:28:32.726 [INF] HTLC: Starting Htlc Monitor
2022-07-04 06:28:32.727 [INF] LNDMON: Prometheus active!
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:01:18 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:01:18 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:09:48 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:09:48 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:25:18 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
rpc error: code = DeadlineExceeded desc = context deadline exceeded
2022/07/04 07:25:18 http: superfluous response.WriteHeader call from github.com/lightninglabs/lndmon/health.(*Checker).ServeHTTP (health.go:76)
Environment
- Version:
0.15.0-beta commit=v0.15.0-beta - Using Docker Image:
lightninglabs/lnd - Postgres running on GCP CloudSQL, which has maximum 25 concurrent connections, so in lnd config, it's setup like:
db.backend=postgres
db.postgres.dsn=<sensitive>
db.postgres.timeout=0
db.postgres.maxconnections=21
Expected behaviour
LND should at least be responding all the time.
Actual behaviour
There is some caching involved in the RPC I believe which is why after one getinfo call, there is about 5m time when requests don't time out and after that, the cycle repeats.