Add Outbound Remote Signer implementation#8754
Add Outbound Remote Signer implementation#8754ViktorT-11 wants to merge 39 commits intolightningnetwork:masterfrom
Outbound Remote Signer implementation#8754Conversation
|
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Outbound Remote Signer implementation
0525eb6 to
6abefde
Compare
b34174d to
aabced7
Compare
aabced7 to
cfe7fc3
Compare
|
Interesting PR 👀, does this somehow relate to the VLS project, a bit of time ago somebody told me that there is a proxy in the making to connect LND to the VLS signer project, wondering if there is some progress made in that direction ? |
Thanks a lot! I would say that the current PR is somewhat related to VLS in terms of future functionality it could enable. However, this PR only focuses on reversing the connection setup between the watch-only node and the signer node, so that the signer node makes the outbound connection to the watch-only node instead of the other way around. In a setup where mobile signer wallets use VLS for validation which are connected to watch-only nodes, this functionality is a necessary pre-requisite unless the mobile wallets are online constantly. However, this PR as it currently stand does not add any validation or enable an integration with VLS yet. The goal is to address such needs in future PRs. My hope is that the
I think the project you're referring to is: |
ViktorT-11
left a comment
There was a problem hiding this comment.
Added a few comments for reviewers where it'd be especially helpful with feedback :)! Might add some more comments tomorrow if I think of something else.
One more area I'd appreciate to get some feedback from reviewers for is:
- Iit could be useful to let the sign requests that's passed over the stream from the watch-only node to the signer, also pass through the RPC interceptor on the signer node side. If that's something you think we should look into supporting, let me know!
| "/walletrpc.WalletKit/SignCoordinatorStreams": {{ | ||
| Entity: "onchain", | ||
| Action: "write", | ||
| }, { | ||
| Entity: "message", | ||
| Action: "write", | ||
| }, { | ||
| Entity: "signer", | ||
| Action: "generate", | ||
| }, { | ||
| Entity: "address", | ||
| Action: "read", | ||
| }}, |
There was a problem hiding this comment.
Wasn't really sure what kind of perms we'd want to require in the macaroon for the remote signer when opening the stream with the watch-only node. Therefore mimicked the perms of the inbound remote signer required permissions. Though as this is the other way around i.e. the remote signer is connecting to the watch-only node, that maybe doesn't makes sense as it's not strictly required.
Perhaps we'd like to add some new kind of entity for this specific case?
There was a problem hiding this comment.
I think adding a special permission just for the remote signer here probably makes sense.
There was a problem hiding this comment.
Ok cool, I added a new entity called remotesigner, with the action generate (I wasn't sure which action made most sense here, so feel free to suggest using read/write if you think that makes more sense).
lnrpc/walletrpc/walletkit.proto
Outdated
| connects to the watch-only lnd node, to initialize a handshake between | ||
| the nodes. | ||
| */ | ||
| bool signer_registration = 2; |
There was a problem hiding this comment.
Let me know if it makes sense to add some kind of versioning for the signer, so that the watch-only node can know what kind of functionality the signer supports.
There was a problem hiding this comment.
Hmm, good question. I don't think that adding an independent version here gives us much. I think (at least in the beginning), we'd want the signer and watch-only to be on the same main lnd version. So maybe we should send the full version string here and initially just compare it to the one running on the watch-only, then error out if it differs?
Or don't do anything yet other than specifying this field as a string, so we can do it in the future?
There was a problem hiding this comment.
Or don't do anything yet other than specifying this field as a string, so we can do it in the future?
Makes sense, thanks! I agree that it probably makes sense to just enable the support so that we can define the versioning idea when we actually decide on it in the future, so I edited this to be a string instead.
cfe7fc3 to
b48621e
Compare
guggero
left a comment
There was a problem hiding this comment.
Awesome work on this PR. The commit structure is super nice and easy to follow.
I only did a first read-through to grasp the different areas the code touches. Haven't looked at all the new code in detail yet, but will follow up with a second round soon.
So this is mainly to checkpoint my initial comments. Will dig in deeper tomorrow.
lnrpc/walletrpc/walletkit.proto
Outdated
| connects to the watch-only lnd node, to initialize a handshake between | ||
| the nodes. | ||
| */ | ||
| bool signer_registration = 2; |
There was a problem hiding this comment.
Hmm, good question. I don't think that adding an independent version here gives us much. I think (at least in the beginning), we'd want the signer and watch-only to be on the same main lnd version. So maybe we should send the full version string here and initially just compare it to the one running on the watch-only, then error out if it differs?
Or don't do anything yet other than specifying this field as a string, so we can do it in the future?
| "/walletrpc.WalletKit/SignCoordinatorStreams": {{ | ||
| Entity: "onchain", | ||
| Action: "write", | ||
| }, { | ||
| Entity: "message", | ||
| Action: "write", | ||
| }, { | ||
| Entity: "signer", | ||
| Action: "generate", | ||
| }, { | ||
| Entity: "address", | ||
| Action: "read", | ||
| }}, |
There was a problem hiding this comment.
I think adding a special permission just for the remote signer here probably makes sense.
config_builder.go
Outdated
| cleanUpTasks []func() | ||
| cleanUp = func() { | ||
| for _, fn := range cleanUpTasks { | ||
| if fn == nil { |
There was a problem hiding this comment.
When would a cleanup function be nil? Shouldn't we be able to expect them always to be non-nil, the way we add them below?
There was a problem hiding this comment.
Yeah that's correct, currently they'll never be nil given the current code. This was a copy of the pattern used in the BuildWalletConfig function, and I guess it could be useful if we'd ever add a remote signer implementation that doesn't have any cleanup func.
But I'll remove the nil check if you think that's just adding extra complexity :)!
There was a problem hiding this comment.
'd ever add a remote signer implementation that doesn't have any cleanup func.
But in that case, that impl can just return func() {}. We can even require that via the comment in the API
I think better to have fewer nil checks where possible
There was a problem hiding this comment.
I also dont think you need a cleanup return value here. Just have a Stop on the RemoteSigner which performs a disconnect of its established connection. Then you just add that Stop call to your clean-up list
There was a problem hiding this comment.
also: pre-existing, but I think that currently NewRPCKeyRing (and now Build) is the only call in BuildChainControl besides NewChainControl that does some sort of "start" functionality. I'd argue that we really should keep these New* functions as constructors and then leave any starting for later. But again this is out of scope here since it is pre-existing.
lnwallet/rpcwallet/remote_signer.go
Outdated
| "signing node through RPC: %v", err) | ||
| } | ||
|
|
||
| defer func() { |
There was a problem hiding this comment.
Why is this in a defer? Since we're done now?
And not sure if we also want to report errors when not being able to disconnect?
So we might just be able to do return conn.Close() here instead?
guggero
left a comment
There was a problem hiding this comment.
Took a deep dive into the remote signer client and sign coordinator. Looks very good, just need to do another pass to grok all layers of the mutexes and wait groups and Go channels being used...
| s.mu.Lock() | ||
| defer s.mu.Unlock() | ||
|
|
||
| select { | ||
| case <-s.quit: | ||
| return nil, ErrShuttingDown | ||
| default: | ||
| } | ||
|
|
||
| s.wg.Add(1) | ||
|
|
||
| return func() { | ||
| s.wg.Done() | ||
| }, nil |
There was a problem hiding this comment.
Hmm, there are a lot of layers with the mutexes and wait groups... Probably need to take another, closer look at those, to make sure we can't run into deadlocks...
There was a problem hiding this comment.
Please do :)! I've tried to be very careful to ensure that couldn't happen, so hopefully I haven't missed any scenario where that could occur! But some extra eyes on this would indeed be very helpful 🙏
915f062 to
6e965f8
Compare
Change the `InterceptorChain` behavior to allow a remote signer to call the `walletrpc.SignCoordinatorStreams` while the `rpcState` is set to `allowRemoteSigner`. This state precedes the `rpcActive` state, which allows all RPCs. This change is necessary because `lnd` needs the remote signer to be connected before some of the internal dependencies for RPC sub-servers can be created. These dependencies must be inserted into the sub-servers before moving the `rpcState` to `rpcActive`.
The `SetServerActive` moves the `rpcState` from `rpcActive` to `serverActive`. Update the docs to correctly reflect that.
To enable an outbound remote signer to connect to lnd before all dependencies for the RPC sub-servers are created, we need to separate the process of adding dependencies to the sub-servers from created the sub-servers. Prior to this commit, the RPC sub-servers were created and enabled only after all dependencies were in place. Such a limitation prevents accepting an incoming connection request from an outbound remote signer (e.g., a `walletrpc.SignCoordinatorStreams` RPC call) to the `WalletKitServer` until all dependencies for the RPC sub-servers are created. However, this limitation would not work, as we need the remote signer in order to create some of the dependencies for the other RPC sub-servers. Therefore, we need to enable calls to at least the `WalletKitServer` and the main RPC server before creating the remaining dependencies. This commit refactors the logic for the main RPC server and sub-servers, allowing them to be enabled before dependencies are inserted into the sub-servers. The `WalletState` for the `InterceptorChain` is only set to `RpcActive` after all dependencies have been created and inserted, ensuring that RPC requests won't be allowed into the sub-servers before the dependencies exist. An upcoming commit will set the state to `AllowRemoteSigner` before all dependencies are created, enabling an outbound remote signer to connect when needed.
This commit adds the `RemoteSignerConnection` reference to the `WalletKit` `Config`, enabling it to be accessed from the `WalletKit` sub-server. When a remote signer connects by calling the `SignCoordinatorStreams` RPC endpoint, we need to pass the stream from the outbound remote signer to the `InboundRemoteSignerConnection` `AddConnection` function. This change ensures that the `InboundRemoteSignerConnection` `AddConnection` function is reachable from the `SignCoordinatorStreams` RPC endpoint implementation. Note that this field should only to be populated when the `RPCKeyRing` `RemoteSignerConnection` is an `InboundConnection` instance, as the only that connection type implements the `InboundRemoteSignerConnection` interface.
With the ability to reach the `InboundRemoteSignerConnection` `AddConnection` function in the `WalletKit` sub-server, we now implement the `SignCoordinatorStreams` RPC endpoint.
This commit populates the `RemoteSignerConnection` reference in the `WalletKit` config before other dependencies are added. To ensure that an outbound remote signer can connect before other dependencies are created, and since we use this reference in the walletrpc `SignCoordinatorStreams` RPC, we must populate this dependency prior to other dependencies during the lnd startup process.
This commit populates the `RemoteSignerConnection` reference in the `WalletKit` config before other dependencies are added. To ensure that an outbound remote signer can connect before other dependencies are created, and since we use this reference in the walletrpc `SignCoordinatorStreams` RPC, we must populate this dependency prior to other dependencies during the lnd startup process.
Previous commits added functionality to handle the incoming connection from an outbound remote signer and ensured that the outbound remote signer could connect before any signatures from the remote signer are needed. However, one issue still remains: we need to ensure that we wait for the outbound remote signer to connect when starting lnd before executing any code that requires the remote signer to be connected. This commit adds a `ReadySignal` function to the `WalletController` that returns a channel, which will signal once the wallet is ready to be used. For an `InboundConnection`, this channel will only signal once the outbound remote signer has connected. This can then be used to ensure that lnd waits for the outbound remote signer to connect during the startup process.
With the functionality in place to allow an outbound remote signer to connect before any signatures are needed and the ability to wait for this connection, this commit enables the functionality to wait for the remote signer to connect before proceeding with the startup process. This includes setting the `WalletState` in the `InterceptorChain` to `AllowRemoteSigner` before waiting for the outbound remote signer to connect.
With all the necessary components on the watch-only node side in place to support usage of an outbound remote signer, the `lncfg` package can now permit the `remotesigner.allowinboundconnection` setting to be configured. This commit also adds support for the building `InboundConnection` instances in the `RemoteSignerConnectionBuilder`.
With support for the outbound remote signer now added, we update the documentation to detail how to enable the use of this new remote signer type.
Update release notes to include information about the support for the new outbound remote signer type.
Update the harness to allow creating a watch-only node without starting it. This is useful for tests that need to create a watch-only node prior to starting it, such as tests that use an outbound remote signer.
rename testRemoteSignerRadomSeedOutbound to testRemoteSignerRandomSeedOutbound, as random was misspelled.
testOutboundRSMacaroonEnforcement tests that a valid macaroon including the `remotesigner` entity is required to connect to a watch-only node that uses an outbound remote signer, while the watch-only node is in the state (WalletState_ALLOW_REMOTE_SIGNER) where it waits for the signer to connect.
This commit fixes that word wrapping for the deriveCustomScopeAccounts function docs, and ensures that it wraps at 80 characters or less.
Allow the remote signer to reconnect to the wallet after disconnecting, as long as the remote signer reconnects within the timeout limit. This is not a complete solution to the problem to allow the watch-only node to stay online when the remote signer is disconnected, but is more fault-tolerant than the current implementation as it allows the remote to be temporarily disconnected.
14403fc to
b89ea08
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (14 files)
🟠 High (21 files)
🟡 Medium (6 files)
🟢 Low (21 files)
AnalysisThis PR introduces a comprehensive outbound remote signer implementation, which is a fundamental architectural change to how LND handles transaction signing. This warrants CRITICAL severity for the following reasons: Primary Concerns:
Key Risk Areas:
Recommendation: This PR requires expert review from maintainers familiar with the wallet signing infrastructure, channel state machine, and security model. Special attention should be paid to the To override, add a |
b89ea08 to
23213ea
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (9 files)
🟠 High (28 files)
🟡 Medium (8 files)
🟢 Low (17 files)
AnalysisThis PR introduces Outbound Remote Signer functionality, which enables lnd to delegate signing operations to an external signer service. This is classified as CRITICAL for the following reasons: Primary Concerns:
Recommended Review Focus:
Risk Assessment: This change introduces a new architectural component (remote signing) that sits in the critical path of all signing operations. Expert review from maintainers familiar with lnwallet, signing operations, and channel state management is essential. To override, add a |
23213ea to
967cb5d
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (12 files)
🟠 High (24 files)
🟡 Medium (9 files)
🟢 Low (10 files - tests/docs/generated)Documentation:
Tests:
⚙️ Auto-generated (7 files - excluded from severity calculation)
AnalysisThis PR is classified as CRITICAL due to the following factors: Primary Concerns:
Recommended Review Focus:
This PR requires expert-level review from maintainers familiar with lnd's wallet architecture, signing protocols, and security model. To override, add a |
|
Thanks a lot for the thorough review @yyforyongyu 🎉🙏! I've addressed your feedback with the above pushes 🎉 |
967cb5d to
b62e7cd
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (9 files)
🟠 High (20+ files)
🟡 Medium (10 files)
🟢 Low (20+ files)
AnalysisThis PR implements a major new feature: Outbound Remote Signer support. This is classified as CRITICAL because:
RecommendationThis PR requires expert review with deep knowledge of:
Reviewers should pay special attention to:
To override, add a |
starius
left a comment
There was a problem hiding this comment.
Great work! Thanks! 🚀
I went through all the commits, left some comments. Most of them are local, but some are large scope:
- I propose a stable naming for remote signer operation modes. Word "inbound" and "outbound" change their meaning depending to which side they are applied. I propose to use node2signer and signer2node or something similar (direction-explicit and perspective-neutral).
- We should consider a separate RPC port for accepting remote signer connections instead of reusing the main lnd RPC port. Sharing the port creates pressure to bind on 0.0.0.0, exposing the full RPC surface. A dedicated listener would also eliminate the need for the ALLOW_REMOTE_SIGNER state machine. Fine as a follow-up, but worth deciding on before this ships.
| // Else, we are in outbound mode, so we verify the connection config. | ||
| err := r.ConnectionCfg.Validate() |
There was a problem hiding this comment.
I think this terminology is a bit confusing. Here the remote signer is in Inbound mode, but the watch-only mode is in outbound mode, IIUC. It is hard to map the word to the actual mode of operation being used.
Can we use the naming so that it is called the same on both sides? What about this one:
- node2signer - Watch-only node connects to signer (legacy)
- signer2node - Signer connects to watch-only node (new)
| /* | ||
| The registration challenge allows the remote signer to pass data that will | ||
| be signed by the watch-only lnd. The resulting signature will be returned in | ||
| the RegistrationResponse message. | ||
| */ | ||
| string registration_challenge = 1; |
There was a problem hiding this comment.
Which key does the node use to sign the challenge? Is it the main node key used for node id? Could you add more details here, please?
If the signing scheme is not decided yet, we can drop the field for now or make it bytes, since bytes is a better type to binary data like a challenge or a signature.
| // Ready returns a channel that nil gets sent over once the remote | ||
| // signer is ready to accept requests. | ||
| Ready(ctx context.Context) chan error |
There was a problem hiding this comment.
What happens is ctx expires? Could you expand the godoc, please?
| // waits for the corresponding response. | ||
| DeriveSharedKey(ctx context.Context, | ||
| in *signrpc.SharedKeyRequest, | ||
| opts ...grpc.CallOption) (*signrpc.SharedKeyResponse, error) |
There was a problem hiding this comment.
Does it make sense to remove opts ...grpc.CallOption here and in methods below to make the interface more honest? We don't pass them at least in signer-to-node mode. And they are not actually used in the legacy mode anyway.
| err := remoteSigner.connect(ctx, cfg) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("error connecting to the remote "+ | ||
| "signing node through RPC: %v", err) |
| /* | ||
| SignCoordinatorStreams dispatches a bi-directional streaming RPC that | ||
| allows a remote signer to connect to lnd and remotely provide the signatures | ||
| required for any on-chain related transactions or messages. | ||
| */ | ||
| rpc SignCoordinatorStreams (stream SignCoordinatorResponse) | ||
| returns (stream SignCoordinatorRequest); |
There was a problem hiding this comment.
I propose to consider the following idea before shipping. Maybe in a follow-up PR.
Can we make a separate gRPC server for SignCoordinatorStreams RPC? If the main RPC server (the one used by lncli) serves this, then it is very tempting to bind the main RPC server on 0.0.0.0 to allow access for SignCoordinatorStreams from the remote signer. But opening world access to the main RPC server is not a good idea from security standpoint. It is possible to use port forwarding from remote signer but not everyone will do it. We can reduce attack surface of the configuration which is the most convenient to use.
We can actually shoot two birds with one stone: if we have a separate RPC server for SignCoordinatorStreams, we don't need ALLOW_REMOTE_SIGNER! The separate RPC server will start independently and have its own rules on when to allow connections.
subrpcserver_config.go
Outdated
| reflect.ValueOf(chanStateDB), | ||
| ) | ||
|
|
||
| kRing, ok := cc.Wallet.WalletController.(*rpcwallet.RPCKeyRing) |
There was a problem hiding this comment.
We could use an interface having RemoteSignerConnection() method instead of *rpcwallet.RPCKeyRing
| // has been executed, the read lock should be held when accessing values | ||
| // from the cfg. | ||
| // The write lock should be held when setting the cfg. | ||
| sync.RWMutex |
There was a problem hiding this comment.
mu sync.RWMutexSo public methods of mutex (Lock, Unlock, etc) do not propagate to WalletKit.
| **Note:** The `save_to` path should match the `remotesigner.macaroonpath` | ||
| specified in step 1. If the signer and watch-only nodes are on separate | ||
| environments, move the macaroon to the `remotesigner.macaroonpath` after baking | ||
| it instead. |
There was a problem hiding this comment.
It should be watchonlynode.macaroonpath, not remotesigner.macaroonpath
| It is also recommended to set the following parameter, which defines the | ||
| interval at which the watch-only node will check if the signer node is still | ||
| connected. If the signer node is disconnected during a check, the watch-only | ||
| node will shut down: | ||
|
|
||
| ```text | ||
| # Set the interval for how often the watch-only node will check that the signer | ||
| # node is still connected. | ||
| healthcheck.remotesigner.interval=5s | ||
| ``` | ||
|
|
||
| Since the signer node set up in Step 1 increases the delay between connection | ||
| attempts slightly with each failed attempt, it may take some time before it | ||
| reconnects to the watch-only node after it has been started. Setting a high | ||
| value for this configuration field will help ensure that the watch-only node | ||
| does not time out when starting up. |
There was a problem hiding this comment.
This conflates two things. The healthcheck.remotesigner.interval controls how often the watch-only node pings the signer after connection. The startup timeout is controlled by remotesigner.startuptimeout (default 5 minutes). The health check interval has nothing to do with startup timeouts.
This PR introduces the functionality to utilize an alternative remote signer implementation, in which the remote signer node establishes an outbound connection to the watch-only node.
In the existing remote signer implementation, the watch-only node establishes an outbound connection to the remote signer, which accepts an inbound connection. The implementation introduced by this PR eliminates the need for the remote signer to allow any inbound connections.
To enable an outbound remote signer using the functionality introduced in this PR, please run
make release-install& follow these steps:https://github.com/ViktorT-11/lnd/blob/2024-05-add-outbound-remote-signer/docs/remote-signing.md#outbound-remote-signer-example
Note:
This PR does not address the requirement for the remote signer to remain online while the watch-only node is operational. Currently, all RPC requests sent to the Remote signer will fail if it goes offline, and the health monitor will then proceed to shutdown the watch-only node. Additionally, this PR does not implement any validation on the remote signer side, i.e. the Remote Signer will blindly sign whatever is sent to it.
These issues will be addressed in future PRs.
Final note:
I plan resolve any CI issues ASAP, so reviewers can await those fixes before starting the review if preferred. I also intend to add some review comments on a few points where feedback would be particularly helpful.