feat: diagnostic replay and trace stripping for release builds #211
When a script compiled with Options.release (no traces) fails during TxBuilder evaluation, the evaluator automatically recompiles from SIR with error traces and replays to produce diagnostic logs.

- Add abstract withErrorTraces to CompiledPlutus base class
- Add failedScriptHash to PlutusScriptEvaluationException
- Add debugScripts parameter through evaluator and balancing chain
- Add CompiledPlutus overloads for spend (4) and mint (3) on TxBuilder
- Add references(utxo, compiled) overload for reference script replay
- Add withDebugScript for manual debug script registration
Replace Map[ScriptHash, CompiledPlutus[?]] with Map[ScriptHash, DebugScript] throughout the diagnostic replay pipeline. DebugScript wraps either a pre-compiled debug PlutusScript (for external builders like meshJS) or a lazy recompilation from CompiledPlutus (for Scalus TxBuilder).

- Add DebugScript class with apply(PlutusScript) and fromCompiled(CompiledPlutus)
- Add debugScripts field to rules.Context for Emulator path
- Add submitSync(tx, debugScripts) overload to JVM/JS Emulator
- Add submitTx(txBytes, debugScripts) JS-friendly overload to JEmulator
- Propagate scriptHash in PlutusScriptValidationException and SubmitError
- Add TxBuilder.withDebugScript(scriptHash, DebugScript) overload
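The "pre-compiled or lazily recompiled" wrapper described in this commit can be pictured with a small self-contained sketch. Every name below (`ToyScript`, `ToyCompiled`, `ToyDebugScript`, `LoweringStats`) is a hypothetical stand-in, not the Scalus API; the real `DebugScript`, `PlutusScript`, and `CompiledPlutus` types are richer, and the real implementation may differ.

```scala
// Hypothetical stand-ins for PlutusScript and CompiledPlutus; not the Scalus types.
final case class ToyScript(bytes: String)

object LoweringStats:
  var count = 0 // counts how often a script is lowered, to make laziness visible

final class ToyCompiled(val source: String, val errorTraces: Boolean):
  // Same source, but with error traces switched on.
  def withErrorTraces: ToyCompiled = new ToyCompiled(source, errorTraces = true)
  // "Lowering" is simulated by building a string; it bumps the counter each time.
  def script: ToyScript =
    LoweringStats.count += 1
    ToyScript(if errorTraces then s"$source+traces" else source)

// Wraps either a ready-made debug script or a deferred recompilation.
final class ToyDebugScript private (compute: () => ToyScript):
  lazy val script: ToyScript = compute() // evaluated at most once, on first access

object ToyDebugScript:
  // For external builders that already hold a debug build.
  def apply(precompiled: ToyScript): ToyDebugScript =
    new ToyDebugScript(() => precompiled)
  // For builders that keep the compiled form: recompile only if replay is needed.
  def fromCompiled(compiled: ToyCompiled): ToyDebugScript =
    new ToyDebugScript(() => compiled.withErrorTraces.script)
```

In this sketch, registering a debug script via `fromCompiled` costs nothing up front: the traced build is produced only when `script` is first read, which would happen only after a failure.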
Add RemoveTraces SIR transformer that strips fully-applied Trace builtin calls before lowering, replacing them with their value argument and cleaning up dead let bindings. Applied via new `removeTraces` option in Options (enabled by default in Options.release). The `withErrorTraces` method restores traces for diagnostic replay.
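As an illustration of the transformation this commit describes, here is a minimal sketch on a toy expression tree. The `Expr` type and both functions are assumptions made for the example; they are not the Scalus SIR or the actual `RemoveTraces` implementation, which handles many more node kinds.

```scala
// Toy expression tree; NOT the real Scalus SIR.
enum Expr:
  case Var(name: String)
  case UnitConst
  case IntConst(value: Int)
  case StrConst(value: String)
  case Trace(message: Expr, value: Expr) // fully applied trace: logs message, yields value
  case Let(name: String, rhs: Expr, body: Expr)
  case Add(lhs: Expr, rhs: Expr)

import Expr.*

// True if `name` occurs free in `e` (a shadowing Let hides the outer name).
def containsVar(e: Expr, name: String): Boolean = e match
  case Var(n)         => n == name
  case UnitConst      => false
  case IntConst(_)    => false
  case StrConst(_)    => false
  case Trace(m, v)    => containsVar(m, name) || containsVar(v, name)
  case Let(n, rhs, b) => containsVar(rhs, name) || (n != name && containsVar(b, name))
  case Add(l, r)      => containsVar(l, name) || containsVar(r, name)

// Replace each trace with its value argument, then drop unused Unit bindings.
def removeTraces(e: Expr): Expr = e match
  case Trace(_, value) => removeTraces(value) // the message subtree is discarded
  case Let(n, rhs, body) =>
    val newRhs = removeTraces(rhs)
    val newBody = removeTraces(body)
    newRhs match
      case UnitConst if !containsVar(newBody, n) => newBody // dead binding left by a log call
      case _                                     => Let(n, newRhs, newBody)
  case Add(l, r) => Add(removeTraces(l), removeTraces(r))
  case other     => other

// let _ = trace("checking", ()) in 1 + trace("adding", 2)
val example: Expr =
  Let("_", Trace(StrConst("checking"), UnitConst),
    Add(IntConst(1), Trace(StrConst("adding"), IntConst(2))))
val stripped: Expr = removeTraces(example)
```

Here `stripped` equals `Add(IntConst(1), IntConst(2))`: both traces are gone, and so is the binding that existed only to sequence the first log call. A Unit binding that is still referenced in the body is kept.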
Pull request overview
This PR implements a diagnostic replay feature for Plutus scripts compiled in release mode. When production scripts fail with empty logs, the system can automatically recompile from SIR with error traces enabled and replay the evaluation to provide detailed diagnostic information, all without bloating the production script size.
Changes:
- New `RemoveTraces` SIR transformer strips trace builtin calls and their message computations from release builds, reducing script size and execution cost
- New `DebugScript` API enables external transaction builders (e.g., Bloxbean CCL, meshJS) to provide pre-compiled debug scripts for diagnostic replay
- TxBuilder automatically registers `CompiledPlutus` scripts for diagnostic replay, with new overloaded `spend` and `mint` methods accepting `CompiledPlutus` directly
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| scalus-core/shared/src/main/scala/scalus/compiler/sir/RemoveTraces.scala | Implements SIR transformation to remove fully-applied Trace builtin calls and dead Unit bindings |
| scalus-core/shared/src/test/scala/scalus/compiler/sir/RemoveTracesSpec.scala | Comprehensive test coverage for RemoveTraces transformation including edge cases |
| scalus-core/shared/src/main/scala/scalus/uplc/DebugScript.scala | New public API for wrapping debug scripts with lazy evaluation |
| scalus-core/shared/src/main/scala/scalus/uplc/Compiled.scala | Integration of RemoveTraces into compilation pipeline and withErrorTraces implementation |
| scalus-core/shared/src/main/scala/scalus/compiler/compiler.scala | Adds removeTraces option, enabled by default in Options.release |
| scalus-core/shared/src/main/scala/scalus/compiler/sir/SIRDefaultOptions.scala | Default option configuration for removeTraces |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/ledger/PlutusScriptEvaluator.scala | Core replay logic that recompiles and re-evaluates failed scripts with debug traces |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/txbuilder/TxBuilder.scala | New CompiledPlutus overloads for spend/mint, withDebugScript methods, and debug script propagation |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/txbuilder/TransactionBuilder.scala | Threads debugScripts parameter through balancing and evaluation |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/ledger/rules/Entities.scala | Adds debugScripts to Context for ledger validation |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/ledger/Entities.scala | Adds scriptHash parameter to PlutusScriptValidationException |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/node/BlockchainProvider.scala | Updates ScriptFailure with optional scriptHash field |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/node/EmulatorBase.scala | Adds submit overload accepting debugScripts |
| scalus-cardano-ledger/jvm/src/main/scala/scalus/cardano/node/Emulator.scala | JVM-specific submitSync implementation with debug scripts |
| scalus-cardano-ledger/js/src/main/scala/scalus/cardano/node/Emulator.scala | JS-specific submitSync implementation with debug scripts |
| scalus-cardano-ledger/js/src/main/scala/scalus/cardano/node/JEmulator.scala | JavaScript interop layer for debug scripts via dictionary mapping |
| scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/ledger/rules/PlutusScriptsTransactionMutator.scala | Propagates debug scripts to evaluator in ledger validation |
| scalus-cardano-ledger/jvm/src/test/scala/scalus/cardano/txbuilder/DiagnosticReplayTest.scala | Comprehensive end-to-end tests for diagnostic replay feature |
| scalus-cardano-ledger/jvm/src/test/scala/scalus/cardano/txbuilder/TxBuilderPerformanceTest.scala | Updates test evaluator to support debugScripts parameter |
| scalus-site/content/testing/debugging.mdx | Documentation for diagnostic replay feature and trace removal |
| ## Diagnostic Replay for Release Scripts | ||
| When deploying to production, you typically compile scripts with `Options.release` (no error traces) to minimize script size and execution costs. However, if a release script fails, the error logs will be empty — making it hard to diagnose the issue. |
The documentation states "if a release script fails, the error logs will be empty", which is imprecise. With Options.release, removeTraces = true strips trace statements and generateErrorTraces = false omits error traces (such as require messages). The documentation should make clear that BOTH trace logs AND error messages are missing from release scripts, not just that "error logs will be empty".
Suggested change (original line first, then the proposed replacement):
| When deploying to production, you typically compile scripts with `Options.release` (no error traces) to minimize script size and execution costs. However, if a release script fails, the error logs will be empty — making it hard to diagnose the issue. | |
| When deploying to production, you typically compile scripts with `Options.release` (which sets `removeTraces = true` and `generateErrorTraces = false`) to minimize script size and execution costs. This means both trace logs and detailed error traces (e.g., `require` messages) are omitted in release scripts, so if a release script fails, you will not see useful error information in the logs — making it hard to diagnose the issue. |
scalus-cardano-ledger/shared/src/main/scala/scalus/cardano/node/BlockchainProvider.scala
Outdated
| case Right(map) => map | ||
| case Left(error) => Map.empty[ScriptHash, Script] |
The error handling in the debug scripts parsing silently returns an empty map when AllResolvedScripts.allResolvedScriptsMap fails. This means if there's an issue resolving scripts (e.g., BadInputsUTxOException), the debug scripts won't be registered and diagnostic replay will silently fail. Consider logging this error or propagating it to the caller so that users are aware when debug script registration fails.
Suggested change (original lines first, then the proposed replacement):
| case Right(map) => map | |
| case Left(error) => Map.empty[ScriptHash, Script] | |
| case Right(map) => map | |
| case Left(error) => | |
| // Log the error so that failures in resolving scripts are visible to users | |
| js.Dynamic.global.console.error( | |
| s"Emulator.submitTx(debugScripts): failed to resolve scripts for transaction: ${error}" | |
| ) | |
| Map.empty[ScriptHash, Script] |
| val debugScriptsMap: Map[ScriptHash, DebugScript] = debugScripts.flatMap { | ||
| case (hashHex, doubleCborHex) => | ||
| val hash = ScriptHash.fromHex(hashHex) | ||
| val doubleCbor = ByteString.fromHex(doubleCborHex) | ||
| // Determine language from the release script in the transaction | ||
| val languageOpt = resolvedScripts.get(hash).collect { case ps: PlutusScript => | ||
| ps.language | ||
| } | ||
| languageOpt.map { language => | ||
| val plutusScript: PlutusScript = language match | ||
| case Language.PlutusV1 => Script.PlutusV1(doubleCbor) | ||
| case Language.PlutusV2 => Script.PlutusV2(doubleCbor) | ||
| case Language.PlutusV3 => Script.PlutusV3(doubleCbor) | ||
| case _ => Script.PlutusV3(doubleCbor) | ||
| hash -> DebugScript(plutusScript) | ||
| } | ||
| }.toMap |
The debug scripts dictionary parsing uses flatMap which silently drops entries where the hash is not found in resolvedScripts or the script is not a PlutusScript. This means if a user provides a debug script for a hash that doesn't appear in the transaction, it will be silently ignored. Consider adding logging or validation to inform users when their provided debug scripts are not being used, as this could indicate a mismatch between the provided hashes and the actual transaction scripts.
Suggested change (original lines first, then the proposed replacement):
| val debugScriptsMap: Map[ScriptHash, DebugScript] = debugScripts.flatMap { | |
| case (hashHex, doubleCborHex) => | |
| val hash = ScriptHash.fromHex(hashHex) | |
| val doubleCbor = ByteString.fromHex(doubleCborHex) | |
| // Determine language from the release script in the transaction | |
| val languageOpt = resolvedScripts.get(hash).collect { case ps: PlutusScript => | |
| ps.language | |
| } | |
| languageOpt.map { language => | |
| val plutusScript: PlutusScript = language match | |
| case Language.PlutusV1 => Script.PlutusV1(doubleCbor) | |
| case Language.PlutusV2 => Script.PlutusV2(doubleCbor) | |
| case Language.PlutusV3 => Script.PlutusV3(doubleCbor) | |
| case _ => Script.PlutusV3(doubleCbor) | |
| hash -> DebugScript(plutusScript) | |
| } | |
| }.toMap | |
| val debugScriptsMap: Map[ScriptHash, DebugScript] = | |
| debugScripts.foldLeft(Map.empty[ScriptHash, DebugScript]) { | |
| case (acc, (hashHex, doubleCborHex)) => | |
| val hash = ScriptHash.fromHex(hashHex) | |
| val doubleCbor = ByteString.fromHex(doubleCborHex) | |
| // Determine language from the release script in the transaction | |
| val languageOpt = resolvedScripts.get(hash).collect { case ps: PlutusScript => | |
| ps.language | |
| } | |
| languageOpt match | |
| case Some(language) => | |
| val plutusScript: PlutusScript = language match | |
| case Language.PlutusV1 => Script.PlutusV1(doubleCbor) | |
| case Language.PlutusV2 => Script.PlutusV2(doubleCbor) | |
| case Language.PlutusV3 => Script.PlutusV3(doubleCbor) | |
| case _ => Script.PlutusV3(doubleCbor) | |
| acc + (hash -> DebugScript(plutusScript)) | |
| case None => | |
| // Inform users when a provided debug script is not used | |
| js.Dynamic.global.console.warn( | |
| s"Debug script for hash $hashHex was provided but no matching Plutus script was found in the transaction." | |
| ) | |
| acc |
| b.value match { | ||
| case SIR.Const(Constant.Unit, _, _) if !containsVar(restSir, b.name) => | ||
| (acc, bodyAndTail) | ||
| case _ => (b :: acc, bodyAndTail) | ||
| } | ||
| } |
The dead binding elimination logic only removes bindings where the RHS is exactly Const(Unit) and the binding is unused. However, after trace removal, a binding could contain other trivial expressions beyond Const(Unit) (e.g., a pure expression that has no side effects). This is a minor optimization opportunity; the current implementation is conservative and correct, since it only removes the most common case produced by log statements. Consider documenting this limitation in a comment or revisiting it in the future if larger script size reductions are needed.
| ## Diagnostic Replay for Release Scripts | ||
| When deploying to production, you typically compile scripts with `Options.release` (no error traces) to minimize script size and execution costs. However, if a release script fails, the error logs will be empty — making it hard to diagnose the issue. | ||
| **Diagnostic replay** solves this: when you use `CompiledPlutus` (e.g., `PlutusV3.compile(...)`) with `TxBuilder`, the builder automatically registers the compiled script for replay. If the release script fails with empty logs, the evaluator: | ||
| 1. Recompiles the script from SIR with error traces enabled | ||
| 2. Replays the failing evaluation with the same arguments | ||
| 3. Collects the diagnostic logs from the replay | ||
| 4. Includes them in the `PlutusScriptEvaluationException` | ||
The documentation mentions that diagnostic replay "recompiles the script from SIR with error traces enabled" but doesn't explain that this requires keeping the SIR around (which CompiledPlutus does). Users coming from external tools (like meshJS or Bloxbean CCL) might not understand they need to provide a pre-compiled debug version via the DebugScript API. Consider adding a note that explains when users need to use DebugScript.apply(debugPlutusScript) vs. automatic replay with CompiledPlutus.
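The four replay steps quoted above can be sketched end to end with a toy evaluator. This is an illustrative model only: `ToyValidator`, `ToyEvaluationException`, `replay`, and `evaluate` are hypothetical names, scripts are plain functions, and hashes are plain strings; none of this is the `PlutusScriptEvaluator` code. The "succeeded unexpectedly" marker string is taken from the diff in this PR.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical model: a "script" takes an argument and a log buffer, and throws on failure.
type ToyValidator = (Int, ArrayBuffer[String]) => Unit

final case class ToyEvaluationException(scriptHash: String, logs: Seq[String])
    extends RuntimeException(s"script $scriptHash failed")

// Steps 2 and 3: re-run the debug build with the same argument and collect its logs.
def replay(debug: ToyValidator, arg: Int): Seq[String] =
  val logs = ArrayBuffer.empty[String]
  val failed =
    try
      debug(arg, logs)
      false
    catch case NonFatal(_) => true
  if failed then logs.toSeq
  else logs.toSeq :+ "[diagnostic replay: debug script succeeded unexpectedly]"

// Runs the release script; on failure with empty logs, looks up a debug build by hash.
def evaluate(
    hash: String,
    script: ToyValidator,
    arg: Int,
    debugScripts: Map[String, ToyValidator]
): Unit =
  val logs = ArrayBuffer.empty[String]
  try script(arg, logs)
  catch
    case NonFatal(_) =>
      val diagnostics =
        if logs.nonEmpty then logs.toSeq
        else debugScripts.get(hash).map(replay(_, arg)).getOrElse(Seq.empty)
      // Step 4: attach the diagnostics to the exception.
      throw ToyEvaluationException(hash, diagnostics)

// A release build that fails silently, and a debug build of the same check that logs first.
val releaseValidator: ToyValidator = (n, _) =>
  if n < 0 then throw new RuntimeException("failed")
val debugValidator: ToyValidator = (n, logs) =>
  if n < 0 then
    logs += "amount must be non-negative"
    throw new RuntimeException("failed")
```

Calling `evaluate("h1", releaseValidator, -1, Map("h1" -> debugValidator))` throws a `ToyEvaluationException` whose `logs` contain the message from the debug build, while the same call with an empty map throws with empty logs. That second case mirrors the situation for external builders that hold only release bytes and never registered a debug script.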
| else logs :+ "[diagnostic replay: debug script succeeded unexpectedly]" | ||
| catch | ||
| case NonFatal(e) => | ||
| Array(s"[diagnostic replay failed: ${e.getMessage}]") |
The replayWithDiagnostics method catches all NonFatal exceptions during debug script evaluation and returns a generic error message. If the debug script fails to parse or has issues during evaluation, the diagnostic information may be lost. Consider logging more detailed information about the failure (e.g., the exception type and stack trace) to help users understand why diagnostic replay failed. This would make debugging the diagnostic replay itself much easier.
Suggested change (original line first, then the proposed replacement):
| Array(s"[diagnostic replay failed: ${e.getMessage}]") | |
| Logger("PlutusScriptEvaluator").error( | |
| s"Diagnostic replay failed for script $hash with args: ${args | |
| .mkString(", ")}", | |
| e | |
| ) | |
| val exceptionType = e.getClass.getName | |
| val message = Option(e.getMessage).getOrElse("") | |
| Array(s"[diagnostic replay failed: $exceptionType: $message]") |
| catch case NonFatal(_) => replayFailed = true | ||
| val logs = replayLogger.getLogs | ||
| if replayFailed then logs | ||
| else logs :+ "[diagnostic replay: debug script succeeded unexpectedly]" |
When the debug script succeeds unexpectedly (replay doesn't fail but the original script did), the diagnostic logs still include the trace output plus a warning message. This is good for debugging, but consider whether this scenario indicates a potential issue with the diagnostic replay mechanism itself. The message "[diagnostic replay: debug script succeeded unexpectedly]" is helpful, but you may want to log this at a higher level or add a warning since it suggests the debug script and release script have different behavior.
Suggested change (original line first, then the proposed replacement):
| else logs :+ "[diagnostic replay: debug script succeeded unexpectedly]" | |
| else | |
| val warning = | |
| "[diagnostic replay: debug script succeeded unexpectedly]" | |
| Logger("PlutusScriptEvaluator").warn( | |
| s"$warning for script hash $hash" | |
| ) | |
| logs :+ warning |
| test("script hash is preserved in withErrorTraces") { | ||
| // withErrorTraces changes the script bytes (adds traces) so it produces a DIFFERENT hash | ||
| // The original hash is used for lookup - this is by design since we look up by original hash | ||
| val release = failingScriptRelease | ||
| val withTraces = release.withErrorTraces | ||
| // The SIR is the same | ||
| assert(release.sir == withTraces.sir) | ||
| // Language is the same | ||
| assert(release.language == withTraces.language) | ||
| // Options differ only in generateErrorTraces | ||
| assert(withTraces.options.generateErrorTraces) | ||
| assert(!release.options.generateErrorTraces) | ||
| } |
The test comment at line 206 states "withErrorTraces changes the script bytes (adds traces) so it produces a DIFFERENT hash" but then the test at lines 214-216 only checks that generateErrorTraces flags differ. The test should also verify that the script hashes are actually different to validate the comment's assertion. Consider adding assert(release.script.scriptHash != withTraces.script.scriptHash) to verify this important claim about hash differences.
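To make the hash point concrete, here is a small standalone sketch of why the debug-script map is keyed by the release hash. It is an illustration under stated assumptions: `toyHash` uses SHA-256 from the JDK purely as a stand-in for the ledger's script hashing, and the byte strings are placeholders rather than real script bytes.

```scala
import java.security.MessageDigest

// Stand-in hash; the real script hash is computed differently by the ledger.
def toyHash(bytes: Array[Byte]): String =
  MessageDigest.getInstance("SHA-256").digest(bytes).map(b => "%02x".format(b)).mkString

// Placeholder "script bytes": adding traces changes the bytes, hence the hash.
val releaseBytes: Array[Byte] = "validator".getBytes("UTF-8")
val debugBytes: Array[Byte] = "validator+traces".getBytes("UTF-8")

val releaseHash: String = toyHash(releaseBytes)
val debugHash: String = toyHash(debugBytes)

// The transaction only ever references the release hash, so that is the lookup key.
val debugScriptsByReleaseHash: Map[String, Array[Byte]] = Map(releaseHash -> debugBytes)
```

In this sketch `releaseHash` and `debugHash` differ, and a lookup succeeds only with `releaseHash`. This is the property the suggested `assert` on differing hashes would pin down in the real test.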
| /** Registers a compiled script for diagnostic replay. | ||
| * | ||
| * When a release script (compiled without error traces) fails during evaluation with empty | ||
| * logs, the evaluator will use the registered compiled script to recompile from SIR with error | ||
| * traces and replay the failing evaluation, producing diagnostic logs. | ||
| * | ||
| * This is automatically called by the `spend` and `mint` overloads that accept | ||
| * [[CompiledPlutus]]. Use this method directly for reference-script use cases where the script | ||
| * is not attached to the transaction but you still want diagnostic replay. | ||
| * | ||
| * @param compiled | ||
| * the compiled Plutus script to register for diagnostic replay | ||
| */ | ||
| def withDebugScript(compiled: CompiledPlutus[?]): TxBuilder = { | ||
| registerDebugScript(compiled) | ||
| } |
The withDebugScript method documentation states it's "automatically called by the spend and mint overloads that accept CompiledPlutus" but doesn't mention that these new overloads require passing CompiledPlutus instead of PlutusScript. Users who are upgrading existing code that uses validator.script will need to change to just validator to enable diagnostic replay. Consider adding a migration note in the documentation or in the scaladoc explaining this change for existing users.
- Reorder ScriptFailure params to keep `logs` second for backwards compat
- Add console.error/warn logging in JEmulator for failed script resolution and unmatched debug script hashes
- Improve diagnostic replay error messages with exception class names
- Add scribe log.warn/error for unexpected replay outcomes
- Document conservative dead-binding elimination in RemoveTraces
- Clarify Options.release behavior in docs (both traces and error traces omitted)
- Add note about CompiledPlutus vs DebugScript API for external builders
- Add migration note to withDebugScript scaladoc
- Add assertion that release/debug script hashes differ in test
Summary
- [...] `TxBuilder`, it automatically re-lowers the SIR with error traces and log traces enabled, replays the evaluation, and provides detailed diagnostics, without bloating production script size
- `DebugScript` API: New public API for external tx builders (e.g. Bloxbean CCL) to access diagnostic replay for failed scripts
- `RemoveTraces` SIR transformer removes fully-applied `Trace` builtin calls and their message subtrees before lowering, reducing script size and execution cost. Enabled by default in `Options.release` via `removeTraces = true`