Feature Request: frigg repair --remove-missing to handle phantom resources #505

@seanspeaks

Problem Statement

When AWS resources are manually deleted outside of CloudFormation, the stack becomes inconsistent - CloudFormation believes the resources still exist, but they don't. This creates a "phantom resource" problem where:

  1. frigg doctor correctly identifies these as MISSING_RESOURCE issues
  2. Attempting to deploy fails because CloudFormation tries to UPDATE the phantom resources instead of CREATE them
  3. The stack can enter UPDATE_ROLLBACK_FAILED state, requiring multiple continue-update-rollback commands to skip each phantom resource
  4. Even after rollback succeeds, deployment still fails on the same phantom resources

Current Workaround Gap

  • frigg repair --import: Handles orphaned resources (exist in AWS, not in CloudFormation) ✅
  • frigg repair --reconcile: Handles property drift (resources exist but have wrong values) ✅
  • Missing: No automated way to remove phantom resources (exist in CloudFormation, not in AWS) ❌

Users must either:

  • Delete and recreate the entire stack (disruptive)
  • Manually apply the AWS two-step template modification process (error-prone)

Proposed Solution

Add a new --remove-missing flag to frigg repair that automatically removes phantom resources from CloudFormation tracking using the AWS-recommended two-step approach.

Usage

```bash
# Detect and remove phantom resources
frigg repair --remove-missing

# With auto-confirmation
frigg repair --remove-missing --yes

# Remove specific resources only
frigg repair --remove-missing --resources ResourceId1,ResourceId2
```

Implementation Design

High-Level Flow

  1. Run health check to detect missing resources (already done by frigg doctor)
  2. Generate CloudFormation template from existing .serverless build artifacts
  3. Step 1: Modify template to add DeletionPolicy: Retain to phantom resources
  4. Deploy modified template (CloudFormation update succeeds because no actual AWS operations are needed)
  5. Step 2: Remove phantom resources entirely from template
  6. Deploy cleaned template (removes resources from CloudFormation tracking)
  7. Verify with post-repair health check
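Steps 3 to 6 above could be sketched roughly as follows. This is only an illustration, not the actual Frigg implementation: it assumes the template is the standard CloudFormation JSON layout with a top-level `Resources` map, reuses the function names proposed under "Key Functions Needed", and takes a hypothetical `deploy` callback that updates the stack and resolves once the update completes.

```javascript
// Sketch only. `deploy` is a hypothetical async callback (template) => Promise
// that applies the template to the stack and waits for the update to finish.

// Step 1 helper: return a copy of the template with DeletionPolicy: Retain
// set on each listed resource. The input template is not mutated.
function addRetentionPolicy(template, resourceIds) {
  const copy = JSON.parse(JSON.stringify(template)); // templates are plain JSON
  for (const id of resourceIds) {
    if (copy.Resources[id]) copy.Resources[id].DeletionPolicy = 'Retain';
  }
  return copy;
}

// Step 2 helper: return a copy of the template without the listed resources.
function removeResources(template, resourceIds) {
  const copy = JSON.parse(JSON.stringify(template));
  for (const id of resourceIds) {
    delete copy.Resources[id];
  }
  return copy;
}

// Run both steps in order. If the first deploy rejects, the second is never
// attempted, so the stack is not left with a half-applied removal.
async function removeMissing(template, resourceIds, deploy) {
  const retained = addRetentionPolicy(template, resourceIds);
  await deploy(retained);

  const cleaned = removeResources(retained, resourceIds);
  await deploy(cleaned);

  return cleaned;
}
```

Keeping the two template transformations pure and injecting `deploy` means the ordering can be unit tested without touching AWS.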

Why This Works

According to AWS CloudFormation documentation, the DeletionPolicy: Retain approach allows CloudFormation to cleanly remove resources from stack tracking without attempting to delete them from AWS.

Leveraging Existing Build Artifacts

The implementation should leverage templates already generated by:

  • frigg build → .serverless/<stack-name>.json or similar
  • frigg doctor → May already have template access via stackRepository

This avoids re-running expensive infrastructure generation and ensures consistency with the deployed stack.

Example Scenario

Before:
```
$ frigg doctor my-stack

Issues:

  • [MISSING_RESOURCE] MyAuroraCluster (AWS::RDS::DBCluster)
  • [MISSING_RESOURCE] MyAuroraInstance (AWS::RDS::DBInstance)
  • [MISSING_RESOURCE] MyDBSubnetGroup (AWS::RDS::DBSubnetGroup)

$ frigg deploy --stage prod
❌ Error: DBSubnetGroup my-db-subnet-group does not exist
```

After (with new feature):
```
$ frigg repair my-stack --remove-missing

🔧 Found 3 missing resource(s) to remove from stack tracking:
• MyAuroraCluster (AWS::RDS::DBCluster)
• MyAuroraInstance (AWS::RDS::DBInstance)
• MyDBSubnetGroup (AWS::RDS::DBSubnetGroup)

Proceed with removal? (y/N): y

📋 Step 1/2: Adding DeletionPolicy: Retain...
✓ Stack updated successfully

📋 Step 2/2: Removing resources from template...
✓ Stack updated successfully

✅ Removed 3 phantom resource(s) from CloudFormation tracking!

$ frigg deploy --stage prod
✓ Deployment successful (created 3 new resources)
```

Technical Implementation Notes

File Structure

```
frigg-cli/repair-command/
├── index.js # Add handleRemoveMissingRepair()
├── strategies/
│ ├── import-orphaned.js # Existing
│ ├── reconcile-properties.js # Existing
│ └── remove-missing.js # NEW - phantom resource removal
```

Key Functions Needed

  1. getMissingResources(report): Extract missing resources from health report
  2. loadBuildTemplate(stackIdentifier): Load from .serverless or regenerate
  3. addRetentionPolicy(template, resourceIds): Inject DeletionPolicy: Retain
  4. removeResources(template, resourceIds): Remove resource definitions
  5. deployTemplateUpdate(stackIdentifier, template): Apply template changes via CloudFormation
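As a rough illustration of the first function and of the `--resources` filter, here is a minimal sketch. The report shape is an assumption made for this example (an `issues` array whose entries carry `type`, `logicalId`, and `resourceType`); the real `HealthCheckReport` may expose this differently. `selectResources` is a hypothetical helper name, not an existing Frigg function.

```javascript
// Assumed report shape (hypothetical; the real HealthCheckReport may differ):
// { issues: [{ type: 'MISSING_RESOURCE', logicalId: '...', resourceType: '...' }] }
function getMissingResources(report) {
  return (report.issues || [])
    .filter((issue) => issue.type === 'MISSING_RESOURCE')
    .map(({ logicalId, resourceType }) => ({ logicalId, resourceType }));
}

// Hypothetical helper: narrow the missing list to the IDs passed via
// --resources (a comma-separated string). IDs that were requested but are not
// reported as missing are returned separately so the CLI can warn about them.
function selectResources(missing, resourcesFlag) {
  if (!resourcesFlag) return { selected: missing, unknown: [] };

  const requested = resourcesFlag
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);
  const byId = new Map(missing.map((r) => [r.logicalId, r]));

  return {
    selected: requested.filter((id) => byId.has(id)).map((id) => byId.get(id)),
    unknown: requested.filter((id) => !byId.has(id)),
  };
}
```

Returning the unknown IDs instead of silently dropping them keeps the command from removing nothing while appearing to succeed.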

Integration with Existing Code

The implementation follows the same pattern as existing repair strategies:

  • Reuse AWSStackRepository for CloudFormation operations
  • Reuse HealthCheckReport for resource identification
  • Follow confirmation/verbose/yes flag patterns
  • Report success/failure counts in final summary

Benefits

  1. Automated Recovery: No manual template manipulation required
  2. Safe Operation: Uses AWS-recommended approach with DeletionPolicy: Retain
  3. Consistent UX: Follows existing frigg repair patterns
  4. Efficient: Leverages pre-built templates from .serverless artifacts
  5. Completes the Repair Trilogy: Import (orphaned), Reconcile (drift), Remove (phantom)

Testing Strategy

Test Scenarios

  1. Single missing resource removal
  2. Multiple missing resources (different types)
  3. Missing resources with dependencies
  4. Nested stack resources
  5. Error handling when template generation fails
  6. Dry-run mode (preview without applying)
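Scenario 3 deserves a pre-flight check: if a resource that stays in the template still refers to one being removed, the cleaned template will contain an unresolved reference and the second update is likely to be rejected. A minimal sketch of such a check is below. `findDanglingReferences` is a hypothetical helper name; it only covers `Ref`, `Fn::GetAtt`, and `DependsOn`, and does not parse `${...}` placeholders inside `Fn::Sub` strings.

```javascript
// Hypothetical helper: list references from surviving resources and outputs
// to logical IDs that are about to be removed from the template.
function findDanglingReferences(template, removedIds) {
  const removed = new Set(removedIds);
  const findings = [];

  // Recursively walk any JSON value, recording Ref / Fn::GetAtt hits.
  function visit(node, owner) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child, owner));
      return;
    }
    if (node === null || typeof node !== 'object') return;

    if (typeof node.Ref === 'string' && removed.has(node.Ref)) {
      findings.push({ owner, target: node.Ref, via: 'Ref' });
    }
    const getAtt = node['Fn::GetAtt'];
    if (getAtt !== undefined) {
      // Fn::GetAtt is either ['LogicalId', 'Attr'] or the string 'LogicalId.Attr'.
      const target = Array.isArray(getAtt) ? getAtt[0] : String(getAtt).split('.')[0];
      if (removed.has(target)) findings.push({ owner, target, via: 'Fn::GetAtt' });
    }
    Object.values(node).forEach((child) => visit(child, owner));
  }

  for (const [id, resource] of Object.entries(template.Resources || {})) {
    if (removed.has(id)) continue; // removed resources disappear with their references
    for (const dep of [].concat(resource.DependsOn || [])) {
      if (removed.has(dep)) findings.push({ owner: id, target: dep, via: 'DependsOn' });
    }
    visit(resource, id);
  }
  for (const [name, output] of Object.entries(template.Outputs || {})) {
    visit(output, `Outputs.${name}`);
  }
  return findings;
}
```

A non-empty result could be used to abort before step 1, or to show the user which additional resources and outputs would need to change.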

Expected Outcomes

  • Stack transitions: UPDATE_ROLLBACK_COMPLETE → UPDATE_COMPLETE → UPDATE_COMPLETE
  • Post-repair health check shows 0 missing resources
  • Subsequent frigg deploy successfully creates new resources

Related AWS Documentation

Priority

High - This is a common operational scenario (manual deletion, failed rollbacks) that currently has no automated recovery path in Frigg.
