Description
Problem Statement
When AWS resources are manually deleted outside of CloudFormation, the stack becomes inconsistent - CloudFormation believes the resources still exist, but they don't. This creates a "phantom resource" problem where:
- `frigg doctor` correctly identifies these as `MISSING_RESOURCE` issues
- Attempting to deploy fails because CloudFormation tries to UPDATE the phantom resources instead of CREATE them
- The stack can enter `UPDATE_ROLLBACK_FAILED` state, requiring multiple `continue-update-rollback` commands to skip each phantom resource
- Even after rollback succeeds, deployment still fails on the same phantom resources
Current Workaround Gap
- `frigg repair --import`: Handles orphaned resources (exist in AWS, not in CloudFormation) ✅
- `frigg repair --reconcile`: Handles property drift (resources exist but have wrong values) ✅
- Missing: No automated way to remove phantom resources (exist in CloudFormation, not in AWS) ❌
Users must either:
- Delete and recreate the entire stack (disruptive)
- Manually apply the AWS two-step template modification process (error-prone)
Proposed Solution
Add a new --remove-missing flag to frigg repair that automatically removes phantom resources from CloudFormation tracking using the AWS-recommended two-step approach.
Usage
```bash
# Detect and remove phantom resources
frigg repair --remove-missing

# With auto-confirmation
frigg repair --remove-missing --yes

# Remove specific resources only
frigg repair --remove-missing --resources ResourceId1,ResourceId2
```
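One way the `--resources` filter could narrow the detected set is sketched below. The helper name `selectResources` and the `logicalId` field are assumptions made for illustration, not existing Frigg APIs.

```javascript
// Hypothetical helper: narrow the detected phantom resources to those named
// in --resources. The `logicalId` field name is an assumption.
function selectResources(missing, resourcesFlag) {
  // No flag given: operate on everything that was detected.
  if (!resourcesFlag) return { selected: missing, unknown: [] };

  const requested = resourcesFlag
    .split(',')
    .map((id) => id.trim())
    .filter(Boolean);

  const known = new Set(missing.map((r) => r.logicalId));
  return {
    selected: missing.filter((r) => requested.includes(r.logicalId)),
    // IDs the user asked for that were not reported as missing.
    unknown: requested.filter((id) => !known.has(id)),
  };
}
```

Returning the `unknown` IDs separately would let the command warn about typos instead of silently ignoring them.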
Implementation Design
High-Level Flow
- Run health check to detect missing resources (already done by `frigg doctor`)
- Generate CloudFormation template from existing `.serverless` build artifacts
- Step 1: Modify template to add `DeletionPolicy: Retain` to phantom resources
- Deploy modified template (CloudFormation update succeeds because no actual AWS operations are needed)
- Step 2: Remove phantom resources entirely from template
- Deploy cleaned template (removes resources from CloudFormation tracking)
- Verify with post-repair health check
Why This Works
According to AWS CloudFormation documentation, the DeletionPolicy: Retain approach allows CloudFormation to cleanly remove resources from stack tracking without attempting to delete them from AWS.
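The two template transformations can be sketched as pure functions over a JSON template. This is a minimal sketch, assuming the template is plain JSON with a top-level `Resources` map. The function names follow the proposal in this issue and do not exist in Frigg yet.

```javascript
// Deep-copy helper; CloudFormation JSON templates are plain data, so a JSON
// round trip is sufficient here.
const cloneTemplate = (template) => JSON.parse(JSON.stringify(template));

// Step 1: return a copy with DeletionPolicy: Retain on each listed logical ID.
function addRetentionPolicy(template, resourceIds) {
  const next = cloneTemplate(template);
  for (const id of resourceIds) {
    const resource = next.Resources && next.Resources[id];
    if (resource) resource.DeletionPolicy = 'Retain';
  }
  return next;
}

// Step 2: return a copy with the listed logical IDs removed entirely.
function removeResources(template, resourceIds) {
  const next = cloneTemplate(template);
  for (const id of resourceIds) {
    if (next.Resources) delete next.Resources[id];
  }
  return next;
}
```

Keeping both functions side-effect free means the original build artifact is never mutated, and each step can be unit tested without AWS access.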
Leveraging Existing Build Artifacts
The implementation should leverage templates already generated by:
- `frigg build` → `.serverless/<stack-name>.json` or similar
- `frigg doctor` → May already have template access via `stackRepository`
This avoids re-running expensive infrastructure generation and ensures consistency with the deployed stack.
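A loader along these lines could look for an existing artifact before falling back to regeneration. The candidate file names below are assumptions (this issue only says `<stack-name>.json` "or similar") and would need to match whatever `frigg build` actually writes.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical loader: look for a JSON template in the build directory.
// Both candidate file names are assumptions, not confirmed Frigg output.
function loadBuildTemplate(stackName, buildDir = '.serverless') {
  const candidates = [
    `${stackName}.json`,
    'cloudformation-template-update-stack.json',
  ];
  for (const name of candidates) {
    const file = path.join(buildDir, name);
    if (fs.existsSync(file)) {
      return JSON.parse(fs.readFileSync(file, 'utf8'));
    }
  }
  // Nothing found: the caller decides whether to regenerate or abort.
  return null;
}
```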
Example Scenario
Before:
```
$ frigg doctor my-stack
Issues:
- [MISSING_RESOURCE] MyAuroraCluster (AWS::RDS::DBCluster)
- [MISSING_RESOURCE] MyAuroraInstance (AWS::RDS::DBInstance)
- [MISSING_RESOURCE] MyDBSubnetGroup (AWS::RDS::DBSubnetGroup)
$ frigg deploy --stage prod
❌ Error: DBSubnetGroup my-db-subnet-group does not exist
```
After (with new feature):
```
$ frigg repair my-stack --remove-missing
🔧 Found 3 missing resource(s) to remove from stack tracking:
• MyAuroraCluster (AWS::RDS::DBCluster)
• MyAuroraInstance (AWS::RDS::DBInstance)
• MyDBSubnetGroup (AWS::RDS::DBSubnetGroup)
Proceed with removal? (y/N): y
📋 Step 1/2: Adding DeletionPolicy: Retain...
✓ Stack updated successfully
📋 Step 2/2: Removing resources from template...
✓ Stack updated successfully
✅ Removed 3 phantom resource(s) from CloudFormation tracking!
$ frigg deploy --stage prod
✓ Deployment successful (created 3 new resources)
```
Technical Implementation Notes
File Structure
```
frigg-cli/repair-command/
├── index.js # Add handleRemoveMissingRepair()
├── strategies/
│ ├── import-orphaned.js # Existing
│ ├── reconcile-properties.js # Existing
│ └── remove-missing.js # NEW - phantom resource removal
```
Key Functions Needed
- getMissingResources(report): Extract missing resources from health report
- loadBuildTemplate(stackIdentifier): Load from `.serverless` or regenerate
- addRetentionPolicy(template, resourceIds): Inject `DeletionPolicy: Retain`
- removeResources(template, resourceIds): Remove resource definitions
- deployTemplateUpdate(stackIdentifier, template): Apply template changes via CloudFormation
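As a rough illustration of the first function in the list above, `getMissingResources` might filter the health report by issue type. The report shape used here (`issues`, `type`, `logicalId`, `resourceType`) is an assumption. Only the `MISSING_RESOURCE` label comes from the `frigg doctor` output shown in this issue.

```javascript
// Assumed report shape: { issues: [{ type, logicalId, resourceType }] }.
// The real HealthCheckReport may expose this differently.
function getMissingResources(report) {
  const issues = (report && report.issues) || [];
  return issues
    .filter((issue) => issue.type === 'MISSING_RESOURCE')
    .map(({ logicalId, resourceType }) => ({ logicalId, resourceType }));
}
```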
Integration with Existing Code
The implementation follows the same pattern as existing repair strategies:
- Reuse `AWSStackRepository` for CloudFormation operations
- Reuse `HealthCheckReport` for resource identification
- Follow confirmation/verbose/yes flag patterns
- Report success/failure counts in final summary
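A possible shape for the strategy's entry point is sketched below. `deploy` and `confirm` are injected so the flow can be exercised without AWS. In a real implementation `deploy` would presumably wrap `AWSStackRepository`, but that wiring, the option names, and the summary fields are all assumptions.

```javascript
// Hypothetical orchestration of the two-step removal with a final summary.
async function removeMissing({
  template,
  resourceIds,
  deploy,                      // async (template) => void, assumed to throw on failure
  confirm = async () => false, // async (resourceIds) => boolean
  yes = false,
}) {
  const summary = { requested: resourceIds.length, removed: 0, failed: 0, skipped: false };

  if (!yes && !(await confirm(resourceIds))) {
    summary.skipped = true;
    return summary;
  }

  const clone = (obj) => JSON.parse(JSON.stringify(obj));
  const present = resourceIds.filter((id) => template.Resources && template.Resources[id]);

  try {
    // Step 1/2: mark phantom resources as retained, then deploy.
    const step1 = clone(template);
    for (const id of present) step1.Resources[id].DeletionPolicy = 'Retain';
    await deploy(step1);

    // Step 2/2: drop them from the template, then deploy again.
    const step2 = clone(step1);
    for (const id of present) delete step2.Resources[id];
    await deploy(step2);

    summary.removed = present.length;
  } catch (err) {
    summary.failed = present.length;
    summary.error = err.message;
  }
  return summary;
}
```

Treating a failure in either step as a failure for the whole batch is a simplification. A fuller version might record which step failed so the user knows whether the stack still carries the `Retain` policies.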
Benefits
- Automated Recovery: No manual template manipulation required
- Safe Operation: Uses AWS-recommended approach with `DeletionPolicy: Retain`
- Consistent UX: Follows existing `frigg repair` patterns
- Efficient: Leverages pre-built templates from `.serverless` artifacts
- Completes the Repair Trilogy: Import (orphaned), Reconcile (drift), Remove (phantom)
Testing Strategy
Test Scenarios
- Single missing resource removal
- Multiple missing resources (different types)
- Missing resources with dependencies
- Nested stack resources
- Error handling when template generation fails
- Dry-run mode (preview without applying)
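For the "missing resources with dependencies" scenario, a pre-flight check could flag remaining resources that still reference a phantom resource, since removing the target would leave the template invalid. The sketch below is illustrative only and the helper names are hypothetical. It covers `Ref`, `Fn::GetAtt`, and `DependsOn`, but not `${...}` references inside `Fn::Sub` or references from `Outputs`.

```javascript
// Recursively collect logical IDs referenced via Ref, Fn::GetAtt, or DependsOn.
function collectReferences(node, found = new Set()) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectReferences(item, found));
  } else if (node && typeof node === 'object') {
    for (const [key, value] of Object.entries(node)) {
      if (key === 'Ref' && typeof value === 'string') {
        found.add(value);
      } else if (key === 'Fn::GetAtt') {
        // Handles both ["LogicalId", "Attr"] and "LogicalId.Attr" forms.
        const id = Array.isArray(value) ? value[0] : String(value).split('.')[0];
        if (typeof id === 'string') found.add(id);
      } else if (key === 'DependsOn') {
        [].concat(value).forEach((id) => found.add(id));
      } else {
        collectReferences(value, found);
      }
    }
  }
  return found;
}

// Map each surviving resource to the removed IDs it still references.
function findDanglingReferences(template, removedIds) {
  const removed = new Set(removedIds);
  const dangling = {};
  for (const [id, resource] of Object.entries(template.Resources || {})) {
    if (removed.has(id)) continue;
    const hits = [...collectReferences(resource)].filter((ref) => removed.has(ref));
    if (hits.length) dangling[id] = hits;
  }
  return dangling;
}
```

A non-empty result could be surfaced in dry-run output so the user can decide whether to include the dependent resources in the removal as well.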
Expected Outcomes
- Stack transitions: `UPDATE_ROLLBACK_COMPLETE` → `UPDATE_COMPLETE` → `UPDATE_COMPLETE`
- Post-repair health check shows 0 missing resources
- Subsequent `frigg deploy` successfully creates new resources
Priority
High - This is a common operational scenario (manual deletion, failed rollbacks) that currently has no automated recovery path in Frigg.