Skip to content

Comments

RHCLOUD-41972: Fix quickstart content structure to store complete data#367

Open
catastrophe-brandon wants to merge 6 commits intomainfrom
btweed/fix/quickstarts
Open

RHCLOUD-41972: Fix quickstart content structure to store complete data#367
catastrophe-brandon wants to merge 6 commits intomainfrom
btweed/fix/quickstarts

Conversation

@catastrophe-brandon
Copy link
Contributor

@catastrophe-brandon catastrophe-brandon commented Sep 10, 2025

  • Update contentWrapper struct to match OpenShift QuickStart format
  • Add apiVersion, kind, and spec fields to preserve full content
  • Previously only metadata was stored, losing actual quickstart content
  • Add verification test to ensure complete content storage
  • Fixes issue where spec section (displayName, description, etc.) was lost

🤖 Generated with Claude Code

Summary by Sourcery

Refactor and enhance the database seeding process to preserve complete QuickStart data structures, standardize file and database operations with new helpers, implement a phased workflow with detailed logging and error tracking, and add verification tests to ensure content integrity.

New Features:

  • Introduce FileHelper for consistent YAML/JSON file operations with logging and error handling
  • Introduce DBHelper for standardized database operations with error wrapping and logging
  • Add SeedingResult to track counts and errors across quickstart and help topic seeding

Enhancements:

  • Refactor database seeding into phased workflow (cleanup, default tags, metadata discovery, content processing, favorites restoration) with detailed logging
  • Extend content wrapper to preserve full QuickStart structure including apiVersion, kind, spec, and content metadata when adding tags
  • Implement batch processing for cleaning and restoring favorites, tags, quickstarts, and help topics

Documentation:

  • Add CLAUDE.md guidance file for Claude Code integration

Tests:

  • Add content_verification_test to assert presence of apiVersion, kind, metadata, and spec in generated content
  • Add migration_test to verify database quickstart count matches filesystem metadata after seeding
  • Update existing tests to handle errors from findTags and path suffix trimming

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Sep 10, 2025

Reviewer's Guide

This PR revamps how quickstart content and tags are loaded and stored by introducing a comprehensive content wrapper (including apiVersion, kind, spec and metadata), refactors the database seeding into modular phases with helper classes (FileHelper and DBHelper) for consistent error handling and logging, and updates tests to verify that full quickstart data is preserved.

Sequence diagram for modular database seeding phases using helpers

sequenceDiagram
participant M as "main()"
participant DB as "database.SeedTags()"
participant FH as "FileHelper"
participant DH as "DBHelper"
M->>DB: Call SeedTags()
DB->>DB: clearOldContent()
DB->>DH: Use DBHelper for batch deletes
DB->>DB: seedDefaultTags()
DB->>DH: Use DBHelper for FindOrCreate
DB->>DB: findTags()
DB->>FH: Use FileHelper for file discovery
DB->>DB: For each MetadataTemplate
DB->>DB: processQuickstartPhase/processHelpTopicPhase
DB->>DH: Use DBHelper for DB operations
DB->>FH: Use FileHelper for content parsing
DB->>DB: SeedFavorites()
DB->>DH: Use DBHelper for batch create
DB-->>M: Return seeding result
Loading

Class diagram for SeedingResult struct

classDiagram
class SeedingResult {
  +QuickstartsProcessed : int
  +QuickstartsCreated : int
  +QuickstartsUpdated : int
  +HelpTopicsProcessed : int
  +HelpTopicsCreated : int
  +HelpTopicsUpdated : int
  +TagsCreated : int
  +FavoritesRestored : int
  +Errors : []error
  +addError(msg string) error
}
Loading

File-Level Changes

Change Details Files
Extend file utilities and wrap content to preserve full QuickStart structure
  • Add contentWrapper struct with apiVersion, kind, metadata, spec, content fields
  • Implement FileHelper methods: ReadYAMLFile, ReadJSONFromYAML, GlobFiles, AddTagsToContent, ExtractStringFromMetadata
  • Use JSON+YAML un/marshaling to read, tag, and output complete content
pkg/database/file_utils.go
Refactor database seeding to use helpers and phased workflow
  • Introduce SeedingResult to track processed/created/updated counts and errors
  • Replace direct GORM calls with DBHelper methods (Create, Update, Delete, ClearAssociation, AppendAssociation, ProcessBatch)
  • Split SeedTags into cleanup, default tags, discovery, processing, and favorites phases with detailed logging
pkg/database/db_seed.go
pkg/database/db_utils.go
Enhance metadata discovery and tag application
  • Update findTags and readMetadata to use FileHelper.GlobFiles and ReadYAMLFile with improved error handling
  • Extract quickstart names via FileHelper.ExtractStringFromMetadata
  • Log summary of discovered files and read failures
pkg/database/db_seed.go
Update migration command for robust logging and error handling
  • Log start/completion of migration and seeding phases with logrus
  • Handle AutoMigrate and SeedTags errors via Fatalf instead of panic
cmd/migrate/migrate.go
Adjust and add tests to verify error flows and content structure
  • Switch from TrimRight to TrimSuffix for paths and handle findTags error in db_test.go
  • Add content_verification_test.go to assert apiVersion, kind, metadata, spec presence
  • Add migration_test.go and update main_test.go to panic on seeding errors
pkg/database/db_test.go
pkg/database/content_verification_test.go
pkg/database/main_test.go
pkg/database/migration_test.go

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@catastrophe-brandon catastrophe-brandon marked this pull request as ready for review September 11, 2025 20:06
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider changing the APIVersion and Kind fields from interface{} to string for better type safety and clarity in the wrapper struct.
  • In content_verification_test, add assertions for actual spec values (e.g. displayName, description) and the expected tags rather than only checking for non-nil fields to make the test more robust.
  • Use an embedded or dedicated test fixture for the sample quickstart YAML instead of a relative file path to avoid CI path-related flakiness.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider changing the APIVersion and Kind fields from interface{} to string for better type safety and clarity in the wrapper struct.
- In content_verification_test, add assertions for actual spec values (e.g. displayName, description) and the expected tags rather than only checking for non-nil fields to make the test more robust.
- Use an embedded or dedicated test fixture for the sample quickstart YAML instead of a relative file path to avoid CI path-related flakiness.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@catastrophe-brandon catastrophe-brandon marked this pull request as draft September 11, 2025 20:18
@catastrophe-brandon catastrophe-brandon marked this pull request as ready for review September 12, 2025 15:33
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Much of the tag‐processing code still uses raw DB calls—consider switching those to your new DBHelper methods so error handling and logging stay consistent across the board.
  • The function named findTags now actually discovers metadata templates for quickstarts and topics; renaming it to something like discoverMetadata or loadMetadataTemplates will make its purpose clearer.
  • There are a lot of Infof/Debugf calls sprinkled throughout the seeding and helper logic—review your log levels to avoid excessive noise in production builds and consider demoting some logs to Debug only.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Much of the tag‐processing code still uses raw DB calls—consider switching those to your new DBHelper methods so error handling and logging stay consistent across the board.
- The function named findTags now actually discovers metadata templates for quickstarts and topics; renaming it to something like discoverMetadata or loadMetadataTemplates will make its purpose clearer.
- There are a lot of Infof/Debugf calls sprinkled throughout the seeding and helper logic—review your log levels to avoid excessive noise in production builds and consider demoting some logs to Debug only.

## Individual Comments

### Comment 1
<location> `pkg/database/file_utils.go:89` </location>
<code_context>
+
+// AddTagsToContent reads a content file, adds tags to its metadata, and returns JSON
+// Handles various file structures with fallback logic for robustness
+func (h *FileHelper) AddTagsToContent(path string, tags interface{}) ([]byte, error) {
+	h.logger.Debugf("Adding tags to content: %s", path)
+	raw, err := os.ReadFile(path)
</code_context>

<issue_to_address>
AddTagsToContent assumes the file structure matches contentWrapper.

Files with structures that differ from contentWrapper may not receive tags as intended. Please add validation or fallback logic to handle unexpected file formats.

Suggested implementation:

```golang
	var wrap contentWrapper
	if err := yaml.Unmarshal(raw, &wrap); err != nil {
		// Fallback: try to unmarshal into a generic map
		var generic map[string]interface{}
		if err2 := yaml.Unmarshal(raw, &generic); err2 != nil {
			return nil, fmt.Errorf("unmarshal %s: %w", path, err)
		}
		// Add tags to a top-level "metadata" key
		if generic["metadata"] == nil {
			generic["metadata"] = make(map[string]interface{})
		}
		metadata, ok := generic["metadata"].(map[string]interface{})
		if !ok {
			// If metadata is not a map, replace it
			metadata = make(map[string]interface{})
			generic["metadata"] = metadata
		}
		metadata["tags"] = tags
		return json.Marshal(generic)
	}
	if wrap.Metadata == nil {
		wrap.Metadata = make(map[string]interface{})

```

```golang
	wrap.Metadata["tags"] = tags
	return json.Marshal(wrap)

```
</issue_to_address>

### Comment 2
<location> `pkg/database/db_utils.go:125` </location>
<code_context>
+
+// ProcessBatch processes a batch of items with error collection and progress logging
+// Returns (successCount, errors) - callers can aggregate errors if needed
+func (h *DBHelper) ProcessBatch(items []interface{}, processor func(interface{}) error, batchName string) (successCount int, errors []error) {
+	h.logger.Infof("Starting batch %s (%d items)", batchName, len(items))
+	
</code_context>

<issue_to_address>
ProcessBatch does not support early termination on critical errors.

Allow the processor to signal early termination on critical errors, or clarify in documentation that all items are processed regardless of errors.

Suggested implementation:

```golang
import (
	"fmt"
	"strings"
	"errors"
	"gorm.io/gorm"
	"github.com/sirupsen/logrus"
)

// ErrCriticalBatchError can be returned by processor to signal early termination of batch processing.
var ErrCriticalBatchError = errors.New("critical batch error: early termination requested")

// ProcessBatch processes a batch of items with error collection and progress logging.
// If the processor returns ErrCriticalBatchError, processing stops early.
// Returns (successCount, errors) - callers can aggregate errors if needed.
func (h *DBHelper) ProcessBatch(items []interface{}, processor func(interface{}) error, batchName string) (successCount int, errors []error) {
	h.logger.Infof("Starting batch %s (%d items)", batchName, len(items))


```

```golang
	for i, item := range items {
		err := processor(item)
		if err != nil {
			errors = append(errors, fmt.Errorf("item %d: %w", i, err))
			if errors.Is(err, ErrCriticalBatchError) {
				h.logger.Errorf("Critical error encountered in batch %s at item %d: %v. Terminating batch early.", batchName, i, err)
				break
			}
		} else {
			successCount++
		}
	}
	h.logger.Infof("Finished batch %s: %d succeeded, %d errors", batchName, successCount, len(errors))
	return successCount, errors

```
</issue_to_address>

### Comment 3
<location> `pkg/database/db_seed.go:494` </location>
<code_context>
-					DB.Create(&newTag)
-					originalTag = newTag
-				}
+// processQuickstartTags handles tag processing for a quickstart
+func processQuickstartTags(template MetadataTemplate, quickstart models.Quickstart, result *SeedingResult) error {
+	for _, tag := range template.Tags {
</code_context>

<issue_to_address>
Consider refactoring the duplicated tag-processing logic into a single helper method on DBHelper to simplify both quickstart and help topic phases.

Here’s one way to collapse most of that duplicated looping and error-handling in your two “process…Tags” helpers into a single, small helper on your `DBHelper`.  Once you have this, both quickstarts and helptopics become a one-liner:

```go
// in your dbhelper.go
func (h *DBHelper) ProcessTagsFor(
    item interface{},          // &quickstart or &helpTopic
    assocName string,          // "Quickstarts" or "HelpTopics"
    tagTemplates []TagTemplate,
) (created int, errs []error) {
    for _, tt := range tagTemplates {
        tag := models.Tag{Type: models.TagType(tt.Kind), Value: tt.Value}
        // lookup or create
        if _, err := h.FindOrCreate(
            &tag,
            map[string]interface{}{"type": tag.Type, "value": tag.Value},
            "tag", tt.Value,
        ); err != nil {
            errs = append(errs, err)
            continue
        }
        // append association
        if err := h.AppendAssociation(&tag, assocName, item, "tag-"+assocName, tt.Value); err != nil {
            errs = append(errs, err)
            continue
        }
        created++
    }
    return
}
```

Then your phases shrink to:

```go
func processQuickstartPhase(t MetadataTemplate, defs map[string]models.Tag, res *SeedingResult) {
    qs, isNew, err := seedQuickstart(t, defs["quickstart"])
    if err != nil { /**/ }
    res.QuickstartsProcessed++
    if isNew { res.QuickstartsCreated++ } else { res.QuickstartsUpdated++ }

    // clear & re-tag in one call
    DB.Model(&qs).Association("Tags").Clear()
    if _, errs := dbHelper.ProcessTagsFor(&qs, "Quickstarts", t.Tags); len(errs) > 0 {
        res.Errors = append(res.Errors, errs...)
    }
}
```

```go
func processHelpTopicPhase(t MetadataTemplate, defs map[string]models.Tag, res *SeedingResult) {
    hts, created, err := seedHelpTopic(t, defs["helptopic"])
    if err != nil { /**/ }
    res.HelpTopicsProcessed += len(hts)
    res.HelpTopicsCreated += created
    res.HelpTopicsUpdated += len(hts) - created

    // for each topic just reuse the same helper
    for i := range hts {
        DB.Model(&hts[i]).Association("Tags").Clear()
        if _, errs := dbHelper.ProcessTagsFor(&hts[i], "HelpTopics", t.Tags); len(errs) > 0 {
            res.Errors = append(res.Errors, errs...)
        }
    }
}
```

Benefits:

- All of your lookup/create/associate/save boilerplate moves into one small function.
- Error-collection and counters happen in a single place.
- You’ve eliminated two large, near-identical loops (`processQuickstartTags` vs `processHelpTopicTags`) plus half a dozen logging lines.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@catastrophe-brandon
Copy link
Contributor Author

@sourcery-ai resolve

@catastrophe-brandon catastrophe-brandon changed the title Fix quickstart content structure to store complete data RHCLOUD-41972: Fix quickstart content structure to store complete data Sep 13, 2025
catastrophe-brandon and others added 2 commits September 29, 2025 10:03
…/logging-and-errors"

This reverts commit f471e07, reversing
changes made to 14f54dd.
- Update contentWrapper struct to match OpenShift QuickStart format
- Add apiVersion, kind, and spec fields to preserve full content
- Previously only metadata was stored, losing actual quickstart content
- Add verification test to ensure complete content storage
- Fixes issue where spec section (displayName, description, etc.) was lost

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Contributor

@justinorringer justinorringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny change to the log, otherwise I think it's worth a try!

panic(err)
logrus.Fatalf("Database migration failed: %v", err)
}
logrus.Info("Database migration completed successfully")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't there be an else here? It'll print both failed and success haha.

if err := database.SeedTags(); err != nil {
logrus.Fatalf("Database seeding failed: %v", err)
}
logrus.Info("Database seeding completed successfully")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also add we need to still panic/exit the process with non-zero if any errors

if err := DB.Save(&defaultTag).Error; err != nil {
return returnValue, createdCount, fmt.Errorf("failed to save default tag for updated help topic %s: %w", nameStr, err)
}
DB.Save(&defaultTag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capturing the errors for these makes sense :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justinorringer can you elaborate? Do you mean we should defer the return until we try saving all the entities?

Copy link
Contributor

@florkbr florkbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! 👍


### Build and Deployment
- Generate OpenAPI spec: `make generate-spec`
- Build Docker image: `docker build . -t quickstarts:latest`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably update this later to try with podman (not blocking)

if err := database.SeedTags(); err != nil {
logrus.Fatalf("Database seeding failed: %v", err)
}
logrus.Info("Database seeding completed successfully")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also add we need to still panic/exit the process with non-zero if any errors

fileHelper := NewFileHelper("test")

// Test with the hcs-getting-started quickstart
contentPath := "../docs/quickstarts/hcs-getting-started/hcs-getting-started.yml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want a dedicated test yaml in case someone removes this in the future, but for now 👍

assert.NotNil(t, parsed["spec"], "Should have spec")
}

func getKeys(m map[string]interface{}) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably reuse logic from another package for this, but for now 👍


err = yaml.Unmarshal(yamlfile, &template)
if err != nil {
t.Logf("Warning: Failed to unmarshal metadata file %s: %v", file, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to run this as part of the Konflux pipeline to validate PRs. I would also expect us to throw an error if one of the yamls is malformed preventing the PR from being merged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be awesome to add strict enforcement to the structure of the docs directory. So if I try to add a quickstart in the wrong place it blows up. Future JIRA link needed.

logrus.Debugf("Processing tag: %s=%s for quickstart %s", tag.Kind, tag.Value, quickstart.Name)

quickstart.Tags = append(quickstart.Tags, originalTag)
r := DB.Preload("Quickstarts").Where("type = ? AND value = ?", models.TagType(newTag.Type), newTag.Value).Find(&originalTag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use the dbHelper?


// Create tag-quickstart association
if err := DB.Model(&originalTag).Association("Quickstarts").Append(&quickstart); err != nil {
errMsg := fmt.Sprintf("failed to create tag association %s=%s for quickstart %s: %v", tag.Kind, tag.Value, quickstart.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we didn't have error handling prior it's hard to tell what the flow prior to these changes were (returning early or not). As long things are working by the end this should be fine, but something to think about.

Comment on lines +536 to +538
errMsg := fmt.Sprintf("failed to save quickstart %s after adding tag: %v", quickstart.Name, err)
logrus.Error(errMsg)
result.addError(errMsg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of:
errMsg :=
logrus.Error
result.addError

Is duplicated a few times. I wonder if we can have a helper func that does it to reduce dupe?

newTag.Type = models.TagType(tag.Kind)
newTag.Value = tag.Value

r := DB.Preload("HelpTopics").Where("type = ? AND value = ?", models.TagType(newTag.Type), newTag.Value).Find(&originalTag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫨

}

SeedFavorites(favorites)
func SeedTags() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SeedTags appropriate for this func? It seems to be doing more than just tags

@florkbr
Copy link
Contributor

florkbr commented Oct 1, 2025

It doesn't seem changing LOG_LEVEL actually changes what is printed by the scripts FWIW. I also noticed we have finer grain logging for help topics (from specific groups) but not for quickstarts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants