feat: add `--action-retries` option to ci by jake-kramer · Pull Request #461 · grafana/wait-for-github

jake-kramer · 2026-02-04T18:33:53Z

Follow up to #450

guicaulada

Nice work! Thank you for putting this together! I thought we would eventually need this for the ci command as well. The implementation looks good and follows the same approach.

I left a couple minor suggestions to make the code a little more robust and easier to maintain... these are not blocking, just improvements to consider.

Let me know if you have any questions!

guicaulada · 2026-02-05T20:41:59Z

cmd/wait-for-github/ci.go

-func (ci checkAllCI) Check(ctx context.Context) error {
+// tryRerunFailedWorkflows attempts to rerun failed workflows if retries are available.
+// Returns true if we should continue waiting (workflows were rerun or rerun failed temporarily).
+func (ci *checkAllCI) tryRerunFailedWorkflows(ctx context.Context) bool {


This method is nearly identical to the one in pr.go. I think it would be better to extract the shared logic so it's easier to maintain and the two copies can't diverge in the future.

2c278e7 is what I came up with

guicaulada

Thank you for working on my previous suggestions!

I noticed a minor type mismatch, since the cmd.Int function returns int64, and left a suggestion about the test structure to remove duplication.

Let me know what you think!

guicaulada · 2026-02-08T18:25:40Z

cmd/wait-for-github/ci.go

+		ref:           ref,
+		checks:        cmd.StringSlice("check"),
+		excludes:      cmd.StringSlice("exclude"),
+		actionRetries: cmd.Int("action-retries"),


There's a type mismatch here: cmd.Int returns int64, but actionRetries is declared as int on both prConfig and ciConfig. In pr.go we use an explicit conversion:

Suggested change

-		actionRetries: cmd.Int("action-retries"),
+		actionRetries: int(cmd.Int("action-retries")),
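To illustrate why the conversion matters, here is a small self-contained sketch. The `flagInt` function is a hypothetical stand-in for a flag accessor that returns int64 (as the review says cmd.Int does in the version used here), and this `ciConfig` contains only the one field under discussion.

```go
package main

import "fmt"

// ciConfig mirrors only the field discussed in the review.
type ciConfig struct {
	actionRetries int
}

// flagInt is a hypothetical stand-in for a flag accessor that
// returns int64; it is not the CLI library's real function.
func flagInt(name string) int64 {
	values := map[string]int64{"action-retries": 2}
	return values[name]
}

func newCIConfig() ciConfig {
	return ciConfig{
		// Go never converts between int64 and int implicitly, so
		// assigning the int64 result directly would not compile.
		actionRetries: int(flagInt("action-retries")),
	}
}

func main() {
	fmt.Println(newCIConfig().actionRetries)
}
```

On platforms where int is 32 bits wide, int(...) truncates values that do not fit, which is unlikely to matter for a retry count but is worth keeping in mind for larger flag values.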

guicaulada · 2026-02-08T18:42:59Z

cmd/wait-for-github/ci_test.go

+func TestCheckAllCIActionRetries(t *testing.T) {
+	t.Parallel()
+
+	tests := []struct {
+		name             string
+		actionRetries    int
+		rerunCount       int
+		rerunError       error
+		expectedExitCode *int
+	}{
+		{
+			name:             "CI failed with action-retries, workflows rerun",
+			actionRetries:    2,
+			rerunCount:       1,
+			rerunError:       nil,
+			expectedExitCode: nil, // Should continue waiting after rerun
+		},
+		{
+			name:             "CI failed with action-retries, no workflows to rerun",
+			actionRetries:    2,
+			rerunCount:       0,
+			rerunError:       nil,
+			expectedExitCode: &one, // Should fail immediately
+		},
+		{
+			name:             "CI failed with action-retries, rerun error continues waiting",
+			actionRetries:    2,
+			rerunCount:       0,
+			rerunError:       cli.Exit("rerun failed", 1),
+			expectedExitCode: nil, // Should continue waiting and retry later
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
+
+			fakeCIStatusChecker := &FakeCIStatusChecker{
+				status:     github.CIStatusFailed,
+				RerunCount: tt.rerunCount,
+				RerunError: tt.rerunError,
+			}
+			cfg := &config{
+				recheckInterval: 1,
+				logger:          testLogger,
+			}
+			ciConf := &ciConfig{
+				owner:         "owner",
+				repo:          "repo",
+				ref:           "ref",
+				actionRetries: tt.actionRetries,
+			}
+
+			ctx, cancel := context.WithTimeout(context.Background(), 1)
+			cancel()
+
+			err := checkCIStatus(ctx, fakeCIStatusChecker, cfg, ciConf)
+
+			if tt.expectedExitCode != nil {
+				var exitErr cli.ExitCoder
+				require.ErrorAs(t, err, &exitErr)
+				require.Equal(t, *tt.expectedExitCode, exitErr.ExitCode())
+			} else {
+				// Context expired before CI could complete, which is expected
+				// since we cancelled the context immediately
+				require.Error(t, err)
+			}
+		})
+	}
+}
+
+func TestCheckSpecificCIActionRetries(t *testing.T) {
+	t.Parallel()
+
+	tests := []struct {
+		name             string
+		actionRetries    int
+		rerunCount       int
+		rerunError       error
+		expectedExitCode *int
+	}{
+		{
+			name:             "CI failed with action-retries, workflows rerun",
+			actionRetries:    2,
+			rerunCount:       1,
+			rerunError:       nil,
+			expectedExitCode: nil, // Should continue waiting after rerun
+		},
+		{
+			name:             "CI failed with action-retries, no workflows to rerun",
+			actionRetries:    2,
+			rerunCount:       0,
+			rerunError:       nil,
+			expectedExitCode: &one, // Should fail immediately
+		},
+		{
+			name:             "CI failed with action-retries, rerun error continues waiting",
+			actionRetries:    2,
+			rerunCount:       0,
+			rerunError:       cli.Exit("rerun failed", 1),
+			expectedExitCode: nil, // Should continue waiting and retry later
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
+
+			fakeCIStatusChecker := &FakeCIStatusChecker{
+				status:     github.CIStatusFailed,
+				RerunCount: tt.rerunCount,
+				RerunError: tt.rerunError,
+			}
+			cfg := &config{
+				recheckInterval: 1,
+				logger:          testLogger,
+			}
+			ciConf := &ciConfig{
+				owner:         "owner",
+				repo:          "repo",
+				ref:           "ref",
+				checks:        []string{"check1", "check2"},
+				actionRetries: tt.actionRetries,
+			}
+
+			ctx, cancel := context.WithTimeout(context.Background(), 1)
+			cancel()
+
+			err := checkCIStatus(ctx, fakeCIStatusChecker, cfg, ciConf)
+
+			if tt.expectedExitCode != nil {
+				var exitErr cli.ExitCoder
+				require.ErrorAs(t, err, &exitErr)
+				require.Equal(t, *tt.expectedExitCode, exitErr.ExitCode())
+			} else {
+				// Context expired before CI could complete, which is expected
+				// since we cancelled the context immediately
+				require.Error(t, err)
+			}
+		})
+	}
+}


I noticed TestCheckAllCIActionRetries and TestCheckSpecificCIActionRetries look very similar; the only difference between them is whether checks is populated on ciConfig. We could merge them into a single function by adding checks as a field in the test table, which removes the duplication.

For example:

Suggested change

func TestCheckCIActionRetries(t *testing.T) {
	t.Parallel()

	tests := []struct {
		name             string
		checks           []string
		actionRetries    int
		rerunCount       int
		rerunError       error
		expectedExitCode *int
	}{
		{
			name:             "All CI failed with action-retries, workflows rerun",
			actionRetries:    2,
			rerunCount:       1,
			expectedExitCode: nil,
		},
		{
			name:             "All CI failed with action-retries, no workflows to rerun",
			actionRetries:    2,
			rerunCount:       0,
			expectedExitCode: &one,
		},
		{
			name:             "All CI failed with action-retries, rerun error continues waiting",
			actionRetries:    2,
			rerunError:       cli.Exit("rerun failed", 1),
			expectedExitCode: nil,
		},
		{
			name:             "Specific CI failed with action-retries, workflows rerun",
			checks:           []string{"check1", "check2"},
			actionRetries:    2,
			rerunCount:       1,
			expectedExitCode: nil,
		},
		{
			name:             "Specific CI failed with action-retries, no workflows to rerun",
			checks:           []string{"check1", "check2"},
			actionRetries:    2,
			rerunCount:       0,
			expectedExitCode: &one,
		},
		{
			name:             "Specific CI failed with action-retries, rerun error continues waiting",
			checks:           []string{"check1", "check2"},
			actionRetries:    2,
			rerunError:       cli.Exit("rerun failed", 1),
			expectedExitCode: nil,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			t.Parallel()

			fakeCIStatusChecker := &FakeCIStatusChecker{
				status:     github.CIStatusFailed,
				RerunCount: tt.rerunCount,
				RerunError: tt.rerunError,
			}
			cfg := &config{
				recheckInterval: 1,
				logger:          testLogger,
			}
			ciConf := &ciConfig{
				owner:         "owner",
				repo:          "repo",
				ref:           "ref",
				checks:        tt.checks,
				actionRetries: tt.actionRetries,
			}

			ctx, cancel := context.WithTimeout(context.Background(), 1)
			cancel()

			err := checkCIStatus(ctx, fakeCIStatusChecker, cfg, ciConf)

			if tt.expectedExitCode != nil {
				var exitErr cli.ExitCoder
				require.ErrorAs(t, err, &exitErr)
				require.Equal(t, *tt.expectedExitCode, exitErr.ExitCode())
			} else {
				// Context expired before CI could complete, which is expected
				// since we cancelled the context immediately
				require.Error(t, err)
			}
		})
	}
}

The checks field defaults to nil for the "all CI" cases, which exercises the checkAllCI path, while the cases with checks populated exercise the checkSpecificCI path.
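The way a nil checks slice selects one path and a populated slice selects the other can be sketched as follows. This is an illustration under assumptions, not the project's code: the `checker` interface, the two struct types, and the `pickChecker` function are hypothetical, and the selection rule (`len(checks) == 0`) is inferred from the comment above.

```go
package main

import "fmt"

// checker is a hypothetical interface; the real project types differ.
type checker interface {
	Name() string
}

// allChecker stands in for the "wait for all CI" path.
type allChecker struct{}

func (allChecker) Name() string { return "all" }

// specificChecker stands in for the "wait for named checks" path.
type specificChecker struct {
	checks []string
}

func (specificChecker) Name() string { return "specific" }

// pickChecker selects a path from the checks slice. A nil slice and an
// empty slice both have length zero, so a test case that omits the
// field takes the "all" path.
func pickChecker(checks []string) checker {
	if len(checks) == 0 {
		return allChecker{}
	}
	return specificChecker{checks: checks}
}

func main() {
	fmt.Println(pickChecker(nil).Name())
	fmt.Println(pickChecker([]string{"check1", "check2"}).Name())
}
```

Because the zero value of a slice field is nil, table entries that leave checks unset need no extra setup, which is what lets one table cover both paths.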

jake-kramer requested a review from a team as a code owner February 4, 2026 18:33

jake-kramer force-pushed the action-retries-ci branch 2 times, most recently from c7fd2fb to 2d3d453 Compare February 4, 2026 18:43

jake-kramer changed the title ~~feat: Add --action-retries option to cio~~ feat: add --action-retries option to cio Feb 4, 2026

feat: add --action-retries option to ci

ed87209

follow up to grafana#450

jake-kramer force-pushed the action-retries-ci branch from 2d3d453 to ed87209 Compare February 5, 2026 20:14

guicaulada changed the title ~~feat: add --action-retries option to cio~~ feat: add --action-retries option to ci Feb 5, 2026

guicaulada reviewed Feb 5, 2026


jake-kramer added 3 commits February 5, 2026 16:17

Store pointer to checkAllCI

acd52e9

Better error assertion in tests

03c92c2

Extract shared TryRerunFailedWorkflows function

2c278e7

julienduchesne mentioned this pull request Feb 6, 2026

fix: don't exit immediately when no concluded workflow runs to retry #465

Draft

3 tasks

guicaulada reviewed Feb 8, 2026

jake-kramer wants to merge 4 commits into grafana:main from jake-kramer:action-retries-ci
