Skip to content

Add harm bound to AHR designs#640

Open
LittleBeannie wants to merge 19 commits into
mainfrom
618-add-harm-bound
Open

Add harm bound to AHR designs#640
LittleBeannie wants to merge 19 commits into
mainfrom
618-add-harm-bound

Conversation

@LittleBeannie

@LittleBeannie LittleBeannie commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

To solve issue #618.

@jdblischak: In addition to "Efficacy" and "Futility" bound, I added "Harm" bound, which sequentially leads to some changes in gs_bound_summary(). The changes in gs_bound_summary() is suggested by GPT5.5. Could you please review if these AI-suggested changes looks good to you?

@LittleBeannie LittleBeannie self-assigned this Jun 10, 2026
@LittleBeannie LittleBeannie linked an issue Jun 10, 2026 that may be closed by this pull request
LittleBeannie and others added 9 commits June 11, 2026 10:31
The helper-support-as_rtf.R is auto-sourced by testit, so the explicit
source() call failed during R CMD check. Also update as_gt and as_rtf
snapshots to reflect the new bound ordering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Run roxygenize to sync Rd files with code (fixes WARNING)
- Remove test-independent-as_rtf.R (the .md snapshot is sufficient)
- Replace all() with bare logical vector in test assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…alone

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yihui

yihui commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Note on snapshot test changes

The .md snapshot diffs look large but are caused by a single behavioral change: as_gt() now sorts bounds by factor level order (Efficacy, Futility, Harm) via arrange(x2, Analysis, Bound), whereas previously it used arrange(x2, Analysis) which preserved the summary() order (Futility before Efficacy, from desc(bound) alphabetical sorting).

This means every analysis section in both test-independent-as_gt.md and test-independent-as_rtf.md has its Efficacy and Futility rows swapped. The actual numeric values are unchanged.

The other visible change in as_gt.md is lrrrrrcrrrrr (first column alignment changed from left to center).

@yihui

yihui commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Correction on the alignment change: the lrrrrrcrrrrr (first column left → center) is not from this PR's code. It's from gt 1.3.0 changing its default LaTeX alignment for the first data column in row-grouped tables. The old snapshot was generated with an older gt version. Since CI also installs gt 1.3.0, the updated snapshot is correct.

@LittleBeannie

Copy link
Copy Markdown
Collaborator Author

Thank you @yihui for helping with the testit snapshot issues!

Comment thread R/gs_design_ahr.R
Comment thread tests/testit/test-independent-as_gt.md
Comment thread R/gs_bound_summary.R
harm_bound <- x |> dplyr::filter(bound == "harm") |> dplyr::pull(z)
futility_bound <- x |> dplyr::filter(bound == "lower") |> dplyr::pull(z)
(harm_bound <= futility_bound)
})

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please instruct the AI to write unit tests for every function that it edits

Comment thread R/gs_bound_summary.R

# One-sided design should not include Futility column
if (all(is.na(out[["Futility"]]))) out[["Futility"]] <- NULL
if (all(is.na(out[["Harm"]]))) out[["Harm"]] <- NULL

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good when I tested locally, but this needs some unit tests for long-term maintenance. For example, confirm that a one-sided design does not return the column Harm

Comment thread R/gs_power_npe.R
Comment thread man/gs_design_ahr.Rd Outdated
upper = gs_b, upar = -qnorm(0.025), test_upper = TRUE,
lower = gs_b, lpar = -1, test_lower = TRUE,
harm = gs_b, hpar = -2, test_harm = TRUE
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to compute and return a futility or harm bound when there is only a single analysis?

If I run this with test_harm = FALSE, it only returns an efficacy bound for the single analysis. But with test_harm = TRUE, it returns all 3 bounds (efficacy, futility, harm).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please provide your code example?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using the example immediately above. Here is more minimal code to reproduce what I observe:

library("gsDesign2")

# Returns upper bound
gs_design_ahr(analysis_time = 40)$bound

# Returns upper and lower bounds 
gs_design_ahr(analysis_time = 40, test_harm = TRUE)$bound

# Returns upper, lower, and harm bounds 
gs_design_ahr(analysis_time = 40, hpar = -2, test_harm = TRUE)$bound

This matters because the bounds return by gs_design_ahr() are directly used and reported by gs_bound_summary(). Does it make sense to include columns for Futility and Harm for a trial with a single analysis time?

x <- gs_design_ahr(analysis_time = 40, hpar = -2, test_harm = TRUE)
gs_bound_summary(x)
##      Analysis                Value Efficacy Futility    Harm
## 1       Final                    Z   1.9600   1.9444 -2.0000
## 2      N: 428          p (1-sided)   0.0250   0.0259  0.9772
## 3 Events: 280         ~HR at bound   0.7911   0.7925  1.2702
## 4   Month: 40     P(Cross) if HR=1   0.0250   0.9741  0.0228
## 5             P(Cross) if AHR=0.68   0.9000   0.0973  0.0000

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are really good questions. I confirmed with Keaven:

  • When it is a fixed design, there is no harm bound.
  • When it is a group sequential design, the harm bound and lower bound are relatively independent. However, when the lower bound exists, the harm bound is always below the harm bound.

I will test the these 2 bullets in a coming commit!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit 9117f53 gives error messages when users test harm bound in a fixed design with one single analysis.

Besides, I added your 3 examples as developer tests in the commit 55bbe26.

@yihui yihui left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at the test part as I don't have expertise on the design part.


```{r}
candidate_harm_bounds <- lapply(astar_candidates, function(astar_candidate) {
gs_harm <- gsDesign::gsSurv(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pkgdown workflow failed with the error ! unused arguments (sfharm = gsDesign::sfHSD, sfharmparam = -2) because this new article requires the dev version of {gsDesign}

Comment thread R/gs_design_ahr.R
}
if (n_analysis == 1 && test_harm) {
stop("gs_design_ahr() harm bound cannot be tested if there is only one analysis.")
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about gs_design_npe() and gs_power_npe()? They still return the harm bound for a fixed design when test_harm = TRUE.

library("gsDesign2")

gs_design_npe()
## # A tibble: 1 × 10
## analysis bound     z probability probability0 theta  info info0 info1 info_frac
## <dbl> <chr> <dbl>       <dbl>        <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
##   1        1 upper  1.96         0.9        0.025   0.1 1051. 1051. 1051.         1

gs_design_npe(test_harm = TRUE)
## analysis bound        z probability probability0 theta info_frac     info    info0    info1
## 1        1 upper 1.959964         0.9        0.025   0.1         1 1050.742 1050.742 1050.742
## 2        1 lower     -Inf         0.0        0.000   0.1         1 1050.742 1050.742 1050.742
## 3        1  harm     -Inf         0.0        0.000   0.1         1 1050.742 1050.742 1050.742

gs_power_npe()
## analysis bound        z probability theta theta1 info_frac info info0 info1
## 1        1 upper 1.959964  0.03144531   0.1    0.1         1    1     1     1
## 2        1 lower     -Inf  0.00000000   0.1    0.1         1    1     1     1

> gs_power_npe(test_harm = TRUE)
## analysis bound        z probability theta theta1 info_frac info info0 info1
## 1        1 upper 1.959964  0.03144531   0.1    0.1         1    1     1     1
## 2        1 lower     -Inf  0.00000000   0.1    0.1         1    1     1     1
## 3        1  harm     -Inf  0.00000000   0.1    0.1         1    1     1     1

(filter(x$bound, bound == "lower")$spending_time %==% expected)
})

assert("Harm bound is not provided when it is a fixed design", {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. All of the calls below use the default of test_harm = FALSE. Thus the harm bound is not returned because of this argument, and has nothing to do with the fact that these are fixed designs.

upper = gs_b, upar = qnorm(1 - 0.025), test_upper = TRUE,
lower = gs_b, lpar = -Inf, test_lower = FALSE)

(all.equal(x1$bound[abs(x1$z) != Inf], "upper"))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all.equal() should be used for numeric comparisons. I think == should be sufficient for comparing strings. Alternatively you could use identical().

(all.equal(x1$bound[abs(x1$z) != Inf], "upper"))
(all.equal(x2$bound[abs(x2$z) != Inf], "upper"))
(all.equal(x3$bound$bound[abs(x3$bound$z) != Inf], "upper"))
(all.equal(x4$bound$bound[abs(x4$bound$z) != Inf], "upper"))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using != Inf seems like cheating here. Don't you want to confirm that the bound table does not include a row for the harm bound at all?

Only x1 (produced by gs_power_npe()) returns a lower bound with a z of -Inf. Given that test_lower = FALSE, shouldn't the function gs_power_npe() instead be updated to not return this lower bound?

(all(x2$bound$z[x2$bound$bound == "lower"] - x2$bound$z[x2$bound$bound == "harm"] >= 0))
})

assert("Harm bound is always below the lower bound in a group sequential design", {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this test different than the one above? The comment used to describe them is identical.

upper = gs_spending_bound,
upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
test_upper = TRUE,
test_lower = FALSE,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No lower bound is returned because test_lower = FALSE

harm = gs_spending_bound,
hpar = list(sf = gsDesign::sfHSD, total_spend = 0.1, param = -4),
test_harm = TRUE)
x3 <- gs_design_ahr(analysis_time = c(12, 24, 36), info_frac = NULL,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This x3 object is never used

test_harm = TRUE)

(all(x1$bound$z[x1$bound$bound == "lower"] - x1$bound$z[x1$bound$bound == "harm"] >= 0))
(all(x2$bound$z[x2$bound$bound == "lower"] - x2$bound$z[x2$bound$bound == "harm"] >= 0))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are meaningless. Because test_lower = FALSE, there is no lower bound.

x1$bound
##   analysis bound probability probability0        z ~hr at bound    nominal p spending_time
## 1        1 upper 0.002264189 5.380432e-05 3.872763    0.4485851 5.380432e-05     0.3080415
## 2        2 upper 0.550377366 9.209304e-03 2.357870    0.7299828 9.190059e-03     0.7407917
## 3        3 upper 0.899999844 2.500000e-02 2.009598    0.7938368 2.223685e-02     1.0000000

x1$bound$z[x1$bound$bound == "lower"]
## numeric(0)
x1$bound$z[x1$bound$bound == "harm"]
## numeric(0)
x1$bound$z[x1$bound$bound == "lower"] - x1$bound$z[x1$bound$bound == "harm"] >= 0
## logical(0)
all(x1$bound$z[x1$bound$bound == "lower"] - x1$bound$z[x1$bound$bound == "harm"] >= 0)
## [1] TRUE

test_lower = c(TRUE, TRUE, FALSE),
harm = gs_spending_bound,
hpar = list(sf = gsDesign::sfHSD, total_spend = 0.1, param = -4, timing = NULL),
test_harm = c(TRUE, TRUE, TRUE)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for test_harm to be TRUE for the 3rd analysis but test_lower be FALSE?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add harm bound

4 participants