I am looking to test for differences in means between three groups. Normally I would just run a one-way ANOVA and call it a day. However, I am using survey data, and Stata does not allow the -anova- command with the -svy- prefix. My solution is to run an adjusted Wald test of the equality of means across the three groups. Would any of you approach this differently?
svy: mean age, over(race)
test [age]1 = [age]2 = [age]3

Carlo Lazzaro
Steven:
there's nothing that -anova- can do that -regress- cannot do just as well.
I would go as in the following toy-example:
. use http://www.stata-press.com/data/r16/nhanes2.dta

. svy: regress weight i.race
(running regress on estimation sample)

Survey: Linear regression

Number of strata =      31                 Number of obs     =      10,351
Number of PSUs   =      62                 Population size   = 117,157,513
                                           Design df         =          31
                                           F(   2,     30)   =       21.04
                                           Prob > F          =      0.0000
                                           R-squared         =      0.0106

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      Black  |   3.308829   .8027919     4.12   0.000     1.671524    4.946134
      Other  |  -7.697939   1.637091    -4.70   0.000    -11.03681    -4.35907
             |
       _cons |   71.77969   .1784263   402.29   0.000     71.41578    72.14359
------------------------------------------------------------------------------

. mat list e(b)

e(b)[1,4]
           1b.          2.          3.
          race        race        race       _cons
y1           0   3.3088294  -7.6979386   71.779686

. test 1b.race=2.race=3.race

Adjusted Wald test

 ( 1)  1b.race - 2.race = 0
 ( 2)  1b.race - 3.race = 0

       F(  2,    30) =   21.04
            Prob > F =    0.0000

Kind regards,
. svy: mean weight, over(race)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =  31             Number of obs   =      10,351
Number of PSUs   =  62             Population size = 117,157,513
                                   Design df       =          31

---------------------------------------------------------------
              |             Linearized
              |       Mean   Std. Err.     [95% Conf. Interval]
--------------+------------------------------------------------
c.weight@race |
        White |   71.77969   .1784263      71.41578    72.14359
        Black |   75.08852   .7304715      73.59871    76.57832
        Other |   64.08175    1.61103      60.79603    67.36746
---------------------------------------------------------------

. mat list e(b)

e(b)[1,3]
      c.weight@   c.weight@   c.weight@
         1.race      2.race      3.race
y1    71.779686   75.088516   64.081748

. test [email protected][email protected][email protected]

Adjusted Wald test

 ( 1)  [email protected] - [email protected] = 0
 ( 2)  [email protected] - [email protected] = 0

       F(  2,    30) =   21.04
            Prob > F =    0.0000

As expected, -test- results are identical.

Kind regards,
A minor variation on Carlo's advice in #2 is to swap -test- with -testparm-.
testparm i.race
Thank you Carlo and Leonardo. My approach yields identical results to these two, which is reassuring since that is to be expected. I am attempting to address a reviewer's concern that my current approach does not account for multiple pairwise comparisons. I am not comparing many variables in my Table 1 so I am not as worried about this. But I find it interesting that my Bonferroni-adjusted p-values are identical to the non-adjusted p-values. I guess this supports my notion that correction is not needed.
Last edited by Steven Spivack; 02 Sep 2020, 08:45.

Here is an example of one of my Table 1 variables.
. svy: mean q55_1, over(tablecat)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =   1             Number of obs   =      1,178
Number of PSUs   = 628             Population size = 3,266.6842
                                   Design df       =        627

        1: tablecat = 1
        2: tablecat = 2
        3: tablecat = 3

--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
q55_1        |
           1 |   18.97772   2.897018      13.28869    24.66675
           2 |   27.42266   4.367904      18.84517    36.00016
           3 |   29.33308   4.657676      20.18654    38.47961
--------------------------------------------------------------

. test [q55_1]1 = [q55_1]2 = [q55_1]3, mtest(b)

Adjusted Wald test

 ( 1)  [q55_1]1 - [q55_1]2 = 0
 ( 2)  [q55_1]1 - [q55_1]3 = 0

---------------------------------------
       |    F(df,626)     df        p
-------+-------------------------------
  (1)  |        2.58       1    0.2171 #
  (2)  |        3.56       1    0.1193 #
-------+-------------------------------
  all  |        2.24       2    0.1068
---------------------------------------
          # Bonferroni-adjusted p-values

. test [q55_1]1 = [q55_1]2 = [q55_1]3

Adjusted Wald test

 ( 1)  [q55_1]1 - [q55_1]2 = 0
 ( 2)  [q55_1]1 - [q55_1]3 = 0

       F(  2,   626) =    2.24
            Prob > F =    0.1068
Your Bonferroni-adjusted p-values (in this case) will be exactly twice the unadjusted p-values from the regression table, for example. If by Table 1 you mean the usual demographics table, then perhaps you could include the p-value from the overall test of differences (i.e., from the "all" line in the Adjusted Wald test in post #6). Adjusting for multiple comparisons really only affects the pairwise contrasts.
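As a quick sanity check outside Stata, here is a minimal Python sketch of the arithmetic that -mtest(bonferroni)- applies: each pairwise p-value is multiplied by the number of comparisons k and capped at 1, so with k = 2 contrasts the adjusted values are exactly double the unadjusted ones (the p-values below are hypothetical, chosen to roughly match post #6).

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of
    comparisons k, capping the result at 1."""
    k = len(pvalues)
    return [min(k * p, 1.0) for p in pvalues]

# Hypothetical unadjusted p-values for two pairwise contrasts:
unadjusted = [0.1086, 0.0597]
print(bonferroni(unadjusted))  # prints [0.2172, 0.1194] -- each doubled
```

This also shows why large p-values stop tracking the "twice the unadjusted" rule: once k * p exceeds 1, the adjusted value is reported as 1.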
Thanks Leonardo! I was presenting the p value from the Adjusted Wald test but I will make that more explicit in my methods section. I appreciate all of your help.
Thank you for this useful thread. I am also comparing the means across 3 groups (accounting for survey design). For ease, I will use the same variable names in the examples above. My main question is with the line of code to test for differences between means (including the Bonferroni correction)
test 1b.race=2.race=3.race, mtest(b)
This only tests the difference between groups 1b and 2, and between 1b and 3. How can I also add the comparison between groups 2 and 3? I realise I could run the 2 vs 3 comparison in a separate -test- command, but then it isn't included in the Bonferroni correction. I could run all of them unadjusted and manually correct the p-values myself (multiplying by the number of comparisons), but my group 2 vs 3 comparison gives p = 0.572, so multiplying by 3 does not seem correct. My current solution would be to adjust the alpha criterion used to evaluate the unadjusted p-values (i.e. 0.05/3 = 0.0167), but I wondered if there is another solution using the -test- command that gives all three pairwise comparisons with their Bonferroni-adjusted p-values?
Thank you,
Jenny
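For what it is worth, the two manual strategies Jenny describes are equivalent, and a p-value like 0.572 multiplied by 3 is simply reported as 1 (the adjustment is capped). A short Python sketch, not Stata, with hypothetical p-values apart from the reported 0.572:

```python
k = 3          # number of pairwise comparisons
alpha = 0.05
# First two p-values are hypothetical; 0.572 is the group 2 vs 3 contrast.
pvalues = [0.012, 0.030, 0.572]

# Strategy 1: adjust each p-value upward, capping at 1
# (this is what a capped Bonferroni adjustment reports).
adjusted = [min(k * p, 1.0) for p in pvalues]

# Strategy 2: leave p-values unadjusted and shrink the threshold to alpha/k.
rejected = [p < alpha / k for p in pvalues]

# Both strategies reject exactly the same hypotheses.
assert [p_adj < alpha for p_adj in adjusted] == rejected
```

So multiplying 0.572 by 3 is not wrong; the adjusted value is just truncated to 1, and the accept/reject decision matches the alpha/k approach.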
Hi Jenny, welcome. In future, please start a new thread instead of resurrecting old and answered threads. This makes it easier for people to search for questions that may help them later on.
The PDF documentation for -test- shows some examples of the type of syntax you need when you want to use multiplicity adjustment. If you wish to see the adjusted differences, their confidence intervals and p-values, then you could use -margins-.
webuse census3, clear
regress brate i.region

// just perform the adjustment for multiple comparisons, showing only
// adjusted p-values and test statistics
test (1.region=2.region) (1.region=3.region) (1.region=4.region) ///
     (2.region=3.region) (2.region=4.region) (3.region=4.region), ///
     mtest(bonferroni)

// alternative method, which also shows the estimate of each contrast
margins i.region, pwcompare(effect) mcomp(bonferroni)
. regress brate i.region
  (output omitted)
------------------------------------------------------------------------------
       brate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      region |
   NCentral  |   30.86111   9.785122     3.15   0.003     11.16468    50.55754
      South  |   25.67361   9.246071     2.78   0.008     7.062236    44.28499
       West  |   59.34188   9.622477     6.17   0.000     39.97284    78.71092
             |
       _cons |   136.8889   7.396857    18.51   0.000     121.9998     151.778
------------------------------------------------------------------------------

. test (1.region=2.region) (1.region=3.region) (1.region=4.region) (2.region=3.region) (2.region=4.region) (3.region=4.region), mtest(bonferroni)
  (output omitted)
--------------------------------------
       |     F(df,46)     df    p > F
-------+------------------------------
  (1)  |        9.95       1   0.0170 *
  (2)  |        7.71       1   0.0475 *
  (3)  |       38.03       1   0.0000 *
  (4)  |        0.37       1   1.0000 *
  (5)  |       10.28       1   0.0147 *
  (6)  |       16.51       1   0.0011 *
-------+------------------------------
  All  |       13.23       3   0.0000
--------------------------------------
         * Bonferroni-adjusted p-values

. margins i.region, pwcompare(effect) mcomp(bonferroni)

Pairwise comparisons of adjusted predictions              Number of obs = 50
Model VCE: OLS

Expression: Linear prediction, predict()

---------------------------
             |    Number of
             |  comparisons
-------------+-------------
      region |            6
---------------------------

------------------------------------------------------------------------------------
                  |            Delta-method                 Bonferroni   Bonferroni
                  |   Contrast   std. err.      t    P>|t|   [95% conf. interval]
------------------+-----------------------------------------------------------------
           region |
   NCentral vs NE |   30.86111   9.785122     3.15   0.017     3.881821     57.8404
      South vs NE |   25.67361   9.246071     2.78   0.048     .1805786    51.16664
       West vs NE |   59.34188   9.622477     6.17   0.000     32.81103    85.87273
South vs NCentral |    -5.1875   8.474164    -0.61   1.000    -28.55225    18.17725
 West vs NCentral |   28.48077   8.883337     3.21   0.015     3.987856    52.97368
    West vs South |   33.66827   8.285826     4.06   0.001      10.8228    56.51374
------------------------------------------------------------------------------------