Provides a table of estimated descriptive statistics, optionally by group levels. Supply either a dataset and a list of variables, or the data directly in vars. For best results, convert categorical variables into factors.
Usage
egltable(
  vars,
  g,
  data,
  idvar,
  strict = TRUE,
  parametric = TRUE,
  paired = FALSE,
  simChisq = FALSE,
  sims = 1000000L
)
Arguments
- vars
Either an index (numeric or character) of variables to access from the data argument, or the data to be described itself.
- g
A variable used to group/separate the data prior to calculating descriptive statistics.
- data
Optional. The dataset containing the variables to be described.
- idvar
A character string indicating the variable name of the ID variable. Must be specified when paired = TRUE, where it identifies the repeated measures that belong together.
- strict
Logical, whether to strictly follow the type of each variable (TRUE, the default), or to treat a variable as categorical if its number of unique values is less than or equal to 3.
- parametric
Logical, whether to use parametric tests to test for differences when there are multiple groups. Only applies to continuous variables. If TRUE, the default, uses one-way ANOVA and an F test. If FALSE, uses the Kruskal-Wallis test. As the examples show, a logical vector with one value per variable can also be supplied.
- paired
Logical, whether the data are paired or not. Defaults to FALSE. If TRUE, the grouping variable, g, must have two levels and idvar must be specified. With paired data, a paired t-test is used for parametric continuous data, a Wilcoxon test for non-parametric continuous data, and a McNemar chi-square test for categorical data.
- simChisq
Logical, whether to estimate p-values for the chi-square test for categorical data by simulation when there are multiple groups. Defaults to FALSE. Useful when there are small cells, as it provides a more accurate test in extreme cases, similar to Fisher's exact test but generalizing to tables of larger dimensions.
- sims
Integer for the number of simulations used to estimate p-values for the chi-square tests for categorical variables when there are multiple groups. Defaults to one million (1e6L).
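The tests named under parametric can be sketched with base R. This is an illustration only: it assumes the statistics egltable reports are comparable to those from stats::aov and stats::kruskal.test, and it does not show how egltable computes them internally. The object names fit_aov and fit_kw are introduced just for this sketch.

```r
## Sketch only: base R counterparts of the tests described for `parametric`.
## Assumption: egltable's internal code is not shown on this page.

## parametric = TRUE: one-way ANOVA with an F test across the groups
fit_aov <- aov(Sepal.Length ~ Species, data = iris)
summary(fit_aov)

## parametric = FALSE: Kruskal-Wallis rank sum test across the groups
fit_kw <- kruskal.test(Petal.Length ~ Species, data = iris)
fit_kw
```

For iris, these calls should line up with the F and KW chi-square entries in the Test column of the grouped examples.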
Examples
egltable(iris)
#> M (SD)/N (%)
#> <char> <char>
#> 1: Sepal.Length 5.84 (0.83)
#> 2: Sepal.Width 3.06 (0.44)
#> 3: Petal.Length 3.76 (1.77)
#> 4: Petal.Width 1.20 (0.76)
#> 5: Species
#> 6: setosa 50 (33.3%)
#> 7: versicolor 50 (33.3%)
#> 8: virginica 50 (33.3%)
egltable(colnames(iris)[1:4], "Species", data = iris)
#> setosa M (SD) versicolor M (SD) virginica M (SD)
#> <char> <char> <char> <char>
#> 1: Sepal.Length 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
#> 2: Sepal.Width 3.43 (0.38) 2.77 (0.31) 2.97 (0.32)
#> 3: Petal.Length 1.46 (0.17) 4.26 (0.47) 5.55 (0.55)
#> 4: Petal.Width 0.25 (0.11) 1.33 (0.20) 2.03 (0.27)
#> Test
#> <char>
#> 1: F(2, 147) = 119.26, p < .001, Eta-squared = 0.62
#> 2: F(2, 147) = 49.16, p < .001, Eta-squared = 0.40
#> 3: F(2, 147) = 1180.16, p < .001, Eta-squared = 0.94
#> 4: F(2, 147) = 960.01, p < .001, Eta-squared = 0.93
egltable(iris, parametric = FALSE)
#> Mdn (IQR)/N (%)
#> <char> <char>
#> 1: Sepal.Length 5.80 (1.30)
#> 2: Sepal.Width 3.00 (0.50)
#> 3: Petal.Length 4.35 (3.50)
#> 4: Petal.Width 1.30 (1.50)
#> 5: Species
#> 6: setosa 50 (33.3%)
#> 7: versicolor 50 (33.3%)
#> 8: virginica 50 (33.3%)
egltable(colnames(iris)[1:4], "Species", iris,
  parametric = FALSE)
#> setosa Mdn (IQR) versicolor Mdn (IQR) virginica Mdn (IQR)
#> <char> <char> <char> <char>
#> 1: Sepal.Length 5.00 (0.40) 5.90 (0.70) 6.50 (0.67)
#> 2: Sepal.Width 3.40 (0.48) 2.80 (0.48) 3.00 (0.38)
#> 3: Petal.Length 1.50 (0.18) 4.35 (0.60) 5.55 (0.78)
#> 4: Petal.Width 0.20 (0.10) 1.30 (0.30) 2.00 (0.50)
#> Test
#> <char>
#> 1: KW chi-square = 96.94, df = 2, p < .001
#> 2: KW chi-square = 63.57, df = 2, p < .001
#> 3: KW chi-square = 130.41, df = 2, p < .001
#> 4: KW chi-square = 131.19, df = 2, p < .001
egltable(colnames(iris)[1:4], "Species", iris,
  parametric = c(TRUE, TRUE, FALSE, FALSE))
#> setosa See Rows versicolor See Rows
#> <char> <char> <char>
#> 1: Sepal.Length, M (SD) 5.01 (0.35) 5.94 (0.52)
#> 2: Sepal.Width, M (SD) 3.43 (0.38) 2.77 (0.31)
#> 3: Petal.Length, Mdn (IQR) 1.50 (0.18) 4.35 (0.60)
#> 4: Petal.Width, Mdn (IQR) 0.20 (0.10) 1.30 (0.30)
#> virginica See Rows Test
#> <char> <char>
#> 1: 6.59 (0.64) F(2, 147) = 119.26, p < .001, Eta-squared = 0.62
#> 2: 2.97 (0.32) F(2, 147) = 49.16, p < .001, Eta-squared = 0.40
#> 3: 5.55 (0.78) KW chi-square = 130.41, df = 2, p < .001
#> 4: 2.00 (0.50) KW chi-square = 131.19, df = 2, p < .001
egltable(colnames(iris)[1:4], "Species", iris,
  parametric = c(TRUE, TRUE, FALSE, FALSE), simChisq = TRUE)
#> setosa See Rows versicolor See Rows
#> <char> <char> <char>
#> 1: Sepal.Length, M (SD) 5.01 (0.35) 5.94 (0.52)
#> 2: Sepal.Width, M (SD) 3.43 (0.38) 2.77 (0.31)
#> 3: Petal.Length, Mdn (IQR) 1.50 (0.18) 4.35 (0.60)
#> 4: Petal.Width, Mdn (IQR) 0.20 (0.10) 1.30 (0.30)
#> virginica See Rows Test
#> <char> <char>
#> 1: 6.59 (0.64) F(2, 147) = 119.26, p < .001, Eta-squared = 0.62
#> 2: 2.97 (0.32) F(2, 147) = 49.16, p < .001, Eta-squared = 0.40
#> 3: 5.55 (0.78) KW chi-square = 130.41, df = 2, p < .001
#> 4: 2.00 (0.50) KW chi-square = 131.19, df = 2, p < .001
diris <- data.table::as.data.table(iris)
egltable("Sepal.Length", g = "Species", data = diris)
#> setosa M (SD) versicolor M (SD) virginica M (SD)
#> <char> <char> <char> <char>
#> 1: Sepal.Length 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
#> Test
#> <char>
#> 1: F(2, 147) = 119.26, p < .001, Eta-squared = 0.62
tmp <- mtcars
tmp$cyl <- factor(tmp$cyl)
tmp$am <- factor(tmp$am, levels = 0:1)
egltable(c("mpg", "hp"), "vs", tmp)
#> 0 M (SD) 1 M (SD) Test
#> <char> <char> <char> <char>
#> 1: mpg 16.62 (3.86) 24.56 (5.38) t(df=30) = -4.86, p < .001, d = 1.73
#> 2: hp 189.72 (60.28) 91.36 (24.42) t(df=30) = 5.73, p < .001, d = 2.04
egltable(c("mpg", "hp"), "am", tmp)
#> 0 M (SD) 1 M (SD) Test
#> <char> <char> <char> <char>
#> 1: mpg 17.15 (3.83) 24.39 (6.17) t(df=30) = -4.11, p < .001, d = 1.48
#> 2: hp 160.26 (53.91) 126.85 (84.06) t(df=30) = 1.37, p = .180, d = 0.49
egltable(c("am", "cyl"), "vs", tmp)
#> Warning: Chi-squared approximation may be incorrect
#> 0 N (%) 1 N (%)
#> <char> <char> <char>
#> 1: am
#> 2: 0 12 (66.7%) 7 (50.0%)
#> 3: 1 6 (33.3%) 7 (50.0%)
#> 4: cyl
#> 5: 4 1 (5.6%) 10 (71.4%)
#> 6: 6 3 (16.7%) 4 (28.6%)
#> 7: 8 14 (77.8%) 0 (0.0%)
#> Test
#> <char>
#> 1: Chi-square = 0.91, df = 1, p = .341, Phi = 0.17
#> 2:
#> 3:
#> 4: Chi-square = 21.34, df = 2, p < .001, Cramer's V = 0.82
#> 5:
#> 6:
#> 7:
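## Sketch only (base R, not egltable itself): a chi-square p-value estimated
## by simulation, which is what simChisq = TRUE is described as requesting.
## Assumptions: B plays the role of sims, and sim_test is a name made up for
## this sketch. 10000 replicates keep the example fast; mtcars is used
## directly so the lines run on their own.
set.seed(1234)
sim_test <- chisq.test(table(mtcars$cyl, mtcars$vs),
  simulate.p.value = TRUE, B = 10000)
sim_test$p.value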
tests <- with(sleep,
  wilcox.test(extra[group == 1],
    extra[group == 2], paired = TRUE))
#> Warning: cannot compute exact p-value with ties
#> Warning: cannot compute exact p-value with zeroes
str(tests)
#> List of 7
#> $ statistic : Named num 0
#> ..- attr(*, "names")= chr "V"
#> $ parameter : NULL
#> $ p.value : num 0.00909
#> $ null.value : Named num 0
#> ..- attr(*, "names")= chr "location shift"
#> $ alternative: chr "two.sided"
#> $ method : chr "Wilcoxon signed rank test with continuity correction"
#> $ data.name : chr "extra[group == 1] and extra[group == 2]"
#> - attr(*, "class")= chr "htest"
## example with paired data
egltable(c("extra"), g = "group", data = sleep, idvar = "ID", paired = TRUE)
#> 1 M (SD) 2 M (SD) Test
#> <char> <char> <char> <char>
#> 1: extra 0.75 (1.79) 2.33 (2.00) t(df=9) = 4.06, p = .003, d = 1.28
## what happens when ignoring pairing (the p-value is no longer correct)
# egltable(c("extra"), g = "group", data = sleep, idvar = "ID")
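## Sketch only (base R): the same comparison with and without pairing.
## Assumption: this mirrors the tests described above but is not the code
## path egltable uses; paired_t and unpaired_t are names made up here.
paired_t <- with(sleep,
  t.test(extra[group == 1], extra[group == 2], paired = TRUE))
unpaired_t <- with(sleep,
  t.test(extra[group == 1], extra[group == 2], paired = FALSE))
## compare the two p-values side by side
c(paired = paired_t$p.value, unpaired = unpaired_t$p.value)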
## paired categorical data example
## using data on chick weights to create categorical data
tmp <- subset(ChickWeight, Time %in% c(0, 20))
tmp$WeightTertile <- cut(tmp$weight,
  breaks = quantile(tmp$weight, c(0, 1/3, 2/3, 1), na.rm = TRUE),
  include.lowest = TRUE)
egltable(c("weight", "WeightTertile"), g = "Time",
  data = tmp,
  idvar = "Chick", paired = TRUE)
#> 0 M (SD)/N (%) 20 M (SD)/N (%)
#> <char> <char> <char>
#> 1: weight 41.06 (1.13) 209.72 (66.51)
#> 2: WeightTertile
#> 3: [39,41.7] 32 (64.0%) 0 (0.0%)
#> 4: (41.7,169] 18 (36.0%) 14 (30.4%)
#> 5: (169,361] 0 (0.0%) 32 (69.6%)
#> Test
#> <char>
#> 1: t(df=45) = 17.10, p < .001, d = 2.52
#> 2: McNemar's Chi-square = 39.00, df = 3, p < .001
#> 3:
#> 4:
#> 5:
rm(tmp)