7.1 Recoding a Continuous Variable to a Categorical Variable

For three categories we specify four bounds, which can include Inf and -Inf. If a data value falls outside of the specified bounds, it’s categorized as NA. The result of cut() is a factor, and you can see from the example that the factor levels are named after the bounds.
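As a minimal sketch of the out-of-bounds behavior, the following uses a small made-up vector (x is a hypothetical example, not part of the pg data) with finite bounds that do not cover every value:

```r
# Hypothetical values for illustration; not taken from pg
x <- c(-1, 0.5, 5, 5.5, 7, 100)

# Four bounds define three categories: (0,5], (5,6], (6,10]
f <- cut(x, breaks = c(0, 5, 6, 10))

is.factor(f)   # the result of cut() is a factor
levels(f)      # the levels are named after the bounds
is.na(f)       # values outside the bounds (here -1 and 100) become NA
```

Replacing the outer bounds with -Inf and Inf would leave no value outside the specified range, so none would be coded as NA.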

To change the names of the levels, set the labels:

pg$wtclass <- cut(pg$weight, breaks = c(0, 5, 6, Inf),
                  labels = c("small", "medium", "large"))
pg
#>    weight group wtclass
#> 1    4.17  ctrl   small
#> 2    5.58  ctrl  medium
#>  ...<26 more rows>...
#> 29   5.80  trt2  medium
#> 30   5.26  trt2  medium

As indicated by the factor levels, the bounds are by default open on the left and closed on the right. In other words, each interval excludes its lower bound but includes its upper bound. To make the smallest category include both its lower and upper bounds, set include.lowest = TRUE. In this example, a value of exactly 0 would then go into the small category; otherwise, it would be coded as NA.
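The effect of include.lowest can be sketched with a few made-up values (x is hypothetical, chosen so that one value sits exactly on the lowest bound):

```r
# Hypothetical values for illustration; the first sits exactly on the lowest bound
x <- c(0, 2, 5, 6)

# Default: intervals are (0,5], (5,6], (6,Inf], so 0 falls outside and becomes NA
default_cut <- cut(x, breaks = c(0, 5, 6, Inf))

# include.lowest = TRUE closes the first interval on the left, giving [0,5]
closed_cut <- cut(x, breaks = c(0, 5, 6, Inf), include.lowest = TRUE)

default_cut
closed_cut
```

Only the first interval changes; the remaining intervals stay open on the left and closed on the right, so 5 still belongs to the first category and 6 to the second.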

If you want the categories to be closed on the left and open on the right, set right = FALSE:

cut(pg$weight, breaks = c(0, 5, 6, Inf), right = FALSE)
#>  [1] [0,5)   [5,6)   [5,6)   [6,Inf) [0,5)   [0,5)   [5,6)   [0,5)   [5,6)
#> [10] [5,6)   [0,5)   [0,5)   [0,5)   [0,5)   [5,6)   [0,5)   [6,Inf) [0,5)
#> [19] [0,5)   [0,5)   [6,Inf) [5,6)   [5,6)   [5,6)   [5,6)   [5,6)   [0,5)
#> [28] [6,Inf) [5,6)   [5,6)
#> Levels: [0,5) [5,6) [6,Inf)
