使用pairs绘制矩阵多元图-阿里云开发者社区

如果 X 是一个数值矩阵或者数据框，命令

> pairs(X)

将产生 X 的列之间两两相对的成对散点图阵列（pairwise scatterplot matrix）。也就是说，X的每一列相对 X 的所有其他列而产生 n(n-1) 个图，并且把这些图以阵列个形式显示在图区。这个图形阵列的行列图形尺度一致。

例如 :

> X <- matrix(1:8, 2, 4)

> X

[,1] [,2] [,3] [,4]

[1,] 1 3 5 7

[2,] 2 4 6 8

> pairs(X)

主要是不是每幅图都有坐标, 所以看起来很难理解, 其实每个格子都有坐标, 没画出来的话就在对立面.

使用pairs绘制矩阵多元图 - 德哥@Digoal - PostgreSQL research

我们看上图的第一列, 横坐标固定, 横坐标对应矩阵第一列的数据.

纵坐标, 每个格子有不同的纵坐标, 这些纵坐标和矩阵的其他几列对应.

上图第1列 :

第1个格子是第1列和第1列的数据, 横坐标是第1列的数据, 纵坐标是第1列的数据. (无图显示)

第2个格子是第1列和第2列的数据, 横坐标代表第1列, 是第1列的数据, 纵坐标是第2列的数据.

第3个格子是第1列和第3列的数据, 横坐标代表第1列 , 是第1列的数据, 纵坐标是第3列的数据.

第4个格子是第1列和第4列的数据, 横坐标代表第1列 , 是第1列的数据, 纵坐标是第4列的数据.

上图第2列 :

从上开始

第1个格子是第2列和第1列的数据, 横坐标是第2列的数据, 纵坐标是第1列的数据.

第2个格子是第2列和第2列的数据, 横坐标是第2列的数据, 纵坐标是第2列的数据. (无图显示)

第3个格子是第2列和第3列的数据, 横坐标是第2列的数据, 纵坐标是第3列的数据.

第4个格子是第2列和第4列的数据, 横坐标是第2列的数据, 纵坐标是第4列的数据.

........

pairs还支持使用公式, 输入多个变量时, 每个变量代表一列, 绘制与其他变量的多元图.

例如 :

> x <- 1:10

> y <- 101:110

> z <- 201:210

> x

[1] 1 2 3 4 5 6 7 8 9 10

> y

[1] 101 102 103 104 105 106 107 108 109 110

> z

[1] 201 202 203 204 205 206 207 208 209 210

> pairs(~ x / y / z)

> pairs(~ x + y + z)

以上得到的图是一致的 :

x,y,z每个代表1列, 与其他列绘制多元图.

例如第1列的横坐标为x, 纵坐标分别为y,z绘制2副图.

第2列的横坐标为y, 纵坐标分别为x,z绘制2副图.

第3列的横坐标为z, 纵坐标分别为x,y绘制2副图.

[参考]

1. > help(pairs)

pairs                 package:graphics                 R Documentation

Scatterplot Matrices

Description:

     A matrix of scatterplots is produced.

Usage:

     pairs(x, ...)
     
     ## S3 method for class 'formula'
     pairs(formula, data = NULL, ..., subset,
           na.action = stats::na.pass)
     
     ## Default S3 method:
     pairs(x, labels, panel = points, ...,
           lower.panel = panel, upper.panel = panel,
           diag.panel = NULL, text.panel = textPanel,
           label.pos = 0.5 + has.diag/3, line.main = 3,
           cex.labels = NULL, font.labels = 1,
           row1attop = TRUE, gap = 1, log = "")
     
Arguments:

       x: the coordinates of points given as numeric columns of a
          matrix or data frame.  Logical and factor columns are
          converted to numeric in the same way that ‘data.matrix’ does.

 formula: a formula, such as ‘~ x + y + z’.  Each term will give a
          separate variable in the pairs plot, so terms should be
          numeric vectors.  (A response will be interpreted as another
          variable, but not treated specially, so it is confusing to
          use one.)

    data: a data.frame (or list) from which the variables in ‘formula’
          should be taken.

  subset: an optional vector specifying a subset of observations to be
          used for plotting.

na.action: a function which indicates what should happen when the data
          contain ‘NA’s.  The default is to pass missing values on to
          the panel functions, but ‘na.action = na.omit’ will cause
          cases with missing values in any of the variables to be
          omitted entirely.

  labels: the names of the variables.

   panel: ‘function(x, y, ...)’ which is used to plot the contents of
          each panel of the display.

     ...: arguments to be passed to or from methods.

          Also, graphical parameters can be given as can arguments to
          ‘plot’ such as ‘main’.  ‘par("oma")’ will be set
          appropriately unless specified.

lower.panel, upper.panel: separate panel functions (or ‘NULL’) to be
          used below and above the diagonal respectively.

diag.panel: optional ‘function(x, ...)’ to be applied on the diagonals.

text.panel: optional ‘function(x, y, labels, cex, font, ...)’ to be
          applied on the diagonals.

label.pos: ‘y’ position of labels in the text panel.

line.main: if ‘main’ is specified, ‘line.main’ gives the ‘line’
          argument to ‘mtext()’ which draws the title.  You may want to
          specify ‘oma’ when changing ‘line.main’.

cex.labels, font.labels: graphics parameters for the text panel.

row1attop: logical. Should the layout be matrix-like with row 1 at the
          top, or graph-like with row 1 at the bottom?

     gap: distance between subplots, in margin lines.

     log: a character string indicating if logarithmic axes are to be
          used: see ‘plot.default’. ‘log = "xy"’ specifies logarithmic
          axes for all variables.

Details:

     The ijth scatterplot contains ‘x[,i]’ plotted against ‘x[,j]’.
     The scatterplot can be customised by setting panel functions to
     appear as something completely different. The off-diagonal panel
     functions are passed the appropriate columns of ‘x’ as ‘x’ and
     ‘y’: the diagonal panel function (if any) is passed a single
     column, and the ‘text.panel’ function is passed a single ‘(x, y)’
     location and the column name.  Setting some of these panel
     functions to ‘NULL’ is equivalent to _not_ drawing anything there.

     The graphical parameters ‘pch’ and ‘col’ can be used to specify a
     vector of plotting symbols and colors to be used in the plots.

     The graphical parameter ‘oma’ will be set by ‘pairs.default’
     unless supplied as an argument.

     A panel function should not attempt to start a new plot, but just
     plot within a given coordinate system: thus ‘plot’ and ‘boxplot’
     are not panel functions.

     By default, missing values are passed to the panel functions and
     will often be ignored within a panel.  However, for the formula
     method and ‘na.action = na.omit’, all cases which contain a
     missing values for any of the variables are omitted completely
     (including when the scales are selected).

Author(s):

     Enhancements for R 1.0.0 contributed by Dr. Jens
     Oehlschlaegel-Akiyoshi and R-core members.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

Examples:

     pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
           pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
     
     ## formula method
     pairs(~ Fertility + Education + Catholic, data = swiss,
           subset = Education < 20, main = "Swiss data, Education < 20")
     
     pairs(USJudgeRatings)
     ## show only lower triangle (and suppress labeling for whatever reason):
     pairs(USJudgeRatings, text.panel = NULL, upper.panel = NULL)
     
     ## put histograms on the diagonal
     panel.hist <- function(x, ...)
     {
         usr <- par("usr"); on.exit(par(usr))
         par(usr = c(usr[1:2], 0, 1.5) )
         h <- hist(x, plot = FALSE)
         breaks <- h$breaks; nB <- length(breaks)
         y <- h$counts; y <- y/max(y)
         rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
     }
     pairs(USJudgeRatings[1:5], panel = panel.smooth,
           cex = 1.5, pch = 24, bg = "light blue",
           diag.panel = panel.hist, cex.labels = 2, font.labels = 2)
     
     ## put (absolute) correlations on the upper panels,
     ## with size proportional to the correlations.
     panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
     {
         usr <- par("usr"); on.exit(par(usr))
         par(usr = c(0, 1, 0, 1))
         r <- abs(cor(x, y))
         txt <- format(c(r, 0.123456789), digits = digits)[1]
         txt <- paste0(prefix, txt)
         if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
         text(0.5, 0.5, txt, cex = cex.cor * r)
     }
     pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)
     
     pairs(iris[-5], log = "xy") # plot all variables on log scale
     pairs(iris, log = 1:4, # log the first four
           main = "Lengths and Widths in [log]", line.main=1.5, oma=c(2,2,3,2))

使用pairs绘制矩阵多元图

热门文章

最新文章

相关电子书