Creates scatterplots in a convenient and customizable manner. It can also calculate a correlation metric of interest and fit a linear or loess line to the data.

Usage

scatterplotter(
  data,
  x_val,
  y_val,
  col_val = NA,
  style = "light",
  colors = NA,
  y_lab = y_val,
  x_lab = x_val,
  title = "",
  legend_lab = "",
  fit = "none",
  discrete = TRUE,
  linecolor = "black",
  pointcolor = "black",
  corr_method = "pearson",
  alternative = "two.sided",
  fit_method = "glm",
  se = FALSE,
  labels = NA,
  formula = "y ~ x",
  pointsize = 1,
  point_alpha = 1,
  display_n = T,
  facet_val = NA,
  ...
)

Arguments

data

The data.frame to be used for the visualization.

x_val

string, the name of the column to plot on the x axis.

y_val

string, the name of the column to plot on the y axis.

col_val

string, name of the column to use for coloring points. Default is NA.

style

string, palette style to be used for scale_color_au. Default is light. The style is only applied if colors is left as NA.

colors

vector containing the colors to be used for the fill aesthetic. Default is NA. If unspecified, the function uses au_colors().

y_lab

string, the y axis label. Default is the string passed into y_val.

x_lab

string, the x axis label. Default is the string passed into x_val.

title

string, the title of the plot to be displayed on top. Default is "".

legend_lab

string, the legend title. Default is "".

fit

string, one of single, grouped, or none. With single, the model is fit to the entire dataset displayed on the plot. With grouped, a separate line is fit to each group defined by col_val. With none, no line is fit. Default is none.

discrete

boolean, TRUE applies a discrete color scale, FALSE applies a continuous color scale. Default is TRUE.

linecolor

string, the color of the fitted line in single mode. Default is black.

pointcolor

string, the color to use for points when col_val = NA. Default is black.

corr_method

string, the correlation method to pass into stat_cor(). Default is pearson.

alternative

string, the alternative to pass into stat_cor(). Default is two.sided.

fit_method

string, the fitting method passed into geom_smooth(). Default is glm.

se

boolean, when TRUE, the confidence interval around the fitted line is displayed. Default is FALSE.

labels

vector, the legend annotations. Default is the unique values in col_val.

formula

string, the formula to use for fitting the line with geom_smooth(). Default is y ~ x.

pointsize

num, point size passed into geom_point(). Default is 1.

point_alpha

num, point opacity passed into geom_point(). Default is 1.

display_n

boolean, if TRUE, the plot displays the sample size appended to the title. Default is TRUE. The sample size is calculated on the basis of the number of rows in the data frame, so ensure that the observations in data are unique before calling scatterplotter.

facet_val

string, the name of the column to facet by. Default is NA.

...

other parameters passed into stat_cor() or facet_wrap().
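
To make the corr_method and alternative options more concrete, the sketch below uses base R's cor.test(), which accepts analogous method and alternative arguments. It only illustrates what the options mean; it is not how scatterplotter() or stat_cor() computes the statistic internally, and the variable names are illustrative.

```r
# Base R analogue of the corr_method / alternative options.
# This is an illustration only, not the package's own code.
x <- iris$Sepal.Width
y <- iris$Sepal.Length

# Pearson correlation with a two-sided test (mirrors the documented defaults).
pearson_two_sided <- cor.test(x, y, method = "pearson", alternative = "two.sided")

# Rank-based correlation with a one-sided alternative.
# exact = FALSE avoids the exact-p-value warning caused by ties.
spearman_less <- cor.test(x, y, method = "spearman", alternative = "less",
                          exact = FALSE)

pearson_two_sided$estimate  # correlation coefficient
pearson_two_sided$p.value   # p-value of the two-sided test
```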

Value

A ggplot object.
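
Because the return value is a ggplot object, it should be possible to extend it with further ggplot2 layers, labels, or themes. The sketch below shows the idea with a stand-in plot built directly in ggplot2, assuming ggplot2 is installed. The stand-in only approximates the documented defaults (glm fit, no confidence band, black line) and is not scatterplotter()'s implementation.

```r
library(ggplot2)

# Stand-in plot built with plain ggplot2; NOT scatterplotter()'s implementation.
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(aes(color = Species), size = 1, alpha = 1) +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE, color = "black")

# Any ggplot object can be extended after it has been created.
p2 <- p +
  labs(subtitle = "Added after the plot was created") +
  theme(legend.position = "bottom")
```

The same pattern would apply to the object returned by scatterplotter(), for example to move the legend or add a subtitle.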

Examples

scatterplotter(iris, "Sepal.Width", "Sepal.Length", col_val = "Species", style = "tracerx",
  y_lab = "Sepal Length", x_lab = "Sepal Width", title = "Comparing sepal widths and lengths per species",
  fit = "single", corr_method = "pearson", legend_lab = "Species", se = FALSE,
  labels = c("Species 1", "Species 2", "Species 3"))
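
Since display_n reports the number of rows in data, it can help to check for repeated rows before plotting. The sketch below is one way to do that with base R. It assumes that identical rows are accidental repeats, which may not hold for every dataset, and the scatterplotter() call is guarded because it assumes the package providing the function is already attached.

```r
# display_n counts rows, so check for exact duplicate rows first.
n_rows <- nrow(iris)
n_dup <- sum(duplicated(iris))   # rows identical to an earlier row
iris_unique <- unique(iris)      # keep the first copy of each row

# Assumption: the package that provides scatterplotter() is attached.
if (exists("scatterplotter", mode = "function")) {
  scatterplotter(iris_unique, "Sepal.Width", "Sepal.Length",
                 col_val = "Species", fit = "grouped", display_n = TRUE)
}
```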