Developing Your First R Package
A Case Study with esvis
Daniel Anderson
04-10-2018
1 / 56

Want to follow along?

If you'd like to follow along, please make sure you have the following packages installed

install.packages(c("tidyverse", "esvis", 
                   "devtools", "roxygen2", 
                   "usethis"))

2 / 56

#whoami

Research Assistant Professor (newly) in the College of Education
Work a lot in growth modeling and measurement
Preacher of the R gospel
Dad of two amazing girls

3 / 56

the fam
4 / 56

Some review

What is an R function?
- Anything that carries out an operation in R, including + and <-
What are the components of a function?
- Formals (arguments), Body (everything between the braces), and Environment (where the function lives)
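
As a quick illustration of those three components, here is a minimal sketch using a made-up function (`add_two` is hypothetical, not part of any package):

```r
# A small, hypothetical function used only for illustration
add_two <- function(x, y = 2) {
  x + y
}

formals(add_two)      # the arguments: x (no default) and y (default 2)
body(add_two)         # the code between the braces
environment(add_two)  # where the function was defined

# Even operators are functions: these two calls are equivalent
1 + 2
`+`(1, 2)
```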

fun_components

5 / 56

When should you write a function?

Hadley's rule

You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).

r4ds
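
A sketch of the rule in practice, in the spirit of the r4ds example. The data frame and the `rescale01` helper are made up for illustration:

```r
# Hypothetical data, just for illustration
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40), c = c(0.2, 0.4, 0.3))

# Copy-and-paste version (three copies of the same code, easy to mistype):
# df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
# df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$b))
# df$c <- (df$c - min(df$c)) / (max(df$c) - min(df$c))

# Function version: the logic lives in one place
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply it to every column, keeping the data frame structure
df[] <- lapply(df, rescale01)
df
```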

6 / 56

7 / 56

Bundle your functions

Once you've written more than one function, you may want to bundle them. There are two general ways to do this:

source?

Write a package

8 / 56

Reasons to avoid sourcing

Documentation is generally more sparse
Directory issues
- Which leads to reproducibility issues

9 / 56

More importantly

Bundling functions into a package is not that hard!

10 / 56

my journey with esvis
11 / 56

Background

Effect sizes

Standardized mean differences

- Assume reasonably normal distributions (the mean is a good indicator of central tendency)
- Differences in means may not reflect differences at all points on the scale if the variances are different
- Substantive interest may also lie with differences at other points in the distribution.

12 / 56

Varying differences

Quick simulated example

library(tidyverse)
common_var <- tibble(low  = rnorm(1000, 10, 1),
                     high = rnorm(1000, 12, 1),
                     var  = "common")
diff_var <- tibble(low  = rnorm(1000, 10, 1),
                   high = rnorm(1000, 12, 2),
                   var  = "diff")
d <- bind_rows(common_var, diff_var)
head(d)

## # A tibble: 6 x 3
##     low  high var   
##   <dbl> <dbl> <chr> 
## 1 10.4   11.4 common
## 2  9.48  10.7 common
## 3 11.7   10.4 common
## 4  8.97  11.0 common
## 5  9.96  12.1 common
## 6  8.76  12.1 common

13 / 56

Restructure the data for plotting

d <- d %>% 
  gather(group, value, -var) 
d

## # A tibble: 4,000 x 3
##    var    group value
##    <chr>  <chr> <dbl>
##  1 common low   10.4 
##  2 common low    9.48
##  3 common low   11.7 
##  4 common low    8.97
##  5 common low    9.96
##  6 common low    8.76
##  7 common low   10.1 
##  8 common low   11.1 
##  9 common low   11.9 
## 10 common low    9.50
## # ... with 3,990 more rows

14 / 56

Plot the distributions

theme_set(theme_minimal())
ggplot(d, aes(value, color = group)) +
  geom_density(lwd = 1.5) +
  facet_wrap(~var)

15 / 56

Binned effect sizes

Cut the distributions into $n$ bins (based on percentiles)
Calculate the mean difference between paired bins
Divide each mean difference by the overall pooled standard deviation

$d_{[i]} = \frac{\bar{X}_{foc_{[i]}} - \bar{X}_{ref_{[i]}}}{\sqrt{\frac{(n_{foc} - 1) Var_{foc} + (n_{ref} - 1) Var_{ref}}{n_{foc} + n_{ref} - 2}}}$
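
The three steps can be sketched in base R roughly as follows. This is a minimal illustration of the formula, not the {esvis} implementation; the function name `binned_d` and the choice of equally spaced percentile cut points are assumptions.

```r
# Hypothetical helper: binned effect sizes for a focal and a reference group
binned_d <- function(foc, ref, n_bins = 3) {
  probs <- seq(0, 1, length.out = n_bins + 1)

  # 1. Cut each distribution into bins based on its own percentiles
  bin_foc <- cut(foc, quantile(foc, probs), include.lowest = TRUE, labels = FALSE)
  bin_ref <- cut(ref, quantile(ref, probs), include.lowest = TRUE, labels = FALSE)

  # 2. Mean difference between paired bins
  mean_diff <- tapply(foc, bin_foc, mean) - tapply(ref, bin_ref, mean)

  # 3. Divide by the overall pooled standard deviation
  pooled_sd <- sqrt(((length(foc) - 1) * var(foc) + (length(ref) - 1) * var(ref)) /
                      (length(foc) + length(ref) - 2))
  as.numeric(mean_diff / pooled_sd)
}

set.seed(1)
low  <- rnorm(1000, 10, 1)
high <- rnorm(1000, 12, 1)
binned_d(low, high)  # one effect size per bin
```

With a common variance, the per-bin values should all sit near the overall standardized mean difference; with unequal variances they drift apart across bins.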

visualize it!

16 / 56

Back to the simulated example

common <- filter(d, var == "common")
diff   <- filter(d, var == "diff")

library(esvis)
qtile_es(value ~ group, common)

##   ref_group foc_group low_qtile high_qtile midpoint        es         se
## 1      high       low      0.00       0.33    0.165 -2.060092 0.09645691
## 2      high       low      0.33       0.66    0.495 -2.072788 0.09651680
## 3      high       low      0.66       0.99    0.825 -2.044473 0.09605817

qtile_es(value ~ group, diff)

##   ref_group foc_group low_qtile high_qtile midpoint         es         se
## 1      high       low      0.00       0.33    0.165 -0.6429559 0.07995721
## 2      high       low      0.33       0.66    0.495 -1.3213209 0.08592584
## 3      high       low      0.66       0.99    0.825 -1.9278210 0.09421322

17 / 56

Visualize it

Common Variance

binned_plot(value ~ group, common)

Different Variance

binned_plot(value ~ group, diff)

18 / 56

Wait a minute...

The esvis package will (among other things) calculate and visually display binned effect sizes.
But how did we get from an idea, to functions, to a package?

confused

19 / 56

taking a step back
20 / 56

Package Creation

The (a) recipe

Come up with ~~a brilliant~~ an idea
- can be boring and mundane but just something you do a lot
Write a function! or more likely, a set of functions
Create package skeleton
Document your function
Install/fiddle/install
Write tests for your functions
Host your package somewhere public (GitHub is probably best) and promote it - leverage the power of open source!

Use tools throughout (which we'll talk about momentarily) to help automate many of the steps, and make the whole thing less painful

21 / 56

A really good point

1a) check that no one had the same idea 😇
— Maëlle Salmon 🐟 (@ma_salmon) April 10, 2018

And some further recommendations/good advice

22 / 56

Some resources

We surely won't get through all the steps tonight. In my mind, the best resources are:

Advanced R

R Packages

For a really quick but really good intro, see Hilary Parker's blog post

23 / 56

Our package

We're going to write a package today! Let's keep it really simple...

Idea: Report basic descriptive statistics for a vector, x: n, mean, and sd. Let's also have it report on the number of missing observations.

24 / 56

Our function

Let's have it return either (a) a named vector, or (b) a data frame (whichever you prefer is fine)
What will be the formal arguments?
What will the body look like?

Want to give it a go?
25 / 56

The approach I took...

describe <- function(x) {
  n     <- as.integer(length(na.omit(x))) # Count number of valid cases
  nmiss <- as.integer(sum(is.na(x)))      # Count the number of missing
  mn    <- mean(x, na.rm = TRUE)          # Calculate mean
  stdev <- sd(x, na.rm = TRUE)            # Standard deviation
  out <- tibble::tibble(n_valid   = n,    # Bundle it all
                        n_missing = nmiss, 
                        mean      = mn, 
                        sd        = stdev)
  out # Return the tibble
}

32 / 56

Informal testing

describe(rnorm(100))

## # A tibble: 1 x 4
##   n_valid n_missing   mean    sd
##     <int>     <int>  <dbl> <dbl>
## 1     100         0 0.0203  1.10

describe(c(rnorm(1000, 10, 4), rep(NA, 27)))

## # A tibble: 1 x 4
##   n_valid n_missing  mean    sd
##     <int>     <int> <dbl> <dbl>
## 1    1000        27  10.0  4.20

33 / 56

Demo

Package skeleton:

usethis::create_package
usethis::use_r
Use roxygen2 special comments for documentation
Run devtools::document
Install and restart, play around
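
Roughly, the demo steps look like the sketch below. The package location is an assumption (a temporary directory here, so nothing permanent is created), and the `document()`/`install()` calls are guarded so they only run in an interactive session.

```r
library(usethis)

# Hypothetical location for the practice package
pkg_path <- file.path(tempdir(), "practice")

create_package(pkg_path, open = FALSE)  # build the package skeleton
proj_set(pkg_path)                      # make it the active usethis project
use_r("describe")                       # creates R/describe.R for the function

# After pasting describe() and its roxygen2 comments into R/describe.R:
if (interactive()) {
  devtools::document(pkg_path)  # turn roxygen2 comments into man/ and NAMESPACE
  devtools::install(pkg_path)   # install, then restart R and play around
}
```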

34 / 56

roxygen2 comments

Typical arguments

@param: Describe the formal arguments. State the argument name and then describe it.

#' @param x Vector to describe

@return: What does the function return

#' @return A tibble with descriptive data

@example or more commonly @examples: Provide examples of the use of your function.
@export: Export your function

If you don't include @export, your function will be internal, meaning others can't access it easily.
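
Put together, a documented version of our function might look something like this. The title and description wording are just one possibility, and the example in `@examples` is made up.

```r
#' Describe a vector
#'
#' Reports the number of valid and missing observations, the mean,
#' and the standard deviation of a numeric vector.
#'
#' @param x Vector to describe
#' @return A tibble with descriptive data
#' @examples
#' describe(c(1, 2, 3, NA))
#' @export
describe <- function(x) {
  n     <- as.integer(length(na.omit(x)))
  nmiss <- as.integer(sum(is.na(x)))
  mn    <- mean(x, na.rm = TRUE)
  stdev <- sd(x, na.rm = TRUE)
  tibble::tibble(n_valid   = n,
                 n_missing = nmiss,
                 mean      = mn,
                 sd        = stdev)
}
```

Running devtools::document() on a package containing this file should produce man/describe.Rd and add an export entry to NAMESPACE.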

35 / 56

Other docs

.gitignore: Files to ignore for git commits, with some pre-populated entries
NAMESPACE: Created by {roxygen2}. Don't edit it. If you need to, trash it and it will be regenerated.
DESCRIPTION: Describes your package (more on next slide)
man/: The documentation files. Created by {roxygen2}. Don't edit.
36 / 56

`DESCRIPTION`

Metadata about the package. Default fields for our package are

Package: practice
Version: 0.0.0.9000
Title: What the Package Does (One Line, Title Case)
Description: What the package does (one paragraph).
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
License: What license is it under?
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1

This is where the information for citation(package = "practice") will come from.

Some advice - edit within RStudio, or a good text editor like Sublime Text. "Fancy" quotes and similar characters can break the file.

37 / 56

Description File Fields

The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory, all other fields are optional. - Writing R Extensions

Some optional fields include

Imports and Suggests (we'll do this in a minute).
URL
BugReports
LazyData

License is mandatory, as noted above (we'll have {usethis} create this for us).

38 / 56

`DESCRIPTION` for {esvis}

Package: esvis
Type: Package
Title: Visualization and Estimation of Effect Sizes
Version: 0.1.0.9000
Authors@R: person("Daniel", "Anderson", email = "daniela@uoregon.edu", 
       role = c("aut", "cre"))
Description: A variety of methods are provided to estimate and visualize
    distributional differences in terms of effect sizes. Particular emphasis
    is upon evaluating differences between two or more distributions across
    the entire scale, rather than at a single point (e.g., differences in
    means). For example, Probability-Probability (PP) plots display the
    difference between two or more distributions, matched by their empirical
    CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing
    for examinations of where on the scale distributional differences are
    largest or smallest. The area under the PP curve (AUC) is an effect-size
    metric, corresponding to the probability that a randomly selected
    observation from the x-axis distribution will have a higher value
    than a randomly selected observation from the y-axis distribution. 
    Binned effect size plots are also available, in which the distributions
    are split into bins (set by the user) and separate effect sizes (Cohen's
    d) are produced for each bin - again providing a means to evaluate the
    consistency (or lack thereof) of the difference between two or more 
    distributions at different points on the scale. Evaluation of empirical 
    CDFs is also provided, with  built-in arguments for providing annotations 
    to help evaluate distributional differences at specific points (e.g., 
    semi-transparent shading). All function take a consistent argument 
    structure. Calculation of specific effect sizes is also possible. The
    following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g, 
    (c) percentage above a cut, (d) transformed (normalized) percentage above 
    a cut, (e)  area under the PP curve, and (f) the V statistic (see Ho, 
    2009; <doi:10.3102/1076998609332755>), which essentially transforms the 
    area under the curve to standard deviation units. By default, effect sizes 
    are calculated for all possible pairwise comparisons, but a reference 
    group (distribution) can be specified.

39 / 56

`DESCRIPTION` for {esvis} (continued)

Depends:
    R (>= 3.1)
Imports:
    sfsmisc
URL: https://github.com/DJAnderson07/esvis
BugReports: https://github.com/DJAnderson07/esvis/issues
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 6.0.1
Suggests: 
    testthat, 
    viridisLite

40 / 56

Demo

Change the author name. Add a contributor just for fun.
Add a license. We'll go for MIT license using usethis::use_mit_license("First and Last Name")
Install and reload.
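
For the contributor, the Authors@R field takes a vector of person() calls. A minimal sketch, with a made-up contributor ("Jane Doe" is hypothetical):

```r
# What goes after "Authors@R:" in DESCRIPTION is ordinary R code
authors <- c(
  person("First", "Last", email = "first.last@example.com",
         role = c("aut", "cre")),   # author and maintainer
  person("Jane", "Doe", role = "ctb")  # hypothetical contributor
)
authors
```

In DESCRIPTION itself, only the `c(person(...), person(...))` expression is written after `Authors@R:`.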
41 / 56

Declare dependencies

The function depends on the tibble function within the {tibble} package.
We have to declare this dependency

My preferred approach

Declare package dependencies: usethis::use_package
Create a package documentation page: usethis::use_package_doc
- Declare all dependencies for your package there
- Only import the functions you need - not the entire package
  - Use #' @importFrom pkg fun_name
Generally won't have to worry about namespacing (tibble::tibble becomes just plain old tibble). The likelihood of conflicts is also reduced, so long as you don't import the full package.
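
For our package, the package documentation file might end up looking something like the fragment below. The file name is an assumption (it depends on the usethis version), and the exact boilerplate usethis writes may differ.

```r
# Run once from the package project to add tibble to Imports in DESCRIPTION:
# usethis::use_package("tibble")

# R/practice-package.R, created by usethis::use_package_doc()
#' @keywords internal
#' @importFrom tibble tibble
"_PACKAGE"
```

After devtools::document(), NAMESPACE should contain importFrom(tibble,tibble), and describe() can call tibble() directly.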

42 / 56

Demo
43 / 56

Write tests!

What does it mean to write tests?
- ensure your package does what you expect it to
Why write tests?
- If you write a new function, and it breaks an old one, that's good to know!
- Reduces bugs, makes your package code more robust
How do you write tests?
- usethis::use_testthat sets up the infrastructure
- make assertions, e.g.: testthat::expect_equal(), testthat::expect_warning(), testthat::expect_error()
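
As a rough sketch, a test file for our function (say tests/testthat/test-describe.R, a hypothetical file name) could look like this. The function is repeated here only so the example runs on its own; inside a package, testthat loads it for you.

```r
library(testthat)

# Copy of the function under test, included only to keep the example self-contained
describe <- function(x) {
  tibble::tibble(n_valid   = as.integer(length(na.omit(x))),
                 n_missing = as.integer(sum(is.na(x))),
                 mean      = mean(x, na.rm = TRUE),
                 sd        = sd(x, na.rm = TRUE))
}

test_that("describe() counts valid and missing values", {
  out <- describe(c(1, 2, 3, NA))
  expect_equal(out$n_valid, 3L)
  expect_equal(out$n_missing, 1L)
})

test_that("describe() ignores missing values in the summaries", {
  out <- describe(c(1, 2, 3, NA))
  expect_equal(out$mean, 2)
  expect_equal(out$sd, 1)
})
```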

44 / 56

Testing

We'll skip over testing for today, because we just don't have time to cover everything. A few good resources:

Richie Cotton's book

r-pkgs Chapter

Karl Broman Blog Post

45 / 56

Check your R package

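Use devtools::check() to run the same checks CRAN will run on your R package.
- Use devtools::build_win() to run the checks on CRAN's Windows build servers (newer devtools releases rename this to devtools::check_win_devel() and related functions).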

The first time, you'll likely get errors. Be patient. It will probably be frustrating, but ultimately worth the effort.

46 / 56

Let's check now!
47 / 56

🎉 Hooray! 🎉

You have a package!

48 / 56

A few other best practices

Create a README with usethis::use_readme_rmd.
Try to get your code coverage up above 80%.
Automate wherever possible ({devtools} and {usethis} help a lot with this)
Use the {goodpractice} package to help make your package code more robust, specifically with goodpractice::gp(). It will give you lots of good ideas
Host on GitHub, and capitalize on integration with other systems (all free, but require registering for an account)

49 / 56

Any time left?
Why you should use git and GitHub
50 / 56

esvis

51 / 56

Quickly

Get started with usethis::use_git, followed by usethis::use_github.

For this to work, you’ll need to set a GITHUB_PAT environment variable in your ~/.Renviron. Follow Jenny Bryan’s instructions, and use edit_r_environ() to easily access the right file for editing

Note: I haven't played around with this much. Standard git procedures will work too.

52 / 56

Create a `README`

Use standard R Markdown. Setup the infrastructure with usethis::use_readme_rmd.
Write it just like a normal R Markdown doc and it should all flow into the README.

53 / 56

Use Travis/Appveyor

Register for a free account
Run usethis::use_travis and usethis::use_appveyor to get started.
- Go to each respective website and "turn on" the repo
- Copy and paste the code to the badge into your README.

Now all your code will be automatically tested on Mac/Linux (Travis CI) and Windows (Appveyor)

54 / 56

codecov

You can test your code coverage each time you push a new commit by using codecov. Initialize with usethis::use_coverage(). Overall setup process is pretty similar to Travis CI/Appveyor.

Easily see what is/is not covered by tests!

55 / 56

That's all
Thanks so much!
56 / 56
