
Developing Your First R Package

A Case Study with esvis

Daniel Anderson

04-10-2018

1 / 56

Want to follow along?

If you'd like to follow along, please make sure you have the following packages installed:

install.packages(c("tidyverse", "esvis",
                   "devtools", "roxygen2",
                   "usethis"))
2 / 56

#whoami

  • Research Assistant Professor (newly) in the College of Education
  • Work a lot in growth modeling and measurement
  • Preacher of the R gospel
  • Dad of two amazing girls

3 / 56

the fam

4 / 56

Some review

  • What is an R function?

    • Anything that carries out an operation in R, including + and <-
  • What are the components of a function?

    • Formals (arguments), Body (everything between the braces), and Environment (where the function lives)
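The three components can be inspected directly with base R. The sketch below uses a made-up function, add_one(), purely for illustration; formals(), body(), and environment() are the standard accessors.

```r
# A small function to take apart (hypothetical example, not from the talk)
add_one <- function(x, by = 1) {
  x + by
}

formals(add_one)      # the arguments: x (no default) and by = 1
body(add_one)         # the code between the braces
environment(add_one)  # where the function lives

# Operators are functions too, so they can be called like any other function
`+`(1, 2)
```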

[Figure: the components of a function]

5 / 56

When should you write a function?

Hadley's rule

You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).

Source: R for Data Science (r4ds)
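As a rough illustration of the rule, the sketch below replaces three copy-and-pasted rescaling lines with one function. The data frame and the function name rescale01() are assumptions made for this example.

```r
# Hypothetical data with three columns that all need the same treatment
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6), c = c(0, 50, 100))

# Copy-and-paste version (three copies of the same code, easy to mistype):
# df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
# df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$b))
# df$c <- (df$c - min(df$c)) / (max(df$c) - min(df$c))

# Function version: the logic lives in one place
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply it to every column while keeping the data frame structure
df[] <- lapply(df, rescale01)
df
```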

6 / 56
7 / 56

Bundle your functions

Once you've written more than one function, you may want to bundle them. There are two general ways to do this:

source?

Write a package

8 / 56

Reasons to avoid sourcing

  • Documentation is generally more sparse
  • Directory issues
    • Which leads to reproducibility issues
9 / 56

More importantly

Bundling functions into a package is not that hard!

10 / 56

my journey with esvis

11 / 56

Background

Effect sizes

Standardized mean differences

  • Assumes reasonably normal distributions (so the mean is a good indicator of central tendency)

  • Differences in means may not reflect differences at all points on the scale if the variances differ

  • Substantive interest may also lie with differences at other points in the distribution.

12 / 56

Varying differences

Quick simulated example

library(tidyverse)
common_var <- tibble(low = rnorm(1000, 10, 1),
                     high = rnorm(1000, 12, 1),
                     var = "common")
diff_var <- tibble(low = rnorm(1000, 10, 1),
                   high = rnorm(1000, 12, 2),
                   var = "diff")
d <- bind_rows(common_var, diff_var)
head(d)
## # A tibble: 6 x 3
## low high var
## <dbl> <dbl> <chr>
## 1 10.4 11.4 common
## 2 9.48 10.7 common
## 3 11.7 10.4 common
## 4 8.97 11.0 common
## 5 9.96 12.1 common
## 6 8.76 12.1 common
13 / 56

Restructure the data for plotting

d <- d %>%
  gather(group, value, -var)
d
## # A tibble: 4,000 x 3
## var group value
## <chr> <chr> <dbl>
## 1 common low 10.4
## 2 common low 9.48
## 3 common low 11.7
## 4 common low 8.97
## 5 common low 9.96
## 6 common low 8.76
## 7 common low 10.1
## 8 common low 11.1
## 9 common low 11.9
## 10 common low 9.50
## # ... with 3,990 more rows
14 / 56

Plot the distributions

theme_set(theme_minimal())
ggplot(d, aes(value, color = group)) +
  geom_density(lwd = 1.5) +
  facet_wrap(~var)

15 / 56

Binned effect sizes

  1. Cut the distributions into n bins (based on percentiles)
  2. Calculate the mean difference between paired bins
  3. Divide each mean difference by the overall pooled standard deviation

$$d_{[i]} = \frac{\bar{X}_{foc_{[i]}} - \bar{X}_{ref_{[i]}}}{\sqrt{\dfrac{(n_{foc} - 1)Var_{foc} + (n_{ref} - 1)Var_{ref}}{n_{foc} + n_{ref} - 2}}}$$

visualize it!
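The three steps above can be sketched in base R as follows. This is an illustration of the formula only, not the esvis implementation; the function name binned_es(), the n_bins argument, and the simulated data are assumptions made for the example.

```r
set.seed(42)
foc <- rnorm(1000, 10, 1)  # focal group ("low")
ref <- rnorm(1000, 12, 2)  # reference group ("high"), larger variance

binned_es <- function(foc, ref, n_bins = 3) {
  probs <- seq(0, 1, length.out = n_bins + 1)

  # Step 1: assign each observation to a percentile bin within its own group
  bin_of <- function(x) {
    cut(x, breaks = quantile(x, probs), include.lowest = TRUE, labels = FALSE)
  }

  # Step 2: mean difference between paired bins (focal minus reference)
  mean_diff <- tapply(foc, bin_of(foc), mean) - tapply(ref, bin_of(ref), mean)

  # Step 3: divide by the overall pooled standard deviation
  n_foc <- length(foc)
  n_ref <- length(ref)
  pooled_sd <- sqrt(((n_foc - 1) * var(foc) + (n_ref - 1) * var(ref)) /
                      (n_foc + n_ref - 2))

  as.numeric(mean_diff) / pooled_sd
}

es <- binned_es(foc, ref)
es
```

Because the reference distribution here is wider than the focal one, the standardized difference grows in magnitude from the lowest bin to the highest, which is the kind of pattern a single mean difference hides.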

16 / 56

Back to the simulated example

common <- filter(d, var == "common")
diff <- filter(d, var == "diff")
library(esvis)
qtile_es(value ~ group, common)
## ref_group foc_group low_qtile high_qtile midpoint es se
## 1 high low 0.00 0.33 0.165 -2.060092 0.09645691
## 2 high low 0.33 0.66 0.495 -2.072788 0.09651680
## 3 high low 0.66 0.99 0.825 -2.044473 0.09605817
qtile_es(value ~ group, diff)
## ref_group foc_group low_qtile high_qtile midpoint es se
## 1 high low 0.00 0.33 0.165 -0.6429559 0.07995721
## 2 high low 0.33 0.66 0.495 -1.3213209 0.08592584
## 3 high low 0.66 0.99 0.825 -1.9278210 0.09421322
17 / 56

Visualize it

Common Variance

binned_plot(value ~ group, common)

Different Variance

binned_plot(value ~ group, diff)

18 / 56

Wait a minute...

  • The esvis package will (among other things) calculate and visually display binned effect sizes.
  • But how did we get from an idea, to functions, to a package?

confused

19 / 56

taking a step back

20 / 56

Package Creation

The (a) recipe

  1. Come up with an idea
    • it need not be brilliant; it can be boring and mundane, just something you do a lot
  2. Write a function! or more likely, a set of functions
  3. Create package skeleton
  4. Document your function
  5. Install/fiddle/install
  6. Write tests for your functions
  7. Host your package somewhere public (GitHub is probably best) and promote it - leverage the power of open source!

Use tools throughout (which we'll talk about momentarily) to help automate many of the steps, and make the whole thing less painful

21 / 56

A really good point


And some further recommendations/good advice

22 / 56

Some resources

We surely won't get through all the steps tonight. In my mind, the best resources are:

Advanced R

R Packages

For a really quick but really good intro, see Hilary Parker's blog post

23 / 56

Our package

We're going to write a package today! Let's keep it really simple...

  1. Idea: Report basic descriptive statistics for a vector, x: n, mean, and sd. Let's also have it report on the number of missing observations.
24 / 56

Our function

  • Let's have it return either (a) a named vector, or (b) a dataframe (whichever you prefer is fine)
  • What will be the formal arguments?
  • What will the body look like?

    Want to give it a go?

25 / 56

The approach I took...

describe <- function(x) {
  n <- as.integer(length(na.omit(x)))  # Count number of valid cases
  nmiss <- as.integer(sum(is.na(x)))   # Count the number of missing
  mn <- mean(x, na.rm = TRUE)          # Calculate mean
  stdev <- sd(x, na.rm = TRUE)         # Standard deviation
  out <- tibble::tibble(n_valid = n,   # Bundle it all
                        n_missing = nmiss,
                        mean = mn,
                        sd = stdev)
  out                                  # Return the tibble
}
32 / 56

Informal testing

describe(rnorm(100))
## # A tibble: 1 x 4
## n_valid n_missing mean sd
## <int> <int> <dbl> <dbl>
## 1 100 0 0.0203 1.10
describe(c(rnorm(1000, 10, 4), rep(NA, 27)))
## # A tibble: 1 x 4
## n_valid n_missing mean sd
## <int> <int> <dbl> <dbl>
## 1 1000 27 10.0 4.20
33 / 56

Demo

Package skeleton:

  • usethis::create_package
  • usethis::use_r
  • Use roxygen2 special comments for documentation
  • Run devtools::document
  • Install and restart, play around
34 / 56

roxygen2 comments

Typical arguments

  • @param: Describe the formal arguments. State the argument name and then describe it.

#' @param x Vector to describe

  • @return: What does the function return

#' @return A tibble with descriptive data

  • @example or more commonly @examples: Provide examples of the use of your function.
  • @export: Export your function

If you don't include @export, your function will be internal, meaning others can't access it easily.
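Putting these tags together, a documented version of the describe() function from earlier might look like the sketch below. The title and description wording are illustrative rather than taken from the talk, and the function body assumes the {tibble} package is installed.

```r
#' Describe a vector
#'
#' Report the number of valid and missing observations, the mean, and the
#' standard deviation of a numeric vector.
#'
#' @param x Vector to describe
#' @return A tibble with descriptive data
#' @examples
#' describe(c(1, 2, 3, NA))
#' @export
describe <- function(x) {
  n <- as.integer(length(na.omit(x)))  # valid cases
  nmiss <- as.integer(sum(is.na(x)))   # missing cases
  mn <- mean(x, na.rm = TRUE)
  stdev <- sd(x, na.rm = TRUE)
  tibble::tibble(n_valid = n,
                 n_missing = nmiss,
                 mean = mn,
                 sd = stdev)
}
```

Running devtools::document() turns the #' comments into a help file under man/ and, because of @export, adds describe to NAMESPACE.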

35 / 56

Other docs

  • .gitignore: Files for git to ignore, pre-populated with some common entries
  • NAMESPACE: Created by {roxygen2}. Don't edit it by hand. If something goes wrong, delete it and rerun devtools::document() to regenerate it.
  • DESCRIPTION: Describes your package (more on next slide)
  • man/: The documentation files. Created by {roxygen2}. Don't edit.
36 / 56

DESCRIPTION

Metadata about the package. Default fields for our package are

Package: practice
Version: 0.0.0.9000
Title: What the Package Does (One Line, Title Case)
Description: What the package does (one paragraph).
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
License: What license is it under?
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1

This is where the information for citation(package = "practice") will come from.

Some advice: edit within RStudio or a good text editor like Sublime Text. "Fancy" (curly) quotes and similar characters can screw this up.

37 / 56

Description File Fields

The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory, all other fields are optional. - Writing R Extensions

Some other fields we'll use (all optional except License) include

  • Imports and Suggests (we'll do this in a minute).
  • URL
  • BugReports
  • License (we'll have {usethis} create this for us).
  • LazyData
38 / 56

DESCRIPTION for {esvis}

Package: esvis
Type: Package
Title: Visualization and Estimation of Effect Sizes
Version: 0.1.0.9000
Authors@R: person("Daniel", "Anderson", email = "daniela@uoregon.edu",
    role = c("aut", "cre"))
Description: A variety of methods are provided to estimate and visualize
    distributional differences in terms of effect sizes. Particular emphasis
    is upon evaluating differences between two or more distributions across
    the entire scale, rather than at a single point (e.g., differences in
    means). For example, Probability-Probability (PP) plots display the
    difference between two or more distributions, matched by their empirical
    CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing
    for examinations of where on the scale distributional differences are
    largest or smallest. The area under the PP curve (AUC) is an effect-size
    metric, corresponding to the probability that a randomly selected
    observation from the x-axis distribution will have a higher value
    than a randomly selected observation from the y-axis distribution.
    Binned effect size plots are also available, in which the distributions
    are split into bins (set by the user) and separate effect sizes (Cohen's
    d) are produced for each bin - again providing a means to evaluate the
    consistency (or lack thereof) of the difference between two or more
    distributions at different points on the scale. Evaluation of empirical
    CDFs is also provided, with built-in arguments for providing annotations
    to help evaluate distributional differences at specific points (e.g.,
    semi-transparent shading). All function take a consistent argument
    structure. Calculation of specific effect sizes is also possible. The
    following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g,
    (c) percentage above a cut, (d) transformed (normalized) percentage above
    a cut, (e) area under the PP curve, and (f) the V statistic (see Ho,
    2009; <doi:10.3102/1076998609332755>), which essentially transforms the
    area under the curve to standard deviation units. By default, effect sizes
    are calculated for all possible pairwise comparisons, but a reference
    group (distribution) can be specified.
39 / 56

DESCRIPTION for {esvis} (continued)

Depends:
    R (>= 3.1)
Imports:
    sfsmisc
URL: https://github.com/DJAnderson07/esvis
BugReports: https://github.com/DJAnderson07/esvis/issues
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 6.0.1
Suggests:
    testthat,
    viridisLite
40 / 56

Demo

  • Change the author name.
    • Add a contributor just for fun.
  • Add a license. We'll go for MIT license using usethis::use_mit_license("First and Last Name")
  • Install and reload.
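For the contributor step, the Authors@R field accepts a vector of person() entries. The names, the email address, and the choice of the "ctb" (contributor) role below are placeholders. person() comes from the {utils} package that ships with R, so the snippet can be tried at the console before pasting the c(...) call into DESCRIPTION.

```r
authors <- c(
  person("First", "Last",
         email = "first.last@example.com",
         role = c("aut", "cre")),  # author and maintainer
  person("Another", "Person",
         role = "ctb")             # contributor (placeholder name)
)
authors
```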
41 / 56

Declare dependencies

  • The function depends on the tibble function within the {tibble} package.
  • We have to declare this dependency

My preferred approach

  • Declare package dependencies: usethis::use_package
  • Create a package documentation page: usethis::use_package_doc
    • Declare all dependencies for your package there
    • Only import the functions you need - not the entire package
      • Use #' @importFrom pkg fun_name
  • Generally won't have to worry about namespacing (tibble::tibble becomes just plain old tibble). The likelihood of conflicts is also reduced, so long as you don't import the full package.
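As a sketch of this approach for our practice package, the package documentation file could carry the single import the describe() function needs. The file name follows the usual usethis convention and is an assumption here, and the exact template usethis generates varies by version.

```r
# R/practice-package.R (assumed file name)
# Run once at the console first so tibble is listed under Imports in DESCRIPTION:
#   usethis::use_package("tibble")

#' @keywords internal
#' @importFrom tibble tibble
"_PACKAGE"
```

After rerunning devtools::document(), NAMESPACE gains an importFrom(tibble, tibble) line.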
42 / 56

Demo

43 / 56

Write tests!

  • What does it mean to write tests?

    • ensure your package does what you expect it to
  • Why write tests?

    • If you write a new function, and it breaks an old one, that's good to know!
    • Reduces bugs, makes your package code more robust
  • How do you write tests?

    • usethis::use_testthat sets up the infrastructure
    • make assertions, e.g.: testthat::expect_equal(), testthat::expect_warning(), testthat::expect_error()
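A minimal sketch of what such assertions could look like for the describe() function from earlier. In a real package this would live in a file such as tests/testthat/test-describe.R and describe() would come from the package itself; it is redefined here only so the snippet runs on its own. The test descriptions and input values are made up for the example.

```r
library(testthat)

# Copy of the function under test, included only to keep the example self-contained
describe <- function(x) {
  n <- as.integer(length(na.omit(x)))
  nmiss <- as.integer(sum(is.na(x)))
  mn <- mean(x, na.rm = TRUE)
  stdev <- sd(x, na.rm = TRUE)
  tibble::tibble(n_valid = n, n_missing = nmiss, mean = mn, sd = stdev)
}

test_that("describe() counts valid and missing values", {
  out <- describe(c(1, 2, 3, NA))
  expect_equal(out$n_valid, 3L)
  expect_equal(out$n_missing, 1L)
})

test_that("describe() ignores missing values in the summaries", {
  out <- describe(c(1, 2, 3, NA))
  expect_equal(out$mean, 2)
  expect_equal(out$sd, 1)
})
```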
44 / 56

Testing

We'll skip over testing for today, because we just don't have time to cover everything. A few good resources:

45 / 56

Check your R package

  • Use devtools::check() to run the same checks CRAN will run on your R package.
    • Use devtools::build_win() to run the checks on CRAN's Windows build servers (devtools 2.0 and later rename this to check_win_devel() and check_win_release()).

The first time, you'll likely get errors. Be patient. It will probably be frustrating, but ultimately worth the effort.

46 / 56

Let's check now!

47 / 56

🎉 Hooray! 🎉

You have a package!

48 / 56

A few other best practices

  • Create a README with usethis::use_readme_rmd.

  • Try to get your code coverage up above 80%.

  • Automate wherever possible ({devtools} and {usethis} help a lot with this)

  • Use the {goodpractice} package to help make your package code more robust, specifically with goodpractice::gp(). It will give you lots of good ideas.

  • Host on GitHub, and capitalize on integration with other systems (all free, but require registering for an account)

49 / 56

Any time left?

Why you should use git and GitHub

50 / 56

esvis

51 / 56

Quickly

  • Get started with usethis::use_git, followed by usethis::use_github.

For this to work, you’ll need to set a GITHUB_PAT environment variable in your ~/.Renviron. Follow Jenny Bryan’s instructions, and use edit_r_environ() to easily access the right file for editing

Note: I haven't played around with this much. Standard git procedures will work too.

52 / 56

Create a README

  • Use standard R Markdown. Set up the infrastructure with usethis::use_readme_rmd.
  • Write it just like a normal R Markdown doc and it should all flow into the README.

53 / 56

Use Travis/Appveyor

  • Register for a free account
  • Run usethis::use_travis and usethis::use_appveyor to get started.
    • Go to each respective website and "turn on" the repo
    • Copy and paste the code to the badge into your README.
  • Now all your code will be automatically tested on Mac/Linux (Travis CI) and Windows (Appveyor)

54 / 56

codecov

You can test your code coverage each time you push a new commit by using codecov. Initialize with usethis::use_coverage(). Overall setup process is pretty similar to Travis CI/Appveyor.

Easily see what is/is not covered by tests!

55 / 56

That's all

Thanks so much!

56 / 56
