1: Welcome and RStudio Vision

Tareef Kawaf

  • Three key concepts

  • The way of RStudio

  • Reproducibility

  • Imporantance of 1439 (printing press technology)

  • Mass communication

  • Code is the reproducibility, Reproducibility is good science, and good science is good business.

  • If you belileve in the code and reproducibility, you can get reuse, automation, scheduling, and parametrization.

4 reasons to love code:

  • Repeatable
  • Inspectable
  • Reusable
  • Diffable

Code Importance

  • Complex problems will require code
  • Code is communication, and communication is critical
  • With code you can inspect how a problem is solved and either adopt it, or figure out how to improve it yourself.
  • With code, the answer is always YES!!

2: R shiny Keynote Talk

Joe Cheng

slide: https://speakerdeck.com/jcheng5/shiny-in-production

rstudio/cranwhales

cranwhales is a Shiny example application, meant to demonstrate how to convert synchronous (traditional) Shiny apps to asynchronous ones.

Staging and Production

Goals:

  • keep it up -Unplanned outages are rare or nonexistent

  • keep it safe -Data, functionality, and code are all kept safe from unauthorized access

  • keep it correct
    -works as intended, provide right answers

  • keep it snappy

    • Fast response time, ability to predict needed capacity for expected traffic

New tools for Shiny in production

  • Rstudio connenct

  • shinytest- automated UI testing for Shiny

  • shinyloadtest- load testing for shiny

  • profvis - profiler for R

  • Plot caching

  • Async -last resort technique for dealing with slow operations

Big Question: Is Cranwhales fast enough?

Performance workflow

  1. Using shinyloadtest to see if it’s fast enough

  2. if not, use profvis to see what’s making it slow.

  3. optimize
    3.1 Move work out of Shiny 3.2 Make code faster
    3.3 Use Caching
    3.4 Use async

  4. Repeat!

3: Getting it right: Writing reliable and maintainable R code

Amanda Gadrow, RStudio

Characteristics of good code:

  • Reliability

  • Reproducibility

  • Flexibility

  • Longevity

  • Scalability

How do we determine the quality of our code?

  1. Execution feedback
  2. User feedback
  3. measure it with tests; functional, integration, perfromance.

make the part of your regular development cycle for;

  • faster updates

  • R package-testthat, usethis

  • R package-shinytest

Modular design for ease of testing and maintenance

Resources to read/watch:

4: A guide to making your data reproducible

Karthik Ram

Set up Research compendium principles

  1. Stick with the conventions of your peers
  2. Keep data, methods and outputs separate
  3. Specify your computational environment as clearly as you can

Key components you’ll need for sharing a compendium

  1. licence
  2. VCS
  3. Metadata

3 separate different components ofa compendium

  1. Data (Small –> Medium)
    1.1 Small data: put small data inside packages, especially if you ship a methods package with your analysis
    1.1.1 CRAN =< 5mb
    1.2 piggyback package: Attach large [data] files to Github repositories
    1.3 Medium data arkdb package: Attach large [data] files to Github repositories

  2. Computing environment

  • It’s important to isolate the computing envrionment so that changes in software dependencies don’t break your analysis.

  • Adding a Dockerfile to your compendium

Many ways to write a Dockerfile for your project

o2r/containerit
jupyter/repo2docker

  • API
  • metadata (Tidy data)
  • tensorflow
  • Binder
    Binder is an open source project that is designed to make it really easy to share analyses that are in notebooks.

http://github.com/karthik/rstudio2019

Git + Docker + RStudio

  1. Workflows

Include a workflow to manage relationsihps between data output and code.

drake

General purpose workflow manager & pipeline toolkit for reporducibility and high-performance computing.

5: Solving R for data science

Jeffrey Arnold

R for Data Science: Exercise Solutions: https://jrnold.github.io/r4ds-exercise-solutions

github link: https://github.com/jrnold/r4ds-exercise-solutions

tidyverse work flow

tidyverse work flow

6: Working with categorical data in R without losing your mind

Amelia McNamara

Slides: http://www.amelia.mn/WranglingCats.pdf

Factors

R’s representation of categorical data. Consists of: 1. A set of values 2. An ordered set of valid levels

eyes <- factor(x = c("blue", "green", "green"),
               levels = c("blue", "brown", "green"))
eyes
## [1] blue  green green
## Levels: blue brown green

stringAsFactors: An unauthorized biography

Historical view from Roger Peng’s blog

Level manipulation functions
values change to match levels

  • fct_recode() Relabel levels “by hand”

  • fct_relevel() Reorder levels “by hand”

  • fct_reorder() Reorder levels by another variable

  • fct_collapse() Collapse levels “by hand”

  • fct_lump() Lump levels with small counts together

  • fct_other() Replace levels with “Other”

  • R Syntax Comparison:: cheat sheet Scroll to the bottom and find the Factors with forcats Cheat Sheet https://www.rstudio.com/resources/cheatsheets/

library(tidyverse)
library(forcats)
summary(GSS$OpinionOfIncome)
GSS <- GSS %>%
    mutate(tidyOpinionOfIncome = 
               fct_relevel(OpinionOfIncome,
                           "Far above average",
                           "Above average",
                           "Below average",
                           "Far below average"))
summary(GSS$tidyOpinionOfIncome)
  • defensive coding
  • summary(), summary(), summary()
  • Certain testing function

Takeaways:

• Use forcats • Practice defensive coding • summary() is your friend • assertthat and testthat • Check out http://bit.ly/WranglingCats

7: R4DS online learning community

Jesse Mostipak

    1. install R base
    1. install RStudio
    1. google How to [XYZ] in R?

R4DS Online community

8: Scaling R with Spark by Javier Luraschi, RStudio

  • how to use R with Spark?
library(sparklyr)
spark_install()
sc <- spark_connect(master = "local")

Doc: <spark.rstudio.com>

Blog: <blog.rstudio.com/tags/sparklyr>

9: Introducing MLflow for R-Managing the Machine Learning Lifecycle by Kevin Kuo

Mlflow- open source platform

MLflow

library(mlflow)
mlflow_log_param("foo")

10: Democratizing R using plumber

download today

Resource:

Plumber

OperAPI

Github Repo

Example of API library?

11: Solving a $15M/year Employee Attrition Problem with the tidyverse (work flow) by Matt Dancho

Data science workflow

Business problem –> techinical skills –> efficient process –> business value

12: 3D mapping by Tyler

  • what is rayshader?

https://github.com/tylermorganwall/rayshader

13: R/qtl2: Rewrite of a very old R package by Karl Broman, University of Wisconsin

https://www.biostat.wisc.edu/~kbroman/presentations/rqtl2_rstudio2019.pdf

14: Keynote talk Explicit direct instruction in programming education by Felienne

Why they all have twitter?

  • How to teach programming and other things?

15: R Markdown the bigger picture

Garret Grolemund

Slide: https://github.com/garrettgman/rmarkdown-the-bigger-picture/blob/master/rstudioconf2019-rmarkdown.pdf

  • The replication prices is the dark cloud.

  • Challenges in irresproducible researech.

  • The replication crisis for you is the credibility crisis.

  • Many applications of statistics are cargo-cult statistics.

One last thing: Reproducibility is an opportunity.

16: Pagedown: Creating Beautiful PDFs with R Markdown + CSS by Yihui Xie

pagedown_github * https://github.com/rstudio/pagedown

slide : http://bit.ly/pagedown

In HTML and the Web I trust.

It is easier parsing HTML than parsing PDF.

HTML and CSS will soon catch up the Typesetting style in Latex but it is difficult for Latex or Word to catch up in other aspects in HTML, such as the interactivity.

Installation

You can install from CRAN (an initial version has been released), but this package is only two months old. You are recommended to install from Github:

remotes::install_github(\(\color{red}{\text{'rstudio/pagedown'}}\))

This package requires Pandoc 2.x, which is currently bundled in RStudio 1.2.x (you may install the preview version of RStudio).

Google Chrome or Chromium is recommended to view and print HTML pages generated from this package.

Preview HTML/ Print to PDF

Manual: https://pagedown.rbind.io/

css grid

17: Introducing the gt Package

Rich Iannone

This package is mainly used for generating tables.

18: The lazy and easily distracted report writer

Mike Smith

xkcd figure

Who is your audience?

  • Present (distracted) me

  • Future (6 months later) me

  • Quantitative colleagues / reviewers

  • Decision makers (may not be quantitative)

Rule of three

  • Copy & paste code ≥3 times?
    • Write and use a function
  • Perform analysis across ≥ 3 endpoints?
    • Multiple markdown reports?
    • NOPE. Parameterised reports.
### Hide code if we're not rendering the report for quantitative audience.
if (!params$quantAudience) knitr::opts_chunk$set(echo = FALSE)
### Hide code if we're not rendering the report for quantitative audience.
data %>% 
    head(10)
# pull child text.Rmd

# r To pull child text, eval = params$p, echo=TRUE, child = "datamanipulate_txt.Rmd" 

19 Visualizing uncertainty with hypothetical outcomes plots

Claus Wilke

Tools for visualizing uncertainty with ggplot2: https://wilkelab.org/ungeviz/

  • Hypothetical outcome plots (HOPs) are a great way of communicating uncertainty to non-experts.

  • Hadley’s challenge poses three questions

  1. How do we generate outcomes?

    • Bayesian MCMC sampling (tidybayes)
    • Bootstrapping/resampling input data
    • Normal approximation to sampling distribution
  2. How do we get them into ggplot?

    • Use package-ungeviz
library(ungeviz)
  1. Is there anything to be done?

20: New language features in RStudio

Jonathan McPherson https://github.com/jmcphers

21: gganimate live cookbook

Thomas Lin Pedersen

What is gganimate?

  • An implementation of the grammar of animated graphics
  • An extension to ggplot2
  • Not what it was a year ago!

How do I use gganimate?

22: pkgman: A fresh approach to package installation

Gábor Csárdi https://github.com/gaborcsardi

R-package: pak https://github.com/r-lib/pak

  • The main goals of pkgman is to make package installation fast and more reliable. This allows new, simpler and safer workflows, such as separate package libraries for projects.

  • pak installs R packages from CRAN, Bioconductor, GitHub, and local files and directories. It is an alternative to install.packages() and devtools::install_github(). pak is fast, safe and convenient.

Installation

Install the package from CRAN:

install.packages("pak")

(After installation, you might also want to run pak::pak_setup(); it’ll be run automatically when needed but you might want to do it now to save some time later.)

Usage

Call pkg_install() to install CRAN or Bioconductor packages:

pak::pkg_install("usethis")

To install GitHub packages, use the user/repo syntax:

pak::pkg_install("r-lib/usethis")

All dependencies will be installed as well, to the same library.

Why pak?

Fast

  • Fast downloads and HTTP queries. pak performs all HTTP requests concurrently.

  • Fast installs. pak builds and installs packages concurrently.

  • Metadata and package cache. pak caches package metadata and all downloaded packages locally. It does not download the same package files over and over again.

  • Lazy installation. pak only installs the packages that are really necessary for the installation. If the requested package and its dependencies are already installed, pak does nothing.

Safe

  • Private library (pak’s own package dependencies do not affect your regular package libraries and vice versa).

  • Every pak operation runs in a sub-process, and the packages are loaded from the private library. pak avoids loading packages from your regular package libraries. (These package files would be locked on some systems, and locked packages cannot be updated. pak does not load any package in the main process, except for pak itself).

  • To avoid updating locked packages, pak warns and requests confirmation for loaded packages.

  • Dependency solver. pak makes sure that you end up in a consistent, working state of dependencies. If finds conflicts up front, before attempting installation.

Convenient

  • BioC packages. pak supports Bioconductor packages out of the box. It uses the Bioconductor version that is appropriate for your R version.

  • GitHub packages. pak supports GitHub packages out of the box. It also supports the `Remotes entry in DESCRIPTION files, so that GitHub dependencies of GitHub packages will also get installed. See e.g. https://cran.r-project.org/package=remotes/vignettes/dependencies.html

  • Package sizes. For CRAN packages pak shows the total sizes of packages it needs to download.

23: Effecitive use of Shiny modules by Eric Nantz

http://r-prodcast.org

slide: http://bit.ly/modules2019

24: Reactlog 2.0: Debugging the state of Shiny by Barret Schloerke

slide: https://bit.ly/rstudio-conf-2019-reactlog

demo: https://bit.ly/rstudio-conf-2019-reactlog-demo

25: Integrating React.js and Shiny by Alan Dipert

slide: http://bit.ly/reactR-tutorial

tutorial: https://github.com/react-R

26: Learning R survey

http://rstd.io/learning-r-survey

27: Lazy evaluation to R

http://rstd.io/tidy-eval-context

28: RStudio Cloud for Education

Mel Gregory

  1. https://rstudio.cloud
  2. Log in
  3. Create a new project

Basic terminology: projects and spaces

A project is a fundamental unit of work.

A space is a place for a group of people to collaborate on work.

Basic terminology: roles

  • Admin: can view, manage and edit all projects, manage space membership, and delete the space itself. Good for: System Administrator

  • Moderator: can view, manage, and edit all projects in the space. Good for: project lead

  • Contributor: seee public projects in the space, and create new projects in the space. Good for: project memeber/students.

Basic terminology: visibility

A project can be public or private.

  • Private projects in a space can be viewed by the author and space admins and moderators.

  • Provate projects have a lock icon.

Summary

RStudio Cloud: Not just for education

We create RStudio Cloud to make it easy for professionals, hobbyists, trainers, teachers and students to do, share, teach and learn data science using R.

Try it out: \(\color{red}{\text{https://rstudio.cloud}}\)

Talk to us: \(\color{red}{\text{https://community.rstudio.com}}\)

29: Pimp my RMD: a few tips for R Markdown by Yan Holtz

https://holtzy.github.io/Pimp-my-rmd/

30: Keynote 3: The unreasonable effectiveness of public work

David Robinson, DataCamp

A few types of sharing:

  1. blog
  2. Tweet
  3. Contribute to open source
  4. Give talks
  5. Record a screecasts
  6. Write a book

Why spend time on public work?

  • Advance your career
  • Practice good habits
  • Contribute to the community

Disclaimer:

  1. A lot of this talk is about what worked for me
  2. The unresonable Effectiveness of Public work (~Necessity~)

Write a blog

  • if you repeat your code 3 times, write a function.

  • if you give in-person advice to the same question 3 times, write a blog to explain/post it.

blogdown package

But what I can blog about?
1. provide some information
TidyTuesday (R4DS online community)
2. Teach a concept

Contribute to the community

  • Contribute by creating packages

Give a talk

  • Imagine giving the talk to yourself 3-6 months ago
  • Practice giving the talk out loud
  • Don’t rely on bullet points
  • Be sure to publish your slides

where to give a talk?

  • Internal seminar
  • Local meetup
  • Larger conference

Recording a screencasts

Limination of screencasts

  1. you need to be capable and confident enough to improvise

  2. You have to be comfortable embarrassing yourself!

Write a book

  • Introduce the bookdown package

31: Final Panel discussion

By Hilary Parker, Karthik Ram, Angela Bassa, Tracy Teal, Eduarado

Key words worth to mention in the conference

  • API
  • css
  • plumber
  • metadata
  • <www.stackoverflow.com>