URPP Tutorial on R

Stefan Wyder

Clean Coding & Project Organization

Running R from Command Line

Clean coding

The main goal is readability

"You're always working on a project with at least 1 other collaborator,
and that is future you" Hadley Wickham



Anyone should be able to pick up the code and understand what it does.


Use

Be consistent

              Style 1              Style 2
Variables     avgExpr              avg_Expr
Functions     CalculateAvgExpr()   Calculate_Avg_Expr()
Constants     kConstantName

See e.g. Google's R Style Guide.
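A minimal sketch of Style 1 applied consistently in one small R snippet. The names kMinExpr, exprValues and CalculateAvgExpr() are hypothetical and only illustrate the convention.

```r
# Style 1 throughout: camelCase variables, PascalCase functions,
# and a "k" prefix for constants. All names are made up for illustration.
kMinExpr <- 1.0  # constant: expression cutoff

CalculateAvgExpr <- function(exprValues, minExpr = kMinExpr) {
  # Average only the values at or above the cutoff
  keptValues <- exprValues[exprValues >= minExpr]
  mean(keptValues)
}

exprValues <- c(0.5, 2.0, 4.0)
avgExpr <- CalculateAvgExpr(exprValues)
print(avgExpr)  # 3
```

Whichever style you pick, the point is that a reader can tell a variable, a function and a constant apart at a glance.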

Use functions

Break down the problem into functions


One operation, one function
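As a sketch of "one operation, one function", here is a hypothetical expression summary split into three small functions (RemoveMissing(), LogTransform(), CalculateAvgExpr() are illustrative names, not from any package). Each step can be read, tested and reused on its own.

```r
# Hypothetical pipeline: each function does exactly one thing.
RemoveMissing <- function(x) {
  x[!is.na(x)]
}

LogTransform <- function(x) {
  # Pseudocount of 1 keeps zero counts from becoming -Inf
  log2(x + 1)
}

CalculateAvgExpr <- function(x) {
  mean(x)
}

counts <- c(0, 3, NA, 7)
avgExpr <- CalculateAvgExpr(LogTransform(RemoveMissing(counts)))
print(avgExpr)  # 1.666667
```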

Comments

"Write code that is so clear that you don't need to comment it, and then
comment it anyway." Tracy Teal
Use comments to:
describe your intent
explain the reasons for your approach
cite the sources of data / code / algorithms
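A small sketch of comments that record intent, reason and source rather than restating the code. The toy matrix and gene names are made up for illustration.

```r
# Intent: summarise each gene's expression with a single robust value.
# Reason: use the median rather than the mean, because a few extreme
#         samples would otherwise dominate the summary.
# Source: note here where the data / code / algorithm came from
#         (this toy matrix is made up for illustration).
expr <- matrix(c(1, 2, 3, 100,
                 5, 5, 6, 5),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), NULL))

geneSummary <- apply(expr, 1, median)  # one median per row (gene)
print(geneSummary)
```

Here the median of geneA is 2.5, whereas its mean would be 26.5, pulled up by the single value of 100; the "Reason" comment tells future you why that choice was made.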

Some best practices

Different styles are possible; these are just some suggestions (learned the hard way).

Keep all your source files for a project in the same directory.

Then use relative paths.

df <- read.table("files/data.csv", header = TRUE, sep = ",")

rather than

df <- read.table("/home/wyder/PROJECTS/coexpression/files/data.csv", header = TRUE, sep = ",")
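Relative paths are resolved against R's current working directory, so they work as long as R is started in (or pointed at) the project directory. A self-contained sketch, assuming a throwaway project built under tempdir(); the directory name and data are made up.

```r
# Build a throwaway project so the example runs anywhere.
projectDir <- file.path(tempdir(), "coexpression")
dir.create(file.path(projectDir, "files"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(gene = c("a", "b"), expr = c(1.2, 3.4)),
          file.path(projectDir, "files", "data.csv"), row.names = FALSE)

# Relative paths are resolved against the working directory.
oldDir <- setwd(projectDir)  # setwd() returns the previous directory
df <- read.table(file.path("files", "data.csv"), header = TRUE, sep = ",")
setwd(oldDir)                # restore the original working directory

print(df)
```

Because only projectDir mentions the location, the whole project folder can be moved or shared and the read.table() call stays the same.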

More tips

For larger projects, split your code into separate scripts:

load.R        loads the raw data
clean.R       reformats and transforms the data, cleans up outliers, handles missing values
functions.R   keeps all your functions in one separate file
do.R          your actual analysis: loads and cleans the data, then runs the analysis
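One way do.R could tie the other scripts together is with source(), which runs each file in turn. This is a sketch under assumptions: the script contents, the rawData/cleanData names and CalculateAvgExpr() are hypothetical, and tiny stand-in scripts are written to a temporary directory only so the example runs on its own.

```r
# Stand-in scripts; in a real project these are files you edit.
project <- file.path(tempdir(), "myproject")
dir.create(project, showWarnings = FALSE)
writeLines('rawData <- data.frame(gene = c("a", "b", "c"), expr = c(1.5, NA, 3.5))',
           file.path(project, "load.R"))
writeLines('cleanData <- rawData[!is.na(rawData$expr), ]',
           file.path(project, "clean.R"))
writeLines('CalculateAvgExpr <- function(df) mean(df$expr)',
           file.path(project, "functions.R"))

# do.R: source the other scripts in order, then run the analysis.
oldDir <- setwd(project)
source("functions.R")  # define helper functions first
source("load.R")       # creates rawData
source("clean.R")      # creates cleanData from rawData
avgExpr <- CalculateAvgExpr(cleanData)
setwd(oldDir)

print(avgExpr)  # 2.5
```

Keeping the loading and cleaning steps in their own files means do.R reads like a summary of the analysis, and each step can be rerun or fixed in isolation.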

Bioconductor

www.bioconductor.org

Tools for the statistical analysis and comprehension of genomic data in the R programming language.
Also tools for data integration.

Sources

Software Carpentry