5 Functions
5.1 Function Basics - an example using str()
Functions are the main way that you will handle or manipulate data. A function call has the form functionname(arg1, arg2, …): the function name followed by its arguments in parentheses.
To see what a function does, and what arguments it requires, pass the function name to the help function, e.g. help(str).
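As a minimal sketch of these ideas, the calls below look up a function's documentation and then apply str() to mtcars, a dataset that ships with R. The choice of args(round) as a second lookup is an illustrative assumption, not part of the original text.

```r
# Open the help page for str(); ?str is shorthand for the same call
help(str)

# Show the arguments a function accepts without opening its help page
args(round)

# str() prints a compact summary of an object's structure
str(mtcars)

# The printed summary can also be captured as text (one element per line)
structure_lines <- capture.output(str(mtcars))
structure_lines[1]
```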
5.2 Common (base) functions
5.2.1 Strings
x <- "The cat sat on the mat"
# Substring - Extracts part of a string (works on character vectors). Supply the start and end positions.
substr(x, 3, 4)
# Search for a string within a string - will return either TRUE or FALSE.
grepl("cat", x)
# Find and replace a string within a string.
gsub("cat","dog", x)
# Paste - Concatenates vectors or lists (including an optional separator). All values will be converted into strings.
paste("value1", "value2", sep=" ")
paste0("value1", "value2") # paste0() joins the values with no separator
# Remove leading/trailing whitespace from character strings with trimws()
trimws(" Hello World ")
# Apply a special format to a number/string and return it as a string
sprintf('%03d', 1)
# This simple example converts 1 into a string and pads it with 0s so that it is 3 characters long. The first argument is the format string, which defines what text will be displayed and how. Each % symbol marks a placeholder, and placeholders can have different types. All subsequent arguments are the values substituted into the placeholders (in the order they occur). For more info see https://www.rdocumentation.org/packages/base/versions/3.4.3/topics/sprintf
5.2.2 Mathematical
Round/Floor/Ceiling
round(5.15, digits=1) # Specify number of decimal places to round to
round(515, digits=-1) # A negative digits value rounds to a power of 10 (-1 to the nearest 10, -2 to the nearest 100)
floor(5.15) # Rounds down to nearest whole number
ceiling(5.15) # Rounds up to nearest whole number
Sum
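For summing values, a minimal sketch using base R's sum(). The vector `values` is illustrative data, not taken from the original text.

```r
values <- c(5.15, 2, 10)  # illustrative data (an assumption)

# Total of all elements
total <- sum(values)

# Missing values make the result NA unless na.rm = TRUE is set
total_with_na <- sum(c(values, NA))
total_no_na <- sum(c(values, NA), na.rm = TRUE)
```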
5.2.3 Properties & Lookups
Length - The number of elements in an object
## [1] 1
## [1] 4
## [1] 11
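The calls that produced the three values above are not shown. One set of calls that yields the same results, offered as an assumption rather than the original code, is:

```r
# A single string is a character vector with one element
len_string <- length("The cat sat on the mat")  # 1

# A vector with four elements
len_vector <- length(c(10, 20, 30, 40))         # 4

# For a data frame, length() counts columns; mtcars has 11
len_df <- length(mtcars)                        # 11
```

Note that length() does not count the characters in a string; nchar() does that.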
Logic functions for different types of value
## [1] FALSE TRUE FALSE FALSE
## [1] TRUE
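The original calls are not shown here. As a hedged sketch, the is.*() family of base functions returns results of this shape; the inputs below are assumptions chosen to match the outputs above.

```r
# is.na() tests each element and returns one logical value per element
na_check <- is.na(c(1, NA, 3, 4))   # FALSE TRUE FALSE FALSE

# is.numeric() tests the object as a whole and returns a single value
num_check <- is.numeric(5)          # TRUE

# Other members of the family work the same way
chr_check <- is.character("text")   # TRUE
```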
Find the POSITION of the element with highest/lowest value
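A minimal sketch using base R's which.max() and which.min(), which return the index of the element rather than its value. The vector `scores` is illustrative, not from the original text.

```r
scores <- c(3, 9, 1, 7)  # illustrative data (an assumption)

pos_max <- which.max(scores)  # 2, because 9 is the second element
pos_min <- which.min(scores)  # 3, because 1 is the third element

# Use the position to look up the value itself
highest <- scores[pos_max]    # 9
```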
5.2.4 Transformation
Reverse elements in a vector
Format numbers to include comma separators - this converts numbers to characters
## [1] "1,000" "1,500" "20,000"
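The code for these two transformations is not shown. A sketch with base R's rev() and format() follows; the input values are assumptions chosen so that the formatted result matches the output above.

```r
# Reverse the order of elements in a vector
reversed <- rev(c(1, 2, 3))   # 3 2 1

# big.mark inserts the thousands separator; trim = TRUE stops format()
# from padding the shorter values with leading spaces
formatted <- format(c(1000, 1500, 20000), big.mark = ",", trim = TRUE)

# The result is character, so it can no longer be used in arithmetic
is.character(formatted)
```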
When a DataFrame/tibble has been created, various functions can be used to describe it, such as:
- dim() # Shows number of rows and columns
- length() # Number of columns
- colnames() # The column names
- head() # Displays first 6 rows
- str() # variable name, type and example values
There are also some mathematical functions, including:
- summary() # Summary statistics for numeric columns (min, Q1, median, mean, Q3, max)
- colSums() # Sum of each column
- colMeans() # Mean of each column
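The functions listed above can be tried on mtcars, a data frame that ships with R. This is a sketch for illustration; the variable names are assumptions.

```r
df_dim <- dim(mtcars)          # rows then columns: 32 11
df_cols <- length(mtcars)      # 11 columns
df_names <- colnames(mtcars)   # character vector of column names

head(mtcars)                   # first 6 rows
str(mtcars)                    # variable name, type and example values

summary(mtcars$mpg)            # summary statistics for one numeric column
col_means <- colMeans(mtcars)  # named vector, one mean per column
col_sums <- colSums(mtcars)    # named vector, one total per column
```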
5.2.5 Other useful functions
Generate a number sequence
## [1] 1 2 3 4 5
## [1] 1 3 5
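The calls behind these outputs are not shown. A sketch with base R's seq() that yields the same values (the exact arguments are an assumption):

```r
# Sequence from 1 to 5 in steps of 1
seq_default <- seq(1, 5)            # 1 2 3 4 5

# The by argument sets the step size
seq_by_two <- seq(1, 5, by = 2)     # 1 3 5

# The colon operator is shorthand for a step of 1
seq_colon <- 1:5                    # 1 2 3 4 5
```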
5.3 Stringr - more string functions
Loading packages will give you access to more complex functions:
library(stringr)
x <- "string to test"
# Count occurrences of a pattern in a string (here the spaces, so the word count is this result plus 1)
stringr::str_count(x, " ") # Different patterns can be used
## [1] 2
## [1] "string"
# Split a string into N pieces based on your chosen delimiter - stored in a matrix
stringr::str_split_fixed(x, "to", 2)
##      [,1]      [,2]
## [1,] "string " " test"
## [1] "string to replace"
# Locate the start and end position of a string - stored in a matrix
stringr::str_locate(x, "test")[[1]] # [[1]] returns the start, [[2]] returns the end
## [1] 11
# Detect if a sub-string appears within a string
stringr::str_detect(x, "test") # Returns either TRUE or FALSE
## [1] TRUE
You can also use regular expressions as the pattern in these functions. See https://stringr.tidyverse.org/articles/regular-expressions.html
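Two outputs in this section ("string" and "string to replace") appear without their calls. The sketch below shows stringr functions that produce those values, followed by two regular-expression patterns. Which functions the original used is an assumption.

```r
library(stringr)

x <- "string to test"

# Extract a word by position (words are split on spaces by default)
first_word <- word(x, 1)                      # "string"

# Replace the first match of a pattern
replaced <- str_replace(x, "test", "replace") # "string to replace"

# Regular expressions: ^ anchors to the start, $ to the end
starts_with_string <- str_detect(x, "^string")  # TRUE
last_word <- str_extract(x, "[a-z]+$")          # "test"
```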
5.4 Lubridate - more date functions
Lubridate helps parse, convert, and extract information from dates.
## [1] 6
## [1] 14
lubridate::month(date,
  label = TRUE, # Switch between numeric (FALSE) and character (TRUE) month.
  abbr = FALSE) # Switch between full (FALSE) and abbreviated (TRUE) names.
## [1] April
## 12 Levels: January < February < March < April < May < June < ... < December
## [1] 2
## [1] 2017
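The definition of `date` and the calls behind the numeric outputs in this section are not shown. The sketch below assumes the date 14 April 2017, which is consistent with every output above; both the date and the choice of functions are assumptions.

```r
library(lubridate)

date <- ymd("2017-04-14")  # assumed example date

day_of_week <- wday(date)    # 6: Friday, counting from Sunday = 1 (the default)
day_of_month <- mday(date)   # 14
month_name <- month(date, label = TRUE, abbr = FALSE)  # April
quarter_no <- quarter(date)  # 2
year_no <- year(date)        # 2017
```

The numbering returned by wday() depends on the week_start setting, so the value 6 only holds for the default Sunday start.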
5.5 Converting data to other formats
5.5.1 JSON
The RJSONIO package can convert R objects into JSON.
An unnamed vector or list will become a JSON array, e.g. ["","",""]
## [1] "[ 1, 2, 4, 8, 16 ]"
A data frame or named list will become a JSON object, e.g. {"":[],"":[]}
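A minimal sketch of both cases with RJSONIO's toJSON() and fromJSON(). The input values and the names `id` and `name` are illustrative assumptions.

```r
library(RJSONIO)

# An unnamed vector becomes a JSON array
json_array <- toJSON(c(1, 2, 4, 8, 16))

# A named list becomes a JSON object, one key per list element
json_object <- toJSON(list(id = c(1, 2), name = c("a", "b")))
cat(json_object)

# fromJSON() converts a JSON string back into an R object
parsed <- fromJSON(json_object)
names(parsed)
```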
5.6 User Defined Functions
The basic syntax to create a UDF is:
myfunction <- function(arg1, arg2, ... ){
statements
return(object)
}
# For example, this function checks whether a single number is equal to 1 and returns "Yes" or "No":
checkifone <- function(x) {
  if (x == 1) {
    return("Yes")
  } else {
    return("No")
  }
}
# You can return any object, including vectors, lists, other objects or functions.
To use a function:
checkifone(23)
# Or save the result of the function in a new variable
functionresult <- checkifone(23) # functionresult will contain either "Yes" or "No"
Using dplyr in a function
Passing column names directly to a function containing dplyr code will not work; the names must first be captured with enquo(), as in this function:
library(dplyr) # Provides group_by(), summarise(), enquo() and %>%
udf <- function(d, x, y) {
  # Column names referenced in a function call must be captured with the enquo() function.
  x <- enquo(x)
  y <- enquo(y)
  # Column names can now be used by adding the !! prefix (the UQ() function is an older, deprecated equivalent)
  return(d %>% group_by(!!x) %>% summarise(n = n(), avg = mean(!!y)))
}
udf(mtcars, cyl, mpg)
5.7 Chaining/Piping
There is an alternative way of writing R code that may be easier to read/understand. It is called chaining/piping, and it reduces the need for nested functions by using a special operator %>%.
# Normal R code
xvar <- c(1,1,2,3,5,8,13,21) # Initialises a vector
xresult <- round(log(xvar),1) # Calculates the natural log and rounds to 1 decimal place
# Chaining/Piping
library(magrittr) # Provides the %>% operator (also available via dplyr)
xvar <- c(1,1,2,3,5,8,13,21)
xresult <- xvar %>% log() %>% round(1)
# The %>% operator passes xvar to the log() function and the result is passed to round(). This means that complex code with multiple functions should be easier to read.
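To see that the two styles are equivalent, the sketch below runs a slightly longer chain both ways and compares the results. It assumes the magrittr package is installed; the last line uses the native |> pipe, which assumes R 4.1 or later.

```r
library(magrittr)

xvar <- c(1, 1, 2, 3, 5, 8, 13, 21)

# Nested calls are read from the inside out
nested <- round(mean(log(xvar)), 2)

# Piped calls are read left to right, in the order they run
piped <- xvar %>% log() %>% mean() %>% round(2)

# Both styles give the same answer
same_result <- identical(nested, piped)

# Native pipe built into R itself (assumes R >= 4.1), no package needed
native <- xvar |> log() |> mean() |> round(2)
```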