2 Basic Concepts
R is an object-oriented language. The main data structures (which are objects) include:
- Vectors
- Matrices
- Dataframes & tibbles
- Lists
- Factors
These data objects can be manipulated using functions (which are also objects). There are built-in functions, but you can also create customised functions.
R functions are stored in packages in your R library. Additional packages can be installed to extend R's functionality (for example, with machine learning or graphing capabilities).
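A minimal sketch of calling a built-in function and defining a customised one. The names scores and celsius_to_fahrenheit are made up for illustration; install.packages() and library() are the standard base R calls for adding packages, and they are commented out so the sketch runs offline.

```r
# Calling a built-in function: mean() comes with base R
scores <- c(12, 15, 9)
mean(scores)  # 12

# Creating a customised function (celsius_to_fahrenheit is a hypothetical example).
# Functions are objects too, so they are assigned with <- like any other value
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
celsius_to_fahrenheit(100)  # 212

# Installing a package once, then loading it in each session
# install.packages("tibble")
# library(tibble)
```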
2.1 Vectors
A vector can hold a single value or multiple values, like an array. All elements of a vector must be of the same type (e.g. all numeric or all character).
Each element in a vector is indexed from 1 to n. You can use these index numbers to extract an element; for example, to get the 2nd element of the colours vector:
## [1] "orange"
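The code that produced this output is not shown. A minimal sketch, assuming a hypothetical colours vector in which only "orange" (position 2) comes from the output above; the other colours are made up:

```r
# Hypothetical vector: only "orange" in position 2 matches the output above
colours <- c("red", "orange", "yellow", "green")
colours[2]        # "orange"
colours[c(1, 3)]  # several elements at once: "red" "yellow"
length(colours)   # 4

# Vectors hold one type only, so mixing types coerces everything to character
mixed <- c(1, "a")
class(mixed)      # "character"
```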
2.2 Matrices
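Matrices are 2-dimensional vectors. Like vectors, every element must be of the same type; this is usually numeric, but character and logical matrices are also possible.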
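A short sketch of creating and indexing a matrix with base R's matrix() function; the values and the names m and letters_m are illustrative only:

```r
# matrix() fills values column by column by default
m <- matrix(1:6, nrow = 2)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m[2, 3]   # element in row 2, column 3: 6
m[1, ]    # whole first row: 1 3 5
dim(m)    # 2 3

# Non-numeric matrices are allowed too, provided all elements share one type
letters_m <- matrix(c("a", "b", "c", "d"), nrow = 2)
```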
2.3 Dataframes
A dataframe is a 2-dimensional table. Each column can hold a different type of data, but all columns must have the same length. Dataframes can be created manually by combining vectors together:
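The original code is not shown. A minimal sketch with base R's data.frame(), using the names, ages and genders that appear in the outputs later in this section; Jolie Hope's age and gender are not shown there, so NA is used, and the name df is an assumption:

```r
# Column vectors taken from the outputs later in this section;
# Jolie Hope's age and gender are not shown, so they are NA
Name   <- c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope")
Age    <- c(25, 29, 34, NA)
Gender <- c("M", "F", "M", NA)

# Combine the equal-length vectors into a dataframe (one vector per column)
df <- data.frame(Name, Age, Gender, stringsAsFactors = FALSE)
str(df)
```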
2.4 Tibbles
Tibbles are dataframes that can be easier to use, but you will need the tibble package (part of the tidyverse) to work with them.
2.5 Extracting data from Dataframes/Tibbles
Rows and columns can be extracted from dataframes using index numbers or column names.
To extract the FIRST COLUMN of values as a dataframe:
## # A tibble: 4 x 1
## Name
## <chr>
## 1 John Doe
## 2 Alice Liddel
## 3 Peter Piper
## 4 Jolie Hope
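The commands behind these outputs are not shown. With a plain base R data.frame (rather than a tibble), column extraction can be sketched as follows; the name df is hypothetical, and Jolie Hope's age and gender are NA because the outputs do not show them:

```r
df <- data.frame(
  Name   = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  Age    = c(25, 29, 34, NA),     # Jolie Hope's age is not shown, so NA
  Gender = c("M", "F", "M", NA),  # likewise her gender
  stringsAsFactors = FALSE
)
by_index <- df[1]                  # first column as a dataframe, by index
by_name  <- df["Name"]             # the same, by column name
as_vec   <- df[, 1]                # a plain data.frame drops this to a vector
kept     <- df[, 1, drop = FALSE]  # drop = FALSE keeps it as a dataframe
```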
To extract the FIRST ROW of values as a dataframe:
## # A tibble: 1 x 3
## Name Age Gender
## <chr> <dbl> <chr>
## 1 John Doe 25.0 M
To extract the value in the FIRST ROW of the FIRST COLUMN (with a tibble this is still returned as a 1 x 1 tibble, not as a single value):
## # A tibble: 1 x 1
## Name
## <chr>
## 1 John Doe
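With a plain data.frame, the same row and cell extractions can be sketched as below (a hypothetical df built from the three fully shown rows). Unlike a tibble, a single cell comes back as a plain vector rather than a 1 x 1 table:

```r
df <- data.frame(
  Name   = c("John Doe", "Alice Liddel", "Peter Piper"),
  Age    = c(25, 29, 34),
  Gender = c("M", "F", "M"),
  stringsAsFactors = FALSE
)
first_row    <- df[1, ]        # first row as a one-row dataframe
cell         <- df[1, 1]       # row 1, column 1: "John Doe"
cell_by_name <- df[1, "Name"]  # the same cell, using the column name
```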
If you need more than one row or column, you can use a colon (:). For example, to get the first 3 rows:
## # A tibble: 3 x 3
## Name Age Gender
## <chr> <dbl> <chr>
## 1 John Doe 25.0 M
## 2 Alice Liddel 29.0 F
## 3 Peter Piper 34.0 M
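In base R, 1:3 is shorthand for the sequence c(1, 2, 3), so the same ranges work on either side of the comma. A sketch with a hypothetical df (NA marks the age and gender that the outputs do not show):

```r
df <- data.frame(
  Name   = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  Age    = c(25, 29, 34, NA),     # Jolie Hope's age is not shown, so NA
  Gender = c("M", "F", "M", NA),  # likewise her gender
  stringsAsFactors = FALSE
)
first_three    <- df[1:3, ]                     # rows 1 to 3, all columns
first_two_cols <- df[, 1:2]                     # all rows, columns 1 to 2
picked         <- df[c(1, 4), c("Name", "Age")] # chosen rows, columns by name
```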
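With a tibble, single-bracket subsetting always returns another tibble, even for a single column or a single value (as in the outputs above). With a plain dataframe, selecting a single column or a single value returns a vector instead.
You can extract a tibble column as a plain vector by using double square brackets [[ ]] instead of single ones. This can also be done using the pull() function from dplyr.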
## [1] "John Doe" "Alice Liddel" "Peter Piper" "Jolie Hope"
## [1] "John Doe"
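A sketch of the double-bracket and $ forms on a hypothetical df (they behave the same way on a tibble). dplyr::pull() is mentioned only in a comment so the sketch runs without dplyr installed:

```r
df <- data.frame(
  Name = c("John Doe", "Alice Liddel", "Peter Piper", "Jolie Hope"),
  stringsAsFactors = FALSE
)
names_vec <- df[[1]]     # double brackets: a plain character vector
by_name   <- df[["Name"]] # the same, by column name
by_dollar <- df$Name     # $ is shorthand for [["Name"]]
first     <- df[[1]][1]  # first element only: "John Doe"
# With dplyr loaded, pull(df, Name) returns the same vector
```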
2.6 Lists
Lists can be used to group objects together. They can contain different types of object and are a bit like dictionaries in Python. Each item in a list can be given a name.
You can access lists in the same way as vectors (by index number or name), but the result will also be a list. To get the item itself, use double brackets [[ ]]:
## [1] 2019
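The list behind this output is not shown. A minimal sketch, assuming a hypothetical list named info in which only the value 2019 comes from the output above:

```r
# Hypothetical list: only the value 2019 matches the output above
info <- list(title = "Annual report", year = 2019, scores = c(4, 5, 3))
info[2]          # single brackets: a list containing year
info[[2]]        # double brackets: the value itself, 2019
info[["year"]]   # the same, by name
info$year        # $ is shorthand for [["year"]]
info$scores[2]   # drill into a vector inside the list: 5
```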
2.7 Factors
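Factors store categorical data. Each value is held as an integer code that points to a set of labels called levels. This makes them awkward to work with directly, so you will often want to convert them to regular vectors.
If you have a factor, you can convert it to a vector: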
# If your factor contains strings
myfactor <- factor(c("North", "South", "West", "West", "South"))
as.vector(myfactor)
## [1] "North" "South" "West" "West" "South"
# If your factor only contains numbers
myfactor <- factor(c(100, 200, 300, 300, 200))
as.numeric(as.vector(myfactor)) # NB: the levels are held as strings, so you need to convert them to numeric
## [1] 100 200 300 300 200
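The conversion step matters because of the integer codes: calling as.numeric() directly on the factor returns the codes, not the values. A short sketch with the same numeric factor; as.numeric(levels(f))[f] is the idiom suggested in the R FAQ, which converts each level only once:

```r
myfactor <- factor(c(100, 200, 300, 300, 200))
lv    <- levels(myfactor)                  # "100" "200" "300": labels are strings
codes <- as.numeric(myfactor)              # 1 2 3 3 2: the internal codes
vals  <- as.numeric(as.vector(myfactor))   # 100 200 300 300 200
vals2 <- as.numeric(levels(myfactor))[myfactor]  # same values, R FAQ idiom
```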
2.8 Dates
R stores dates as the number of days since 1 January 1970. To create a date you can use the as.Date() function, which by default accepts dates written in the format 'YYYY-MM-DD'. Other formats can be used if you specify them with the format argument.
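A short sketch of as.Date() with the default and a custom format; the dates themselves are arbitrary examples:

```r
d <- as.Date("2019-03-15")             # default 'YYYY-MM-DD' format
as.numeric(as.Date("1970-01-02"))      # 1: one day after 1 January 1970

# Other layouts need a format string: %d = day, %m = month, %Y = 4-digit year
d2 <- as.Date("15/03/2019", format = "%d/%m/%Y")
d == d2                                # TRUE

d + 30                                 # arithmetic works in days: "2019-04-14"
format(d, "%d/%m/%Y")                  # back to text: "15/03/2019"
```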
2.9 Style Guide
- Make sure you use correct upper/lower case spelling
- The main data types you will work with are numeric, character, logical and factor
- Use the setwd('C:/…') function to set the working directory for your files
- File paths must use forward slashes (…/…/…/); a single backslash is treated as an escape character in R strings
- Use == when testing for equality, e.g. if (a == b) …
- To time a piece of code, add ptm <- proc.time() before it and proc.time() - ptm after it; the second call returns the time elapsed
- To add comments, start them with #; R ignores everything from the # to the end of the line
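A small sketch tying together the comment, equality and timing tips above; the summed sequence is arbitrary work chosen only so there is something to time, and system.time() is base R's one-call alternative:

```r
# A comment: R ignores everything from # to the end of the line
a <- 5
b <- 5
if (a == b) {                    # == compares; a single = is for assignment
  message("a equals b")
}

# Timing a block of code
ptm <- proc.time()               # start the clock
total <- sum(as.numeric(1:1e6))  # illustrative work to be timed
elapsed <- proc.time() - ptm     # user, system and elapsed seconds
elapsed[["elapsed"]]             # wall-clock seconds only

# system.time() wraps the same idea around a single expression
system.time(sum(as.numeric(1:1e6)))
```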