R is a language and environment for statistical computing and graphics that resembles the S language and environment. Although R can be considered a different implementation of S, there are some important differences.
R provides a wide variety of statistical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and it is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an open-source route to participation in that activity.
Downloading R for Windows
Go to the website https://cran.rstudio.com and download R for Windows using the steps shown below (4.0.0 is the version that I have installed).
Installing R for Windows
After downloading the file, install it by following the steps shown below.
After clicking Next > in step 8, the installation will begin.
Now open the “R” program and you will see something like the image shown below.
First, we must find out what our working directory is. Type getwd() in the R console to find your current working directory. Usually, it points to C:/Users/username/Documents. In my case, it points to C:/Users/Nipuna/Documents. It is a hassle to have the Documents directory as my working directory, so I can change it by clicking File → Change dir. and selecting a relevant folder.
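The same check and change can also be done from the console. The sketch below assumes setwd() as the console counterpart of the menu entry and uses tempdir() only as a stand-in for whatever folder you would actually choose:

```r
# Remember the current working directory so it can be restored afterwards.
old_dir <- getwd()
print(old_dir)

# Switch to another folder; tempdir() is only a stand-in target here.
setwd(tempdir())
print(getwd())

# Go back to where we started.
setwd(old_dir)
```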
Nuts and Bolts
Go to File → New script. You will now see a new file opened in the RGui application. Simply type the following code in the new file.
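A minimal sketch of what such a script could contain, assuming myfunction takes no arguments and simply prints a greeting (the function body is a hypothetical stand-in):

```r
# Hypothetical body: any function definition would work the same way.
myfunction <- function() {
  print("Hello, R!")
}
```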
Now, we can run this function in two ways. The first way is to copy the whole script by pressing “CTRL+C” and paste it into the console with “CTRL+V”. Even after pasting and pressing “Enter”, nothing visible will happen, because this only defines the function in the working environment; to run it, we still have to call it. To list all the elements in the working environment, type ls() in the R console. The output is [1] "myfunction", so that is the only element in the current working environment. To run the function, type myfunction() in the console. The parentheses () are used here because it is a function. If it were a variable, we would type only its name, without the parentheses.
The second way to run this function is to save this “Untitled” script and load it from the R console. Save the script by pressing “CTRL+S”, and give the file a simple name like example.R. As you can see, .R is the file extension of R files. To check that the file is there, type dir() in the R console and it will show that "example.R" is in the working directory. To load the file, type source("example.R") in the console. The source() function evaluates the file and loads the elements it defines into the working environment. So now we can type ls() and find our element, which is myfunction, and type myfunction() to run the function.
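The save-and-source workflow can be sketched end to end as follows. The file is written to a temporary folder instead of the working directory, and the one-line function body is a hypothetical stand-in:

```r
# Write a one-function script to a temporary file (a stand-in for example.R).
script_path <- file.path(tempdir(), "example.R")
writeLines('myfunction <- function() print("Hello, R!")', script_path)

# The file shows up in the directory listing.
print("example.R" %in% dir(tempdir()))

# source() evaluates the file, so myfunction now exists in the workspace.
source(script_path)
print(ls())
myfunction()
```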
<- is the assignment operator in R. When you type x <- 10, the value 10 is assigned to the variable x. You can view the value of x in the R console by typing print(x), or by simply typing x and pressing “Enter”. If you just type x <- and press “Enter”, there won’t be any error; the console will show a + prompt and wait for the rest of the input.
There are five basic classes of data types in R. They are character, numeric (real numbers), integer, complex, and logical (TRUE/FALSE).
The most basic data structure in R is a vector. It contains elements of the same type. A vector’s type can be checked with the typeof() function, and its length (the number of elements within it) can be checked with the length() function. Furthermore, an empty vector can be created with the vector() function. The most basic way of generating vectors is using the c() function. Another way of making a vector is by typing something like x <- 0:6. Now a vector x is made with 0 1 2 3 4 5 6 as its values.
But there is an exception called a list, which is represented as a vector but can contain objects of different classes.
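The vector functions mentioned above can be tried together in a short sketch; the variable names v and w are only illustrative:

```r
x <- 0:6                    # integer sequence 0, 1, ..., 6
typeof(x)                   # "integer"
length(x)                   # 7

v <- vector("numeric", 3)   # a numeric vector of length 3, initialised to 0
w <- c(1.5, 2.5, 3.5)       # a vector built element by element with c()
typeof(w)                   # "double"
```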
Numbers in R are generally treated as numeric objects, which are real numbers stored in double precision. If you explicitly want an integer, you can type L after the number (e.g. x <- 1L). There is also a special number called Inf, which represents infinity and can be used in ordinary calculations (e.g. 1 / Inf gives 0). The value NaN represents undefined values (e.g. 0 / 0).
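A few console lines illustrate these special numbers. The particular expressions are chosen here only for illustration:

```r
x <- 1L
class(x)      # "integer" because of the L suffix
class(1)      # "numeric" (double precision) without it

1 / 0         # Inf
1 / Inf       # 0
0 / 0         # NaN
```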
R objects can have attributes like names, dimnames, dimensions (e.g. for matrices), class, and other user-defined attributes/metadata. Attributes of an object can be accessed by using the attributes() function.
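As a small sketch of how attributes look in practice, the example below uses a named vector and then attaches a user-defined attribute; the attribute name "units" is an arbitrary choice for illustration:

```r
v <- c(a = 1, b = 2)        # a numeric vector with a names attribute
attributes(v)               # lists $names

attr(v, "units") <- "cm"    # attach a user-defined attribute
attributes(v)               # now lists $names and $units
```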
Assume that you have created a vector, x <- c(5.2,"a"). What happens when you press “Enter” after typing that? There won’t be any error, because a concept called coercion applies: R follows a precedence hierarchy and converts every element of the vector to a single common type. For example, the vector we typed above is now of character type. The precedence hierarchy is character, complex, numeric (real numbers), integer, and logical (TRUE/FALSE), in decreasing order of precedence. You can also do explicit coercion by using the as.* functions. For example, when you type x <- 0:2 and then class(x), it gives you "integer". When you type as.numeric(x), it gives you [1] 0 1 2 as the output, and when you type as.logical(x), it gives you [1] FALSE TRUE TRUE as the output.
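Both kinds of coercion can be checked directly in the console. The vectors y and z below are extra illustrations that are not part of the text above:

```r
# Implicit coercion: mixed elements are converted to the most general type.
x <- c(5.2, "a")
class(x)            # "character"; 5.2 is stored as the string "5.2"

y <- c(TRUE, 2L)
class(y)            # "integer"; TRUE becomes 1L

# Explicit coercion with the as.* functions.
z <- 0:2
as.numeric(z)       # 0 1 2
as.logical(z)       # FALSE TRUE TRUE
as.character(z)     # "0" "1" "2"
```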
Lists are R objects that can contain elements of different types, such as numbers, strings, vectors, and even another list inside them. A list can also contain a matrix or a function as one of its elements. A list is created using the list() function (e.g. x <- list(1,"a", TRUE)). You can give names to the list elements by assigning a character vector to them with the names() function, one name per element.
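A sketch of naming the elements of the example list; the three names used here are hypothetical and chosen only for illustration:

```r
x <- list(1, "a", TRUE)

# Assign one name per element (these names are arbitrary).
names(x) <- c("number", "letter", "flag")

x$letter     # "a"
str(x)       # compact overview of the list and its element types
```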
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length two (number of rows, number of columns). For example, n <- matrix(nrow=2, ncol=2) creates a 2 x 2 matrix filled with NA. You can find the dimensions of a matrix with dim(): typing dim(n) gives you 2 2 as the answer. And when you type attributes(n), it gives you $dim as the answer, indicating that the dimensions are the only attribute of the matrix. We can also make a matrix by simply making a vector and assigning dimensions to it: n <- 1:10 gives you a vector with the values 1 2 3 4 5 6 7 8 9 10, and typing dim(n) <- c(2,5) turns n into a matrix with the dimensions 2 x 5, filled column by column. A simpler way to do this is n <- matrix(1:10, nrow=2, ncol=5).
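The two ways of building the same matrix can be compared side by side. The name m for the second matrix is introduced here only for the comparison:

```r
# Way 1: start from a vector and give it dimensions.
n <- 1:10
dim(n) <- c(2, 5)

# Way 2: build the matrix in one call.
m <- matrix(1:10, nrow = 2, ncol = 5)

dim(n)          # 2 5
all(n == m)     # TRUE: both are filled column by column
n[2, 3]         # 6, the element in row 2, column 3
```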
Matrices can be created by column-binding or row-binding with cbind() and rbind(). When there is a length mismatch, the values of the shorter vector are repeated to fill the matrix.
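A small sketch of binding two vectors of different lengths; x and y are illustrative values, chosen so that the longer length is a multiple of the shorter one:

```r
x <- 1:2
y <- 1:4

cb <- cbind(x, y)   # 4 rows, 2 columns; x is repeated as 1 2 1 2
rb <- rbind(x, y)   # 2 rows, 4 columns

cb
rb
```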
Quick tip: Use “CTRL+L” to clear the R console.
Factors are used to represent categorical data. Factors can be ordered or unordered. One can think of a factor as an integer vector where each integer has a label.
Factors are treated specially by modeling functions like lm() and glm(). Using factors with labels is better than using integers because factors are self-describing (e.g. having the values heads and tails of a coin is better than having bare integer codes).
Factors let us describe categorical data in a more specific way. The order of the levels can be set using the levels argument to factor(). This is important in linear modeling because the first level is used as the baseline level.
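The coin example can be sketched as follows. The sequence of tosses is made up, and coin2 is a hypothetical second factor that shows the effect of the levels argument:

```r
tosses <- c("heads", "tails", "tails", "heads", "tails")

coin <- factor(tosses)
levels(coin)        # "heads" "tails" (alphabetical by default)
table(coin)         # counts per level
unclass(coin)       # the underlying integer codes plus the levels attribute

# Put "tails" first so that it becomes the baseline level.
coin2 <- factor(tosses, levels = c("tails", "heads"))
levels(coin2)       # "tails" "heads"
```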
Missing values are indicated by NA, or by NaN for undefined mathematical operations. is.na() is used to test whether objects are NA and, similarly, is.nan() is used to test for NaN. NA values have classes too, so there are integer NA, character NA, and so on. NaN values are also NA values, but the converse is not true.
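The relationship between the two kinds of missing value shows up clearly on a small vector; the values are illustrative:

```r
x <- c(1, NA, 0 / 0, 4)

is.na(x)     # FALSE  TRUE  TRUE FALSE  (NaN counts as NA)
is.nan(x)    # FALSE FALSE  TRUE FALSE  (NA is not NaN)

# Typed missing values.
class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```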
Data frames can be used to store tabular data. They are represented as a special type of list where every element of the list has to have the same length. Each element of the list can be taken as a column, and the length of each element is the number of rows. Data frames have a special attribute called row.names. Data frames are usually created by calling read.table() or read.csv(). They can be converted into matrices by using data.matrix().
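A small sketch of a data frame built by hand rather than read from a file. The column names and values are made up, and data.matrix() is used here as one way to do the matrix conversion:

```r
df <- data.frame(id = 1:3, score = c(9.5, 7.2, 8.8))

nrow(df)          # 3 rows: the length of each column
ncol(df)          # 2 columns: the number of list elements
row.names(df)     # "1" "2" "3"
names(attributes(df))   # includes "names", "class" and "row.names"

m <- data.matrix(df)    # numeric matrix with the same shape
dim(m)
```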
Now let’s look at how we can read and write data using R.
There are a few basic functions which can be used to read data into R.
read.table() and read.csv() for reading tabular data.
readLines() for reading lines of a text file.
source() for reading R code files (inverse of dump()).
dget() for reading R code files (inverse of dput()).
load() for reading saved workspaces.
unserialize() for reading single R objects in binary form.
There are analogous functions for writing data to files: write.table(), writeLines(), dump(), dput(), save(), and serialize(), which do the inverse of the ones mentioned above.
Reading Data Files with read.table()
The read.table() function has a few important arguments.
file: the name of a file, or a connection
header: a logical value indicating whether the file has a header line or not
sep: a string indicating how the columns are separated
colClasses: a character vector indicating the class of each column in the dataset
nrows: the number of rows in the dataset
comment.char: a character string indicating the comment character
skip: the number of lines to skip from the beginning
stringsAsFactors: indicates whether character variables should be coded as factors or not
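Several of these arguments can be seen together in one call. The sketch below first writes a small made-up file to a temporary location, so the file contents and column names are assumptions for illustration only:

```r
# Create a small demo file: one title line, a header, and two data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("demo file for illustration",
             "name,score",
             "ann,9.5",
             "bob,7.2"), path)

df <- read.table(path,
                 header = TRUE,                       # first line read is the header
                 sep = ",",                           # comma-separated columns
                 skip = 1,                            # skip the title line
                 nrows = 2,                           # read at most two data rows
                 colClasses = c("character", "numeric"),
                 stringsAsFactors = FALSE)
str(df)
```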
Quick tip: Usually “#” is the way to start a comment in R.
Quick tip: When working with large datasets in R, it is good to estimate the memory they will need before reading them in.
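A rough estimate can be sketched as follows, assuming a hypothetical dataset of 1,500,000 rows and 120 columns in which every column is numeric (8 bytes per value). The real footprint is somewhat higher because of overhead:

```r
n_rows <- 1500000    # hypothetical number of rows
n_cols <- 120        # hypothetical number of numeric columns

bytes <- n_rows * n_cols * 8    # 8 bytes per double-precision value
gib   <- bytes / 2^30           # convert bytes to GiB

round(gib, 2)                   # 1.34
```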
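Dumping and dputing are useful because the resulting textual format is editable and, in case of corruption, potentially recoverable. These methods also preserve the metadata of the object (for example, the class of each column), so it does not have to be specified again when the data is read back.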
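A round trip through dput() and dget() shows the metadata surviving. The data frame and the temporary file path are made up for this sketch:

```r
df <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

path <- tempfile(fileext = ".R")
dput(df, file = path)        # write an editable text representation

readLines(path)              # the text includes the class and row.names

df2 <- dget(path)            # rebuild the object from the text
all.equal(df, df2)           # TRUE
```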
Interfaces to the Outside World
Data are read using connection interfaces. Some of the most popular connections in R are file() (for ordinary files), gzfile() and bzfile() (for files compressed with gzip and bzip2), and url() (for web pages).
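A minimal sketch of using connections explicitly, with temporary files created on the spot; the file contents are made up for illustration:

```r
# Plain file connection: read only the first two lines.
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

con <- file(path, "r")
x <- readLines(con, 2)
close(con)
x

# Compressed file connection: write and read back through gzfile().
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(c("a", "b"), con)
close(con)

con <- gzfile(gz_path, "r")
y <- readLines(con)
close(con)
y
```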
There are a number of ways to access the data in objects. For example, [ always returns an object of the same class as the original and can select more than one element. [[ is used to extract elements of a list or data frame; it can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame. In addition, $ is used to extract elements of a list or data frame by name; its semantics are similar to those of [[.
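The difference between the three operators is easiest to see on a small named list; the list below is made up for illustration:

```r
x <- list(num = 1:3, chr = "a")

class(x["num"])      # "list": [ keeps the class of the original
class(x[["num"]])    # "integer": [[ returns the element itself
x$chr                # "a": $ extracts by name, like [[

x[c("num", "chr")]   # [ can select several elements at once
```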
These are the key concepts we need to understand to start programming in R. You can look at the documentation and understand the concepts further. We will be looking at how to start programming with R in a separate article.