Introduction to R Programming — Part I

Introduction

R is a language and environment for statistical computing and graphics that resembles the S language and environment. R can be considered a different implementation of S, although there are some important differences between the two.

R provides a wide variety of statistical techniques (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and it is highly extensible. The S language is often the language of choice for research in statistical methodology, and R provides an open-source route to participation in that activity.

Downloading R for Windows

Go to the official R website and download R for Windows by following the steps shown in the figure below (4.0.0 is the version that I have installed).

[Figure: Downloading R for Windows]

Installing R for Windows

After downloading the file, install it by following the steps shown in the figure below.

[Figure: Installing R]

After you click Next > in step 8, the installation begins.

Now open the “R” program. You will see a window like the one shown in the figure below.

[Figure: RGui]

First, we must find out what our working directory is. Type getwd() in the R console to print your current working directory. It usually points to a default folder, which is often not where you want to keep your scripts and data. You can change the working directory by clicking File → Change dir. and selecting a relevant folder.
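The same steps can be done from the console instead of the menu. The following is a minimal sketch; it assumes that getwd() and setwd() are the calls the paragraph above refers to, and it uses tempdir() as a stand-in target folder only so that the example runs on any machine.

```r
# Show the current working directory.
old_wd <- getwd()
print(old_wd)

# Switch to another folder (tempdir() is only a placeholder for your own folder).
setwd(tempdir())
print(getwd())

# Restore the original working directory.
setwd(old_wd)
```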

Nuts and Bolts

Example Code

Go to File → New script. A new, empty script window opens in the RGui application. Type the example code shown in the figure below, which defines a function, into the new file.

[Figure: Example code]

Now, we can run this function in two ways. The first way is to copy the whole script with “CTRL+C” and paste it into the console with “CTRL+V”. Even after pasting and pressing “Enter”, nothing visible happens, because pasting only defines the function in the working environment; we still have to call it by name. To list all the objects in the working environment, type ls() in the R-console. The output shows the name of the function, which is the only object in the current working environment. To run the function, type its name followed by parentheses in the console. The parentheses are needed because it is a function; if it were a variable, we would type the name alone.

The second way to run this function is to save this “Untitled” script and load it from the R-console. Save the script by pressing “CTRL+S” and give the file a simple name ending in .R, which is the file extension of R files. To check that the file exists, type dir() in the R-console; it lists the files in the working directory, including the one you just saved. To load the file, call source() in the console with the file name in quotes. The source() function loads the objects defined in the file into the working environment. Now we can type ls() to find our object, which is the function, and type its name followed by parentheses to run it.
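Both ways of running a function can be sketched in a few lines. The original example code is not reproduced here, so the function names say_hello and say_goodbye and the file name example_script.R are hypothetical stand-ins, and the script is written to a temporary folder so that the example is self-contained.

```r
# Hypothetical function standing in for the one typed into the script.
say_hello <- function() {
  "Hello from R"
}

# ls() lists the objects in the current environment, including say_hello.
print(ls())

# Parentheses call the function; the bare name would print its definition.
say_hello()

# Second way: save code to a .R file and load it with source().
script_path <- file.path(tempdir(), "example_script.R")  # hypothetical file name
writeLines('say_goodbye <- function() "Goodbye from R"', script_path)
source(script_path)
say_goodbye()
```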

R-console Input

The symbol <- is the assignment operator in R. When you type x <- 10, the value 10 is assigned to the variable x. You can view the value of x in the R-console by typing print(x), or simply by typing x and pressing “Enter”. If you type an incomplete expression such as x <- and press “Enter”, there won’t be an error; the console shows a + prompt and waits for the rest of the input.
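As a small illustration of assignment and printing (the variable name x is only an example):

```r
x <- 10      # assign the value 10 to x
print(x)     # explicit printing: [1] 10
x            # auto-printing at the console: [1] 10
```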

Data Types

Objects

There are five basic (atomic) classes of data in R: character, numeric (real numbers), integer, complex, and logical (TRUE/FALSE).

The most basic data structure in R is a vector. It contains elements of the same type. A vector’s type can be checked with the class() function, and its length (the number of elements within it) can be checked with the length() function. Furthermore, an empty vector can be created with the vector() function. The most basic way of generating vectors is the c() function. Another way of making a vector is the colon operator: for example, x <- 1:10 makes a vector named x with the integers 1 to 10 as its values.

However, there is an exception called a list, which is represented as a vector but can contain objects of different classes.
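The vector functions mentioned above can be tried out as follows. This is a minimal sketch with made-up values; the variable names are illustrative only.

```r
v <- c(0.5, 1.2, 3)                      # c() combines values into a vector
class(v)                                 # "numeric"
length(v)                                # 3

empty <- vector("numeric", length = 0)   # an empty numeric vector
length(empty)                            # 0

s <- 1:10                                # the colon operator builds an integer sequence
class(s)                                 # "integer"
```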

Numbers

Numbers in R are generally treated as numeric objects, that is, double-precision real numbers. If you explicitly want an integer, add the suffix L after the number (e.g., 1L). There is also a special number called Inf, which represents infinity and can be used in ordinary calculations (e.g., 1 / Inf is 0). The value NaN (“not a number”) represents undefined values (e.g., 0 / 0 is NaN).
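A quick way to check these number rules at the console:

```r
class(1)       # "numeric"
class(1L)      # "integer"
1 / Inf        # 0
1 / 0          # Inf
0 / 0          # NaN
```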

Attributes

R objects can have attributes such as names, dimnames, dimensions (dim), class, length, and other user-defined attributes. The attributes of an object can be accessed with the attributes() function.
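As a small sketch of how attributes show up in practice (the objects scores and m are made up for illustration):

```r
scores <- c(a = 1, b = 2, c = 3)   # hypothetical named vector
attributes(scores)                 # shows $names
names(scores)                      # "a" "b" "c"

m <- matrix(1:6, nrow = 2)
attributes(m)                      # shows $dim: 2 3
```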

Mixing Objects

Assume that you create a vector that mixes types, for example c(1.7, "a"). What happens when you press “Enter” after typing that? There won’t be an error, because implicit coercion occurs: R follows a precedence hierarchy and converts every element of the vector to a single common type. For example, the vector above is now of character type. The hierarchy, in decreasing order of precedence, is character, complex, numeric, integer, logical. You can also coerce explicitly with the as.* functions. For example, when you type x <- 0:6 and then class(x), you get the integer type. When you type as.numeric(x), you get 0 1 2 3 4 5 6 as the output, and when you type as.logical(x), you get FALSE TRUE TRUE TRUE TRUE TRUE TRUE.
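Implicit and explicit coercion can be compared side by side. The values below are illustrative and are not necessarily the ones used in the original example.

```r
# Implicit coercion: the element highest in the hierarchy decides the type.
class(c(1.7, "a"))    # "character"
class(c(TRUE, 2))     # "numeric"
class(c("a", TRUE))   # "character"

# Explicit coercion with the as.* functions.
x <- 0:6
class(x)              # "integer"
as.numeric(x)         # 0 1 2 3 4 5 6
as.logical(x)         # FALSE TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x)       # "0" "1" "2" "3" "4" "5" "6"

# Coercion that makes no sense yields NA (and normally a warning).
suppressWarnings(as.numeric(c("a", "b")))   # NA NA
```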

Lists

Lists are R objects that can contain elements of different types, such as numbers, strings, vectors, and even another list. A list can also contain a matrix or a function as an element. A list is created with the list() function (e.g., x <- list(1, "a", TRUE)). You can give names to the list elements by assigning a character vector with the names() function (e.g., for the list above, names(x) <- c("first", "second", "third")).
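A short sketch of creating and naming a list, with made-up element names:

```r
x <- list(1, "a", TRUE, c(2, 3))
names(x) <- c("number", "letter", "flag", "pair")

str(x)       # compact overview of the list structure
x$pair       # 2 3
```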

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length two (number of rows, number of columns); for example, m <- matrix(nrow = 2, ncol = 3) creates a 2 × 3 matrix filled with NA. You can find the dimensions of a matrix with the dim() function (e.g., dim(m) gives 2 3). Typing attributes(m) gives $dim followed by 2 3, indicating that the only attribute of the matrix is its dimension. We can also make a matrix by creating a vector and assigning dimensions to it (e.g., m <- 1:10 gives a vector with the values 1 to 10, and dim(m) <- c(2, 5) then turns m into a matrix with the dimensions 2 × 5). A simpler way to do this is matrix(1:10, nrow = 2, ncol = 5). Matrices are filled column by column.
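The two ways of building a matrix can be checked against each other. This sketch assumes a 2 × 5 shape purely for illustration.

```r
m <- 1:10
dim(m) <- c(2, 5)      # the vector becomes a 2 x 5 matrix, filled column by column
dim(m)                 # 2 5
m[2, 1]                # 2

m2 <- matrix(1:10, nrow = 2, ncol = 5)
all(m == m2)           # TRUE
```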

Matrices can also be created by column-binding or row-binding vectors with cbind() and rbind().

[Figure: rbind() and cbind() examples]

When there is a length mismatch, the values of the shorter vector are repeated (recycled) to fill the matrix.
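Since the screenshot is not available here, the binding behavior can be sketched with made-up vectors:

```r
x <- 1:3
y <- 10:12
cbind(x, y)    # 3 rows, 2 columns
rbind(x, y)    # 2 rows, 3 columns

# Length mismatch: the shorter vector is recycled to fill the column.
recycled <- cbind(1:4, 1:2)
recycled[, 2]  # 1 2 1 2
```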

Quick tip: Use “CTRL+L” to clear the R-console.

Factors

Factors are used to represent categorical data. Factors can be ordered or unordered. One can think of a factor as an integer vector where each integer has a label.

Factors are treated specially by modeling functions like lm() and glm(). Using factors with labels is better than using integers because factors are self-describing (e.g., having the values "Heads" and "Tails" for a coin is clearer than having bare integer codes such as 1 and 2).

[Figure: Creating a factor]

Factors let us summarize categorical data in a more descriptive way. The order of the levels can be set using the levels argument to factor(). This is important in linear modeling because the first level is used as the baseline level.
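A minimal sketch of creating a factor and setting its level order, using made-up coin flips:

```r
flips <- factor(c("Heads", "Tails", "Heads", "Heads"))
levels(flips)          # "Heads" "Tails" (alphabetical by default)
table(flips)           # counts per level
unclass(flips)         # the underlying integer codes: 1 2 1 1

# Set the level order explicitly so that "Tails" becomes the baseline.
flips2 <- factor(c("Heads", "Tails", "Heads", "Heads"),
                 levels = c("Tails", "Heads"))
levels(flips2)         # "Tails" "Heads"
```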

Missing Values

Missing values are indicated by NA, or by NaN for undefined mathematical operations. is.na() is used to test whether objects are NA, and similarly is.nan() is used to test for NaN.

NA values have a class too, so there are integer NA, character NA, and so on. A NaN value is also considered NA, but the converse is not true.
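The relationship between NA and NaN can be verified directly:

```r
x <- c(1, NA, 0 / 0, 4)
is.na(x)               # FALSE TRUE TRUE FALSE  (NaN counts as NA)
is.nan(x)              # FALSE FALSE TRUE FALSE (NA is not NaN)

class(NA_integer_)     # "integer"
class(NA_character_)   # "character"
```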

Data Frames

Data frames are used to store tabular data. They are represented as a special type of list in which every element has the same length. Each element of the list can be thought of as a column, and the length of each element is the number of rows. Data frames have a special attribute called row.names. Data frames are usually created by calling read.table() or read.csv(), and they can be converted into matrices with the data.matrix() function.
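A data frame can also be built by hand with data.frame(), which is convenient for a quick experiment. The column names and values below are made up.

```r
df <- data.frame(id = 1:3, passed = c(TRUE, FALSE, TRUE))
nrow(df)               # 3
ncol(df)               # 2
row.names(df)          # "1" "2" "3"

mat <- data.matrix(df) # logical values become 1 and 0
mat
```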

Now let’s look at how we can read and write data using R.

Reading Data

There are a few basic functions, which can be used to read data into R.

  1. read.table() and read.csv() for reading tabular data.
  2. readLines() for reading lines of a text file.
  3. source() for reading in R code files (inverse of dump()).
  4. dget() for reading in R code files (inverse of dput()).
  5. load() for reading in saved workspaces.
  6. unserialize() for reading single R objects in binary form.

Writing Data

There are analogous functions for writing data to files: write.table(), writeLines(), dump(), dput(), save(), and serialize(). They do the inverse of the reading functions mentioned above.

Reading Data Files with read.table()

The read.table() function has a few important arguments.

  1. file: the name of a file, or a connection
  2. header: a logical value indicating whether the file has a header line
  3. sep: a string indicating how the columns are separated
  4. colClasses: a character vector indicating the class of each column in the dataset
  5. nrows: the number of rows in the dataset
  6. comment.char: a character string indicating the comment character
  7. skip: the number of lines to skip from the beginning
  8. stringsAsFactors: a logical value indicating whether character variables should be coded as factors
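The arguments listed above can be seen together in one call. This is only a sketch: the file scores.txt and its contents are made up, and the file is written to a temporary folder first so that the example runs on its own.

```r
# Create a small hypothetical data file to read back.
path <- file.path(tempdir(), "scores.txt")
writeLines(c("# made-up data", "name,score", "ann,90", "bob,85"), path)

scores <- read.table(path,
                     header = TRUE,                          # first data line holds column names
                     sep = ",",                              # columns are comma-separated
                     colClasses = c("character", "numeric"), # class of each column
                     nrows = 2,                              # number of data rows
                     comment.char = "#",                     # lines starting with # are ignored
                     skip = 0,                               # no leading lines to skip
                     stringsAsFactors = FALSE)               # keep strings as characters
str(scores)
```

Specifying colClasses and nrows is optional, but on large files it can make reading noticeably faster because R does not have to guess them.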

Quick tip: In R, a comment starts with “#”.

Quick tip: When working with large datasets in R, it is good to estimate how much memory the dataset will need before reading it in.
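A rough estimate can be done with simple arithmetic. The sketch below assumes a hypothetical dataset in which every column is numeric, so each value takes 8 bytes; real datasets with other column types, and the overhead of reading them in, will differ.

```r
rows <- 1500000                 # hypothetical number of rows
cols <- 120                     # hypothetical number of numeric columns

bytes <- rows * cols * 8        # 8 bytes per double-precision value
bytes / 2^20                    # size in MiB
bytes / 2^30                    # size in GiB (about 1.34)
```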

Textual Formats

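dump() and dput() are useful because the resulting textual format is editable and, in case of corruption, potentially recoverable. Unlike writing out a table or CSV file, these functions preserve the metadata of the object (such as the class of each column), so it does not have to be specified again when the data are read back.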
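A round trip through both textual formats might look like this. The objects and file names are made up, and the files go to a temporary folder.

```r
# dput() writes one object as R code; dget() reads it back.
y <- data.frame(a = 1, b = "a")
dput_path <- file.path(tempdir(), "y.R")        # hypothetical file name
dput(y, file = dput_path)
new_y <- dget(dput_path)
all.equal(y, new_y)                             # TRUE

# dump() writes several named objects; source() reads them back.
x <- "foo"
dump_path <- file.path(tempdir(), "data.R")     # hypothetical file name
dump(c("x", "y"), file = dump_path)
restored <- new.env()
source(dump_path, local = restored)             # load into a separate environment
restored$x                                      # "foo"
```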

Interfaces to the Outside World

Data are read in using connection interfaces. Some of the most common connections in R are file(), gzfile(), bzfile(), and url() (gzfile() and bzfile() are for files compressed with gzip and bzip2, respectively).
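As a small sketch of working with a connection explicitly, the following writes a gzip-compressed text file and reads part of it back. The file name and contents are made up.

```r
path <- file.path(tempdir(), "words.txt.gz")   # hypothetical file

# Open a connection for writing, write three lines, and close it.
con <- gzfile(path, "w")
writeLines(c("alpha", "beta", "gamma"), con)
close(con)

# Open a connection for reading and read only the first two lines.
con <- gzfile(path, "r")
first_two <- readLines(con, n = 2)
close(con)

first_two    # "alpha" "beta"
```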

Subsetting

There are a number of operators for extracting subsets of R objects. For example, [ always returns an object of the same class as the original, and it can select more than one element. [[ is used to extract elements of a list or data frame; it can only extract a single element, and the class of the returned object will not necessarily be a list or data frame. In addition, $ is used to extract elements of a list or data frame by name; its semantics are similar to those of [[.
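The difference between the three operators is easiest to see on a small list. The names foo and bar are made up.

```r
x <- list(foo = 1:4, bar = 0.6)

class(x["foo"])     # "list": [ keeps the class of the original
class(x[["foo"]])   # "integer": [[ extracts the element itself
x$bar               # 0.6: $ extracts by name

v <- c("a", "b", "c", "d")
v[2:3]              # "b" "c": [ can select several elements
```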

These are the key concepts we need to understand to start programming in R. You can look at the documentation and understand the concepts further. We will be looking at how to start programming with R in a separate article.

Time and tide wait for no man.