As a data analyst, you will need to load data from different sources. For this reason, I will show you how to load a CSV file and the different options you have with the read.csv function.
We will use a CSV file; you can download the appropriate CSV import file from this link.
To start creating a data set, we will load the CSV file by calling the read.csv function:
df <- read.csv("Countries.csv", header = TRUE, col.names = c("maptools", "maps", "gapminder"), sep = ",", encoding = "utf-8", stringsAsFactors = FALSE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## header and 'col.names' are of different lengths
##         maptools            maps   gapminder
## 1    Afghanistan     Afghanistan Afghanistan
## 2 Ã…land Islands            <NA>        <NA>
## 3        Albania         Albania     Albania
## 4        Algeria         Algeria     Algeria
## 5 American Samoa  American Samoa       Samoa
## 6           <NA> Andaman Islands        <NA>
In this function we have used quite a few parameters, for example header, col.names, sep... Don't worry about them, because I'm going to explain what each one means.
The header parameter is a boolean. If we set it to TRUE, the first line of the file is read as the column names instead of as data.
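To see what header changes, here is a minimal sketch. It does not use Countries.csv; it assumes a tiny made-up file (the names path, with_header and without_header are only for illustration) written to a temporary location.

```r
# Hypothetical two-column sample, written to a temporary file for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("country,code", "Albania,AL", "Algeria,DZ"), path)

# header = TRUE: the first line becomes the column names.
with_header <- read.csv(path, header = TRUE)

# header = FALSE: the first line is treated as data and columns are named V1, V2.
without_header <- read.csv(path, header = FALSE)

names(with_header)     # "country" "code"
nrow(with_header)      # 2
names(without_header)  # "V1" "V2"
nrow(without_header)   # 3
```

If your file has no header line, header = FALSE keeps the first row from being swallowed as names.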
With the col.names parameter we can change the column names. In this case we have written the same names that the CSV file has.
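As a small sketch of col.names, assuming a made-up two-column file (not Countries.csv; all names below are hypothetical), the vector you pass replaces the names found in the header line. The warning shown earlier, "header and 'col.names' are of different lengths", appears when the number of names you pass does not match the number of fields in the header line.

```r
# Hypothetical sample file for illustration only.
path <- tempfile(fileext = ".csv")
writeLines(c("country,code", "Albania,AL", "Algeria,DZ"), path)

# Two columns in the file, two names in col.names: no warning.
renamed <- read.csv(path, header = TRUE, col.names = c("name", "iso2"))

names(renamed)  # "name" "iso2"
nrow(renamed)   # 2
```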
With the sep parameter we can select the separator used in the file, which helps if we have problems reading the data. In this case we have used "," as the separator because that is what the file uses.
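A typical symptom of a wrong separator is that everything ends up in a single column. Here is a minimal sketch, assuming a hypothetical semicolon-separated file (the file contents and variable names are made up for illustration):

```r
# Hypothetical semicolon-separated sample, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("name;value", "a;1", "b;2"), path)

# read.csv defaults to sep = ",", so each line is read as one field.
wrong <- read.csv(path)

# Telling read.csv about the real separator splits the fields correctly.
right <- read.csv(path, sep = ";")

ncol(wrong)  # 1
ncol(right)  # 2
```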
If you are working with categorical data, the stringsAsFactors parameter is very useful. In versions of R before 4.0.0, read.csv() converts character strings into factors by default, so if we want to avoid that we must set this parameter to FALSE (since R 4.0.0, FALSE is already the default).
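The difference is easy to check on a small example. This sketch assumes a made-up two-row data set passed through the text argument of read.csv instead of a file:

```r
# Hypothetical inline CSV, for illustration only.
csv_text <- "country,continent\nAlbania,Europe\nAlgeria,Africa"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$continent)   # "factor"
levels(as_factor$continent)  # "Africa" "Europe"
class(as_char$continent)     # "character"
```

Setting the parameter explicitly keeps the result the same regardless of which R version runs the script.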
The na.strings parameter replaces the values you choose (e.g., characters, numbers) in your CSV file with NA. If you try read.csv("Countries.csv", na.strings = "A"), you'll see that every field whose value is exactly A is replaced with NA.
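Note that na.strings matches whole fields, not letters inside a longer value. A minimal sketch, assuming a made-up inline data set (not Countries.csv):

```r
# Hypothetical inline CSV: "A" appears as a full field and inside "Anna".
csv_text <- "grade,name\nA,Anna\nB,Bob\nA,Carl"

df_na <- read.csv(text = csv_text, na.strings = "A", stringsAsFactors = FALSE)

df_na$grade  # NA  "B" NA
df_na$name   # "Anna" "Bob" "Carl"  (unchanged: no field is exactly "A")
```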
With the colClasses parameter you can choose the classes of the columns, as in this example:
colClasses = c("character", "complex", "factor", "integer", "numeric", "Date", "logical")
If you want to change only one variable, you can do it like this:
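One way to do this, as a sketch under my own assumptions (the data and the column name code are hypothetical), is to pass a named vector to colClasses. Columns that are not named keep their automatically detected class.

```r
# Hypothetical inline CSV with a code column that has leading zeros.
csv_text <- "id,code,value\n1,007,1.5\n2,042,2.5"

# Default behaviour: code is detected as integer and the zeros are lost.
df_default <- read.csv(text = csv_text)

# Named colClasses: only code is forced to character.
df_typed <- read.csv(text = csv_text, colClasses = c(code = "character"))

df_default$code     # 7 42
df_typed$code       # "007" "042"
class(df_typed$id)  # "integer"
```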
With the skip parameter you select the number of lines of the data file to skip before beginning to read data.
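This is handy when a file starts with a few lines of notes before the real table. A minimal sketch, assuming a made-up file with two such lines:

```r
# Hypothetical file: two free-text lines, then the header and the data.
path <- tempfile(fileext = ".csv")
writeLines(
  c("Report title (hypothetical)",
    "Generated for illustration",
    "country,code",
    "Albania,AL",
    "Algeria,DZ"),
  path
)

# Skip the first two lines; the third line is then read as the header.
df_skip <- read.csv(path, skip = 2)

names(df_skip)  # "country" "code"
nrow(df_skip)   # 2
```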
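The encoding parameter is used to mark character strings as known to be in Latin-1 or UTF-8; it does not re-encode the input. For example, if you have problems loading data from Spain, you may need to set the parameter fileEncoding = "UTF-8", which declares the encoding of the file so that its contents can be re-encoded while reading.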
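As a rough sketch of the encoding parameter, the example below writes the UTF-8 bytes of a made-up one-column file by hand (so the result does not depend on how this script itself is saved) and reads it back with encoding = "UTF-8". The file contents and variable names are my own assumptions for illustration.

```r
# Hypothetical file containing "España" encoded as UTF-8 (ñ is bytes c3 b1).
path <- tempfile(fileext = ".csv")
utf8_bytes <- c(charToRaw("country\nEspa"), as.raw(c(0xc3, 0xb1)), charToRaw("a\n"))
writeBin(utf8_bytes, path)

# Mark the strings that are read as UTF-8.
df_enc <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Compare against the same word written with a Unicode escape.
df_enc$country == "Espa\u00f1a"  # TRUE

# Encoding() shows how R has marked the string.
Encoding(df_enc$country)
```

If the file is stored in a different encoding than the one your R session uses, fileEncoding is usually the parameter to try first.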
These are the most important parameters for this function. Do you have another one that you usually use? Let us know in the comments!