As a data analyst, you will need to load data from different sources. For this reason, we will show you how to load a CSV file and the different options you have with the read.csv() function.
We will use a CSV file; you can download the appropriate CSV import file from this link.
To start creating a data set, we will load the CSV file by calling the read.csv() function:
df <- read.csv("Countries.csv", header = TRUE, col.names = c("maptools", "maps", "gapminder"), sep = ",", encoding = "utf-8", stringsAsFactors = FALSE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## header and 'col.names' are of different lengths
##         maptools            maps   gapminder
## 1    Afghanistan     Afghanistan Afghanistan
## 2 Ã…land Islands            <NA>        <NA>
## 3        Albania         Albania     Albania
## 4        Algeria         Algeria     Algeria
## 5 American Samoa  American Samoa       Samoa
## 6           <NA> Andaman Islands        <NA>
In this function call we have used quite a few parameters, for example header, col.names and sep. Do not worry about them, because we are going to explain what each one means.
The header parameter is a boolean (logical) value. If we set it to TRUE, the first line of the file is read as the header, that is, as the column names.
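To see what header changes, here is a minimal sketch. It assumes a made-up two-column CSV passed inline through the text argument, so the data and the object names are illustrative only and not part of Countries.csv.

```r
# Hypothetical inline CSV (not the Countries.csv file from this post).
csv_text <- "country,code\nSpain,ES\nPeru,PE"

# header = TRUE: the first line becomes the column names.
with_header <- read.csv(text = csv_text, header = TRUE)

# header = FALSE: the first line is treated as data and R names the columns itself.
without_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)     # "country" "code"
nrow(with_header)      # 2
names(without_header)  # "V1" "V2"
nrow(without_header)   # 3
```

With header = FALSE the header line ends up as the first data row, which is usually a sign that the parameter should be TRUE.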
We can change the column names with the col.names parameter; in this case we have written the same names that the CSV file already has.
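As a small illustration of col.names, the sketch below renames the columns of a made-up inline CSV while reading it. The data and the new names ("name", "iso2") are assumptions chosen for the example.

```r
# Hypothetical inline CSV with its own header line.
csv_text <- "country,code\nSpain,ES\nPeru,PE"

# The header line is still consumed, but the names from col.names are used instead.
renamed <- read.csv(text = csv_text, header = TRUE, col.names = c("name", "iso2"))

names(renamed)  # "name" "iso2"
nrow(renamed)   # 2
```

Giving exactly one name per column in the file keeps the header line and col.names the same length.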
If we have problems reading the data, we can use the sep parameter to select the separator used in the file; in this case we have used "," as the separator because the file comes with this separator.
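The effect of sep is easiest to see with a file that does not use commas. This sketch assumes a made-up semicolon-separated text and reads it once with the default separator and once with sep = ";".

```r
# Hypothetical semicolon-separated data, supplied inline.
semi_text <- "country;population\nSpain;47\nPeru;33"

# With the default sep = "," each line is read as a single field.
wrong <- read.csv(text = semi_text)

# With sep = ";" the two columns are split correctly.
right <- read.csv(text = semi_text, sep = ";")

ncol(wrong)       # 1
ncol(right)       # 2
right$population  # 47 33
```

A data frame with a single, oddly named column is a common hint that the separator does not match the file.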
If you are working with categorical data, the stringsAsFactors parameter is very useful. In R versions before 4.0.0, read.csv() converts character strings into factors by default, so if we want to avoid that we must set this parameter to FALSE. Since R 4.0.0 the default is already FALSE.
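The following sketch compares both settings of stringsAsFactors on a made-up inline CSV; the column names and values are illustrative assumptions.

```r
# Hypothetical inline CSV with a text column.
csv_text <- "country,continent\nSpain,Europe\nPeru,America"

# Setting the parameter explicitly makes the result independent of the R version.
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_text$continent)     # "character"
class(as_factor$continent)   # "factor"
levels(as_factor$continent)  # "America" "Europe"
```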
The na.strings parameter replaces values (e.g., characters, numbers) in your CSV file with NA. If you try read.csv("Countries.csv", na.strings = "A") you'll see that every field whose whole value is A is replaced with NA.
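A short sketch of na.strings, assuming a made-up inline CSV: only fields that consist entirely of the given string become NA, while longer values that merely contain the letter are left alone.

```r
# Hypothetical inline CSV; "grade" holds the single-letter values.
csv_text <- "country,grade\nSpain,A\nAlbania,B\nPeru,A"

df_na <- read.csv(text = csv_text, na.strings = "A", stringsAsFactors = FALSE)

df_na$grade    # NA "B" NA
df_na$country  # "Spain" "Albania" "Peru" (the A in "Albania" is untouched)
```

na.strings also accepts several values at once, for example na.strings = c("", "N/A").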
With the colClasses parameter you can choose the classes of the columns, as in this example:
colClasses = c("character", "complex", "factor", "integer", "numeric", "Date", "logical")
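To show colClasses in a complete call, here is a minimal sketch with a made-up four-column CSV. The column names and values are assumptions; one class is given per column, in file order.

```r
# Hypothetical inline CSV: text, whole number, ISO date and logical columns.
csv_text <- "country,population,joined,member\nSpain,47,1986-01-01,TRUE\nPeru,33,1945-10-31,FALSE"

typed <- read.csv(
  text = csv_text,
  colClasses = c("character", "integer", "Date", "logical")
)

# One class per column, in the order given above.
sapply(typed, class)
```

The "Date" class expects dates written as year-month-day, as in this sample.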
If you only want to change the class of one variable, you should:
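One way to set the class of a single column, sketched here as an assumption rather than as the only approach, is to pass colClasses a named vector. Columns that are not named keep their automatically detected class. The inline data and column names are made up for the example.

```r
# Hypothetical inline CSV where "code" has leading zeros worth keeping.
csv_text <- "country,code\nSpain,034\nPeru,051"

# Default behaviour: "code" is detected as a number and the zeros are lost.
default_types <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Named vector: only "code" is forced to character, "country" is left to R.
one_fixed <- read.csv(
  text = csv_text,
  colClasses = c(code = "character"),
  stringsAsFactors = FALSE
)

default_types$code  # 34 51
one_fixed$code      # "034" "051"
```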
The skip parameter selects the number of lines of the data file to skip before beginning to read data.
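The skip parameter helps when a file starts with notes before the real header. The sketch below assumes a made-up export with two such lines at the top.

```r
# Hypothetical inline file: two note lines, then the header and the data.
csv_text <- "Exported by a hypothetical tool\nSome note line\ncountry,code\nSpain,ES\nPeru,PE"

# Skip the two note lines; the third line is then used as the header.
skipped <- read.csv(text = csv_text, skip = 2)

names(skipped)  # "country" "code"
nrow(skipped)   # 2
```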
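The encoding parameter is used to mark character strings as known to be in Latin-1 or UTF-8 (the documented values are "latin1" and "UTF-8"). For example, if you find problems loading data from Spain, you may need to set the parameter fileEncoding = "utf-8", which declares the encoding of the file itself so that its contents can be re-encoded while reading.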
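As a rough sketch of the encoding parameter, the code below writes a tiny UTF-8 file to a temporary path and reads it back with encoding = "UTF-8". The file contents are made up, and the bytes are written directly so the sample file does not depend on the session locale. It covers encoding only; fileEncoding works differently, converting the file's contents as they are read.

```r
# Hypothetical temporary file; C3 B1 is the UTF-8 byte pair for the letter n with tilde.
tmp <- tempfile(fileext = ".csv")
utf8_bytes <- c(charToRaw("country\nEspa"), as.raw(c(0xc3, 0xb1)), charToRaw("a\n"))
writeBin(utf8_bytes, tmp)

# Mark the strings that are read as UTF-8.
df_enc <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)

# Compare with the same word written with a Unicode escape.
same_word <- df_enc$country == "Espa\u00f1a"
same_word

unlink(tmp)
```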
Here we have presented the most important parameters for this function. Do you usually use another one? Let us know in the comments!
If you need more information about the read.csv function, you can check these websites: