Web Scraping of tables in R

How to scrape a web table in R?

Sometimes we need to extract data from Wikipedia or another web page, and copying it by hand with the keyboard and mouse is tedious. For everyone who wants a quick extraction, here is a short piece of R code that pulls HTML tables in seconds.

For this code we will need two libraries: rvest and magrittr.
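If you do not have these packages yet, they can be installed from CRAN (a one-time step):

# Install both packages from CRAN (only needed once)
install.packages(c("rvest", "magrittr"))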

require(rvest)
require(magrittr)
## Loading required package: magrittr
url <- "https://es.wikipedia.org/wiki/%C3%8Dndice_de_desarrollo_humano"
# We store the website's URL in the variable url.
pagina <- read_html(url)
# We read the HTML of the web page with read_html; it only needs the URL.
pagina %>%  
        html_nodes("table") %>% 
        # Here we select all the <table> nodes on the page.
        .[[3]] %>% 
        # Here we pick which of the HTML tables we want; in our example it is the third table on the page.
        html_table(fill = TRUE) -> x
        # We convert the HTML table into a data frame and save it in x.
View(x)
# Check that the table looks correct.
write.csv(x, "mis_datos_wikipedia.csv")
# Save the data frame to a CSV file.
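If the table you want is not the third one, or the page layout changes, you can list all the tables first and inspect their sizes before choosing an index. A minimal sketch (the index 3 is just the one used in this example):

tablas <- pagina %>% html_nodes("table") %>% html_table(fill = TRUE)
# tablas is a list with one data frame per <table> found on the page.
length(tablas)
# Number of tables found on the page.
sapply(tablas, ncol)
# Number of columns of each table, useful to spot the one you want.
x <- tablas[[3]]
# Once you know the index, pick the table you need.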

I hope this helped you. Cheers! :D


