Web Scraping Tables in R
How to scrape a web table in R?
Sometimes we need to extract data from Wikipedia or another web page, but copying it by hand with the keyboard and mouse is tedious. So, for anyone who wants a quick extraction, here is some R code that pulls an HTML table in seconds.
For this code we will need two packages: rvest and magrittr.
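If the packages are not installed yet, a one-time install from CRAN is enough (a minimal sketch; run this once, not in every script):

```r
# One-time installation from CRAN (only needed if the packages are missing).
install.packages(c("rvest", "magrittr"))
```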
library(rvest)
library(magrittr)
url <- "https://es.wikipedia.org/wiki/%C3%8Dndice_de_desarrollo_humano"
# we save in the variable url the website url.
pagina <- read_html(url)
# We parse the web page's HTML with read_html(); it only needs the URL.
pagina %>%
html_nodes("table") %>%
# html_nodes("table") selects every <table> element on the page.
.[[3]] %>%
# Here we pick which of those tables we want; in our example it is the third table on the page.
html_table(fill=T) -> x
# html_table() converts the HTML table into a data frame, which we store in x.
View(x)
# Check that the table looks correct.
write.csv(x, "mis_datos_wikipedia.csv")
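To see how the pipeline behaves without depending on Wikipedia's current layout, here is a self-contained sketch that parses a small inline HTML snippet instead of a live page (the table contents are made up purely for illustration):

```r
library(rvest)
library(magrittr)

# minimal_html() builds a tiny HTML document in memory, so no network is needed.
html <- minimal_html('
  <table>
    <tr><th>Country</th><th>HDI</th></tr>
    <tr><td>Norway</td><td>0.957</td></tr>
    <tr><td>Ireland</td><td>0.955</td></tr>
  </table>')

html %>%
  html_nodes("table") %>%  # select every <table> in the document
  .[[1]] %>%               # keep the first (and only) table
  html_table() -> tabla    # convert it into a data frame

tabla
```

The same three steps (select the tables, pick one by index, convert with html_table()) are exactly what the Wikipedia example above does; only the source document changes.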
I hope this has helped you. Cheers! :D