Web scraping with cheerio
January 19, 2024
Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.
Prerequisites
- cheerio package is installed
- HTML page is retrieved via an HTTP client
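As a rough sketch of the second prerequisite, the HTML could be retrieved like this. It assumes Node.js 18 or later, where fetch is available globally; the helper name fetchHtml and its error handling are illustrative only, and any other HTTP client would work just as well.

```javascript
// Assumes Node.js 18+, where fetch is a global.
// fetchHtml is a hypothetical helper name, not part of cheerio.
async function fetchHtml(url) {
  const response = await fetch(url);
  if (!response.ok) {
    // Fail loudly instead of handing an error page to the scraper.
    throw new Error(`Request failed with status ${response.status}`);
  }
  // Resolve to the response body as a string of HTML.
  return response.text();
}
```

The resolved string is what gets passed to cheerio in the steps below.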
Usage
- create a scraper object with the load method by passing HTML content as an argument
  - set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
    import { load } from 'cheerio';
    const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
- find DOM elements by using CSS-like selectors
    const items = $('.item');
- iterate through the found elements using the each method
    items.each((index, element) => {
      // ...
    });
- access element content using specific methods
  - text - $(element).text()
  - HTML - $(element).html()
  - attributes
    - all - $(element).attr()
    - specific one - $(element).attr('href')
  - child elements
    - first - $(element).children().first()
    - last - $(element).children().last()
    - all - $(element).children()
    - specific one - $(element).find('a') (searches all descendants, not only direct children)
  - siblings
    - previous - $(element).prev()
    - next - $(element).next()
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.