Web scraping with cheerio
Web scraping means extracting data from websites. This post covers extracting data from the HTML elements of a page.
Prerequisites
- cheerio package is installed
- HTML page is retrieved via an HTTP client
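The second prerequisite could be met in many ways. As a minimal sketch, assuming Node.js 18+ (where fetch is available globally), a small helper might look like the following. The name fetchHtml and its fetchImpl parameter are hypothetical and not part of cheerio or of the original post.

```javascript
// Hypothetical helper: downloads a page and resolves with its HTML as a string.
// fetchImpl defaults to the global fetch built into Node.js 18+ (an assumption);
// passing a different implementation makes the helper easy to stub.
async function fetchHtml(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  // Treat non-2xx responses as errors instead of parsing an error page.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

The resolved string is what gets passed to load in the Usage section.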
Usage
- create a scraper object with the load method by passing HTML content as an argument
  - set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
    const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
- find DOM elements by using CSS-like selectors
    const items = $('.item');
- iterate through found elements using the each method
    items.each((index, element) => {
      // ...
    });
- access element content using specific methods
  - text - $(element).text()
  - HTML - $(element).html()
  - attributes
    - all - $(element).attr()
    - specific one - $(element).attr('href')
  - child elements
    - first - $(element).children().first()
    - last - $(element).children().last()
    - all - $(element).children()
    - specific one - $(element).find('a') (matches descendants at any depth)
  - siblings
    - previous - $(element).prev()
    - next - $(element).next()
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.
Demo
The demo with the mentioned examples is available here.