Web scraping with cheerio
January 19, 2024
Web scraping means extracting data from websites. This post covers extracting data from a page's HTML tags.
Prerequisites
- cheerio package is installed
- HTML page is retrieved via an HTTP client
Usage
- create a scraper object with the load method by passing HTML content as an argument
  - set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form

  import { load } from 'cheerio';
  const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });

- find DOM elements by using CSS-like selectors

  const items = $('.item');

- iterate through found elements using the each method

  items.each((index, element) => {
    // ...
  });

- access element content using specific methods
  - text - $(element).text()
  - HTML - $(element).html()
  - attributes
    - all - $(element).attr()
    - specific one - $(element).attr('href')
  - child elements
    - first - $(element).children().first()
    - last - $(element).children().last()
    - all - $(element).children()
    - specific one - $(element).find('a') (matches descendants at any depth, not only direct children)
  - siblings
    - previous - $(element).prev()
    - next - $(element).next()
Disclaimer
Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.