Web scraping with cheerio

January 19, 2024

Web scraping means extracting data from websites. This post covers extracting data from the page's HTML tags.

Prerequisites

  • cheerio package is installed

  • HTML page is retrieved via an HTTP client

Usage

  • create a scraper object with load method by passing HTML content as an argument

    • set the decodeEntities option to false to preserve encoded characters (like &amp;) in their original form
    const $ = load('<div><!-- HTML content --></div>', { decodeEntities: false });
  • find DOM elements by using CSS-like selectors

    const items = $('.item');
  • iterate through found elements using each method

    items.each((index, element) => {
    // ...
    });
  • access element content using specific methods

    • text - $(element).text()

    • HTML - $(element).html()

    • attributes

      • all - $(element).attr()
      • specific one - $(element).attr('href')
    • child elements

      • first - $(element).children().first()
      • last - $(element).children().last()
      • all - $(element).children()
      • specific one - $(element).find('a')
    • siblings

      • previous - $(element).prev()
      • next - $(element).next()

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

Demo

The demo with the mentioned examples is available here.

2023

Web scraping with jsdom

December 14, 2023

Web scraping means extracting data from websites. This post covers extracting data from the page's HTML when the data is stored in a JavaScript variable or as stringified JSON.

The scraping prerequisite is retrieving an HTML page via an HTTP client.

Examples

The example below moves data into a global variable, executes the page scripts and accesses the data from the global variable.

import jsdom from 'jsdom';
fetch(URL)
.then((res) => res.text())
.then((response) => {
const dataVariable = 'someVariable.someField';
const html = response.replace(dataVariable, `var data=${dataVariable}`);
const dom = new jsdom.JSDOM(html, {
runScripts: 'dangerously',
virtualConsole: new jsdom.VirtualConsole()
});
console.log('data', dom?.window?.data);
});
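
To see what the replace call does, the rewriting step can be tried in isolation. The sketch below is an illustration only: it swaps jsdom for Node's built-in vm module, and the inline script and its values are made up, so treat it as a model of the idea rather than the way jsdom itself works. It is written with require so it can run as a plain script.

```javascript
const vm = require('node:vm');

// Hypothetical inline script, standing in for a script found in a fetched page.
const pageScript = 'var someVariable = {}; someVariable.someField = { items: [1, 2, 3] };';

// Same rewriting step as in the example above: the first occurrence of the
// expression is prefixed with a global variable declaration.
const dataVariable = 'someVariable.someField';
const rewrittenScript = pageScript.replace(dataVariable, `var data=${dataVariable}`);

// Run the rewritten script in an isolated context, standing in for the jsdom window.
const context = vm.createContext({});
vm.runInContext(rewrittenScript, context);

// The assigned value is now reachable as a top-level variable of that context.
const exposedData = context.data;
console.log('data', JSON.stringify(exposedData));
```

Note that String.prototype.replace with a string pattern only rewrites the first occurrence, so the approach depends on the first match being the assignment itself.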

The example below runs the page scripts and accesses stringified JSON data.

import jsdom from 'jsdom';
fetch(URL)
.then((res) => res.text())
.then((response) => {
const dom = new jsdom.JSDOM(response, {
runScripts: 'dangerously',
virtualConsole: new jsdom.VirtualConsole()
});
const data = dom?.window?.document?.getElementById('someId')?.value;
console.log('data', JSON.parse(data));
});
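
If the element is missing, the value read through optional chaining is undefined and JSON.parse throws. A small guard, sketched below, keeps the scraper from crashing on pages that do not contain the expected element. parseEmbeddedJson is a hypothetical helper name introduced for this illustration, not part of jsdom.

```javascript
/**
 * Parses stringified JSON taken from a page element.
 * Hypothetical helper, not part of jsdom.
 * @param {unknown} raw - value read from the element, possibly undefined
 * @returns {unknown|null} parsed value, or null when nothing usable was found
 */
function parseEmbeddedJson(raw) {
  // Missing elements and empty values are treated as "no data".
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  try {
    return JSON.parse(raw);
  } catch (error) {
    console.warn('Embedded JSON could not be parsed:', error.message);
    return null;
  }
}

console.log(parseEmbeddedJson('{"price": 10}'));
console.log(parseEmbeddedJson(undefined));
```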

Disclaimer

Please check the website's terms of service before scraping it. Some websites may have terms of service that prohibit such activity.

Identifying missing variables in Handlebars templates

November 3, 2023

Handlebars is a template engine that can create server-side views, e-mail templates, and invoice templates by injecting JSON data into HTML.

Resolving all variables in a Handlebars template is essential to maintain the accuracy of the displayed information and to prevent incomplete content or layout problems.

The following snippet checks for missing variables by overriding the default nameLookup function. It logs a warning for each unresolved variable and falls back to a default value, an empty string in this case.

// ...
const originalNameLookup = Handlebars.JavaScriptCompiler.prototype.nameLookup;
Handlebars.JavaScriptCompiler.prototype.nameLookup = function(
parent,
name,
type
) {
if (type === 'context') {
const messageLog = JSON.stringify({
message: `Variable is not resolved in the template: ${name}`,
level: WARNING_LEVEL
// ...
});
return `${parent} && ${parent}.${name} ? ${parent}.${name} : (console.log(${messageLog}), '')`;
}
return originalNameLookup.call(this, parent, name, type);
};
// ...
const result = Handlebars.compile(template)(data);
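
The string returned for context lookups is JavaScript source that ends up in the compiled template. The sketch below builds the same kind of expression outside Handlebars and evaluates it with new Function, only to show how it behaves. buildLookupExpression, the context parameter name, and the 'warn' level are placeholders chosen for this illustration.

```javascript
// Builds the lookup expression as source text (hypothetical helper).
function buildLookupExpression(parent, name) {
  const messageLog = JSON.stringify({
    message: `Variable is not resolved in the template: ${name}`,
    level: 'warn', // placeholder for WARNING_LEVEL
  });
  return `${parent} && ${parent}.${name} ? ${parent}.${name} : (console.log(${messageLog}), '')`;
}

// Turn the expression into a function so it can be tried on sample data.
const expression = buildLookupExpression('context', 'title');
const lookupTitle = new Function('context', `return ${expression};`);

const resolved = lookupTitle({ title: 'Invoice' }); // value is present
const fallback = lookupTitle({}); // logs the warning and falls back to ''
console.log({ resolved, fallback });
```

Because the check relies on truthiness, legitimate falsy values such as 0 or an empty string are also reported as unresolved.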

Formatting Node.js codebase with Prettier

July 3, 2023

Formatting keeps the code style consistent throughout the whole codebase. Include the format script in pre-hooks (pre-commit or pre-push). This post covers Prettier setup for JavaScript and TypeScript code.

Start by installing the prettier package as a dev dependency.

npm i prettier -D

Specify rules inside the .prettierrc config file.

{
"singleQuote": true,
"trailingComma": "all"
}

Add the format script to the package.json file.

{
"scripts": {
// ...
"format": "prettier --write \"{src,test}/**/*.{js,ts}\""
}
}

Notes

If you use ESLint, install the eslint-config-prettier package as a dev dependency and extend the ESLint configuration with it, so the rules that conflict with Prettier are turned off.

{
// ...
"extends": [
// ...
"prettier"
]
}

If you use Visual Studio Code, you can install the prettier-vscode extension and enable formatting on save.

Boilerplate

Here is the link to the boilerplate I use for the development.

Spies and mocking with Node test runner (node:test)

June 24, 2023

Node.js version 20 brings a stable test runner, so you can run tests inside *.test.js files with the node --test command. This post covers its primary usage regarding spies and mocking in unit tests.

Spies are functions that let you spy on the behavior of functions called indirectly by some other code, while mocking injects test values into the code during the tests.

mock.method can create spies and mock async, rejected async, sync, chained methods, and external and built-in modules.

  • Async function
import assert from 'node:assert/strict';
import { describe, it, mock } from 'node:test';
const calculationService = {
calculate: () => { /* implementation */ }
};
describe('mocking resolved value', () => {
it('should resolve mocked value', async () => {
const value = 2;
mock.method(calculationService, 'calculate', async () => value);
const result = await calculationService.calculate();
assert.equal(result, value);
});
});
  • Rejected async function
const error = new Error('some error message');
mock.method(calculationService, 'calculate', async () => Promise.reject(error));
await assert.rejects(async () => calculateSomething(calculationService), error);
  • Sync function
mock.method(calculationService, 'calculate', () => value);
  • Chained methods
mock.method(calculationService, 'get', () => calculationService);
mock.method(calculationService, 'calculate', async () => value);
const result = await calculationService.get().calculate();
  • External modules
import axios from 'axios';
mock.method(axios, 'get', async () => ({ data: value }));
  • Built-in modules
import fs from 'fs/promises';
mock.method(fs, 'readFile', async () => fileContent);
  • Async and sync functions called multiple times can be mocked with different values using context.mock.fn and mockedFunction.mock.mockImplementationOnce.
describe('mocking same method multiple times with different values', () => {
it('should resolve mocked values', async (context) => {
const firstValue = 2;
const secondValue = 3;
const calculateMock = context.mock.fn(calculationService.calculate);
calculateMock.mock.mockImplementationOnce(async () => firstValue, 0);
calculateMock.mock.mockImplementationOnce(async () => secondValue, 1);
const firstResult = await calculateMock();
const secondResult = await calculateMock();
assert.equal(firstResult, firstValue);
assert.equal(secondResult, secondValue);
});
});
  • To assert called arguments for a spy, use mockedFunction.mock.calls[0] value.
mock.method(calculationService, 'calculate');
await calculateSomething(calculationService, firstValue, secondValue);
const call = calculationService.calculate.mock.calls[0];
assert.deepEqual(call.arguments, [firstValue, secondValue]);
  • To assert skipped call for a spy, use mockedFunction.mock.calls.length value.
mock.method(calculationService, 'calculate');
assert.equal(calculationService.calculate.mock.calls.length, 0);
  • To assert how many times mocked function is called, use mockedFunction.mock.calls.length value.
mock.method(calculationService, 'calculate');
calculationService.calculate(3);
calculationService.calculate(2);
assert.equal(calculationService.calculate.mock.calls.length, 2);
  • To assert called arguments for the exact call when a mocked function is called multiple times, an assertion can be done using mockedFunction.mock.calls[index] and call.arguments values.
const calculateMock = context.mock.fn(calculationService.calculate);
calculateMock.mock.mockImplementationOnce((a) => a + 2, 0);
calculateMock.mock.mockImplementationOnce((a) => a + 3, 1);
calculateMock(firstValue);
calculateMock(secondValue);
[firstValue, secondValue].forEach((argument, index) => {
const call = calculateMock.mock.calls[index];
assert.deepEqual(call.arguments, [argument]);
});
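
The spy-related snippets above can be combined into one script that runs outside a test file. It is a sketch under a few assumptions: calculateSomething is a hypothetical function under test, the service implementation is made up for the illustration, and the script uses require so it can run as a plain script.

```javascript
const { mock } = require('node:test');

// Made-up service and function under test, for illustration only.
const calculationService = {
  calculate: (a, b) => a + b,
};
const calculateSomething = (service, a, b) => service.calculate(a, b);

// Without a third argument, mock.method keeps the original implementation
// and only records the calls, so it acts as a spy.
const spy = mock.method(calculationService, 'calculate');

const result = calculateSomething(calculationService, 2, 3);

// Each recorded call exposes the arguments it was called with.
const callCount = spy.mock.calls.length;
const firstCallArguments = spy.mock.calls[0].arguments;
console.log({ result, callCount, firstCallArguments });

// Put the original method back so other code is not affected.
mock.restoreAll();
```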

Running TypeScript tests

Add a new test script

{
"type": "module",
"scripts": {
"test": "node --test",
"test:ts": "glob -c \"node --loader tsx --no-warnings --test\" \"./src/**/*.{spec,test}.ts\""
},
"devDependencies": {
// ...
"glob": "^10.3.1",
"tsx": "^3.12.7"
}
}

Demo

The demo with the mentioned examples is available here.

Boilerplate

Here is the link to the boilerplate I use for the development.

Linting JavaScript codebase with ESLint

April 5, 2023

Linting represents static code analysis based on specified rules. Please include it in the CI pipeline.

Setup

Run the following commands to generate the linter configuration using the eslint package.

npm init -y
npm init @eslint/config

Below is an example of the configuration. Some rules can be turned off or downgraded to warnings.

// .eslintrc.js
module.exports = {
env: {
commonjs: true,
es2021: true,
node: true,
jest: true,
},
extends: 'airbnb-base',
overrides: [
],
parserOptions: {
ecmaVersion: 'latest',
},
rules: {
'import/no-extraneous-dependencies': 'warn',
'import/prefer-default-export': 'off',
},
};

Ignore the files with the .eslintignore file.

dist

Linting

Configure and run the script with the npm run lint command. Some errors can be fixed automatically with the --fix option.

// package.json
{
"scripts": {
// ...
"lint": "eslint src",
"lint:fix": "npm run lint -- --fix"
}
}

Boilerplate

Here is the link to the boilerplate I use for the development.

2022

Gatsby blog as PWA (Progressive Web App)

November 26, 2022

Installed PWAs can bring more user engagement and conversions. On the user side, they make it possible to read posts offline. The Progressive Web Apps 101 post covers more details about PWAs.

Prerequisites

  • bootstrapped Gatsby blog
  • installed manifest (gatsby-plugin-manifest) and offline (gatsby-plugin-offline) plugins

Setup

Add plugin configurations to the Gatsby configuration file. The manifest plugin should be loaded before the offline plugin.

Prepare the app icon in 512x512 pixels, and the manifest plugin will generate the icons in all the necessary dimensions. PWA usage can be tracked with UTM parameters in the start_url property.

Runtime caching for static resources (JavaScript, CSS, and page data JSON files) is set to network-first caching, so the latest changes are retrieved before they are shown to the user. In case of issues with caching in a local environment, the offline plugin can be disabled.

// gatsby-config.js
const plugins = [
// ...
{
resolve: `gatsby-plugin-manifest`,
options: {
name: `app name`,
short_name: `app name`,
start_url: `/?utm_source=pwa&utm_medium=pwa&utm_campaign=pwa`,
background_color: `#FFF`,
theme_color: `#2F3C7E`,
display: `standalone`,
icon: `src/assets/icon.png`
}
}
];
if (process.env.NODE_ENV !== 'development') {
plugins.push({
resolve: `gatsby-plugin-offline`,
options: {
workboxConfig: {
runtimeCaching: [
{
urlPattern: /(\.js$|\.css$|static\/)/,
handler: `NetworkFirst`
},
{
urlPattern: /^https?:.*\/page-data\/.*\.json/,
handler: `NetworkFirst`
},
{
urlPattern: /^https?:.*\.(png|jpg|jpeg|webp|svg|gif|tiff|js|woff|woff2|json|css)$/,
handler: `StaleWhileRevalidate`
},
{
urlPattern: /^https?:\/\/fonts\.googleapis\.com\/css/,
handler: `StaleWhileRevalidate`
}
]
}
}
});
}
module.exports = {
// ...
plugins
};
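
To check which caching strategy a given URL would get, the patterns can be tried outside the plugin. The sketch below assumes that the first matching entry wins, which is an assumption about Workbox route matching worth verifying against its documentation. pickHandler and the sample URLs are made up for the illustration.

```javascript
// Same patterns and handlers as in the configuration above.
const runtimeCaching = [
  { urlPattern: /(\.js$|\.css$|static\/)/, handler: 'NetworkFirst' },
  { urlPattern: /^https?:.*\/page-data\/.*\.json/, handler: 'NetworkFirst' },
  {
    urlPattern: /^https?:.*\.(png|jpg|jpeg|webp|svg|gif|tiff|js|woff|woff2|json|css)$/,
    handler: 'StaleWhileRevalidate',
  },
  { urlPattern: /^https?:\/\/fonts\.googleapis\.com\/css/, handler: 'StaleWhileRevalidate' },
];

// Hypothetical helper: returns the handler of the first matching pattern,
// or null when no pattern matches.
const pickHandler = (url) =>
  runtimeCaching.find(({ urlPattern }) => urlPattern.test(url))?.handler ?? null;

// Made-up URLs, one per entry plus one that matches nothing.
[
  'https://example.com/app-123.js',
  'https://example.com/page-data/index/page-data.json',
  'https://example.com/images/logo.png',
  'https://fonts.googleapis.com/css?family=Roboto',
  'https://example.com/about/',
].forEach((url) => console.log(url, '->', pickHandler(url)));
```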

Service worker updates can also be detected. For a better user experience, a user should approve refreshing the page before updating it to the latest version.

// gatsby-browser.js
exports.onServiceWorkerUpdateReady = () => {
const shouldReload = window.confirm(
'This website has been updated. Reload to display the latest version?'
);
if (shouldReload) {
window.location.href = window.location.href.replace(/#.*$/, '');
}
};
exports.onRouteUpdate = async () => {
if (!navigator) {
console.log('Navigator is not defined, skipping service worker registration...');
return;
}
if (!navigator.serviceWorker) {
console.log('Service worker is not supported, skipping registration...');
return;
}
try {
const registration = await navigator.serviceWorker.register('/sw.js');
await registration.update();
} catch (error) {
console.error('Service worker registration failed', error);
}
};

Documenting JavaScript code with JSDoc

November 24, 2022

JSDoc adds types to a JavaScript codebase through conventions inside comments, so IDEs like Visual Studio Code can recognize the defined types, show them, and make coding easier with auto-completion. Definitions are put inside /** */ comments.

Examples

Custom types can be defined with @typedef and @property tags. Every property has a type and if the property is optional, its name is put between square brackets, and a description can be included after the property name. Global types should be defined in *.jsdoc.js files so they can be used in multiple files without importing. * represents any type.

/**
* @typedef {object} CollectionItem
* @property {string} [collectionName] - collection name is optional string field
* @property {boolean} isRevealed - reveal status
* @property {number} floorPrice - floor price
* @property {?string} username - username is a nullable field
* @property {Array.<number>} prices - prices array
* @property {Array.<string>} [buyers] - optional buyers array
* @property {Array.<Object<string, *>>} data - some data
*/
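
Once a type is defined, it can be referenced from other tags. The following is a minimal sketch that repeats a shortened version of the typedef so the block stands on its own. describeItem and the sample values are made up for the illustration.

```javascript
/**
 * @typedef {object} CollectionItem
 * @property {string} [collectionName] - optional collection name
 * @property {number} floorPrice - floor price
 * @property {?string} username - nullable username
 */

/**
 * Builds a one-line summary of a collection item.
 * @param {CollectionItem} item - item to describe
 * @returns {string} summary text
 */
const describeItem = (item) => {
  // Optional and nullable fields get a readable fallback.
  const name = item.collectionName ?? 'unnamed collection';
  const owner = item.username ?? 'unknown owner';
  return `${name} (${owner}): ${item.floorPrice}`;
};

console.log(describeItem({ floorPrice: 1.5, username: null }));
```

With the annotation in place, an editor that understands JSDoc can suggest the item properties while typing inside describeItem.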

Classes are recognized automatically, so @class and @constructor tags can be omitted.

/**
* Scraper for websites
*/
class Scraper {
/**
* Create scraper
* @param {string} url - website's URL
*/
constructor(url) {
this.url = url;
}
// ...
}

Comments starting with the description can omit @description tag. Function parameters and function return types can be defined with @param and @returns tags. Multiple return types can be handled with | operator. Deprecated parts of the codebase can be annotated with @deprecated tag.

/**
* Gets prices list
* @private
* @param {Array.<number>} prices - prices array
* @returns {string|undefined}
*/
const getPricesList = (prices) => {
if (prices.length > 0) return prices.join(',');
};
/**
* Get data from the API
* @deprecated
* @returns {Promise<CollectionItem>}
*/
const getData = async () => {
// ...
};

Variable types can be documented with @type tags and constants can utilize @const tags.

/**
* Counter for the requests
* @type {number}
*/
let counter;
/**
* HTTP timeout in milliseconds
* @const {number}
*/
const HTTP_TIMEOUT_MS = 3000;

Enums can be documented with @enum and @readonly tags.

/**
* Some states
* @readonly
* @enum {string}
*/
const state = {
STARTED: 'STARTED',
IN_PROGRESS: 'IN_PROGRESS',
FINISHED: 'FINISHED',
};

Docs validation

Linter can validate the docs. Add the following package and update the linter configuration file.

npm i -D eslint-plugin-jsdoc
// .eslintrc.js
module.exports = {
extends: ['plugin:jsdoc/recommended'],
};

Run the linter and it will show warnings if something has to be improved.

Generating the docs overview

Run the following command to recursively generate the HTML files with the docs overview, including the README.md and package.json content. Symbols marked with @private tags will be skipped.

npx jsdoc src -r --destination docs --readme ./README.md --package ./package.json

This command can be included in the CI/CD pipeline, depending on the needs of the project.

Boilerplate

Here is the link to the boilerplate I use for the development.

Timeout with Fetch API

November 2, 2022

Setting up a timeout for HTTP requests can prevent the connection from hanging forever, waiting for the response. It can be set on the client side to improve user experience, and on the server side to improve inter-service communication. The Fetch API is also available in Node.js from version 18.

AbortController can be used to set up timeouts. An instantiated abort controller has a signal property, which references its associated AbortSignal object. That object is passed as the signal option of the Fetch API request, so the HTTP request is aborted when the abort method is called.

const HTTP_TIMEOUT = 3000;
const URL = 'https://www.google.com:81';
(async () => {
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), HTTP_TIMEOUT);
try {
const response = await fetch(URL, {
signal: controller.signal
}).then((res) => res.json());
console.log(response);
} catch (error) {
console.error(error);
} finally {
clearTimeout(timeoutId);
}
})();
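
The same pattern can be wrapped in a reusable helper. The sketch below is one possible shape, not a fixed API: fetchWithTimeout, timeoutMs, and fetchImpl are names introduced here, and fetchImpl exists only so the helper can be tried with a stand-in for fetch.

```javascript
/**
 * Calls fetch and aborts the request when it takes longer than timeoutMs.
 * @param {string} url - request URL
 * @param {object} [settings] - fetch options plus timeoutMs and fetchImpl
 * @returns {Promise<Response>}
 */
async function fetchWithTimeout(url, settings = {}) {
  const { timeoutMs = 3000, fetchImpl = fetch, ...options } = settings;
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...options, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timeoutId);
  }
}

// Stand-in for fetch that never responds and rejects once the signal aborts.
const hangingFetch = (url, { signal }) =>
  new Promise((resolve, reject) => {
    signal.addEventListener('abort', () => reject(new Error(`Aborted: ${url}`)));
  });

fetchWithTimeout('https://example.com', { timeoutMs: 50, fetchImpl: hangingFetch })
  .catch((error) => console.error(error.message));
```

The timer is cleared as soon as the response object is available, so reading the body (for example with res.json()) is not covered by this timeout.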

This snippet can also be used to simulate aborted requests.

Boilerplate

Here is the link to the template I use for the development.

Progressive Web Apps 101

March 5, 2022

Progressive Web Apps bring some advantages over native mobile apps

  • automatic updates can be implemented
  • the installed app takes less memory
  • installable on phones, tablets, desktops

Prerequisites for installation

  • web app is running over an HTTPS connection
  • service worker is registered
  • web app manifest (manifest.json) is included

Service worker

Read more about it on Caching with service worker and Workbox

Manifest

The following fields can be included:

  • name is a full name used when the app is installed
  • short_name is a shorter version of the name that is shown when there is insufficient space to display the full name
  • background_color is used on a splash screen
  • description is shown on an installation pop-up
  • display customizes which browser UI is shown when the app is launched (standalone, fullscreen, minimal-ui, browser)
  • icons is a list of icons for the browser used in different places (home screen, app launcher, etc.)
  • scope specifies the navigation scope of the PWA. It should start with the URL from the start_url value. Pages outside the scope are not treated as part of the app, and links to URLs outside the scope won't open the PWA
  • screenshots is a list of screenshots shown on the installation pop-up
  • start_url is a relative URL of the app which is loaded when the installed app is launched. PWA usage can be tracked by adding UTM parameters within the URL.
  • theme_color sets the color of the toolbar, it should match the meta theme color specified in the document head

Description and screenshots are shown only on mobile phones.

{
"name": "App name",
"short_name": "App short name",
"background_color": "#ffffff",
"description": "App description",
"display": "standalone",
"icons": [
{
"src": "icons/icon-128x128.png",
"sizes": "128x128",
"type": "image/png"
},
{
"src": "icons/icon-144x144.png",
"sizes": "144x144",
"type": "image/png"
},
{
"src": "icons/icon-152x152.png",
"sizes": "152x152",
"type": "image/png"
},
{
"src": "icons/icon-192x192.png",
"sizes": "192x192",
"type": "image/png"
},
{
"src": "icons/icon-512x512.png",
"sizes": "512x512",
"type": "image/png"
}
],
"scope": "/app",
"screenshots": [{
"src": "screenshots/main.jpg",
"sizes": "1080x2400",
"type": "image/jpeg"
}],
"start_url": "/app?utm_source=pwa&utm_medium=pwa&utm_campaign=pwa",
"theme_color": "#3366cc"
}

Manifest file should be included via link tag

<link rel="manifest" href="/manifest.json">

In-app installation experience

It can be implemented on Google Chrome and Edge.

  • listen for the beforeinstallprompt event
  • save beforeinstallprompt event so it can be used to trigger the installation
  • provide a button to start the in-app installation flow
let deferredPrompt;
let installable = false;
window.addEventListener("beforeinstallprompt", (event) => {
event.preventDefault();
deferredPrompt = event;
installable = true;
document.getElementById("installable-btn").innerHTML = "Install";
});
window.addEventListener("appinstalled", () => {
installable = false;
});
document.getElementById("installable-btn").addEventListener("click", () => {
if (installable) {
deferredPrompt.prompt();
deferredPrompt.userChoice.then((choiceResult) => {
if (choiceResult.outcome === "accepted") {
document.getElementById("installable-btn").innerHTML = "click!";
}
});
} else {
alert("clicked!");
}
});

Notes

chrome://webapks page on mobile phones shows the list of installed PWAs with their details. Last Update Check Time is useful for checking when the manifest file was updated. The app is updated once a day if there are some manifest changes.

Demo

The demo with the mentioned examples is available here.

Boilerplate

Here is the link to the boilerplate I use for the development. It contains the examples mentioned above with more details.

2021

Spies and mocking with Jest

August 19, 2021

Besides asserting the output of the function call, unit testing includes the usage of spies and mocking. Spies are functions that let you spy on the behavior of functions called indirectly by some other code. A spy can be created by using jest.fn(). Mocking injects test values into the code during the tests. Some of the use cases are presented below.

  • Async function and its resolved value can be mocked using mockResolvedValue. Another way to mock it is by using mockImplementation and providing a function as an argument.
const calculationService = {
calculate: jest.fn()
};
jest.spyOn(calculationService, 'calculate').mockResolvedValue(value);
jest
.spyOn(calculationService, 'calculate')
.mockImplementation(async (a) => Promise.resolve(a));
  • Rejected async function can be mocked using mockRejectedValue and mockImplementation.
jest
.spyOn(calculationService, 'calculate')
.mockRejectedValue(new Error(errorMessage));
jest
.spyOn(calculationService, 'calculate')
.mockImplementation(async () => Promise.reject(new Error(errorMessage)));
await expect(calculateSomething(calculationService)).rejects.toThrowError(
Error
);
  • Sync function and its return value can be mocked using mockReturnValue and mockImplementation.
jest.spyOn(calculationService, 'calculate').mockReturnValue(value);
jest.spyOn(calculationService, 'calculate').mockImplementation((a) => a);
  • Chained methods can be mocked using mockReturnThis.
// calculationService.get().calculate();
jest.spyOn(calculationService, 'get').mockReturnThis();
  • Async and sync functions called multiple times can be mocked with different values using mockResolvedValueOnce and mockReturnValueOnce, respectively, and mockImplementationOnce.
jest
.spyOn(calculationService, 'calculate')
.mockResolvedValueOnce(value)
.mockResolvedValueOnce(otherValue);
jest
.spyOn(calculationService, 'calculate')
.mockReturnValueOnce(value)
.mockReturnValueOnce(otherValue);
jest
.spyOn(calculationService, 'calculate')
.mockImplementationOnce((a) => a + 3)
.mockImplementationOnce((a) => a + 5);
  • External modules can be mocked similarly to spies. For the following example, let's suppose axios package is already used in one function. The following example represents a test file where axios is mocked using jest.mock().
import axios from 'axios';
jest.mock('axios');
// within test case
axios.get.mockResolvedValue(data);
  • Manual mocks are resolved by writing corresponding modules in __mocks__ directory, e.g., fs/promises mock will be stored in __mocks__/fs/promises.js file. fs/promises mock will be resolved using jest.mock() in the test file.
jest.mock('fs/promises');
  • To assert called arguments for a mocked function, an assertion can be done using toHaveBeenCalledWith matcher.
const spy = jest.spyOn(calculationService, 'calculate');
expect(spy).toHaveBeenCalledWith(firstArgument, secondArgument);
  • To assert skipped call for a mocked function, an assertion can be done using not.toHaveBeenCalled matcher.
const spy = jest.spyOn(calculationService, 'calculate');
expect(spy).not.toHaveBeenCalled();
  • To assert how many times mocked function is called, an assertion can be done using toHaveBeenCalledTimes matcher.
const spy = jest.spyOn(calculationService, 'calculate');
calculationService.calculate(3);
calculationService.calculate(2);
expect(spy).toHaveBeenCalledTimes(2);
  • To assert called arguments for the exact call when a mocked function is called multiple times, an assertion can be done using toHaveBeenNthCalledWith matcher.
const argumentsList = [0, 1];
argumentsList.forEach((argument, index) => {
expect(calculationService.calculate).toHaveBeenNthCalledWith(
index + 1,
argument
);
});
  • Methods should be restored to their initial implementation before each test case.
// package.json
"jest": {
// ...
"restoreMocks": true
}
// ...

Demo

The demo with the mentioned examples is available here.

Boilerplate

Here is the link to the template I use for the development.

Server-Sent Events 101

August 18, 2021

Server-Sent Events (SSE) provide unidirectional communication from the server to the client. The client initiates the connection with the server using the EventSource API.

The same API is also used to listen for events from the server, listen for errors, and close the connection.

const eventSource = new EventSource(url);
eventSource.onmessage = ({ data }) => {
const eventData = JSON.parse(data);
// handling the data from the server
};
eventSource.onerror = () => {
// error handling
};
// close the connection once it is no longer needed
eventSource.close();

A server can send the events in text/event-stream format to the client once the client establishes the client-server connection. A server can filter clients by query parameter and send them only the appropriate events.
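
On the wire, each event is a block of field lines followed by a blank line. The sketch below formats such a block by hand to show what the client receives. formatSseMessage is a hypothetical helper (a framework normally does this serialization), and the payload is made up.

```javascript
/**
 * Serializes one event into the text/event-stream format.
 * @param {{ data: unknown, event?: string, id?: string }} message
 * @returns {string} block of field lines terminated by a blank line
 */
function formatSseMessage({ data, event, id }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  // Every line of the payload needs its own "data:" field.
  const payload = typeof data === 'string' ? data : JSON.stringify(data);
  payload.split('\n').forEach((line) => lines.push(`data: ${line}`));
  return `${lines.join('\n')}\n\n`;
}

const message = formatSseMessage({ data: { isVerifiedFilter: true } });
console.log(JSON.stringify(message));
```

A message without an event field is delivered to the onmessage handler on the client, while named events need a listener registered with addEventListener.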

In the following example, the NestJS server sends the events only to a specific client distinguished by its e-mail address.

import { Controller, Query, Sse } from '@nestjs/common';
import { EventEmitter2 } from '@nestjs/event-emitter';
import { Observable, Subject } from 'rxjs';
import { map } from 'rxjs/operators';
import { MessageEvent, MessageEventData } from './message-event.interface';
import { SseQueryDto } from './sse-query.dto';
@Controller()
export class AppController {
constructor(private readonly eventService: EventEmitter2) {}
@Sse('sse')
sse(@Query() sseQuery: SseQueryDto): Observable<MessageEvent> {
const subject$ = new Subject();
this.eventService.on(FILTER_VERIFIED, data => {
if (sseQuery.email !== data.email) return;
subject$.next({ isVerifiedFilter: data.isVerified });
});
return subject$.pipe(
map((data: MessageEventData): MessageEvent => ({ data })),
);
}
// ...
}

Emitting the event mentioned above is done in the following way.

const filterVerifiedEvent = new FilterVerifiedEvent();
filterVerifiedEvent.email = user.email;
filterVerifiedEvent.isVerified = true;
this.eventService.emit(FILTER_VERIFIED, filterVerifiedEvent);

Boilerplate

Here is the link to the template I use for the development.