homeresume
 
   
🔍

Browser automation with Puppeteer

August 26, 2023

Puppeteer is a headless browser for automating browser tasks. Here's the list of some of the features:

  • Turn off headless mode

    const browser = await puppeteer.launch({
    headless: false
    // ...
    });
  • Resize the viewport to the window size

    const browser = await puppeteer.launch({
    // ...
    defaultViewport: null
    });
  • Emulate screen how it's shown to the user via the emulateMediaType method

    await page.emulateMediaType('screen');
  • Save the page as a PDF file with a specified path, format, scale factor, and page range

    await page.pdf({
    path: 'path.pdf',
    format: 'A3',
    scale: 1,
    pageRanges: '1-2',
    printBackground: true
    });
  • Use preexisting user's credentials to skip logging in to some websites. The user data directory is a parent of the Profile Path value from the chrome://version page.

    const browser = await puppeteer.launch({
    userDataDir:
    'C:\\Users\\<USERNAME>\\AppData\\Local\\Google\\Chrome\\User Data',
    args: []
    });
  • Use Chrome instance instead of Chromium by utilizing the Executable Path from the chrome://version URL. Close Chrome browser before running the script

    const browser = await puppeteer.launch({
    executablePath: puppeteer.executablePath('chrome')
    // ...
    });
  • Switch to the selected tab

    await page.bringToFront();
  • Get value based on evaluation in the browser page

    const shouldPaginate = await page.evaluate(
    (param1, param2) => {
    // ...
    },
    param1,
    param2
    );
  • Get HTML content from the specific element

    const html = await page.evaluate(
    () => document.querySelector('.field--text').outerHTML
    );
  • Wait for a specific selector to be loaded. You can also provide a timeout in milliseconds

    await page.waitForSelector('.success', { timeout: 5000 });
  • Manipulate with a specific element and click on some of the elements

    await page.$eval('#header', async (headerElement) => {
    // ...
    headerElement
    .querySelectorAll('svg')
    .item(13)
    .parentNode.click();
    });
  • Extend execution of the $eval method

    const browser = await puppeteer.launch({
    // ...
    protocolTimeout: 0
    });
  • Manipulate with multiple elements

    await page.$$eval('.some-class', async (elements) => {
    // ...
    });
  • Wait for navigation (e.g., form submitting) to be done

    await page.waitForNavigation({ waitUntil: 'networkidle0', timeout: 0 });
  • Trigger hover event on some of the elements

    await page.$eval('#header', async (headerElement) => {
    const hoverEvent = new MouseEvent('mouseover', {
    view: window,
    bubbles: true,
    cancelable: true
    });
    headerElement.dispatchEvent(hoverEvent);
    });
  • Expose a function in the browser and use it in $eval and $$eval callbacks (e.g., simulate typing using the window.type function)

    await page.exposeFunction('type', async (selector, text, options) => {
    await page.type(selector, text, options);
    });
    await page.$$eval('.some-class', async (elements) => {
    // ...
    window.type(selector, text, { delay: 0 });
    });
  • Press the Enter button after typing the input field value

    await page.type(selector, `${text}${String.fromCharCode(13)}`, options);
  • Open a file chooser and select file for upload

    const [fileChooser] = await Promise.all([page.waitForFileChooser(), page.click(selector)]);
    const filePath = `C:/Users/<USERNAME>/Downloads/test.jpeg`; // use "/" instead of "\" in file path
    await fileChooser.accept([filePath]);
  • Remove the value from the input field before typing the new one

    await page.click(selector, { clickCount: 3 });
    await page.type(selector, text, options);
  • Expose a variable in the browser by passing it as the third argument for $eval and $$eval methods and use it in $eval and $$eval callbacks

    await page.$eval(
    '#element',
    async (element, customVariable) => {
    // ...
    },
    customVariable
    );
  • Mock response for the specific request

    await page.setRequestInterception(true);
    page.on('request', async function(request) {
    const url = request.url();
    if (url !== REDIRECTION_URL) {
    return request.continue();
    }
    await request.respond({
    contentType: 'text/html',
    status: 304,
    body: '<body></body>'
    });
    });
  • Intercept page redirections (via interceptor) and open them in new tabs rather than following them in the same tab

    await page.setRequestInterception(true);
    page.on('request', async function(request) {
    const url = request.url();
    if (url !== REDIRECTION_URL) {
    return request.continue();
    }
    await request.respond({
    contentType: 'text/html',
    status: 304,
    body: '<body></body>'
    });
    const newPage = await browser.newPage();
    await newPage.goto(url, { waitUntil: 'domcontentloaded', timeout: 0 });
    // ...
    await newPage.close();
    });
  • Intercept page response

    page.on('response', async (response) => {
    if (response.url() === RESPONSE_URL) {
    if (response.status() === 200) {
    // ...
    }
    // ...
    }
    });

Demo

Runnable tests for each API pattern live in the browser-automation-puppeteer-demo folder. Get access via code demos.

Automated captcha solver

January 22, 2023

Many websites use CAPTCHAs to prevent bots from submitting forms and accessing their services. CAPTCHA is designed to separate humans from…

read more →