Cheerio: jQuery's Cool Cousin Who Lives on the Server

Cheerio is a fast, flexible library for parsing and manipulating HTML and XML on the server. Think of it as jQuery without the browser: you get the familiar selector syntax, traversal methods, and manipulation methods from a subset of core jQuery, running entirely in Node.js with none of the overhead of a browser. With over 15 million weekly downloads and 30,000 GitHub stars, Cheerio has become a go-to choice for server-side HTML work. Whether you are scraping product prices, transforming email templates, or extracting metadata for SEO analysis, Cheerio keeps the code short and readable.

Why Cheerio Stands Out

Cheerio occupies a sweet spot that few libraries manage to hit. Here is what makes it special:

jQuery-Style API: If you have ever written $('h2').text(), you already know how to use Cheerio. The learning curve is essentially zero for anyone with front-end experience.
Blazing Fast: Because it does not spin up a browser, execute JavaScript, or render anything visually, Cheerio typically parses and manipulates HTML far faster than browser automation tools like Puppeteer or full DOM emulators like jsdom.
Lightweight: No headless Chrome and no browser emulation. Just a parser, a simple DOM tree, and a clean API.
Flexible Parsing: Ships with both parse5 (the default for HTML, with HTML5 spec-compliant parsing) and htmlparser2 (fast and forgiving, used for XML), so you can pick the right tool for the job.
Built-in URL Fetching: As of v1.0, you can load documents directly from URLs, buffers, and streams without any extra dependencies.
Slim Build Available: Import from cheerio/slim for a lighter bundle that relies on htmlparser2 alone and leaves out parse5.

Setting Up Shop

Getting Cheerio into your project takes a single command:

npm install cheerio
# or
yarn add cheerio

Cheerio ships as an ES module with CommonJS support. It requires Node.js 20.18.1 or higher as of v1.2.0.

The Basics: Loading and Querying

Loading Your First Document

Everything in Cheerio starts with load. You hand it an HTML string, and it hands back a $ function that works just like jQuery:

import * as cheerio from 'cheerio';

const html = `
  <html>
    <head><title>My Page</title></head>
    <body>
      <h1 class="hero">Welcome</h1>
      <ul id="fruits">
        <li class="fruit">Apple</li>
        <li class="fruit">Banana</li>
        <li class="fruit">Cherry</li>
      </ul>
    </body>
  </html>
`;

const $ = cheerio.load(html);

console.log($('title').text());
// => "My Page"

console.log($('.fruit').length);
// => 3

The $ function accepts any CSS selector and returns a Cheerio object you can chain methods on, exactly like jQuery.
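
One detail worth knowing about load is that it treats its input as a full document by default, so a bare fragment gets wrapped in html, head, and body elements. The sketch below contrasts that with fragment mode, which is enabled by passing false as the third argument. The variable names are illustrative only.

```typescript
import * as cheerio from 'cheerio';

// Default (document) mode: the fragment is wrapped in <html>, <head>, and <body>.
const asDocument = cheerio.load('<p>Hi</p>');
const documentHtml = asDocument.html();

// Fragment mode: passing `false` as the third argument keeps the markup as-is.
const asFragment = cheerio.load('<p>Hi</p>', null, false);
const fragmentHtml = asFragment.html();

console.log(documentHtml); // <html><head></head><body><p>Hi</p></body></html>
console.log(fragmentHtml); // <p>Hi</p>
```

Fragment mode is handy when you transform snippets and want the output to contain only what you put in.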

Traversing the Tree

Once you have selected elements, the full jQuery traversal toolkit is at your disposal:

import * as cheerio from 'cheerio';

const $ = cheerio.load(`
  <div class="container">
    <article>
      <h2>Post Title</h2>
      <p class="excerpt">A short summary...</p>
      <div class="tags">
        <span>TypeScript</span>
        <span>Node.js</span>
      </div>
    </article>
  </div>
`);

const article = $('article');

const title = article.find('h2').text();
// => "Post Title"

const tags = article.find('.tags span').map((i, el) => $(el).text()).get();
// => ["TypeScript", "Node.js"]

const parentClass = article.parent().attr('class');
// => "container"

Methods like .find(), .parent(), .children(), .siblings(), .next(), .prev(), and .closest() all work exactly as you would expect from jQuery.
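
As a quick illustration of the sibling and ancestor methods listed above, here is a minimal sketch using a small list. The markup and the selected class are made up for this example.

```typescript
import * as cheerio from 'cheerio';

const $ = cheerio.load(`
  <ul id="fruits">
    <li class="fruit">Apple</li>
    <li class="fruit selected">Banana</li>
    <li class="fruit">Cherry</li>
  </ul>
`);

const banana = $('.selected');

// next() and prev() move between element siblings and skip whitespace text nodes.
const nextText = banana.next().text(); // "Cherry"
const prevText = banana.prev().text(); // "Apple"

// siblings() returns every other element that shares the same parent.
const siblingCount = banana.siblings().length; // 2

// closest() walks up the tree until an ancestor matches the selector.
const listId = banana.closest('ul').attr('id'); // "fruits"

// children() returns only direct element children.
const childCount = $('#fruits').children().length; // 3

console.log({ nextText, prevText, siblingCount, listId, childCount });
```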

Reading and Writing Attributes

Extracting and modifying attributes is straightforward:

import * as cheerio from 'cheerio';

const $ = cheerio.load('<a href="/about" class="nav-link">About Us</a>');

const link = $('a');

console.log(link.attr('href'));
// => "/about"

link.attr('href', '/about-us');
link.addClass('active');
link.attr('data-section', 'navigation');

console.log($.html(link));
// => <a href="/about-us" class="nav-link active" data-section="navigation">About Us</a>

Going Deeper: Real-World Patterns

Scraping Structured Data

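One of the most common uses for Cheerio is pulling structured data out of HTML pages. Cheerio v1.0 introduced an extract method for declarative extraction, but the most familiar approach is to select each record's container and turn the matches into a plain array with .map() and .get():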

import * as cheerio from 'cheerio';

const html = `
  <div class="product-list">
    <div class="product">
      <h3 class="name">Mechanical Keyboard</h3>
      <span class="price">$149.99</span>
      <span class="rating" data-score="4.5">4.5 stars</span>
    </div>
    <div class="product">
      <h3 class="name">Ergonomic Mouse</h3>
      <span class="price">$79.99</span>
      <span class="rating" data-score="4.2">4.2 stars</span>
    </div>
  </div>
`;

const $ = cheerio.load(html);

const products = $('.product').map((i, el) => {
  const $el = $(el);
  return {
    name: $el.find('.name').text(),
    price: parseFloat($el.find('.price').text().replace('$', '')),
    rating: parseFloat($el.find('.rating').attr('data-score') ?? '0'),
  };
}).get();

console.log(products);
// => [
//   { name: "Mechanical Keyboard", price: 149.99, rating: 4.5 },
//   { name: "Ergonomic Mouse", price: 79.99, rating: 4.2 }
// ]
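
The same data can also be pulled out declaratively with extract. The following is a minimal sketch, assuming the extract API as described in Cheerio's v1.0 documentation: a map of keys to selectors, where an array collects every match and a value can be a nested map or a function. The markup mirrors the product list above.

```typescript
import * as cheerio from 'cheerio';

const $ = cheerio.load(`
  <div class="product">
    <h3 class="name">Mechanical Keyboard</h3>
    <span class="price">$149.99</span>
    <span class="rating" data-score="4.5">4.5 stars</span>
  </div>
  <div class="product">
    <h3 class="name">Ergonomic Mouse</h3>
    <span class="price">$79.99</span>
    <span class="rating" data-score="4.2">4.2 stars</span>
  </div>
`);

const { products } = $.extract({
  // Wrapping the descriptor in an array collects every match, not just the first.
  products: [
    {
      selector: '.product',
      // A nested map is evaluated relative to each matched .product element.
      value: {
        // A plain selector string extracts the element's text content.
        name: '.name',
        price: {
          selector: '.price',
          value: (el) => parseFloat($(el).text().replace('$', '')),
        },
        rating: {
          selector: '.rating',
          value: (el) => parseFloat($(el).attr('data-score') ?? '0'),
        },
      },
    },
  ],
});

console.log(products);
```

Whether this reads better than .map() is a matter of taste; the declarative form tends to pay off when the shape of the output is deeply nested.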

Loading HTML Directly from a URL

Since v1.0, Cheerio can fetch and parse documents from URLs without you needing to install a separate HTTP client:

import * as cheerio from 'cheerio';

async function scrapePageTitle(url: string): Promise<string> {
  const $ = await cheerio.fromURL(url);

  const title = $('title').text();
  const metaDescription = $('meta[name="description"]').attr('content') ?? '';
  const h1Text = $('h1').first().text();

  return JSON.stringify({ title, metaDescription, h1Text }, null, 2);
}

const result = await scrapePageTitle('https://example.com');
console.log(result);

The fromURL function handles content-type detection, character encoding sniffing, and base URI resolution automatically.

Transforming and Sanitizing HTML

Cheerio is not just for reading HTML -- it excels at transforming it too. Here is an example that cleans up user-submitted content:

import * as cheerio from 'cheerio';

function sanitizeUserHtml(rawHtml: string): string {
  const $ = cheerio.load(rawHtml);

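  // Drop elements that can execute code or restyle the page.
  $('script, style').remove();

  // Strip every inline event handler (onclick, onerror, onload, ...).
  $('*').each((i, el) => {
    for (const name of Object.keys(el.attribs)) {
      if (name.toLowerCase().startsWith('on')) {
        $(el).removeAttr(name);
      }
    }
  });

  $('a').each((i, el) => {
    const $link = $(el);
    const href = $link.attr('href') ?? '';
    // Match the scheme case-insensitively and ignore leading whitespace.
    if (/^\s*javascript:/i.test(href)) {
      $link.removeAttr('href');
    }
    $link.attr('rel', 'noopener noreferrer');
    $link.attr('target', '_blank');
  });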

  $('img').each((i, el) => {
    const $img = $(el);
    if (!$img.attr('alt')) {
      $img.attr('alt', 'User uploaded image');
    }
  });

  return $('body').html() ?? '';
}

const clean = sanitizeUserHtml(`
  <p>Hello!</p>
  <script>alert('xss')</script>
  <a href="javascript:void(0)" onclick="steal()">Click me</a>
  <img src="photo.jpg">
`);

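This pattern is useful for content management systems, email template processors, and anywhere user-generated HTML needs cleaning up. Treat it as a starting point rather than a complete defense: a blocklist can miss other vectors such as iframe elements or data: URLs, so security-sensitive applications should pair it with a dedicated, allowlist-based sanitizer.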
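
One way to make the link check more robust is to allow known-good URL schemes instead of rejecting known-bad ones. The sketch below is an assumption-laden illustration, not part of Cheerio: isSafeHref, ALLOWED_PROTOCOLS, and the placeholder base URL are hypothetical names, and the check relies on the standard URL class available in Node.js.

```typescript
// Hypothetical allowlist; adjust to the schemes your application actually needs.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Hypothetical helper: returns true only when the href resolves to an allowed scheme.
function isSafeHref(href: string): boolean {
  try {
    // The placeholder base lets relative links such as "/about" parse as https URLs.
    const url = new URL(href, 'https://placeholder.invalid');
    return ALLOWED_PROTOCOLS.has(url.protocol);
  } catch {
    // Values that cannot be parsed at all are treated as unsafe.
    return false;
  }
}

console.log(isSafeHref('/about')); // true
console.log(isSafeHref('  JavaScript:alert(1)')); // false
```

Inside the a loop of a sanitizer, a call such as isSafeHref(href) could replace the string comparison, since the URL parser already normalizes case and strips stray whitespace before the scheme is read.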

Working with XML

Cheerio handles XML just as naturally as HTML. Pass the xml option to parse the input with htmlparser2 in XML mode, so tag names are not interpreted according to HTML rules:

import * as cheerio from 'cheerio';

const rssXml = `
  <rss version="2.0">
    <channel>
      <title>Dev Blog</title>
      <item>
        <title>Understanding Parsers</title>
        <link>https://blog.example.com/parsers</link>
        <pubDate>Tue, 17 Feb 2026 10:00:00 GMT</pubDate>
      </item>
      <item>
        <title>Node.js Performance Tips</title>
        <link>https://blog.example.com/perf</link>
        <pubDate>Mon, 16 Feb 2026 09:00:00 GMT</pubDate>
      </item>
    </channel>
  </rss>
`;

const $ = cheerio.load(rssXml, { xml: true });

const feedTitle = $('channel > title').text();
// => "Dev Blog"

const items = $('item').map((i, el) => ({
  title: $(el).find('title').text(),
  link: $(el).find('link').text(),
  date: $(el).find('pubDate').text(),
})).get();

console.log(feedTitle, items);

This makes Cheerio a natural fit for parsing RSS feeds, SVG files, SOAP responses, and other XML-based formats.
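
To see why the xml option matters for feeds, consider the link element. In HTML, link is a void element that cannot have content, so an HTML parse leaves it empty and its URL ends up outside the element. The short sketch below, with made-up sample markup, compares the two modes.

```typescript
import * as cheerio from 'cheerio';

const feedItem = '<item><title>Post</title><link>https://blog.example.com/post</link></item>';

// XML mode: <link> is an ordinary element and keeps its text content.
const $xml = cheerio.load(feedItem, { xml: true });
const xmlLink = $xml('link').text();

// HTML mode (the default): <link> is parsed as an empty void element.
const $html = cheerio.load(feedItem);
const htmlLink = $html('link').text();

console.log(JSON.stringify(xmlLink)); // "https://blog.example.com/post"
console.log(JSON.stringify(htmlLink)); // ""
```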

Conclusion

Cheerio has earned its place as the default choice for server-side HTML and XML work in the JavaScript ecosystem. Its jQuery-compatible API means you can be productive within minutes, while its speed and low memory footprint make it practical for everything from one-off scripts to high-throughput data pipelines. The v1.0 release brought modern conveniences like URL fetching and the extract method, and the library continues to evolve with regular updates -- the latest v1.2.0 landing in January 2026.

If your task involves static HTML and you do not need JavaScript execution or browser rendering, Cheerio will usually be the fastest and simplest path to a solution. For dynamic, JavaScript-heavy pages, you will want to reach for browser automation tools like Puppeteer or Playwright instead. But for most parsing, scraping, and transformation work, Cheerio is the tool that just works.