XPath for Web Scraping: Reliable Extraction Patterns

Copy-ready XPath selectors and code snippets for Python (lxml, Scrapy) and JavaScript. Extract links, headlines, prices, and attributes while skipping ads and noise.

Common scraping recipes

Extract all links

//a/@href

Selecting the attribute returns the URL strings directly. To keep only absolute links, filter in a predicate: //a[starts-with(@href, 'http')]/@href.
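
As a minimal sketch of this recipe with lxml, the snippet below runs the plain selector and the starts-with filter against a small inline page. The HTML and the base URL are made up for illustration; resolving relative links with urljoin is one option when dropping them is not acceptable.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical page content, used only for illustration.
PAGE = """
<html><body>
  <a href="https://example.com/a">Absolute</a>
  <a href="/b">Relative</a>
  <a href="#top">Fragment</a>
  <a>No href</a>
</body></html>
"""
BASE_URL = "https://example.com/articles"  # assumed URL the page was fetched from

doc = html.fromstring(PAGE)

# Every href attribute value, in document order; anchors without href are skipped.
all_hrefs = doc.xpath("//a/@href")

# Only links whose href starts with "http" (absolute URLs).
absolute_hrefs = doc.xpath("//a[starts-with(@href, 'http')]/@href")

# Alternative: keep relative links by resolving them against the page URL,
# dropping in-page fragment links.
resolved = [urljoin(BASE_URL, href) for href in all_hrefs if not href.startswith("#")]

print(all_hrefs)
print(absolute_hrefs)
print(resolved)
```

The predicate filter is the cheaper choice when only external links matter; resolving with urljoin keeps same-site links in the dataset.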

Article titles

//article//h2/text()

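text() returns bare strings instead of element nodes, so exports carry no wrapping tags. It selects only the direct text nodes of each h2, so a title with inline markup such as <em> comes back in pieces; evaluate normalize-space(.) on each h2 when titles may contain nested tags.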
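
The difference between text() and the element's full string value can be sketched as follows. The markup is hypothetical; the second headline wraps one word in <em> to show how text() splits it.

```python
from lxml import html

# Hypothetical markup: the second headline contains an inline <em> tag.
PAGE = """
<html><body>
  <article><h2>Plain headline</h2></article>
  <article><h2>Headline with <em>emphasis</em> inside</h2></article>
</body></html>
"""

doc = html.fromstring(PAGE)

# text() returns each direct text node of <h2>, so nested tags split the title
# and the text inside <em> is lost.
text_nodes = doc.xpath("//article//h2/text()")

# Evaluating normalize-space(string(.)) per <h2> returns the whole title,
# nested tags included, with surrounding whitespace collapsed.
full_titles = [
    h2.xpath("normalize-space(string(.))") for h2 in doc.xpath("//article//h2")
]

print(text_nodes)
print(full_titles)
```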

Image sources

//img/@src

Read @alt from the same img element so each URL keeps its description.
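
One way to keep src and alt together, sketched with lxml on made-up markup: select the img elements and read both attributes from each, rather than running two separate attribute queries whose results can fall out of step when an image has no alt.

```python
from lxml import html

# Hypothetical markup: the second image has no alt attribute.
PAGE = """
<html><body>
  <img src="/img/cat.jpg" alt="A cat">
  <img src="/img/spacer.gif">
</body></html>
"""

doc = html.fromstring(PAGE)

# Two independent queries return lists of different lengths,
# so zipping them would pair values from the wrong elements.
src_only = doc.xpath("//img/@src")
alt_only = doc.xpath("//img/@alt")

# Selecting elements first keeps each src next to its own alt (or "" if missing).
images = [
    {"src": img.get("src"), "alt": img.get("alt", "")}
    for img in doc.xpath("//img[@src]")
]

print(len(src_only), len(alt_only))
print(images)
```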

List items

//ul[@class='products']/li

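Scope to the product list to avoid nav menus. Note that @class='products' matches only when the attribute is exactly that string; a list marked up as class="products featured" needs a token match instead.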
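
A short sketch of scoping by class, assuming a page where the product list carries a second class name (the markup is invented for illustration). The exact comparison misses that list, while the common concat/normalize-space idiom matches the class as a whole token.

```python
from lxml import html

# Hypothetical markup: the product list has two class names.
PAGE = """
<html><body>
  <ul class="nav"><li>Home</li><li>About</li></ul>
  <ul class="products featured"><li>Widget</li><li>Gadget</li></ul>
</body></html>
"""

doc = html.fromstring(PAGE)

# Exact string comparison: fails because the attribute is "products featured".
exact = doc.xpath("//ul[@class='products']/li")

# Token match: pad the class list with spaces and look for " products ".
token = doc.xpath(
    "//ul[contains(concat(' ', normalize-space(@class), ' '), ' products ')]/li"
)

names = [li.text_content().strip() for li in token]
print(len(exact), names)
```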

Skip ads

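//div[not(contains(concat(' ', normalize-space(@class), ' '), ' ad '))]

Exclude ad wrappers by matching the class as a whole token. A plain contains(@class,'ad') matches any substring, so it would also drop elements with classes such as header or shadow. Adjust the token (ad, ads, advert) to the site's markup.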
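
The effect of substring versus token matching on class names can be checked with a small lxml sketch. The three div elements below are hypothetical; the first has a class that merely contains the letters "ad".

```python
from lxml import html

# Hypothetical markup: "header" contains the substring "ad" but is not an ad.
PAGE = """
<html><body>
  <div class="header">Site header</div>
  <div class="ad banner">Buy now</div>
  <div class="content">Story</div>
</body></html>
"""

doc = html.fromstring(PAGE)

# Substring test: also throws away the header, because "header" contains "ad".
substring_kept = [
    d.text_content() for d in doc.xpath("//div[not(contains(@class,'ad'))]")
]

# Token test: only elements whose class list includes the exact token "ad" are dropped.
token_kept = [
    d.text_content()
    for d in doc.xpath(
        "//div[not(contains(concat(' ', normalize-space(@class), ' '), ' ad '))]"
    )
]

print(substring_kept)
print(token_kept)
```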

Price fields

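//span[contains(., '$')]

Test the element's full string value with contains(., '$'); in XPath 1.0, contains(text(), '$') checks only the first text node and misses prices whose currency symbol sits in a nested tag. Refine with parent context, for example //ul[@class='products']//span[contains(., '$')].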
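
A minimal sketch of price extraction with parent context, assuming a product list with class "products" as in the list recipe above. The markup, the helper name to_decimal, and the US-style number format are assumptions made for illustration.

```python
from decimal import Decimal

from lxml import html

# Hypothetical markup: one price wraps the currency symbol in <b>,
# and an unrelated note outside the list also contains "$".
PAGE = """
<html><body>
  <ul class="products">
    <li><span>$19.99</span></li>
    <li><span><b>$</b>1,250.00</span></li>
  </ul>
  <p><span>Prices in $ (USD)</span></p>
</body></html>
"""

doc = html.fromstring(PAGE)

# Unscoped test on text(): misses the second price (its first text node has no "$")
# and picks up the unrelated note.
loose = [s.text_content() for s in doc.xpath("//span[contains(text(), '$')]")]

# Scoped test on the full string value: both prices, nothing else.
scoped = doc.xpath("//ul[@class='products']//span[contains(., '$')]")


def to_decimal(span):
    """Hypothetical helper: strip "$" and thousands separators, then parse."""
    raw = span.text_content()
    return Decimal(raw.replace("$", "").replace(",", "").strip())


prices = [to_decimal(s) for s in scoped]
print(loose)
print(prices)
```

Decimal is used instead of float so that monetary values keep their exact digits.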

Python + lxml

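from urllib.parse import urljoin

import requests
from lxml import html

url = "https://example.com/articles"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
# Parse the raw bytes so lxml can honor the encoding declared in the page.
doc = html.fromstring(resp.content)
titles = doc.xpath("//article//h2/text()")
# Resolve relative hrefs against the page URL.
links = [urljoin(url, href) for href in doc.xpath("//article//a/@href")]
print(titles, links)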

Python + Scrapy

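import scrapy

class ArticleSpider(scrapy.Spider):  # hypothetical spider, for illustration only
    name = "articles"
    start_urls = ["https://example.com/articles"]

    def parse(self, response):
        # Relative paths (leading ".") keep each query inside one article.
        for article in response.xpath("//article"):
            yield {
                "title": article.xpath(".//h2/text()").get(),
                "url": article.xpath(".//a/@href").get(),
                "tags": article.xpath(".//a[@class='tag']/text()").getall(),
            }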

JavaScript (browser/Node)

import { JSDOM } from "jsdom";

// In Node there is no global XPathResult; take it from the jsdom window.
// In a browser, drop the jsdom lines and use the global document and XPathResult.
const dom = await JSDOM.fromURL("https://example.com");
const { document: doc, XPathResult } = dom.window;
const result = doc.evaluate("//a/@href", doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
const links = [];
for (let i = 0; i < result.snapshotLength; i++) {
  links.push(result.snapshotItem(i)?.textContent);
}
console.log(links);

Scraping tips

  • Scope selectors to the main content container to avoid nav/footer noise.
  • Combine predicates to exclude ads or placeholders, matching whole class tokens rather than substrings: //div[not(contains(concat(' ', normalize-space(@class), ' '), ' ad '))].
  • Export attributes directly (href, src, data-*) when building datasets.
  • Pair XPath with request caching and polite crawl delays; this guide focuses purely on selector quality.
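
The first and third tips can be combined in one sketch: scope to the main container, then export attributes straight into dataset rows. The ids, the data-* names, and the CSV layout below are assumptions made for illustration, not taken from any real site.

```python
import csv
import io

from lxml import html

# Hypothetical listing page; ids and data-* attribute names are invented.
PAGE = """
<html><body>
  <div id="nav"><ul><li><a href="/home">Home</a></li></ul></div>
  <div id="main">
    <ul>
      <li data-sku="A1" data-price="9.50"><a href="/p/a1">Alpha</a></li>
      <li data-sku="B2" data-price="12.00"><a href="/p/b2">Beta</a></li>
    </ul>
  </div>
</body></html>
"""

doc = html.fromstring(PAGE)

# Scope to the main container so the nav list items are never selected.
items = doc.xpath("//div[@id='main']//li[@data-sku]")

rows = [
    {
        "sku": li.get("data-sku"),
        "price": li.get("data-price"),
        # string(...) yields "" when the item has no link.
        "url": li.xpath("string(.//a/@href)"),
    }
    for li in items
]

# Write the rows to an in-memory CSV; a real run would write to a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sku", "price", "url"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```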

Next, compare XPath with CSS for your stack on the selector comparison page, or jump to the examples library for more scraping-specific patterns.