BlackNode App

Etsy scrape

A multi-stage pipeline that discovers Etsy shops matching a niche query, then harvests contact emails via the domain-guess trick (resolve {shop}.{tld} → fetch homepage → regex emails). Bypasses Datadome 403 by running through stealth v2.1 and IPRoyal CZ residential proxies.

Pipeline shape

discovery → filter → domain-guess → merge → generate-templates
    │            │           │            │            │
 puppeteer    DE/NL/...   {shop}.com    dedupe    Brevo-ready
 + stealth    GmbH ban    fetch HTML   suppressions  JSON

Yield numbers (production, 2026-05-17)

  • 14 domain-guess rounds across 8000+ shops gave roughly 9% email yield.
  • EU shops yield best because the DSA imprint requirement forces them to publish a contact email.
  • Best query so far: "personalized 3d print gift" at 36% yield.
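
As a rough worked example of what these rates imply, the sketch below converts a yield rate into an expected email count and into the number of shops needed to hit a target. The function names are illustrative and not part of the pipeline, and "8000+" is treated as exactly 8000, so the result is a lower-bound estimate.

```python
import math


def expected_emails(shops: int, yield_rate: float) -> int:
    """Estimate how many emails a batch of shops produces at a given yield rate."""
    return round(shops * yield_rate)


def shops_needed(target_emails: int, yield_rate: float) -> int:
    """Estimate how many shops must be processed to reach a target email count."""
    if yield_rate <= 0:
        raise ValueError("yield_rate must be positive")
    return math.ceil(target_emails / yield_rate)
```

At the overall 9% rate, 8000 shops work out to about 720 emails. A target of 50 emails needs about 139 shops at the 36% rate of the best query, versus about 556 shops at 9%.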

Why we don't scrape the About page

Between May 9 and May 16 2026, Etsy stripped emails out of the About page HTML via Datadome. Same-origin fetch + domain-guess is the only stable extraction path.

Quickstart

claude
# > "scrape 'personalized 3d print gift' on Etsy,
#    target 50 EU shop emails, save to /workspace/etsy-leads.json"

Environment variables

IPROYAL_USER=...           # residential proxy creds
IPROYAL_PASS=...
IPROYAL_HOST=geo.iproyal.com:12321
PROXY_COUNTRY=cz           # CZ recommended for EU shops
STEALTH_VERSION=v2.1.0     # always load the 14-layer init

Datadome banner. If you see a Datadome captcha load, the proxy IP is burned. Rotate. Never solve manually: Etsy fingerprints the captcha and adds your fingerprint to a watchlist.

Banned email patterns

  • johndoe@gmail.com, customer@gmail.com — Etsy demo addresses.
  • francescoodierna@gmail.com — third-party email-finder spam.
  • cults3d@3dp.chat — competitor scraper bait.
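
A minimal sketch of how the list above could be applied as a suppression step. It assumes exact-address matching after trimming and lowercasing; the names `BANNED_EMAILS`, `is_banned`, and `drop_banned` are hypothetical and not taken from the pipeline.

```python
# Addresses copied from the banned list above; matching is exact, not wildcard.
BANNED_EMAILS = frozenset({
    "johndoe@gmail.com",
    "customer@gmail.com",
    "francescoodierna@gmail.com",
    "cults3d@3dp.chat",
})


def is_banned(email: str) -> bool:
    """Return True if the address is on the banned list (case-insensitive)."""
    return email.strip().lower() in BANNED_EMAILS


def drop_banned(emails: list[str]) -> list[str]:
    """Return the input addresses with banned ones removed, order preserved."""
    return [e for e in emails if not is_banned(e)]
```

Normalizing before comparison means a variant such as ` JohnDoe@Gmail.com ` is also dropped.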

DE / EU filter

The DE filter removed 26 shops from a recent batch. It drops anything matching @*.de, GmbH, Berlin, or Wuerzburg in the contact name. We do this because EU + GmbH means stricter ÚOOÚ / DPA enforcement, which is not worth the marginal yield.
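
The filter rule can be sketched as a single predicate. This is an illustration under assumptions, not the pipeline's actual code: it checks the @*.de pattern against the email address and the keywords against the contact name, and `matches_de_filter` is a hypothetical name. Only the four patterns listed above are covered, so a spelling such as "Würzburg" with an umlaut is not matched.

```python
import re

# Keywords from the filter description, matched as whole words, case-insensitive.
DE_KEYWORDS = re.compile(r"\b(gmbh|berlin|wuerzburg)\b", re.IGNORECASE)


def matches_de_filter(contact_name: str, email: str) -> bool:
    """Return True if a shop should be dropped by the DE filter."""
    # Take the part after the last "@" as the email domain.
    domain = email.strip().lower().rpartition("@")[2]
    if domain.endswith(".de"):
        return True
    return bool(DE_KEYWORDS.search(contact_name))
```

Checking the domain suffix rather than searching for the substring ".de" avoids false positives on domains such as `shop.dev`.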

Last updated 2026-05-21