Etsy scrape
A multi-stage pipeline that discovers Etsy shops matching a niche query, then harvests contact emails via the domain-guess trick (resolve {shop}.{tld} → fetch homepage → regex emails). Bypasses Datadome 403 by running through stealth v2.1 and IPRoyal CZ residential proxies.
Pipeline shape
discovery  → filter     → domain-guess → merge        → generate-templates
│            │            │              │              │
puppeteer    DE/NL/...    {shop}.com     dedupe         Brevo-ready
+ stealth    GmbH ban     fetch HTML     suppressions   JSON
Yield numbers (production, 2026-05-17)
- 14 domain-guess rounds across 8,000+ shops produced an email yield of roughly 9%.
- EU shops yield best because the DSA imprint requirement forces them to publish a contact email.
- Best query so far: "personalized 3d print gift" at 36% yield.
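As a rough planning aid, the yield figures above translate into how many shops a run has to process to reach a target email count. The sketch below is a minimal Python illustration, not part of the pipeline: the function name `shops_needed` is hypothetical, and it assumes the yield stays constant across a run, which the numbers above do not guarantee.

```python
import math


def shops_needed(target_emails: int, yield_rate: float) -> int:
    """Estimate how many shops must be processed to reach target_emails.

    Hypothetical helper: assumes a constant yield_rate (emails per shop).
    """
    if not 0 < yield_rate <= 1:
        raise ValueError("yield_rate must be in (0, 1]")
    if target_emails < 0:
        raise ValueError("target_emails must be non-negative")
    return math.ceil(target_emails / yield_rate)


# Target of 50 emails (the Quickstart example) at the two yields quoted above.
print(shops_needed(50, 0.09))  # overall average yield
print(shops_needed(50, 0.36))  # best query so far
```

At the 9% average this comes to 556 shops, and at the 36% best-query yield to 139, so query choice matters more than the number of rounds.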
Why we don't scrape the About page
Between May 9 and May 16, 2026, Etsy stripped emails out of the About page HTML via Datadome. Same-origin fetch + domain-guess is the only stable extraction path.
Quickstart
claude
# > "scrape 'personalized 3d print gift' on Etsy,
# target 50 EU shop emails, save to /workspace/etsy-leads.json"
Environment variables
IPROYAL_USER=... # residential proxy creds
IPROYAL_PASS=...
IPROYAL_HOST=geo.iproyal.com:12321
PROXY_COUNTRY=cz # CZ recommended for EU shops
STEALTH_VERSION=v2.1.0 # always load the 14-layer init
Datadome banner. If you see a Datadome captcha load, the proxy IP is burned. Rotate. Never solve manually — Etsy fingerprints the captcha and adds your fingerprint to a watchlist.
Banned email patterns
- johndoe@gmail.com, customer@gmail.com: Etsy demo addresses.
- francescoodierna@gmail.com: third-party email-finder spam.
- cults3d@3dp.chat: competitor scraper bait.
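The banned addresses above can be applied as a simple suppression check before the merge stage. This is a minimal sketch, assuming exact, case-insensitive matching on the full address; the names `BANNED_EMAILS`, `is_banned`, and `drop_banned` are hypothetical and not taken from the pipeline.

```python
# Addresses copied from the banned list above; extend as new junk shows up.
BANNED_EMAILS = frozenset({
    "johndoe@gmail.com",
    "customer@gmail.com",
    "francescoodierna@gmail.com",
    "cults3d@3dp.chat",
})


def is_banned(email: str) -> bool:
    """Return True if the address is on the banned list (case-insensitive)."""
    return email.strip().lower() in BANNED_EMAILS


def drop_banned(emails: list[str]) -> list[str]:
    """Keep only addresses that are not banned, preserving input order."""
    return [e for e in emails if not is_banned(e)]


# Example input is made up for illustration.
print(drop_banned(["JohnDoe@gmail.com", "owner@example.com"]))
# prints ['owner@example.com']
```

A frozen set keeps the lookup constant-time and prevents the list from being mutated by accident mid-run.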
DE / EU filter
26 shops were removed from a recent batch by the DE filter, which drops anything matching @*.de, GmbH, Berlin, or Wuerzburg in the contact name. We do this because EU + GmbH means stricter ÚOOÚ / DPA enforcement, which is not worth the marginal yield.
Last updated 2026-05-21