6 min read · UCPReady Team

robots.txt and AI Crawlers: What E-Commerce Merchants Need to Know

Your robots.txt might be blocking AI shopping agents from finding your products. Learn about the 6 AI crawlers that matter and how to configure access correctly.

robots-txt · ai-crawlers · technical-seo · ecommerce

Your robots.txt file is a small text file at the root of your website that tells web crawlers what they can and cannot access. It has been a part of the web since 1994, and most merchants have never thought about it.

That needs to change. In the age of AI shopping, your robots.txt configuration directly determines whether AI agents can find and recommend your products.

#A Quick robots.txt Refresher

The robots.txt file lives at https://your-domain.com/robots.txt. It uses a simple format: you specify a user agent (the crawler's name) and then allow or disallow specific paths.

TXT
User-agent: Googlebot
Allow: /

User-agent: BadBot
Disallow: /

A crawler picks the group whose User-agent line matches its name, falling back to the wildcard `*` group if none does, and it follows the rules voluntarily — robots.txt is a convention, not a security mechanism. What matters for AI shopping is that the major AI crawlers respect robots.txt: if you block them, they will not crawl your site.
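You can sanity-check robots.txt matching locally with Python's standard-library `urllib.robotparser`, which implements the protocol. A minimal sketch using the example rules above (the product path is hypothetical):

```python
import urllib.robotparser

# The example rules from above: Googlebot allowed, BadBot blocked.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: BadBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/products/widget"))  # True
print(parser.can_fetch("BadBot", "/products/widget"))     # False
```

Note that a crawler with no matching group and no `*` group is allowed by default — robots.txt is opt-out, not opt-in.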

#The 6 AI Crawlers Every Merchant Should Know

These are the crawlers used by the major AI platforms to discover and index product data:

#1. GPTBot (OpenAI)

User-agent: GPTBot

Used by OpenAI to crawl pages for ChatGPT's training data and Shopping features. This is arguably the most important AI crawler for product discovery, given ChatGPT's market share in AI-assisted shopping.

#2. ChatGPT-User (OpenAI)

User-agent: ChatGPT-User

A separate crawler used when ChatGPT is actively browsing the web during a user conversation. Unlike GPTBot, which crawls proactively, ChatGPT-User fetches pages in real time when a user asks ChatGPT to look something up.

#3. Amazonbot (Amazon)

User-agent: Amazonbot

Used by Amazon for Alexa answers and the Rufus AI shopping assistant within the Amazon app. If you sell products that compete with or complement Amazon listings, Amazonbot access can drive referral traffic.

#4. Bytespider (ByteDance)

User-agent: Bytespider

ByteDance's crawler, which powers TikTok's search and shopping features. Given TikTok Shop's rapid growth, Bytespider access is increasingly valuable for merchants targeting younger demographics.

#5. ClaudeBot (Anthropic)

User-agent: ClaudeBot

Anthropic's crawler for Claude, used to build knowledge that powers Claude's ability to answer questions about products, make recommendations, and assist with shopping tasks.

#6. PerplexityBot (Perplexity)

User-agent: PerplexityBot

Perplexity's crawler for its answer engine. Perplexity is increasingly used for product research and comparison, making it a valuable discovery channel for merchants.

#How to Check If You Are Blocking AI Crawlers

There are three ways to check your current configuration:

Method 1: Read your robots.txt directly. Visit https://your-domain.com/robots.txt in your browser. Look for any Disallow rules under the AI crawler user agents listed above, or under a wildcard User-agent: * rule.

Method 2: Use UCPReady.ai. Our scanner automatically checks your robots.txt against all six AI crawlers and tells you exactly which ones are allowed and which are blocked.

Method 3: Check for broad blocking rules. Look for patterns like these that block all crawlers (including AI agents):

TXT
# This blocks EVERYTHING
User-agent: *
Disallow: /

Or specific AI crawler blocks:

TXT
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
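
You can automate these manual checks with a short script. This sketch uses Python's standard-library `urllib.robotparser` to report which of the six AI crawlers a given robots.txt allows; the sample rules are hypothetical:

```python
import urllib.robotparser

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "Amazonbot",
    "Bytespider", "ClaudeBot", "PerplexityBot",
]

def check_ai_access(robots_txt: str, path: str = "/") -> dict:
    """Return {crawler_name: allowed} for the six AI crawlers."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in AI_CRAWLERS}

# Hypothetical robots.txt that blocks only GPTBot
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(check_ai_access(sample))  # GPTBot: False, all others: True
```

To check a live site instead of a string, you can call `parser.set_url("https://your-domain.com/robots.txt")` followed by `parser.read()`, which fetches the file over HTTP before the same `can_fetch` checks.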

#Example robots.txt Configurations

#Recommended: Allow AI Crawlers Site-Wide

A permissive baseline that welcomes AI crawlers while keeping sensitive paths off-limits:

TXT
# Explicitly allow AI crawlers. A named group replaces the
# wildcard rules for that bot, so the path blocks are repeated here.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: Amazonbot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Bytespider
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account

# All other crawlers: same access
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account

Sitemap: https://your-store.com/sitemap.xml
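
One subtlety worth knowing: when a crawler finds a group naming it explicitly, it uses that group and ignores the wildcard group entirely, so `Disallow` lines under `User-agent: *` do not carry over to named bots. A small sketch with Python's standard-library `urllib.robotparser` illustrates this (the bot and path names are hypothetical):

```python
import urllib.robotparser

# A named group replaces the wildcard group for that bot:
# GPTBot matches its own group and never sees the * rules.
rules = """\
User-agent: *
Disallow: /checkout

User-agent: GPTBot
Allow: /
"""

p = urllib.robotparser.RobotFileParser()
p.parse(rules.splitlines())

print(p.can_fetch("GPTBot", "/checkout"))    # True: wildcard rules ignored
print(p.can_fetch("OtherBot", "/checkout"))  # False: falls back to *
```

This is why any paths you block under `User-agent: *` should be repeated inside each named AI-crawler group if you want them to apply there too.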

#Selective: Allow AI Crawlers Only on Product Pages

If you want AI agents to access your product pages but not the rest of your site:

TXT
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Disallow: /

User-agent: ChatGPT-User
Allow: /products/
Allow: /collections/
Disallow: /

This approach gives you more control but limits AI agents' ability to understand your full site context.
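
You can verify path-level rules the same way, again with Python's standard-library `urllib.robotparser` (the example paths are hypothetical):

```python
import urllib.robotparser

# The selective configuration from above
selective = """\
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Disallow: /
"""

p = urllib.robotparser.RobotFileParser()
p.parse(selective.splitlines())

print(p.can_fetch("GPTBot", "/products/blue-widget"))  # True
print(p.can_fetch("GPTBot", "/blog/some-post"))        # False
```

(`urllib.robotparser` applies rules in file order, while crawlers such as Googlebot use longest-path precedence; for rules like these the result is the same either way.)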

#Restrictive: Block AI Crawlers Entirely

TXT
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

This keeps AI agents out entirely. Unless you have a specific legal or competitive reason to block AI crawlers, this hurts your discoverability.

#Platform Defaults You Should Know About

#Shopify

Shopify generates a robots.txt automatically that you cannot directly edit as a file. However, Shopify's defaults are generally AI-friendly — they allow most crawlers. You can customize the robots.txt behavior through your theme's robots.txt.liquid template.

Key Shopify default behaviors:

  • Most crawlers are allowed by default
  • Admin, checkout, and cart paths are blocked
  • Product and collection pages are accessible

To check if your Shopify store is blocking AI crawlers, visit your store's robots.txt URL directly or scan it with UCPReady.ai.

#WooCommerce

WooCommerce (WordPress) gives you full control over robots.txt: the Settings > Reading search-engine-visibility option toggles a blanket Disallow in WordPress's virtual robots.txt, and placing a physical robots.txt file in your WordPress root directory overrides the virtual one entirely.

Watch out for these common WooCommerce issues:

  • Security plugins like Wordfence or Sucuri may add rules that block AI crawlers
  • SEO plugins like Yoast may add robots.txt rules you are not aware of
  • Hosting-level blocks — some WordPress hosts add server-level bot blocking that overrides your robots.txt

#BigCommerce, Wix, and Other Platforms

Most hosted platforms generate robots.txt automatically with limited customization. Check your platform's documentation for how to modify crawler access. If you cannot edit robots.txt directly, check if there is a setting in your platform's SEO or security configuration.

#The Bottom Line

Your robots.txt file is a gatekeeper. In the AI shopping era, the default should be to allow AI crawlers unless you have a specific reason not to. Every major AI shopping platform respects robots.txt, which means a few lines of misconfiguration can make your entire product catalog invisible to millions of potential customers using AI assistants.

Check your store’s AI readiness

Free scan — see how AI shopping agents perceive your store in under 60 seconds.

Scan Your Store Free