If you sell on Shopify and you've added filters to your collection pages (color, size, price, brand), you've probably created a problem you can't see. Every time a shopper clicks a filter, Shopify builds a brand new URL with a parameter tacked on, something like ?filter.v.option.color=Blue. That feels harmless. It's how filtering is supposed to work. The trouble starts when crawlers, including the AI ones, treat each of those URLs as a separate page worth visiting.
Multiply a handful of colors by a few sizes by three price brackets by two brands, and a single collection can spawn hundreds or thousands of filter combinations. Most of them show almost the same products in a slightly different order. To you they're one page. To a crawler they look like a giant pile of near-duplicate pages, and the crawler will happily spend its visit chewing through them before it ever reaches the product pages that actually make you money.
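To see how fast the combinations pile up, here is a rough counting sketch. It assumes a hypothetical collection with 6 colors, 4 sizes, 3 price brackets and 2 brands, and treats each filter as independent. The first function assumes each filter is either off or set to one value. The second assumes shoppers can tick any subset of a filter's values, which many themes allow. Neither is Shopify's own logic; both are plain arithmetic.

```python
from __future__ import annotations

from math import prod


def single_select_urls(values_per_filter: dict[str, int]) -> int:
    """URLs when each filter is either off or set to exactly one value.
    The count includes the unfiltered collection URL (every filter off)."""
    return prod(n + 1 for n in values_per_filter.values())


def multi_select_urls(values_per_filter: dict[str, int]) -> int:
    """URLs when shoppers can tick any subset of a filter's values.
    Also includes the unfiltered collection URL (every filter left empty)."""
    return prod(2 ** n for n in values_per_filter.values())


# Hypothetical collection: 6 colors, 4 sizes, 3 price brackets, 2 brands.
BOOTS = {"color": 6, "size": 4, "price": 3, "brand": 2}

if __name__ == "__main__":
    print(single_select_urls(BOOTS))  # 7 * 5 * 4 * 3
    print(multi_select_urls(BOOTS))   # 2 ** (6 + 4 + 3 + 2)
```

Under these assumptions one collection yields 420 URLs with single-select filters and 32,768 with multi-select ones, before any sort options are counted.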
What "crawl budget" actually means here
Crawl budget is just a rough cap on how much a bot is willing to fetch from your store in a given window. Google's crawler has one. So do the bots behind AI answers: GPTBot and OAI-SearchBot from OpenAI, PerplexityBot, and the others feeding ChatGPT, Perplexity, and similar assistants. Google's AI Overviews and Gemini lean on what Googlebot crawls; Google-Extended is a robots.txt token that controls how that content is used for Gemini, not a separate crawler. None of them will crawl your store forever. They sample it.
So picture a crawler that's willing to fetch 500 URLs from your store this week. If 400 of those are filter-parameter URLs, that's 400 wasted fetches on pages that are basically the same. Your new arrivals, your restocked bestseller, the product page you just rewrote? They might not get crawled at all, or they get crawled so rarely that the AI's picture of your store is stale. When a shopper asks an assistant "what's a good waterproof hiking boot under 8000 rupees," the model leans on what it has actually seen and indexed. If it never got to your boot, you're not in the answer.
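The 500-fetch scenario can be put into numbers with a deliberately simple toy model. It assumes the crawler picks its budget uniformly at random from every URL it knows about, which real crawlers do not do (they weigh links, freshness and more), so treat the output as an illustration of the proportions, not a prediction.

```python
def expected_product_fetches(budget: int, product_urls: int, other_urls: int) -> float:
    """Toy model: the crawler picks `budget` distinct URLs uniformly at random
    from all known URLs. Returns the expected number of product pages among
    them, which is the mean of a hypergeometric draw."""
    total = product_urls + other_urls
    if total == 0:
        return 0.0
    fetched = min(budget, total)  # can't fetch more URLs than exist
    return fetched * product_urls / total


if __name__ == "__main__":
    # 100 products buried among 2,000 filter URLs, 500 fetches this week.
    print(round(expected_product_fetches(500, 100, 2000), 1))
    # Same store after trimming filter URLs down to 100.
    print(expected_product_fetches(500, 100, 100))
```

With 100 products hidden among 2,000 filter URLs, the model expects only about 24 product pages to be fetched. Trim the filter URLs to 100 and all 100 products fit inside the same budget.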
This is the same root issue as a store that's accidentally locked the door on bots entirely. If you haven't checked whether AI crawlers can even reach your content, start with this guide on whether your store is blocking AI crawlers, then come back here for the filter-specific cleanup.
How to confirm this is happening to you
Don't guess. Two quick checks tell you most of what you need.
First, open Google Search Console and go to Settings, then Crawl stats. Click into the breakdowns (By response, By file type, By purpose) and look at the example URLs. If you see a flood of URLs with ?filter., ?sort_by=, or ?page= parameters being crawled, that's your budget leaking. Search Console only reports Googlebot, but it's a reasonable proxy: AI bots discover your store through the same links, so if Googlebot is drowning in parameter URLs, they are probably wasting fetches there too.
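If you copy a batch of example URLs out of the Crawl stats report, a short script can tally how many carry parameters. This is a sketch that assumes you already have the URLs as a list of strings; it only checks the three parameter families named above (filter., sort_by, page).

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def param_kind(url: str) -> str:
    """Label a URL by the Shopify-style parameter it carries, if any."""
    keys = [key for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]
    if any(key.startswith("filter.") for key in keys):
        return "filter"
    if "sort_by" in keys:
        return "sort_by"
    if "page" in keys:
        return "page"
    return "other" if keys else "clean"


def parameter_share(urls: list[str]) -> tuple[Counter, float]:
    """Count URLs by kind and return the share carrying filter/sort/page parameters."""
    counts = Counter(param_kind(url) for url in urls)
    flagged = counts["filter"] + counts["sort_by"] + counts["page"]
    share = flagged / len(urls) if urls else 0.0
    return counts, share


if __name__ == "__main__":
    sample = [  # hypothetical crawl sample
        "https://example.com/products/trail-boot",
        "https://example.com/collections/boots?filter.v.option.color=Blue",
        "https://example.com/collections/boots?sort_by=price-ascending",
        "https://example.com/collections/boots?page=2",
    ]
    counts, share = parameter_share(sample)
    print(dict(counts), f"{share:.0%} parameter URLs")
```

In this made-up sample three of four fetches went to parameter URLs. On a real store, a share that high is the leak the article describes.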
Second, do a manual sanity check. Go to one of your busy collection pages, apply two or three filters, and watch the address bar fill up with parameters. Now imagine every shopper-possible combination of those filters existing as its own crawlable URL. That mental picture is roughly what the bot sees.
A rule of thumb: if your store has more crawlable filter URLs than it has actual products, the filters are the problem, not your content.
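To make the address-bar exercise and the rule of thumb concrete, this sketch generates every single-select filter URL for a collection and compares the count with your product count. The filter setup is hypothetical; filter.v.option.color is the parameter shown earlier, while filter.v.option.size and filter.p.vendor are assumed names following the same pattern.

```python
from __future__ import annotations

from itertools import product
from urllib.parse import urlencode

# Hypothetical filters. Parameter names other than filter.v.option.color
# are assumptions for illustration.
FILTERS = {
    "filter.v.option.color": ["Blue", "Black", "Brown"],
    "filter.v.option.size": ["8", "9", "10"],
    "filter.p.vendor": ["Acme", "Trailco"],
}


def filter_urls(base: str, filters: dict[str, list[str]]) -> list[str]:
    """Every single-select filter combination, excluding the clean base URL."""
    names = list(filters)
    choices = [[None, *values] for values in filters.values()]  # None = filter off
    urls = []
    for combo in product(*choices):
        params = [(name, value) for name, value in zip(names, combo) if value is not None]
        if params:
            urls.append(f"{base}?{urlencode(params)}")
    return urls


def filters_are_the_problem(base: str, filters: dict[str, list[str]], product_count: int) -> bool:
    """The rule of thumb: more filter URLs than products means the filters are the problem."""
    return len(filter_urls(base, filters)) > product_count


if __name__ == "__main__":
    base = "https://example.com/collections/boots"
    urls = filter_urls(base, FILTERS)
    print(len(urls), urls[0])
    print(filters_are_the_problem(base, FILTERS, product_count=40))
```

Even this small setup of three filters produces 47 filter URLs, more than a 40-product collection. Real filter lists are usually longer.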
The fix, step by step, no developer needed
You can knock this down a lot with native Shopify settings plus one small file edit. Work through these in order.
1. Turn off the filters that don't earn their keep
In your Shopify admin, the filters on collection pages are controlled by the Search & Discovery app (it's free from Shopify) under Filters, or in some themes under the collection's filter settings. Go through your filter list and remove the low-value ones. Does anyone really filter your candle store by "material"? Probably not. Every filter you remove is a whole tree of parameter URLs that stops getting generated. Keep the two or three filters shoppers genuinely use (often size, color, price) and cut the rest. Fewer filters means fewer URL combinations, full stop.
2. Make sure your canonical tags point to the clean collection URL
A canonical tag tells crawlers "this filtered version is really just the main collection page." Most modern Shopify themes already output a canonical tag on collection pages that points to the clean URL without parameters, which is exactly what you want. You can confirm it: open a filtered collection URL, view the page source, and search for rel="canonical". The link inside it should be the plain collection URL with no ?filter junk. If it's correctly pointing to the clean URL, good, leave it. If your theme is older and the canonical is echoing back the full filtered URL, that's worth flagging to whoever maintains your theme, because it's telling crawlers each filter combo is its own real page.
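If you'd rather not eyeball page source for every collection, this sketch automates the same check using only Python's standard library. It assumes you already have the filtered page's HTML as a string, for example pasted from view-source. It finds the first link with rel="canonical" and reports whether its URL is free of filter. parameters.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.href = attr_map.get("href")


def canonical_is_clean(page_html: str) -> bool:
    """True if the page has a canonical link and it carries no filter.* parameters."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    if not finder.href:
        return False  # no canonical at all is also worth flagging
    keys = [key for key, _ in parse_qsl(urlsplit(finder.href).query)]
    return not any(key.startswith("filter.") for key in keys)
```

A False result means either no canonical tag was found or it echoes the filtered URL back. Either case is worth raising with whoever maintains your theme.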
3. Disallow filter parameters in robots.txt
This is the strongest single move. Shopify lets you customize robots.txt through a theme template. In your admin go to Online Store, Themes, then on your live theme click the three dots and choose Edit code. Add a new template and pick robots.txt.liquid from the list. The new template comes pre-filled with Liquid that outputs Shopify's default rules, and you add your own lines alongside it.
Inside that template you can tell bots not to crawl filter and sort parameters by adding Disallow rules for the parameter patterns Shopify uses. Check the live file first: Shopify's defaults already block some of these (collection URLs containing sort_by, for example), so you may only need a filter rule. Mind the exact pattern. A rule like /*?filter. only matches when the filter parameter comes first in the query string, while /*?*filter. also catches it when it follows another parameter such as ?page=2. Because the file is Liquid, edit it carefully; Shopify renders the live robots.txt from it. Save, then visit yourstore.com/robots.txt in a browser to confirm your new Disallow lines are showing up. One caution: keep the rules narrow so you only block parameter URLs, never the clean collection or product URLs themselves. If you're unsure about the exact pattern for your store, change one rule, check the live file, and confirm your normal collection and product URLs aren't caught by it before adding more.
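Before a rule goes live, you can test it against sample URLs offline. Python's built-in urllib.robotparser does plain prefix matching and doesn't understand the * wildcard, so this sketch implements the wildcard matching described in RFC 9309 instead: * matches any run of characters, a trailing $ anchors the end, the longest matching pattern wins, and Allow wins a tie. It's a simplified checker that skips percent-encoding normalization, not a copy of any crawler's code, and the rules below are hypothetical examples.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' matches any run of characters,
    a trailing '$' anchors the end, everything else is literal."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(url: str, rules: list[tuple[str, str]]) -> bool:
    """Apply ("allow" | "disallow", pattern) rules: the longest matching
    pattern decides, and Allow wins when two matches are the same length."""
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    best_length, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(target):
            longer = len(pattern) > best_length
            tie_allow = len(pattern) == best_length and kind == "allow"
            if longer or tie_allow:
                best_length, allowed = len(pattern), kind == "allow"
    return allowed


# Hypothetical rules: the first mirrors the kind of sort_by line Shopify's
# defaults include; the second is a filter rule you might add yourself.
RULES = [
    ("disallow", "/collections/*sort_by*"),
    ("disallow", "/*?*filter."),
]
# A narrower rule that only catches filters appearing first in the query.
NARROW = [("disallow", "/*?filter.")]
```

Run your clean collection and product URLs through is_allowed alongside a few filtered ones. With the narrow rule, a URL like /collections/boots?page=2&filter.v.option.size=9 still comes back allowed, which is exactly the gap the broader /*?*filter. pattern closes.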
4. Keep your real collections and products in the sitemap
Shopify generates your sitemap automatically at yourstore.com/sitemap.xml, and it lists clean URLs for products, collections, pages, and blogs. It does not list filter-parameter URLs, which is exactly right. Your job is just to not accidentally hide good pages. Check that important collections and key products are published to the Online Store sales channel. The sitemap is the positive signal that tells crawlers "here are the pages I actually want you to spend time on," so keep it clean and let it do that job.
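To check that your must-have pages really are in the sitemap, you can parse it with the standard library. One assumption to note: yourstore.com/sitemap.xml is typically an index that links to child sitemaps, so run this on the child files (the ones whose entries are individual product or collection URLs). The URL lists below are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def page_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def missing_pages(sitemap_xml: str, must_have: list[str]) -> list[str]:
    """Pages you expect crawlers to find that the sitemap doesn't list."""
    present = set(page_urls(sitemap_xml))
    return [url for url in must_have if url not in present]


def filter_urls_listed(sitemap_xml: str) -> list[str]:
    """Any filter-parameter URLs that slipped into the sitemap (should be none)."""
    return [
        url
        for url in page_urls(sitemap_xml)
        if any(key.startswith("filter.") for key, _ in parse_qsl(urlsplit(url).query))
    ]
```

An empty result from both missing_pages and filter_urls_listed means the sitemap is doing its job. A missing product usually points back to its sales channel setting.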
What changes after you do this
Once the parameter URLs stop eating fetches, crawlers spend more of their limited budget on the pages that represent real products and real collections. Over the following weeks you should see, in Search Console crawl stats, the share of parameter URLs drop and crawling of your genuine pages climb. That's the foundation for AI assistants having a current, accurate picture of what you sell, which is the whole point.
Cleaning up crawl waste is invisible work, so it's worth checking the result from the outside. You can run a free AI visibility audit to see whether assistants like ChatGPT and Perplexity actually recommend your store when shoppers ask what to buy, and which competitors show up instead. If your products were getting starved of crawl attention, fixing your filters is often the first domino. Give the bots a clean path and let them go find the stuff you're actually trying to sell.
Questions store owners ask
Will blocking filter URLs in robots.txt hurt my Google rankings?
Not if you only block the parameter URLs and leave your clean collection and product pages crawlable. Those filtered URLs were near-duplicates you never wanted ranking anyway. One nuance: a crawler can't read the canonical tag on a URL it isn't allowed to fetch, so after blocking, your clean collection pages carry the signal on their own through internal links and the sitemap. That's the trade you want, because crawl attention moves to the pages that matter.
How do I edit robots.txt on Shopify if I'm not a developer?
Go to Online Store, Themes, click the three dots on your live theme, and choose Edit code. Add a new template and select robots.txt.liquid. The template starts with Liquid that outputs Shopify's default rules; add your own Disallow lines for filter parameters alongside it. After saving, visit yourstore.com/robots.txt to confirm your changes are live before adding more rules.
How do I know if AI crawlers are even reaching my product pages?
The quickest signal is Google Search Console crawl stats. It only covers Googlebot, but AI bots find your pages through the same links, so if parameter URLs dominate what Googlebot crawls, your real pages are probably getting starved for the AI bots too. You can also run an AI visibility audit to check from the outside whether assistants actually surface your store when shoppers ask what to buy.