SEO

XML Sitemap

Also: sitemap

An XML sitemap is a file that lists the URLs on a site that you want search engines to know about, so that crawlers can discover those pages directly instead of relying only on internal links to find them.

A sitemap tells a search engine which pages exist and, optionally, when each was last changed. Shopify generates one automatically at /sitemap.xml, which in turn points to child sitemaps for products, collections, pages, and blog posts. Submitting that URL in Google Search Console gives Google a direct list to crawl, which typically speeds up discovery of new and updated pages, especially on a large catalogue or a new store with few inbound links. The lastmod date matters more than most operators realise: it is one of the signals a crawler can use to decide whether a page is worth re-fetching, and Google says it relies on the value only when it is consistently and verifiably accurate. An accurate timestamp on a genuinely changed page is therefore one of the few honest levers you have over recrawl timing.
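
The parent-and-child structure described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library: the example.com URLs, file names, and dates are hypothetical stand-ins rather than what any particular store serves, and a real script would fetch /sitemap.xml over HTTP instead of reading inline strings.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical parent sitemap (a sitemap index) pointing at child sitemaps.
SITEMAP_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap_collections_1.xml</loc></sitemap>
</sitemapindex>"""

# Hypothetical child sitemap listing product URLs; lastmod is optional.
PRODUCT_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/products/wool-scarf</loc>
    <lastmod>2024-05-02T09:00:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://example.com/products/linen-shirt</loc>
  </url>
</urlset>"""


def child_sitemaps(index_xml: str) -> List[str]:
    """Return the child sitemap URLs listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


def url_entries(urlset_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(urlset_xml)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


if __name__ == "__main__":
    print(child_sitemaps(SITEMAP_INDEX))
    for loc, lastmod in url_entries(PRODUCT_SITEMAP):
        print(loc, lastmod or "(no lastmod)")
```

Reading the file this way shows what a crawler receives: a flat list of locations, each with at most one optional freshness hint.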

It is worth being precise about what a sitemap does not do. Listing a URL is a request, not a command: it does not guarantee a page will be indexed, it does not raise rankings, and it does not override a noindex tag or a robots.txt block. A page can sit in a sitemap and still be excluded because Google judges it thin, duplicate, or low value. The sitemap is a discovery aid, not a ranking input, and treating it as the latter leads to wasted effort.
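
One way to see that a sitemap entry does not override other directives is to test a listed URL against them. The sketch below is an assumption-laden illustration, not a full audit: the helper names, the robots.txt rules, and the example.com URLs are hypothetical, and it checks only robots.txt and the robots meta tag (a noindex sent through an X-Robots-Tag HTTP header would need a separate check).

```python
from html.parser import HTMLParser
from typing import List
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def sitemap_conflicts(url: str, html: str, robots_txt: str,
                      agent: str = "Googlebot") -> List[str]:
    """List reasons a URL may be skipped even though the sitemap lists it."""
    reasons = []

    # robots.txt is evaluated independently of the sitemap.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        reasons.append("blocked by robots.txt")

    # A noindex directive in the page itself also wins over the sitemap.
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        reasons.append("noindex meta tag")

    return reasons


# Hypothetical inputs for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /search\n"
NOINDEX_HTML = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

if __name__ == "__main__":
    print(sitemap_conflicts("https://example.com/search", "<html></html>", ROBOTS_TXT))
    print(sitemap_conflicts("https://example.com/pages/old", NOINDEX_HTML, ROBOTS_TXT))
```

Any URL for which this returns a non-empty list is a candidate for removal from the sitemap, since listing it sends a mixed signal.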

Consider a Shopify store that launches forty new seasonal products on a Thursday morning. With no inbound links pointing at those URLs yet, Google would otherwise have to find them by crawling internal navigation, which can take days. Because the product child sitemap is reachable from the submitted /sitemap.xml, and each new URL carries a current lastmod value, the crawler gets a flat list it can read in one pass, so the new pages tend to surface in Search Console's indexing reports far sooner. If the same store later removes a discontinued line, those URLs should drop out of the sitemap rather than linger as 404s that quietly erode trust in the file.

Keep the sitemap honest and it stays useful: it should contain only canonical, indexable URLs that return a 200 status. Sitemaps that include redirects, dead pages, or parameter duplicates waste crawl budget and dilute the signal. Submit the sitemap once in Search Console, then treat the Sitemaps and Page indexing reports there as your feedback loop on whether the pages you listed are actually being picked up.
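
The hygiene rule above (canonical, indexable, 200 only) can be expressed as a small check over crawl results. This is a sketch under stated assumptions: the PageCheck record and the sitemap_problems helper are hypothetical, the status and canonical values are presumed to have been collected by a separate crawl, and treating any query string as a likely duplicate is a simplification that will not suit every site.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class PageCheck:
    """Facts about one sitemap URL, gathered by a crawl (hypothetical shape)."""
    url: str
    status: int                # status returned for this URL, redirects not followed
    canonical: Optional[str]   # href of <link rel="canonical">, if present


def sitemap_problems(page: PageCheck) -> List[str]:
    """Return the reasons a URL should not be in the sitemap (empty if none)."""
    problems = []
    if 300 <= page.status < 400:
        problems.append("redirects")
    elif page.status != 200:
        problems.append(f"returns {page.status}")
    if page.canonical and page.canonical != page.url:
        problems.append("canonical points elsewhere")
    if urlsplit(page.url).query:
        problems.append("query parameters (possible duplicate)")
    return problems


if __name__ == "__main__":
    # Hypothetical crawl results for illustration only.
    pages = [
        PageCheck("https://example.com/products/wool-scarf", 200,
                  "https://example.com/products/wool-scarf"),
        PageCheck("https://example.com/products/old-line", 404, None),
        PageCheck("https://example.com/collections/scarves?sort_by=price", 200,
                  "https://example.com/collections/scarves"),
    ]
    for page in pages:
        print(page.url, sitemap_problems(page) or "ok")
```

On a platform that generates the sitemap for you, a check like this is mainly useful for spotting pages whose own settings (a redirect, a changed canonical) contradict their presence in the file.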

The sitemap also matters for AI search and answer engines. Tools such as ChatGPT, Perplexity, and Google AI Overviews still depend on the underlying web index, or on their own crawlers, to find and read your content before they can cite it. A page that has never been discovered cannot be summarised or quoted in an answer. A clean, current sitemap raises the odds that your product, collection, and guide pages are in the corpus these systems draw from, which is the quiet precondition for ever being recalled in a generated response. It will not write the answer for you, but it makes sure your pages are in the room when the answer is composed.