Lambda sitemap generator
How to make a modern sitemap with Netlify functions.
When migrating to Gatsby v3, we ended up needing to swap our sitemap plugin: it was outdated for our version of Gatsby, and Google had updated the fields it looks for when crawling sitemaps. The gatsby-plugin-advanced-sitemap-modest Gatsby plugin I built during the v2 to v3 migration was a solid holdover, but if we ever want to move to v4, basically every convention the plugin relies upon will be deprecated, so we might as well fix it now.
Netlify functions
Netlify offers Netlify functions, which are (simply put) a more streamlined way to integrate AWS Lambda functions into your codebase and develop with them alongside your app (as well as a host of other things). Up to this point, Brightcove had already implemented them for form validation, site search, and most importantly for our blog’s RSS feed, which is essentially a ‘lite’ version of what our sitemap is today.
Sitemap city, population: me
Going into this project we had one sitemap-index with 4 sitemaps listed: sitemap-posts, sitemap-press-releases, sitemap-pages, and sitemap-legacy-pages. Previously we had a single sitemap-pages covering all the content in these 4, but when we added info about localized pages to our sitemaps, that file became huge. I wanted to break the sitemap into more manageable, organized directories, and felt that a sitemap index for each language, pointing to our various content types, was the best path forward.
I feel like I’m pretty familiar with the Google sitemap docs, but anytime I have a question about something I’m actually building, the answer turns out to be in some engineer’s tweeted reply to someone looking for insight, like when someone asked whether nested sitemaps are supported:
No, it's not supported -- sorry :)
— 🧀 John 🧀 (@JohnMu) November 11, 2020
So rather than nesting anything, I wanted to submit 6 sitemap indexes to Google via Search Console, one for each language Brightcove.com is served in.
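To make the per-language layout concrete, here is a minimal sketch of one flat sitemap index per language. The base URL, the `/{lang}/{name}.xml` path layout, and the `buildSitemapIndex` name are assumptions for illustration; the real URLs are not shown in this post.

```javascript
// Hypothetical base URL and path layout; the post does not show the real ones.
const BASE_URL = 'https://www.example.com';

// The four child sitemaps named in the post.
const SITEMAPS = [
  'sitemap-posts',
  'sitemap-press-releases',
  'sitemap-pages',
  'sitemap-legacy-pages',
];

// Build one flat sitemap index for a single language (no nested indexes).
function buildSitemapIndex(lang) {
  const items = SITEMAPS.map(
    (name) => `  <sitemap>\n    <loc>${BASE_URL}/${lang}/${name}.xml</loc>\n  </sitemap>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...items,
    '</sitemapindex>',
  ].join('\n');
}
```

Calling `buildSitemapIndex('de')` would give the index to submit for German, and the same call with each of the other language codes covers the rest.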
Parsing content types and building feed
At a super high level, Brightcove.com’s content is either built from the /pages/ directory, whose pages usually contain a hodge-podge of content type queries to source all the appropriate and relevant info, or is fed 1:1 from a Contentful content type to a corresponding template via a createPage call in gatsby-node.js.
Static(ish) pages
We had a small collection of pages in /pages/ that I snagged the slugs for, as well as a set of archive webinar page info stored in a JSON object, so I declared and merged that info.
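A minimal sketch of that declare-and-merge step might look like the following. The slugs, the shape of the webinar JSON, and the `mergeStaticEntries` name are all hypothetical stand-ins, since the real data is not shown here.

```javascript
// Hypothetical slugs for the hand-built pages in /pages/ ('' is the home page).
const staticPages = ['', 'contact', 'pricing'];

// Hypothetical shape for the archive webinar JSON object, keyed by webinar ID.
const archiveWebinars = {
  'webinar-1': { slug: 'webinars/streaming-basics', updatedAt: '2020-06-01' },
  'webinar-2': { slug: 'webinars/live-events', updatedAt: '2020-08-15' },
};

// Normalize both sources into one list of { slug, updatedAt } entries.
function mergeStaticEntries(pages, webinars) {
  const pageEntries = pages.map((slug) => ({ slug, updatedAt: null }));
  const webinarEntries = Object.values(webinars).map(({ slug, updatedAt }) => ({
    slug,
    updatedAt,
  }));
  return [...pageEntries, ...webinarEntries];
}

const staticEntries = mergeStaticEntries(staticPages, archiveWebinars);
```

Keeping both sources in one normalized list means the later feed-building code only has to handle a single entry shape.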
Everything else
For the rest of the pages the actual process that happens when Gatsby builds our site is as follows:
- We query a list of IDs for each published entry of a specific type
- We feed those IDs (and a small subset of information including the page’s path) to a createPage call
- createPage calls a template made up of a React component and a GraphQL query and builds that page as HTML at the path specified
So I basically had to recreate everything up until the createPage call: use the path of the sitemap to determine what content type it was for (sitemap-posts vs sitemap-pages), query all published entries of that content type, and generate the appropriate URL information based on the type, slug, updated time, and lang. I had a pre-determined map of the content types I’d need to account for. The function gets the content type from the request path and feeds the queried JSON content to a function that builds the XML feed.
Note: When doing this project I rendered all posts and press releases updated prior to October 20, 2020 as a static XML file, as they’re largely old webpages that are not trafficked or updated, and there are so many of them that they significantly slowed the load time of these sitemaps.
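The request-path step can be sketched roughly as below. The content type IDs in the map, the `contentTypeFromPath` and `makeHandler` names, and the injected `fetchEntries` function are assumptions for illustration; only the `event.path` field and the `{ statusCode, headers, body }` response shape come from the Lambda-style handler that Netlify functions use.

```javascript
// Hypothetical map from sitemap file name to content type ID.
const CONTENT_TYPES = new Map([
  ['sitemap-posts', 'post'],
  ['sitemap-press-releases', 'pressRelease'],
  ['sitemap-pages', 'page'],
  ['sitemap-legacy-pages', 'legacyPage'],
]);

// Take the last path segment, drop a trailing ".xml", and look it up in the map.
function contentTypeFromPath(path) {
  const lastSegment = path.split('/').filter(Boolean).pop() || '';
  return CONTENT_TYPES.get(lastSegment.replace(/\.xml$/, ''));
}

// fetchEntries and buildFeed are injected so this sketch stays self-contained.
function makeHandler(fetchEntries, buildFeed) {
  return async function handler(event) {
    const contentType = contentTypeFromPath(event.path);
    if (!contentType) {
      return { statusCode: 404, body: 'Unknown sitemap' };
    }
    const entries = await fetchEntries(contentType);
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/xml' },
      body: buildFeed(entries, contentType),
    };
  };
}

// In a Netlify function file, the result would be exported as `handler`, e.g.:
// exports.handler = makeHandler(fetchPublishedEntries, buildFeed);
```

Injecting the query and feed functions is a choice made for this sketch; it keeps the path parsing testable without a Contentful connection.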
That feed-building function, buildFeed, takes an array of entries and the content type, and gives you back the appropriate stringified XML.
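A rough sketch of a function with that signature is below. The URL prefixes per content type, the placeholder domain, the `{ lang, slug }` entry shape, and the simplified stand-in for stringify are assumptions made so the block runs on its own; they are not the site's real routing.

```javascript
// Hypothetical URL prefixes per content type; the real routing is not shown in the post.
const PATH_PREFIX = { post: 'blog', pressRelease: 'press-releases', page: '' };
const BASE_URL = 'https://www.example.com'; // placeholder domain

// Minimal stand-in for the stringify function, so this block runs on its own.
function stringify(entry) {
  return `  <url><loc>${entry.url}</loc></url>`;
}

// Turn an array of entries into a complete <urlset> document.
function buildFeed(entries, contentType) {
  const prefix = PATH_PREFIX[contentType] || '';
  const items = entries.map((entry) => {
    // Skip empty segments so pages without a prefix do not get a double slash.
    const path = [entry.lang, prefix, entry.slug].filter(Boolean).join('/');
    return stringify({ ...entry, url: `${BASE_URL}/${path}` });
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">',
    ...items,
    '</urlset>',
  ].join('\n');
}
```

The `xmlns:xhtml` namespace on the root element is what allows each URL item to carry localized alternate links.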
And finally the ‘under the hood’ bit: the stringify function, which is called on each entry passed to buildFeed to create the URL item for that entry in the sitemap, including the xhtml alternate fields that represent the localized versions of each page.
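As an illustration, a stringify along these lines would produce one URL item with its alternates. The entry shape `{ url, updatedAt, alts: [{ lang, href }] }` and the `escapeXml` helper are assumptions for this sketch; the `xhtml:link` element with `rel="alternate"` and `hreflang` is the format Google documents for localized versions in sitemaps.

```javascript
// Escape the characters that are not allowed raw in XML text and attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Assumed entry shape: { url, updatedAt, alts: [{ lang, href }] }.
function stringify(entry) {
  const alternates = (entry.alts || []).map(
    ({ lang, href }) =>
      `    <xhtml:link rel="alternate" hreflang="${escapeXml(lang)}" href="${escapeXml(href)}"/>`
  );
  return [
    '  <url>',
    `    <loc>${escapeXml(entry.url)}</loc>`,
    `    <lastmod>${escapeXml(entry.updatedAt)}</lastmod>`,
    ...alternates,
    '  </url>',
  ].join('\n');
}
```

Escaping matters here because a raw ampersand in a query string would make the whole sitemap invalid XML.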
The main thing not shown here is an entry parser that adds the alts field used in the stringify function. As every non-English page on brightcove.com falls back to English if the specific lang is not translated, it’s safe to assume every published page will have alternate versions, even if their translations have yet to be created.
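Because of that fallback, a parser like this can attach an alternate for every language without checking whether a translation exists. The sketch below is only a guess at its shape: the language codes, the placeholder domain, the `/{lang}/{path}` URL layout, and the `addAlts` name are all assumptions.

```javascript
// Assumed list of the six site languages; the post does not name them.
const LANGS = ['en', 'de', 'es', 'fr', 'ja', 'ko'];
const BASE_URL = 'https://www.example.com'; // placeholder domain

// Attach one alternate per language to an entry with a language-neutral path.
function addAlts(entry) {
  const alts = LANGS.map((lang) => ({
    lang,
    href: `${BASE_URL}/${lang}/${entry.path}`,
  }));
  return { ...entry, alts };
}
```

For example, `addAlts({ path: 'blog/hello' })` returns the entry with six alts, one per language, each pointing at the same path under its language prefix.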