Pre-Rendering Pages in Spartacus and SAP Commerce Cloud

Pre-rendering the pages of a JavaScript web application, also known as static rendering, has strong benefits over rendering them in real time. In this article, we will discuss the problems of real-time rendering and how pre-rendering can solve them.

Server-Side Rendering

Server-Side Rendering (SSR) is crucial for Single-Page Applications (SPAs) for Search Engine Optimization (SEO) purposes. Spartacus, an SPA storefront for SAP Commerce Cloud, comes with SSR support that leverages Angular Universal. However, at the time of writing (Release v5), only real-time rendering with optional in-memory caching is supported out of the box. This approach has some pitfalls that can have devastating consequences.

Real-Time Rendering

Any sizeable online store has hundreds, perhaps even hundreds of thousands, of pages that need to be server-side rendered. The server needs to render many pages concurrently, and quickly too: ideally within a few seconds, before visitors give up waiting and browse away. During times of high traffic, this is like having a hundred Chrome tabs load web pages at the same time on a single PC. It would not be surprising for the computer to slow down terribly in such a situation. With a limited amount of resources, the server is not immune to the same problem. Worse, the machine may run out of memory, or there may be no workers left to serve new incoming traffic. The machine then freezes up and the web server starts returning 503 (Service Unavailable) or 504 (Gateway Timeout); in other words, a site outage. The situation is amplified further when web crawlers go through every page of the site.

One could try loading one web page on each of a hundred PCs instead. Likewise, the server could be scaled horizontally, but that may come at a huge financial cost. Orchestrating, coordinating and balancing tasks among hundreds of PCs or servers is not an easy task either.

Caching

You may consider caching rendered pages, but caching cannot solve the problem on its own.

  • With in-memory caching, the number of pages that can be stored is quite limited. Moreover, the cache is cleared when the application restarts, and it is not shared between pods.
  • If there is a surge of traffic to unique pages, which often happens during web crawling, the cache is likely not warm enough yet. When the cache-miss rate is too high, caching only has an adverse effect: the system has to perform I/O operations on the cache on top of rendering pages.
  • Without direct insight into the page contents (the database), it is difficult for the frontend application to know when a cache entry needs to be updated. Serving an outdated page could seriously confuse customers when crucial information such as a product price has changed. Using a Content Delivery Network (CDN) does not address the above issues.

Robots Exclusion Standard

Placing a well-written robots.txt could alleviate the problem by scheduling and spreading out the web crawler traffic. However, it does not help when:

  • The crawler does not respect the rules defined in robots.txt, or a web administrator triggers a scan of the entire site from a tool like Google Search Console.
  • There is a surge of human traffic. This is a common scenario on e-commerce websites during high-sales periods like Black Friday, or when there is a huge marketing campaign through email, Social Networking Services (SNS), etc.

Dynamic Client-Side Rendering

When the server is too low on resources to render pages effectively, it could fall back to Client-Side Rendering (CSR) for the time being. This could help avoid or delay a site outage, and visitors can continue to browse the site without interruption. If implemented carelessly, however, it could have negative impacts on SEO:

  • A page without any meaningful content could be indexed by a search engine.
  • A search engine might treat it as “not found” and remove the page from its index.
  • A search engine might consider it as cloaking and penalize the site.
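
As a rough sketch of such a fallback, the server could count in-flight renders and hand out the unrendered application shell once a limit is reached. The `RenderGate` class, its limit and the choice of a 503 status are assumptions made for illustration only; they are not part of Spartacus.

```typescript
// Hypothetical gate that decides between SSR and a CSR fallback.
type RenderDecision = { mode: "ssr" } | { mode: "csr"; status: number };

class RenderGate {
  private inFlight = 0;

  constructor(private readonly maxConcurrentRenders: number) {}

  // Reserve a render slot, or signal a CSR fallback when the server is saturated.
  acquire(): RenderDecision {
    if (this.inFlight >= this.maxConcurrentRenders) {
      // Assumption of this sketch: a 503 marks the response as temporary.
      return { mode: "csr", status: 503 };
    }
    this.inFlight++;
    return { mode: "ssr" };
  }

  // Release the slot once the render has finished or failed.
  release(): void {
    if (this.inFlight > 0) {
      this.inFlight--;
    }
  }
}
```

A caller would wrap each render in `acquire()` and `release()`, and send the CSR shell with the returned status whenever the decision is `csr`.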

Pre-Rendering (Static Rendering)

Imagine having all (or almost all) pages rendered and saved in some form of long-term storage (local or network file storage, CDN, etc.) at all times, well before those pages are requested. No page request, whether it comes from a human user or a bot, would ever need to trigger the server to render a page. The user-facing server is effectively reduced to a file server. It simply reads static files from the storage and returns them as-is when a page is requested. The burden of resource-intensive scripting is lifted off the server, and the complexity of its logic is minimized.

A separate application that is not exposed to end users has to render the pages and write them to the storage. The site owner has full control over the scheduling and coordination of page rendering, free from external factors and surprises. How does this application know which pages to render, though? There is certainly more than one way, and each has its own set of pros and cons. Some strategies are:

  • Have direct access to the database that contains the website’s page data.
  • Poll the sitemap.xml.
  • Start with one page such as the homepage and crawl through links.
  • Combine multiple strategies.
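
To illustrate the sitemap strategy, here is a minimal sketch that pulls page URLs out of a sitemap.xml document. It assumes a plain sitemap with `<loc>` elements and uses a regular expression instead of an XML parser; the function name `extractSitemapUrls` is made up for this example.

```typescript
// Extract page URLs from a sitemap.xml document (sketch; no XML library).
function extractSitemapUrls(sitemapXml: string): string[] {
  const urls: string[] = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    // Sitemaps escape "&" as "&amp;"; undo that for the common case.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  // Remove duplicates while keeping the original order.
  return Array.from(new Set(urls));
}
```

The resulting list could then be fed to the rendering application on whatever schedule the site owner chooses.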

There are many factors to consider before deciding on one. The biggest is perhaps the technology stack the website runs on. Let’s look at how pre-rendering could be achieved with an SAP Commerce Cloud backend and a Spartacus frontend, utilizing their libraries as much as possible. We will focus on Product, Category and CMS pages, as they are more SEO-sensitive than other types of pages such as Cart, Checkout and My Account.

Backend — Page Selection

On e-commerce websites powered by SAP Commerce Cloud (hereinafter “the Backend”), the contents are often ultimately managed via its database. The Backend has many built-in functions for interacting with the Product, Category and Content Page data (hereinafter “the Data”):

  • Listing all the Data.
  • Restricting public access to the Data. For example, the product page for a non-approved product cannot be accessed.
  • Carrying out tasks after creations, modifications and deletions of the Data.
  • Fetching details of the Data.

To extract the list of pages to pre-render, we need to list all the Data, and filter out the ones that are not supposed to be publicly-accessible. We also need to re-render pages or delete rendered pages as soon as possible when the Data are updated or made non-public. Sounds a lot like Solr Indexing, right? The Backend’s Solr Indexing module consists of Cron Jobs that use Indexer Queries to list/index all, updated, or deleted publicly-accessible products. It is also capable of hot-updating indexes immediately after a product has been modified.

Solr Indexing serves as inspiration for the pre-rendering Data selection strategy. The Backend could execute Cron Jobs that list all, updated, or deleted publicly-accessible Data. Then, for each record, it could ask the page rendering application to re-render the record or delete it. For immediate updates, aspect-oriented programming (AOP) or Interceptors could be used.

The out-of-the-box Solr Indexing mostly operates on Products. The pre-rendering Data selection also needs to operate on Categories and Content Pages. For each data type, we need to define what is publicly-accessible and when a record is considered updated or deleted. While the exact definitions depend on the use cases of each SAP Commerce Cloud implementation, here are some examples.

Publicly-Accessible Data

  • Product: The approvalStatus is APPROVED and today’s date is between onlineDate and offlineDate.
  • Category: Cannot be a ClassificationCategory; allowedPrincipals is null or includes anonymous; Has at least one subcategory or product.
  • Content Page: The approvalStatus is APPROVED and pageStatus is ACTIVE.
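
The example definitions above can be written down as simple predicates. The sketch below is only an illustration of the logic: the real checks would run inside the Backend, and TypeScript is used here just to keep the examples in this article in one language. The interfaces are hypothetical simplified views of the items, and treating a missing onlineDate or offlineDate as "no bound" is an assumption.

```typescript
// Hypothetical, simplified views of the Backend items; only the
// attributes named in the list above are modeled.
interface ProductView {
  approvalStatus: string;
  onlineDate?: Date; // assumption: undefined means "no lower bound"
  offlineDate?: Date; // assumption: undefined means "no upper bound"
}

interface CategoryView {
  isClassificationCategory: boolean;
  allowedPrincipals: string[] | null;
  subcategoryCount: number;
  productCount: number;
}

interface ContentPageView {
  approvalStatus: string;
  pageStatus: string;
}

function isProductPublic(p: ProductView, now: Date): boolean {
  const online = !p.onlineDate || p.onlineDate.getTime() <= now.getTime();
  const notOffline = !p.offlineDate || now.getTime() <= p.offlineDate.getTime();
  return p.approvalStatus === "APPROVED" && online && notOffline;
}

function isCategoryPublic(c: CategoryView): boolean {
  const visibleToAnonymous =
    c.allowedPrincipals === null || c.allowedPrincipals.includes("anonymous");
  const hasContent = c.subcategoryCount > 0 || c.productCount > 0;
  return !c.isClassificationCategory && visibleToAnonymous && hasContent;
}

function isContentPagePublic(page: ContentPageView): boolean {
  return page.approvalStatus === "APPROVED" && page.pageStatus === "ACTIVE";
}
```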

Updated Data

It is easy to tell whether and when a given instance was updated by looking at its modified timestamp. However, the Backend updates the modified timestamp only when a direct attribute has changed. When a nested complex type, such as a Price Row under a Product, has changed, the Product’s modified timestamp is not updated. Therefore, the modified timestamps of nested types have to be inspected as well.

  • Product: Price/Discount/Tax Rows, Stock Level Statuses, Variants
  • Category: Products[1]
  • Content Page: Content Slots, CMS Components[2]

1: Because the Category pages display data from Solr documents instead of the database, it could be more accurate to leverage the Solr Search API to tell whether a product, and hence a category, has been updated. A category may also consist of more than one page due to pagination. In such a case, we may want to pre-render all or at least the first N pages of the category. The Solr Search API could also provide the pagination information.

2: Related Item Services could be used to get all affected Content Pages when a content slot or CMS component has been modified.
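
The idea of inspecting nested modified timestamps can be sketched as taking the latest timestamp across an item and everything nested beneath it. The `Timestamped` shape and the function names below are hypothetical and do not correspond to Backend types.

```typescript
// Hypothetical shape: an item with its own modified timestamp and the
// nested items whose changes should count as a change of this item.
interface Timestamped {
  modifiedAt: Date;
  related?: Timestamped[];
}

// Latest modification time of an item and everything nested beneath it.
function effectiveModifiedAt(item: Timestamped): Date {
  let latest = item.modifiedAt.getTime();
  for (const child of item.related ?? []) {
    latest = Math.max(latest, effectiveModifiedAt(child).getTime());
  }
  return new Date(latest);
}

// A page needs re-rendering if anything changed after the last render.
function needsRerender(item: Timestamped, lastRenderedAt: Date): boolean {
  return effectiveModifiedAt(item).getTime() > lastRenderedAt.getTime();
}
```

For a Product, `related` would hold its Price Rows, Stock Levels and Variants; for a Content Page, its Content Slots and their CMS Components.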

Renderer

As mentioned earlier, no page request can trigger page rendering in our pre-rendering architecture. The Renderer is not exposed to the public. As such, although Spartacus is colloquially referred to as a “Frontend” stack, the Renderer is not precisely a “Frontend” application. Spartacus is merely a (major) part of the Renderer. If the Renderer is not publicly accessible, how can the Backend send rendering requests? Typically, the Backend and Spartacus are hosted in different containers and have no communication channel between them besides HTTP. What really makes the Renderer more than just Spartacus is how the rendering can be triggered. Needless to say, there is more than one answer, and the best one largely depends on each Spartacus implementation’s architectural landscape. Here are some ideas worth considering:

  • Expose an access-controlled API over HTTP.
  • Go through a central broker where the Backend pushes the requests while the Renderer pulls them. The broker could be something as simple as a shared file storage or something more advanced such as a Message Queue (MQ).
  • Have the Backend store the requests as a Media file similar to sitemap.xml and let the Renderer poll it.

Keep in mind that Spartacus’ out-of-the-box SSR depends on the Express engine _@nguniversal/express-engine_ and hence requires HTTP traffic to be triggered. While the first of the above options is HTTP-driven, the others are not. In those cases, Angular Universal’s Common Engine can be used to trigger the rendering process.
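
A pull-based setup could look roughly like the sketch below. Everything in it is hypothetical: `RequestBroker` is an in-memory stand-in for whatever broker is chosen, and the `render` hook is a placeholder for the actual rendering engine call, which is deliberately not shown here.

```typescript
// A request from the Backend: re-render a page or delete its rendered copy.
type RenderRequest =
  | { action: "render"; url: string }
  | { action: "delete"; url: string };

// Minimal in-memory stand-in for the broker (a real one might be an MQ).
class RequestBroker {
  private readonly queue: RenderRequest[] = [];

  push(request: RenderRequest): void {
    this.queue.push(request);
  }

  pull(): RenderRequest | undefined {
    return this.queue.shift();
  }
}

// Hypothetical hooks: `render` would wrap the rendering engine,
// `store` and `remove` would talk to the shared storage.
interface RendererHooks {
  render(url: string): Promise<string>;
  store(url: string, html: string): Promise<void>;
  remove(url: string): Promise<void>;
}

// Drain the broker once; returns the number of handled requests.
async function drainBroker(broker: RequestBroker, hooks: RendererHooks): Promise<number> {
  let handled = 0;
  let request: RenderRequest | undefined;
  while ((request = broker.pull()) !== undefined) {
    if (request.action === "render") {
      await hooks.store(request.url, await hooks.render(request.url));
    } else {
      await hooks.remove(request.url);
    }
    handled++;
  }
  return handled;
}
```

Processing requests one at a time keeps the Renderer’s load predictable; a real implementation might run a bounded number of renders in parallel instead.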

After rendering a page, the Renderer will have to:

  1. Map the rendered page to a file path of the shared storage.
  2. Write the page to the shared storage.
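
The write step can be sketched as follows, assuming the shared storage is mounted as a file system and the relative file path has already been determined. Writing to a temporary file and renaming it is one common way to keep readers from seeing a half-written page; whether the rename is atomic depends on the storage, so treat that as an assumption. The function name `writeRenderedPage` is made up for this example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write a rendered page so that readers never observe a half-written file:
// write to a temporary sibling first, then rename it into place.
function writeRenderedPage(storageRoot: string, relativePath: string, html: string): string {
  const target = path.join(storageRoot, relativePath);
  fs.mkdirSync(path.dirname(target), { recursive: true });

  const temp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(temp, html, "utf8");
  // Assumption: the rename is atomic, which holds on a local POSIX file system.
  fs.renameSync(temp, target);
  return target;
}
```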

Frontend — Serving Pages

The front-facing server (hereinafter “the Frontend”) is the one ultimately responsible for handling the requests from users. Given that the pages are already rendered, the Frontend does not need any rendering logic. It does not depend on Spartacus either, so it does not even need to be an Angular application. Its primary tasks are:

  1. Map the requested page URL to a file path of the shared storage.
  2. Read files from the shared storage and serve them. Optionally, go through another branch of logic if the file is too old (stale).
  3. Handle situations where a rendered page is not found. It could return a 404 page, or fall back to CSR with a 5XX HTTP status, for example.
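
Tasks 2 and 3 could be sketched as a single decision function. The `PageStore` interface is a hypothetical read-only view of the shared storage, and answering a missing page with the CSR shell and a 503 status is just one of the options mentioned above.

```typescript
interface StoredPage {
  html: string;
  renderedAt: Date;
}

// Hypothetical read-only view of the shared storage.
interface PageStore {
  read(filePath: string): StoredPage | undefined;
}

interface PageResponse {
  status: number;
  body: string;
  stale: boolean; // lets the caller take another branch for old pages
}

function servePage(
  store: PageStore,
  filePath: string,
  csrShell: string,
  maxAgeMs: number,
  now: Date
): PageResponse {
  const page = store.read(filePath);
  if (!page) {
    // Not pre-rendered: fall back to the CSR shell with a 5XX status.
    return { status: 503, body: csrShell, stale: false };
  }
  const stale = now.getTime() - page.renderedAt.getTime() > maxAgeMs;
  return { status: 200, body: page.html, stale };
}
```

Note that the function contains no rendering logic at all; when a page is stale, the caller could, for instance, still serve it and ask for a re-render in the background.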

Determining the file path in the shared storage, whether it is mapped from a page request URL or from a rendered page, is a (partially) common task between the Renderer and the Frontend. To reduce the chance of the two disagreeing on where a page lives, this piece of logic could be extracted into a shared library.
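
One possible shape for that shared mapping is sketched below. The layout (one `index.html` per URL path, with the query string folded into the file name) and the function name `pageUrlToStoragePath` are assumptions for illustration; any scheme works as long as both sides use the same one.

```typescript
// Map a page URL (absolute or path-only) to a relative file path in the
// shared storage. The base URL is a placeholder used only for parsing.
function pageUrlToStoragePath(pageUrl: string): string {
  // URL parsing also resolves "." and ".." segments.
  const { pathname, search } = new URL(pageUrl, "http://placeholder.invalid");

  // Encode each segment so it is safe to use as a directory name.
  const segments = pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map((segment) => encodeURIComponent(segment));

  // Keep the query string (e.g. pagination) as part of the file name.
  const query = search.length > 1 ? "_" + encodeURIComponent(search.slice(1)) : "";
  return [...segments, `index${query}.html`].join("/");
}
```

The Renderer would call this with the URL it has just rendered, and the Frontend with the URL it has just received, so both arrive at the same file.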

Storage

The storage is what brings the Renderer and the front-facing server together. There are many choices for the kind of storage to be used. When deciding on one, some of the most important factors to consider are:

  • Has to be independent of the Backend, the Renderer and the Frontend. It has to be available without corruption even when there is a failure or outage with any of these three components.
  • Has to provide fast read performance, even under heavy load. If reading a file takes longer than rendering the page, the benefits of pre-rendering may be lost.

Architecture

Below are simplified diagrams illustrating the real-time rendering and pre-rendering architectures. The Angular symbol represents the scripting component that is in charge of rendering pages. This is often the bottleneck during page rendering.

Real-Time Rendering Architecture

[Figure: Real-Time Rendering Architecture]

In this design, the bottleneck component serves the end-users. Too much traffic can easily degrade its performance. When it goes offline, so does the end-user-facing server.

Pre-Rendering Architecture

Notice that in this design, the bottleneck component is inaccessible to end users. Because of this, the component is protected from uncontrolled and potentially excessive traffic. Even when it is offline for any reason, the end users continue to be served by the Frontend, the much less vulnerable component.

[Figure: Pre-Rendering Architecture]

See Also

Check out these articles too. They make great points about SPAs and SSR and explain both well.