Sitecore Production Environment on Azure Kubernetes Services – Part 1 (Azure Infrastructure Setup)

This is the first article in the series Sitecore Production Environment on Azure Kubernetes Services. In it, I explain the Azure infrastructure setup for deploying Sitecore on Azure Kubernetes Service (AKS).

Problem Statement

In this article, let's focus on all the Azure infrastructure components that are relevant for a production environment of Sitecore on AKS. Of course, your environment may look a bit different depending on your target design, but even then most of these components will still be relevant.

We'll cover the components, what they do in our solution, and how to deploy them.

Solution Design

The production environment looks something like this. This gives a good idea of how the components are connected to each other and how they communicate.

Azure Components

We’ll focus on these components in this article:

  • Azure Container Registry
  • A Virtual Network with 4 Subnets
  • Azure Key Vault
  • Azure Content Delivery Network (CDN)
  • Azure Firewall
  • Application Gateway
  • AKS Cluster

Azure Container Registry (ACR)

I mentioned in my previous article that we were setting up multiple Sitecore environments on AKS: a test environment (SIT), an acceptance environment (UAT) and a production environment (PROD). Why is this relevant here? Mostly because some design decisions were based on it.

We created a single Azure Container Registry (ACR) that is shared by all three environments. This ensures that all environments pull from a single source of images, so an image that works in UAT will behave exactly the same in PROD.

So, we created a separate resource group in the Azure production subscription to hold just the ACR. All other components are created in another resource group in the same production subscription.

There are various examples of how to create an ACR. You can use the template from here.

You can then import the Sitecore images from the Sitecore container registry into your own ACR using a script like this:

az acr login -n $ACRName --expose-token
az acr import --name $ACRName --source scr.sitecore.com/sxp/sitecore-xp1-cd:10.1.1-ltsc2019  -t sitecorebaseimages/sitecore-xp1-cd:10.1.1-ltsc2019 --force

#Repeat the script for all other images
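Repeating the import by hand for every image gets tedious, so the commands can be generated from a list instead. The following is only a minimal Python sketch under a few assumptions: the helper name `import_command`, the registry name `myacr`, and the two-entry image list are illustrative and not part of the original setup, and the target prefix `sitecorebaseimages` simply mirrors the command above. It only prints the `az acr import` commands; it does not run them.

```python
import shlex

SOURCE_REGISTRY = "scr.sitecore.com"
TARGET_PREFIX = "sitecorebaseimages"  # same target prefix as in the command above


def import_command(acr_name: str, source_image: str) -> str:
    """Build one `az acr import` command line for a source image path with tag."""
    # Keep only the last path segment ("name:tag") for the target repository.
    name_and_tag = source_image.rsplit("/", 1)[-1]
    args = [
        "az", "acr", "import",
        "--name", acr_name,
        "--source", f"{SOURCE_REGISTRY}/{source_image}",
        "-t", f"{TARGET_PREFIX}/{name_and_tag}",
        "--force",
    ]
    return shlex.join(args)


# Illustrative list only (an assumption): extend it with the images your topology needs.
IMAGES = [
    "sxp/sitecore-xp1-cd:10.1.1-ltsc2019",
    "sxp/sitecore-xp1-cm:10.1.1-ltsc2019",
]

if __name__ == "__main__":
    for image in IMAGES:
        print(import_command("myacr", image))  # "myacr" is a placeholder ACR name
```

The printed lines can be reviewed first and then pasted into the same shell session that ran the login command.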

This ACR will also be used to store custom images developed by your Sitecore developers.

Virtual Network and Subnets

We need to create a Virtual Network (VNET) and 4 different subnets.

  • Subnet for AKS Cluster
  • Subnet for Azure Firewall
  • Subnet for Azure Application Gateway
  • Subnet for External Data Sources

This requires some planning with your network team to come up with an address space for the VNET and then for each subnet. Normally, you will want to keep a larger address space for AKS, because with the Azure CNI network plugin every pod gets an IP address from the AKS subnet.

Also, you should keep separate address ranges for the Docker bridge address and the AKS service CIDR. These should not overlap with the VNET or any of its subnets.

There are various examples of how to create a VNET and subnets. You can use the template from here.
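An address plan like this can be sanity-checked before anything is deployed. Below is a minimal Python sketch using the standard `ipaddress` module. All address ranges, the subnet names, and the function name `plan_problems` are hypothetical placeholders chosen for illustration; they are not the values used in our environment.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: every range below is a placeholder.
VNET = ipaddress.ip_network("10.10.0.0/16")
SUBNETS = {
    "aks": ipaddress.ip_network("10.10.0.0/20"),        # largest range, for nodes and pods
    "firewall": ipaddress.ip_network("10.10.16.0/26"),
    "appgateway": ipaddress.ip_network("10.10.17.0/24"),
    "datasources": ipaddress.ip_network("10.10.18.0/24"),
}
# Ranges that are kept separate from the VNET address space.
OUTSIDE = {
    "service_cidr": ipaddress.ip_network("10.200.0.0/16"),
    # The Docker bridge is given as an address with prefix; take its network.
    "docker_bridge": ipaddress.ip_interface("172.17.0.1/16").network,
}


def plan_problems(vnet, subnets, outside):
    """Return a list of human-readable problems found in the address plan."""
    problems = []
    # Every subnet has to sit inside the VNET address space.
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} is not inside the VNET")
    # No two subnets may overlap each other.
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    # Service CIDR and Docker bridge must not overlap the VNET.
    for name, net in outside.items():
        if net.overlaps(vnet):
            problems.append(f"{name} overlaps the VNET")
    return problems


if __name__ == "__main__":
    issues = plan_problems(VNET, SUBNETS, OUTSIDE)
    print("Address plan looks consistent" if not issues else "\n".join(issues))
```

Running the check against a draft plan is a cheap way to catch overlaps before sitting down with the network team.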

Azure Key Vault

Azure Key Vault is used to store all the required credentials, connection strings, etc. These will be used by the Azure DevOps pipelines in Part 4 of this series to deploy these Azure resources. Although Azure Key Vault can be configured to be accessible only from within a given VNET, that is a bit tricky if you are not using your own hosted agent on a VM inside the VNET.

So, in our case, we secured the Key Vault using RBAC and access policies. There are various examples of how to create an Azure Key Vault. You can use the template from here.

Azure Content Delivery Network (CDN)

You will want to make use of a CDN, especially in your production environment, to get faster response times from content cached closer to your users' locations.

There are various examples of how to create an Azure CDN. You can use the template from here.

Azure Firewall

In our solution, we are using Azure Firewall and Application Gateway with Web Application Firewall (WAF) in parallel. This means “Azure WAF protects inbound traffic to the web workloads, and the Azure Firewall inspects inbound traffic for the other applications. The Azure Firewall will cover outbound flows from both workload types.”

As per the official documentation about this topic – “Because of its simplicity and flexibility, running Application Gateway and Azure Firewall in parallel is often the best scenario.”

In our solution, we wanted all outgoing traffic originating from the AKS Cluster to be inspected by the Azure Firewall. You will see later in this article that we need to create our AKS cluster a bit differently in order to achieve this.

There are various examples of how to create an Azure Firewall. You can use the template from here.

Application Gateway

Azure Application Gateway is a web traffic load balancer that enables you to manage traffic to Sitecore hosted in AKS. This works with Application Gateway Ingress Controller and Web Application Firewall.

The Application Gateway Ingress Controller (AGIC) is a Kubernetes application, which makes it possible for Azure Kubernetes Service (AKS) customers to leverage Azure’s native Application Gateway L7 load-balancer to expose cloud software to the Internet. AGIC monitors the Kubernetes cluster it is hosted on and continuously updates an Application Gateway, so that selected services are exposed to the Internet. 

The Web Application Firewall (WAF) protects incoming traffic against common web attacks. The Application Gateway instance has a public frontend IP configuration that receives user requests.

There are various examples of how to create an Application Gateway. You can use the template from here.

Azure Kubernetes Services Cluster

And finally, the AKS cluster. In the production environment, we decided to keep one Linux node, one Windows node for running the CD container, and another Windows node for running the CM, ID and all other containers. All these nodes are configured with autoscaling to meet any future demand. We kept CD on a separate node to keep the size of the underlying VM smaller and to ensure it scales out on its own if the load increases.

We used the Azure CLI command “az aks create…” to create the AKS cluster. The script below creates the cluster together with the first Linux node pool. The --outbound-type flag in the script ensures that all outgoing traffic from within AKS follows a User Defined Route (UDR), which is configured in the route table to go through the Azure Firewall. All other variables are self-explanatory.

az aks create --resource-group $ResourceGroupName `
    --name $AKSName `
    --kubernetes-version $KubernetesVersion `
    --location $Location `
    --windows-admin-username $WindowsAdminUsername `
    --windows-admin-password $WindowsAdminPassword `
    --vm-set-type VirtualMachineScaleSets `
    --node-count $AKSSystemNodeCount `
    --generate-ssh-keys `
    --network-plugin azure `
    --network-policy azure `
    --outbound-type userDefinedRouting `
    --enable-managed-identity `
    --assign-identity $AKSID `
    --aad-admin-group-object-ids $AADProfileAdminGroupObjectIDs `
    --service-cidr $ServiceCIDR `
    --dns-service-ip $DNSServiceIP `
    --docker-bridge-address $DockerBridgeAddress `
    --vnet-subnet-id $AKSSubnetID `
    --appgw-id $AppGatewayId `
    --enable-addons monitoring,ingress-appgw,azure-policy `
    --enable-aad `
    --enable-azure-rbac `
    --max-pods $MaxSystemPods `
    --zones $AvailabilityZones `
    --node-vm-size $SystemNodeVMSize `
    --node-osdisk-type Ephemeral `
    --node-osdisk-size $OSDiskSize `
    --nodepool-name $SystemNodePoolName `
    --nodepool-labels $SystemNodePoolNodeLabels `
    --workspace-resource-id $LogWorkspaceId `
    --tags $SystemNodePoolNodeTags   

In the above script, note the --outbound-type flag, which is set to userDefinedRouting. This forces all traffic originating from within AKS to be routed according to our custom route, which in our case sends it through the Azure Firewall. With this outbound type, the route table has to be associated with the AKS subnet before the cluster is created.
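Two of the variables passed to az aks create, $ServiceCIDR and $DNSServiceIP, have to agree with each other: the DNS service IP must lie inside the service CIDR, and it should not be the first address of that range. The following Python sketch shows one way to check this up front. The function name `dns_ip_is_valid` and the sample values are hypothetical, and treating both the network address and the first host address as reserved is a conservative assumption.

```python
import ipaddress


def dns_ip_is_valid(service_cidr: str, dns_service_ip: str) -> bool:
    """Check that the DNS service IP is a usable address inside the service CIDR."""
    network = ipaddress.ip_network(service_cidr)
    dns_ip = ipaddress.ip_address(dns_service_ip)
    # Assumption: skip the network address and the first host address,
    # since the start of the range is used for the Kubernetes API service.
    reserved = (network.network_address, network.network_address + 1)
    return dns_ip in network and dns_ip not in reserved


if __name__ == "__main__":
    # Placeholder values, not the ones from our environment.
    print(dns_ip_is_valid("10.200.0.0/16", "10.200.0.10"))
```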

To create a Windows node pool, the script below can be used:

az aks nodepool add --resource-group $ResourceGroupName `
    --cluster-name $AKSName `
    --os-type Windows `
    --name $UserNodePoolName1 `
    --node-vm-size $UserNodeVMSize `
    --node-osdisk-type Ephemeral `
    --node-osdisk-size $OSDiskSize `
    --zones $AvailabilityZones `
    --vnet-subnet-id $AKSSubnetID `
    --mode user `
    --max-pods $MaxUserPods `
    --node-count $AKSUserNodeCount `
    --labels $UserNodePoolNodeLabels `
    --tags $UserNodePoolNodeTags 

Summary

I know this is fairly high level, but the idea is to describe the components involved rather than the exact implementation details. Of course, there are some low-level details such as Managed Accounts, Firewall rules, WAF rules, Routing rules etc. to take care of after the deployment. Ask me in the comments if you would like to know more about any specific implementation detail.

In the next article of this series, I will cover the External Data Sources setup. I will update this post with links to those articles as and when they are published. Keep an eye on my LinkedIn posts or Twitter handle for updates, if interested.

Enjoy,
Anupam

3 comments

  1. Hi Anupam, this is an awesome set of articles with lots of goodies about WCM on Azure Kubernetes through Sitecore. Thank you for creating and sharing these knowledge blogs.

    I wanted to understand what VM cores you used for the Windows/Linux VMs and what kind of load you were managing (end-user load such as 500 concurrent users, etc.). Would you please share those details? Currently I am looking at a big public-facing website (East and West US regions covered) with a peak load of at least 500 to 800 concurrent users at any given time. I would really like to understand your core infra sizing and what kind of load you were supporting (approximate numbers). I would appreciate it if you could share that info.

    Thank you again!

  2. Hi Anupam,

    Thanks for the wonderful article.
    Can you please share the low-level details like Managed Accounts, Firewall rules, WAF rules, Routing rules etc. to take care of after the deployment?

    I have to do a Sitecore 10.2 XP1 setup on Azure AKS for Prod and PreProd environments.
