Using Image Pull Secrets with Azure Container Registry

This topic may seem a bit outdated, as there are now better (and more recommended) ways to work with Azure Container Registry (ACR). However, many popular products on the market, Sitecore in my case, still use an Image Pull Secret as the primary way to pull images from ACR.

With Sitecore now supporting container-based installations, many organizations are choosing that route to be future-proof. Sitecore has been busy updating its documentation for installation and configuration on Azure Kubernetes Service (AKS) and has published a great installation guide.

But this article is NOT about Sitecore in general; it’s about how we can use Image Pull Secrets to pull images stored in a private container registry, ACR in my case.

Problem Statement

If you google this topic, you will find numerous articles, some really good ones with step-by-step guides, like this one from Microsoft. So why this article in the first place? Well, the answer is simple. Just google “ImagePullSecret not working” or “ImagePullSecret access denied” and you will find numerous discussion threads that have been going on for years.

Apparently, in a production environment, there are multiple parameters that all need to be right for this solution, which looks so simple at first glance, to work. Otherwise, you could end up spending hours, even days, troubleshooting what went wrong.

So, based on my learnings, I decided to put down a few points which might help others avoid some of those troubles.

Let’s summarize what we are trying to achieve here. We have an image stored in our Azure Container Registry, which resides in a Resource Group in a different subscription from the one where our AKS is. Essentially, we created one common ACR that is shared by all AKS environments, like SIT, Staging and Production. So the idea is simple: when we deploy a Pod in, say, the staging AKS environment, it references the ACR and pulls the image using the imagePullSecrets field in the YAML file. Our environment is complete with VNET, Application Gateway, Azure Firewall, AKS, etc., so it resembles a proper staging/production environment and not a POC setup.

How to go about it

Let’s start with the known steps. As the official documentation explains, we first need to create a Service Principal in Azure AD and give it at least the AcrPull role on the ACR.

Create Service Principal

We can use the following script to create a Service Principal. You need permission on Azure AD to be able to execute this script successfully.

az login --tenant <tenant-id>
az account set --subscription <Subscription ID in which ACR has been created>

$ACRName = 'MyOwnACR01'
$ServicePrincipalName = 'MyOwnACR-Service-Principal'

# Obtain the full registry ID for subsequent command args
$ACR_REGISTRY_ID = $(az acr show --name $ACRName --query id --output tsv)

# Create the service principal with rights to pull scoped to the registry.
$SP_PASSWD = $(az ad sp create-for-rbac --name $ServicePrincipalName --scopes $ACR_REGISTRY_ID --role acrpull --query password --output tsv)
# Look up the application (client) ID of the service principal just created.
$SP_APP_ID = $(az ad sp list --display-name $ServicePrincipalName --query "[].appId" --output tsv)

# Output the service principal's credentials; use these in your services and
# applications to authenticate to the container registry.
Write-Host "Service principal ID: $SP_APP_ID"
Write-Host "Service principal password: $SP_PASSWD"

Note down the Service principal ID and Service principal password shown in the output; we’ll use them to create the Image Pull Secret.

Create Image Pull Secret

This one is easy: just connect to your AKS cluster and run these commands.

#Switch to the Subscription where your AKS is located
az account set --subscription $SubscriptionID

#Change Kubectl context to the cluster
az aks get-credentials --resource-group <Resource Group of AKS> --name <AKS Cluster Name>

kubectl create secret docker-registry <name of the Image pull secret> --namespace <any custom namespace> --docker-server=myownacr01.azurecr.io --docker-username=<Service principal ID from previous step> --docker-password=<Service principal password from previous step>

If everything goes well, you will receive a one-line output like “<xxx> secret created”.
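To make it clearer what the command above actually stores, here is a minimal sketch that rebuilds the JSON document kept under the .dockerconfigjson key of a docker-registry secret. The function name build_dockerconfigjson and the sample user name and password are hypothetical, and the sketch assumes the values contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the JSON document that a docker-registry secret stores
# under the ".dockerconfigjson" key. The function name and the sample
# credentials are illustrative assumptions, not values from a real setup.

build_dockerconfigjson() {
  local server="$1" username="$2" password="$3"
  local auth
  # "auth" is base64("<username>:<password>") without line breaks.
  auth=$(printf '%s:%s' "$username" "$password" | base64 | tr -d '\n')
  # Assumes the values contain no characters that need JSON escaping.
  printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}\n' \
    "$server" "$username" "$password" "$auth"
}

# Print the document for a placeholder service principal.
build_dockerconfigjson "myownacr01.azurecr.io" "sp-app-id" "sp-password"
```

On a live cluster, the stored value can be decoded for comparison with kubectl get secret <name of the Image pull secret> --namespace <namespace> --output jsonpath='{.data.\.dockerconfigjson}' | base64 -d. If the server, user name or password in the decoded output is not what you expect, the secret is the first thing to fix.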

After this step, most of the official documentation asks you to deploy a sample YAML to test whether you can fetch the image from ACR, like this example from the official documentation.

apiVersion: v1
kind: Pod
metadata:
  name: my-awesome-app-pod
  namespace: awesomeapps
spec:
  containers:
    - name: main-app-container
      image: myregistry.azurecr.io/my-awesome-app:v1
      imagePullPolicy: IfNotPresent
  imagePullSecrets:
    - name: acr-secret

What can go wrong, right? Well, it depends! If you are in luck, you will see that AKS was able to fetch your image from ACR and the pod is running, but you may also receive one of the many possible errors. In my case, it was the latter, and I ran into ErrImagePull, ImagePullBackOff and many more. So let me walk you through some of those, so that you can avoid some sleepless nights due to such errors.

Troubleshooting

When I found that my AKS cluster was unable to pull images from my ACR, I went through the following troubleshooting steps.

Pull Images from Public Docker/MS Library

My first step of troubleshooting was to isolate the problem: was something wrong with AKS or with ACR? So I changed the image URL to point to a public Docker Hub image and deployed the YAML, since accessing images from such a public registry doesn’t need an Image Pull Secret. But this time too, I was greeted with this nice error message.

Failed to pull image "redis:4.0.14-alpine": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:4.0.14-alpine": failed to resolve reference "docker.io/library/redis:4.0.14-alpine": failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/4.0.14-alpine": EOF

So AKS was unable to reach registry-1.docker.io; it must be the Azure Firewall. I added an Application Rule there to allow access to *registry-1.docker.io and tried to deploy the YAML again. And this time:

Failed to pull image "redis:4.0.14-alpine": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/redis:4.0.14-alpine": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e3/e3dd0e49bca555d559ca2e97f06a1efa108ebd230fddcb17606723994f18ae3b/data?verify=1628845779-bWSgTL%2FQnESq%2BF%2FWFgpKqy5C%2Bv4%3D": EOF

So, you get the idea. I kept trying, adding each host name it was trying to access to the Azure Firewall allow rules. Finally, this is what my application rule contained before I could see the image getting pulled successfully by AKS.

*auth.docker.io,*cloudflare.docker.io,*cloudflare.docker.com,*registry-1.docker.io
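Since each failed pull names the host that could not be reached, collecting those hosts can be scripted rather than read off by eye. The following is a minimal sketch, assuming the error text is available on standard input (for example from the events shown by kubectl describe pod); the function name hosts_from_pull_errors is hypothetical, and the sample input is a shortened copy of the two messages quoted above.

```shell
#!/usr/bin/env bash
# Sketch: pull the host names out of "Failed to pull image" messages so they
# can be reviewed as firewall allow-rule candidates. The function name is an
# illustrative assumption.

hosts_from_pull_errors() {
  # Find every https:// URL, keep only the host part, and de-duplicate.
  grep -oE 'https://[^/" ]+' | sed 's|https://||' | sort -u
}

# Shortened versions of the two error messages quoted above.
cat <<'MESSAGES' | hosts_from_pull_errors
failed to do request: Head "https://registry-1.docker.io/v2/library/redis/manifests/4.0.14-alpine": EOF
failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/": EOF
MESSAGES
```

Each pull attempt only reveals the next blocked host, so the list still has to be built up over several attempts, as described above.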

Pull Images from ACR

Fresh from the first success, I cross-verified that ACR was allowed in Azure Firewall using the Service Tag, imported an image into my ACR, and tried to pull it from ACR this time, using the Image Pull Secret.

But well, this time, I was greeted with something new:

Failed to pull image "MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0": rpc error: code = Unknown desc = failed to pull and unpack image "MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0": failed to resolve reference "MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized

So it looked like AKS was able to reach ACR but couldn’t authenticate successfully. As you can imagine, I suspected the Image Pull Secret created earlier. But to keep any firewall-related issues out of the picture for the time being, I also added another (*) rule to allow all traffic.

Getting Image Pull Secret Right

I deleted and recreated the Image Pull Secret multiple times, putting the values in no quotes, single quotes, double quotes; nothing worked. I was able to pull the images with Docker after running docker login <acr> -u <service principal id> -p <service principal password>, so the service principal surely had the required permissions to pull the image. I won’t bore you with more details of the troubleshooting; here is the final outcome. We need to get these 2 points right:

  • Your AKS Kubernetes server version and kubectl client version must match: This is so easy to get wrong. Even though, officially, the client and server versions may differ by one minor version, I recommend using the exact same version for both. We tend to keep our client up to date while the server is running some other version. You can check this quickly with the kubectl version command, which shows both the client and server versions. In my experience, if these two don’t match, the Image Pull Secret will still get created without any complaints, but won’t be able to fetch images.
  • You should create your Image Pull Secret in the same namespace in which you are deploying your Pod: This can easily be overlooked, so if you pass the namespace name as “sitecore-sit” when creating the Image Pull Secret, ensure that the YAML file also contains exactly the same namespace.
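The version comparison from the first point can be sketched as a small helper that takes the two version strings reported by kubectl version. The function names minor_of and minor_skew and the sample versions are hypothetical; substitute whatever your own client and cluster report.

```shell
#!/usr/bin/env bash
# Sketch: compare the minor versions of two Kubernetes version strings,
# such as the client and server versions reported by "kubectl version".
# The function names and sample versions are illustrative assumptions.

# Print the minor component of a version, e.g. "v1.21.2" gives "21".
minor_of() {
  local v="${1#v}"           # drop a leading "v" if present
  v="${v#*.}"                # drop the major component and the first dot
  printf '%s\n' "${v%%.*}"   # keep everything before the next dot
}

# Print the absolute difference between the two minor versions.
minor_skew() {
  local a b
  a=$(minor_of "$1")
  b=$(minor_of "$2")
  if [ "$a" -ge "$b" ]; then echo $((a - b)); else echo $((b - a)); fi
}

# Sample values; replace them with the versions your own setup reports.
client="v1.22.1"
server="v1.20.7"
echo "client=$client server=$server minor skew=$(minor_skew "$client" "$server")"
```

A skew of 0 matches the recommendation above. For the second point, kubectl get secret <name of the Image pull secret> --namespace <namespace of the Pod> is a quick way to confirm that the secret exists in the namespace the Pod is deployed to.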

Integrate AKS with ACR

Now this is a bit off topic, as the whole idea of this article is to pull images using an Image Pull Secret. But I am mentioning it here because it was one of the troubleshooting steps. Since I was getting the 401 Unauthorized error, I thought maybe I should try to get the image from ACR without using the Image Pull Secret first, just to isolate whether the problem was indeed with the Image Pull Secret itself. So I executed this script, since my ACR and AKS are in two different subscriptions:


$AKSResourceGroupName = 'AKSRG01'
$AKSName = 'AKS01'
$AKSSubscriptionID = '<Subscription ID of AKS>'
$ACRResourceGroupName = 'ACRRG01'
$ACRName = 'myownacr01'
$ACRSubscriptionID = '<Subscription ID of ACR>'

az login
az account set --subscription $AKSSubscriptionID
# link AKS to ACR
Write-Host "--- Linking AKS to ACR ---" -ForegroundColor Cyan
$ACRResourceID = "/subscriptions/$ACRSubscriptionID/resourceGroups/$ACRResourceGroupName/providers/Microsoft.ContainerRegistry/registries/$ACRName"
az aks update -n $AKSName -g $AKSResourceGroupName --attach-acr $ACRResourceID

I deployed the YAML again and, guess what, I received the same 401 Unauthorized error. So it was time to move on and focus on getting the image fetched using the Image Pull Secret again.

Getting the YAML Format Right

Now that the Image Pull Secret was the way we wanted it, I was still unable to fetch the image from ACR. The culprit this time: a little formatting error in the YAML. The imagePullSecrets key MUST be at the same level as containers, directly inside spec. Also, I was opening the file in Notepad++ at times, so to indent, use spaces, not tabs!

    spec:
      containers:
      - name: solr
        image: MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0
        ports:
        - containerPort: 80
      imagePullSecrets:
      - name: sitecore-docker-registry

At this point, I was able to pull the image successfully from my ACR… Yay!
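The two formatting rules above (imagePullSecrets indented exactly like containers, and spaces rather than tabs) can be checked before deploying. This is a minimal sketch that reads a manifest from standard input; the function name check_pull_secret_indent is hypothetical, and it is a plain text comparison rather than a YAML parser, so it only suits simple single-Pod manifests like the one shown here.

```shell
#!/usr/bin/env bash
# Sketch: check a manifest read from stdin for the two formatting problems
# described above. The function name is an illustrative assumption, and the
# check is a simple text comparison, not a YAML parser.

check_pull_secret_indent() {
  awk '
    /\t/ { tabs = 1 }                                   # remember any tab character
    /^ *containers:/       { c = match($0, /[^ ]/) }    # column where the key starts
    /^ *imagePullSecrets:/ { s = match($0, /[^ ]/) }    # column where the key starts
    END {
      if (tabs)   { print "tabs found";               exit 1 }
      if (!s)     { print "imagePullSecrets missing"; exit 1 }
      if (c != s) { print "indentation mismatch";     exit 1 }
      print "ok"
    }'
}

# The spec fragment from above, indented with spaces only.
check_pull_secret_indent <<'YAML'
spec:
  containers:
  - name: solr
    image: MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0
    ports:
    - containerPort: 80
  imagePullSecrets:
  - name: sitecore-docker-registry
YAML
```

To check a file instead, redirect it into the function, for example check_pull_secret_indent < pod.yaml, where pod.yaml stands for your own manifest.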

Azure Firewall Rules

Now it was time to remove the (*) rule from Azure Firewall. So I removed the rule and tried to pull the image again and, surprise surprise, I received a similar error.

Failed to pull image "MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0": rpc error: code = Unknown desc = Error response from daemon: Head https://MyOwnacr01.azurecr.io/sitecorebaseimageses/external/solr:8.4.0: unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.

Now, this was frustrating! I was sure that this had nothing to do with authentication or authorization anymore, as just putting an “allow all” rule in the firewall made it work.

So I enabled diagnostic logging for the firewall and started to monitor the denied traffic. To my surprise, even though I was trying to pull an image that I had imported into my ACR, when AKS tried to pull the image after I deployed the YAML file, it was trying to reach many more URLs, which were getting blocked by the firewall. So, like I did with the public Docker images, I started to add allow rules for each of those URLs, one by one. I tested the YAML deployment after each additional rule and, sure enough, it was still blocking others.

Finally, I ended up opening these URLs in the firewall to be able to fetch images that were imported from the Sitecore registry. Of course, your URLs will be different depending upon your images.

*md-l1pmcrh4bxqx.z27.blob.storage.azure.net
*.microsoft.com
*.cloud.sitecore.net
f7e993c0be3a04e1f8075b2.file.core.windows.net
*.sitecore.com

And this is what my Application Rule Collection looked like.

And after these settings, when I finally deployed the YAML, I saw the status “Running”. What a relief!

Final Thoughts

It took me quite some time to figure this out, so I wanted to pen these points down to save you the same effort and anxious hours of troubleshooting. In summary, focus on these:

  • Use the same Kubernetes version in client and server
  • Create the Image Pull Secret in the same namespace where you are deploying your Pod
  • Check your Azure Firewall logs to see which requests are getting blocked when the Pod is being deployed, and add them to the allow rules.

Hope this helps,

Thanks,
Anupam
