# Infrastructure
This document describes the infrastructure used to deploy the production and staging environments of `artifacthub.io`.

`artifacthub.io` runs on AWS, using an account owned by the CNCF and managed by the Artifact Hub maintainers. The following services are being used at the moment:
- Route 53: the `artifacthub.io` domain and its associated DNS entries are managed from Route 53. The most important entry is the `A` record for `artifacthub.io`, which points to the domain name of a CloudFront distribution (see the sketch after this list).
- Certificate Manager: the SSL/TLS certificates used by other services like CloudFront and Load Balancing are provisioned and managed by Certificate Manager. Certificates are configured to be renewed automatically.
- CloudFront: all static assets and API endpoints traffic is delivered from CloudFront, which caches according to the origin cache headers. The main origin for each distribution is a load balancer that points to a pool of `hub` instances. Another S3-based origin hosts the static assets for the maintenance page. A set of behaviors defines more explicitly how some special paths and errors should be handled.
- Load Balancing: an application load balancer distributes traffic among the available `hub` instances. This load balancer acts as the main origin for the corresponding CloudFront distribution. It is created and managed automatically by the AWS Load Balancer Controller based on the `hub` ingress resource.
- Firewall Manager: both CloudFront and the load balancer have a set of web ACL rules associated with them to rate limit and block certain traffic patterns.
- Container Registry: a Docker image for each of the Artifact Hub components is built and pushed to ECR for each commit to the `master` branch via the CI workflow. These images are the ones used by the `artifacthub.io` production and staging deployments. These images are NOT publicly available. In addition to them, we also build images for each release version, which are published to Docker Hub and made publicly available.
- Elastic Kubernetes Service: the Artifact Hub components are deployed on a Kubernetes cluster managed by EKS. Each environment (production and staging) runs on a separate cluster. Installation and upgrades are done using the official Helm chart provided by the project.
- Relational Database Service (RDS): the PostgreSQL instance used as the main datastore for Artifact Hub is managed by RDS. Each environment has its own database instance running in a Multi-AZ setup.
- Simple Email Service: Artifact Hub needs an SMTP server configured to be able to send emails. In the `artifacthub.io` deployments this is set up using SES.
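As an illustration of the Route 53 entry mentioned above, the alias `A` record can be created or updated with the AWS CLI. This is a minimal sketch, not the exact record used by `artifacthub.io`: the `<HOSTED_ZONE_ID>` and `<DISTRIBUTION_DOMAIN>` placeholders are assumptions, while `Z2FDTNDATAQYW2` is the fixed hosted zone id AWS uses for CloudFront alias targets.

```console
aws route53 change-resource-record-sets \
    --hosted-zone-id <HOSTED_ZONE_ID> \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "artifacthub.io.",
          "Type": "A",
          "AliasTarget": {
            "HostedZoneId": "Z2FDTNDATAQYW2",
            "DNSName": "<DISTRIBUTION_DOMAIN>.cloudfront.net",
            "EvaluateTargetHealth": false
          }
        }
      }]
    }'
```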
## Setup

This section describes how to bootstrap the `artifacthub.io` deployment.
We’ll create a Kubernetes cluster in EKS using `eksctl`. The following command will spin up the cluster as well as all the associated resources required, like the VPC, etc.
```console
eksctl create cluster \
    --name=<CLUSTER_NAME> \
    --version=<KUBERNETES_VERSION> \
    --region=<AWS_REGION> \
    --managed \
    --node-type=m5.xlarge \
    --nodes=6 \
    --nodes-min=6 \
    --nodes-max=10 \
    --alb-ingress-access
```
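Once the cluster is ready, we can configure `kubectl` to talk to it and verify that the nodes have joined. A minimal sketch, assuming the AWS CLI is set up with credentials for the account:

```console
aws eks update-kubeconfig --name <CLUSTER_NAME> --region <AWS_REGION>
kubectl get nodes
```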
The AWS Load Balancer Controller will take care of creating the application load balancer from the corresponding K8S ingress resource. Please follow the official installation instructions to install it on the cluster.
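For reference, the Helm-based installation from the eks-charts repository usually looks like the sketch below; check the official instructions for the current prerequisites (IAM policy, service account, etc.), as they may change between versions:

```console
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    --namespace kube-system \
    --set clusterName=<CLUSTER_NAME> \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller
```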
We need to apply the readiness gate inject label to the namespace we’ll use to install Artifact Hub. This allows the pods to signal that they have been registered with the application load balancer and are healthy to receive traffic.
```console
kubectl create namespace <NAMESPACE_NAME>
kubectl label namespace <NAMESPACE_NAME> elbv2.k8s.aws/pod-readiness-gate-inject=enabled
```
Before creating a PostgreSQL instance in RDS, we’ll set up a security group and a subnet group for it. The security group will contain an inbound rule allowing traffic to the PostgreSQL service port from the EKS cluster nodes. The subnet group will list only the private subnets attached to the VPC that `eksctl` created for our Kubernetes cluster. Once both are ready, we can proceed with the RDS database creation, as sketched below.
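A minimal sketch of those steps with the AWS CLI; all the identifiers, names and sizes below are illustrative assumptions, not the actual values used by `artifacthub.io`:

```console
# Security group allowing PostgreSQL traffic from the cluster nodes
aws ec2 create-security-group \
    --group-name artifacthub-db \
    --description "Artifact Hub database" \
    --vpc-id <VPC_ID>
aws ec2 authorize-security-group-ingress \
    --group-id <DB_SECURITY_GROUP_ID> \
    --protocol tcp \
    --port 5432 \
    --source-group <EKS_NODES_SECURITY_GROUP_ID>

# Subnet group listing the private subnets of the cluster's VPC
aws rds create-db-subnet-group \
    --db-subnet-group-name artifacthub-db \
    --db-subnet-group-description "Artifact Hub database" \
    --subnet-ids <PRIVATE_SUBNET_ID_1> <PRIVATE_SUBNET_ID_2> <PRIVATE_SUBNET_ID_3>

# PostgreSQL instance in a Multi-AZ setup
aws rds create-db-instance \
    --db-instance-identifier artifacthub-<ENVIRONMENT> \
    --engine postgres \
    --db-instance-class db.m5.large \
    --allocated-storage 100 \
    --multi-az \
    --db-subnet-group-name artifacthub-db \
    --vpc-security-group-ids <DB_SECURITY_GROUP_ID> \
    --master-username <DB_USER> \
    --master-user-password <DB_PASSWORD>
```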
The `artifacthub.io` deployment is installed using the official Helm chart provided by the project. In addition to the default chart values, we provide a file with some specific values for the staging and production environments. These are not officially recommended values for production deployments, just the ones used by `artifacthub.io`. On top of those, some extra values containing credentials and other pieces of information are provided using `--set` when running the installation command.
```console
helm install \
    --values values-<ENVIRONMENT>.yaml \
    --namespace <NAMESPACE_NAME> \
    --set imageTag=<GIT_SHA> \
    --set creds.dockerUsername=<DOCKER_USERNAME> \
    --set creds.dockerPassword=<DOCKER_PASSWORD> \
    --set db.user=<DB_USER> \
    --set db.host=<DB_HOST> \
    --set db.password=<DB_PASSWORD> \
    --set email.fromName="Artifact Hub" \
    --set email.from=hub@artifacthub.io \
    --set email.replyTo=no-reply@artifacthub.io \
    --set email.smtp.host=<SMTP_HOST> \
    --set email.smtp.port=<SMTP_PORT> \
    --set email.smtp.username=<SMTP_USERNAME> \
    --set email.smtp.password=<SMTP_PASSWORD> \
    --set dbMigrator.job.image.repository=<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/db-migrator \
    --set hub.deploy.image.repository=<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/hub \
    --set hub.ingress.annotations."alb\.ingress\.kubernetes\.io/certificate-arn"=<CERTIFICATE_ARN> \
    --set hub.ingress.annotations."alb\.ingress\.kubernetes\.io/wafv2-acl-arn"=<ACL_ARN> \
    --set hub.server.cookie.hashKey=<COOKIE_HASHKEY> \
    --set hub.server.cookie.secure=true \
    --set hub.server.csrf.authKey=<CSRF_AUTHKEY> \
    --set hub.server.csrf.secure=true \
    --set hub.server.xffIndex=-2 \
    --set hub.server.oauth.github.clientID=<GITHUB_CLIENT_ID> \
    --set hub.server.oauth.github.clientSecret=<GITHUB_CLIENT_SECRET> \
    --set hub.server.oauth.google.clientID=<GOOGLE_CLIENT_ID> \
    --set hub.server.oauth.google.clientSecret=<GOOGLE_CLIENT_SECRET> \
    --set hub.analytics.gaTrackingID=<GA_TRACKING_ID> \
    --set tracker.cronjob.image.repository=<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/tracker \
    --set scanner.cronjob.image.repository=<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/scanner \
    <RELEASE_NAME> .
```
For more information about any of the values provided, please check the values schema.
Once all the pods are up and running and the application load balancer corresponding to the `hub` ingress has been provisioned, we can update the origin in the CloudFront distribution and point it to the new load balancer.
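The DNS name of the new load balancer can be read from the ingress status. A minimal sketch, assuming the ingress resource created by the chart is named `hub` (the actual name depends on the release):

```console
kubectl get ingress hub -n <NAMESPACE_NAME> \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```

That hostname is the value the distribution's main origin should point to; the origin itself can be updated from the CloudFront console or with `aws cloudfront update-distribution`.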