Customizing the AWS EKS Cluster Worker Node Group Template
Recently I tested out Amazon Web Services' new managed Kubernetes offering, EKS. Amazon is great about providing detailed tutorials for getting started with its services, and EKS is no exception. Check out the tutorial here.
One of the steps in creating an EKS cluster is using CloudFormation to create the cluster's worker nodes. Tutorial here.
After creating a cluster and putting it to use, I was happy to finally have an AWS-managed K8s cluster where all I needed to worry about was paying for the EC2 instances. A few days later, I realized the cluster was running the maximum number of worker nodes (NodeAutoScalingGroupMaxSize) specified in the CloudFormation template. When I contacted support, they pointed out that the template has a setting to control the desired capacity (DesiredCapacity). Below are the modifications to the template that add a parameter for specifying the DesiredCapacity of your EKS cluster's node group.
First, I added a parameter named ClusterDesiredCapacity:
ClusterDesiredCapacity:
  Type: Number
  Description: Desired size of Node Group ASG.
  Default: 2
Next (and lastly), I changed DesiredCapacity under the NodeGroup section from its default of !Ref NodeAutoScalingGroupMaxSize to the new parameter we just added, !Ref ClusterDesiredCapacity.
Here is the entire NodeGroup section with the change applied, for reference:
NodeGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    DesiredCapacity: !Ref ClusterDesiredCapacity
    LaunchConfigurationName: !Ref NodeLaunchConfig
    MinSize: !Ref NodeAutoScalingGroupMinSize
    MaxSize: !Ref NodeAutoScalingGroupMaxSize
    VPCZoneIdentifier: !Ref Subnets
    Tags:
      - Key: Name
        Value: !Sub "${ClusterName}-${NodeGroupName}-Node"
        PropagateAtLaunch: 'true'
      - Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
        Value: 'owned'
        PropagateAtLaunch: 'true'
  UpdatePolicy:
    AutoScalingRollingUpdate:
      MinInstancesInService: '1'
      MaxBatchSize: '1'
Note: Further below I've included the entire template, as well as a link to the template hosted in S3.
Now that we have a new CloudFormation template, we can create a new stack. When creating the stack in the console, you will now see a field for ClusterDesiredCapacity under Other Parameters.
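If you prefer the CLI to the console, here is a minimal sketch of creating the stack with the AWS CLI. The stack name, template URL, and all parameter values below are placeholders; substitute your own:

# Create the worker node stack from the customized template (all values are examples).
aws cloudformation create-stack \
  --stack-name my-eks-worker-nodes \
  --template-url https://s3.amazonaws.com/your-bucket/amazon-eks-nodegroup.yaml \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=ClusterName,ParameterValue=my-eks-cluster \
    ParameterKey=NodeGroupName,ParameterValue=my-eks-nodes \
    ParameterKey=ClusterControlPlaneSecurityGroup,ParameterValue=sg-0123456789abcdef0 \
    ParameterKey=KeyName,ParameterValue=my-key-pair \
    ParameterKey=NodeImageId,ParameterValue=ami-0123456789abcdef0 \
    ParameterKey=VpcId,ParameterValue=vpc-0123456789abcdef0 \
    ParameterKey=Subnets,ParameterValue='subnet-aaaa1111\,subnet-bbbb2222' \
    ParameterKey=ClusterDesiredCapacity,ParameterValue=2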
After creating a new stack, the role ARN for the worker node group will have changed, so you will need to modify the aws-auth ConfigMap to allow the new worker nodes to join your cluster. Here is an example of that ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: [PUT_THE_ROLEARN_GENERATED_BY_CLOUDFORMATION_HERE]
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::123456789123:user/youreanawesomeuser
      username: arn:aws:iam::123456789123:user/youreanawesomeuser
      groups:
        - system:masters
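The new role ARN comes from the stack's NodeInstanceRole output (see the Outputs section at the bottom of the template). Here is a minimal sketch of fetching it and applying the change, assuming the ConfigMap above is saved as aws-auth-cm.yaml and using the example stack name from earlier:

# Pull the NodeInstanceRole ARN from the new stack's outputs (stack name is an example).
aws cloudformation describe-stacks \
  --stack-name my-eks-worker-nodes \
  --query "Stacks[0].Outputs[?OutputKey=='NodeInstanceRole'].OutputValue" \
  --output text

# Apply the updated ConfigMap and watch the new workers register.
kubectl apply -f aws-auth-cm.yaml
kubectl get nodes --watch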
The final thing to watch out for is security groups. The stack creates a new security group for the nodes, so any rules that referenced the old node security group will no longer match. If you need to allow your worker nodes to access other AWS resources, now is the time to update those security groups.
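For example, if your workers needed to reach a database on port 5432, a rule like the following could be added to the database's security group. This is only a sketch; both group IDs are placeholders:

# Allow the new worker node security group to reach a database on port 5432.
# Both group IDs below are placeholders for your own.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0databasegroup00000 \
  --protocol tcp \
  --port 5432 \
  --source-group sg-0newworkernodes0000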
Customized Amazon S3 template URL
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Amazon EKS - Node Group - Released 2018-08-30'
Parameters:
  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName
  NodeImageId:
    Type: AWS::EC2::Image::Id
    Description: AMI id for the node instances.
  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String
    Default: t2.medium
    AllowedValues:
      - t2.small
      - t2.medium
      - t2.large
      - t2.xlarge
      - t2.2xlarge
      - m3.medium
      - m3.large
      - m3.xlarge
      - m3.2xlarge
      - m4.large
      - m4.xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m4.10xlarge
      - m5.large
      - m5.xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.12xlarge
      - m5.24xlarge
      - c4.large
      - c4.xlarge
      - c4.2xlarge
      - c4.4xlarge
      - c4.8xlarge
      - c5.large
      - c5.xlarge
      - c5.2xlarge
      - c5.4xlarge
      - c5.9xlarge
      - c5.18xlarge
      - i3.large
      - i3.xlarge
      - i3.2xlarge
      - i3.4xlarge
      - i3.8xlarge
      - i3.16xlarge
      - r3.xlarge
      - r3.2xlarge
      - r3.4xlarge
      - r3.8xlarge
      - r4.large
      - r4.xlarge
      - r4.2xlarge
      - r4.4xlarge
      - r4.8xlarge
      - r4.16xlarge
      - x1.16xlarge
      - x1.32xlarge
      - p2.xlarge
      - p2.8xlarge
      - p2.16xlarge
      - p3.2xlarge
      - p3.8xlarge
      - p3.16xlarge
    ConstraintDescription: Must be a valid EC2 instance type
  NodeAutoScalingGroupMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG.
    Default: 1
  NodeAutoScalingGroupMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG.
    Default: 3
  ClusterDesiredCapacity:
    Type: Number
    Description: Desired size of Node Group ASG.
    Default: 2
  NodeVolumeSize:
    Type: Number
    Description: Node volume size
    Default: 20
  ClusterName:
    Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster.
    Type: String
  BootstrapArguments:
    Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
    Default: ""
    Type: String
  NodeGroupName:
    Description: Unique identifier for the Node Group.
    Type: String
  ClusterControlPlaneSecurityGroup:
    Description: The security group of the cluster control plane.
    Type: AWS::EC2::SecurityGroup::Id
  VpcId:
    Description: The VPC of the worker instances
    Type: AWS::EC2::VPC::Id
  Subnets:
    Description: The subnets where workers can be created.
    Type: List<AWS::EC2::Subnet::Id>
Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: "EKS Cluster"
        Parameters:
          - ClusterName
          - ClusterControlPlaneSecurityGroup
      - Label:
          default: "Worker Node Configuration"
        Parameters:
          - NodeGroupName
          - NodeAutoScalingGroupMinSize
          - NodeAutoScalingGroupMaxSize
          - NodeInstanceType
          - NodeImageId
          - NodeVolumeSize
          - KeyName
          - BootstrapArguments
      - Label:
          default: "Worker Network Configuration"
        Parameters:
          - VpcId
          - Subnets
Resources:
  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
        - !Ref NodeInstanceRole
  NodeInstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      VpcId: !Ref VpcId
      Tags:
        - Key: !Sub "kubernetes.io/cluster/${ClusterName}"
          Value: 'owned'
  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535
  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535
  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535
  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: !Ref ClusterDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MinSize: !Ref NodeAutoScalingGroupMinSize
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      VPCZoneIdentifier: !Ref Subnets
      Tags:
        - Key: Name
          Value: !Sub "${ClusterName}-${NodeGroupName}-Node"
          PropagateAtLaunch: 'true'
        - Key: !Sub 'kubernetes.io/cluster/${ClusterName}'
          Value: 'owned'
          PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'
  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
        - !Ref NodeSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: !Ref NodeVolumeSize
            VolumeType: gp2
            DeleteOnTermination: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments}
            /opt/aws/bin/cfn-signal --exit-code $? \
              --stack ${AWS::StackName} \
              --resource NodeGroup \
              --region ${AWS::Region}
Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn