Getting an Interactive Shell on ECS/Fargate

AppPack is built on Elastic Container Service (ECS), specifically the Fargate variant. Fargate is awesome because you don't have to deal with the underlying servers. It just runs containers for you and that's it.

One big drawback of the initial release of Fargate was that you had no way to log in to your system to debug issues. This got fixed last year with the introduction of Amazon ECS Exec. The introductory blog post does a good job of explaining the concept and how to use it. The high-level overview is:

  • When you start the task, you add the enableExecuteCommand argument. This will start an additional agent process in the container which allows you to connect to it.
  • You install the AWS CLI and Session Manager plugin.
  • You run aws ecs execute-command to connect.
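
Before going further, it can help to confirm that both client-side tools from the list above are actually installed. This is a minimal sketch, not part of the original workflow: missing_tools is a hypothetical helper name, and it assumes the plugin is exposed on your PATH as session-manager-plugin.

```shell
# Hypothetical helper: print the name of each listed command not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumption: the Session Manager plugin is installed as "session-manager-plugin".
missing=$(missing_tools aws session-manager-plugin)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Install these before running aws ecs execute-command:" $missing
fi
```

If nothing is printed, both commands were found.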

Pretty cool, right? In practice, it gets harder.

Finding the Task

The aws ecs execute-command command takes a few arguments. It needs a task ID/ARN and a cluster name/ARN. If you want to connect to a task that you haven't started yourself (ECS services, scheduled tasks, etc.), you'll have to look up the task ID yourself. To do that, you'll need to know the cluster your task is running on and the family of your task:

$ aws ecs list-tasks \
  --cluster=your-cluster-name \
  --family=your-task-family \
  --query=taskArns
[
  "arn:aws:ecs:us-east-1:123456789012:task/your-cluster-name/0a1ceb4746f842dbba092b9bf5dd49d6",
  "arn:aws:ecs:us-east-1:123456789012:task/your-cluster-name/15e8068b6a464ba394bbb83f6ee2c1c3",
  "arn:aws:ecs:us-east-1:123456789012:task/your-cluster-name/4474e12e93494cd985aab61f9632eca4"
]
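
The task ID is the last path segment of each ARN. If you are scripting the lookup, a sketch like the following can pull it out. Both function names are hypothetical, and first_task_arn is only an illustration: it assumes working AWS credentials and simply takes the first task the API returns.

```shell
# Print the bare task ID: everything after the last "/" in the ARN.
task_id_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: print the ARN of the first task in a family.
# Usage: first_task_arn your-cluster-name your-task-family
# Needs AWS credentials; with --output text an empty result prints "None".
first_task_arn() {
  aws ecs list-tasks \
    --cluster "$1" \
    --family "$2" \
    --query 'taskArns[0]' \
    --output text
}

task_id_from_arn "arn:aws:ecs:us-east-1:123456789012:task/your-cluster-name/0a1ceb4746f842dbba092b9bf5dd49d6"
```

The last line prints 0a1ceb4746f842dbba092b9bf5dd49d6, which is the form execute-command accepts for its task argument (the full ARN works as well).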

Connecting to the Task

Now that you have a task ID, you can plug it into your shell command (if you are running multiple containers in a task, you'll need to specify --container too):

$ aws ecs execute-command \
  --cluster=your-cluster-name \
  --interactive \
  --task=0a1ceb4746f842dbba092b9bf5dd49d6 \
  --command=/bin/bash

The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.


Starting session with SessionId: ecs-execute-command-0a044175262f7d9cd
root@ip-10-100-2-8:~#

...and you're in!

Using Ephemeral Containers

This is great: you can connect to containers running in a private network without exposing any ports or modifying your container in any way. But is opening a shell on your production service containers really what you want to do? One of the benefits of containers is that they are cheap to spin up and tear down. If you want a shell to do anything other than inspect the live running service, you should be running an isolated container.

Naively, you can do this by starting a task that runs something like tail -f /dev/null or sleep infinity, connecting to it, and finally stopping the task. But you're almost certain to end up with orphaned tasks because a user forgot to stop the task, shut down their computer with a session still open, or any number of other reasons.

What you really want is for the task to run only as long as the user is connected to it, plus a failsafe that kills any ephemeral task after a specified number of hours to avoid racking up AWS charges.

We needed to implement this in AppPack in a way that would be accessible no matter what buildpack(s) the user was using. So when you run apppack -a myapp shell, the CLI will start a task using this devious command:

[
"/bin/sh",
"-c",
"STOP=$(($(date +%s)+43200)); sleep 60; while true; do EXECCMD=\"$(pgrep -f ssm-session-worker\\ ecs-execute-command | wc -l)\"; test \"$EXECCMD\" -eq 0 && exit; test \"$STOP\" -lt \"$(date +%s)\" && exit 1; sleep 30; done"
]

It looks slightly less terrifying if we split it up and comment it:

STOP=$(($(date +%s)+43200)) # set a deadline: current timestamp + 12h
sleep 60 # wait 60s for the initial Session Manager connection
while true; do # loop forever
  EXECCMD="$(pgrep -f ssm-session-worker\ ecs-execute-command | wc -l)" # count the running session worker processes
  test "$EXECCMD" -eq 0 && exit # if no session workers are running, stop
  test "$STOP" -lt "$(date +%s)" && exit 1 # if it has been running for >12h, stop with an error
  sleep 30 # sleep for 30 seconds
done # end of loop

Now this task will essentially self-destruct when the user disconnects.
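
To experiment with the same logic without waiting on the real timings, the loop can be wrapped in a function that takes the deadline, grace period, poll interval, and session-counting command as parameters. This is a sketch for illustration only, not what the AppPack CLI ships; session_watchdog and count_sessions are hypothetical names.

```shell
# Hypothetical wrapper around the loop above.
# Usage: session_watchdog MAX_SECONDS GRACE_SECONDS POLL_SECONDS COUNT_CMD...
# COUNT_CMD must print the number of active sessions.
# Note: the variables are not local, so they stay set in the calling shell.
session_watchdog() {
  wd_max=$1; wd_grace=$2; wd_poll=$3
  shift 3
  wd_stop=$(( $(date +%s) + wd_max ))   # deadline as a Unix timestamp
  sleep "$wd_grace"                      # give the first session time to connect
  while true; do
    wd_sessions=$("$@")                  # ask how many sessions are open
    test "$wd_sessions" -eq 0 && return 0             # everyone disconnected
    test "$wd_stop" -lt "$(date +%s)" && return 1     # deadline passed
    sleep "$wd_poll"
  done
}

# Same process match as the original command.
count_sessions() {
  pgrep -f ssm-session-worker\ ecs-execute-command | wc -l
}

# Equivalent to the original timings (12h deadline, 60s grace, 30s poll):
# session_watchdog 43200 60 30 count_sessions
```

Passing a stub such as echo 0 as the count command makes the function return immediately, which is handy for checking the exit codes locally.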

Fin

As with most things on AWS, the building blocks are there for you to do really advanced things, but the last mile is up to you. They are optimized for flexibility, not usability. I've glossed over the command for starting a task, but it's a doozy (it includes subnets, security groups, network mode, logging options, IAM roles, and tags, among other things), not to mention everything you need to do to get the resources set up so you can even run it.

We built AppPack to be the missing developer experience on AWS. With AppPack, you don't have to think about any of this. Just run apppack -a myapp shell and you've got an ephemeral container to debug your application.