A Taste Of CloudFormation.


Carving with clouds.

So, we’ve recently had cause to move one of our internal applications to the cloud, which has largely been an excuse for me to get some experience with some relatively modern operations technologies. Amazon’s CloudFormation is designed so that you can declaratively specify the infrastructure resources your application needs (eg: virtual machines, load balancers, container configuration, &c), and apply changes relatively atomically: if an update fails partway through, CloudFormation will by default roll the stack back to its previous state.
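To make "declaratively specify" a bit more concrete, here is a minimal sketch of a template built as a plain Python dictionary and serialized to the JSON that CloudFormation consumes. The logical ID `AppQueue` and the choice of an SQS queue are illustrative assumptions, not anything from our actual application.

```python
import json

# A minimal CloudFormation template: it describes *what* should exist,
# and CloudFormation works out how to create, update or delete it.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative single-resource stack",
    "Resources": {
        # "AppQueue" is a hypothetical logical ID chosen for this example.
        "AppQueue": {
            "Type": "AWS::SQS::Queue",
            "Properties": {"VisibilityTimeout": 60},
        }
    },
    "Outputs": {
        # Ref on an SQS queue resolves to the queue's URL.
        "QueueUrl": {"Value": {"Ref": "AppQueue"}}
    },
}

# sort_keys gives stable output, which keeps version-control diffs tidy.
body = json.dumps(template, indent=2, sort_keys=True)
print(body)
```

Saved to a file, a template like this can be deployed with `aws cloudformation deploy --template-file template.json --stack-name <your-stack>`; running the same command again with an edited template updates the existing stack rather than creating a new one.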

For someone who’s spent a goodly deal of their career dealing with physical machines that live in a datacenter (so-called ‘Pets’), this is something of a revelation. For a start, because everything is specified as textual configuration (usually JSON, but we’ll get to that), it can be checked into version control. This might not seem like a big deal, but it’s certainly hugely more effective than having to manage a fleet of hardware, with all of the cabling, inventorying, and fiddling that involves.

What this buys you, principally, is leverage. For example, it’s common to have staging environments for validating applications before they’re deployed into production, and one issue with that is that it can be difficult to maintain parity between staging and production. For one, it’s relatively common to have a simplified network topology in staging (eg: skipping firewalls, &c) to save on cost; but that can bite you when, say, you introduce new communication pathways between services. If you haven’t told networking that service A needs to talk to service B, then you may suddenly find that what works on stage mysteriously fails in production. By being able to spin up an exact copy of your running infrastructure on demand, you minimise the risk of any unpleasant surprises.
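One way to get that parity is to generate every environment from the same code, so that the only differences are the ones you pass in explicitly. The following is a minimal sketch under stated assumptions: `build_template` is a hypothetical helper, the environment name is its only input, and port 8080 is an arbitrary choice for the "A may talk to B" rule.

```python
def build_template(env: str) -> dict:
    """Hypothetical helper: the same topology for every environment."""

    def tags() -> list:
        # The environment name is the only thing that varies.
        return [{"Key": "environment", "Value": env}]

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "ServiceAGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"GroupDescription": f"service A ({env})", "Tags": tags()},
            },
            "ServiceBGroup": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {"GroupDescription": f"service B ({env})", "Tags": tags()},
            },
            # The "service A may talk to service B" rule lives in the template,
            # so it exists in staging and production alike.
            "AToB": {
                "Type": "AWS::EC2::SecurityGroupIngress",
                "Properties": {
                    "GroupId": {"Fn::GetAtt": ["ServiceBGroup", "GroupId"]},
                    "SourceSecurityGroupId": {"Fn::GetAtt": ["ServiceAGroup", "GroupId"]},
                    "IpProtocol": "tcp",
                    "FromPort": 8080,  # assumed port, for illustration only
                    "ToPort": 8080,
                },
            },
        },
    }


def topology(template: dict) -> dict:
    """Map each logical resource ID to its resource type."""
    return {name: res["Type"] for name, res in template["Resources"].items()}


staging = build_template("staging")
production = build_template("production")

# Both environments have exactly the same resources and types.
assert topology(staging) == topology(production)
```

Because both environments come out of the same function, a network rule added for one can't quietly be forgotten in the other.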

It’s quite common to have different teams responsible for different parts of the stack. Eg: at one place I worked, we had a hardware team who would fly out to far-flung datacenters and deploy and manage the physical kit and networking, whereas the sysadmin team would manage the operating systems and everything above that. So, in a sense, each team presents an abstraction: the hardware team presents running boxes, and the sysadmins manage and monitor them, as well as providing the means to bootstrap a new deployment.

So, in the same way, we end up with a similar split of responsibilities for applications in the cloud. Systems like Kubernetes, Rancher, &c provide a platform, on the assumption that all you care about is providing some code artifacts and having them run, and that the underlying infrastructure can be abstracted away and worried about by someone else.

CloudFormation can provide this, but by necessity it ends up providing a lower level of abstraction. So it’s probably fairer to say that you can build a platform on top of CloudFormation’s services, but you’ll likely need to provide your own abstraction on top for this to be workable long term.

So, wanting to avoid working with the CloudFormation JSON syntax directly, Tom found a library named Troposphere, which allows you to express configurations as Python code, and provides some degree of configuration linting, too. Expressing the configuration as an embedded DSL in Python lets you take advantage of the structuring features of the host language. In Python, you can use classes to represent sub-groupings of resources (eg: a cluster configuration for ECS), and those can be consumed by, say, an application instance without it needing to worry about how that container system was configured.
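Troposphere has its own `Template` class and per-service resource classes; to keep this self-contained, the sketch below does not use Troposphere at all, and instead shows the same structuring idea with plain dictionaries. The `EcsCluster` and `WebService` classes, and the image name `example/web:1.0`, are hypothetical. The service is handed only the cluster's `ref`, and knows nothing about how the cluster itself was set up.

```python
import json


class EcsCluster:
    """Hypothetical grouping that owns the ECS cluster resources."""

    def __init__(self, logical_id: str = "Cluster"):
        self.logical_id = logical_id

    def add_to(self, resources: dict) -> None:
        resources[self.logical_id] = {"Type": "AWS::ECS::Cluster"}

    @property
    def ref(self) -> dict:
        # Ref on an AWS::ECS::Cluster resolves to the cluster name.
        return {"Ref": self.logical_id}


class WebService:
    """Hypothetical application grouping; it only needs a cluster reference."""

    def __init__(self, name: str, image: str, cluster_ref: dict):
        self.name = name
        self.image = image
        self.cluster_ref = cluster_ref

    def add_to(self, resources: dict) -> None:
        task_id = f"{self.name}Task"
        resources[task_id] = {
            "Type": "AWS::ECS::TaskDefinition",
            "Properties": {
                "ContainerDefinitions": [
                    {"Name": self.name, "Image": self.image, "Memory": 512}
                ]
            },
        }
        resources[f"{self.name}Service"] = {
            "Type": "AWS::ECS::Service",
            "Properties": {
                "Cluster": self.cluster_ref,
                "TaskDefinition": {"Ref": task_id},
                "DesiredCount": 2,
            },
        }


# Compose the groupings into one template.
resources: dict = {}
cluster = EcsCluster()
cluster.add_to(resources)
WebService("Web", "example/web:1.0", cluster.ref).add_to(resources)

template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
print(json.dumps(template, indent=2))
```

Swapping the cluster for a differently configured one only changes what `cluster.ref` points at; the service code stays the same.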

You can replicate this in plain CloudFormation by having a number of separate stacks and importing references from them, but you’ll end up needing to specify the names of each stack somehow. You could potentially end up Greenspunning a module system from scratch out of these imported references and string concatenation, so using a language that already provides modularity feels more natural.
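For comparison, this is roughly what the plain-CloudFormation route looks like: one stack exports a value under a name derived from its own stack name, and a consumer rebuilds that name with `Fn::Sub` and passes it to `Fn::ImportValue`. The parameter name `ClusterStackName`, the `-ClusterName` suffix and the container image are conventions invented for this sketch. Note that the consumer has to be told the producer’s stack name, which is exactly the bookkeeping described above.

```python
import json

# Producer stack: export the cluster name as "<stack name>-ClusterName".
producer = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {"Cluster": {"Type": "AWS::ECS::Cluster"}},
    "Outputs": {
        "ClusterName": {
            "Value": {"Ref": "Cluster"},
            "Export": {"Name": {"Fn::Sub": "${AWS::StackName}-ClusterName"}},
        }
    },
}

# Consumer stack: it must be given the producer's stack name as a parameter,
# then reconstruct the export name by string substitution.
consumer = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {"ClusterStackName": {"Type": "String"}},
    "Resources": {
        "WebTask": {
            "Type": "AWS::ECS::TaskDefinition",
            "Properties": {
                "ContainerDefinitions": [
                    {"Name": "web", "Image": "example/web:1.0", "Memory": 512}
                ]
            },
        },
        "WebService": {
            "Type": "AWS::ECS::Service",
            "Properties": {
                "Cluster": {
                    "Fn::ImportValue": {"Fn::Sub": "${ClusterStackName}-ClusterName"}
                },
                "TaskDefinition": {"Ref": "WebTask"},
            },
        },
    },
}

print(json.dumps(producer, indent=2))
print(json.dumps(consumer, indent=2))
```

Nothing checks that the two naming conventions agree until you deploy, which is one reason a host language with real modules is attractive.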

Better still, because the Python ecosystem provides a packaging mechanism, it’s entirely possible to publish a Python module providing, say, 90% of what you need for a typical application stack, and have the consuming application inject references to source trees or deployable artifacts to produce a deployable configuration. It’s not quite as convenient as a fully plumbed solution like Kubernetes, since you’ll have to integrate it with your build/deploy mechanisms, but it’s still a pretty good solution when you need the flexibility.
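As a sketch of that split, imagine a shared package that owns the boilerplate (here an IAM role and a Lambda function definition), while each application injects only the location of its build artifact. The `function_stack` function, the bucket and key names, and the choice of Lambda are all hypothetical; in practice the first half would live in its own installable package, and it is inlined here so the example runs on its own.

```python
import json


# --- would ship in a shared, pip-installable package (hypothetical) ---
def function_stack(handler: str, artifact_bucket: str, artifact_key: str,
                   memory_mb: int = 256) -> dict:
    """Return a complete template; callers inject only app-specific inputs."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Boilerplate every consumer would otherwise copy and paste.
            "FunctionRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "lambda.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    "ManagedPolicyArns": [
                        "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
                    ],
                },
            },
            "Function": {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Runtime": "python3.12",
                    "Handler": handler,
                    "MemorySize": memory_mb,
                    "Role": {"Fn::GetAtt": ["FunctionRole", "Arn"]},
                    # The application-specific part: where the build put the code.
                    "Code": {"S3Bucket": artifact_bucket, "S3Key": artifact_key},
                },
            },
        },
    }


# --- in the consuming application's repository ---
# Made-up bucket and key; a build pipeline would normally supply these.
template = function_stack("app.handler", "example-artifacts", "reports/build-42.zip")
print(json.dumps(template, indent=2))
```

The integration work mentioned above is the glue that uploads the artifact and passes its location into a call like this one.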

So, when would I want to use CloudFormation vs. something higher level? Well, as mentioned above, the principal use is going to be building infrastructure for others to consume, or more complex projects where you need relatively fine-grained control over, say, database usage or placement for real-time services and batch jobs.

Conversely, though, because CloudFormation provides relatively thin abstractions, it can potentially be easier to debug or trace faults than with systems like Kubernetes if you don’t have an expert team managing it, simply because the finer-grained control makes it easier to see which process is running on which server; even if the servers themselves are ephemeral, the roles they serve may not be.