Terraform best practices - what I wish I'd learnt quicker

Terraform has changed a lot over the years (its HCL syntax in particular changed from the early days, breaking backward compatibility), and there are a few Terraform concepts which, for me, are worth holding front-and-centre to reduce pain when working with the tool.

If you have existing cloud resources, approach Terraform differently


Terraform doesn't just want a complete graph of your resources; it expects to have been the tool that created every resource in your cloud provider from the outset. The 'ideal' Terraform project is one where you start from an empty cloud account and end up with all resources created by Terraform. This is the essence of infrastructure as code, after all.

You are very much going against the grain if you're trying to model existing resources which were created by hand and then 'import' them into Terraform's view of the world.

Terraform's support for importing resources has always felt finicky, and for good reasons Terraform removed the replace command from its CLI. You may also be interested in the terraform import command for bringing existing resources under management. Even then, it does not import all existing configuration: it records the resource in Terraform's state, and you still have to write the matching HCL yourself. Again, consider the prospect of re-creating the account resources from scratch programmatically.
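To illustrate why importing still leaves work to do, here is a minimal sketch. It assumes the AWS provider and Terraform 1.5 or later, which added import blocks as a declarative alternative to the terraform import command. The bucket name and resource label are hypothetical.

```hcl
# Assumption: a bucket called "legacy-assets-bucket" was created by hand
# some time ago and is not yet tracked in Terraform state.

# Tells Terraform to adopt the existing bucket into state on the next apply.
# Roughly equivalent to running:
#   terraform import aws_s3_bucket.legacy_assets legacy-assets-bucket
import {
  to = aws_s3_bucket.legacy_assets
  id = "legacy-assets-bucket"
}

# The import only links the real bucket to this address. The configuration
# below is still yours to write, and it has to match what actually exists,
# otherwise the next plan will propose changes.
resource "aws_s3_bucket" "legacy_assets" {
  bucket = "legacy-assets-bucket"
}
```

Anything attached to the bucket by hand (policies, lifecycle rules and so on) would need its own resource block and its own import, which is where the effort adds up.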

Re-creating in a blank account may sound drastic, but you may find yourself in a situation with hundreds (if not thousands) of resources which have been created over the years (most of which cost money), which are misconfigured, and which are not under version control anyway.

A pragmatic first step may be to:

  • Create a git repo
  • Set up your Terraform remote state
  • Plan the cloud resources genuinely needed
  • Write & commit your Terraform HCL to your git repo, starting with a few key required resources
  • Run terraform plan and, ultimately, terraform apply
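The steps above could start from a single file like the following. This is a minimal sketch, assuming AWS with an S3 bucket for remote state. The bucket name, state key, region and VPC are all hypothetical placeholders, and the state bucket has to exist before you initialise.

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Remote state: keeps the state file out of the git repo and
  # shared between everyone working on the project.
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical, must already exist
    key    = "core/terraform.tfstate"  # hypothetical path within the bucket
    region = "eu-west-2"
  }
}

provider "aws" {
  region = "eu-west-2"
}

# One of the "few key required resources" to begin with.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main"
  }
}
```

With this committed, terraform init configures the backend, terraform plan shows what would be created, and terraform apply creates it.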

There may be existing resources you don't have permission to see (`iam`)

A big gotcha to be aware of with Terraform, and a potential danger, is that what you can see depends on the authorisation level of your cloud credentials (IAM). If the cloud account keys (e.g. GCP, AWS) that Terraform uses to connect do not have all the access you need to administer the account, you may be running terraform plan / terraform apply with an incomplete view of existing resources, which can result in confusion.
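One small aid here is to have Terraform tell you which identity it is actually running as. A minimal sketch, assuming the AWS provider is already configured; the output names are my own. It won't reveal what you can't see, but it does confirm which credentials are in play before you trust a plan.

```hcl
# Looks up the identity behind the credentials Terraform is using.
data "aws_caller_identity" "current" {}

# Shown at the end of terraform plan / terraform apply.
output "running_as_arn" {
  value = data.aws_caller_identity.current.arn
}

output "running_in_account" {
  value = data.aws_caller_identity.current.account_id
}
```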

Again, this mostly occurs when you're doing a cloud migration with a mix of existing cloud resources, some managed by Terraform and others not.

Too much nesting with for_each and modules

It's tempting to get lost in the flexibility of Terraform modules, looping and nested modules, especially if you're modelling multiple availability zones / regions for failover. The reality is that it's still tricky to achieve those goals, and I suspect they are talked/blogged about more than practised. Instead, at least start with one long-form, non-nested HCL script with no modules which describes your infrastructure. This goes against our DRY programming principles, I know, but especially if you're new to Terraform: don't abstract away too early.
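As a rough picture of what "long-form" means, here is a sketch of two subnets written out by hand rather than generated with for_each. It assumes the AWS provider is configured; the names, CIDR ranges and availability zones are hypothetical.

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Repetitive on purpose: each subnet is a plain block you can read,
# diff and delete without tracing a loop or a module call.
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "eu-west-2a"
}

resource "aws_subnet" "public_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "eu-west-2b"
}
```

The for_each version would be shorter, but every resource address then becomes something like aws_subnet.public["a"], and later changes to the map's keys destroy and re-create the resources behind them. That trade-off is easier to take on once the flat version has stopped changing.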

Treating Terraform like a programming language

Terraform HCL is not a programming language, it's a configuration language. Don't make my mistake of treating it as a programming language, or you'll 'configure' yourself into a corner. The distinction is important: its purpose in life is "to safely and predictably provision and manage infrastructure".

If you allow arbitrary programming, you lose the ability to predictably re-create infrastructure. Yet the pull to program infrastructure rather than declare it is so strong that nascent tools like Pulumi exist where you can, if you so wish, curl the weather for next week and provision different resources depending on that.

The KISS principle (keep it simple) saves you a lot of pain. Tim Peters was right: "Flat is better than nested", even if that was a poem about a different language.