While we were checking out competitors for an opportunity with one of our clients, we came across a feature at cycloid.io that generates Terraform state files for existing infrastructure on public clouds (Azure, AWS, GCP) or VMware. I was surprised to see that they had open-sourced the core of this functionality as TerraCognita!
Since I have worked with Terraform before, I was keen to see how this product would reverse engineer the state files for existing infra.
Using TerraCognita is straightforward. First, you need to install it:
curl -L https://github.com/cycloidio/terracognita/releases/latest/download/terracognita-linux-amd64.tar.gz -o terracognita-linux-amd64.tar.gz
tar -xf terracognita-linux-amd64.tar.gz
chmod u+x terracognita-linux-amd64
sudo mv terracognita-linux-amd64 /usr/local/bin/terracognita
If you’re a macOS user with Homebrew, you can install it via the brew command:
brew install terracognita
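As a quick sanity check (not something the official docs call out), you can confirm the binary is on your PATH; assuming the standard --help flag, it should print the supported providers and flags:
terracognita --help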
Once installed, you can use the terracognita command-line interface to generate Terraform configuration files for your infrastructure:
terracognita [TERRAFORM_PROVIDER] [--flags]
The online documentation isn’t great. Since our client was primarily using Azure, I went ahead and explored the Azure provider.
Below is a synopsis of the flags supported for Azure (a minimal example using just the required flags follows the list):
--client-id string Client ID (required)
--client-secret string Client Secret (required)
--environment string Environment (default "public")
--resource-group-name strings Resource Group Names (required)
--subscription-id string Subscription ID (required)
-t, --tags strings List of tags to filter with format 'NAME:VALUE'
--tenant-id string Tenant ID (required)
Global Flags:
-d, --debug Activate the debug mode, which includes TF logs via TF_LOG=TRACE|DEBUG|INFO|WARN|ERROR configuration https://www.terraform.io/docs/internals/debugging.html
-e, --exclude strings List of resources not to import; these names are the TF ones (ex: aws_instance). If not set, no resources are excluded
--hcl string HCL output file or directory. If it's a directory, it'll be emptied before importing
--hcl-provider-block Whether to generate the 'provider {}' block for the imported provider (default true)
-i, --include strings List of resources to import; these names are the TF ones (ex: aws_instance). If not set, all resources are imported
--interpolate Activate the interpolation for the HCL and the dependency building for the State file (default true)
--log-file string Write the logs with -v to this destination (default "/Users/kenrickvaz/Library/Caches/terracognita/terracognita.log")
--module string Generates the output in module format into the specified directory. With this flag (--module), --hcl is ignored and the HCL is generated inside the module
--module-variables string Path to a file containing the list of attributes to use as variables when building the module. The format is JSON/YAML; more information at https://github.com/cycloidio/terracognita#modules
--target strings List of resources to import via ID; the IDs are the ones documented on Terraform as needed for Import. The format is 'aws_instance.ID'
--tfstate string TFState output file
-v, --verbose Activate the verbose mode
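For reference, a minimal Azure run using only the required flags from the synopsis above would look something like the sketch below; the placeholder values and the ./azure output path are my own assumptions, and only --hcl is set, so plain HCL is produced rather than a module:
terracognita azurerm \
--client-id <client_id> \
--client-secret <client_secret> \
--tenant-id <tenant_id> \
--subscription-id <subscription_id> \
--resource-group-name <resource_group> \
--hcl ./azure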
Terracognita has a powerful feature that allows it to directly generate Terraform modules during the import process. To utilize this feature, you can use the --module {module/path/name} flag, where you specify the desired path for the module to be generated. This path can either be an existing directory or a non-existent path that will be created.
It’s important to note that when generating a module, the existing content of the specified path will be deleted (after user confirmation) to ensure a clean import and organization of the generated resources.
Our client had everything in the cloud mapped to app codes and environments, so my command below takes that into consideration; you may tweak it based on your needs:
terracognita azurerm \
--client-id <client_id> \
--client-secret <client_secret> \
--resource-group-name <resource_group> \
--subscription-id <subscription_id> \
--tenant-id <tenant_id> \
--hcl ./azure \
--module ./output/<env>/<appcode> \
--tags appcode:<appcode> \
--include azurerm_key_vault,azurerm_mssql_database,azurerm_mssql_server,azurerm_mssql_virtual_machine,azurerm_public_ip,azurerm_storage_account,azurerm_kubernetes_cluster,azurerm_container_registry,azurerm_resource_group,azurerm_monitor_action_group
Your output will be something like this:

Validation
To confirm the accuracy of the infrastructure snapshot, navigate to the directory containing the generated code and execute terraform init followed by terraform plan. If the auto-generated code accurately represents your existing Azure resources, Terraform should not detect any changes. Depending on your specific setup and requirements, some manual code adjustments may still be necessary.
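As a sketch, assuming the code was generated under ./output/<env>/<appcode> as in the command above:
cd ./output/<env>/<appcode>
terraform init
terraform plan
If you also exported a state file with --tfstate, placing it next to the generated HCL as terraform.tfstate lets plan compare against the imported state instead of proposing to create everything from scratch.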
Limitations
- The tool lacks comprehensive documentation, making it challenging for users to understand its functionalities thoroughly
- The exporting process can be time-consuming, especially when dealing with policies
- The auto-generated Terraform code may lack accuracy, requiring additional manual adjustments
Conclusion
Using Infrastructure as Code (IaC) is now a widely recognized best practice, and tools like TerraCognita can help you translate existing infrastructure into Terraform. But no tool is perfect; each has its own constraints and challenges. While these tools can save time, it’s up to the software engineers to make sure the move to Infrastructure as Code is successful and accurate.
