My blog

20 October 2019

Adventures in the Terraform DSL, Part IX: Data sources

by Alex Harvey

Part IX of my blog series on the Terraform DSL, where I look at data sources.


This post is about Terraform data sources, also known as data resources, a feature written by Martin Atkins in May 2016 and released in Terraform 0.7. Here, I look at the history, motivation and usage of this important feature.


A data source, a.k.a. data resource, looks and behaves much like an ordinary resource, but presents a read-only view of dynamic data that comes from outside of Terraform.

The data made available in this way is called fetched data, because it is “fetched” during the refresh stage of the Terraform lifecycle. It should not be confused with computed data, which is generated by resources like the random_id resource that we saw earlier in this series. Computed data is created by Terraform during the apply stage of the lifecycle.
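
Here is a toy model in Go (the language Terraform itself is written in) that makes the fetched/computed distinction a little more concrete. It is a minimal sketch, not how Terraform is actually implemented. The state type, the refresh and apply functions and the attribute addresses are all hypothetical. The fetch callback stands in for a call to some external API.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// state is a toy stand-in for Terraform state: a map from a (hypothetical)
// attribute address to its value.
type state map[string]string

// refresh models the refresh stage: a data source's value is read
// ("fetched") from outside Terraform, represented here by the fetch callback.
func refresh(s state, fetch func() string) {
	s["data.example.value"] = fetch()
}

// apply models the apply stage: a random_id-like resource generates
// ("computes") a brand-new value that did not exist anywhere before.
func apply(s state) error {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		return err
	}
	s["random_id.example.hex"] = hex.EncodeToString(b)
	return nil
}

func main() {
	s := state{}
	refresh(s, func() string { return "ami-0804dc420cb24c62b" })
	_, computed := s["random_id.example.hex"]
	fmt.Println("fetched:", s["data.example.value"], "computed yet:", computed)
	if err := apply(s); err != nil {
		panic(err)
	}
	fmt.Println("computed after apply:", s["random_id.example.hex"])
}
```

The fetched value exists as soon as refresh has run. The computed value only appears once apply has run.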

The first data source: terraform_remote_state

Before I get to data sources proper, I want to distinguish them from the logical resources they grew out of. To do that, I’ll look at the very first data source added to Terraform, terraform_remote_state, which began life as a resource. Here is the example from the Terraform 0.6 docs:

resource "terraform_remote_state" "vpc" {
  backend = "atlas"
  config {
    path = "hashicorp/vpc-prod"
  }
}

resource "aws_instance" "foo" {
  // ...
  subnet_id = "${terraform_remote_state.vpc.output.subnet_id}"
}

This was a logical resource, that is, a resource that “contributes to Terraform state but does not manage an external resource”. Because it was implemented as a logical resource, it was less clear to the reader that, in this case, remote state was a source of data rather than a managed resource.

In Terraform 0.7, this resource was changed to be the first data source. Instead of the resource declaration, it became this:

data "terraform_remote_state" "vpc" {
  backend = "atlas"
  config {
    name = "hashicorp/vpc-prod"
  }
}

resource "aws_instance" "foo" {
  // ...
  subnet_id = data.terraform_remote_state.vpc.outputs.subnet_id // 0.12 syntax here.
}

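The key differences in usage are that the block is declared with the data keyword instead of resource, and that references to it are prefixed with “data.”.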

I found it helpful to study the actual commit that converted this first data source from a logical resource. It makes it clearer that, under the hood, a data source really is just a special resource that is read-only. The data source still returns a Resource schema, but one that defines only a Read function:

func dataSourceRemoteState() *schema.Resource {
  return &schema.Resource{
    Read: dataSourceRemoteStateRead,

    Schema: map[string]*schema.Schema{
      "backend": &schema.Schema{
        Type:     schema.TypeString,
        Required: true,
      },

      "config": &schema.Schema{
        Type:     schema.TypeMap,
        Optional: true,
      },

      "output": &schema.Schema{
        Type:     schema.TypeMap,
        Computed: true,
      },
    },
  }
}
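
Here is a deliberately simplified Go sketch that illustrates the “read-only resource” idea. The Resource type below is a hypothetical stand-in for schema.Resource that keeps only the CRUD function fields. isDataSource is a hypothetical helper, not part of the SDK. In this model, a data source is simply a resource that has a Read function but no Create, Update or Delete.

```go
package main

import "fmt"

// Resource is a hypothetical, heavily simplified stand-in for the SDK's
// schema.Resource: only the CRUD function fields are kept.
type Resource struct {
	Create func() error
	Read   func() error
	Update func() error
	Delete func() error
}

// isDataSource is a hypothetical helper: it reports whether r has the
// shape of a data source, i.e. it can be read but not created, updated
// or deleted.
func isDataSource(r Resource) bool {
	return r.Read != nil && r.Create == nil && r.Update == nil && r.Delete == nil
}

// noop stands in for real CRUD implementations.
func noop() error { return nil }

func main() {
	remoteState := Resource{Read: noop}
	instance := Resource{Create: noop, Read: noop, Update: noop, Delete: noop}
	fmt.Println(isDataSource(remoteState)) // true
	fmt.Println(isDataSource(instance))    // false
}
```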

Data source examples

Let’s look at how data sources are actually used. One common use is to look up an AWS AMI via the aws_ami data source. Here is how you can fetch the ID of the most recent Amazon Linux 2 AMI:

data "aws_ami" "amazon_linux_2" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-ebs"]
  }

  owners = ["amazon"]
}

output "ami_id" {
  value = data.aws_ami.amazon_linux_2.id
}

The structure of this declaration feels familiar to users of the AWS CLI. I apply that:

▶ terraform apply -auto-approve
data.aws_ami.amazon_linux_2: Refreshing state...

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

ami_id = ami-0804dc420cb24c62b

For AWS users, it can be useful to translate an AWS data source declaration into the equivalent AWS CLI command:

▶ aws ec2 describe-images --filters "Name=name,Values=amzn2-ami-hvm-*-x86_64-ebs" \
    --owners amazon --query 'reverse(sort_by(Images, &CreationDate))[0].ImageId'

The two are very similar, which is not surprising, since Terraform and the AWS CLI both call the same AWS API (EC2 DescribeImages).
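
The aws_ami declaration and the CLI query both come down to the same selection. First keep the images whose owner and name match, then take the one with the latest CreationDate. The Go sketch below approximates that logic over an in-memory list. The Image type, the mostRecent function and the sample data are hypothetical. Also, path.Match only approximates EC2's filter wildcards.

```go
package main

import (
	"fmt"
	"path"
)

// Image holds the few image fields this sketch needs.
// The type and the sample data in main are hypothetical.
type Image struct {
	ImageID      string
	Name         string
	Owner        string
	CreationDate string // ISO 8601, e.g. "2019-10-16T00:00:00.000Z"
}

// mostRecent approximates aws_ami with most_recent = true: keep images
// whose owner matches and whose name matches the glob, then return the
// one with the latest CreationDate. ok is false when nothing matches.
func mostRecent(images []Image, owner, nameGlob string) (id string, ok bool) {
	var best *Image
	for i := range images {
		img := &images[i]
		if img.Owner != owner {
			continue
		}
		matched, err := path.Match(nameGlob, img.Name)
		if err != nil || !matched {
			continue
		}
		// Timestamps in one ISO 8601 layout sort chronologically as strings,
		// which is also what sort_by(Images, &CreationDate) relies on.
		if best == nil || img.CreationDate > best.CreationDate {
			best = img
		}
	}
	if best == nil {
		return "", false
	}
	return best.ImageID, true
}

func main() {
	images := []Image{
		{"ami-aaaa", "amzn2-ami-hvm-2.0.20190823-x86_64-ebs", "amazon", "2019-08-23T00:00:00.000Z"},
		{"ami-bbbb", "amzn2-ami-hvm-2.0.20191016-x86_64-ebs", "amazon", "2019-10-16T00:00:00.000Z"},
		{"ami-cccc", "amzn2-ami-hvm-2.0.20191016-x86_64-gp2", "amazon", "2019-10-17T00:00:00.000Z"},
		{"ami-dddd", "amzn2-ami-hvm-2.0.20191101-x86_64-ebs", "someone-else", "2019-11-01T00:00:00.000Z"},
	}
	id, ok := mostRecent(images, "amazon", "amzn2-ami-hvm-*-x86_64-ebs")
	fmt.Println(id, ok) // ami-bbbb true
}
```

The gp2 image fails the name filter and the last image fails the owner filter. That leaves ami-bbbb as the newest match.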

Data sources docs

The next thing to know about data sources is where to find the docs. The docs list every data source for each provider (and, for AWS, for each service), and document all of their arguments and attributes.

The data sources are generally defined in the providers. For AWS, start at the AWS Provider page:

Screenshot 1

Then scroll down and click on one of the AWS services, e.g. ACM:

Screenshot 2

From there, all the data sources for that AWS service are listed; in this case, the aws_acm_certificate data source, which can return the ARN of a certificate in AWS Certificate Manager (ACM).

Local-only data sources: template_file

Another kind of data source is the “local-only data source”, which does its work entirely on the machine running Terraform rather than fetching data from the cloud or the network. I make a special mention of the commonly used template_file data source. It is now deprecated in favour of the templatefile() function (I will discuss this more in the next part of this series, which is on Terraform’s template language), but it is still common to see templates declared like this in Terraform:

data "template_file" "user_data" {
  template = file("${path.module}/template/")
  vars = {
    foo =
    bar =
  }
}

That template can then be referenced as:

resource "aws_instance" "web" {
  ami           = "ami-0804dc420cb24c62b"
  instance_type = "t2.micro"
  user_data     = data.template_file.user_data.rendered
}

I am slightly disappointed that this is deprecated because, to me, this is cleaner! More on that later.
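
Conceptually, template_file substitutes each ${name} in the template with the matching entry from vars and exposes the result as rendered. Here is a rough Go approximation using the standard library's os.Expand. The render function and the sample template are hypothetical. Unlike Terraform's template language, os.Expand also expands a bare $name and has no $${ escape. This sketch also chooses to report undefined variables as an error.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// render is a rough, hypothetical approximation of data "template_file":
// each ${name} in template is replaced by vars[name]. It is NOT a faithful
// reimplementation of Terraform's template language.
func render(template string, vars map[string]string) (string, error) {
	var missing []string
	out := os.Expand(template, func(name string) string {
		v, ok := vars[name]
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	if len(missing) > 0 {
		sort.Strings(missing)
		return "", fmt.Errorf("undefined template variables: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	// Hypothetical template and values, standing in for a user data script.
	tpl := "#!/bin/bash\necho \"foo is ${foo}, bar is ${bar}\"\n"
	rendered, err := render(tpl, map[string]string{"foo": "1", "bar": "2"})
	if err != nil {
		panic(err)
	}
	fmt.Print(rendered)
}
```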


Well, that is the end of this shorter-than-usual post on data sources, a.k.a. data resources. Data sources are one of my favourite Terraform features so far, and they provide a clean way of getting dynamic data from the AWS cloud and elsewhere. We have seen that they are really just a special kind of resource, distinguished mostly for readability by the data declaration, and that they export only fetched data and no computed data.

In Part X, I will look more at Terraform’s template language and related template functions.


tags: terraform