
AWS EMR

Description

Terraform module that creates an Amazon EMR cluster on AWS.

Specification

Properties

| Name | Description | Type | Required | Default |
| --- | --- | --- | --- | --- |
| additional_info | A JSON string for selecting additional features such as adding proxy information. Note: the provider currently has no API to retrieve this value after the EMR cluster is created, so Terraform cannot detect drift if the value is changed outside Terraform | string | false | |
| additional_master_security_group | The name of the existing additional security group that will be used for the EMR master node. If empty, a new security group will be created | string | false | |
| additional_slave_security_group | The name of the existing additional security group that will be used for EMR core & task nodes. If empty, a new security group will be created | string | false | |
| applications | A list of applications for the cluster. Valid values are: Flink, Ganglia, Hadoop, HBase, HCatalog, Hive, Hue, JupyterHub, Livy, Mahout, MXNet, Oozie, Phoenix, Pig, Presto, Spark, Sqoop, TensorFlow, Tez, Zeppelin, and ZooKeeper (as of EMR 5.25.0). Case-insensitive | list(string) | true | |
| bootstrap_action | List of bootstrap actions that will be run before Hadoop is started on the cluster nodes | list(object({ path = string, name = string, args = list(string) })) | false | |
| configurations_json | A JSON string for supplying a list of configurations for the EMR cluster. See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html for more details | string | false | |
| core_instance_group_autoscaling_policy | String containing the EMR Auto Scaling Policy JSON for the Core instance group | string | false | |
| core_instance_group_bid_price | Bid price for each EC2 instance in the Core instance group, expressed in USD. Setting this attribute declares the instance group as Spot Instances and implicitly creates a Spot request. Leave this blank to use On-Demand Instances | string | false | |
| core_instance_group_ebs_iops | The number of I/O operations per second (IOPS) that the Core volume supports | number | false | |
| core_instance_group_ebs_size | Core instances volume size, in gibibytes (GiB) | number | true | |
| core_instance_group_ebs_type | Core instances volume type. Valid options are gp2, io1, standard and st1 | string | false | |
| core_instance_group_ebs_volumes_per_instance | The number of EBS volumes with this configuration to attach to each EC2 instance in the Core instance group | number | false | |
| core_instance_group_instance_count | Target number of instances for the Core instance group. Must be at least 1 | number | false | |
| core_instance_group_instance_type | EC2 instance type for all instances in the Core instance group | string | true | |
| create_task_instance_group | Whether to create an instance group for Task nodes. For more info: https://www.terraform.io/docs/providers/aws/r/emr_instance_group.html, https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html | bool | false | |
| create_vpc_endpoint_s3 | Set to false to prevent the module from creating the VPC S3 Endpoint | bool | false | |
| custom_ami_id | A custom Amazon Linux AMI for the cluster (instead of an EMR-owned AMI). Available in Amazon EMR version 5.7.0 and later | string | false | |
| ebs_root_volume_size | Size in GiB of the EBS root device volume of the Linux AMI that is used for each EC2 instance. Available in Amazon EMR version 4.x and later | number | false | |
| ec2_autoscaling_role_enabled | If set to false, will use existing_ec2_autoscaling_role_arn for an existing EC2 autoscaling IAM role that was created outside of this module | bool | false | |
| ec2_autoscaling_role_permissions_boundary | The Permissions Boundary ARN to apply to the EC2 Autoscaling Role | string | false | |
| ec2_role_enabled | If set to false, will use existing_ec2_instance_profile_arn for an existing EC2 IAM role that was created outside of this module | bool | false | |
| ec2_role_permissions_boundary | The Permissions Boundary ARN to apply to the EC2 Role | string | false | |
| emr_role_permissions_boundary | The Permissions Boundary ARN to apply to the EMR Role | string | false | |
| existing_ec2_autoscaling_role_arn | ARN of an existing EC2 autoscaling role to attach to the cluster | string | false | |
| existing_ec2_instance_profile_arn | ARN of an existing EC2 instance profile | string | false | |
| existing_service_role_arn | ARN of an existing EMR service role to attach to the cluster | string | false | |
| keep_job_flow_alive_when_no_steps | Whether to keep the cluster running when it has no steps or when all steps are complete | bool | false | |
| kerberos_ad_domain_join_password | The Active Directory password for ad_domain_join_user. Terraform cannot perform drift detection of this configuration | string | false | |
| kerberos_ad_domain_join_user | Required only when establishing a cross-realm trust with an Active Directory domain. A user with sufficient privileges to join resources to the domain. Terraform cannot perform drift detection of this configuration | string | false | |
| kerberos_cross_realm_trust_principal_password | Required only when establishing a cross-realm trust with a KDC in a different realm. The cross-realm principal password, which must be identical across realms. Terraform cannot perform drift detection of this configuration | string | false | |
| kerberos_enabled | Set to true if the EMR cluster will use kerberos_attributes | bool | false | |
| kerberos_kdc_admin_password | The password used within the cluster for the kadmin service on the cluster-dedicated KDC, which maintains Kerberos principals, password policies, and keytabs for the cluster. Terraform cannot perform drift detection of this configuration | string | false | |
| kerberos_realm | The name of the Kerberos realm to which all nodes in a cluster belong. For example, EC2.INTERNAL | string | false | |
| key_name | Amazon EC2 key pair that can be used to SSH to the master node as the user called hadoop | string | false | |
| log_uri | The path to the Amazon S3 location where logs for this cluster are stored | string | false | |
| managed_master_security_group | The name of the existing managed security group that will be used for the EMR master node. If empty, a new security group will be created | string | false | |
| managed_slave_security_group | The name of the existing managed security group that will be used for EMR core & task nodes. If empty, a new security group will be created | string | false | |
| master_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the master instances | list(string) | false | |
| master_allowed_security_groups | List of security groups to be allowed to connect to the master instances | list(string) | false | |
| master_dns_name | Name of the cluster CNAME record to create in the parent DNS zone specified by zone_id. If left empty, the name will be auto-assigned using the format emr-master-var.name | string | false | |
| master_instance_group_bid_price | Bid price for each EC2 instance in the Master instance group, expressed in USD. Setting this attribute declares the instance group as Spot Instances and implicitly creates a Spot request. Leave this blank to use On-Demand Instances | string | false | |
| master_instance_group_ebs_iops | The number of I/O operations per second (IOPS) that the Master volume supports | number | false | |
| master_instance_group_ebs_size | Master instances volume size, in gibibytes (GiB) | number | true | |
| master_instance_group_ebs_type | Master instances volume type. Valid options are gp2, io1, standard and st1 | string | false | |
| master_instance_group_ebs_volumes_per_instance | The number of EBS volumes with this configuration to attach to each EC2 instance in the Master instance group | number | false | |
| master_instance_group_instance_count | Target number of instances for the Master instance group. Must be at least 1 | number | false | |
| master_instance_group_instance_type | EC2 instance type for all instances in the Master instance group | string | true | |
| region | AWS region | string | true | |
| release_label | The release label for the Amazon EMR release. See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html | string | false | |
| route_table_id | Route table ID for the VPC S3 Endpoint when launching the EMR cluster in a private subnet. Required when subnet_type is private | string | false | |
| scale_down_behavior | The way that individual Amazon EC2 instances terminate when an automatic scale-in activity occurs or an instance group is resized | string | false | |
| security_configuration | The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with release_label 4.8.0 or greater. See https://www.terraform.io/docs/providers/aws/r/emr_security_configuration.html for more info | string | false | |
| service_access_security_group | The name of the existing additional security group that will be used for EMR core & task nodes. If empty, a new security group will be created | string | false | |
| service_role_enabled | If set to false, will use existing_service_role_arn for an existing IAM role that was created outside of this module | bool | false | |
| slave_allowed_cidr_blocks | List of CIDR blocks to be allowed to access the slave instances | list(string) | false | |
| slave_allowed_security_groups | List of security groups to be allowed to connect to the slave instances | list(string) | false | |
| step_concurrency_level | The number of steps that can be executed concurrently. You can specify a maximum of 256 steps. Only valid for EMR clusters with release_label 5.28.0 or greater | number | false | |
| steps | List of steps to run when creating the cluster | list(object({ name = string, action_on_failure = string, hadoop_jar_step = object({ args = list(string), jar = string, main_class = string, properties = map(string) }) })) | false | |
| subnet_id | VPC subnet ID where you want the job flow to launch. Cannot specify the cc1.4xlarge instance type for nodes of a job flow launched in an Amazon VPC | string | true | |
| subnet_type | Type of VPC subnet ID where you want the job flow to launch. Supported values are private or public | string | false | |
| task_instance_group_autoscaling_policy | String containing the EMR Auto Scaling Policy JSON for the Task instance group | string | false | |
| task_instance_group_bid_price | Bid price for each EC2 instance in the Task instance group, expressed in USD. Setting this attribute declares the instance group as Spot Instances and implicitly creates a Spot request. Leave this blank to use On-Demand Instances | string | false | |
| task_instance_group_ebs_iops | The number of I/O operations per second (IOPS) that the Task volume supports | number | false | |
| task_instance_group_ebs_optimized | Indicates whether an Amazon EBS volume in the Task instance group is EBS-optimized. Changing this forces a new resource to be created | bool | false | |
| task_instance_group_ebs_size | Task instances volume size, in gibibytes (GiB) | number | false | |
| task_instance_group_ebs_type | Task instances volume type. Valid options are gp2, io1, standard and st1 | string | false | |
| task_instance_group_ebs_volumes_per_instance | The number of EBS volumes with this configuration to attach to each EC2 instance in the Task instance group | number | false | |
| task_instance_group_instance_count | Target number of instances for the Task instance group. Must be at least 1 | number | false | |
| task_instance_group_instance_type | EC2 instance type for all instances in the Task instance group | string | false | |
| termination_protection | Whether to enable termination protection (default is false, except when using multiple master nodes). Before attempting to destroy the resource when termination protection is enabled, this configuration must be applied with its value set to false | bool | false | |
| use_existing_additional_master_security_group | If set to true, will use variable additional_master_security_group using an existing security group that was created outside of this module | bool | false | |
| use_existing_additional_slave_security_group | If set to true, will use variable additional_slave_security_group using an existing security group that was created outside of this module | bool | false | |
| use_existing_managed_master_security_group | If set to true, will use variable managed_master_security_group using an existing security group that was created outside of this module | bool | false | |
| use_existing_managed_slave_security_group | If set to true, will use variable managed_slave_security_group using an existing security group that was created outside of this module | bool | false | |
| use_existing_service_access_security_group | If set to true, will use variable service_access_security_group using an existing security group that was created outside of this module | bool | false | |
| visible_to_all_users | Whether the job flow is visible to all IAM users of the AWS account associated with the job flow | bool | false | |
| vpc_id | VPC ID to create the cluster in (e.g. vpc-a22222ee) | string | true | |
| writeConnectionSecretToRef | The secret which the cloud resource connection will be written to | writeConnectionSecretToRef | false | |
| zone_id | Route53 parent zone ID. If provided (not empty), the module will create sub-domain DNS records for the masters and slaves | string | false | |
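
As a rough illustration of how the required properties fit together, the sketch below calls the module from a Terraform configuration. The module label `emr_cluster`, the `source` path, and every ID, region, instance type, size, and price are placeholders chosen for this example; they are assumptions, not values documented on this page, and the module may need further inputs that this table does not list.

```hcl
# Minimal sketch of a module call. The source path and all concrete values
# below are hypothetical placeholders, not taken from the module itself.
module "emr_cluster" {
  source = "./modules/emr" # assumed local path to the module

  # Properties marked Required = true in the table
  region       = "us-east-1"
  vpc_id       = "vpc-a22222ee"             # example format from the table
  subnet_id    = "subnet-0123456789abcdef0" # placeholder
  applications = ["Spark", "Hive"]          # values are case-insensitive

  master_instance_group_instance_type = "m5.xlarge"
  master_instance_group_ebs_size      = 32 # GiB
  core_instance_group_instance_type   = "m5.xlarge"
  core_instance_group_ebs_size        = 32 # GiB

  # Optional properties
  subnet_type    = "private"
  route_table_id = "rtb-0123456789abcdef0" # needed because subnet_type is private

  core_instance_group_instance_count = 2
  core_instance_group_bid_price      = "0.10" # USD; declares the Core group as Spot
}
```

Because `subnet_type` is `private` in this sketch, `route_table_id` is set as the table requires. Leaving `core_instance_group_bid_price` unset would keep the Core instance group on On-Demand Instances.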

writeConnectionSecretToRef

| Name | Description | Type | Required | Default |
| --- | --- | --- | --- | --- |
| name | The secret name which the cloud resource connection will be written to | string | true | |
| namespace | The secret namespace which the cloud resource connection will be written to | string | false | |