1. What kind of network performance can you expect when you launch instances in a cluster placement group?
The network performance depends on the instance type and its network performance specification. If the instances are launched in a cluster placement group, you can expect up to:
- 10 Gbps for single-flow traffic,
- 20 Gbps for multi-flow traffic, i.e., full duplex.
- Network traffic outside the placement group will be limited to 5 Gbps (full duplex).
2. Which instance types can be used to deploy a 4-node Hadoop cluster on AWS?
First, let’s understand what actually happens in a Hadoop cluster. A Hadoop cluster follows a master-slave model. The master node manages the cluster: it keeps track of where the data lives and schedules the work. The slave nodes act as DataNodes. They store the data and run the processing tasks on it. Because the data is stored on the slaves, larger disks are recommended for them. Because the master coordinates the whole cluster, it benefits from more RAM and a stronger CPU. You can therefore choose the configuration of each machine based on your workload. For example, in this case c4.8xlarge would be preferred for the master machine, while a storage-optimized type such as i2.xlarge could be selected for the slave machines.
If you don’t want to configure your instances and install the Hadoop cluster manually, you can launch an Amazon EMR (Elastic MapReduce) cluster instead, which configures the servers for you automatically. You put the data to be processed in S3. EMR picks it up from there, processes it, and writes the results back to S3.
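If you take the EMR route, you can request the 4-node cluster described above (one master plus three slave, or "core", nodes) programmatically. The following is a minimal Python sketch. It assumes you would pass the resulting dictionary to boto3's EMR client method `run_job_flow`. The sketch only builds the request. The cluster name, the release label (`emr-6.15.0`) and the default EMR role names are assumptions, so check them against your account and pick a current release.

```python
def build_hadoop_cluster_request(name, release_label, master_type, core_type, core_count):
    """Build keyword arguments for an EMR run_job_flow() call:
    one master node plus `core_count` core (data) nodes running Hadoop."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,  # assumption: replace with a current EMR release
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": master_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": core_type, "InstanceCount": core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names created by EMR; your account may use others.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

request = build_hadoop_cluster_request(
    "hadoop-4-node", "emr-6.15.0", "c4.8xlarge", "i2.xlarge", core_count=3
)
total_nodes = sum(g["InstanceCount"] for g in request["Instances"]["InstanceGroups"])
print(total_nodes)  # 4
# With boto3 installed and credentials configured, this could be submitted as:
# boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or version-control before anything is launched.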
3. Where do you think an AMI fits when you are designing an architecture for a solution?
AMIs (Amazon Machine Images) are like templates for virtual machines, and every instance is launched from an AMI. AWS offers pre-baked AMIs that you can choose from when you launch an instance. Some AMIs are not free and can be bought from the AWS Marketplace. You can also create your own custom AMI, which can help you save space on AWS. For example, if you don’t need a certain set of software in your installation, you can leave it out of your custom AMI. This makes it cost-efficient, since you are removing the things you don’t need.
4. How do you choose a Region?
Let’s understand this through an example. Consider a company that has a user base in India as well as in the US.
Let us see how we would choose the region for this use case:
Referring to the figure above, the regions to choose between are Mumbai and North Virginia. First, let us compare pricing. Regions list hourly prices, which can be converted to a monthly figure, and here North Virginia emerges as the winner. But pricing cannot be the only parameter to consider. Performance should also be kept in mind, so let’s look at latency as well. Latency is the time a server takes to respond to your requests, i.e., the response time. North Virginia wins again!
So, in conclusion, North Virginia should be chosen for this use case.
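The comparison above can be sketched in Python as follows. It converts hourly prices to monthly ones using the 730-hours-per-month convention from AWS pricing. It then checks whether one region wins on both cost and latency, as North Virginia does here. The helper names are hypothetical. No real prices or latencies are hard-coded: you supply them from the AWS pricing pages and your own measurements.

```python
HOURS_PER_MONTH = 730  # convention used to turn hourly AWS prices into monthly ones

def monthly_cost(hourly_price, instance_count=1):
    """Convert an hourly on-demand price into an approximate monthly cost."""
    return hourly_price * HOURS_PER_MONTH * instance_count

def dominant_region(candidates):
    """candidates maps region code -> (hourly_price, latency_ms).
    Return the region that is both the cheapest and the fastest, or None if no
    single region wins on both (then the trade-off has to be weighed by hand)."""
    cheapest = min(candidates, key=lambda r: monthly_cost(candidates[r][0]))
    fastest = min(candidates, key=lambda r: candidates[r][1])
    return cheapest if cheapest == fastest else None
```

For example, `dominant_region({"us-east-1": (p1, l1), "ap-south-1": (p2, l2)})` returns `"us-east-1"` (North Virginia) only when it is both cheaper and faster than `ap-south-1` (Mumbai).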
5. Is one Elastic IP address enough for every instance that I have running?
It depends! Every instance gets a private IP address and, typically, a public IP address. The private address stays associated with the instance and is released back to Amazon EC2 only when the instance is terminated; in a VPC it persists across a stop and start. The public address, by contrast, is released when the instance is stopped or terminated. You can replace it with an Elastic IP address, which stays associated with the instance until you manually disassociate it. But if you are hosting multiple websites on your EC2 server, you may need more than one Elastic IP address.
6. What are the best practices for security in Amazon EC2?
There are several best practices to secure Amazon EC2. A few of them are given below:
- Use AWS Identity and Access Management (IAM) to control access to your AWS resources.
- Restrict access by only allowing trusted hosts or networks to access ports on your instance.
- Review the rules in your security groups regularly, and apply the principle of least privilege: only open up the permissions that you require.
- Disable password-based logins for instances launched from your AMI. Passwords can be found or cracked, and are a security risk.
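Reviewing security group rules regularly can be partly automated. The following is a minimal Python sketch that flags ingress rules open to the whole internet. Its input mirrors the shape of the `SecurityGroups` list returned by EC2's DescribeSecurityGroups API (for example, boto3's `describe_security_groups()`). The set of ports allowed to be world-open and the group ID are assumptions for illustration.

```python
# Only these ports may be open to the whole internet (an assumption for this sketch).
ALLOWED_PUBLIC_PORTS = {80, 443}

def find_open_rules(security_groups, allowed_ports=ALLOWED_PUBLIC_PORTS):
    """Return (group_id, description) for every ingress rule open to 0.0.0.0/0
    or ::/0 that is not a single allowed port."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
            open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
            if not (open_v4 or open_v6):
                continue  # rule is restricted to trusted networks
            if perm.get("IpProtocol") == "-1":
                findings.append((group["GroupId"], "all traffic"))  # -1 = every protocol/port
                continue
            low, high = perm.get("FromPort"), perm.get("ToPort")
            if low == high and low in allowed_ports:
                continue  # e.g. HTTPS on 443 for a public web server
            findings.append((group["GroupId"], f"{perm.get('IpProtocol')} {low}-{high}"))
    return findings

example = [{
    "GroupId": "sg-0example",  # hypothetical ID
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}]
print(find_open_rules(example))  # [('sg-0example', 'tcp 22-22')]
```

Here SSH open to the world is flagged, while HTTPS and the database port restricted to the VPC range pass.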
7. You need to configure an Amazon S3 bucket to serve static assets for your public-facing web application. Which method will ensure that all objects uploaded to the bucket are set to public read?
- Set permissions on the object to public read during upload.
- Configure the bucket policy to set all objects to public read.
- Use AWS Identity and Access Management roles to set the bucket to public read.
- Amazon S3 objects default to public read, so no action is needed.
Explanation: Rather than making changes to every object, it’s better to set a policy for the whole bucket. IAM is used to give more granular permissions to specific users and roles. Since this is a public website, all objects should be publicly readable, and a bucket policy can grant that in one place.
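A bucket policy that makes every object publicly readable can be sketched as below. This is a minimal Python example that builds the standard policy JSON. The bucket name is hypothetical. The JSON could be applied with boto3's `put_bucket_policy(Bucket=..., Policy=policy_json)`. Note that S3 Block Public Access must permit public bucket policies for this to take effect, and newly created buckets block them by default.

```python
import json

def public_read_policy(bucket_name):
    """Bucket policy that lets anyone read (GetObject) every object in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",                               # everyone
                "Action": "s3:GetObject",                       # read only
                "Resource": f"arn:aws:s3:::{bucket_name}/*",    # every object in the bucket
            }
        ],
    }

policy_json = json.dumps(public_read_policy("my-static-assets"), indent=2)
print(policy_json)
```

Because the `Resource` ends in `/*`, objects uploaded later are covered automatically, with no per-object ACL changes.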
8. A customer wants to leverage Amazon Simple Storage Service (S3) and Amazon Glacier as part of their backup and archive infrastructure. The customer plans to use third-party software to support this integration. Which approach will limit the access of the third-party software to only the Amazon S3 bucket named “company-backup”?
- A custom bucket policy limited to the Amazon S3 API in the Amazon Glacier archive “company-backup”
- A custom bucket policy limited to the Amazon S3 API in “company-backup”
- A custom IAM user policy limited to the Amazon S3 API for the Amazon Glacier archive “company-backup”.
- A custom IAM user policy limited to the Amazon S3 API in “company-backup”.
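Explanation: Taking a cue from the previous question, this use case needs more granular permissions, scoped to one specific principal (the third-party software). Hence an IAM user policy limited to the Amazon S3 API in “company-backup” would be used here.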
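Such an IAM user policy might look like the minimal Python sketch below. It builds the policy document as JSON. It covers both the bucket ARN (for bucket-level actions such as listing) and the object ARN (for object-level actions). `s3:*` is kept only to match the "S3 API" wording of the question. In practice you would likely narrow it to the specific actions the backup software needs. Because IAM denies anything not explicitly allowed, the software cannot reach other buckets unless some other policy grants it.

```python
import json

def bucket_only_policy(bucket_name):
    """IAM identity policy allowing S3 API calls only on one bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",  # consider narrowing to e.g. s3:PutObject, s3:GetObject
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level actions, e.g. ListBucket
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level actions, e.g. PutObject
                ],
            }
        ],
    }

print(json.dumps(bucket_only_policy("company-backup"), indent=2))
```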
9. Can S3 be used with EC2 instances? If so, how?
Yes, it can be used for instances with root devices backed by local instance storage. By using Amazon S3, developers have access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. To run systems in the Amazon EC2 environment, developers use the tools provided to load their Amazon Machine Images (AMIs) into Amazon S3 and to move them between Amazon S3 and Amazon EC2.
Another use case could be for websites hosted on EC2 to load their static content from S3.
10. A customer implemented AWS Storage Gateway with a gateway-cached volume at their main office. An event takes the link between the main and branch office offline. Which methods will enable the branch office to access their data?
- Restore by implementing a lifecycle policy on the Amazon S3 bucket.
- Make an Amazon Glacier Restore API call to load the files into another Amazon S3 bucket within four to six hours.
- Launch a new AWS Storage Gateway instance AMI in Amazon EC2, and restore from a gateway snapshot.
- Create an Amazon EBS volume from a gateway snapshot, and mount it to an Amazon EC2 instance.
Explanation: The fastest approach is to launch a new Storage Gateway instance. Why? Time is the key factor that drives every business, and troubleshooting the failed link would take longer. Instead, we can restore the previous working state of the storage gateway on a new instance from a gateway snapshot.
11. When you need to move data over long distances using the internet, for instance across countries or continents to your Amazon S3 bucket, which method or service will you use?
- Amazon Glacier
- Amazon CloudFront
- Amazon S3 Transfer Acceleration
- Amazon Snowball
Explanation: You would not use Snowball, because for now the Snowball service does not support cross-region data transfer, and we are transferring across countries. Transfer Acceleration is the right choice here. It speeds up your data transfer by up to 300% compared to normal transfer speeds by using optimized network paths and Amazon’s content delivery network.
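Once Transfer Acceleration is enabled on a bucket, clients upload through a dedicated accelerate endpoint instead of the regular S3 endpoint. The following minimal Python sketch builds that endpoint and rejects bucket names containing periods, which Transfer Acceleration does not support. The boto3 calls in the comments are shown for orientation only. The bucket name is hypothetical.

```python
def accelerate_endpoint(bucket_name):
    """Return the S3 Transfer Acceleration endpoint for a bucket.
    Acceleration must already be enabled on the bucket."""
    if "." in bucket_name:
        raise ValueError("Transfer Acceleration does not support bucket names with periods")
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com"

print(accelerate_endpoint("my-global-uploads"))
# With boto3, acceleration is enabled once per bucket:
#   s3.put_bucket_accelerate_configuration(
#       Bucket="my-global-uploads", AccelerateConfiguration={"Status": "Enabled"})
# and a client can be pointed at the accelerate endpoint with:
#   boto3.client("s3", config=botocore.config.Config(s3={"use_accelerate_endpoint": True}))
```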
12. How can you speed up data transfer with Snowball?
Data transfer can be sped up in the following ways:
- By performing multiple copy operations at once: if the workstation is powerful enough, you can run multiple cp commands, each from a different terminal, against the same Snowball device.
- By copying from multiple workstations to the same Snowball.
- By transferring large files, or by batching small files together, which reduces the encryption overhead.
- By eliminating unnecessary hops: set things up so that the source machine(s) and the Snowball are the only machines active on the switch being used. This can hugely improve performance.
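The text refers to running several `cp` commands against the device. The Python sketch below illustrates the same two ideas in a tool-agnostic way: it packs small files into fewer archives and runs several copies at once. It does not use the Snowball client. Instead it assumes, hypothetically, that the Snowball is reachable as a local directory (for example, through a file interface mounted on the workstation). The directory names and batch size are placeholders.

```python
import os
import shutil
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor

def batch_small_files(paths, archive_dir, files_per_batch=1000):
    """Pack many small files into fewer .tar archives (one archive per batch)."""
    archives = []
    for start in range(0, len(paths), files_per_batch):
        archive = os.path.join(archive_dir, f"batch_{start // files_per_batch:05d}.tar")
        with tarfile.open(archive, "w") as tar:
            for path in paths[start:start + files_per_batch]:
                tar.add(path, arcname=os.path.basename(path))
        archives.append(archive)
    return archives

def parallel_copy(files, dest_dir, workers=4):
    """Copy files into dest_dir using several concurrent copy operations."""
    os.makedirs(dest_dir, exist_ok=True)

    def copy_one(src):
        return shutil.copy2(src, os.path.join(dest_dir, os.path.basename(src)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, files))

if __name__ == "__main__":
    # Temporary directories stand in for the source data and for the
    # (hypothetical) directory where the Snowball is mounted.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        small = []
        for i in range(5):
            path = os.path.join(src, f"file{i}.txt")
            with open(path, "w") as f:
                f.write(f"record {i}\n")
            small.append(path)
        archives = batch_small_files(small, src, files_per_batch=2)
        copied = parallel_copy(archives, dst)
        print(len(archives), len(copied))  # 3 3
```

Batching and parallelism address different bottlenecks: batching cuts per-file overhead, while parallel copies keep the link busy.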
13. If you want to launch Amazon Elastic Compute Cloud (EC2) instances and assign each instance a predetermined private IP address, you should:
- Launch the instance from a private Amazon Machine Image (AMI).
- Assign a group of sequential Elastic IP addresses to the instances.
- Launch the instances in the Amazon Virtual Private Cloud (VPC).
- Launch the instances in a Placement Group.
Explanation: In a VPC you define the subnet address ranges yourself, and you can specify the private IP address each instance receives when you launch it. None of the other options allow this. A VPC is also the best way to connect your cloud resources (for example, EC2 instances) to your own data center (for example, a private cloud). Once you connect your data center to the VPC that contains your instances, each instance’s private IP address can be reached from your data center. Hence, you can access your public cloud resources as if they were on your own network.
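Planning predetermined private IPs mostly means picking addresses inside the subnet that AWS does not reserve. AWS reserves the first four addresses and the last address of every subnet. The following is a minimal Python sketch using the standard `ipaddress` module. The subnet range is an example. The commented `run_instances` call uses the `PrivateIpAddress` parameter, and the subnet ID in it is left elided.

```python
import ipaddress

def plan_private_ips(subnet_cidr, count):
    """Pick `count` predetermined private IPs from a VPC subnet, skipping the
    addresses AWS reserves in every subnet: the first four and the last one."""
    subnet = ipaddress.ip_network(subnet_cidr)
    usable = list(subnet)[4:-1]
    if count > len(usable):
        raise ValueError("subnet is too small for that many instances")
    return [str(ip) for ip in usable[:count]]

ips = plan_private_ips("10.0.1.0/24", 3)
print(ips)  # ['10.0.1.4', '10.0.1.5', '10.0.1.6']
# Each address could then be requested at launch, e.g.:
# boto3.client("ec2").run_instances(..., SubnetId="subnet-...", PrivateIpAddress=ips[0])
```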
14. Can I connect my corporate data center to the Amazon Cloud?
Yes. You can establish a VPN (Virtual Private Network) connection between your company’s network and your VPC (Virtual Private Cloud). This lets you interact with your EC2 instances as if they were within your existing network.
15. Is it possible to change the private IP addresses of an EC2 instance while it is running or stopped in a VPC?
The primary private IP address stays attached to the instance throughout its lifetime and cannot be changed. However, secondary private IP addresses can be assigned, unassigned, or moved between network interfaces or instances at any time.
16. Why do you make subnets?
- Because there is a shortage of networks
- To efficiently utilize networks that have a large number of hosts.
- Because there is a shortage of hosts.
- To efficiently utilize networks that have a small number of hosts.
Explanation: If a network has a large number of hosts, managing all of them can be a tedious job. Therefore, we divide the network into subnets (sub-networks) so that managing these hosts becomes simpler.
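Dividing a large network into subnets can be shown with Python’s standard `ipaddress` module. The sketch below splits an example /16 VPC range into /24 subnets. The 10.0.0.0/16 range is only an example. The last line accounts for the five addresses AWS reserves in each subnet.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")        # example VPC range (assumption)
subnets = list(vpc.subnets(new_prefix=24))       # carve it into /24 subnets
print(len(subnets))                              # 256
print(subnets[0], subnets[1])                    # 10.0.0.0/24 10.0.1.0/24
print(subnets[0].num_addresses - 5)              # 251 usable hosts per AWS subnet
```

Each /24 can then be handled as its own unit, for example one per Availability Zone or per application tier.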
17. Which of the following is true?
- You can attach multiple route tables to a subnet
- You can attach multiple subnets to a route table
- Both A and B
- None of these.
Explanation: Route tables are used to route network packets, so a subnet with multiple route tables would make it ambiguous where a packet should go. Therefore, a subnet can be associated with only one route table at a time. A single route table, however, can be associated with any number of subnets, so attaching multiple subnets to a route table is possible.
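The following minimal Python sketch shows how a single route table decides where a packet goes. When several routes match, the VPC uses the most specific one (longest prefix). The route targets below are hypothetical IDs. Because a subnet consults exactly one table, each destination maps to exactly one target.

```python
import ipaddress

def route_lookup(route_table, destination):
    """Pick the target for a packet the way a VPC route table does:
    the most specific (longest-prefix) matching route wins."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# One route table, which several subnets could share (IDs are hypothetical).
public_rt = {
    "10.0.0.0/16": "local",      # traffic inside the VPC
    "0.0.0.0/0": "igw-example",  # everything else goes to the internet gateway
}
print(route_lookup(public_rt, "10.0.3.7"))  # local
print(route_lookup(public_rt, "8.8.8.8"))   # igw-example
```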
18. In CloudFront, what happens when content is NOT present at an edge location and a request is made to it?
- An Error “404 not found” is returned
- CloudFront delivers the content directly from the origin server and stores it in the cache of the edge location
- The request is kept on hold till content is delivered to the edge location
- The request is routed to the next closest edge location
Explanation: CloudFront is a content delivery network (CDN) that caches content at the edge location nearest to the user to reduce latency. If the data is not present at an edge location, the first request is served from the origin server and the content is cached at the edge. Subsequent requests are then served from the edge cache.
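The miss-then-cache behavior can be modeled with a toy Python class. This is an illustration of the idea, not how CloudFront is implemented. On a miss it fetches from the origin and stores the object for a TTL. Later requests within the TTL are hits. The 86,400-second default mirrors CloudFront’s default TTL of 24 hours. The origin function is hypothetical.

```python
import time

class EdgeCache:
    """Toy model of an edge location: serve from cache on a hit; on a miss,
    fetch from the origin, store the object for `ttl` seconds, then serve it."""

    def __init__(self, fetch_from_origin, ttl=86400, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # path -> (content, expires_at)

    def get(self, path):
        entry = self._store.get(path)
        if entry and entry[1] > self.clock():
            return entry[0], "Hit"
        content = self.fetch_from_origin(path)
        self._store[path] = (content, self.clock() + self.ttl)
        return content, "Miss"

origin_calls = []

def origin(path):
    origin_calls.append(path)  # record every trip to the origin server
    return f"<contents of {path}>"

edge = EdgeCache(origin)
print(edge.get("/logo.png")[1])  # Miss (fetched from the origin)
print(edge.get("/logo.png")[1])  # Hit  (served from the edge cache)
print(len(origin_calls))         # 1
```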
19. If I’m using Amazon CloudFront, can I use Direct Connect to transfer objects from my own data center?
Yes. Amazon CloudFront supports custom origins, including origins outside of AWS. With AWS Direct Connect, you will be charged the applicable AWS Direct Connect data transfer rates.
20. If my AWS Direct Connect fails, will I lose my connectivity?
If a backup AWS Direct Connect connection has been configured, traffic will fail over to it in the event of a failure. It is recommended to enable Bidirectional Forwarding Detection (BFD) when configuring your connections to ensure faster detection and failover. If you have configured a backup IPsec VPN connection instead, all VPC traffic will fail over to the backup VPN connection automatically. Traffic to and from public resources such as Amazon S3 will be routed over the Internet. If you have neither a backup AWS Direct Connect link nor an IPsec VPN link, Amazon VPC traffic will be dropped in the event of a failure.
21. If I launch a standby RDS instance, will it be in the same Availability Zone as my primary?
- Only for Oracle RDS types
- Only if it is configured at launch
Explanation: No. The purpose of a standby instance is to keep the database available if an infrastructure failure occurs. The standby instance is therefore placed in a different Availability Zone, which is physically separate, independent infrastructure.
22. When would I prefer Provisioned IOPS over Standard RDS storage?
- If you have batch-oriented workloads
- If you use production online transaction processing (OLTP) workloads.
- If you have workloads that are not sensitive to consistent performance
- All of the above
Explanation: Provisioned IOPS delivers high, consistent I/O rates, but it is also more expensive. It is therefore preferred for production online transaction processing (OLTP) workloads, which need consistent, low-latency I/O. Batch-oriented workloads, and workloads that are not sensitive to consistent performance, can usually run on standard storage at lower cost.
23. How are Amazon RDS, DynamoDB, and Redshift different?
- Amazon RDS is a database management service for relational databases. It handles patching, upgrading, and backing up your databases without your intervention. RDS is a DB management service for structured data only.
- DynamoDB, on the other hand, is a NoSQL database service. NoSQL databases are suited to unstructured and semi-structured data.
- Redshift is an entirely different service: it is a data warehouse product used for data analysis.
24. If I am running my DB instance as a Multi-AZ deployment, can I use the standby DB instance for read or write operations along with the primary DB instance?
- Only with MySQL based RDS
- Only for Oracle RDS instances
Explanation: No. The standby DB instance cannot be used in parallel with the primary DB instance, because it exists solely for standby purposes. It cannot be used unless the primary instance goes down.
25. Your company’s branch offices are all over the world. They use software with a multi-regional deployment on AWS and MySQL 5.6 for data persistence.
The task is to run an hourly batch process and read data from every region to compute cross-regional reports which will be distributed to all the branches. This should be done in the shortest time possible. How will you build the DB architecture in order to meet the requirements?
- For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
- For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
- For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
- For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
Explanation: We use an RDS instance as the master in each region, because RDS will manage the database for us. Since we have to read from every region, we put a read replica of each regional master in the HQ region, where the hourly batch reads the data. Option C is not correct, because a read replica is more efficient than a snapshot. A read replica stays continuously up to date and can be promoted to an independent DB instance if needed. With a DB snapshot, you must launch a separate DB instance before you can read the data.
26. Can I run more than one DB instance for Amazon RDS for free?
Yes. You can run more than one Single-AZ Micro database instance for free! However, any usage beyond 750 instance hours, summed across all Amazon RDS Single-AZ Micro DB instances, all eligible database engines and all regions, will be billed at standard Amazon RDS prices. For example, if you run two Single-AZ Micro DB instances for 400 hours each in a single month, you will accumulate 800 instance hours of usage, of which 750 hours will be free. You will be billed for the remaining 50 hours at the standard Amazon RDS price.
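The billing arithmetic in the example above can be captured in a couple of lines of Python. The 750-hour constant is the allowance quoted in this answer. Free-tier terms change over time, so treat it as an assumption to verify.

```python
FREE_HOURS_PER_MONTH = 750  # allowance quoted above; verify current free-tier terms

def billable_hours(instance_hours):
    """Hours billed at standard RDS prices once the monthly free allowance,
    pooled across all Single-AZ Micro DB instances, is used up."""
    return max(0, sum(instance_hours) - FREE_HOURS_PER_MONTH)

print(billable_hours([400, 400]))  # 50
```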
27. Which AWS services will you use to collect and process e-commerce data for near real-time analysis?
- Amazon ElastiCache
- Amazon DynamoDB
- Amazon Redshift
- Amazon Elastic MapReduce
Explanation: DynamoDB is a fully managed NoSQL database service. It can therefore ingest any type of unstructured data, including data from e-commerce websites, which can later be analyzed using Amazon Redshift. Elastic MapReduce is not used here, since near real-time analysis is needed.
28. If I have nested JSON data in DynamoDB, can I retrieve only a specific element of it?
Yes. When using the GetItem, BatchGetItem, Query or Scan APIs, you can define a projection expression (the ProjectionExpression parameter) to determine which attributes should be retrieved from the table. Those attributes can include scalars, sets, or elements of a JSON document.
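A projection expression names document paths: dots step into nested maps and `[n]` selects a list element. The Python sketch below builds such an expression from plain paths. It replaces every attribute name with a `#placeholder` so that names that are DynamoDB reserved words (such as `name`) still work. The helper, table name and key in the comment are hypothetical.

```python
import re

# A path segment is an attribute name optionally followed by list indexes, e.g. "orders[0]".
_SEGMENT = re.compile(r"^([^.\[\]]+)((?:\[\d+\])*)$")

def build_projection(paths):
    """Turn document paths such as "address.city" or "orders[0]" into a
    ProjectionExpression plus ExpressionAttributeNames."""
    names, expressions = {}, []
    for path in paths:
        parts = []
        for segment in path.split("."):
            match = _SEGMENT.match(segment)
            if not match:
                raise ValueError(f"unsupported path segment: {segment!r}")
            attr, indexes = match.groups()
            # Reuse the placeholder if this attribute name was already seen.
            placeholder = next((k for k, v in names.items() if v == attr), None)
            if placeholder is None:
                placeholder = f"#a{len(names)}"
                names[placeholder] = attr
            parts.append(placeholder + indexes)
        expressions.append(".".join(parts))
    return ", ".join(expressions), names

expression, attr_names = build_projection(["name", "address.city", "orders[0]"])
print(expression)  # #a0, #a1.#a2, #a3[0]
print(attr_names)  # {'#a0': 'name', '#a1': 'address', '#a2': 'city', '#a3': 'orders'}
# These could be passed to a low-level GetItem call, for example:
# boto3.client("dynamodb").get_item(TableName="Customers",
#     Key={"customer_id": {"S": "c-42"}},
#     ProjectionExpression=expression, ExpressionAttributeNames=attr_names)
```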
29. A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
- MySQL Installed on two Amazon EC2 Instances in a single Availability Zone
- Amazon RDS for MySQL with Multi-AZ
- Amazon ElastiCache
- Amazon DynamoDB
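Explanation: Amazon RDS for MySQL with Multi-AZ is the apt choice. The application needs complex queries and table joins, which a relational database supports and DynamoDB does not. Multi-AZ provides high availability, and because RDS handles the database administration, it suits a company with limited staff.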
30. What happens to my backups and DB snapshots if I delete my DB instance?
When you delete a DB instance, you have the option of creating a final DB snapshot. If you do, you can later restore your database from that snapshot. RDS retains this final snapshot, along with all other manually created DB snapshots, after the instance is deleted. Automated backups, on the other hand, are deleted, and only manually created DB snapshots are retained.
31. Which of the following use cases are suitable for Amazon DynamoDB? (Choose 2 answers)
- Managing web sessions.
- Storing JSON documents.
- Storing metadata for Amazon S3 objects.
- Running relational joins and complex updates.
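Explanation: Web sessions are simple key-value items read and written at high rates, which DynamoDB handles well. Metadata for S3 objects is unstructured (it varies from object to object), which also suits DynamoDB. If all your JSON documents have the same fields, e.g., [id, name, age], it would be better to store them in a relational database. DynamoDB does not support relational joins or complex updates, so that use case also calls for a relational database.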
32. How can I load my data into Amazon Redshift from different data sources like Amazon RDS, Amazon DynamoDB and Amazon EC2?
You can load the data in the following two ways:
- You can use the COPY command to load data in parallel directly to Amazon Redshift from Amazon EMR, Amazon DynamoDB, or any SSH-enabled host.
- AWS Data Pipeline provides a high performance, reliable, fault tolerant solution to load data from a variety of AWS data sources. You can use AWS Data Pipeline to specify the data source, desired data transformations, and then execute a pre-written import script to load your data into Amazon Redshift.
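For the COPY route, loading a DynamoDB table uses a `dynamodb://` source and a `READRATIO` setting. `READRATIO` caps the percentage of the DynamoDB table’s provisioned read throughput that the load may consume. The following minimal Python sketch only builds the SQL string. The table names and role ARN are hypothetical, and the inputs are assumed to come from trusted configuration, since they are interpolated directly into SQL.

```python
def copy_from_dynamodb_sql(redshift_table, dynamodb_table, iam_role_arn, read_ratio=50):
    """Build a Redshift COPY statement that loads a DynamoDB table in parallel.
    read_ratio: percentage of the DynamoDB table's provisioned read capacity to use."""
    return (
        f"COPY {redshift_table} FROM 'dynamodb://{dynamodb_table}' "
        f"IAM_ROLE '{iam_role_arn}' READRATIO {read_ratio};"
    )

sql = copy_from_dynamodb_sql(
    "favoritemovies", "Movies", "arn:aws:iam::0123456789012:role/MyRedshiftRole"
)
print(sql)
```

Keeping `READRATIO` well below 100 leaves read capacity for the application that is still using the DynamoDB table during the load.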
33. Your application has to retrieve data from your users’ mobile devices every 5 minutes, and the data is stored in DynamoDB. Every day at a particular time, the data is extracted into S3 on a per-user basis, and your application then uses it to visualize the data for the user. You are asked to optimize the architecture of the backend system to lower cost. What would you recommend?
- Create a new Amazon DynamoDB table each day and drop the one for the previous day after its data is on Amazon S3.
- Introduce an Amazon SQS queue to buffer writes to the Amazon DynamoDB table and reduce provisioned write throughput.
- Introduce Amazon Elasticache to cache reads from the Amazon DynamoDB table and reduce provisioned read throughput.
- Write data directly into an Amazon Redshift cluster replacing both Amazon DynamoDB and Amazon S3.
Explanation: Since our work requires the data to be extracted and analyzed, one way to optimize this process would be to use provisioned IO. Because that is expensive, you can instead use ElastiCache to cache the results in memory. This reduces the provisioned read throughput, and hence the cost, without affecting performance.
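The cost saving comes from provisioning fewer read capacity units (RCUs) once a cache absorbs part of the reads. One RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The Python estimate below applies that rule. The cache hit ratio is an input you would have to measure. The function is a hypothetical helper, not an AWS API.

```python
import math

def read_capacity_units(reads_per_sec, item_size_kb, cache_hit_ratio=0.0,
                        eventually_consistent=False):
    """Estimate the provisioned RCUs still needed on the table once a cache
    serves `cache_hit_ratio` of the reads."""
    table_reads = round(reads_per_sec * (1 - cache_hit_ratio), 6)  # reads that miss the cache
    units_per_read = math.ceil(item_size_kb / 4)                   # 4 KB per RCU
    rcu = table_reads * units_per_read
    if eventually_consistent:
        rcu /= 2                                                   # two reads per RCU
    return math.ceil(rcu)

print(read_capacity_units(100, 3))                        # 100 (no cache)
print(read_capacity_units(100, 3, cache_hit_ratio=0.75))  # 25
```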
34. You are running a website on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
- Deploy ElastiCache in-memory cache running in each availability zone
- Implement sharding to distribute load to multiple RDS MySQL instances
- Increase the RDS MySQL Instance size and Implement provisioned IOPS
- Add an RDS MySQL read replica in each availability zone
Explanation: Since the site does a lot of reads and writes, provisioned IO may become expensive. But we need high performance as well, so frequently read data can be cached using ElastiCache. As for RDS, since read contention is happening, the instance size should be increased and provisioned IOPS introduced to improve performance.
35. A startup is running a pilot deployment of around 100 sensors to measure street noise and air quality in urban areas for 3 months. It was noted that around 4 GB of sensor data is generated every month. The company uses a load-balanced, auto-scaled layer of EC2 instances and an RDS database with 500 GB of standard storage. The pilot was a success, and now they want to deploy at least 100K sensors, which need to be supported by the backend. You need to store the data for at least 2 years to analyze it. Which of the following setups would you prefer?
- Add an SQS queue to the ingestion layer to buffer writes to the RDS instance
- Ingest data into a DynamoDB table and move old data to a Redshift cluster
- Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage
- Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS
Explanation: A Redshift cluster would be preferred because it is easy to scale, and the work is done in parallel across the nodes, which makes it a good fit for a bigger workload like our use case. Since 4 GB of data is generated each month, 2 years (24 months) of data should come to around 96 GB. The number of sensors will grow from 100 to 100K (1,000 times as many), so 96 GB will become approximately 96 TB. Hence option C is the right answer.
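The storage estimate above is simple linear scaling, shown here as a short Python function. It assumes data volume grows in proportion to both the retention period and the number of sensors. It uses decimal units (1 TB = 1,000 GB).

```python
def projected_storage_gb(gb_per_month_pilot, months, pilot_sensors, target_sensors):
    """Scale the pilot's monthly data volume linearly by retention period and sensor count."""
    pilot_total = gb_per_month_pilot * months            # 4 GB x 24 months = 96 GB
    return pilot_total * target_sensors / pilot_sensors  # x (100,000 / 100) = 96,000 GB

total_gb = projected_storage_gb(4, 24, 100, 100_000)
print(total_gb, "GB ~", total_gb / 1000, "TB")  # 96000.0 GB ~ 96.0 TB
```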
36. Suppose you have an application where you have to render images and also do some general computing. Which of the following services will best fit your need?
- Classic Load Balancer
- Application Load Balancer
- Both of them
- None of these
Explanation: You would choose an Application Load Balancer, because it supports path-based routing, which means it can make routing decisions based on the URL. Image-rendering requests are routed to one set of instances, and general computing requests to a different set.
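Path-based routing can be modeled in a few lines of Python. The listener’s rules are checked in priority order. The first rule whose path pattern matches decides the target group, and a default action catches everything else. This is a local illustration, not the ALB API. The paths and target group names are hypothetical. `fnmatchcase` handles `*` and `?` like ALB path patterns but also treats `[...]` as a character class, which ALB does not.

```python
from fnmatch import fnmatchcase

# (priority, path pattern, target group); lower priority numbers are evaluated first.
RULES = [
    (10, "/render/*", "image-rendering-targets"),
    (20, "/compute/*", "general-compute-targets"),
]
DEFAULT_TARGET = "general-compute-targets"

def route(path, rules=RULES, default=DEFAULT_TARGET):
    """Return the target group for a request path: the first rule (by priority)
    whose path pattern matches wins, otherwise the default action applies."""
    for _priority, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):
            return target
    return default

print(route("/render/thumbnail.png"))  # image-rendering-targets
print(route("/compute/report"))        # general-compute-targets
```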
37. What is the difference between scalability and elasticity?
Scalability is the ability of a system to increase its hardware resources to handle an increase in demand. This can be done by upgrading the hardware specifications (scaling up) or by adding processing nodes (scaling out).
Elasticity is the ability of a system to handle an increase in workload by adding hardware resources when demand increases (just like scaling), and also to release the added resources when they are no longer needed. This is particularly helpful in cloud environments, which follow a pay-per-use model.
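Elasticity means the capacity calculation works in both directions. The Python sketch below uses a simple proportional rule: it resizes a group so that a per-instance metric such as average CPU moves back toward a target, then clamps the result to the group’s minimum and maximum size. This is an illustration in the spirit of target tracking, not AWS’s exact algorithm. All numbers are example inputs.

```python
import math

def desired_capacity(current, metric, target, min_size, max_size):
    """Proportional rule: resize so the per-instance metric (e.g. average CPU %)
    moves back toward `target`, clamped to the group's min/max size."""
    if current == 0:
        return min_size
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))

print(desired_capacity(4, metric=80, target=50, min_size=2, max_size=10))  # 7 (scale out)
print(desired_capacity(4, metric=20, target=50, min_size=2, max_size=10))  # 2 (scale in)
```

The second call is what separates elasticity from plain scalability: when load drops, capacity is released and you stop paying for it.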
38. How would you change the instance type of instances that are running in your application tier and using Auto Scaling? In which of the following areas would you change it?
- Auto Scaling policy configuration
- Auto Scaling group
- Auto Scaling tags configuration
- Auto Scaling launch configuration
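Explanation: The Auto Scaling tags configuration is used to attach metadata to your instances. To change the instance type, you have to use the Auto Scaling launch configuration. A launch configuration cannot be modified after it is created, so you create a new one with the new instance type and attach it to the Auto Scaling group. Instances launched from then on use the new type.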