Amazon DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that provides fast, low-latency performance that scales with ease. Amazon DynamoDB lets you offload the
administrative burdens of operating a distributed NoSQL database and focus on the
application. Amazon DynamoDB significantly simplifies the hardware provisioning, setup and
configuration, replication, software patching, and cluster scaling of NoSQL databases.

Amazon DynamoDB is designed to simplify database and cluster management, provide
consistently high levels of performance, simplify scalability tasks, and improve reliability
with automatic replication. Developers can create a table in Amazon DynamoDB and write an
unlimited number of items with consistent latency.

Amazon DynamoDB can provide consistent performance levels by automatically distributing
the data and traffic for a table over multiple partitions. After you configure a certain read or
write capacity, Amazon DynamoDB will automatically add enough infrastructure capacity to
support the requested throughput levels. As your demand changes over time, you can adjust
the read or write capacity after a table has been created, and Amazon DynamoDB will add or
remove infrastructure and adjust the internal partitioning accordingly.

To help maintain consistent, fast performance levels, all table data is stored on high-performance SSD drives. Performance metrics, including transaction rates, can be monitored using Amazon CloudWatch. In addition to providing high performance levels, Amazon DynamoDB also provides automatic high-availability and durability protections by replicating data across multiple Availability Zones within an AWS Region.

Data Model

The basic components of the Amazon DynamoDB data model include tables, items, and
attributes. As depicted in Figure 7.3, a table is a collection of items and each item is a
collection of one or more attributes. Each item also has a primary key that uniquely identifies
the item.

FIGURE 7.3 Table, items, attributes relationship
In a relational database, a table has a predefined schema such as the table name, primary key,
list of its column names, and their data types. All records stored in the table must have the
same set of columns. In contrast, Amazon DynamoDB only requires that a table have a
primary key, but it does not require you to define all of the attribute names and data types in
advance. Individual items in an Amazon DynamoDB table can have any number of attributes,
although there is a limit of 400KB on the item size.

Each attribute in an item is a name/value pair. An attribute can be single-valued or a multi-valued set. For example, a book item can have title and authors attributes. Each book has one
title but can have many authors. The multi-valued attribute is a set; duplicate values are not
allowed. Data is stored in Amazon DynamoDB in key/value pairs such as the following:

{
    Id = 101
    ProductName = "Book 101 Title"
    ISBN = "123-1234567890"
    Authors = [ "Author 1", "Author 2" ]
    Price = 2.88
    Dimensions = "8.5 x 11.0 x 0.5"
    PageCount = 500
    InPublication = 1
    ProductCategory = "Book"
}

Applications can connect to the Amazon DynamoDB service endpoint and submit requests
over HTTP/S to read and write items to a table or even to create and delete tables. DynamoDB
provides a web service API that accepts requests in JSON format. While you could program
directly against the web service API endpoints, most developers choose to use the AWS
Software Development Kit (SDK) to interact with their items and tables. The AWS SDK is
available in many different languages and provides a simplified, high-level programming
interface.
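To make the wire format concrete, the sketch below hand-builds the JSON body of a GetItem request the way the low-level web service API expects it. The ProductCatalog table name and Id key are hypothetical; in practice, the AWS SDK assembles this payload for you.

```python
import json

def build_get_item_request(table_name, key):
    """Build the JSON body a low-level DynamoDB GetItem call would carry.

    Attribute values use DynamoDB's typed wire format, e.g. {"N": "101"}
    for a Number (numbers are transmitted as strings).
    """
    return json.dumps({"TableName": table_name, "Key": key})

# Hypothetical table "ProductCatalog" with a numeric Id partition key.
body = build_get_item_request("ProductCatalog", {"Id": {"N": "101"}})
```

An SDK call such as boto3's `get_item` produces an equivalent request under the hood, sparing you from constructing the typed JSON by hand.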

Data Types

Amazon DynamoDB gives you a lot of flexibility with your database schema. Unlike a
traditional relational database that requires you to define your column types ahead of time,
DynamoDB only requires a primary key attribute. Each item that is added to the table can
then add additional attributes. This gives you flexibility over time to expand your schema
without having to rebuild the entire table and deal with record version differences with
application logic.

When you create a table or a secondary index, you must specify the names and data types of
each primary key attribute (partition key and sort key). Amazon DynamoDB supports a wide
range of data types for attributes. Data types fall into three major categories: Scalar, Set, or
Document.

Scalar Data Types A scalar type represents exactly one value. Amazon DynamoDB supports
the following five scalar types:

String Text and variable-length characters up to 400KB. Supports Unicode with UTF-8 encoding.

Number Positive or negative number with up to 38 digits of precision

Binary Binary data, images, compressed objects up to 400KB in size

Boolean Binary flag representing a true or false value

Null Represents a blank, empty, or unknown state. The String, Number, Binary, and Boolean types cannot be empty.

Set Data Types Sets are useful to represent a unique list of one or more scalar values. Each
value in a set needs to be unique and must be the same data type. Sets do not guarantee
order. Amazon DynamoDB supports three set types: String Set, Number Set, and Binary Set.

String Set Unique list of String attributes

Number Set Unique list of Number attributes

Binary Set Unique list of Binary attributes

Document Data Types Document type is useful to represent multiple nested attributes,
similar to the structure of a JSON file. Amazon DynamoDB supports two document types:
List and Map. Multiple Lists and Maps can be combined and nested to create complex
structures.

List Each List can be used to store an ordered list of attributes of different data types.

Map Each Map can be used to store an unordered list of key/value pairs. Maps can be used to represent the structure of any JSON object.
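As a sketch, the mapping from native values to DynamoDB's attribute types can be expressed in a few lines; the descriptor codes (S, N, B, BOOL, NULL, SS, NS, BS, L, M) are the ones the service uses in its JSON wire format.

```python
def describe_attribute(value):
    """Return the DynamoDB type descriptor for a Python value.

    Scalars map to S, N, B, BOOL, and NULL; sets map to SS, NS, or BS;
    and the document types map to L (list) and M (map).
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):            # must check bool before int
        return "BOOL"
    if isinstance(value, (int, float)):
        return "N"
    if isinstance(value, str):
        return "S"
    if isinstance(value, (bytes, bytearray)):
        return "B"
    if isinstance(value, (set, frozenset)):
        members = {describe_attribute(v) for v in value}
        if members == {"S"}:
            return "SS"
        if members == {"N"}:
            return "NS"
        if members == {"B"}:
            return "BS"
        raise TypeError("a set must hold values of one scalar type")
    if isinstance(value, list):
        return "L"
    if isinstance(value, dict):
        return "M"
    raise TypeError(f"unsupported type: {type(value)!r}")

# The book item from earlier: Authors is a String Set, Price a Number.
assert describe_attribute({"Author 1", "Author 2"}) == "SS"
assert describe_attribute(2.88) == "N"
```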

Primary Key

When you create a table, you must specify the primary key of the table in addition to the table
name. Like a relational database, the primary key uniquely identifies each item in the table. A
primary key will point to exactly one item. Amazon DynamoDB supports two types of primary
keys, and this configuration cannot be changed after a table has been created:

Partition Key The primary key is made of one attribute, a partition (or hash) key. Amazon
DynamoDB builds an unordered hash index on this primary key attribute.

Partition and Sort Key The primary key is made of two attributes. The first attribute is the
partition key and the second one is the sort (or range) key. Each item in the table is uniquely
identified by the combination of its partition and sort key values. It is possible for two items
to have the same partition key value, but those two items must have different sort key values.

Furthermore, each primary key attribute must be defined as type string, number, or binary.
Amazon DynamoDB uses the partition key to distribute the request to the right partition.
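A table definition with a partition and sort key could be sketched as the following boto3 `create_table` parameters; the Thread table and its ForumName/Subject attribute names are illustrative, not prescribed.

```python
# Illustrative create_table parameters for a composite primary key.
table_definition = {
    "TableName": "Thread",
    "AttributeDefinitions": [
        {"AttributeName": "ForumName", "AttributeType": "S"},
        {"AttributeName": "Subject", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "ForumName", "KeyType": "HASH"},   # partition key
        {"AttributeName": "Subject", "KeyType": "RANGE"},    # sort key
    ],
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}

# With AWS credentials configured, this dictionary could be passed to boto3:
# import boto3
# boto3.client("dynamodb").create_table(**table_definition)
```

Note that only the key attributes appear in AttributeDefinitions; all other item attributes remain schemaless.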

If you are performing many reads or writes per second on the same primary key,
you will not be able to fully use the compute capacity of the Amazon DynamoDB cluster.
A best practice is to maximize your throughput by distributing requests across the full
range of partition keys.

Provisioned Capacity

When you create an Amazon DynamoDB table, you are required to provision a certain
amount of read and write capacity to handle your expected workloads. Based on your
configuration settings, DynamoDB will then provision the right amount of infrastructure
capacity to meet your requirements with sustained, low-latency response times. Overall
capacity is measured in read and write capacity units. These values can later be scaled up or
down by using an UpdateTable action.

Each operation against an Amazon DynamoDB table will consume some of the provisioned
capacity units. The specific amount of capacity units consumed depends largely on the size of
the item, but also on other factors. For read operations, the amount of capacity consumed
also depends on the read consistency selected in the request. Read more about eventual and
strong consistency later in this chapter.

For example, given a table without a local secondary index, you will consume 1 capacity unit
if you read an item that is 4KB or smaller. Similarly, for write operations you will consume 1
capacity unit if you write an item that is 1KB or smaller. This means that if you read an item
that is 110KB, you will consume 28 capacity units, or 110 / 4 = 27.5 rounded up to 28. For
read operations that are strongly consistent, they will use twice the number of capacity units,
or 56 in this example.
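The arithmetic above can be captured in a small helper. It follows the chapter's figures: 4KB per read capacity unit, 1KB per write capacity unit, and a 2x multiplier for strongly consistent reads.

```python
import math

READ_UNIT_SIZE_KB = 4
WRITE_UNIT_SIZE_KB = 1

def read_capacity_units(item_size_kb, strongly_consistent=False):
    """Capacity units consumed reading one item, rounded up to a full unit."""
    units = math.ceil(item_size_kb / READ_UNIT_SIZE_KB)
    return units * 2 if strongly_consistent else units

def write_capacity_units(item_size_kb):
    """Capacity units consumed writing one item, rounded up to a full unit."""
    return math.ceil(item_size_kb / WRITE_UNIT_SIZE_KB)

# The 110KB example: 110 / 4 = 27.5, rounded up to 28.
assert read_capacity_units(110) == 28
assert read_capacity_units(110, strongly_consistent=True) == 56
assert write_capacity_units(1) == 1
```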

You can use Amazon CloudWatch to monitor your Amazon DynamoDB capacity and make
scaling decisions. There is a rich set of metrics, including ConsumedReadCapacityUnits and
ConsumedWriteCapacityUnits. If you do exceed your provisioned capacity for a period of time,
requests will be throttled and can be retried later. You can monitor and alert on the
ThrottledRequests metric using Amazon CloudWatch to notify you of changing usage
patterns.

Secondary Indexes

When you create a table with a partition and sort key (formerly known as a hash and range
key), you can optionally define one or more secondary indexes on that table. A secondary
index lets you query the data in the table using an alternate key, in addition to queries against
the primary key. Amazon DynamoDB supports two different kinds of indexes:

Global Secondary Index The global secondary index is an index with a partition and sort
key that can be different from those on the table. You can create or delete a global secondary
index on a table at any time.

Local Secondary Index The local secondary index is an index that has the same partition
key attribute as the primary key of the table, but a different sort key. You can only create a
local secondary index when you create a table.

Secondary indexes allow you to search a large table efficiently and avoid an expensive scan
operation to find items with specific attributes. These indexes allow you to support different
query access patterns and use cases beyond what is possible with only a primary key. While a
table can only have one local secondary index, you can have multiple global secondary
indexes.

Amazon DynamoDB updates each secondary index when an item is modified. These updates
consume write capacity units. For a local secondary index, item updates will consume write
capacity units from the main table, while global secondary indexes maintain their own
provisioned throughput settings separate from the table.
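In outline, the two index kinds could be declared as follows for a table keyed on CustomerId (partition) and OrderId (sort); all names here are hypothetical.

```python
# A local secondary index shares the table's partition key but supplies an
# alternate sort key; it draws capacity from the table itself.
local_secondary_index = {
    "IndexName": "OrderDateIndex",
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},  # same partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},  # alternate sort key
    ],
    "Projection": {"ProjectionType": "ALL"},
}

# A global secondary index may use entirely different keys and carries its
# own provisioned throughput, separate from the table's.
global_secondary_index = {
    "IndexName": "ProductIndex",
    "KeySchema": [
        {"AttributeName": "ProductId", "KeyType": "HASH"},   # different partition key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "KEYS_ONLY"},
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}
```

These dictionaries would be supplied in the LocalSecondaryIndexes and GlobalSecondaryIndexes parameters of a boto3 `create_table` call (the local index only at creation time).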

Writing and Reading Data

After you create a table with a primary key and indexes, you can begin writing and reading
items to the table. Amazon DynamoDB provides multiple operations that let you create,
update, and delete individual items. Amazon DynamoDB also provides multiple querying
options that let you search a table or an index, retrieve a specific item, or fetch a batch of
items.

Writing Items

Amazon DynamoDB provides three primary API actions to create, update, and delete items:
PutItem, UpdateItem, and DeleteItem. Using the PutItem action, you can create a new item
with one or more attributes. Calls to PutItem will update an existing item if the primary key
already exists. PutItem only requires a table name and a primary key; any additional
attributes are optional.

The UpdateItem action will find existing items based on the primary key and replace the
attributes. This operation can be useful to only update a single attribute and leave the other
attributes unchanged. UpdateItem can also be used to create items if they don’t already exist.
Finally, you can remove an item from a table by using DeleteItem and specifying a specific
primary key.

The UpdateItem action also provides support for atomic counters. Atomic counters allow you
to increment and decrement a value and are guaranteed to be consistent across multiple
concurrent requests. For example, a counter attribute used to track the overall score of a
mobile game can be updated by many clients at the same time.

These three actions also support conditional expressions that allow you to perform validation
before an action is applied. For example, you can apply a conditional expression on PutItem
that checks that certain conditions are met before the item is created. This can be useful to
prevent accidental overwrites or to enforce some type of business logic checks.
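As one sketch of these features together, the UpdateItem parameters below atomically increment a counter while using a conditional expression to guard against creating a new item; the GameScores table and its attributes are hypothetical.

```python
# Illustrative UpdateItem parameters: atomically add points to Score, but
# only if the player's item already exists.
update_request = {
    "TableName": "GameScores",
    "Key": {"PlayerId": {"S": "player-42"}},
    # ADD on a Number attribute is the atomic-counter idiom; concurrent
    # requests are each applied consistently.
    "UpdateExpression": "ADD Score :points",
    # Fail the update rather than silently create a new item.
    "ConditionExpression": "attribute_exists(PlayerId)",
    "ExpressionAttributeValues": {":points": {"N": "10"}},
}

# With AWS credentials configured:
# import boto3
# boto3.client("dynamodb").update_item(**update_request)
```

If the condition fails, the service rejects the request with a conditional check failure instead of applying the write.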

Reading Items

After an item has been created, it can be retrieved through a direct lookup by calling the
GetItem action or through a search using the Query or Scan action. GetItem allows you to
retrieve an item based on its primary key. All of the item’s attributes are returned by default,
and you have the option to select individual attributes to filter down the results.

If a primary key is composed of only a partition key, that partition key value needs to be
specified to retrieve the item. If the primary key is a composite of a partition key and a sort key,
GetItem will require both the partition and sort key as well. Each call to GetItem consumes
read capacity units based on the size of the item and the consistency option selected.

By default, a GetItem operation performs an eventually consistent read. You can optionally
request a strongly consistent read instead; this will consume additional read capacity units,
but it will return the most up-to-date version of the item.
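A strongly consistent, attribute-filtered lookup could be sketched as the following GetItem parameters; the ProductCatalog table and attribute names are illustrative.

```python
# Illustrative GetItem parameters requesting a strongly consistent read
# of selected attributes only.
get_request = {
    "TableName": "ProductCatalog",
    "Key": {"Id": {"N": "101"}},
    "ConsistentRead": True,                           # default is False (eventual)
    "ProjectionExpression": "Price, InPublication",   # filter returned attributes
}

# import boto3
# response = boto3.client("dynamodb").get_item(**get_request)
```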

Eventual Consistency

When reading items from Amazon DynamoDB, the operation can be either eventually
consistent or strongly consistent. Amazon DynamoDB is a distributed system that stores
multiple copies of an item across an AWS Region to provide high availability and increased
durability. When an item is updated in Amazon DynamoDB, it starts replicating across
multiple servers. Because Amazon DynamoDB is a distributed system, the replication can
take some time to complete. Because of this we refer to the data as being eventually
consistent, meaning that a read request immediately after a write operation might not show
the latest change. In some cases, the application needs to guarantee that the data is the latest
and Amazon DynamoDB offers an option for strongly consistent reads.

Eventually Consistent Reads When you read data, the response might not reflect the
results of a recently completed write operation. The response might include some stale data.
Consistency across all copies of the data is usually reached within a second; if you repeat your
read request after a short time, the response returns the latest data.

Strongly Consistent Reads When you issue a strongly consistent read request, Amazon
DynamoDB returns a response with the most up-to-date data that reflects updates by all prior
related write operations to which Amazon DynamoDB returned a successful response. A
strongly consistent read might be less available in the case of a network delay or outage. You
can request a strongly consistent read result by specifying optional parameters in your
request.

Batch Operations

Amazon DynamoDB also provides several operations designed for working with large batches
of items, including BatchGetItem and BatchWriteItem. Using the BatchWriteItem action, you
can perform up to 25 item puts or deletes with a single operation. This allows you to
minimize the overhead of each individual call when processing large numbers of items.
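A client feeding BatchWriteItem would typically chunk its work to respect the 25-item ceiling; a minimal sketch of that chunking:

```python
def chunk_for_batch_write(items, batch_size=25):
    """Split items into groups no larger than BatchWriteItem's 25-item limit."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# 60 pending writes become batches of 25, 25, and 10.
batches = chunk_for_batch_write(list(range(60)))
assert [len(b) for b in batches] == [25, 25, 10]
```

Each batch would then be submitted as one BatchWriteItem request, with any unprocessed items retried in a subsequent call.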

Searching Items

Amazon DynamoDB also gives you two operations, Query and Scan, that can be used to
search a table or an index. A Query operation is the primary search operation you can use to
find items in a table or a secondary index using only primary key attribute values. Each Query
requires a partition key attribute name and a distinct value to search. You can optionally
provide a sort key value and use a comparison operator to refine the search results. Results
are automatically sorted by the sort key value and are limited to 1MB.

In contrast to a Query, a Scan operation will read every item in a table or a secondary index.
By default, a Scan operation returns all of the data attributes for every item in the table or
index. Each request can return up to 1MB of data. Items can be filtered out using expressions,
but this can be a resource-intensive operation. If the result set for a Query or a Scan exceeds
1MB, you can page through the results in 1MB increments.

For most operations, performing a Query operation instead of a Scan operation
will be the most efficient option. A Scan operation reads the entire table or secondary
index and then filters out values to provide the desired result.
Use a Query operation when possible and avoid a Scan on a large table or index for only a
small number of items.
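A Query for one partition key value, refined by a sort key condition, could look like the following parameters; the Orders table and its keys are hypothetical.

```python
# Illustrative Query parameters: find one customer's orders placed in 2017,
# using the partition key plus a begins_with condition on the sort key.
query_request = {
    "TableName": "Orders",
    "KeyConditionExpression": (
        "CustomerId = :cid AND begins_with(OrderDate, :year)"
    ),
    "ExpressionAttributeValues": {
        ":cid": {"S": "customer-7"},
        ":year": {"S": "2017"},
    },
}

# import boto3
# response = boto3.client("dynamodb").query(**query_request)
```

The equivalent Scan would read every item in Orders and discard the non-matching ones, consuming far more read capacity.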

Scaling and Partitioning

Amazon DynamoDB is a fully managed service that abstracts away most of the complexity
involved in building and scaling a NoSQL cluster. You can create tables that can scale up to
hold a virtually unlimited number of items with consistent low-latency performance. An
Amazon DynamoDB table can scale horizontally through the use of partitions to meet the
storage and performance requirements of your application. Each individual partition
represents a unit of compute and storage capacity. A well-designed application will take the
partition structure of a table into account to distribute read and write transactions evenly and
achieve high transaction rates at low latencies.

Amazon DynamoDB stores items for a single table across multiple partitions, as represented
in Figure 7.4. Amazon DynamoDB decides which partition to store the item in based on the
partition key. The partition key is used to distribute the new item among all of the available
partitions, and items with the same partition key will be stored on the same partition.

FIGURE 7.4 Table partitioning

As the number of items in a table grows, additional partitions can be added by splitting an
existing partition. The provisioned throughput configured for a table is also divided evenly
among the partitions. Provisioned throughput allocated to a partition is entirely dedicated to
that partition, and there is no sharing of provisioned throughput across partitions.

When a table is created, Amazon DynamoDB configures the table’s partitions based on the
desired read and write capacity. A single partition can hold about 10GB of data and
supports a maximum of 3,000 read capacity units or 1,000 write capacity units. For partitions
that are not fully using their provisioned capacity, Amazon DynamoDB provides some burst
capacity to handle spikes in traffic. A portion of your unused capacity will be reserved to
handle bursts for short periods.

As storage or capacity requirements change, Amazon DynamoDB can split a partition to
accommodate more data or higher provisioned request rates. After a partition is split,
however, it cannot be merged back together. Keep this in mind when planning to increase
provisioned capacity temporarily and then lower it again. With each additional partition
added, its share of the provisioned capacity is reduced.

To achieve the full amount of request throughput provisioned for a table, keep your workload
spread evenly across the partition key values. Distributing requests across partition key
values distributes the requests across partitions. For example, if a table has 10,000 read
capacity units configured but all of the traffic is hitting one partition key, you will not be able
to get more than the 3,000 maximum read capacity units that one partition can support.

To maximize Amazon DynamoDB throughput, create tables with a partition key
that has a large number of distinct values and ensure that the values are requested fairly
uniformly. Adding a random element that can be calculated or hashed is one common
technique to improve partition distribution.
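One way to sketch the calculated-suffix technique: derive the suffix from a secondary attribute by hashing, so a hot key (here a date) fans out across several partition key values while remaining recomputable by readers. The key names are hypothetical.

```python
import hashlib

def sharded_partition_key(hot_key, discriminator, shard_count=10):
    """Spread a hot partition key across shard_count distinct values.

    The suffix is calculated deterministically from a secondary attribute
    (e.g. an order ID), so a reader who knows the discriminator can rebuild
    the full key; a query across all items for hot_key must fan out over
    every suffix.
    """
    digest = hashlib.sha256(discriminator.encode("utf-8")).hexdigest()
    suffix = int(digest, 16) % shard_count
    return f"{hot_key}#{suffix}"

# Writes for the same busy date land on multiple partition key values.
keys = {sharded_partition_key("2017-05-01", f"order-{i}") for i in range(100)}
```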

Security

Amazon DynamoDB gives you granular control over the access rights and permissions for
users and administrators. Amazon DynamoDB integrates with the IAM service to provide
strong control over permissions using policies. You can create one or more policies that allow
or deny specific operations on specific tables. You can also use conditions to restrict access to
individual items or attributes.

All operations must first be authenticated as a valid user or user session. Applications that
need to read and write from Amazon DynamoDB need to obtain a set of temporary or
permanent access control keys. While these keys could be stored in a configuration file, a best
practice is for applications running on AWS to use IAM instance profiles for Amazon EC2 to
manage credentials. Instance profiles, backed by IAM roles, allow you to avoid storing
sensitive keys in configuration files that must then be secured.

For mobile applications, a best practice is to use a combination of web identity
federation with the AWS Security Token Service (AWS STS) to issue temporary keys that
expire after a short period.

Amazon DynamoDB also provides support for fine-grained access control that can restrict
access to specific items within a table or even specific attributes within an item. For example,
you may want to limit a user to only access his or her items within a table and prevent access
to items associated with a different user. Using conditions in an IAM policy allows you to
restrict which actions a user can perform, on which tables, and to which attributes a user can
read or write.
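A fine-grained access control policy of this kind could be sketched as follows, expressed here as a Python dictionary. The UserData table, account ID, and federated identity variable are hypothetical; dynamodb:LeadingKeys is the condition key the service evaluates against the partition key of each item a request touches.

```python
# Illustrative IAM policy: allow a federated user to operate only on items
# whose partition key equals their own identity.
fine_grained_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Query",
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserData",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Partition key must match the caller's federated user ID.
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            },
        }
    ],
}
```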

Amazon DynamoDB Streams

A common requirement for many applications is to keep track of recent changes and then
perform some kind of processing on the changed records. Amazon DynamoDB Streams
makes it easy to get a list of item modifications for the last 24-hour period. For example, you
might need to calculate metrics on a rolling basis and update a dashboard, or maybe
synchronize two tables or log activity and changes to an audit trail. With Amazon DynamoDB
Streams, these types of applications become easier to build.

Amazon DynamoDB Streams allows you to extend application functionality without
modifying the original application. By reading the log of activity changes from the stream,
you can build new integrations or support new reporting requirements that weren’t part of
the original design.

Each item change is buffered in a time-ordered sequence or stream that can be read by other
applications. Changes are logged to the stream in near real-time and allow you to respond
quickly or chain together a sequence of events based on a modification.

Streams can be enabled or disabled for an Amazon DynamoDB table using the AWS
Management Console, Command Line Interface (CLI), or SDK. A stream consists of stream
records. Each stream record represents a single data modification in the Amazon DynamoDB
table to which the stream belongs. Each stream record is assigned a sequence number,
reflecting the order in which the record was published to the stream.

Stream records are organized into groups, also referred to as shards. Each shard acts as a
container for multiple stream records and contains information on accessing and iterating
through the records. Shards live for a maximum of 24 hours and, with fluctuating load levels,
could be split one or more times before they are eventually closed.
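In outline, a consumer that synchronizes a replica table could replay stream records in sequence-number order; the record shape below is a simplified stand-in for the real stream record structure, using plain strings as keys.

```python
def apply_stream_records(records):
    """Replay stream records in sequence-number order to rebuild item state.

    Each record is a simplified dict: {"SequenceNumber": str,
    "EventName": "INSERT" | "MODIFY" | "REMOVE", "Key": str,
    and, for inserts and modifies, "NewImage": dict}.
    """
    replica = {}
    for record in sorted(records, key=lambda r: int(r["SequenceNumber"])):
        if record["EventName"] == "REMOVE":
            replica.pop(record["Key"], None)
        else:  # INSERT and MODIFY carry the item's new image
            replica[record["Key"]] = record["NewImage"]
    return replica
```

Ordering by sequence number matters: applying a REMOVE before the INSERT it follows would leave the replica with a stale item.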

To build an application that reads from a shard, it is recommended to use the
Amazon DynamoDB Streams Kinesis Adapter. The Kinesis Client Library (KCL)
simplifies the application logic required to process reading records from streams and
shards.