Use Athena To Process Data From DynamoDB

Have you ever thought that it would be so good to have SQL-like queries in DynamoDB?

I worked in companies that have always used Microsoft's SQL Server as database, and it's so easy for me to do things like COUNT, AVG, among others to calculate metrics and other indicators. However, now that I've been using Dynamo for some projects, I miss that.

This week, I found Athena (not the Greek Goddess), a tool provided by Amazon to query big data information stored in S3 using standard SQL queries. To use it, you don't need to configure any infrastructure, it runs serverless and executes the queries directly into the datasource in S3. It supports a variety of data formats like CSV, JSON, among others (today, we will be doing some examples using JSON).

But then, how do I use it to query from Dynamo? well, Athena still does not support queries directly into Dynamo, but we can mimic this functionality by using AWS Glue.

AWS Glue, is another tool that allows developers to create ETL jobs that can perform many tasks, and it's completely integrated with Athena. AWS Glue uses something called Crawlers that create schemas from the datasources that are analyzed, so for example, creating a crawler from a dynamo table, will enumerate all the columns that the table can have and a possible type of the data it contains.

So combining everything, we can do the following:

  1. Create a crawler that reads the Dynamo table
  2. Create a Glue job that reads the information and stores it in an S3 bucket
  3. Create a new crawler that reads the S3 bucket
  4. Use Athena to query information using the crawler created in the previous step.
  5. Expose information using API Gateway and Lambda functions

Here you have a visual representation of those steps:

And here is an example of the queries that you can create:

SELECT a.openCount, b.openWACount, c.solvedYTD
   SELECT COUNT(*) AS openCount
   FROM caseview_metrics
   WHERE status='open'
) a
   SELECT COUNT(*) AS openWACount
   FROM caseview_metrics
   WHERE status='open' AND CARDINALITY(abductorsids) >= 1
) b
   SELECT COUNT(*) as solvedYTD
   FROM caseview_metrics
   WHERE status='close' AND closeddate BETWEEN 1514764800000 AND 1546300799000
) c

In this example, you can see functions as COUNT or CARDINALITY, expressions like BETWEEN or comparisons, and finally you can also apply JOIN clauses like the CROSS JOIN.


AWS Athena is a tool that you can add to your toolkit, and if you are using Dynamo, it can enhance greatly the experience for the users, as well as facilitate development. 

If you have any comment, don't hesitate in contacting me or leaving a comment below. And remember to follow me on twitter to get updated on every new post.