Use Athena To Process Data From DynamoDB

Have you ever wished you could run SQL-like queries against DynamoDB?

I have worked at companies that always used Microsoft's SQL Server as their database, so it's easy for me to use functions like COUNT and AVG to calculate metrics and other indicators. However, now that I've been using Dynamo for some projects, I miss that.

This week, I found Athena (not the Greek goddess), a tool provided by Amazon to query big data stored in S3 using standard SQL. You don't need to configure any infrastructure to use it; it runs serverless and executes queries directly against the data source in S3. It supports a variety of data formats like CSV and JSON, among others (today, we will be doing some examples using JSON).

But then, how do I use it to query data from Dynamo? Well, Athena still does not support querying Dynamo directly, but we can mimic this functionality by using AWS Glue.

AWS Glue is another tool that allows developers to create ETL jobs that can perform many tasks, and it's completely integrated with Athena. AWS Glue uses components called crawlers that create schemas from the data sources they analyze; for example, a crawler pointed at a Dynamo table will enumerate all the columns the table can have and a probable type for the data each one contains.

So combining everything, we can do the following:

  1. Create a crawler that reads the Dynamo table
  2. Create a Glue job that reads the information and stores it in an S3 bucket
  3. Create a new crawler that reads the S3 bucket
  4. Use Athena to query information using the crawler created in the previous step
  5. Expose information using API Gateway and Lambda functions
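
As a quick sketch of how steps 1 and 3 could be automated, here is how the two crawlers might be created with the AWS SDK for JavaScript instead of the console (the crawler names, role, database and bucket are placeholder values of mine, not something prescribed by AWS):

const AWS = require('aws-sdk');
const glue = new AWS.Glue({region: 'us-east-2'});

// Step 1: a crawler that reads the Dynamo table and infers its schema.
glue.createCrawler({
    Name: 'dynamo-crawler',
    Role: '<GLUE SERVICE ROLE ARN>',
    DatabaseName: 'metrics_catalog',
    Targets: {DynamoDBTargets: [{Path: 'caseview_metrics'}]}
}, function (err) {
    if (err) console.log(err);
});

// Step 3: a crawler that reads the JSON files the Glue job wrote to S3.
glue.createCrawler({
    Name: 's3-crawler',
    Role: '<GLUE SERVICE ROLE ARN>',
    DatabaseName: 'metrics_catalog',
    Targets: {S3Targets: [{Path: 's3://<BUCKET>/metrics/'}]}
}, function (err) {
    if (err) console.log(err);
});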

Here you have a visual representation of those steps:

And here is an example of the queries that you can create:

SELECT a.openCount, b.openWACount, c.solvedYTD
FROM (
   SELECT COUNT(*) AS openCount
   FROM caseview_metrics
   WHERE status='open'
) a
CROSS JOIN (
   SELECT COUNT(*) AS openWACount
   FROM caseview_metrics
   WHERE status='open' AND CARDINALITY(abductorsids) >= 1
) b
CROSS JOIN (
   SELECT COUNT(*) AS solvedYTD
   FROM caseview_metrics
   WHERE status='close' AND closeddate BETWEEN 1514764800000 AND 1546300799000
) c

In this example, you can see functions such as COUNT and CARDINALITY, expressions like BETWEEN and other comparisons, and finally JOIN clauses like CROSS JOIN.
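
For completeness, these queries don't have to be run from the console; a minimal sketch of launching one programmatically with the AWS SDK for JavaScript looks like this (the database name and results bucket are placeholders from my setup, adjust them to yours):

const AWS = require('aws-sdk');
const athena = new AWS.Athena({region: 'us-east-2'});

athena.startQueryExecution({
    QueryString: "SELECT COUNT(*) AS openCount FROM caseview_metrics WHERE status='open'",
    QueryExecutionContext: {Database: 'metrics_catalog'},
    ResultConfiguration: {OutputLocation: 's3://<BUCKET>/athena-results/'}
}, function (err, data) {
    if (err) console.log(err);
    // The query runs asynchronously; poll getQueryResults with this ID
    // to retrieve the rows once it finishes.
    else console.log('Query started: ' + data.QueryExecutionId);
});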

Summary

AWS Athena is a tool you can add to your toolkit, and if you are using Dynamo, it can greatly enhance the experience for your users, as well as facilitate development.

If you have any comment, don't hesitate to contact me or leave a comment below. And remember to follow me on Twitter to get updates on every new post.

How to provide temporary access to S3 Buckets?

There are times when we store assets like images or videos in S3 buckets to be displayed on our websites. But what happens when we want to secure those assets so that only authenticated users can see them?

Well, there are many ways to provide security. One of the most common is using the "Referer" header, but it can be spoofed, so we lose the security we wanted. Another is using CloudFront to create signed URLs, but that requires a lot of development work, and a last option is to use API Gateway to return binary data. After analyzing all these options, I determined that none of them provided the security we needed nor satisfied all of our use cases. Finally, I came up with another solution using a little bit of all the approaches mentioned before.

In order to secure the S3 folder, we are going to use signed URLs to provide temporary access to the bucket where the assets are hosted. To create the signed URLs, we use two Lambda functions: the first runs under an IAM role that can create the signed URLs, and the second is an authorizer for the first that verifies that the user making the request has the proper credentials. Here is a diagram of how the security flow for the S3 bucket works:

S3 Security Architecture

The first step to accomplish this is to remove the public policy that the bucket has; we want the bucket to be as closed as possible.
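
If you prefer to script it, a minimal sketch with the AWS SDK for JavaScript could look like this (using the same bucket name as the resolver function below; note that the public access block call is only available in recent SDK versions):

const AWS = require('aws-sdk');
const s3 = new AWS.S3({region: 'us-east-2'});

// Remove any existing public bucket policy.
s3.deleteBucketPolicy({Bucket: 'owi-trainer-assets'}, function (err) {
    if (err) console.log(err);
});

// Additionally, block every form of public access to the bucket.
s3.putPublicAccessBlock({
    Bucket: 'owi-trainer-assets',
    PublicAccessBlockConfiguration: {
        BlockPublicAcls: true,
        IgnorePublicAcls: true,
        BlockPublicPolicy: true,
        RestrictPublicBuckets: true
    }
}, function (err) {
    if (err) console.log(err);
});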

The second step will be to create a Lambda function that generates the signed URLs. For that, we need to create a function called resolver and type the code provided below:

const AWS = require('aws-sdk');

exports.handler = (event, context, callback) => {
    AWS.config.update({
        region: "us-east-2"
    });

    const s3 = new AWS.S3({signatureVersion: 'v4', signatureCache: false});
    // The key of the requested asset comes in the query string.
    var key = event["queryStringParameters"]["key"];
    s3.getSignedUrl('getObject', {
        Bucket: "owi-trainer-assets",
        Key: key,
        Expires: 7200 // the URL is valid for two hours
    }, function(error, data) {
        if (error) {
            callback(error);
        } else {
            // Redirect the client to the signed URL.
            var response = {
                statusCode: 301,
                headers: {
                    "Location": data
                },
                body: null
            };
            callback(null, response);
        }
    });
};

The getSignedUrl function from the SDK receives three parameters: the name of the operation the URL will allow, an object containing the configuration (the bucket, the key of the object in the bucket, and the expiration time in seconds), and lastly, the callback that will be executed once the URL is generated. As you can see, we return a 301 status code in the response to force the client to redirect the request to the generated URL.

The third step is to create an API Gateway endpoint that works as a proxy to the Lambda function. The only important aspect here is to grab the ID of the API endpoint, because we will need it for the next step. The ID can be obtained from the UI when the endpoint is created; in the next image, the text highlighted in yellow is the ID we need.

Gateway ID

The fourth step is to create the validator Lambda function that will verify that the client requesting an asset is a valid client. For that, follow these steps:

  1. The validator function requires two NPM packages that are not provided by default in the Lambda ecosystem, so we will need to upload a zip file that contains all the necessary libraries.
  2. To accomplish that, create a folder named validator and navigate to it in a command window. In there, type "npm init" to create a package.json file and install these two packages:
    1. aws-auth-policy: contains the AuthPolicy class that is required for a Gateway authorizer to perform actions.
    2. jsonwebtoken: this library is going to be used to validate the JWT tokens sent in the query string from the client.
  3. Inside of the validator folder created before, add an index.js file that will contain the logic to validate the tokens. The code will be provided below.
  4. Finally, create a Lambda function named validator and upload the folder as a zip file.

var jwt = require('jsonwebtoken');
var AuthPolicy = require("aws-auth-policy");

exports.handler = (event, context) => {
    // Verify the JWT sent by the client in the query string.
    jwt.verify(event.queryStringParameters.token, "<SECRET TOKEN TO AUTHENTICATE JWT>",
    function(err, decoded){
        if(err) {
            console.log(err);
            context.fail("Unable to load encryption key");
        }
        else{
            console.log("Decoded: " + JSON.stringify(decoded));

            // Build a policy that lets the caller invoke GET on any
            // resource of this API.
            var policy = new AuthPolicy(decoded.sub, "<AWS-ACCOUNT-ID>", {
                region: "<REGION>",
                restApiId: "<API GATEWAY ID>",
                stage: "<STAGE>"
            });
            policy.allowMethod(AuthPolicy.HttpVerb.GET, "*");

            context.succeed(policy.build());
        }
    });
};

Finally, the fifth and last step is to add the authorizer to the API Gateway. For that, go to the Authorizers section of the Gateway you created, click on "Create New Authorizer", and configure it as shown below:

Authorizer Configuration
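
If you prefer scripting over the console, a rough equivalent with the AWS SDK for JavaScript could look like this (all identifiers are placeholders; I'm assuming a REQUEST-type authorizer since our validator reads the token from the query string):

const AWS = require('aws-sdk');
const apigateway = new AWS.APIGateway({region: 'us-east-2'});

apigateway.createAuthorizer({
    restApiId: '<API GATEWAY ID>',
    name: 'validator-authorizer',
    // REQUEST authorizers can read the token from the query string.
    type: 'REQUEST',
    identitySource: 'method.request.querystring.token',
    authorizerUri: 'arn:aws:apigateway:us-east-2:lambda:path/2015-03-31/functions/<VALIDATOR LAMBDA ARN>/invocations'
}, function (err, data) {
    if (err) console.log(err);
    else console.log('Authorizer created: ' + data.id);
});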

As you can see, the token is sent as part of the query string; other options are to send the token as a header or a stage variable.
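
To close the loop, here is a hypothetical client-side call to the secured endpoint (the resource path "asset" and the variable jwtToken are assumptions for the example; the token would come from your login flow):

// fetch follows the 301 redirect to the signed S3 URL automatically,
// so the response body is the asset itself.
const url = 'https://<API GATEWAY ID>.execute-api.us-east-2.amazonaws.com/<STAGE>/asset'
    + '?key=images/logo.png&token=' + jwtToken;

fetch(url)
    .then(response => response.blob())
    .then(blob => {
        // Display the protected asset without ever exposing the bucket.
        document.querySelector('img').src = URL.createObjectURL(blob);
    });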

If you have any comment, don't hesitate to contact me or leave a comment below. And remember to follow me on @cannyengineer to get updates on every new post.

Uploading files to AWS S3 Buckets

Well... I said that I was going to step away a little bit from AWS, but it seems like it wasn't that easy. Today, I'm writing about uploading files to S3 using a Node.js backend service.

This will be a short tutorial divided into two parts: first the front-end development and then the backend work. So let's start.

Web site

I'm using an Angular application created with the Angular CLI and Bootstrap as the front-end framework to design the website; however, in this tutorial I'm not going to focus on how to set all of this up. For UI notifications, we are using ngx-toastr (if you don't know about it, look at my review here).

To create the file upload component and give it some styles, I used the following code:

<div class="custom-file" style="width: auto;">
  <input type="file" class="custom-file-input" 
         accept="application/pdf"
         (change)="upload($event.target.files, fileInput)" 
         id="customPreReadFile" #fileInput/>
  <label class="custom-file-label" for="customPreReadFile">{{getFileName()}}</label>
</div>

As you can see, we are allowing only PDF files, but this restriction can be disabled or modified to meet your needs.

In the component code, we created two methods: the "upload" method called on the change event, and "getFileName" to display an instruction text or the name of the file if one was already selected. The code for both methods is as follows:

upload(files: FileList, fileInput: any) {
  if (files[0].type.indexOf("pdf") === -1) {
    this.toastr.error("The file selected is not a PDF.", "Error");
    fileInput.value = "";
    return;
  }
  // Keep a reference to the selected file so getFileName can display it.
  this.file = files[0];
  this.toastr.info("Uploading file...");
  this.uploadService.uploadFile(files[0], this.identifier).subscribe(data => {
    this.toastr.success("File has been uploaded.", "Success");
  });
}

getFileName(): string {
  return this.file ? this.file.name : 'Upload File';
}

The service method is the one that prepares the file to be sent to the Node.js service, as follows:

uploadFile(file: File, id: string): Observable<any> {
  const formData: FormData = new FormData();
  // Note the backticks: we want string interpolation, not a literal '${id}'.
  formData.append("file", file, `${id}/${file.name}`);
  return this.httpClient.post(environment.apiEndPoint
    + '/admin/upload/'
    + id, formData);
}

Node.js Service

Having configured all the required parts in the front-end code, we need to adapt our Node.js service to receive the file. The service uses Express to configure the REST API, but we also use a package called formidable to easily process the form data sent from the Angular application. As in the Web site section, I'm not focusing on how to set up the Node service, but rather on the exact code to process the file upload.

Before digging into the code, I'll explain a little bit about what formidable does. In short, formidable parses the content of the form sent in the request and saves it to a temporary local location; from there, we can grab the file and do whatever logic we want with it.

The Express endpoint code looks like this:

var IncomingForm = require('formidable').IncomingForm;
var fs = require('fs');
// The S3Uploader helper is defined later in this post; adjust the path to
// wherever you place it.
var S3Uploader = require('../helpers/s3Uploader');

router.post('/admin/upload/:id', function (req, res) {
    var id = req.params.id;
    var s3Uploader = new S3Uploader(req);
    var form = new IncomingForm();
    var fileName = "";
    var buffer = null;
    // Fired once formidable has parsed a file out of the form data.
    form.on('file', (field, file) => {
        fileName = file.name;
        buffer = fs.readFileSync(file.path);
    });
    // Fired when the whole form has been processed; send the file to S3.
    form.on('end', () => {
        s3Uploader.uploadFile(fileName, buffer).then(fileData => {
          res.json({
            successful: true,
            fileData
          });
        }).catch(err => {
            console.log(err);
            res.sendStatus(500);
        });
    });
    form.parse(req);
});

Before moving on to uploading the file to S3, let's explain what we are doing here. After importing the necessary dependencies, inside the request handler we are doing multiple things:

  1. Creating an instance of an "S3Uploader" helper to send the files to S3.
  2. Configuring the "IncomingForm" instance from formidable.
    1. Defining an event handler, fired when formidable processes a file, that retrieves the file name and creates a buffer that we will send to the S3 service.
    2. Defining an event handler, fired when the whole form has been processed, that calls the upload file method in the S3 helper.
  3. Calling the parse method from Formidable to start the whole process.

The "S3Uploader" object has the following code:

var AWS = require('aws-sdk');

function S3Uploader(request) {
  // The Cognito JWT comes in a custom header sent by the Angular app.
  var jwtToken = request ? request.headers.cognitoauthorization : null;
  let credentials = {
    IdentityPoolId: "<IDENTITY POOL ID>",
    Logins: {}
  };
  credentials.Logins['cognito-idp.<COGNITO REGION>.amazonaws.com/<USER POOL ID>'] = jwtToken;

  AWS.config.update({
    credentials: new AWS.CognitoIdentityCredentials(credentials, {
      region: "<COGNITO REGION>"
    }),
    region: "<S3 BUCKET REGION>"
  });

  let s3 = new AWS.S3();
  function uploadFile(key, file) {
    var s3Config = {
      Bucket: "<BUCKET NAME>",
      Key: key,
      Body: file
    };
    return new Promise((resolve, reject) => {
      s3.putObject(s3Config, (err, resp) => {
        if (err) {
          console.log(err);
          reject({success: false, data: err});
        } else {
          resolve({success: true, data: resp});
        }
      });
    });
  }

  // Expose uploadFile on the instances created with "new S3Uploader(req)".
  this.uploadFile = uploadFile;
}

module.exports = S3Uploader;

If the first part, about configuring the AWS SDK to use the proper credentials, is not clear, I invite you to read my post on how to manage credentials properly using Cognito, or even an older post where I explain how to use Cognito and Federated Identities to create users with roles that can access AWS resources.

In short, what we are doing is retrieving the authentication token generated by Cognito when the user logs in, so that we can configure the AWS SDK to use the permissions of that user.

After all that, we just need to instantiate an object to use the S3 APIs and send the data to the bucket.

If you have any comment, don't hesitate to contact me or leave a comment below. And remember to follow me on @cannyengineer to get updates on every new post.

Proper credentials management in AWS Cognito

A couple of days ago, I wrote about how to combine Cognito, Angular and Node.js, focusing primarily on how to authenticate users using those technologies. However, a real application has way more use cases and features than just a simple authentication process.

AWS provides many services such as databases, file storage, and Lambda functions, among others. To connect to those services, Amazon puts at our disposal an SDK compressed in an NPM package called AWS-SDK (sounds logical, right?).

Today, I wanted to write about how to use Cognito to authorize users to use these services, instead of creating an IAM user and using the access keys and secret directly in the code. That is a bad practice: you are effectively shipping the credentials to your AWS account inside your code, and if another person finds them, they can do some pretty bad things to your account.

In order to apply security to an application using Cognito, we will need to configure the following resources:

  1. Cognito User pool
  2. Cognito Federated Identities
  3. Code in your Node.js service

The first step is actually something I wrote about two weeks ago: configuring the user pool was something we had to do in order to authenticate users, so for this post I will skip that step, but you can read all about it here. So, let's start from step 2.

Cognito Federated Identities

Objective: create a federated identity pool that will provide users with credentials to connect to Amazon Web Services.

On the Amazon console, go to the Security section and click on Cognito. On the screen that appears, click on "Manage Federated Identities" and then "Create new identity pool". The first step is to type the name of the identity pool; also, mark the checkbox to allow unauthenticated identities.

Federated Identity - Step 1

Next, open the Authentication providers section and in the Cognito tab, type the user pool id and app client id created in the previous section.

Federated Identity - Step 2

After that, click on "Create pool" and it will prompt you to create the roles to be used for the authenticated and unauthenticated users. Type the role names that best suit your needs; we will update these roles later to match the policies that we need.

Click on Allow, and it will create the roles and redirect you to the dashboard screen for the identity pool.
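
For reference, the same identity pool could be created with the AWS SDK for JavaScript instead of the console; the pool name and provider values below are placeholders:

const AWS = require('aws-sdk');
const cognitoidentity = new AWS.CognitoIdentity({region: 'us-east-2'});

cognitoidentity.createIdentityPool({
    IdentityPoolName: 'TrainerIdentityPool',
    AllowUnauthenticatedIdentities: true,
    CognitoIdentityProviders: [{
        ProviderName: 'cognito-idp.us-east-2.amazonaws.com/<USER POOL ID>',
        ClientId: '<APP CLIENT ID>'
    }]
}, function (err, data) {
    if (err) console.log(err);
    // Unlike the console flow, the SDK does not create the two roles for
    // you; attach them afterwards with setIdentityPoolRoles.
    else console.log('Identity pool created: ' + data.IdentityPoolId);
});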

Next, we will go to IAM and modify the roles we created so that they grant the proper permissions to access our resources. On the Amazon console, go to the Security section and click on IAM. On the screen that appears, click Roles in the left pane and search for the roles that you want to update.

For the unauthenticated users, the policy could reflect the following configuration:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "mobileanalytics:PutEvents",
                "cognito-sync:*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNTID:table/TABLENAME"
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:UpdateItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:ACCOUNTID:table/TABLENAME",
            ]
        }
    ]
}

The policy above is just an example of how it could look; remember that it will differ depending on how you need each role to behave, so it can have more or fewer privileges than this example.

For the authenticated users, the policy could reflect the following configuration:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "cognito-idp:ListUsersInGroup",
                "cognito-idp:DescribeUserPool",
                "cognito-idp:AdminEnableUser",
                "cognito-idp:SignUp",
                "cognito-idp:AdminDisableUser",
                "cognito-idp:ChangePassword",
                "cognito-idp:AdminAddUserToGroup",
                "cognito-idp:AdminUpdateUserAttributes",
                "cognito-idp:AdminConfirmSignUp"
            ],
            "Resource": "arn:aws:cognito-idp:us-east-2:ACCOUNTID:userpool/USERPOOLID"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "cognito-identity:*",
            "Resource": "arn:aws:cognito-identity:us-east-2:ACCOUNTID:identitypool/us-east-2:IDENTITYPOOLID"
        },
        {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "cognito-sync:*",
            "Resource": [
                "arn:aws:cognito-sync:us-east-2:ACCOUNTID:identitypool/IDENTITYPOOLID",
                "arn:aws:cognito-sync:us-east-2:ACCOUNTID:identitypool/IDENTITYPOOLID/identity/*",
                "arn:aws:cognito-sync:*:*:identitypool/*/identity/*/dataset/*"
            ]
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "dynamodb:PutItem",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNTID:table/TABLENAME"
        },
        {
            "Sid": "VisualEditor4",
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNTID:table/TABLENAME"
        }
    ]
}

The authenticated users will have the ability to manage Cognito users and perform CRUD operations on the Dynamo tables. I have created some custom pages to manage the Cognito users, which is why I'm giving authenticated users this policy. I ran into many problems when letting only administrators manage these users, and that's why I ended up creating those screens. For example, when an admin creates a user, the user stays in a status named FORCE_CHANGE_PASSWORD, meaning the user must somehow change the password through an AWS-SDK command, so I had to create a screen for that. Resetting a password also required the AWS-SDK, and after a few more use cases like these, it made much more sense to just create the pages and stop fighting Amazon. :)
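
For illustration, this is roughly the AWS SDK sequence one of those screens runs to take a user out of the FORCE_CHANGE_PASSWORD status (pool, client, user and passwords are placeholders, and the app client must have the ADMIN_NO_SRP_AUTH flow enabled):

const AWS = require('aws-sdk');
const cognito = new AWS.CognitoIdentityServiceProvider({region: 'us-east-2'});

// Log the user in with the temporary password assigned by the admin.
cognito.adminInitiateAuth({
    UserPoolId: '<USER POOL ID>',
    ClientId: '<APP CLIENT ID>',
    AuthFlow: 'ADMIN_NO_SRP_AUTH',
    AuthParameters: {USERNAME: '<USERNAME>', PASSWORD: '<TEMP PASSWORD>'}
}, function (err, data) {
    if (err) return console.log(err);
    // Cognito answers with a NEW_PASSWORD_REQUIRED challenge; responding
    // with the definitive password takes the user out of that status.
    cognito.adminRespondToAuthChallenge({
        UserPoolId: '<USER POOL ID>',
        ClientId: '<APP CLIENT ID>',
        ChallengeName: 'NEW_PASSWORD_REQUIRED',
        ChallengeResponses: {USERNAME: '<USERNAME>', NEW_PASSWORD: '<NEW PASSWORD>'},
        Session: data.Session
    }, function (err2) {
        if (err2) console.log(err2);
    });
});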

Code in the Node.js service

Objective: modify the Node.js service to use authentication based on Cognito tokens.

Lastly, this will be the fastest step. The trainer application will already have this applied, but just to be sure, validate that the AWS SDK configuration is being done in the following way:

function awsDynamo(configuration) {
  var docClient;
  var credentials = {
    IdentityPoolId: configuration.identityPoolId,
    Logins: {}
  };
  // Only attach the Cognito login if the user has been authenticated;
  // otherwise the unauthenticated role from the identity pool is used.
  if (configuration.cognitoJwt) {
    credentials.Logins[`cognito-idp.${configuration.cognitoRegion}.amazonaws.com/${configuration.userPoolId}`] = configuration.cognitoJwt;
  }
  AWS.config.update({
    credentials: new AWS.CognitoIdentityCredentials(credentials, {
      region: configuration.cognitoRegion
    }),
    region: configuration.dynamoRegion
  });
  docClient = new AWS.DynamoDB.DocumentClient();
  /* More code */
}

In this block of code, we are sending a configuration object that has the identity pool ID, the user pool ID, the JWT token generated by Cognito (when the user has been authenticated), and the regions for the Cognito identity pool and the Dynamo tables.
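
For example, a route handler could build that configuration from the incoming request like this (a sketch; the header name matches the one used by the S3Uploader in my previous post, and the identifiers are placeholders):

var dynamo = awsDynamo({
    identityPoolId: '<IDENTITY POOL ID>',
    userPoolId: '<USER POOL ID>',
    // JWT generated by Cognito when the user logged in.
    cognitoJwt: req.headers.cognitoauthorization,
    cognitoRegion: 'us-east-2',
    dynamoRegion: 'us-east-1'
});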

After all these steps, the service should connect correctly; however, on one of two occasions I had to do one more piece of configuration. Apparently, in IAM we need to allow a trust relationship between the roles and the identity pool, something I never found in any documentation provided by Amazon, so I had to dig in other places.

To be honest, this is something for which I need to give credit to a guy called Alex Hague; I found the solution here. In case the link is not working, this was his solution, and when applied, it worked perfectly.

Check that the role you have assigned in Cognito Identity Pools (Federated Identities) has a trust relationship with the identity pool. First, get the identity pool ID and the name of the role that isn't working. To do this, follow these steps:

  1. Go to Cognito
  2. Select Manage Federated Identities
  3. Select the identity pool
  4. Click Edit identity pool (top right)
  5. Make a note of the identity pool ID
  6. Make a note of the name of the role that isn't working (e.g. Cognito_blahUnauth_Role)

In IAM, check the trust relationship for the role and ensure that the StringEquals condition value matches the identity pool ID (an example trust policy is shown after these steps). To do this:

  1. Go to IAM
  2. Click Roles
  3. Click the name of the role that you noted previously
  4. Click Trust relationships
  5. On the right, under Conditions, check that the StringEquals condition contains the identity pool ID that you noted previously.
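
For reference, a healthy trust policy for the authenticated role looks roughly like this (the identity pool ID is a placeholder; the unauthenticated role uses "unauthenticated" in the amr condition):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "cognito-identity.amazonaws.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "cognito-identity.amazonaws.com:aud": "us-east-2:IDENTITYPOOLID"
                },
                "ForAnyValue:StringLike": {
                    "cognito-identity.amazonaws.com:amr": "authenticated"
                }
            }
        }
    ]
}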

Finally, once you have fixed the trust relationship, the service should authenticate and authorize any user that works with the application.

Thanks for reading up to here. I hope it works for you, and don't hesitate to leave a comment!

Using Cognito, Angular and Node.js together (3/3)

Application configuration

Objective: Configure a Node.js service using Express.js to authenticate Cognito tokens.

As a prerequisite, we need one NPM package, so please install the one listed below:

  • cognito-express [1]: cognito-express authenticates API requests by verifying the JSON Web Token signatures generated by Amazon Cognito.

Configure application back-end

To secure the back-end application, we need to set up a middleware method that will be applied to all routes where it's needed, and a configuration file that returns variables depending on the environment.

The configuration file returns three variables: the user pool identifier, the application client identifier, and the region where those two resources are located in AWS. The code for it looks like this:

module.exports = (function (env) {
    switch (env) {
        case 'prod':
            return {
                userPoolId: 'us-east-2_XYZXYZXYS',
                clientId: 'ABCDEFGHIJKLMNOPQ123',
                region: 'us-east-2'
            };
        case 'uat':
            return {
                userPoolId: 'us-east-2_XYZXYZXYS',
                clientId: 'ABCDEFGHIJKLMNOPQ123',
                region: 'us-east-2'
            };
        default:
            return {
                userPoolId: 'us-east-2_XYZXYZXYS',
                clientId: 'ABCDEFGHIJKLMNOPQ123',
                region: 'us-east-2'
            };
    }
})(process.env.NODE_ENV);

The middleware method uses the configuration file and the cognito-express package installed before. This method will be executed on all routes to validate that the token provided is valid according to Cognito. In case the Cognito user pool can't validate the token, the middleware returns a 401 status code for the request. The code for this method looks like this:

const CognitoExpress = require("cognito-express");
const awsConfig = require('../helpers/awsConfig');

const cognitoExpress = new CognitoExpress({
    region: awsConfig.region,
    cognitoUserPoolId: awsConfig.userPoolId,
    tokenUse: "id",
    tokenExpiration: 3600000
});

function validateAdmin(req, res, next) {
    let accessTokenFromClient = req.headers.authorization;
 
    if (!accessTokenFromClient) return res.status(401).send("Access Token missing from header");

    cognitoExpress.validate(accessTokenFromClient, function (err, response) {
        if (err) return res.status(401).send(err);
        res.locals.user = response;
        next();
    });
}

module.exports = validateAdmin;

Finally, we apply the middleware method to the routes using the "use" method from the Express router:

var express = require('express');
var router = express.Router();
var cognitoValidator = require('../helpers/cognitoValidator');

var dynamo = require('../helpers/dataService');
router.use(cognitoValidator);

router.get('/'
    , function (req, res) {
        dynamo.getData().then(data => {
            res.json(data);
        }).catch(err => {
            console.log(err);
            res.sendStatus(500);
        });
    });

module.exports = router;

Summary

After finishing all 3 parts of this tutorial, you should have completed the configuration to use Cognito in an Angular application with a Node.js backend service.