Creating cron jobs in Node.js: a real-life example using BambooHR

Do you need to run some kind of process every X hours? Are you wondering how to create scheduled jobs in Node? If that’s why you are here, then this post is for you.

This time, I’m going to write about node-cron, an NPM package used to schedule tasks that run at times defined by cron expressions. Let’s start with some basics:

What’s a cron expression?

A cron expression is a string made up of subexpressions that describe the details of the schedule you want to create. The subexpressions are separated by white space, and each one accepts a limited set of values. The expression is read from left to right, and it can contain from 5 to 7 subexpressions (fields from now on).

The library we selected works with cron expressions of 5 or 6 fields, read in this order:

  • The first field sets the seconds. It is optional and is only used when the cron expression has 6 fields. It accepts values from 0 to 59, or the wildcard ( * ).

  • The second field sets the minutes. It accepts values from 0 to 59, or the wildcard ( * ).

  • The third field sets the hours. It accepts values from 0 to 23, or the wildcard ( * ).

  • The fourth field sets the day of the month. It accepts values from 1 to 31, or the wildcard ( * ).

  • The fifth field sets the month. It accepts values from 1 to 12, the names of the months, or the wildcard ( * ).

  • The sixth and last field sets the day of the week. It accepts values from 0 to 7 (both 0 and 7 mean Sunday), the name of each day, or the wildcard ( * ).
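
To make the field order concrete, here is a small sketch that labels each field of an expression. The helper labelCronFields is my own illustration and is not part of node-cron; it only splits the string and checks the field count.

```typescript
// Field names in the order described in the list above.
const FIELD_NAMES = ["second", "minute", "hour", "dayOfMonth", "month", "dayOfWeek"] as const;

type CronFieldName = (typeof FIELD_NAMES)[number];
type CronFields = Partial<Record<CronFieldName, string>>;

// Hypothetical helper (not a node-cron API): labels each field of an expression.
function labelCronFields(expression: string): CronFields {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5 && parts.length !== 6) {
    throw new Error(`Expected 5 or 6 fields, got ${parts.length}`);
  }
  // With 5 fields the optional seconds field is absent, so labels start at "minute".
  const names = FIELD_NAMES.slice(FIELD_NAMES.length - parts.length);
  const fields: CronFields = {};
  names.forEach((name, index) => {
    fields[name] = parts[index];
  });
  return fields;
}

console.log(labelCronFields("0 1 * * *"));    // 5 fields: no seconds
console.log(labelCronFields("30 0 1 * * *")); // 6 fields: seconds come first
```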

Besides the accepted values, each field can use special operators that allow for more complex schedules, for example:

  • Run at the 10th and 20th minute of every hour: 10,20 * * * *

  • Run every 2 hours, on the hour: 0 */2 * * *

  • Run every Sunday at midnight: 0 0 * * Sunday
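
If you want to see how these operators behave, the following is a simplified sketch of matching a single numeric field against a value. It is my own illustration, not node-cron’s parser: the function fieldMatches is hypothetical, and it only covers the wildcard, lists, ranges and steps (no month or day names).

```typescript
// Simplified matcher for one numeric cron field. Illustration only:
// it is not node-cron's implementation and ignores month/day names.
function fieldMatches(field: string, value: number, min = 0): boolean {
  return field.split(",").some((part) => {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : Number(stepText);
    if (!Number.isInteger(step) || step < 1) {
      throw new Error(`Invalid step in "${part}"`);
    }

    let start: number;
    let end: number;
    if (range === "*") {
      // The wildcard covers every value, starting at the field's minimum.
      start = min;
      end = Number.POSITIVE_INFINITY;
    } else if (range.includes("-")) {
      const [low, high] = range.split("-").map(Number);
      start = low;
      end = high;
    } else {
      start = Number(range);
      end = start;
    }

    // A value matches when it is inside the range and lands on a step boundary.
    return value >= start && value <= end && (value - start) % step === 0;
  });
}

console.log(fieldMatches("10,20", 20)); // minute list
console.log(fieldMatches("*/2", 4));    // every second hour
console.log(fieldMatches("1-5", 6, 1)); // outside the range
```

Notice that a step such as */2 only restricts its own field. The other fields still need explicit values, otherwise the job fires on every matching minute.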

About node-cron

As mentioned before, we are using node-cron, an NPM package with more than 50,000 weekly downloads that is, at the time of this post, on version 2.0.3. On GitHub, it has 713 stars, 10 contributors and 20 releases since the first one in February 2016.

Since we are going to work in TypeScript, I also suggest installing the types package for node-cron. You can install it by running:

npm install @types/node-cron --save-dev

Setting up node-cron

Creating a scheduled task with node-cron is really easy, and the basic examples in the package documentation explain it well. Here is one of them:

const cron = require('node-cron');

cron.schedule('0 1 * * *', () => {
  console.log('Running a job at 01:00 at America/Sao_Paulo timezone');
}, {
  scheduled: true,
  timezone: "America/Sao_Paulo"
});

However, this example falls short in a more real-life scenario, like retrieving information from a data source, manipulating it and then inserting it into another database. This is typical of the work involved in synchronizing two systems, and it is exactly what we are going to build today.

Our use case will be to retrieve information from a system called BambooHR (used to manage the employees of a company, salaries, vacations, etc.), compare it with data from another system and then insert, update or delete the differences. Let’s start with the cron job.

The cron job

We are first going to create a class that contains all the logic of the tasks to be run. In our case it is called BambooCron. Here is the code for it:

import { schedule, ScheduleOptions, ScheduledTask } from 'node-cron';
import { parseExpression } from 'cron-parser';
import _ from 'lodash';
import moment from 'moment';
import { BambooService } from '../data-access/bamboo/bamboo.service';
import { UserService } from '../api/services/user.service';
import { TimeOffService } from '../api/services/timeOff.service';
import { IHumanResourceManagerService } from '../data-access/IHumanResourceManagerService';

export default class BambooCron {
    private options: ScheduleOptions = {
        scheduled: false
    };
    private task: ScheduledTask;
    private bambooService: IHumanResourceManagerService;
    private usersService: UserService;
    private timeOffsService: TimeOffService;

    constructor() {
        this.task = schedule(process.env.CRON_EXPRESSION
            , this.executeCronJob
            , this.options);
    }

    public startJob() {
        this.task.start();
    }

    private executeCronJob = async () => {
        // HH is the 24-hour clock; hh would print 12-hour time without AM/PM.
        const format = 'YYYY-MM-DD HH:mm:ss';
        console.info(`Starting cron job at: ${moment().format(format)}`);

        this.usersService = new UserService();
        this.bambooService = new BambooService();
        this.timeOffsService = new TimeOffService();
        await this.processEmployees();
        await this.processTimeOff();

        const cronDate = parseExpression(process.env.CRON_EXPRESSION).next();
        console.info(`Finished cron job. Next iteration at: ${moment(cronDate.toDate()).format(format)}`);
    }

    private async processEmployees() {
        const employees = await this.bambooService.getEmployees();
        const users = await this.usersService.getAllUser();
        const usersToAdd = _.differenceWith(employees, users, (employee, user) => {
            return employee.id === user.bambooId;
        });
        const usersToDelete = _.differenceWith(users, employees, (user, employee) => {
            return employee.id === user.bambooId;
        });
        // for...of awaits each write; forEach would not wait for async callbacks.
        for (const employee of usersToAdd) {
            await this.usersService.saveUser(employee);
        }
        for (const user of usersToDelete) {
            await this.usersService.removeUser(user);
        }
    }

    private async processTimeOff() {
        const bambooTimeOffs = await this.bambooService.getTimeOffs();
        const dbTimeOffs = await this.timeOffsService.getAllFromProvider('bamboo');
        const users = await this.usersService.getAllUser();
        const timeOffsToAdd = _.differenceWith(bambooTimeOffs, dbTimeOffs, (bambooTimeOff, dbTimeOff) => {
            return bambooTimeOff.id === dbTimeOff.bambooId;
        });
        const timeOffsToDelete = _.differenceWith(dbTimeOffs, bambooTimeOffs, (dbTimeOff, bambooTimeOff) => {
            return bambooTimeOff.id === dbTimeOff.bambooId;
        });
        // for...of awaits each write; forEach would not wait for async callbacks.
        for (const timeOff of timeOffsToAdd) {
            const user = users.find(x => x.bambooId === timeOff.employeeId);
            if (user) {
                await this.timeOffsService.saveTimeOff(timeOff, user.userNm);
            }
        }
        for (const timeOff of timeOffsToDelete) {
            await this.timeOffsService.removeTimeOff(timeOff);
        }
    }
}

Let’s explain this class by sections. First, the constructor is where the task is scheduled. The schedule method, imported from node-cron, receives 3 parameters: the cron expression, which is read from the environment file; the callback that contains the job code; and some scheduler options (in our case, the only option we set is that the task won’t start immediately).
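
Since the cron expression comes from the environment, it can be missing at runtime. Here is a minimal sketch of a guard that fails fast with a clear message. The helper requireSetting is hypothetical and not part of the original service; it takes the environment as a parameter so it can be tried without touching process.env.

```typescript
// Hypothetical helper: returns a required setting or throws a descriptive error.
function requireSetting(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In the real service the first argument would be process.env.
const sampleEnv = { CRON_EXPRESSION: "0 */2 * * *" };
const cronExpression = requireSetting(sampleEnv, "CRON_EXPRESSION");
console.log(`Scheduling with expression: ${cronExpression}`);
```

The returned value is typed as string, so it can be passed to schedule without the compiler complaining about a possibly undefined argument.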

The startJob method is a simple one: since we specified that the job does not start as soon as we schedule it, we need a way to start it programmatically.

Next comes executeCronJob, which is where everything happens, at least from a high level. Here we initialize all the services used to retrieve or insert information, and we print some informational messages to the console, such as the time the task started and when the job will run next.

The next two methods are similar but work on different entities, so the same flow applies to both. The first step is to retrieve all the information needed by calling methods from the services instantiated in executeCronJob. Then, we compare the data using the differenceWith method from lodash (another famous package). Finally, from the resulting arrays, we either add or delete information in the database by calling the services again (no updates are handled in this example).
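
One detail matters in the add and delete steps: every write has to finish before the job reports that it is done. The sketch below shows why the loop style makes a difference. It uses a fake save function with a small delay as a stand-in for the real services, so all names here are illustrative assumptions.

```typescript
type Save = (id: number) => Promise<void>;

// Stand-in for a database write: records the id after a short delay.
function makeFakeSave(sink: number[]): Save {
  return async (id) => {
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    sink.push(id);
  };
}

// forEach ignores the promises returned by its callback,
// so this function resolves before any write has finished.
async function withForEach(ids: number[], save: Save): Promise<void> {
  ids.forEach(async (id) => {
    await save(id);
  });
}

// for...of inside an async function waits for each write in turn.
async function withForOf(ids: number[], save: Save): Promise<void> {
  for (const id of ids) {
    await save(id);
  }
}

async function demo(): Promise<{ afterForEach: number; afterForOf: number }> {
  const first: number[] = [];
  await withForEach([1, 2, 3], makeFakeSave(first));
  const afterForEach = first.length; // 0: the writes are still pending

  const second: number[] = [];
  await withForOf([1, 2, 3], makeFakeSave(second));
  const afterForOf = second.length; // 3: every write was awaited

  return { afterForEach, afterForOf };
}

demo().then((result) => console.log(result));
```

If the writes are independent, awaiting Promise.all over the mapped promises is another option that runs them concurrently.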

A big design improvement

As I’m writing this post, I’m noticing that the methods processEmployees and processTimeOff are, in essence, the same thing. They could be abstracted into a single method that covers both implementations. Feel free to design it differently.
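
As a rough idea of what that abstraction could look like, here is a sketch of a generic comparison helper. It is only one possible design: planSync is a name I made up, and it uses a Set lookup by key instead of lodash’s differenceWith, which gives the same result when records are matched by a single id.

```typescript
interface SyncPlan<S, T> {
  toAdd: S[];    // present in the source system, missing in the target
  toRemove: T[]; // present in the target system, missing in the source
}

// Hypothetical helper: compares source and target records by a shared key.
function planSync<S, T, K>(
  source: S[],
  target: T[],
  sourceKey: (item: S) => K,
  targetKey: (item: T) => K
): SyncPlan<S, T> {
  const sourceKeys = new Set(source.map(sourceKey));
  const targetKeys = new Set(target.map(targetKey));
  return {
    toAdd: source.filter((item) => !targetKeys.has(sourceKey(item))),
    toRemove: target.filter((item) => !sourceKeys.has(targetKey(item))),
  };
}

// Sample data standing in for BambooHR employees and database users.
const sampleEmployees = [{ id: 1 }, { id: 2 }, { id: 3 }];
const sampleUsers = [{ bambooId: 2 }, { bambooId: 4 }];

const plan = planSync(sampleEmployees, sampleUsers, (e) => e.id, (u) => u.bambooId);
console.log(plan); // employees 1 and 3 to add, user 4 to remove
```

With a helper like this, both process methods would reduce to fetching the two lists, calling planSync and then looping over the results.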

The bamboo service

Now, we are going to work on the service that retrieves information from BambooHR.

import fetch from 'node-fetch';
import moment from 'moment';
import { IHumanResourceManagerService } from '../IHumanResourceManagerService';
import { Employee } from './employee';
import { VacationTimeOff } from './vacationTimeOff';

export class BambooService implements IHumanResourceManagerService {
    private bambooHeaders = {
        method: 'GET',
        headers: { 'Accept': 'application/json' }
    };

    private getBaseUrl(endpoint: string): string {
        const key = process.env.bambooKey;
        const baseEndpoint = ':x@api.bamboohr.com/api/gateway.php';
        const subdomain = process.env.bambooSubDomain;
        return `https://${key}${baseEndpoint}/${subdomain}/v1/${endpoint}`;
    }

    public async getEmployees(): Promise<Employee[]> {
        const url: string = this.getBaseUrl('employees/directory');
        try {
            const response = await fetch(url, this.bambooHeaders);
            const directory = await response.json();
            return directory.employees
                .filter(employee => employee.workEmail)
                .map((employee) => {
                    return {
                        ...employee,
                        id: parseInt(employee.id, 10),
                    };
                });
        } catch (error) {
            throw error;
        }
    }

    public async getTimeOffs(): Promise<VacationTimeOff[]> {
        const today = moment();
        const startDate = today.format('YYYY-MM-DD');
        const endDate = today.add(3, 'M').startOf('month').format('YYYY-MM-DD');
        // Backticks are required here so that startDate and endDate are interpolated.
        const url: string = this.getBaseUrl(`time_off/requests/?status=approved&start=${startDate}&end=${endDate}`);
        try {
            const response = await fetch(url, this.bambooHeaders);
            const timesOff = await response.json();
            return timesOff.map((timeOff) => {
                return {
                    ...timeOff,
                    id: parseInt(timeOff.id, 10),
                    employeeId: parseInt(timeOff.employeeId, 10),
                };
            });
        } catch (error) {
            throw error;
        }
    }
}

Again, let’s review this by sections. First, we create some reusable headers and a getBaseUrl method. This method builds the URL used to connect to BambooHR from settings read from an environment file.

Then, we have two methods that get the information from BambooHR, one for the employees and another one for the time offs. Some logic is applied here to limit the information retrieved. For example, for the time offs we only retrieve approved requests that fall between today and the first day of the month three months from now, since anything before that is not needed by our target system.
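
To make the date window concrete, here is a sketch that computes it with the built-in Date object instead of moment. Keep in mind the assumptions: it works in UTC, whereas moment() uses local time, and the function names are my own.

```typescript
// Formats a date as YYYY-MM-DD using its UTC components.
function toIsoDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Returns the given day and the first day of the month `monthsAhead` months later.
function timeOffWindow(today: Date, monthsAhead = 3): { start: string; end: string } {
  // Date.UTC rolls month values past 11 over into the following year.
  const end = new Date(Date.UTC(today.getUTCFullYear(), today.getUTCMonth() + monthsAhead, 1));
  return { start: toIsoDate(today), end: toIsoDate(end) };
}

// Example with a fixed date: 15 November 2019 (months are zero-based).
const range = timeOffWindow(new Date(Date.UTC(2019, 10, 15)));
const query = `time_off/requests/?status=approved&start=${range.start}&end=${range.end}`;
console.log(query);
// time_off/requests/?status=approved&start=2019-11-15&end=2020-02-01
```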

The database services

From the BambooCron class, we also use services that connect to our database. In our system, we are using typeorm (which I have written about previously), an ORM that integrates with MySQL and supports TypeScript out of the box. For this post, I’m only going to show the service that manages users. All of the services follow a similar approach, so you can extrapolate to the rest of the entities.

import { BaseService } from "./base.service";
import { Employee } from "../../data-access/bamboo/employee";
import { User } from "../../data-access/entity/user";


export class UserService extends BaseService {
  public getAllUser = async () => {
    return this.dbContext.users.find({
      where: { statusTxt: 'active' }
    });
  }

  public async saveUser(employee: Employee): Promise<User> {
    let newUser = this.createUser(employee);
    try {
      await this.dbContext.users.insert(newUser);
      return newUser;
    } catch (error) {
      throw error;
    }
  }

  public async removeUser(user: User) {
    try {
      user.statusTxt = <any>{ statusTxt: 'inactive' };
      await this.dbContext.users.save(user);
    } catch (error) {
      throw error;
    }
  }

  private createUser(employee: Employee): User {
    const userNm = employee.workEmail.substring(0, employee.workEmail.indexOf('@'));
    const user: User = this.dbContext.users.create({
      bambooId: employee.id,
      email: employee.workEmail,
      fullNm: employee.displayName,
      userNm: userNm,
      statusTxt: <any>{ statusTxt: 'active' }
    });
    return user;
  }
}

The User service is pretty straightforward. It has some CRUD operations like getting active users, saving new users and removing them (a soft delete that changes the status). It extends a BaseService class, which looks like this:

import { DbContext } from "../../data-access/dbcontext";

export class BaseService {
  protected dbContext: DbContext = new DbContext();
}

This one is even easier, since it only exposes a property called dbContext. This property is available to every service that inherits from BaseService, and it gives them the ability to use typeorm connections to execute queries or transactions against the database. Finally, this is what the DbContext class looks like:

import { Connection, createConnection, EntityManager, Repository } from "typeorm";
import { User } from "./entity/user";

export class DbContext {
    private connection: Connection;
    constructor (){
        this.init();
    }

    private async init(){
        try {
            this.connection = await createConnection({
                "name": `connection-${new Date().getTime()}`,
                "type": "mysql",
                "host": ANY_HOST_HERE,
                "port": 3306,
                "username": ANY_USERNAME_HERE,
                "password": ANY_PASSWORD_HERE,
                "database": ANY_DATABASE_HERE,
                "synchronize": false,
                "logging": true,
                "entities": [
                    User
                ]
            });
        } catch (error) {
            throw error;
        }
    }
    
    public get manager() : EntityManager {
        return this.connection.manager;
    }

    public get users(): Repository<User>{
        return this.manager.getRepository(User);
    }
}

The DbContext class is a reduced version of the one I use. Mine has more entities, but the rest of the design is the same. First, we have an init method that creates a connection every time the DbContext is instantiated. This connection receives all the entities and the database information needed to create it.

Then, for every entity that typeorm maps, we expose a getter property that returns its repository.

Finally, where do we execute all of this code? Since it needs to start as soon as the Node service starts, we add the code to the index.ts file of our Express.js server, like this:
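
One thing to keep in mind with this design is that a constructor cannot await init, so a getter could be read before the connection exists. A possible guard is to keep the connection promise and await it on first use. The sketch below shows the idea with a fake connect function standing in for typeorm’s createConnection; LazyDbContext and fakeConnect are hypothetical names, not part of the real code.

```typescript
// Stand-in for a typeorm Connection; only what the sketch needs.
interface FakeConnection {
  name: string;
}

// Stand-in for createConnection: resolves after a short delay.
async function fakeConnect(name: string): Promise<FakeConnection> {
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  return { name };
}

class LazyDbContext {
  public connectionsOpened = 0;
  private connection?: Promise<FakeConnection>;

  // Starts connecting on first use and reuses the same promise afterwards,
  // so concurrent callers share one connection instead of racing.
  public getConnection(): Promise<FakeConnection> {
    if (!this.connection) {
      this.connectionsOpened += 1;
      this.connection = fakeConnect(`connection-${this.connectionsOpened}`);
    }
    return this.connection;
  }
}

async function demoLazy(): Promise<{ first: string; second: string; opened: number }> {
  const context = new LazyDbContext();
  const [a, b] = await Promise.all([context.getConnection(), context.getConnection()]);
  return { first: a.name, second: b.name, opened: context.connectionsOpened };
}

demoLazy().then((result) => console.log(result));
```

With this shape, a repository accessor would become an async method that awaits getConnection before asking for the repository.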

...IMPORTS AND OTHER STUFF HERE

const cron = new BambooCron();
...SOME LOGIC HERE TO PREPARE THE SERVICE OR OTHER THINGS
cron.startJob();

const port = parseInt(process.env.PORT);
export default new Server()
  .router(routes)
  .listen(port);

Summary

We have arrived at the end, and if you are still here, it means that you have all the code needed to run a scheduled task using node-cron and typeorm. This is just one of the many use cases that this design can cover, so please adapt it as you see fit to whatever case you have to solve.

If you have any comments, don't hesitate to contact me or leave a comment below. And remember to follow me on Twitter to get updates on every new post.