How to build serverless cursor-based pagination service on the AWS Cloud

בתאריך 22 נובמבר, 2019

We provide a guide on how to build an API with AWS Lambda and API Gateway to paginate data using the Serverless framework.

How to build serverless cursor-based pagination service on the AWS Cloud

In this port, I’ll be guiding you on how to build an API with AWS Lambda and API Gateway to paginate data using the Serverless framework.

Technologies stack

  • AWS Lambda for running code without provisioning or managing servers. You can utilize free-tier to run your service with zero cost.
  • AWS API Gateway to create, publish, and secure Rest APIs.
  • Serverless framework to build and deploy your backend services on the cloud, you can build and deploy a serverless function in minutes.

Prerequisites


Cursor-based Pagination

Have you ever heard this concept, it just is simple a pagination technique but it does not work the same with the classic pagination. Yes, the classic pagination that I think you will be familiar with, it works based on the offset-limit concept. An offset is a number that specifies the number of rows to skip before starting to return rows from the query. For simplicity, you want to see the data on page 3 with page size is 10, just skip 20 rows and return the next 10 rows as the result.

Offset-limit is one of the most commonly used techniques for data paging. However, there are several drawbacks of using this approach:

  • With a database that has large datasets, the offset clause makes slower querying as the number of records increases because the database still read up all records before skipping with offset records and return it.
  • In datasets that change frequently, the paged data becomes unreliable and possibility returns duplicate results.

Cursor-based pagination (aka keyset pagination) works based on a cursor that is a unique identifier for a specific record in datasets. We will deal with the cursor as the pointer of the next record that we will receive on the next page. And on subsequent requests, the server returns the cursor until there is no record on the next page.

There are a couple of approaches to implement cursor-based pagination, below are some interesting articles that I think you should take a look to get more details:

Advantages:

  • Prevent many of the drawbacks of classic pagination that I have mentioned above.
  • Scale well with the large datasets.

Disadvantages:

  • The end-user cannot see the total number of items in the list.
  • The end-user cannot jump to a specific page or back to the previous page.

With the above benefits, I chose cursor-based pagination for this implementation.

In this demo, I will show you how to implement cursor-based pagination using a unique and sequential column of the table (i.e. auto-increment column). Yeah, it is pretty simple and easy to understand.


Serverless

The Serverless Framework helps you develop and deploy your AWS Lambda functions, along with the AWS infrastructure resources they require. It’s a CLI that offers structure, automation and best practices out-of-the-box, allowing you to focus on building sophisticated, event-driven, serverless architectures, comprised of Functions and Events.

There are ways to create and deploy an AWS Lambda function by using AWS Cloudformation or AWS Console. But I chose serverless because of its convenience, automation and the ability to increase development time.

One most of my favorite Serverless feature is the plugin, some of the plugins I recommend to use in this article. In this guide, I will use serverless-plugin-typescript and serverless-offline for building Serverless applications.

With the serverless application, we must define a template file for the definition of your AWS resources, configurations, list of plugins, etc.

service: pagination-serverless-api
frameworkVersion: '>=1.1.0 <2.0.0'

provider:
  name: aws
  runtime: nodejs10.x
  memorySize: 512 # optional, in MB, default is 1024
  timeout: 10 # optional, in seconds, default is 6
  versionFunctions: false # optional, default is true
  region: us-east-2

custom:
  cors:
    origin: '*'
    headers:
      - Cache-Control
      - Content-Type
      - Authorization
      - Accept
      - Accept-Encoding
      - X-Amz-Date
      - X-Api-Key
      - X-Amz-Security-Token
      - X-Amz-User-Agent
    allowCredentials: false

functions:
  person:
    handler: functions/person/index.handler
    events:
      - http:
          path: persons
          method: get
          cors: ${self:custom.cors}

plugins:
  - serverless-plugin-typescript
  - serverless-offline

In the functions section, we will link to the file that contains code for reading data with pagination. Before implementing code for that function, we have to create a database and datasets for testing purposes (should have more 50 items in the datasets). I will use the MySQL database and you can choose others such as MSSQL or PostgreSQL, etc for creating the database and preparing datasets.

Continue, to connect to the database we can install this package, it is an awesome Node.js module for communicating with the MySQL database. However, the AWS Lambda function scales by creating separate instances for each concurrent request (you can see here for more details). And this is a serious problem for the RDBS database like MySQL when we can get the maximum connection error anytime with high concurrent requests.

Luckily, Jeremy Daly has built a package that is wrapped for MySQL to solve the above problem. We implement code for database helper as below:

import { URL } from 'url';
import * as serverlessMysql from 'serverless-mysql';

// Replace below with your database connection string
const dbConnectionString = process.env.DB_CON_STR || 'jdbc://root:root@localhost:3306/example_db';

const dbConfig = new URL(dbConnectionString);

// Require and initialize outside of your main handler
export const mysql: serverlessMysql.ServerlessMysql = serverlessMysql({
  config: {
    host: dbConfig.hostname,
    database: dbConfig.pathname.replace('/', ''),
    user: decodeURIComponent(dbConfig.username),
    password: decodeURIComponent(dbConfig.password)
  }
});

We create a file to build a query for reading data support cursor-based pagination:

import { select } from 'squel';

import { decode } from './base64';

/**
 * Build query string to reading data from database support cursor-based pagination
 */
export const buildCursorPaginationQuery = (options: { table: string; limit: number; cursor: string }): string => {
  let cursorId;
  let selectQuery = select().from(options.table);

  if (options.cursor) {
    try {
      let decodedCursor = decode(options.cursor);

      cursorId = decodedCursor.split(':')[1];
    } catch {
      throw new Error('Invalid cursor');
    }

    selectQuery = selectQuery.where(`id >= ${cursorId}`);
  }

  selectQuery = selectQuery.limit(+(options.limit || 50) + 1);
  selectQuery = selectQuery.order('id', true);

  return selectQuery.toString();
};

As you see above code, I use the squel to build the query string, you also manual to build the raw query. But I recommend you to use the squel by its benefits like:

  • Supports parameterized queries for safe value escaping.
  • Supports Non-standard SQL.
  • Uses method chaining for ease of use.

Finally, implement code for the function as below:

import { Handler, Context } from 'aws-lambda';

import { mysql, getQueryParams, encode, buildCursorPaginationQuery } from '../../helpers';

export const handler: Handler = async (event: any, context: Context) => {
  const query: any = getQueryParams(event);
  const pagingQuery: string = buildCursorPaginationQuery({
    table: 'person',
    limit: query.limit,
    cursor: query.cursor
  });

  console.log(pagingQuery);

  const results: any[] = (await mysql.query(pagingQuery)) || [];
  // Calculate next cursor
  let nextCursor = '';

  if (results.length > query.limit) {
    const lastEle = results.pop();

    nextCursor = encode(`person:${lastEle.id}`);
  }

  const response = {
    success: true,
    data: results,
    response_metadata: {
      next_cursor: nextCursor
    }
  };

  // Run clean up function
  await mysql.end();

  return {
    statusCode: 200,
    body: JSON.stringify(response)
  };
};

The API we develop accepts two params:

  • limit: a number value to limit records that return from the server.
  • cursor: a base64 string that is returned from the previous request. It indicates the starting point to retrieve data for the next page.

For testing, I will run the following command to start API on local:

sls offline

Let’s explore our work. Using the Postman, we can easier invoke API

Test API with Postman

Deploy

Deployment is much easier with the Serverless framework and can be done with one single command. So let’s deploy and see results by running below command:

sls deploy --stage [your-stage] --region [your-region]

Feel free to check out your deployment and try to invoke API Gateway on the AWS and compare results. I hope it works for you. If not, please let me know.

The source code is available on our Github


Conclusion

I hope this guide will be helpful to you and bring to you some ideas to help you build your services with cost-optimized, secured, flexible, high availability.

In this guide, I only write a simple code to show how to build and deploy. You are free to extend this project with some additional features as below:

  • Supports more param options such as fieldssort by …
  • Implementing classic pagination (offset-limit pagination).
  • Implementing the error handler for handling such as InvalidCursorError or UnexpectedError.
  • Uses other databases such as Amazon DynamoDB is a great option.
מאמרים נוספים...