In part two of this series I discussed creating a serverless data collection and processing fabric for an IoT deployment. To recap, we’ve now reviewed the local devices and controller/gateway pattern for the security cameras deployed. We’ve also discussed the Amazon Web Services infrastructure deployed to collect, process and catalog the data generated by the security cameras.
In this post we will cover the creation of a serverless REST API.
In part one of this series I briefly discussed the purpose of the application to be built and reviewed the IoT local controller & gateway pattern I’ve deployed. To recap, I have a series of IP cameras deployed and configured to send (via FTP) images and videos to a central controller (RaspberryPI 3 Model B). The controller processes those files as they arrive and pushes them to Amazon S3. The code for the controller process can be found on GitHub.
In this post we will move on to the serverless processing of the videos when they arrive in S3.
… but vendors will fight you every step of the way.
Remember – every vendor wants to get as much of your data in their database as possible. And every data approach (Relational, Document, Key/Value Store and Graph) can be forced to do just about anything; and given the opportunity the vendor will tell you their solution is the right choice.
The challenge for the modern Enterprise Data Architect is maintaining a cogent point of view about assembling a polyglot solution to make each use case (or micro service) less complex, more scalable and easier improve/enrich over time.
However, practical illustrations of patterns and implementations are exceptionally hard to find. This series of posts will attempt to close that gap by providing both the purpose, design and implementation of a complete serverless application on Amazon Web Services.
Part 1 – The setup…
Every application needs a reason to exist – so before we dive into the patterns and implementation we should first discuss the purpose of the application.
Nest wants how much for cloud storage and playback?
I have 14 security cameras deployed, each captures video and still images when motion is detected. These videos and images are stored on premises – but getting them to “the cloud” is a must have – after all if someone breaks in and takes the drive they are stored on all the evidence is gone.
If I were to swap all of the cameras out for Nest cameras cloud storage and playback would cost $2250/year – clearly this can be done cheaper… so…
Since the late-2000’s there has been an explosion of non-relational (NoSQL if you must) data persistence technology. The industry buzz has focused around the derivatives of the seminal work done at Google – i.e., BigTable – and Amazon – i.e., Dynamo.
We’ve seen massive adoption of simple document stores and key-value based stores – which focus on availability and partition tolerance and thereby enable storage and processing of schema-less (or semi-structured) data at velocities and volumes previously considered entirely impractical.
There were – however – compromises. These systems are abysmal at dealing with connections between the data – or more precisely connecting the entities in the data sets with one another in a variety of contexts.
Many of our platforms, systems, applications and services are intended to deal with these types of connections – and unfortunately most engineering teams fall back to relational databases to solve these problems. The problem with this approach, however, is that relational databases are inherently inefficient when performing complex set operations:
The true value of the graph approach becomes evident when one performs searches that are more than one level deep. For instance, consider a search for users who have “subscribers” (a table linking users to other users) in the “311” area code. In this case a relational database has to first look for all the users with an area code in “311”, then look in the subscribers table for any of those users, and then finally look in the users table to retrieve the matching users. In comparison, a graph database would look for all the users in “311”, then follow the back-links through the subscriber relationship to find the subscriber users. This avoids several searches, lookups and the memory involved in holding all of the temporary data from multiple records needed to construct the output. Technically, this sort of lookup is completed in O(log(n)) + O(1) time, that is, roughly relative to the logarithm of the size of the data. In comparison, the relational version would be multiple O(log(n)) lookups plus additional time to join all the data.
The opportunity to build highly efficient systems by leveraging natural graph models (those that exist in the real world) is massive and dramatically under utilized.
Imagine a CRM system which wasn’t tightly coupled to a rigid account, contact, entitlement hierarchy model. Imagine an student roster system which directly modeled the complexity of sections for teachers, schools, districts and states without a myriad of join tables.
The only barrier is investing in education required to understand the efficiencies of graphs and the graph databases available today. I highly recommend you up your game by downloading Neo4j today and beginning to learn the advantages graph databases can provide in your polyglot persistence architectures, and if you need any help let me know – I’m happy to help you and your team model your data and leverage a graph to simply your system and increase your feature velocity.