Cloud computing is playing an increasingly key role in organizational IT strategies. Amazon, Microsoft, IBM, Google and others continue to improve their offerings, making a compelling case for using cloud-based resources. As providers continue to offer more and more services, new opportunities are presenting themselves. In this article, we’ll look at some of the products offered by Amazon Web Services (AWS) and how they can be used to introduce big data analytics capabilities for NonStop applications. In a follow-up article, we’ll take a detailed look at how this solution can be integrated with NonStop applications.
To begin from a common starting point, let’s define cloud computing and its advantages.
What is Cloud Computing?
Amazon defines Cloud Computing as “the on-demand delivery of compute power, database storage, applications, and other IT resources through a cloud services platform via the internet with pay-as-you-go pricing”. A quick Google search will turn up many definitions of Cloud Computing, but they are essentially the same, with the key concepts being:
- On-demand delivery of servers, storage, databases, networking, software, analytics, and more over the Internet.
- Pay-as-you-go pricing.
- Accessing computer services over the Internet instead of from your computer or company network.
- Accessing services that are managed for you by someone else.
There are three main Cloud Computing service models, each representing a different level of control. They are: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
Infrastructure as a Service involves renting individual computing resources hosted by someone else, such as storage, network endpoints and virtual machines with a selection of operating systems. You are responsible for assembling and maintaining these components as well as deploying and maintaining your software. Examples of this are Microsoft Azure, Amazon Web Services, Google Cloud Platform and IBM Cloud.
Platform as a Service typically involves renting access to pre-configured virtual machines hosted by someone else; you provide the software that runs in the cloud and they provide the operating system and virtual hardware to host that software. Examples of this are Heroku, OpenShift and any Docker-based applications.
Software as a Service involves renting access to a piece of software that someone else hosts on the Internet; you pay for access to the service and they worry about setting up, providing and maintaining that service. Examples of this are Salesforce and Dropbox.
Advantages of Cloud Computing
Advantages of Cloud Computing include:
- Scale services up or down to meet needs
- Stop guessing capacity
- Reduced Cost
- Only pay for what you use (i.e. “pay as you go”)
- No major up-front capital expenditure
- Best security practices followed
- Deploy the infrastructure you need almost instantly
- Get applications to market quickly without worrying about underlying infrastructure costs or maintenance
- Strategic Value
- Access to latest, most innovative technology
- Replication can reduce the impact of unexpected outages of your software
- Most cloud offerings allow geographically dispersed deployments so that your software can be located close to your markets
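To make the "stop guessing capacity" and "only pay for what you use" points concrete, here is a small arithmetic sketch. The demand figures and the price are hypothetical, chosen only for illustration, and the sketch assumes the same price per server-hour in both models, which real provider pricing may not match.

```python
def provisioned_cost(hourly_demand, rate_per_server_hour):
    """Cost when capacity is fixed at the peak level for every hour."""
    return max(hourly_demand) * len(hourly_demand) * rate_per_server_hour


def pay_as_you_go_cost(hourly_demand, rate_per_server_hour):
    """Cost when each hour is billed only for the servers actually used."""
    return sum(hourly_demand) * rate_per_server_hour


# Hypothetical demand: number of servers needed in each of 8 hours.
demand = [2, 2, 3, 10, 10, 4, 2, 2]
rate = 1  # hypothetical price units per server-hour

fixed = provisioned_cost(demand, rate)        # sized for the peak of 10 servers
on_demand = pay_as_you_go_cost(demand, rate)  # follows the demand curve
print(fixed, on_demand)
```

With a spiky workload like this one, sizing for the peak means paying for idle servers most of the time; the flatter the demand, the smaller the difference becomes.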
Cloud Computing and Big Data Analytics
Big data analytics is the examination of large volumes of data to detect patterns, trends, customer preferences and other information that can help organizations increase revenue, provide better customer service, improve operational efficiency and more. Typically, this type of analysis is not done against transactional databases, which are designed for high-performance, transaction-based processing. Instead, data is moved to storage areas designed specifically for querying and analysis. Data lakes, data warehouses and data marts are examples of such storage areas.
A data lake is a storage area that holds large volumes of data in its original format. No processing or formatting of the data is done before it is loaded. The thinking behind storing data this way is that you never know how this data will be used in the future. Storing it in its raw format keeps all possibilities open. While a data lake stores data in its raw, often unstructured format, a data warehouse stores data in a structured format. Data warehouses are modeled for high-performance querying and reporting. Data marts are subsets of a data warehouse geared to a specific functional area. Data lakes and data warehouses / data marts are sometimes considered mutually exclusive approaches to big data analytics; however, this doesn’t have to be the case. A data lake is an excellent way to source data for use by multiple data warehouses and data marts to meet both immediate and future analytic requirements.
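The relationship between the two can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records and field names are hypothetical, a plain list stands in for the data lake, and an in-memory SQLite database stands in for a data warehouse.

```python
import json
import sqlite3

# "Data lake": raw records kept exactly as received (hypothetical samples).
# Note the second record carries an extra field; the lake does not care.
raw_lake = [
    '{"txn_id": 1, "account": "A-100", "amount": "25.00", "channel": "ATM"}',
    '{"txn_id": 2, "account": "A-200", "amount": "40.50", "channel": "POS", "note": "tip"}',
    '{"txn_id": 3, "account": "A-100", "amount": "10.00", "channel": "POS"}',
]


def load_warehouse(raw_records):
    """Parse raw records and load the fields of interest into a typed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, account TEXT, "
        "amount_cents INTEGER, channel TEXT)"
    )
    for line in raw_records:
        rec = json.loads(line)
        cents = round(float(rec["amount"]) * 100)  # store money as integer cents
        conn.execute(
            "INSERT INTO txn VALUES (?, ?, ?, ?)",
            (rec["txn_id"], rec["account"], cents, rec["channel"]),
        )
    conn.commit()
    return conn


def totals_by_channel(conn):
    """A typical warehouse-style query: aggregate amounts per channel."""
    rows = conn.execute(
        "SELECT channel, SUM(amount_cents) FROM txn GROUP BY channel ORDER BY channel"
    )
    return dict(rows.fetchall())


warehouse = load_warehouse(raw_lake)
print(totals_by_channel(warehouse))
```

The raw records stay untouched in the lake, so a different warehouse or data mart could later be built from the same source with a different schema, for example one that also uses the "note" field.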
Costs for storage, processing power, software, etc. can make implementing a data analytics solution an expensive undertaking. This, plus the fact that being scalable is a key requirement as data volumes grow, makes cloud computing a great option for big data analytics.
AWS and NonStop
Transactional data captured by NonStop applications can be a key source for business analytics; however, the NonStop platform may not be ideal for storing and analyzing the information over a long period of time. Developing and hosting an in-house data analytics solution can be a challenging – and expensive – option. An attractive alternative is AWS, which provides all the cloud-based services needed to easily extend a NonStop application with a scalable, flexible and cost-effective infrastructure. But how do you get the information from the NonStop to AWS, which AWS services do you use, and how do you use them?
The diagram below shows just one possible approach for integrating data from the NonStop platform with AWS.
The AWS services in the above diagram can be split into three categories: Collection, Storage and Analysis.
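For collection, AWS’ Direct Connect service, which provides a dedicated network connection between on-premises systems and AWS, can be used to connect NonStop application data to AWS.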
AWS provides several storage services. The example above uses AWS’ Simple Storage Service (S3) to hold data in its raw form, thus providing an excellent data lake implementation. A concern with data lake implementations is that they can turn into “data swamps.” This term describes a situation where the stored data cannot be easily queried or used, which can happen when data is simply placed in a data lake without any information about its context (date, source, identifiers, etc.). AWS’ data lake solution addresses this by storing data in packages and tagging each package with metadata. You can define the metadata you need for your packages to keep them organized. Amazon Elasticsearch Service and DynamoDB are used for storing and retrieving these packages.
Redshift is a data warehouse service where data can be stored for sophisticated querying and analysis. Data can be loaded from the data lake into one or more data warehouses.
Lambda is AWS’ serverless function environment. It can be used to develop event-driven code that receives data from the NonStop, loads it to S3 and stores metadata in DynamoDB.
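As a rough illustration of the Lambda step described above, the following Python sketch stores one batch of records in S3 and writes a matching metadata item to DynamoDB. It is a minimal sketch, not the solution's actual code: the bucket name, table name, metadata fields and the shape of the incoming event are all hypothetical, and the only AWS calls used are boto3's `put_object` on an S3 client and `put_item` on a DynamoDB table resource.

```python
import hashlib
import json
from datetime import datetime, timezone

BUCKET = "example-nonstop-data-lake"       # hypothetical bucket name
METADATA_TABLE = "example-package-metadata"  # hypothetical table name


def build_package(payload: bytes, source: str, received_at: datetime):
    """Return the S3 object key and the metadata item describing one package."""
    digest = hashlib.sha256(payload).hexdigest()
    key = f"raw/{source}/{received_at:%Y/%m/%d}/{digest}.json"
    item = {
        "package_id": digest,               # content hash as an identifier
        "source": source,                   # which application sent the data
        "received_at": received_at.isoformat(),
        "s3_key": key,                      # where the raw data lives
        "size_bytes": len(payload),
    }
    return key, item


def store_package(payload, source, s3_client, metadata_table, received_at=None):
    """Write the raw payload to S3 and its context to the metadata table."""
    received_at = received_at or datetime.now(timezone.utc)
    key, item = build_package(payload, source, received_at)
    s3_client.put_object(Bucket=BUCKET, Key=key, Body=payload)
    metadata_table.put_item(Item=item)
    return item


def lambda_handler(event, context):
    """Lambda entry point; assumes a hypothetical event with 'source' and 'records'."""
    import boto3  # imported here so the helpers above can be tested without AWS

    s3_client = boto3.client("s3")
    table = boto3.resource("dynamodb").Table(METADATA_TABLE)
    payload = json.dumps(event["records"]).encode("utf-8")
    item = store_package(payload, event["source"], s3_client, table)
    return {"package_id": item["package_id"]}
```

Passing the S3 client and the table into `store_package` keeps the tagging logic separate from the AWS wiring, so it can be exercised with simple stand-in objects before anything is deployed. Recording the date, source and identifier alongside every object is what keeps the lake from becoming a swamp.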
AWS provides many analysis services. In the above example, Amazon QuickSight is used.
Integrating AWS with NonStop can provide a scalable, flexible and cost-effective platform for big data analytics. In our next article we’ll discuss the steps involved in more detail.
John Russell joined Canam Software Labs in 1998 as a member of their consulting services team. He has worked on software projects as developer, support analyst, architect and project manager. In 2007, he took on the role of Product Manager and is responsible for Canam Software’s suite of products including XML Thunder for NonStop.