Amazon Web Services (AWS) – BI


 

IT organizations are usually structured into verticals / Strategic Business Units (SBUs), grouped by geography, technology, industry and so on. Almost inevitably every such organization has a cloud computing capability, and most cloud-based projects / architectures are designed and developed by this capability. This works as long as you are an architect for your own set of projects that deal only with your own technology.

I believe that anyone who intends to grow as an enterprise architect needs to collaborate with SMEs across environments / technologies / platforms, and for that one needs a good understanding of a variety of them.

Why Amazon Web Services (AWS)? AWS is probably the largest cloud player in providing IaaS. Azure and other such platforms have started providing IaaS only recently; their major strength is PaaS, where they provide the technology to build solutions and manage the infrastructure themselves. If one intends to develop solutions that use a very broad mix of technologies, one would have to opt for a strong IaaS cloud environment rather than a PaaS environment.

Below are the parts of the Amazon Web Services world that one might want to keep in consideration while architecting BI solutions on AWS.

1) AWS has two types of clouds: Public / Virtual Private Cloud (VPC)

In the public cloud, servers are under AWS control and can be configured by the user. In a VPC, servers are hosted within AWS but are part of the corporate network. IPs are under the control of the corporate network, and security between the corporate network and the servers hosted on AWS is the responsibility of the corporation.
2) Amazon Simple Storage Service (S3)
It’s an object store where one can store any type of data in huge amounts, and access it using the API provided by Amazon for S3.
  • It’s a highly available service, as it stores copies of data in multiple locations. It can be used as a staging location for migrating data across availability zones when using Elastic Block Store (EBS) disks.
  • When data is stored in S3, its data type (content type) is stored in a metadata tag. When a client accesses the data, it can check this tag to ensure that the data is read accordingly.
  • A single S3 object can be up to 5 TB in size; a single PUT upload is limited to 5 GB, and larger objects have to be uploaded in parts. S3 objects can be accessed through REST / SOAP APIs over HTTP. Third-party tools are available to handle storage management inside S3.
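
The metadata-tag idea above can be sketched with a toy in-memory store. This is only an illustration of the pattern, not the S3 API: `ToyObjectStore`, `read_object` and the key names are hypothetical, and a real client would talk to S3 through the AWS SDK instead.

```python
import json


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, NOT the S3 API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body, content_type="application/octet-stream"):
        # Keep the raw bytes together with a content-type metadata tag.
        self._objects[key] = {"body": body, "content_type": content_type}

    def get(self, key):
        return self._objects[key]


def read_object(store, key):
    """Check the content-type tag first, then decode the bytes accordingly."""
    obj = store.get(key)
    if obj["content_type"] == "application/json":
        return json.loads(obj["body"].decode("utf-8"))
    if obj["content_type"].startswith("text/"):
        return obj["body"].decode("utf-8")
    return obj["body"]  # unknown type: hand back the raw bytes


store = ToyObjectStore()
store.put("staging/sales.json", b'{"rows": 3}', content_type="application/json")
store.put("staging/notes.txt", b"loaded ok", content_type="text/plain")
store.put("staging/blob.bin", b"\x00\x01")

print(read_object(store, "staging/sales.json"))
print(read_object(store, "staging/notes.txt"))
```

For a BI staging area this means an ETL job can pick the right parser per object without guessing from the file name.
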
3) Amazon Elastic Compute Cloud (EC2)

Provides scalable and flexible compute capacity. EC2 provides an interface to manage Amazon Machine Images (AMI, also known as bundles), from which instances are launched. Amazon and third-party providers such as RightScale, IBM and others provide ready-made images for use.

  • Any software installed on an EC2 instance is lost once the instance is “terminated”. EBS-backed images can persist software changes when the instance is stopped (but not terminated). Images backed by the instance store (kept as bundles in S3) can only be run or terminated, not stopped.
  • If you use a SQL Server 2008 R2 AMI, the license cost of SQL Server is included in the cost of running the instance. One cannot use separately purchased licenses to offset the SQL Server license cost of an AWS-provided SQL Server AMI.
  • One can allocate a static IP address to an instance using an AWS “Elastic IP”, and after that one can RDP to the instance using the same IP / DNS every time. Without an Elastic IP, the public IP address of the instance changes every time the instance is stopped and started. Elastic IPs are chargeable.
Billing types for EC2 instances
  • Reserved Instance – Requires reserving the instance for a fixed term. It includes an up-front cost along with usage charges, and for sustained usage works out cheaper than an Unreserved Instance.
  • Unreserved (On-Demand) Instance – Billed on a pay-per-use basis, but more expensive than a Reserved Instance.
  • Spot Instance – A unique type of EC2 instance, which is basically Amazon’s way of selling spare capacity. You set a price and the number of instances you need. When the Spot price falls below the price you set, the instances are allocated to your account. The downside is that once the Spot price rises above the price you set, those instances are taken away.
  • In AWS, you are not billed for data transfer between AWS components in the same region (for example, between S3 and EC2). Data traffic between the instance and the Internet, however, is billable.
  • Various categories of EC2 instances are available, such as Micro, Standard, Cluster Compute, High-Memory Cluster, Cluster GPU, High Memory, High CPU, High Storage, High I/O etc. Most categories also come in several sizes (small, medium, large and so on). The instance type comparison published by AWS is an easy way to decide.
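
The Spot Instance behavior described in the billing list can be sketched as a small simulation. It is a simplified model under stated assumptions: `simulate_spot` and the price series are made up for illustration, prices are sampled once per hour, and an instance is assumed to run whenever the Spot price is at or below the price you set.

```python
def simulate_spot(bid, hourly_spot_prices):
    """Return (timeline, interruptions) for a bid against a Spot price series.

    timeline[i] is True when the instances would be running in hour i.
    Assumption: instances run while spot price <= bid, and are lost as
    soon as the spot price rises above the bid.
    """
    running = False
    interruptions = 0
    timeline = []
    for price in hourly_spot_prices:
        if price <= bid:
            running = True
        else:
            if running:
                interruptions += 1  # we were running and just lost capacity
            running = False
        timeline.append(running)
    return timeline, interruptions


# Hypothetical hourly Spot prices in USD; not real AWS data.
prices = [0.05, 0.03, 0.02, 0.06, 0.02]
timeline, interruptions = simulate_spot(bid=0.04, hourly_spot_prices=prices)
print(timeline, interruptions)
```

The interruption count is the number to watch for BI workloads: batch jobs that can be restarted tolerate it, while an always-on reporting database does not.
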
4) Amazon Elastic Block Store (EBS)
 It’s the storage system / disk where an EC2 instance stores and persists data. An EBS volume is created, configured and managed outside the EC2 instance, not within it. Data stored on an EBS volume can persist even after the EC2 instance has been terminated, as long as the volume is not set to be deleted on termination.
  • At the time of writing, EBS volumes can be 1 GB to 1 TB in size.
  • An EBS volume is available only in the region and availability zone in which it was created. It can be made available in a different zone by creating a snapshot of the volume (stored in S3) and creating a new EBS volume from that snapshot. The volume itself cannot be attached across regions, but a snapshot can be copied to another region and a new volume created from it there.
  • One EC2 instance can have many EBS volumes, but one EBS volume cannot be shared by multiple EC2 instances.

5) Amazon Security Groups

 A security group provides a way to restrict access to EC2 instances by configuring the ports, IPs and servers that can connect to an instance. It acts as a firewall for an EC2 instance.

  • Applying the same security group to several EC2 instances does not make them part of a common group / subnet.
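
The firewall behavior of a security group can be sketched as a default-deny rule check. This is a minimal model, not the AWS API: `IngressRule`, `is_allowed` and the address ranges are hypothetical, and only inbound rules keyed on protocol, port range and source CIDR are modelled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """One inbound allow rule (hypothetical shape, for illustration only)."""
    protocol: str
    port_from: int
    port_to: int
    cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Allow the connection only if some rule matches; otherwise deny."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.port_from <= port <= rule.port_to
                and addr in ipaddress.ip_network(rule.cidr)):
            return True
    return False  # nothing matched: default deny


# Example rule set for a BI server; the ranges are documentation addresses.
bi_server_rules = [
    IngressRule("tcp", 3389, 3389, "203.0.113.0/24"),  # RDP from the office
    IngressRule("tcp", 1433, 1433, "10.0.0.0/16"),     # SQL Server, internal
]

print(is_allowed(bi_server_rules, "tcp", 3389, "203.0.113.25"))
print(is_allowed(bi_server_rules, "tcp", 3389, "198.51.100.7"))
```
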

6) Amazon CloudWatch

 CloudWatch monitoring is of two types in AWS – Basic and Detailed.

  • Basic monitoring is available with every EC2 instance. It collects different performance metrics related to the instance at five-minute intervals.
  • Detailed monitoring collects the same metrics at one-minute intervals for an additional charge. Alerts and notifications can be set up on these metrics.

7) Amazon Elastic Load Balancing (ELB)

 Elastic Load Balancing can be used for two major purposes – load balancing and fault tolerance.

  • As a load balancer, it distributes incoming traffic across different servers.
  • As a failover balancer, it detects a failed / unresponsive / unhealthy EC2 instance and routes traffic to the other instances as required.
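
Both purposes can be seen together in a toy round-robin balancer that skips unhealthy instances. This is a sketch only: `ToyLoadBalancer` and the instance names are hypothetical, round-robin is an assumed distribution policy, and real health checks are replaced by explicit `mark_unhealthy` calls.

```python
class ToyLoadBalancer:
    """Round-robin balancer that skips unhealthy instances (illustrative)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # index of the next instance to try

    def mark_unhealthy(self, instance):
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, trying each instance at most once."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = ToyLoadBalancer(["i-a", "i-b", "i-c"])
before = [lb.route() for _ in range(3)]   # plain load balancing
lb.mark_unhealthy("i-b")                  # simulate a failed health check
after = [lb.route() for _ in range(4)]    # traffic now avoids i-b
print(before, after)
```
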

8) Amazon Relational Database Service (RDS)

 Amazon RDS provides full-featured database services using the MySQL, Oracle and SQL Server database engines.

  • RDS provides fault tolerance / high availability through Multi-AZ deployments. With this option, one RDS instance is created in the availability zone selected by the user, and a second instance is created in an alternative availability zone. Both instances are kept up to date in parallel. The second instance is not visible / available until the first instance becomes unavailable; when that happens, the second instance takes over automatically.
  • An RDS instance can be configured with Read Replicas, which are copies of the RDS instance that can be used for reporting purposes.
  • RDS instances are backed up by default in AWS, and this backup remains available for a limited time. Backups are fully configurable and can be persisted indefinitely too.
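
The Multi-AZ idea (two copies kept up to date in parallel, with the standby taking over on failure) can be sketched as follows. This is a toy model, not how RDS is implemented: `ToyMultiAZ` and the zone names are hypothetical, and replication is reduced to writing the same value to both copies.

```python
class ToyMultiAZ:
    """Two in-memory copies of a database in different zones (illustrative)."""

    def __init__(self, primary_az, standby_az):
        self.replicas = {primary_az: {}, standby_az: {}}
        self.primary_az = primary_az
        self.standby_az = standby_az

    def write(self, key, value):
        # Both copies are updated before the write returns, so the standby
        # is never behind the primary in this simplified model.
        for data in self.replicas.values():
            data[key] = value

    def read(self, key):
        # Clients only ever see the current primary.
        return self.replicas[self.primary_az][key]

    def fail_over(self):
        # The standby becomes the primary; clients keep calling read().
        self.primary_az, self.standby_az = self.standby_az, self.primary_az


db = ToyMultiAZ("zone-a", "zone-b")
db.write("fact_sales_rows", 1200)
db.fail_over()  # simulate the first instance becoming unavailable
print(db.primary_az, db.read("fact_sales_rows"))
```
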

9) Amazon Simple Notification Service (SNS)

 Amazon SNS is a publish-and-subscribe model through which systems or users can generate and/or receive alerts and/or notifications.

  • Delivery methods for alerts / notifications include email, an HTTP-based web service call, and a message via Simple Queue Service (SQS).
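
The publish-and-subscribe model can be sketched with a tiny in-process broker. This only shows the pattern and is not the SNS API: `ToyTopicBroker`, the topic name and the two handlers (standing in for an email subscriber and an SQS queue) are hypothetical.

```python
from collections import defaultdict


class ToyTopicBroker:
    """Minimal publish/subscribe broker (illustrative, in-process only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to every subscriber; return the delivery count."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


email_outbox = []  # stands in for an email subscriber
queue = []         # stands in for a message queue subscriber

broker = ToyTopicBroker()
broker.subscribe("etl-alerts", lambda m: email_outbox.append(f"EMAIL: {m}"))
broker.subscribe("etl-alerts", queue.append)

delivered = broker.publish("etl-alerts", "nightly load failed")
print(delivered, email_outbox, queue)
```

The publisher (for example an ETL job) does not need to know who is listening, so adding another delivery channel is one more `subscribe` call.
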

10) Amazon CloudFront

 It’s the Content Delivery Network of AWS that distributes and caches content at the nearest servers based on user request patterns.

11) Amazon Elastic MapReduce (EMR)

 Amazon EMR provides features to process large amounts of data using Hadoop-based processing combined with other AWS products.

  • EMR also provides an option to run HBase (a column-oriented, distributed NoSQL database) on Hadoop clusters, which enables real-time data access to Hadoop in the cloud.

12) Amazon Identity and Access Management (IAM) and Amazon CloudFormation

 IAM provides the means to control permissions to AWS resources, and CloudFormation the means to manage AWS resources as a system. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) management service that can be used with AWS IAM to manage domains with faster performance.

 

Annual Amazon Web Services (AWS) Summit in Stockholm on May 27, 2014

 

This free, one-day event is a great opportunity to hear about the latest AWS services, learn best practices from AWS engineers, gain new skills and get your questions answered by our AWS experts.
This year’s summit will be bigger, with more breakout sessions (separated into business and technical tracks), presentation content you can vote to unlock on the day and a greater number of hands-on labs.

 

Event Details
Date: 27 May 2014
Location: Stockholm Waterfront Congress Centre.

 

Get Hands-on Experience! Visit the AWS Hands-on Labs (HOL) area and get practical hands-on experience with our self-paced lab sessions. The HOLs are free of charge, our AWS experts will be on hand to help and dedicated computers will be provided.

 


Amazon EC2, Rackspace, Salesforce and GoGrid are some of the well-known cloud providers, and Microsoft Azure is probably the newest kid on the block. Cloud is gaining popularity day by day, and businesses as well as solution providers want to move their existing applications to the cloud or base their future applications on it. But there are a few factors that should be considered when evaluating which cloud provider suits your needs. Below is a brief list of such requirements, aligned with BI / MS BI needs.
1) Platform migration without architectural changes: When applications that are already developed need to be migrated to the cloud, changing the architecture of the application because of limitations or constraints of the cloud vendor is out of the question. For this requirement the cloud vendor should provide IaaS and not just PaaS services. The reason I mention PaaS and not SaaS is that SaaS makes sense for future requirements but not for existing applications. Amazon EC2 provides a major set of services to cater to this requirement. Unfortunately, to the best of my knowledge, Microsoft Azure is not up to the mark to date for this requirement.
2) Support for private cloud: There might be some very sensitive business logic in WCF (Windows Communication Foundation) services or similar interfaces that one might not want to expose on the cloud. So the cloud vendor should also support a private cloud, like Amazon VPC or the Windows Azure Appliance solution.
3) Software licensing: Many cloud providers do not facilitate the use of corporate licenses that enterprises have already procured, and many software vendors do not license their products for use on the cloud. Licensing needs to be settled with the software vendor as well as the cloud vendor, so that software licenses can be easily used / reused in the cloud environment.
4) PaaS support for operations related to data: In a typical BI project involving ETL, one can expect more than one database and different forms of staging (in different file formats). The cloud vendor should support the multiple relational DBs and storage formats required for data storage. For example, SQL Azure and Project Houston provide a nice platform for data storage, design and operations, and Windows Azure storage provides three different kinds of storage formats. But if both are required to operate in the same environment, where I need to pull data from SQL Azure and store it in a file format provided by Windows Azure (say BLOB storage), the two are not on the same platform. If your project has such dependencies, this should be taken care of before choosing your cloud vendor.
5) Ease of backup and restore operations: Most cloud vendors flush out any data as soon as you stop paying for the instance, i.e. stop using it. For permanent data, separate dedicated storage needs to be purchased. Backing up and restoring such an instance can be a big concern. Enterprises might also want portability for such instances, creating images on the cloud and using them for multiple projects or solutions. Amazon Machine Images are one example aligned with this requirement.
Mehboob
Microsoft Certified Solutions Associate (MCSA)
