Module 5 Storage and Databases

Instance Stores and Amazon Elastic Block Store (Amazon EBS)

  • Block Level Storage

    • to store files (bytes stored on disk)

    • When data change only change that section

    • Hard drive

    • File systems, databases use it

  • Instance STore Volumes

    • storage that can be provided by EC2 instance

    • Physical attached to host that EC2 is running on

    • If EC2 is terminated, the data on the ISV is deleted

      • As the EC2 might start up on another host

    • Useful for temp data, scratch files, data that can be easily recreated without consequence

    • Not good for persisting data outside of lifecycle of EC2

  • Amazon Elastic Block Store (EBS)

    • Create virtual Hard drives (volumes) that are attached to the EC2

    • Not tied to the host

    • persist data outside of lifecycle of EC2

    • Can configure the type you want (size, type) and attach it the EC2

    • Can take incremental back ups of data (snapshots)

      • configurable

      • only the blocks of data that have chagned are saved

    • Up to 16TB

    • SSD by default, but HDD options

Amazon Simple Storage Service (Amazon S3)

  • Data to be stored somewhere

  • Store and retrieve unlimited amount of data

  • Data stored as objects

    • each object contains data, metadata and key

      • metadate = info about data, how its used, object size

      • key = unique id of object

    • When object is updated, the whole object is modified

    • Stored in buckets

    • Max size of object =5Tb

    • Used for write once read many

    • Each object has a url

    • Can version objects (keep history of object, can rollback if deleted)

    • Create permissions (visibility, write access) for multiple buckets

    • Different tiers/classes

  • Storage classes

    • Standard

      • 11 9s of durability (remain intact for one year)

      • for frequent access

      • Multiple copies are stored in 3 availability zones

      • Content distributino

      • data analytics

      • Static web hosting

        • Load all static files (html etc) to S3 and check box to host it as site

    • Standard Infrequent Access (STandard-IA)

      • For data that is accessed less frequently, but requires rapid access when needed

      • backups, disasted recovery files, or long term storage

      • Audit data, stored for seveal years can be moved to other classes

      • Lower storage price

      • higher retrieval price

      • Multiple copies are stored in 3 availability zones

    • One Zone Infrequent Access

      • 1 copy are stored in 1 availability zones

      • Lower storage price than STandard IA

      • Saving costs on storage

      • Can easily reproduce data incase of failure of zone or loss of data

    • Intelligent Tiering

      • Data with unknown or changing access patterns

      • Monthly monitroing and automation fee per object

    • Glacier Instant Retrieval

      • For archived data that needs immediate access

      • Access time of milliseconds (same perfromance as standard)

    • Glacier Flexible Retrieval

      • Low cost storage

      • Takes 1 minutes to 12 hours to access data

      • Audit data, stored for several years

      • Use vaults

    • Glacier Deep Archive

      • Lowest cost

      • Retrieve within 12 to 48 hours

      • Long retention

      • aim for 1/2 times a years access

      • Multiple copies are stored in 3 availability zones

    • Outposts

      • Creates buckets on Outposts

      • Easier to retrieve

      • Puts it on your on premise site

  • Lifecycle polices

    • Setup rules to move data between tiers

    • ie after x days move to another class

    • Default polics

      • haven’t accessed an object for 30 consecutive days, Amazon S3 automatically moves it to the infrequent access tier, S3 Standard-IA. If you access an object in the infrequent access tier, Amazon S3 automatically moves it to the frequent access tier, S3 Standard.

  • EBS vs S3

Amazon Elastic File System (EFS)

  • A type of file storage

    • a storage server uses block storage with a local file system to organize files. Clients access data through file paths.

  • Ensures that

    • access the same data at the same time

    • Storage can handle the amount of data

    • scale with increase demand

    • that backups are taken

    • data is stored is redundantly

    • management of servers holding data

  • EFS handles all this

  • EFS

    • Multiple instances can access (read/write) the data in EFS at the same time

    • Linux file system

    • regional resource

    • automatically scales

    • In different availability zones

      • allows for concurrent access

  • difference with EBS

    • does not scale, once you attach it to EC2 thats it

    • EBS must be in same availability zone

  • Can be AWS cloud service or on prem

    • on prem access with AWS Direct Connect

Amazon Relational Database Service (RDS)

  • RDBS useful for data stored that has relationships with other stored data

  • Data is stored in tables

  • Tables are defined by schemas

  • relationships between data in tables is done via a key

  • Querying is done via standard language such as SQL

    • so is defining the schema of the tables (definition)

    • And commands (updates/delets/writes)

  • Supported DB

    • postgres

    • mysql

    • oracle

    • sql server

  • Security

    • at encryption at rest

    • encryption in tranist

  • Migrate database from on prem to aws

    • Lift and shift migration

    • Have more control over OS, memory, CPU, storage capacity, etc

  • Can use Amazon RDBS, a managed db service

    • supportst the major DB engines

    • Hardware provisionin

    • Automated patching

    • backups

    • redundancy

    • failover

    • disaster recovery

  • Amazon Aurora

    • enterprise class rdbs

    • a more managed db system

    • mysql or postgres flavours

      • 5 times faster than mysql and 3 time sfaster than postgres

    • Reduces unecessary IO

    • cheaper than other db engines

    • data is replicated across facilities (6 copies at any one time)

    • up to 15 read replicas

    • Continuous back up to S3

  • Can be slow, due to the overhead of the queries/commands over several tables

  • For business analytics, over many tables

Amazon DynamoDB

  • Serverless DB

  • Data stored in tables, as items with attributes

  • Handles the storage, automatic scaling, stored redundently accross multiple AZ,

  • millisecond respone time

    • Dont have to provision, patch or manage servers, or install, maintain or operate software

  • Does not use sql, does not need to define schema

  • Useful for data that is not rigid (ie cannot be defined by a scehma) and need high performance

  • Non relational db

  • Have simple flexible schemas

  • Can add/remove attributes to a table at any time

  • NOt every item must have the same attributes

  • Store data as key-value pairs

    • key = items

    • value = attirbutes

  • Queries are much simpler

    • focus on collection of items from one table

    • Not on queries from multiple tables

    • leads to quick response time and high scalability

  • It is purpose built and fits a specific usecase

  • Most data is used for lookup lists

    • this can be done via non relational DB rather than sqlDB

Amazon Redshift

  • Used for data analysing what happened

  • Using traditional RDBMS for querying data which is constantly updated

    • causes performance issues

    • used for high speed real time ingestion, rather than complex queries over ltos of data

    • VAriety of data that is spread out has issues with this analytics

  • USe of data wharehousing

    • engineered for big data and historical analystics instead of operational analysis

    • For questions about looking backwards,rather than looking at the current information for current processing (which is what RDBMS is built for)

  • Redshift

    • DW that is tuned, resiliant and highly scalable

    • Nodes can handle mutliple PBytes

AWS Database Migration Service (AWS DMS)

  • Help migrate DB onto AWS securly adn easily

  • Source DB remains fully operational during the migration

    • reduces the downtime

  • Dont have to migrate to the same type of DB

    • Same type migrations = homogenous

      • straigthforward

    • source and target are different = hetrogenous

      • Two step process

        • Need to convert schema structure/data types and db code using AWS Schema Conversion Tool to match the target db

        • Then use DMS to migrate the data

  • Can migrate from from on prem to EC2 or RDS

  • Other migrations include

    • dev/test db migrations

      • copy prod data to test env (one off or continuously)

    • db consolidations

      • have multipe db but move to one db

    • continous db replication

      • for disaster recovery or geographic separation

Additional Database SErvices

  • Amazon DocumentDB

    • Document db

    • supports MongoDB

  • Amazon Neptune

    • graph DB

    • works with highly connected datasets

    • ie recommendation engines, fraud detection, and knowledge graphs.

  • Amazon Quantum Ledger Database (Amazon QLDB)

    • review a complete history of all the changes that have been made to your application data.

    • Data never deletd

  • Amazon Managed Blockchain

    • create and manage blockchain networks with open-source frameworks.

    • Blockchain is a distributed ledger system that lets multiple parties run transactions and share data without a central authority.

  • Amazon ElastiCache

    • adds a caching layer on top of db

    • improve read times for common requests

    • two types: redis and memcached

  • Amazon DynamoDB Accelerator

    • in memory cahce for dynamo db

    • millis to micro

  • https://aws.amazon.com/products/storage

  • https://aws.amazon.com/blogs/storage/

  • https://aws.amazon.com/getting-started/hands-on/?awsf.getting-started-category=category%23storage&awsf.getting-started-content-type=content-type%23hands-on

  • https://aws.amazon.com/solutions/case-studies/?customer-references-cards.sort-by=item.additionalFields.publishedDate&customer-references-cards.sort-order=desc&awsf.customer-references-location=*all&awsf.customer-references-segment=*all&awsf.customer-references-product=product%23vpc%7Cproduct%23api-gateway%7Cproduct%23cloudfront%7Cproduct%23route53%7Cproduct%23directconnect%7Cproduct%23elb&awsf.customer-references-category=category%23storage

  • https://aws.amazon.com/dms/

  • https://aws.amazon.com/products/databases

  • https://aws.amazon.com/getting-started/deep-dive-databases/

  • https://aws.amazon.com/blogs/database/

  • https://aws.amazon.com/solutions/case-studies/?customer-references-cards.sort-by=item.additionalFields.publishedDate&customer-references-cards.sort-order=desc&awsf.customer-references-location=*all&awsf.customer-references-segment=*all&awsf.customer-references-product=product%23vpc%7Cproduct%23api-gateway%7Cproduct%23cloudfront%7Cproduct%23route53%7Cproduct%23directconnect%7Cproduct%23elb&awsf.customer-references-category=category%23databases

Last updated