Making SAP HANA Cloud Native: Increase agility, performance and uptime + reduce cost

By Frank Stienhans
Ocean9 Solution

Taking High Performance to SAP Mission Critical Systems

Posted on August 2, 2016

Before the cloud: increasing agility, performance and uptime meant increasing cost.

Making a solution cloud-native is rewarding. It lets you improve all major dimensions of a solution architecture at the same time.

The process is not always simple and requires a precise understanding of both Cloud Behavior and the Software Solution.

In this blog we describe how a redesigned approach to SAP HANA Backups can have significant consequences for Non-Production and Production deployments.

Backups have a different role in the cloud

SAP backups have always been a critical element of any High Availability Strategy. The cloud adds a different spin to the topic:

  • HANA Backups are stored in Amazon S3 at $0.0125 / GB-month with 99.999999999% durability.
  • S3 is not tape-like storage. In fact, you can easily get more than 400 MB / second of read and write throughput between S3 and your HANA Host.
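To put those two numbers in perspective, here is a small back-of-the-envelope sketch. The 1 TB backup size is a hypothetical example, not a measurement, and the function names are made up for illustration. The calculation assumes decimal units (1 GB = 1000 MB) and sustained throughput.

```python
# Figures quoted in the post
S3_PRICE_PER_GB_MONTH = 0.0125   # USD per GB-month
S3_THROUGHPUT_MB_S = 400         # MB / second between S3 and a HANA Host


def monthly_storage_cost(backup_gb: float) -> float:
    """Monthly S3 storage cost in USD for one backup of the given size."""
    return backup_gb * S3_PRICE_PER_GB_MONTH


def transfer_minutes(backup_gb: float,
                     throughput_mb_s: float = S3_THROUGHPUT_MB_S) -> float:
    """Minutes needed to move the backup at a sustained throughput."""
    return backup_gb * 1000 / throughput_mb_s / 60


if __name__ == "__main__":
    backup_gb = 1000  # hypothetical 1 TB backup
    print(f"storage cost: ${monthly_storage_cost(backup_gb):.2f} per month")
    print(f"transfer time: {transfer_minutes(backup_gb):.1f} minutes")
```

For the hypothetical 1 TB backup this works out to roughly $12.50 per month and about 42 minutes of transfer time.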

Ocean9 adds one-click simplicity, leading to a user experience where large SAP HANA systems can be created and restored from S3 in 2 hours or less.

So with nothing but a SAP HANA Backup you can re-create your DEV or Production system, including data, in 2 hours.

This fact has already led many customers to run their SAP HANA Production Systems with a Backup & Recovery Strategy instead of a Multi-Site SAP HANA Cluster with Database replication.

On Premise or with Hosting you would need a Redundant Deployment for Production. With AWS and Ocean9 you often do not.

That said, if you need it, AWS and Ocean9 enable a One-Click Triple-Site deployment with SAP HANA System Replication. Such architectures usually become viable only with a large-scale cloud provider.

For Non-Production as well, the ability to deploy a massive HANA System including Data in 2 hours often leads to very different behavior among administrators and developers.

Want to test out a new HANA Revision with your production data set? That is a One Click experience with 120 minutes lead time.

That is, until now.

Re-architected for the cloud

The above describes the status quo of Ocean9 with AWS.

Let's look at the common practice for SAP HANA Backups and at an advanced approach (highly simplified).

Common and Advanced Approach to Backups

It is common today to route all Backup traffic through the Master Host of a SAP HANA System. This is fine for a Single Host HANA System. But once SAP HANA Scale-out comes into play, it has a severe performance impact not just on the Backup and Recovery process, but also on HANA's primary function of serving applications with real-time data.

One of the fundamental principles of the cloud is that each unit of work (e.g. a HANA Host) should have no side effects on other units. Each unit should have a precisely defined degree of freedom to fulfill its purpose without interference from others.

That is relatively simple to understand, but sometimes hard to achieve, which is why the common practice remains the common practice for SAP HANA.

Now let's look at how this impacts Backup and Recovery times for systems that are fully loaded with Data.

In this section we use R3.8xlarge with 244 GB RAM for each HANA Host. Each Host holds 2 billion rows of a simple table.

Recovery Time

With the Common Practice, the time to recover increases with every GB of data in the system. The Advanced Practice changes that: what matters is no longer how much data is in the HANA System as a whole, but only how much data is on a particular Host. A 244 GB RAM system restores in nearly the same time as a 4 TB RAM Scale-out system (17 x 244 GB).
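The scaling difference can be illustrated with a deliberately idealized model. This is a sketch under stated assumptions, not Ocean9's implementation: the 400 MB / second throughput is a placeholder, each Host is assumed to restore its own data in parallel with no shared bottleneck, and fixed costs such as provisioning are ignored.

```python
def restore_minutes_common(hosts: int, gb_per_host: float,
                           master_mb_s: float) -> float:
    """Common practice: all data flows through the Master Host,
    so restore time grows with the total data in the system."""
    total_gb = hosts * gb_per_host
    return total_gb * 1000 / master_mb_s / 60


def restore_minutes_advanced(hosts: int, gb_per_host: float,
                             host_mb_s: float) -> float:
    """Advanced practice (idealized): every Host restores its own data
    in parallel, so only the per-Host data volume matters."""
    return gb_per_host * 1000 / host_mb_s / 60


if __name__ == "__main__":
    throughput = 400  # MB / second, hypothetical placeholder
    for hosts in (1, 3, 17):
        common = restore_minutes_common(hosts, 244, throughput)
        advanced = restore_minutes_advanced(hosts, 244, throughput)
        print(f"{hosts:2d} hosts: common {common:6.1f} min, "
              f"advanced {advanced:5.1f} min")
```

In this model the advantage grows linearly with the number of Hosts. Real systems have fixed overheads the model leaves out, so measured speed-ups will be lower than this upper bound.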

Backup Time

Backups look similar. If you think about the impact on the SAP HANA System during backup, you might actually care more about the backup time than the restore time, because backups run every day on your production system. The advanced backup approach not only shortens the window of system impact significantly, it also reduces the severity during that window: 66% less I/O bandwidth is consumed.

Now, why does even a single-host HANA System benefit from this approach? Because the SAP HANA Master no longer needs to be prepared to serve as a backup proxy for future workers, we can take a different approach for the Master Node as well. In addition, we invest a lot of time in getting the maximum out of the AWS Platform, which yields some great results.

As a bonus it allows us to lower AWS Infrastructure cost by 5-10%.

What about X1 with 2 TB RAM?

I ran SAP HANA on X1 through the Advanced Approach only.

Here are the observations.

X1 Data Insert Phase

We could insert 16 billion rows into a single HANA X1 Host in 2 hours (2.2 million rows per second) by using 25 concurrent data streams.
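The insert rate follows directly from the figures above. The per-stream number assumes the load was spread evenly across the 25 streams, which is an assumption and not something we measured.

```python
ROWS = 16_000_000_000      # rows inserted
DURATION_S = 2 * 3600      # 2 hours in seconds
STREAMS = 25               # concurrent data streams

rows_per_second = ROWS / DURATION_S
rows_per_stream = rows_per_second / STREAMS  # assumes an even spread

if __name__ == "__main__":
    print(f"total:      {rows_per_second / 1e6:.1f} million rows / second")
    print(f"per stream: {rows_per_stream:,.0f} rows / second")
```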

Contrary to my assumption, the insert phase was not I/O bound but CPU bound (we pushed HANA to only 70% CPU utilization of the X1's 128 vCPUs).

During (explicitly triggered) Delta Merge the Index Server fully utilized 125 vCPUs.

X1 based HANA System after data load

X1 Backup Phase

During Backup, we observed 800 MB / second READ + 800 MB / second WRITE at the same time = 1.6 GB / second I/O throughput for the HANA Index Server.

Under production conditions system impact is important. We therefore stream backup data to Amazon S3 at only 200 MB / second (= 1.6 Gbps).
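For readers who think in network terms, the conversion between storage throughput (MB / second) and network bandwidth (Gbps) is a one-liner. The helper name is made up for illustration, and decimal units are assumed (1 Gbit = 1000 Mbit).

```python
def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000


if __name__ == "__main__":
    # Throttled S3 backup stream
    print(f"200 MB/s  = {mb_per_s_to_gbps(200):.1f} Gbps")
    # Combined read + write observed on the HANA Index Server
    print(f"1600 MB/s = {mb_per_s_to_gbps(800 + 800):.1f} Gbps")
```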

The total backup time was 41 minutes, of which 10 minutes impacted HANA I/O (under production conditions overlapping with Application driven HANA I/O).

X1 Recovery Phase

A new empty HANA System takes 16 minutes to be provisioned.

End to End Recovery time for an existing HANA System and the above dataset is 60 minutes of which 44 minutes went into the S3 Download.
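Because the S3 Download dominates the recovery time, any gain in download throughput translates almost directly into a shorter recovery. The sketch below splits the measured 60 minutes into download and everything else, then projects the total for a download speed-up factor. The factor of 2 is purely hypothetical, and the model assumes the non-download part stays constant.

```python
def recovery_breakdown(total_min: float, download_min: float):
    """Return (minutes not spent downloading, download share of total)."""
    return total_min - download_min, download_min / total_min


def projected_total(total_min: float, download_min: float,
                    speedup: float) -> float:
    """Projected recovery time if only the download gets `speedup` x faster."""
    return (total_min - download_min) + download_min / speedup


if __name__ == "__main__":
    other, share = recovery_breakdown(60, 44)
    print(f"non-download: {other} min, download share: {share:.0%}")
    # Hypothetical: download path twice as fast
    print(f"projected total at 2x: {projected_total(60, 44, 2.0):.0f} min")
```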

During recovery the very first action is to stop the existing SAP HANA System. Since the system is down from that point on, this time the path from S3 to HANA is mission critical.

The standard S3 Download speed is 150 MB / second. That is good, but no longer good enough for HANA on X1 data sets, and Amazon S3 has more throughput potential.

So we wrote a new S3 Download Path (following AWS S3 High Performance Best Practices). When we applied it to R3.8xlarge instances, measured S3 Download throughput went up to 550 MB / second.
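One technique the AWS guidance recommends for high S3 throughput is fetching byte ranges of an object in parallel. The following is a minimal sketch of that general idea, not the Ocean9 download path itself. The `fetch_range` callable, part size and worker count are assumptions for illustration, and an in-memory stand-in replaces S3 so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def byte_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) ranges,
    matching the semantics of an HTTP Range header."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def parallel_download(fetch_range: Callable[[int, int], bytes],
                      size: int, part_size: int, workers: int = 8) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the input ranges
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


if __name__ == "__main__":
    blob = bytes(range(256)) * 40          # stand-in for an S3 object

    def fake_fetch(start: int, end: int) -> bytes:
        return blob[start:end + 1]         # inclusive end, like Range

    result = parallel_download(fake_fetch, len(blob), part_size=1000)
    print("reassembled correctly:", result == blob)
```

With boto3, `fetch_range` could wrap `get_object` with its `Range` parameter set to `f"bytes={start}-{end}"`. A real download path would also write parts to disk instead of holding them in memory and retry failed ranges.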

However, when we used it on X1 it delivered no more than 200 MB / second. Since the observed Storage Write Performance is far better than on an R3.8xlarge HANA Instance, this can only mean that the Network maxes out.

I interpret these measurements to mean that the new Elastic Network Adapter (ENA) introduced with X1 is not just nice to have, but mandatory for SAP HANA production workloads. ENA support needs to be active in the Amazon Machine Image, which is not yet the case in the SuSE Linux 12 SP1 Base.

Usually the OS Vendors take care of incorporating those essential driver features into a new batch of OS AMIs ...

I will retest X1 Recovery once we have a new batch of OS AMIs.

Advanced Backup Strategy - Now available in Ocean9

Just navigate to your HANA Environment and select the new Advanced Backup Strategy option.

One Click to activate

Summary

While we used to promise an RTO of 2 hours for Backup Based Recovery including System Provisioning, it appears that with this release Ocean9 can deliver an RTO of well under 1 hour for R3 based systems.

This might lead even more customers to switch from a System Replication based approach to a Backup & Recovery based architecture using Ocean9 and AWS.

Or to say it differently: Reduce Infrastructure cost by more than 50%. 

If you need more uptime, you are just 1 click away from a 2 or 3 AZ HANA Deployment with up to 99.99% uptime.
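To make the uptime figure concrete, an availability percentage converts to a yearly downtime budget as follows. This is plain arithmetic assuming a 365-day year, and the function name is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability such as 0.9999."""
    return MINUTES_PER_YEAR * (1 - availability)


if __name__ == "__main__":
    print(f"99.99% uptime allows about "
          f"{downtime_minutes_per_year(0.9999):.1f} minutes of downtime per year")
```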

Value of the Advanced Backup Strategy

  • Recovery time: 2 - 3 x faster (predicted 13 x faster for a 17 Host configuration)
  • Backup time: 2 - 5 x faster (predicted 19 x faster for a 17 Host configuration)
  • HANA System Impact for Backups: Single Host: reduced by 66%, 3 Hosts: reduced by 80%, 17 Hosts reduced by 97%.
  • Backup & Recovery time is independent of the number of HANA Hosts.
  • AWS Cost Reduction by 5-10%

As always, I hope you liked our update.

Want to continue reading on SAP HANA on AWS best practices?

Questions? Contact us: questions@ocean9.io
