Thursday, May 8, 2014

Almost two years back, following an ASUG event, I wrote the blog post Data Migration in Modern Enterprise, focusing on harnessing the power of in-memory platforms for data migration challenges. I had tested the idea in my lab to prove its viability to myself, but test data is only ever test data.
Soon after that I joined an SAP HANA engagement with one of the biggest retailers in Canada. As an EA / Program Manager, my focus has been supporting the HANA appliance (a very large scale-out solution holding 18 TB of data in memory, considered one of the largest scale-out deployments in the world). Along the way I took part in architectural review sessions focusing on the performance aspects of loading historical data into SAP HANA.
Without going into too many details, the findings from a performance perspective can be summarized as follows:
a) Data loading via SLT performed well (after some network fine tuning).
b) Historical data loading via SAP BO Data Services (involving a lot of transformations) was very slow. This data was being migrated from Teradata to SAP HANA, and at the observed rate a full migration would have taken months.
Upon further analysis it was found that most of the data load time was being spent in SAP BO. At that point, one of my recommendations was to do ELT (Figure 2 in the previous blog) instead of ETL. So, as an initial attempt, we moved all the integration-driven transformations (cross lookups, parent-child relationships) over to SAP HANA as internal procedures.
Since then, historical data loads have never been a bottleneck, and what was expected to take months now takes weeks (with the right number of sessions from Teradata). So, to conclude: ELT is very much a practical and necessary approach for processing big data and harnessing the power of modern in-memory platforms like SAP HANA.
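To make this concrete, below is a minimal SQLScript sketch of the kind of internal procedure such an approach relies on. It is illustrative only; the object names (STG_SALES, DIM_MATERIAL, FACT_SALES, ENRICH_SALES) are hypothetical stand-ins, not the actual objects from the engagement.

    -- Hypothetical cross-lookup enrichment done inside HANA (ELT) instead of in the ETL tool.
    CREATE PROCEDURE ENRICH_SALES (IN IV_LOAD_DATE DATE)
      LANGUAGE SQLSCRIPT
      SQL SECURITY INVOKER
    AS
    BEGIN
      -- Cross lookup against a dimension table that already resides in HANA
      LT_ENRICHED = SELECT S.DOC_ID,
                           S.MATERIAL,
                           M.CATEGORY,                -- looked-up attribute
                           S.QTY * S.PRICE AS REVENUE -- simple derivation
                      FROM STG_SALES AS S
                INNER JOIN DIM_MATERIAL AS M
                        ON M.MATERIAL = S.MATERIAL
                     WHERE S.LOAD_DATE = :IV_LOAD_DATE;

      -- Set-based insert into the target; no row-by-row handling in the ETL tier
      INSERT INTO FACT_SALES (DOC_ID, MATERIAL, CATEGORY, REVENUE)
        SELECT DOC_ID, MATERIAL, CATEGORY, REVENUE FROM :LT_ENRICHED;
    END;

With this split, the loading sessions from Teradata only have to land raw rows into the staging table; the joins and derivations then run as set operations where the data already resides.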
Thursday, July 5, 2012
Data Migration in Modern Enterprise
At a recent ASUG event here in NJ, I had the opportunity to talk to some industry-leading customers who are actively defining the roadmap for their Enterprise Data Warehouse strategic initiatives. I also got to hear about their frustrations and discomfort in realizing that roadmap. One of the pain points concerning these users is data migration, along with all of the associated business rules.
The modern enterprise, characterized by a heterogeneous landscape, often faces tough challenges in analyzing business operations over voluminous data. SAP HANA not only supports these modern enterprise needs but also enables a variety of analytics in real time, with minimal disruption to existing elements of the landscape such as applications, infrastructure and databases.
One possible way for the modern enterprise to arrive at this future state (real-time in-memory analytics across the enterprise) is to make an in-memory platform like SAP HANA the enterprise data lake (yes, a data lake supporting both structured and unstructured data streams). Disparate data sources within the enterprise and its extended ecosystem, structured and unstructured alike, flow into the lake, making its content readily available for consumption in real time and accelerating the decision-making process.
From here on, I would like to highlight the data migration aspects with SAP HANA in view.
For now, SAP HANA is provisioned with two main channels (SLT and Data Services) to funnel data in. In the future there could be multiple additional channels, such as Hadoop, third-party apps, social streams, etc., feeding this new enterprise in-memory lake. So what are the characteristics of these channels, and how well can we use the advances of the in-memory approach for data migration?
SLT
SAP Landscape Transformation (SLT) is aimed at replicating data in real time from ECC to the HANA platform. Since SLT does not support transformations during the movement, we are left with two options: do the transformation at the source or at the sink. We can safely exclude the source, as we do not want to overload those systems and hinder the operations running on them. So with SLT, the sink is the only place where transformations can run freely.
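As a hedged illustration of what a sink-side transformation can look like: once SLT has replicated a source table 1:1 into HANA, a simple view (or procedure) at the sink can reshape it for consumption without touching the source system. The schema and view names below (ECC_REPL, SALES_ORDER_HEADER) are assumptions for the example; VBAK is used only as a familiar ECC table.

    -- Illustrative only: SLT replicates ECC table VBAK 1:1 into a HANA schema,
    -- and all reshaping happens at the sink, here as a plain SQL view.
    CREATE VIEW SALES_ORDER_HEADER AS
      SELECT VBELN AS SALES_DOCUMENT,
             ERDAT AS CREATED_ON,
             NETWR AS NET_VALUE,
             WAERK AS CURRENCY
        FROM ECC_REPL.VBAK      -- hypothetical schema holding the replicated copy
       WHERE NETWR > 0;         -- example sink-side rule; the source stays untouched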
Data Services
Using Data Services, on the other hand, we can transform and then load the data. The obvious implications of this are latency, dependencies and constraints arising from the interdependence between the channel sources. Data Services also supports batch and near-real-time data migrations.
Enterprises often need a mix of batch and real-time migration capabilities to support the growing demand for high-volume real-time analytics. In the future, many other channels such as Hadoop, social streams, etc. will be made available to route data to SAP HANA directly.
The following figure highlights SLT- and ETL-based data migrations in the modern enterprise and the related constraints.

Figure 1
How SAP HANA could help us
The SAP HANA appliance comes with dedicated resources to support HANA's needs. Can we use these resources by pushing the transformations close to the data?
Having led an operational reports migration effort onto the SAP HANA platform, as well as ETL data migration efforts on BO Data Services and Informatica, I don't see any limitation in SAP HANA that prevents us from moving transformations close to the data. SAP HANA has all the elements needed to facilitate the transformation process, as on other relational platforms. Beyond these, SQLScript is a very powerful tool that can be used effectively to achieve the modularity required for reusability.
SAP HANA could easily carry out all of the required transformations close to the data. However, to best utilize the abilities of both the channels and the platform that hosts the lake, transformations can be distributed between the channel and the in-memory lake. The channels can focus on the technical aspects of their streams, such as data cleansing and data standardization, while business transformations such as derivations, aggregations, integration and summarization, which may be cross-channel dependent, can be pushed to the lake where they can be handled effectively.
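For illustration, here is a minimal SQLScript sketch of such a lake-side business transformation: it integrates and summarizes records that two different channels have already cleansed and landed in staging tables. All object names (STG_POS_CLEAN, STG_ECOM_CLEAN, DAILY_REVENUE, BUILD_DAILY_REVENUE) are hypothetical.

    -- Hypothetical cross-channel integration and summarization done where the data resides.
    CREATE PROCEDURE BUILD_DAILY_REVENUE (IN IV_DAY DATE)
      LANGUAGE SQLSCRIPT
    AS
    BEGIN
      -- Integration: union the two cleansed channel feeds
      LT_ALL = SELECT SALES_DATE, STORE_ID, AMOUNT FROM STG_POS_CLEAN
                WHERE SALES_DATE = :IV_DAY
               UNION ALL
               SELECT SALES_DATE, STORE_ID, AMOUNT FROM STG_ECOM_CLEAN
                WHERE SALES_DATE = :IV_DAY;

      -- Summarization: aggregate once, inside the lake
      INSERT INTO DAILY_REVENUE (SALES_DATE, STORE_ID, TOTAL_AMOUNT)
        SELECT SALES_DATE, STORE_ID, SUM(AMOUNT)
          FROM :LT_ALL
         GROUP BY SALES_DATE, STORE_ID;
    END;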
The following figure highlights data migration in the modern enterprise using an ELT / ETLT based approach.

Figure 2
Summary
The ETL vs. ELT discussion has been going on for some time. This blog revisits that discussion in the context of the latest advances in in-memory platforms, in particular SAP HANA and its powerful features, from a data migration perspective. None of this undermines the capabilities of Data Services; rather, Data Services is best utilized where it excels, such as data cleansing, data standardization and other technical aspects of migration. This role for Data Services ensures data sanity, which is crucial for enabling trusted decisions as data flows into the lake.
Some of the benefits of having business transformations close to the data are:
a) The ability to drill down to line-item level, and thus to the source, when needed.
b) Zero network latency, as transformations happen where the data resides.
c) Business rules associated with transformations can be modularized using SQLScript and other database features, leading to easier maintenance and enhancement (see the sketch after this list).
d) Source system modifications are easy to absorb.
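To illustrate point (c), here is a small, hypothetical sketch of modularizing a business rule in SQLScript so that several load procedures can reuse it; all names (TT_DOC_IN, TT_DOC_OUT, APPLY_FX_RULE, LOAD_INVOICES, FX_RATES, STG_INVOICES, FACT_INVOICES) are illustrative assumptions.

    -- Reusable table types describing the rule's input and output (illustrative).
    CREATE TYPE TT_DOC_IN  AS TABLE (DOC_ID NVARCHAR(10), AMOUNT DECIMAL(15,2), CURRENCY NVARCHAR(5));
    CREATE TYPE TT_DOC_OUT AS TABLE (DOC_ID NVARCHAR(10), AMOUNT_USD DECIMAL(15,2));

    -- The business rule (currency conversion) isolated in one reusable procedure.
    CREATE PROCEDURE APPLY_FX_RULE (IN IT_DOCS TT_DOC_IN, OUT OT_DOCS TT_DOC_OUT)
      LANGUAGE SQLSCRIPT READS SQL DATA
    AS
    BEGIN
      OT_DOCS = SELECT D.DOC_ID,
                       D.AMOUNT * R.RATE AS AMOUNT_USD
                  FROM :IT_DOCS AS D
            INNER JOIN FX_RATES AS R ON R.CURRENCY = D.CURRENCY;
    END;

    -- Any loader reuses the rule instead of re-implementing it in each ETL job.
    CREATE PROCEDURE LOAD_INVOICES
      LANGUAGE SQLSCRIPT
    AS
    BEGIN
      LT_STAGED = SELECT DOC_ID, AMOUNT, CURRENCY FROM STG_INVOICES;
      CALL APPLY_FX_RULE (:LT_STAGED, LT_CONVERTED);
      INSERT INTO FACT_INVOICES (DOC_ID, AMOUNT_USD)
        SELECT DOC_ID, AMOUNT_USD FROM :LT_CONVERTED;
    END;

When a source system changes, only the rule procedure needs adjusting, which is also what makes point (d) easier.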