Thursday, May 8, 2014

Data migration to SAP HANA

Almost a year back, based on an ASUG event, I wrote the blog post Data Migration in Modern Enterprise, focusing on harnessing the power of in-memory platforms for data migration. I had tested this in my lab to prove to myself the viability of the use case, but test data is always just test data.

Soon after this I got onto an SAP HANA engagement with one of the biggest retailers in Canada. As an EA / Program Manager my focus has been on supporting the HANA appliance (a very large scale-out solution holding 18 TB of data in memory, considered one of the largest scale-out deployments in the world). In this process I was part of architectural review sessions focusing on the performance aspects of loading historical data into SAP HANA.

Without going into too many details, to summarize the findings from a performance perspective:

      a) Data loading via SLT performed well (after some network fine-tuning)


      b) Historical data loading via SAP BO Data Services (covering a lot of transformations) was
         very slow. This data was being migrated from Teradata onto SAP HANA, and at that rate a
         full migration would have taken months

Upon further analysis it was found that most of the data load time was being spent inside SAP BO Data Services. At this point one of my recommendations was to do ELT (Figure 2 in the previous blog) instead of ETL. So, as an initial attempt, we moved all the integration-driven transformations (cross lookups, parent-child relationships) over to SAP HANA as internal procedures.
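
As a minimal sketch of what such a procedure might look like (all table and column names here are hypothetical, not the client's actual model), an integration-driven lookup that had been done in the ETL tool can be expressed as a SQLScript procedure working directly on the staged data:

    -- Hypothetical ELT-style procedure: resolve article keys and the store
    -- parent-child hierarchy inside HANA instead of in the ETL tool.
    CREATE PROCEDURE STAGE.LOAD_SALES_HISTORY (IN iv_load_date DATE)
    LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS
    BEGIN
      -- Cross lookup: attach the surrogate article key from the master data already in HANA
      lt_enriched = SELECT s.doc_id, s.doc_item, s.store_no,
                           m.article_key, s.qty, s.amount
                    FROM   STAGE.SALES_HISTORY_RAW AS s
                    INNER JOIN TARGET.ARTICLE_MASTER AS m
                           ON m.legacy_article_no = s.article_no
                    WHERE  s.load_date = :iv_load_date;

      -- Parent-child relationship: roll each store up to its region via the hierarchy table
      INSERT INTO TARGET.SALES_HISTORY
             (doc_id, doc_item, store_no, region_key, article_key, qty, amount)
      SELECT e.doc_id, e.doc_item, e.store_no, h.parent_org, e.article_key, e.qty, e.amount
      FROM   :lt_enriched AS e
      INNER JOIN TARGET.STORE_HIERARCHY AS h
             ON h.child_org = e.store_no;
    END;

The point is not the specific joins but where they run: the heavy set operations execute in memory, next to the data, instead of row-by-row in the ETL engine.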

Since then historical data loads have never been a bottleneck, and what was supposed to take months now takes weeks (with the right number of parallel sessions from Teradata). To conclude: ELT is very much a practical and necessary approach for processing big data and harnessing the power of modern in-memory platforms like SAP HANA.









Tuesday, February 5, 2013

Standardize Locally and Integrate Globally - A Strategic Approach for Data Consolidation

I just completed a data consolidation engagement with one of my clients, focusing on customer data (Accounts, Activity, History, etc.) from a diverse set of customer source systems. As we went live successfully, there were a few things we did differently that really made a difference to the client. What was unique about this engagement is that we were able to engage the end users quickly and reduce the time business domain experts had to spend on the exercise, so they could better focus on their high-priority tasks.

A typical consolidation approach is to focus on the systems in the current scope and start extracting the data using ETL tools. The extracted data is then cleansed, standardized and integrated (to avoid redundancy) before the content is pushed into the target system. This approach often results in many iterations between the staging and source systems. Each iteration takes a considerable amount of time from the involved parties to address various issues (e.g., which record takes precedence, what the standard format should be in the consolidated world, which fields take precedence, etc.). Meanwhile the end user has to wait, fingers crossed, to see any outcome of the effort.


Contrary to this approach, standardizing and cleansing the data locally with the help of domain experts led to earlier adoption and fewer iterations. Decently cleansed and standardized local data will fare well as it goes through alignment with the rest of the data in the enterprise. Standardized data need not be in the final standard format that the consolidated data will have. For example, a Lease Type attribute could have values of "Annual", "Monthly", etc. within your local system, while on the consolidated system it could be a code (1, 2, etc.). The key here is that standard-to-standard translation is much more efficient and less time consuming. It means iterations happen internally before the final integration process, ultimately leading to early participation from the end user.
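
A small sketch of what that standard-to-standard hop can look like (the mapping table and values are purely illustrative): once the local system consistently carries "Annual" / "Monthly", the translation to the consolidated codes is a simple lookup rather than a data-quality exercise.

    -- Hypothetical mapping table, maintained once per source system:
    --   local_value 'Annual'  -> consolidated_code 1
    --   local_value 'Monthly' -> consolidated_code 2
    SELECT l.account_id,
           l.lease_id,
           m.consolidated_code AS lease_type_code
    FROM   stg.lease          AS l
    JOIN   ref.lease_type_map AS m
           ON  m.source_system = l.source_system
           AND m.local_value   = l.lease_type;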


This approach is lean and strategic, and it enables early involvement of the end users, leading to a satisfied client. Standardizing locally doesn't mean revamping the existing apps or redesigning the information content. Standardize the data to the extent permissible, either directly in the source system or in the extracts of the source systems, before merging / integrating the staging data with that of the other similar systems.












Thursday, July 5, 2012

Data Migration in Modern Enterprise

At a recent ASUG event here in NJ, I had the opportunity to talk to some industry-leading customers who are actively defining the roadmap for their enterprise data warehouse strategic initiatives. I also learned about their frustrations and discomfort in realizing that roadmap. One of the pain points concerning users is data migration, along with all the associated business rules.

The modern enterprise is characterized by a heterogeneous landscape and often faces tough challenges analyzing business operations over voluminous data. SAP HANA not only supports these modern enterprise needs but also enables various analytics in real time with minimal disruption to existing elements of the landscape such as applications, infrastructure and databases.

One possible way for the modern enterprise to arrive at this future state (real-time in-memory analytics across the enterprise) is to make an in-memory platform like SAP HANA the enterprise data lake (yes, a data lake supporting both structured and unstructured data streams). Disparate data sources that fall within the scope of the enterprise and its extended aspects, structured and unstructured, flow into the lake, making its content readily available for consumption in real time and accelerating the decision-making process.

From here onwards I would like to highlight the migration aspects with SAP HANA in view.


For now, SAP HANA is provisioned with two main channels (SLT and Data Services) to funnel data in. In the future there could be multiple data channels, such as Hadoop, third-party apps, social streams, etc., funneling into this new enterprise in-memory lake. So what are the characteristics of these channels, and how well can we use the advancements of the in-memory approach for data migration?

SLT

SAP Landscape Transformation (SLT) replication is aimed at replicating data in real time from ECC to the HANA platform. Since SLT doesn't support transformations during the movement, we are left with two options: doing the transformation at the source or at the sink. We can safely exclude the source, as we do not want to overload those systems and hinder the operations running on them. So the sink is the only place where transformations can go wild with SLT.
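
For example (all names here are invented for illustration), a derived attribute that would otherwise need a separate transformation step can simply be exposed at the sink as a view over the table SLT keeps replicated, leaving the replicated table itself untouched and in sync:

    -- Hypothetical sink-side transformation over an SLT-replicated ECC table
    CREATE VIEW ANALYTICS.V_OPEN_ORDERS AS
    SELECT o.order_no,
           o.customer_no,
           o.order_value * o.exchange_rate AS order_value_usd,              -- derivation at the sink
           CASE WHEN o.delivery_status = '' THEN 'OPEN' ELSE 'CLOSED' END AS order_status
    FROM   ECC_REPL.ORDERS AS o;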

Data Services


On the other hand, using Data Services we can transform and then load the data. The obvious implications of this are latency, dependencies and constraints arising from the interdependence between the channel sources. Data Services supports both batch and near-real-time data migrations.

Enterprises often need a mix of batch and real-time migration capabilities to support the growing demand for high-volume, real-time analytics. In the future, many other channels like Hadoop, social streams, etc. will be made available to route data to SAP HANA directly.

The following figure highlights SLT- and ETL-based data migrations in the modern enterprise and the related constraints.


Figure 1

How SAP HANA could help us

The SAP HANA appliance comes with dedicated resources to support HANA's needs. Can we use these resources by pushing the transformations close to the data?

Having led an operational reports migration effort onto the SAP HANA platform as well as ETL data migration efforts on BO Data Services and Informatica, I don't see any limitations in SAP HANA that prevent us from moving transformations close to the data. SAP HANA has all the elements to facilitate the transformation process, just as other relational platforms do. Apart from these, SQLScript is a very powerful tool that can be used effectively to achieve the required modularity and facilitate reusability.

SAP HANA could easily carry out all the required transformations close to the data. However, to best utilize the abilities of the channels and of the platform that hosts the lake, transformations could be distributed between the channel and the in-memory lake. Channels could focus on technical aspects pertaining to their streams, like data cleansing and data standardization, while business transformations like derivations, aggregations, integration and summarizations, which can be cross-channel dependent, could be pushed to the lake where they can be handled effectively.
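
A hedged sketch of that split (all schema and table names are invented for illustration): the channels deliver cleansed, standardized rows into staging tables, and a lake-side SQLScript procedure takes over the cross-channel integration and aggregation.

    -- Cross-channel business transformation pushed to the lake:
    -- integrate the already-cleansed channel feeds, then summarize.
    CREATE PROCEDURE LAKE.BUILD_DAILY_REVENUE (IN iv_day DATE)
    LANGUAGE SQLSCRIPT SQL SECURITY INVOKER AS
    BEGIN
      -- Integration: union the feeds delivered by the two channels
      lt_all = SELECT customer_id, product_id, amount
               FROM   STAGE.POS_SALES    WHERE sales_date = :iv_day
               UNION ALL
               SELECT customer_id, product_id, amount
               FROM   STAGE.ONLINE_SALES WHERE sales_date = :iv_day;

      -- Summarization / aggregation close to the data
      INSERT INTO LAKE.DAILY_REVENUE (calendar_day, customer_id, product_id, revenue)
      SELECT :iv_day, customer_id, product_id, SUM(amount)
      FROM   :lt_all
      GROUP  BY customer_id, product_id;
    END;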

The following figure highlights data migration in the modern enterprise using an ELT / ETLT based approach.


Figure 2



Summary

The ETL vs. ELT discussion has been going on for some time. This blog is aimed at revisiting that discussion in the context of the latest advances in in-memory platforms, in particular SAP HANA and its powerful features, from a data migration perspective. This doesn't undermine the capabilities of Data Services at all. Rather, Data Services can be best utilized where it excels: data cleansing, data standardization and other technical aspects of migration. This aspect of Data Services ensures data sanity, which is crucial for enabling trusted decisions as data flows into the lake.

Some of the benefits of having business transformations close to the data are:

      a) Ability to drill down to the line item level and thus to the source when needed

      b) Zero network latency, as transformations happen where the data resides

      c) Business rules associated with transformations can be modularized using SQLScript and other
          DB features, leading to ease of maintenance and enhancement (see the sketch after this list)

      d) Easy to absorb source system modifications
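
As a small, hypothetical illustration of point c), a business rule captured once as a SQLScript function can be reused by every load procedure that needs it; the rule changes in one place and every transformation picks it up.

    -- Hypothetical reusable business rule as a scalar SQLScript function
    CREATE FUNCTION RULES.NET_REVENUE (iv_gross DECIMAL(15,2), iv_discount_pct DECIMAL(5,2))
    RETURNS rv_net DECIMAL(15,2)
    LANGUAGE SQLSCRIPT AS
    BEGIN
      rv_net := :iv_gross * (1 - :iv_discount_pct / 100);
    END;

    -- Any transformation procedure can then call it inline, for example:
    -- SELECT doc_id, RULES.NET_REVENUE(gross_amount, discount_pct) AS net_amount FROM ...;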