Modern enterprises, characterized by heterogeneous landscapes, often face tough challenges in analyzing business operations over voluminous data. SAP HANA not only supports these needs but also enables a variety of real-time analytics with minimal disruption to existing landscape elements such as applications, infrastructure, and databases.
One possible way for the modern enterprise to arrive at this future state (real-time in-memory analytics across the enterprise) is to make an in-memory platform like SAP HANA the enterprise data lake (yes, a data lake that supports both structured and unstructured data streams). Disparate data sources within the scope of the enterprise and its extended ecosystem, structured and unstructured, flow into the lake to make its content readily available for consumption in real time and to speed up the decision-making process.
From here onwards, I would like to highlight the data migration aspects with SAP HANA in view.
For now, SAP HANA is provisioned with two main channels (SLT and Data Services) to funnel data in. In the future there could be multiple data channels, like Hadoop, third-party apps, social streams, etc., funneling into this new enterprise in-memory lake. So what are the characteristics of these channels, and how well can we use the advances in the in-memory approach for data migration?
SLT
SAP Landscape Transformation (SLT) is aimed at replicating data in real time from ECC to the HANA platform. Since SLT doesn't support any transformations during the movement, we are left with two options: doing the transformation either at the source or at the sink. We can safely exclude the source, as we do not want to overload those systems and hinder the operations running on them. So the sink is the only place where transformations can run freely with SLT.
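To make the "transform at the sink" option concrete, here is a minimal sketch of the pattern. It is not SLT or SAP HANA code: Python's built-in sqlite3 in-memory database is only a stand-in for the sink, and the table names (raw_orders, orders_clean) and columns are hypothetical. The rows are copied 1:1 first, and the transformation runs afterwards as plain SQL inside the target.

```python
import sqlite3


def replicate_and_transform(source_rows):
    """Copy rows unchanged into the sink, then transform them there."""
    # Assumption: sqlite3 in-memory DB stands in for the sink platform.
    sink = sqlite3.connect(":memory:")

    # Step 1: replicate 1:1, with no transformation during the movement.
    sink.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    sink.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

    # Step 2: transform at the sink; the source system is not queried again.
    sink.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT order_id,
               amount_cents / 100.0 AS amount,
               UPPER(TRIM(status))  AS status
        FROM raw_orders
        """
    )
    return sink.execute(
        "SELECT order_id, amount, status FROM orders_clean ORDER BY order_id"
    ).fetchall()
```

Calling replicate_and_transform with a few raw tuples returns the cleaned rows, while raw_orders keeps the untouched replica for later drill-down.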
Data Services
Using Data Services, on the other hand, we can transform the data and then load it. The obvious implications are latency, dependencies, and constraints arising from the interdependence between the channel sources. Data Services supports both batch and near-real-time data migrations.
Enterprises often need a mix of batch and real-time migration capabilities to support the growing demand for high-volume real-time analytics. In the future, many other channels like Hadoop, social streams, etc. will be made available to route data to SAP HANA directly.
The following figure highlights SLT- and ETL-based data migrations in the modern enterprise and the related constraints.
Figure 1
How SAP HANA could help us
The SAP HANA appliance comes with dedicated resources to support HANA's needs. Can we use these resources by pushing the transformations close to the data?
Having led an operational reports migration effort to the SAP HANA platform, as well as ETL data migration efforts on BO Data Services and Informatica, I don't see any limitations in SAP HANA that prevent us from moving transformations close to the data. SAP HANA has all the elements needed to facilitate the transformation process, as other relational platforms do. Apart from these, SQLScript is a very powerful tool that can be used effectively to achieve the modularity required for reusability.
SAP HANA could easily carry out all the required transformations close to the data. However, to best utilize the abilities of both the channels and the platform that hosts the lake, transformations could be distributed between the channel and the in-memory lake. Channels could focus on the technical aspects of their own streams, like data cleansing and data standardization, while business transformations like derivations, aggregations, integration, and summarizations, which may depend on multiple channels, could be pushed to the lake where they can be handled effectively.
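The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Data Services or SAP HANA code: the cleanse function plays the role of the channel-side technical step, a sqlite3 in-memory database stands in for the lake, and the sales table, channel names, and record layout are all hypothetical.

```python
import sqlite3


def cleanse(record):
    """Channel-side technical step: trim whitespace, standardize codes and types."""
    customer, currency, amount = record
    return (customer.strip(), currency.strip().upper(), float(amount))


def load_and_aggregate(channels):
    """Lake-side business step: aggregate across all channels with SQL."""
    # Assumption: sqlite3 in-memory DB stands in for the in-memory lake.
    lake = sqlite3.connect(":memory:")
    lake.execute(
        "CREATE TABLE sales (channel TEXT, customer TEXT, currency TEXT, amount REAL)"
    )
    for name, records in channels.items():
        # Each channel cleanses only its own stream before loading.
        rows = [(name, *cleanse(r)) for r in records]
        lake.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

    # The cross-channel aggregation runs where the combined data resides.
    return lake.execute(
        "SELECT customer, currency, SUM(amount) FROM sales "
        "GROUP BY customer, currency ORDER BY customer, currency"
    ).fetchall()
```

The point of the design is that no channel needs to know about any other channel: the aggregation that depends on several streams only becomes possible once they have all landed in the lake.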
The following figure highlights data migration in the modern enterprise using an ELT / ETLT-based approach.
Figure 2
The ETL vs. ELT discussion has been going on for some time. This blog aims to revisit that discussion from a data migration perspective, in the context of the latest advances in in-memory platforms, in particular SAP HANA and its powerful features. This doesn't undermine the capabilities of Data Services at all. Rather, Data Services is best utilized for what it is good at, such as data cleansing, data standardization, and other technical aspects of migration. This role ensures data sanity, which is crucial for trusted decisions as data flows into the lake.
Some of the benefits of having business transformations close to the data are:
a) Ability to drill down to the line item level and thus to the source when needed
b) No network latency for the transformation step, as transformations happen where the data resides
c) Business rules associated with transformations can be modularized using SQLScript and other DB features, leading to ease of maintenance and enhancement
d) Easy to absorb source system modifications
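Benefits (a) and (c) can be illustrated with a small sketch. It assumes a sqlite3 in-memory database as a stand-in for the lake and uses plain SQL views in place of SQLScript; the line_items table and the view names v_line_value and v_order_total are hypothetical. A business rule is defined once as a view, a summary is layered on top of it, and the summary can be drilled down to the line items because everything reads from the same data.

```python
import sqlite3


def build_lake():
    """Create line items plus two layered views (rule, then summary)."""
    # Assumption: sqlite3 in-memory DB stands in for the in-memory lake.
    lake = sqlite3.connect(":memory:")
    lake.execute(
        "CREATE TABLE line_items (order_id INTEGER, item TEXT, qty INTEGER, unit_price REAL)"
    )
    lake.executemany(
        "INSERT INTO line_items VALUES (?, ?, ?, ?)",
        [(1, "bolt", 10, 0.5), (1, "nut", 4, 0.25), (2, "gear", 1, 8.0)],
    )
    # Business rule defined once: how a line value is derived.
    lake.execute(
        "CREATE VIEW v_line_value AS "
        "SELECT order_id, item, qty * unit_price AS line_value FROM line_items"
    )
    # Summary reuses the rule instead of re-implementing it.
    lake.execute(
        "CREATE VIEW v_order_total AS "
        "SELECT order_id, SUM(line_value) AS total FROM v_line_value GROUP BY order_id"
    )
    return lake


def drill_down(lake, order_id):
    """Go from an order total back to the line items behind it."""
    return lake.execute(
        "SELECT item, line_value FROM v_line_value WHERE order_id = ? ORDER BY item",
        (order_id,),
    ).fetchall()
```

If the derivation rule changes, only v_line_value needs to be edited, and every summary built on it picks up the change.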