With the evolution and rapid adoption of the cloud technologies many enterprises and developers are moving towards cloud based solution offerings. This is also true for many existing On-Premise Data Warehouses and moving them to cloud based Data Warehouses.
Now this paradigm shift brings new challenges as well. There are challenges related to tools’ choice and some are related to approach to achieve the objective. Considering technology advancements there are many tools available in the market offering similar Data Warehousing capabilities as those available On-Premises.
Depending on the size of the data, transformation requirements and information accessibility, architects and developers can fairly choose the right tool satisfying respective needs. Challenges with the tools are relatively easy to overcome but it is more difficult and important to decide on the approach to be considered for this Shift; and one of the trivial factor to consider is whether to stick to traditional, proven ETL (Extract, Transform, and Load) approach or to shift to ELT (Extract, Load, Transform)?
These challenges are also applicable to non-cloud solutions.
Before deep dive into ETL vs ELT here is brief on what is Extract, Transform and Load –
This is a process where data is collected from various homogeneous or heterogeneous data sources. This process integrates data available in multiple formats from various applications, databases, files etc. developed and maintained by various applications developers / vendors and available on different locations and hardware.
These are more in terms of rules and functions which needs to be applied to massage the data and make it ready for the targeted Database / Data Warehouse.
Once the data is transformed as per operational needs, it is loaded into the end target from where it is consumed by the end users either using operational systems or business intelligence tools.
ETL vs ELT
Having above understanding in place, it is fairly easy now to understand the difference in ETL and ELT.
During the ETL approach the data is extracted, then transformed and ultimately loaded into the target Database / Data Warehouse.
Where as in ELT approach, the data is extracted as is and loaded into the target Database / Data Warehouse and then necessary transformations takes place preparing the information for downstream uses.
Features to consider while choosing your approach – ETL or ELT
There are many factors which needs to be considered and well thought while choosing the approach to fit the transformation needs. On a very high level below are some primary factors –
1. Data Volume – What is the amount of the data which is required to be moved from On-Premise Data Ware house to Cloud based Data Warehouse?
2. Frequency of Data Load – Along with data volume, is it important to identify the frequency of data load from source to destination. Is it near real time, stale data or onetime load? Most of the transactional systems are putting new data or modifying the existing data leading towards the need for change data capture. This adds complexity in identifying the delta along with slowly moving dimensions
3. Sources and Destinations– Which are various data source to be connected to fetch he data and which are the targeted databases / data stores which will accommodate the data as new store. There are some databases which are extremely good at ELT approach where as some are relative better during ETL. Some systems have native support to the source data where as some needs additional connectors adding cost to the solution.
4. Availability of Tools – What are the various tools available / to be considered to transform the data from On-Premise Data Warehouse to Cloud based Data Warehouse
5. Transformation responsibility – There is a need to give significant consideration for when the transformations need to be accomplished and who is responsible to perform these transformations. Whether these needs to be taken care by the tool used during the ETL process or the transformations needs to be performed by the targeted Database / Data Warehouse once the data is loaded there.
6. Flexibility to adopt new business rules – Traditional ETL processes do not consider the business processes. They operate more on data movement from source to destination. As businesses grow, their operating processes undergo transformations and that leads to change in the ETL as well. Currently, ETL architectures are not that flexible to accommodate such changes.
7. Scalability – Considering the dramatic growth in the data volume, it is increasingly difficult for the IT systems to scale up quickly to accommodate data storage and processing needs. While selecting the approach, this is important to consider the future data growth and available technology landscape
8. Data Security – This is one of the very important aspect to consider during the approach selection. Some of the data you literally do not want to get exposed to outside world and will have reservations on where and how it should be stored
9. License fees – With the modern RDBMS systems available in the market, which are capable of taking transformation responsibility, a need to have a separate ETL tool is no longer needed.
10. Available knowledge base and training needs – With the advancement in the technology there is a great need to have through understanding and implementation expertise of the modern RDBMS capable of taking care of ETL / ELT needs. Availability of such talent also participate in the decision making.
With the advancements in the modern tools and technologies, availability of data and necessity to make the data available for end users at near real time, many architects are experimenting hybrid approaches like TELT, ETLT or even TETLT. These hybrid approaches help bridging gaps in ETL or ELT but they also add additional burden on the RDBMS with added complexity.