How to incorporate SAP archived data into data lakes

Jan Meszaros

Data stored in traditional SAP archive solutions do not contribute to better business decisions

SAP systems have been around for decades, far longer than most on-premises (Hadoop) or cloud-based (Google, Azure, AWS) data lakes. That is why large chunks of historical SAP data are often archived. This poses a challenge: traditional SAP archiving solutions store data in compressed form in file-based storage, which makes it difficult to integrate into corporate data lakes, let alone to run real-time analytics or machine learning algorithms on it or create business value from it.

What is the business value of a data lake without SAP data, and what is the value of SAP data without historical SAP archives? Because SAP HANA costs keep rising, business data in SAP S/4HANA is often archived after as little as two years. This makes it essential to provide the historical SAP archive for further self-service business intelligence.

OutBoard ERP Archiving can migrate or archive aged SAP data into corporate data lakes. More than 40 of the Fortune 500 companies already rely on this solution to bridge SAP with big data lakes, so that historical and recent SAP data can be stored within a single corporate data lake.


Archived data accessible in the data lake for further consumption via Power BI, Tableau, etc.


Archived data remains accessible from SAP transactions via ArchiveLink, with faster access than traditional archive solutions, depending on the data lake technology.


The power of the data lake and why SAP data is key

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. Because you can store data as-is, without having to structure it first, the same repository can feed many kinds of analysis that guide better business decisions: dashboards and visualizations, big data processing, real-time analytics, and machine learning.
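To make "store data as-is" more concrete, here is a minimal Python sketch of schema-on-read, assuming raw records land in the lake as JSON lines from several sources. The record fields (`source`, `doc`, `amount`, `year`) and the function `read_sap_documents` are hypothetical. A structure is imposed only when the data is read, and records from other sources are left untouched.

```python
import json
from typing import Any, Dict, List

# Hypothetical raw records landed in the lake as-is (no upfront schema).
RAW_LINES = [
    '{"source": "SAP", "doc": "4500000001", "amount": "1200.50", "year": 2016}',
    '{"source": "IoT", "device": "sensor-7", "temp_c": 21.4}',
    '{"source": "SAP", "doc": "4500000002", "amount": "830.00", "year": 2021}',
]


def read_sap_documents(lines: List[str]) -> List[Dict[str, Any]]:
    """Apply a structure only at read time: keep SAP records and cast their types."""
    result = []
    for line in lines:
        record = json.loads(line)
        if record.get("source") != "SAP":
            continue  # records from other sources stay in the lake untouched
        result.append({
            "doc": record["doc"],
            "amount": float(record["amount"]),  # stored as text, typed on read
            "year": int(record["year"]),
        })
    return result


sap_docs = read_sap_documents(RAW_LINES)
print(sum(d["amount"] for d in sap_docs))  # 2030.5
```

The IoT record is stored next to the SAP records without any schema change; it simply is not selected by this particular reader.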

According to Microsoft Azure, data lakes are also a cost-effective way to run big data workloads. You can choose between on-demand clusters and a pay-per-job model when data is processed. Data lakes scale up or down based on business needs and scale storage and compute independently, enabling more economic flexibility. According to Google, a data lake is not just storage, and it is not the same as a data warehouse: data lakes provide a scalable and secure platform that allows enterprises to ingest any data from any system at any speed.

The vast majority of SAP customers who are on S/4HANA or planning the migration to S/4HANA need to significantly reduce their HANA footprint, so closed business documents are often archived after as little as two years. Considering this, it is difficult to imagine data lakes and big data analytics without data from historical SAP archives (business data that is 3 to 10 years old).
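The two-year rule can be pictured as a simple age-based split between hot documents and archive-eligible ones. The sketch below is a hypothetical illustration, not how SAP or OutBoard actually configure archiving: it assumes a document is archivable once it is closed and 730 days (two years of 365 days) have passed since closing. The names `BusinessDocument`, `is_archivable`, and `split_hot_and_archive` are invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Assumption: "two years" approximated as 2 * 365 days.
RESIDENCE_DAYS = 2 * 365


@dataclass
class BusinessDocument:
    doc_id: str
    closed_on: Optional[date]  # None means the document is still open


def is_archivable(doc: BusinessDocument, today: date) -> bool:
    """Closed documents older than the residence time may leave the hot database."""
    if doc.closed_on is None:
        return False  # open documents always stay hot
    return (today - doc.closed_on).days >= RESIDENCE_DAYS


def split_hot_and_archive(
    docs: List[BusinessDocument], today: date
) -> Tuple[List[BusinessDocument], List[BusinessDocument]]:
    """Partition documents into those kept in HANA and those eligible for archiving."""
    hot: List[BusinessDocument] = []
    archive: List[BusinessDocument] = []
    for doc in docs:
        (archive if is_archivable(doc, today) else hot).append(doc)
    return hot, archive


docs = [
    BusinessDocument("A", date(2021, 6, 30)),
    BusinessDocument("B", date(2023, 3, 15)),
    BusinessDocument("C", None),
]
hot, archive = split_hot_and_archive(docs, today=date(2024, 1, 1))
print([d.doc_id for d in hot], [d.doc_id for d in archive])  # ['B', 'C'] ['A']
```

With the sample data, only the document closed in mid-2021 crosses the threshold; the recently closed and still-open documents stay in the hot database.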

Typical architecture with SAP historical data integrated into the data lake

Here comes the solution: data lakes enabled with a full set of SAP data, covering both recent (hot) data and historical SAP data. Structured data from SAP is combined with structured and unstructured data from other sources (IoT, social media, non-SAP enterprise software, third-party or custom applications) and made available for big data processing and self-service business intelligence, creating additional business value and providing information for the right business decisions.


SAP connected with data lake (SAP HANA and SAP historical archive)

More and more companies are looking to make all enterprise data available in their data lake, whatever the underlying technology. OutBoard ERP Archiving is a holistic archiving solution that moves data between the SAP database and external storage according to data usage or age, regardless of the storage vendor (e.g. cloud-based or on-premises data lakes).

OutBoard ERP Archiving is the only available solution that makes archived data available for further analytics in a cloud data lake, because historical data can be provided in a transparent format for several data lake technologies, such as Hadoop Hive, Impala, AWS Redshift, Azure Data Lake Storage, Azure Databricks, Google BigQuery, and Snowflake. Active data remains in the database during daily operations, while cold or old data is archived and can still be used for reports. In the data lake, all SAP data, including historical data, is enabled and extended with non-SAP data (e.g. customer attributes), which helps guide better business decisions.
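To illustrate what "all SAP data, including historical data, extended with non-SAP data" can mean at query time, here is a minimal sketch that uses Python's built-in SQLite as a stand-in for a data lake SQL engine such as Hive, BigQuery, or Snowflake. All table and column names (`sap_hot_orders`, `sap_archived_orders`, `crm_customer_attributes`, `sap_all_orders`) are hypothetical. One view unions recent and archived SAP records, and a join adds a non-SAP customer attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sap_hot_orders (order_id TEXT, customer_id TEXT, year INTEGER, net_value REAL);
CREATE TABLE sap_archived_orders (order_id TEXT, customer_id TEXT, year INTEGER, net_value REAL);
CREATE TABLE crm_customer_attributes (customer_id TEXT, segment TEXT);

-- One logical view over recent (hot) and historical (archived) SAP data
CREATE VIEW sap_all_orders AS
    SELECT order_id, customer_id, year, net_value, 'hot' AS tier FROM sap_hot_orders
    UNION ALL
    SELECT order_id, customer_id, year, net_value, 'archive' AS tier FROM sap_archived_orders;
""")

# Hypothetical sample data for each tier and for the non-SAP source
cur.executemany("INSERT INTO sap_hot_orders VALUES (?, ?, ?, ?)", [
    ("O-900", "C1", 2023, 500.0),
    ("O-901", "C2", 2024, 250.0),
])
cur.executemany("INSERT INTO sap_archived_orders VALUES (?, ?, ?, ?)", [
    ("O-100", "C1", 2016, 1000.0),
    ("O-101", "C2", 2018, 750.0),
])
cur.executemany("INSERT INTO crm_customer_attributes VALUES (?, ?)", [
    ("C1", "Enterprise"),
    ("C2", "SMB"),
])

# Total order value per customer segment across hot and archived SAP data
rows = cur.execute("""
    SELECT a.segment, SUM(o.net_value) AS total_value
    FROM sap_all_orders AS o
    JOIN crm_customer_attributes AS a ON a.customer_id = o.customer_id
    GROUP BY a.segment
    ORDER BY a.segment
""").fetchall()
print(rows)  # [('Enterprise', 1500.0), ('SMB', 1000.0)]
```

A `UNION ALL` view lets reports query a single logical table without knowing which tier a record lives in; the `tier` column is kept only for traceability.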