Lessons from cloud data projects

In brief:

  • Cloud services are enabling organisations to become more agile and productive.

  • In starting their cloud journey, organisations tend to encounter similar issues managing cost, implementing data privacy controls and enabling cloud users to be productive.

  • We provide five recommendations for implementing cloud data platforms effectively and efficiently, drawing on lessons learned at various organisations.


Our team have extensive experience in cloud data platform implementations at various organisations. One immediate benefit they have observed is a reduction in the time needed to procure and install new IT infrastructure, usually from months to minutes, which enables greater agility and productivity. They also broadly agree that most organisations have learned lessons in three areas, and we believe these lessons would benefit others who are embarking on their cloud journey.

We outline common issues in managing cost, complying with data privacy policy, and enabling cloud users to be productive. We then propose five strategic actions to help organisations address them. In summary:

R1 Enabling consumption of cloud services through “templates” designed for specific purposes and with predefined costs.
R2 Implementing cost monitoring and cross-charging mechanisms to incentivise responsible consumption of cloud services.
R3 Designing data-centric security policies and enforcing them through “data zones” in the cloud platform.
R4 Ingesting data on core business processes into the cloud platform and providing a “data catalogue” to enable projects to discover and reuse it.
R5 Considering whether to establish a dedicated data integration team to accelerate the ingestion of new data sets into the cloud platform.

Managing cloud costs.

A project that one of our team members reviewed involved building complex algorithms to personalise daily offers for retail customers.

“They stored a huge data set on low-cost cloud storage, but every day had to copy the data into faster storage to feed the algorithms, which took too long. The team decided to leave 12 months of data in the fast storage and append new data from the previous day. The result: a huge jump in cost and a panic to find a better solution.”

This is a common scenario: entry-level cloud services are cost-effective but limited. Advanced compute and storage options are typically priced at a premium, and their costs are harder to predict. The uncertainty arises from the wide range of possible service configurations, where a small change can significantly increase fees.

Some organisations tackle this issue by limiting options. In the above scenario, this was implemented through “templates” that bundled services for specific purposes (such as data cleaning, insights generation or rapid prototyping) with predictable costs. When a team member required cloud services for one of these purposes, the cloud administration team activated them using the relevant template, subject to approval of the cost.
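To make the template idea concrete, here is a minimal sketch in Python. It is not the mechanism used in the project described. The purposes come from the text above, while the service names, the cost figures and the activate function are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ServiceTemplate:
    """A bundle of cloud services for one purpose, with a predefined cost."""
    purpose: str
    services: Tuple[str, ...]
    monthly_cost: float  # illustrative figure, arbitrary currency units


# Hypothetical catalogue: the purposes come from the article, while the
# service names and costs are invented for illustration.
TEMPLATES = {
    t.purpose: t
    for t in (
        ServiceTemplate("data cleaning", ("object storage", "batch compute"), 400.0),
        ServiceTemplate("insights generation", ("warehouse", "dashboards"), 900.0),
        ServiceTemplate("rapid prototyping", ("notebook", "small database"), 250.0),
    )
}


def activate(purpose: str, approved_budget: float) -> ServiceTemplate:
    """Return the template for a purpose if its cost has been approved."""
    template = TEMPLATES.get(purpose)
    if template is None:
        raise KeyError(f"no template for purpose: {purpose!r}")
    if template.monthly_cost > approved_budget:
        raise PermissionError(
            f"{purpose!r} costs {template.monthly_cost}, "
            f"above the approved budget of {approved_budget}"
        )
    return template
```

Because every request maps to one of a small number of templates, the cost of a request is known before anything is provisioned, which is the property the approval step relies on.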

Another best practice is to incentivise responsible use through reporting. Several organisations have put in place cost-monitoring and cross-charging mechanisms. These allow cloud administrators to apportion costs to each team and provide transparency on usage.
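As an illustration of cross-charging, the sketch below apportions costs to teams from a list of usage records. The record layout (team, service, cost), the team names and the figures are assumptions made for this example. In practice the records would come from the cloud provider's billing export.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def apportion_costs(usage: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum the cost of (team, service, cost) usage records per team."""
    totals: Dict[str, float] = defaultdict(float)
    for team, _service, cost in usage:
        totals[team] += cost
    return dict(totals)


def usage_report(totals: Dict[str, float]) -> List[str]:
    """One line per team, largest spender first, with its share of the total."""
    grand_total = sum(totals.values())
    lines = []
    for team, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = cost / grand_total if grand_total else 0.0
        lines.append(f"{team}: {cost:.2f} ({share:.0%})")
    return lines


# Hypothetical usage records for one billing period.
USAGE = [
    ("marketing", "fast storage", 1200.0),
    ("marketing", "compute", 300.0),
    ("finance", "object storage", 500.0),
]

totals = apportion_costs(USAGE)
report = usage_report(totals)
```

Publishing a report like this to each team is one simple way to provide the transparency on usage described above.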

Complying with data privacy policy.

On another personalisation project, the team could measure changes in customer spend but also wanted qualitative feedback on the perceived quality and relevance of content. Several people working on the project volunteered as test subjects, giving permission to use their loyalty and transaction data on the understanding that their information would be protected.

“Imagine my surprise when I found my personal data had been replicated across cloud storage areas to support the development of new algorithms! Our data scientists knew a lot about me at that point.”

Although the cloud services used by the project complied with the organisation’s information security policy, the policy did not adequately control how data was used. Team members could copy data across storage areas inside the cloud platform. They also combined data from multiple sources, making it difficult to understand provenance and control usage.

To deal with this issue, some organisations are adding data-centric security controls to their information security policies, implemented through cloud “data zones”. A data zone provides granular control over data stores, even at the level of individual data fields, automatically enforcing rules on access and replication. For example, Australian personally identifiable information (PII) could be stored in a data zone that only allows employees of the organisation’s Australian entity to see PII fields, and that triggers an approval workflow when a team would like to use personal information for personalisation applications.
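The Australian example can be sketched as a small policy object. This is a simplified illustration, not a real cloud feature or API. The DataZone class, the field names and the masking placeholder are all hypothetical, and the sketch covers only field-level access and the approval trigger, not replication rules.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class DataZone:
    """Hypothetical data zone with field-level access and usage rules."""
    name: str
    allowed_entity: str                    # entity whose staff may see PII
    pii_fields: FrozenSet[str]             # fields masked for everyone else
    approval_required_for: FrozenSet[str]  # purposes that need sign-off

    def read(self, record: Dict[str, Any], user_entity: str) -> Dict[str, Any]:
        """Return a copy of the record, masking PII for other entities."""
        if user_entity == self.allowed_entity:
            return dict(record)
        return {
            key: ("<masked>" if key in self.pii_fields else value)
            for key, value in record.items()
        }

    def request_use(self, purpose: str,
                    approved: FrozenSet[str] = frozenset()) -> str:
        """Report whether a purpose may proceed or needs approval first."""
        if purpose in self.approval_required_for and purpose not in approved:
            return "approval required"
        return "allowed"


# Illustrative zone for Australian PII; field names are assumptions.
AU_ZONE = DataZone(
    name="au-pii",
    allowed_entity="AU",
    pii_fields=frozenset({"name", "email"}),
    approval_required_for=frozenset({"personalisation"}),
)

record = {"customer_id": 1, "email": "x@example.com", "spend": 42.0}
masked = AU_ZONE.read(record, user_entity="NZ")
```

The point of the design is that the rules travel with the zone rather than with each project, so a team cannot bypass them simply by copying data into a new storage area.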

Improving productivity.

In reviewing cloud platform implementations, we have observed some organisations excluding the ingestion of data from their scope. As a result, they have delivered “empty shells” of technology for future initiatives to work with. We believe that integration with source data systems would improve the adoption rate of cloud platforms once implemented.

The most common argument in support of separating the build of a cloud platform from the ingestion of data is that the data required by each project is different. Our rebuttal is that, although each initiative may require data in specific formats, this data is likely to originate from the same sources. Typically, these are enterprise systems that support core business processes, such as the Customer Relationship Management (CRM) platform.

We recommend that cloud data platform projects build interfaces to these systems as well as setting up cloud infrastructure. This would save future initiatives the effort and time of profiling data sources, obtaining approvals to connect to them and re-implementing interfaces that could be built once and used across projects. As part of this, we suggest maintaining a “data catalogue” to provide common definitions for, and visibility of, data already ingested into the cloud.
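A data catalogue can start as something very simple. The sketch below assumes an in-memory register of data sets with a keyword search. The class names, the example data set and its field definitions are hypothetical, and a production catalogue would normally be a dedicated tool rather than hand-written code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogueEntry:
    """Describes one data set already ingested into the cloud platform."""
    name: str
    source_system: str
    description: str
    fields: Dict[str, str] = field(default_factory=dict)  # field -> definition


class DataCatalogue:
    """Minimal register that lets projects discover existing data sets."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        """Add a data set, refusing duplicates so definitions stay unique."""
        if entry.name in self._entries:
            raise ValueError(f"data set already catalogued: {entry.name!r}")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[CatalogueEntry]:
        """Case-insensitive match on name, description or source system."""
        needle = keyword.lower()
        return [
            entry for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.description.lower()
            or needle in entry.source_system.lower()
        ]


# Illustrative content only.
catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="customers",
    source_system="CRM",
    description="Customer master records",
    fields={"customer_id": "Unique identifier assigned by the CRM platform"},
))
matches = catalogue.search("crm")
```

A project that finds a match can reuse the existing interface and definitions instead of profiling the source system again.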

In addition, organisations undergoing constant change may wish to establish a dedicated data integration team to support ingestion of new data into the cloud. In our experience, identifying and profiling data sources can represent 40 to 60 percent of the expense in a typical data project. A data integration team that sits outside individual projects could be tasked with maintaining a knowledge base of data sources and integrations, significantly reducing this expense.

In summary.

We believe that organisations undertaking cloud data projects should consider automating cost monitoring, data privacy controls and integrations with key systems. With these capabilities in place, they will be able to make effective and efficient use of cloud services – typically taking months out of the development time for digital assets.

At Cognis, we are passionate about protecting the future of community-oriented organisations by enabling them to effectively engage with stakeholders in the digital economy.
