Tuesday 26 May 2020

VOO transforms its BI services and migrates to the Cloud

VOO completed Memento, its Business Intelligence and Big Data transformation program, with a key migration to the Cloud. Find out how Micropole supported the operator through the different phases of the project.

The challenges of VOO


In the context of a global transformation, Micropole helped VOO implement a complete migration of their Business Intelligence, Big Data, and AI landscape to the Cloud. This migration was critical to meet strategic and urgent business needs such as:

  • Dramatically increase customer insights to speed up acquisition and improve loyalty and retention
  • Support the digital transformation by offering a unified view of customers and their behavior
  • Meet the new compliance challenges (GDPR)
  • Radically reduce overall data environments TCO (4 different BI environments + 3 Hadoop clusters before the transformation)
  • Introduce company-wide data governance and address shadow BI (25+ FTEs on the business side crunching and processing data)


The solution and result generated by Micropole


Micropole conducted a rapid study, scanning all aspects of the transformation and addressing both the organizational challenge (roles and responsibilities, teams and skills, processes, governance) and the technical challenge (holistic architectural scenarios, ranging from hybrid cloud to full cloud solutions in PaaS mode).

Based on the outcome of the study, Micropole deployed a Cloud-based, Enterprise-wide data platform that would combine traditional BI processes with advanced analytical capabilities. Micropole helped redefine the data organization and related processes and introduced data governance at the corporate level.

The TCO dropped to less than 30% of its previous level, while agility and capabilities improved dramatically.


Architecture based upon key data services of AWS


Data lake

Amazon S3 serves as the central inbound layer and provides long-term persistence.

Some data files are pre-processed on Amazon EMR. EMR clusters are created on the fly a couple of times per day and process only the new data that has arrived in S3. Once the data is processed and persisted in the analytics-optimized Apache Parquet format, the cluster is destroyed. Encryption and lifecycle management are enabled on most S3 buckets to meet security and cost-efficiency requirements. More than 600 TB of data is currently stored in the Data Lake. Amazon Athena is used to create and maintain a data catalog and to explore raw data in the Data Lake.
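The transient-cluster pattern described above can be sketched in Python. This is only an illustration under stated assumptions, not VOO's actual job definition: the cluster name, instance types, release label, role names, script location and output location are all hypothetical. The first function picks the objects that arrived since the previous run (using the shape of an S3 object listing), and the second builds the arguments one could pass to boto3's EMR `run_job_flow` call. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster terminate once its steps are done.

```python
from datetime import datetime, timezone


def select_new_keys(objects, last_run):
    """Return the keys of objects modified after the previous run, oldest first.

    `objects` mimics the "Contents" entries of an S3 object listing:
    dicts with a "Key" and a timezone-aware "LastModified" datetime.
    """
    fresh = [o for o in objects if o["LastModified"] > last_run]
    return [o["Key"] for o in sorted(fresh, key=lambda o: o["LastModified"])]


def build_transient_cluster_request(keys, script_uri, output_uri):
    """Build keyword arguments for an EMR run_job_flow request.

    All names and sizes below are assumptions made for illustration.
    """
    steps = [
        {
            "Name": f"to-parquet-{index}",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical Spark script that writes its input as Parquet.
                "Args": ["spark-submit", script_uri, key, output_uri],
            },
        }
        for index, key in enumerate(keys)
    ]
    return {
        "Name": "datalake-preprocessing",        # hypothetical
        "ReleaseLabel": "emr-6.10.0",            # assumption
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",   # assumption
            "SlaveInstanceType": "m5.xlarge",    # assumption
            "InstanceCount": 3,                  # assumption
            # The cluster shuts down by itself once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, the returned dictionary could be submitted with `boto3.client("emr").run_job_flow(**request)`. Because the request is built separately from the call, it can be inspected without touching AWS.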


Real-time ingestion

Amazon Kinesis Data Streams captures real-time data, which is filtered and enriched (with data from the Data Warehouse) by a Lambda function before it is stored in an Amazon DynamoDB database. Real-time data is also stored in dedicated S3 buckets for persistence.
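The filter-and-enrich step can be illustrated with a minimal Python sketch of what such a Lambda function might do. The event type used as a filter, the field names, and the `customer_lookup` dictionary standing in for Data Warehouse data are all hypothetical. Only the event shape (base64-encoded payloads under `Records[].kinesis.data`) follows the format in which Lambda receives Kinesis records. The write to DynamoDB is passed in as a callable so the logic can be exercised without AWS.

```python
import base64
import json


def decode_record(record):
    """Decode the base64-encoded JSON payload of one Kinesis record."""
    return json.loads(base64.b64decode(record["kinesis"]["data"]))


def process_batch(event, customer_lookup, put_item):
    """Filter and enrich a batch of Kinesis records, then hand each item to put_item.

    customer_lookup: dict keyed by customer id (stand-in for warehouse data).
    put_item: callable that stores one item; in a real deployment it might wrap
              the put_item method of a boto3 DynamoDB Table resource.
    Returns the number of items stored.
    """
    stored = 0
    for record in event["Records"]:
        payload = decode_record(record)
        # Hypothetical filter: keep only one kind of event.
        if payload.get("event_type") != "network_alarm":
            continue
        customer = customer_lookup.get(payload["customer_id"], {})
        # Enrich with an attribute taken from the lookup data.
        item = {**payload, "segment": customer.get("segment", "unknown")}
        put_item(item)
        stored += 1
    return stored
```

Injecting `put_item` keeps the filtering and enrichment logic independent of the storage call, so it can be unit-tested with a plain list's `append` method.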


Data Warehouse

The Data Warehouse runs on Amazon Redshift, using the new RA3 nodes, and follows the Data Vault 2.0 methodology. Data Vault objects follow strict modeling rules, which allows a high level of standardization and automation. The data model is generated from metadata stored in an Amazon RDS Aurora database.

The automation engine itself is built on Apache Airflow, deployed on EC2 instances.
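To illustrate why strict Data Vault modeling rules lend themselves to automation, here is a minimal Python sketch that generates the DDL of a hub table from a small metadata description. It is not Micropole's generator: the schema name, column naming conventions, and the choice of an MD5 hash key are assumptions commonly seen in Data Vault 2.0 implementations.

```python
import hashlib


def hub_hash_key(*business_keys, delimiter="||"):
    """Compute a hub hash key from business key values.

    Values are trimmed and upper-cased before hashing so that formatting
    differences between source systems do not produce different keys.
    """
    normalized = delimiter.join(str(value).strip().upper() for value in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hub_ddl(entity, business_keys, schema="raw_vault"):
    """Generate a CREATE TABLE statement for a hub.

    entity: hub name, for example "customer".
    business_keys: list of (column_name, sql_type) tuples taken from metadata.
    schema: hypothetical target schema.
    """
    columns = [f"    hk_{entity} CHAR(32) NOT NULL"]
    columns += [f"    {name} {sql_type} NOT NULL" for name, sql_type in business_keys]
    columns += [
        "    load_dts TIMESTAMP NOT NULL",
        "    record_source VARCHAR(100) NOT NULL",
        f"    PRIMARY KEY (hk_{entity})",
    ]
    body = ",\n".join(columns)
    return f"CREATE TABLE IF NOT EXISTS {schema}.hub_{entity} (\n{body}\n);"
```

For example, `hub_ddl("customer", [("customer_number", "VARCHAR(20)")])` yields a table with a hash key, the business key, a load timestamp, and a record source. Since every hub has the same shape, one function driven by metadata rows can cover all of them.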


The project implementation started in June 2017. The production Redshift cluster, initially sized at 6 DC2 nodes, evolved seamlessly over time to meet the project's growing data needs and all business needs.



Amazon DynamoDB is used for specific use cases where web applications need sub-second response times. DynamoDB's adjustable read/write capacity makes it possible to provision the more expensive high-performance read capacity during business hours only, when low latency and fast response times are required. Such mechanisms, which rely on the elasticity of AWS services, are used to optimize the monthly AWS bill.
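The business-hours provisioning idea can be sketched in Python as follows. The capacity figures, the opening hours, the weekday-only rule, and the table name are assumptions made for illustration, since the actual values are not given here. The second function builds the arguments for boto3's DynamoDB `update_table` call, which a scheduled job could issue at the start and end of the business day.

```python
from datetime import datetime


def read_capacity_for(moment, peak_rcu=500, off_peak_rcu=25,
                      open_hour=8, close_hour=18):
    """Return the read capacity units to provision at a given local time.

    All default values are hypothetical. Peak capacity applies on weekdays
    from open_hour (inclusive) to close_hour (exclusive).
    """
    is_weekday = moment.weekday() < 5
    in_business_hours = open_hour <= moment.hour < close_hour
    return peak_rcu if (is_weekday and in_business_hours) else off_peak_rcu


def update_table_request(table_name, moment, write_capacity=10):
    """Build keyword arguments for a DynamoDB update_table request."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_capacity_for(moment),
            "WriteCapacityUnits": write_capacity,  # assumption
        },
    }
```

With boto3 available, the request could be applied with `boto3.client("dynamodb").update_table(**request)`. DynamoDB auto scaling with scheduled actions is an alternative way to obtain the same effect without custom code.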


Machine Learning

A series of predictive models have been implemented, ranging from a classical churn prediction model to more advanced use cases. For example, a model has been built to spot customers who are likely to have been impacted by a network failure. Amazon SageMaker was used to build, train, and deploy the models at scale, leveraging the data available in the Data Lake (Amazon S3) and the Data Warehouse (Amazon Redshift).


APIs for external access

External parties need to access specific data sets in a secure and reliable way. Amazon API Gateway is used to deploy secured RESTful APIs on top of serverless data microservices implemented with Lambda functions.
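A serverless data microservice of this kind might look like the following Python sketch of a Lambda handler behind an API Gateway proxy integration. The `dataset` path parameter and the in-memory `DATASETS` dictionary are hypothetical stand-ins for a real data source. Only the response shape (`statusCode`, `headers`, and a string `body`) follows what the proxy integration expects. Authentication is assumed to be handled by API Gateway itself and is not shown.

```python
import json

# Hypothetical in-memory data standing in for a query against a real data store.
DATASETS = {
    "coverage": [{"postcode": "4000", "status": "ok"}],
}


def _response(status_code, payload):
    """Format a response for an API Gateway Lambda proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context=None):
    """Return the requested data set, or a 404 if it is unknown."""
    # pathParameters may be absent or null when the route has no parameters.
    name = (event.get("pathParameters") or {}).get("dataset")
    if name not in DATASETS:
        return _response(404, {"error": "unknown dataset"})
    return _response(200, {"dataset": name, "rows": DATASETS[name]})
```

Returning a 404 for unknown names rather than raising an exception keeps the error visible to the caller as a normal HTTP response.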


And much more!

The Data Platform that Micropole has built for VOO offers dozens of other capabilities. The broad set of services available in the AWS environment makes it possible to address new use cases every day, quickly and efficiently.


Mentions in the press

Learn more about this disruptive project successfully conducted by Micropole!

An extensive article on the matter was published in Solutions Magazine (in French).

The interview was conducted by Alain de Fooz.