In the context of a global transformation, Micropole helped VOO implement a complete migration of their Business Intelligence, Big Data, and AI landscape to the Cloud. This migration was critical to meeting strategic and urgent business needs.
Micropole conducted a rapid study, scanning all aspects of the transformation and addressing both the organizational challenge (roles and responsibilities, teams and skills, processes, governance) and the technical challenge (holistic architectural scenarios, ranging from hybrid cloud to full cloud solutions in PaaS mode).
Based on the outcome of the study, Micropole deployed a Cloud-based, Enterprise-wide data platform that would combine traditional BI processes with advanced analytical capabilities. Micropole helped redefine the data organization and related processes and introduced data governance at the corporate level.
The TCO dropped to less than 30% of its previous level, while agility and capabilities improved dramatically.
Amazon S3 is used as the central inbound layer and for long-term persistence.
Some data files are pre-processed on Amazon EMR. EMR clusters are created on the fly a couple of times per day and only process new data that arrived in S3. Once the data is processed and persisted in an analytics-optimized Apache Parquet format, the cluster is destroyed. Encryption and lifecycle management are enabled on most S3 buckets to meet security and cost-efficiency requirements. More than 600 TB of data is currently stored in the Data Lake. Amazon Athena is used to create and maintain a data catalog and to explore raw data in the Data Lake.
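Such a transient cluster can be launched programmatically. The sketch below, using boto3, shows the general pattern: `KeepJobFlowAliveWhenNoSteps=False` makes the cluster terminate on its own once its step completes. The bucket names, script path, instance types, and cluster sizing are illustrative placeholders, not the actual VOO setup.

```python
import boto3

emr = boto3.client("emr", region_name="eu-west-1")

# Launch a transient cluster: with KeepJobFlowAliveWhenNoSteps=False the
# cluster terminates automatically once the last step completes.
response = emr.run_job_flow(
    Name="daily-inbound-preprocessing",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # destroy the cluster when done
    },
    Steps=[
        {
            "Name": "convert-new-files-to-parquet",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Hypothetical PySpark script that reads only newly arrived
                # objects and writes them back to S3 as Parquet.
                "Args": ["spark-submit", "s3://example-scripts/preprocess_inbound.py"],
            },
        }
    ],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster:", response["JobFlowId"])
```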
Amazon Kinesis Data Streams captures real-time data, which a Lambda function filters and enriches (with data from the Data Warehouse) before storing it in an Amazon DynamoDB table. Real-time data is also stored in dedicated S3 buckets for persistence.
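A minimal sketch of such a Lambda function is shown below. The table name, filtering rule, and enrichment lookup are hypothetical; only the Kinesis event shape (base64-encoded payloads under `Records[].kinesis.data`) is standard.

```python
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("realtime-events")  # hypothetical table name

def handler(event, context):
    """Filter and enrich Kinesis records, then persist them to DynamoDB."""
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Keep only the events of interest (filtering rule is illustrative).
        if payload.get("event_type") != "usage":
            continue

        # Hypothetical enrichment with reference data sourced from the
        # Data Warehouse (e.g. a cached customer dimension).
        payload["segment"] = lookup_customer_segment(payload["customer_id"])

        table.put_item(Item=payload)

def lookup_customer_segment(customer_id):
    # Placeholder for a lookup against warehouse-sourced reference data.
    return "unknown"
```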
The Data Warehouse runs on Amazon Redshift, using the new RA3 nodes, and follows the Data Vault 2.0 methodology. Data Vault objects follow strict, highly standardized modeling rules, which enables a high degree of automation. The data model is generated from metadata stored in an Amazon Aurora (RDS) database.
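Because Data Vault hubs, links, and satellites all share the same shape, their DDL lends itself to generation from metadata. The sketch below is a simplified illustration of the idea, with a hypothetical hub template and hard-coded metadata rows standing in for the Aurora repository; the real engine covers far more (links, satellites, loading logic).

```python
# Minimal sketch of metadata-driven DDL generation for Data Vault hubs.
# In practice the metadata rows would be read from the Aurora repository;
# here they are hard-coded for illustration.
HUB_TEMPLATE = """CREATE TABLE IF NOT EXISTS {schema}.hub_{name} (
    hub_{name}_hashkey CHAR(32) NOT NULL PRIMARY KEY,
    {business_key}     VARCHAR(255) NOT NULL,
    load_dts           TIMESTAMP NOT NULL,
    record_source      VARCHAR(64) NOT NULL
);"""

hub_metadata = [
    {"schema": "rawvault", "name": "customer", "business_key": "customer_number"},
    {"schema": "rawvault", "name": "contract", "business_key": "contract_number"},
]

for hub in hub_metadata:
    print(HUB_TEMPLATE.format(**hub))
```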
The automation engine itself is built on Apache Airflow, deployed on EC2 instances.
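In Airflow, such orchestration is expressed as DAGs. A minimal sketch of what a daily loading DAG could look like is given below; the DAG name, task names, and callable are illustrative, not the actual VOO pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_step(**context):
    # Placeholder: in the real engine, the loading code for each Data Vault
    # object is generated from metadata rather than written by hand.
    print("Running", context["task"].task_id, "for", context["ds"])

with DAG(
    dag_id="daily_data_vault_load",  # illustrative DAG name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage_inbound_files", python_callable=run_step)
    load = PythonOperator(task_id="load_raw_vault", python_callable=run_step)
    stage >> load
```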
The project implementation started in June 2017; the production Redshift cluster, initially sized at 6 DC2 nodes, evolved seamlessly over time to keep pace with the project's growing data volumes and business needs.
Amazon DynamoDB is used for specific use cases where web applications need sub-second response times. DynamoDB's variable read/write capacity makes it possible to provision the more expensive high-performance read capacity during business hours only, when low latency and fast response times are required. Such mechanisms, which rely on the elasticity of AWS services, are used to optimize the monthly AWS bill.
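One way to implement this pattern (a sketch only, assuming a provisioned-capacity table; the table name and capacity values are illustrative) is a scheduled job that raises read capacity in the morning and lowers it in the evening:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Illustrative capacity profiles; real values depend on the workload.
BUSINESS_HOURS = {"ReadCapacityUnits": 500, "WriteCapacityUnits": 50}
OFF_HOURS = {"ReadCapacityUnits": 25, "WriteCapacityUnits": 50}

def set_capacity(table_name: str, business_hours: bool) -> None:
    """Switch a provisioned table between peak and off-peak capacity.

    Intended to be triggered on a schedule, e.g. two EventBridge rules
    invoking a Lambda function, one in the morning and one in the evening.
    """
    profile = BUSINESS_HOURS if business_hours else OFF_HOURS
    dynamodb.update_table(
        TableName=table_name,
        ProvisionedThroughput=profile,
    )

# Example: set_capacity("webapp-lookup-table", business_hours=True)
```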
A series of predictive models has been implemented, ranging from a classical churn prediction model to more advanced use cases. For example, a model was built to spot customers who are likely to have been impacted by a network failure. Amazon SageMaker was used to build, train, and deploy the models at scale, leveraging the data available in the Data Lake (Amazon S3) and the Data Warehouse (Amazon Redshift).
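As an illustration of this build/train/deploy workflow (a sketch only; the actual models, algorithm choices, IAM role, and S3 locations are not public), training a churn classifier with SageMaker's built-in XGBoost on data exported from the Data Lake could look like this:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in XGBoost container image for the current region.
image_uri = sagemaker.image_uris.retrieve(
    "xgboost", session.boto_region_name, version="1.5-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-ml-bucket/churn/output",  # placeholder bucket
    hyperparameters={"objective": "binary:logistic", "num_round": 200},
)

# Training data exported from the Data Lake / Data Warehouse as CSV.
estimator.fit(
    {"train": TrainingInput("s3://example-ml-bucket/churn/train.csv",
                            content_type="text/csv")}
)

# Deploy a real-time inference endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```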
External parties need to access specific data sets in a secure and reliable way; Amazon API Gateway is used to deploy secured RESTful APIs on top of serverless data microservices implemented as Lambda functions.
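With API Gateway's Lambda proxy integration, such a data microservice boils down to a handler that returns responses in the proxy format (`statusCode`, `headers`, `body`). The sketch below assumes a hypothetical dataset-lookup endpoint:

```python
import json

def handler(event, context):
    """Serve a dataset slice through API Gateway (Lambda proxy integration)."""
    # Path parameters arrive in the proxy event.
    dataset_id = (event.get("pathParameters") or {}).get("datasetId")
    if dataset_id is None:
        return {"statusCode": 400,
                "body": json.dumps({"error": "datasetId is required"})}

    # Placeholder: the real service would query the Data Lake or DynamoDB.
    rows = fetch_dataset(dataset_id)

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"datasetId": dataset_id, "rows": rows}),
    }

def fetch_dataset(dataset_id):
    # Hypothetical data access; would be replaced by Athena/DynamoDB queries.
    return []
```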
The Data Platform Micropole has built for VOO offers dozens of other capabilities. The broad set of services available in the AWS environment makes it possible to address new use cases every day, quickly and efficiently.
Learn more about this disruptive project successfully conducted by Micropole!
An extensive article on the project was published in Solutions Magazine (in French).
The interview was conducted by Alain de Fooz.