How to connect

The platform consists of three elements:

  • a data processing environment
  • an S3 datalake for data storage
  • a single authentication shared by the data processing environment and the S3 datalake web console

The entry points

The data processing environment

The data processing environment is a product from Saagie, a member of the consortium. It lets you run processing pipelines on datasets using DataOps tools.

The data processing platform runs in a Kubernetes cluster consisting of:

  • 4 virtual machines, each with:
    • 32 vCPUs
    • 128 GB of RAM
  • 1 distributed disk space of about 4 TB on HDD disks, expandable to 8 TB
  • 1 high performance distributed disk space of about 1.3 TB on SSD disks, expandable to 2.5 TB
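For sizing jobs, it can help to add up the per-machine figures listed above. The short sketch below does only that arithmetic. The totals are nominal: they assume all four machines are available to workloads and ignore whatever Kubernetes and the platform itself reserve, which this page does not specify.

```python
# Per-machine figures are taken from the list above; the totals are derived.
NODES = 4
VCPUS_PER_NODE = 32
RAM_GB_PER_NODE = 128

# Nominal cluster totals, before any system or platform overhead.
total_vcpus = NODES * VCPUS_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE

print(f"Nominal capacity: {total_vcpus} vCPUs, {total_ram_gb} GB of RAM")
```

The capacity actually available to a given project is likely lower than these nominal totals.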

The S3 datalake

The platform includes an S3 service that stores user and project data spaces.

The datalake is based on MinIO technology, which allows you to generate S3 tokens.
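Because MinIO exposes an S3-compatible API, a generated token can typically be used with a standard S3 client. The following is a minimal sketch, not the platform's documented procedure: it assumes the token takes the form of an access key and secret key pair, that the third-party boto3 library is installed, and that the endpoint and credentials are supplied through the environment variables S3_ENDPOINT_URL, S3_ACCESS_KEY and S3_SECRET_KEY. These variable names and both function names are hypothetical.

```python
import os


def s3_client_kwargs(env=os.environ):
    """Build keyword arguments for boto3.client("s3", ...).

    The environment variable names are hypothetical; replace them with
    whatever your project uses to store the endpoint and the S3 token.
    """
    return {
        "endpoint_url": env["S3_ENDPOINT_URL"],           # URL of the MinIO S3 service
        "aws_access_key_id": env["S3_ACCESS_KEY"],        # access key from the S3 token
        "aws_secret_access_key": env["S3_SECRET_KEY"],    # secret key from the S3 token
    }


def make_s3_client(env=os.environ):
    """Create an S3 client pointing at the datalake instead of AWS."""
    import boto3  # third-party dependency, imported only when a client is needed

    return boto3.client("s3", **s3_client_kwargs(env))
```

With the three variables set, `make_s3_client().list_buckets()` would return the buckets visible to the token. Reading credentials from the environment keeps them out of scripts and notebooks.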

The S3 datalake is composed of 4 virtual machines, each addressing 4 physical disks, for a total volume of about 119 TB.
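The figures above can be broken down as a quick calculation. This sketch assumes the disks are equally sized and that the 119 TB figure is spread evenly across them; the page does not say whether that figure is raw or usable capacity, so the per-disk average is indicative only.

```python
# Figures from the paragraph above; the total is approximate.
VMS = 4
DISKS_PER_VM = 4
TOTAL_TB = 119

disk_count = VMS * DISKS_PER_VM
avg_tb_per_disk = TOTAL_TB / disk_count  # assumes equally sized disks

print(f"{disk_count} disks, about {avg_tb_per_disk:.1f} TB each on average")
```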

Last update: April 26, 2022 16:18:52