Datalake S3
The Datalake service is based on the S3 protocol and the MinIO project.
An S3 service exposes buckets, which are comparable to the shares of a file sharing service (SMB, AFP, ...).
Each user of the service is assigned a personal bucket and has access to one bucket for each user group they belong to.
Group buckets are ideal as a common space for storing the datasets of a project.
Quota on buckets
To limit how much of the available space a single user or group of users can occupy, a quota is applied to each bucket.
If your quota is insufficient, please contact dln-support@criann.fr to evaluate a quota change for your group.
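To judge how close a bucket is to its quota, its current usage can be estimated from a program. The sketch below sums the object sizes of a bucket with the `list_objects_v2` paginator of the boto3 library. It assumes a boto3 S3 client that is already configured for the Datalake endpoint; the function name `bucket_usage_bytes` is only illustrative. The quota value itself is not returned by this call, so the result covers only the usage side of the comparison.

```python
def bucket_usage_bytes(client, bucket):
    """Return the total size in bytes of all objects in `bucket`.

    `client` is assumed to be a boto3 S3 client (or any object exposing
    the same `get_paginator("list_objects_v2")` interface).
    """
    total = 0
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from the page when the bucket is empty.
        for obj in page.get("Contents", []):
            total += obj["Size"]
    return total
```

For example, `bucket_usage_bytes(client, "group0001")` would give the usage of the group bucket used as an example elsewhere on this page.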
The service can be accessed in two ways:
- from a computation program, or from a tool that mounts a bucket as a network drive on your computer, through the S3 API, whose URL is https://s3.atelier.datalab-normandie.fr
- for manual management, through a web console, which simplifies tasks such as importing and exporting datasets. The URL of the web console is https://s3-console.atelier.datalab-normandie.fr
Authentication
Note that the two channels authenticate differently: the S3 protocol uses an access key and a secret key, which are alphanumeric tokens, while the web console uses the same authentication as the data processing tool.
You can generate authentication tokens for the S3 service from the datalake web console by creating a Service Account.
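As an illustration of access through the S3 API, here is a minimal sketch of a Python client built with the boto3 library. The choice of boto3 is an assumption; any S3-compatible client should work in a similar way. The environment variable names `DATALAKE_ACCESS_KEY` and `DATALAKE_SECRET_KEY` are hypothetical and stand for the access key and secret key of a Service Account.

```python
import os

# S3 API endpoint of the Datalake service.
S3_ENDPOINT = "https://s3.atelier.datalab-normandie.fr"


def s3_client_kwargs(env=os.environ):
    """Build the keyword arguments passed to boto3.client("s3", ...).

    The variable names read from `env` are hypothetical.
    """
    return {
        "endpoint_url": S3_ENDPOINT,
        "aws_access_key_id": env["DATALAKE_ACCESS_KEY"],
        "aws_secret_access_key": env["DATALAKE_SECRET_KEY"],
    }


def make_s3_client(env=os.environ):
    """Create a boto3 S3 client that targets the Datalake endpoint."""
    import boto3  # third-party library: pip install boto3

    return boto3.client("s3", **s3_client_kwargs(env))
```

With both variables set, `make_s3_client().list_buckets()["Buckets"]` should list the buckets visible to the Service Account. Reading the keys from the environment keeps them out of the source code.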
Login via the web
List your buckets
Manage your service accounts
Service accounts are S3 authentication tokens that allow your programs to authenticate to the S3 service.
Create a Service Account
Once the creation is validated, a set of tokens (an access key and a secret key) is made available to you.
Use a custom access policy
You can create the Service Account with a more restrictive access policy than that of the user account. For example, if the user account user0001 is part of the group group0001, the policy of the account would be as follows:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::group0001",
        "arn:aws:s3:::group0001/*",
        "arn:aws:s3:::user0001",
        "arn:aws:s3:::user0001/*"
      ]
    },
    {
      "Effect": "Deny",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket"
      ],
      "Resource": [
        "arn:aws:s3:::*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::group0001",
        "arn:aws:s3:::group0001/*",
        "arn:aws:s3:::user0001",
        "arn:aws:s3:::user0001/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": ""
        }
      }
    }
  ]
}
However, you may not want the Service Account to grant rights on the user account's bucket, for example because the project jobs run with this Service Account and the data is in the group bucket.
In this case, you can deny access to the user account's bucket with an access policy such as:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::user0001",
        "arn:aws:s3:::user0001/*"
      ]
    },
    {
      "Effect": "Deny",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::user0001",
        "arn:aws:s3:::user0001/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": ""
        }
      }
    }
  ]
}
The user's access policy is combined with the Service Account's access policy, so the Service Account ends up with more restricted access than the user account.
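When several Service Accounts need this kind of restriction, the policy text can be generated rather than written by hand. The following is a minimal sketch, not the platform's own tooling: the function name `deny_bucket_policy` is hypothetical, and it emits a single `s3:*` Deny statement on the assumption that this already covers `s3:ListBucket`, so the separate conditional statement is left out.

```python
import json


def deny_bucket_policy(bucket):
    """Return a policy document that denies every S3 action on `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["s3:*"],
                # Both the bucket itself and the objects it contains.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Serialize the policy for the example user bucket used on this page.
policy_json = json.dumps(deny_bucket_policy("user0001"), indent=2)
print(policy_json)
```

The printed JSON can then be supplied as the custom access policy when the Service Account is created.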
Limitation on policy size
The access policies declared for each user cannot exceed 20 KB.
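A policy can be checked against this limit before it is submitted. The sketch below measures the size of a policy once serialized as compact JSON. Reading "20 KB" as 20 * 1024 bytes is an assumption, as is measuring the compact form; the service may count the size differently, and the names `MAX_POLICY_BYTES`, `policy_size_bytes` and `fits_policy_limit` are illustrative.

```python
import json

# Assumption: "20 KB" is interpreted as 20 * 1024 bytes.
MAX_POLICY_BYTES = 20 * 1024


def policy_size_bytes(policy):
    """Size in bytes of `policy` serialized as compact UTF-8 JSON."""
    text = json.dumps(policy, separators=(",", ":"))
    return len(text.encode("utf-8"))


def fits_policy_limit(policy, limit=MAX_POLICY_BYTES):
    """Return True if the serialized policy does not exceed `limit` bytes."""
    return policy_size_bytes(policy) <= limit
```

Since pretty-printed JSON is larger than its compact form, a policy close to the limit is worth shortening, for example by grouping resources under a common prefix with a wildcard.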
The access policies are written in the JSON syntax supported by the S3 protocol, as defined in the MinIO documentation and in the Amazon S3 API reference documentation.