Note: Click to download the fluid-0.6.0.tgz installation package.
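If Fluid has not been installed in the cluster yet, the downloaded chart can be installed with Helm before you continue. The following is only a sketch: the release name fluid and the namespace fluid-system are assumptions, so adjust them to your environment.
$ helm install fluid fluid-0.6.0.tgz -n fluid-system --create-namespace
$ kubectl get pod -n fluid-system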
Create a file named resource.yaml, which contains the following (the bucket used in this example is test-bucket):
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hadoop
spec:
  mounts:
    - mountPoint: cosn://test-bucket/
      options:
        fs.cosn.userinfo.secretId: <COS_SECRET_ID>
        fs.cosn.userinfo.secretKey: <COS_SECRET_KEY>
        fs.cosn.bucket.region: <COS_REGION>
        fs.cosn.impl: org.apache.hadoop.fs.CosFileSystem
        fs.AbstractFileSystem.cosn.impl: org.apache.hadoop.fs.CosN
        fs.cosn.userinfo.appid: <COS_APP_ID>
      name: hadoop
---
apiVersion: data.fluid.io/v1alpha1
kind: GooseFSRuntime
metadata:
  name: hadoop
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: HDD
        path: /mnt/disk1
        quota: 100G
        high: "0.9"
        low: "0.2"
To ensure the security of key information such as the AccessKey (AK), you are advised to store it in a Secret rather than writing it in plain text. For details about how to use a Secret, see Using Parameters for Encryption. The following example stores the COS credentials in a Secret named mysecret and references them from the Dataset through encryptOptions:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
stringData:
  fs.cosn.userinfo.secretId: <COS_SECRET_ID>
  fs.cosn.userinfo.secretKey: <COS_SECRET_KEY>
---
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hadoop
spec:
  mounts:
    - mountPoint: cosn://yourbucket/
      options:
        fs.cosn.bucket.region: <COS_REGION>
        fs.cosn.impl: org.apache.hadoop.fs.CosFileSystem
        fs.AbstractFileSystem.cosn.impl: org.apache.hadoop.fs.CosN
        fs.cosn.userinfo.appid: <COS_APP_ID>
      name: hadoop
      encryptOptions:
        - name: fs.cosn.userinfo.secretId
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.cosn.userinfo.secretId
        - name: fs.cosn.userinfo.secretKey
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: fs.cosn.userinfo.secretKey
---
apiVersion: data.fluid.io/v1alpha1
kind: GooseFSRuntime
metadata:
  name: hadoop
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: SSD
        path: /mnt/disk1
        quota: 100G
        high: "0.9"
        low: "0.2"
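If you prefer to manage the credentials outside resource.yaml, the same Secret can also be created imperatively. This is only a sketch; the Secret name mysecret and the key names follow the example above, and the placeholder values must be replaced with your actual COS credentials. If you use this approach, remove the Secret manifest from resource.yaml so it is not created twice.
$ kubectl create secret generic mysecret \
    --from-literal=fs.cosn.userinfo.secretId='<COS_SECRET_ID>' \
    --from-literal=fs.cosn.userinfo.secretKey='<COS_SECRET_KEY>'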
$ kubectl create -f resource.yaml
Check the status of the deployed GooseFSRuntime. If all phases are Ready, the GooseFSRuntime has been deployed successfully.
$ kubectl get goosefsruntime hadoop
NAME     MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
hadoop   Ready          Ready          Ready        62m
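If any of the phases stays in a non-Ready state, you can inspect the runtime object and its pods for more detail. These are standard kubectl commands, shown here only as a troubleshooting sketch:
$ kubectl describe goosefsruntime hadoop
$ kubectl get pod | grep hadoop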
Check the dataset status. If the phase is Bound, the dataset has been bound successfully.
$ kubectl get dataset hadoop
NAME     UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
hadoop   210.00MiB        0.00B    180.00GiB        0.0%                Bound   1h
Check the PV and PVC creation results. A PV and a PVC are automatically created when the GooseFSRuntime is deployed.
$ kubectl get pv,pvc
NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
persistentvolume/hadoop   100Gi      RWX            Retain           Bound    default/hadoop                           58m

NAME                            STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/hadoop    Bound    hadoop   100Gi      RWX                           58m
Log in to the master and worker pods and observe whether files can be listed properly.
$ kubectl get pod
NAME                  READY   STATUS    RESTARTS   AGE
hadoop-fuse-svz4s     1/1     Running   0          23h
hadoop-master-0       1/1     Running   0          23h
hadoop-worker-2fpbk   1/1     Running   0          23h
$ kubectl exec -ti hadoop-master-0 -- bash
goosefs fs ls /hadoop
Log in to the FUSE pod and check whether files can be listed properly.
$ kubectl exec -ti hadoop-fuse-svz4s -- bash
cd /runtime-mnt/goosefs/<namespace>/<DatasetName>/goosefs-fuse/<DatasetName>
ls
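For reference, with the hadoop dataset in the default namespace used in this document, filling in the placeholders of the path template above gives something like the following (a sketch only; verify the actual mount path inside your FUSE pod):
cd /runtime-mnt/goosefs/default/hadoop/goosefs-fuse/hadoop
ls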
You can create an application container to use the GooseFS acceleration service, or submit a machine learning job to try out the related features. In the following example, we create an application container defined in app.yaml to use the dataset. We will access the same data multiple times and compare the access times to demonstrate the acceleration effect of GooseFS.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        - mountPath: /data
          name: hadoop
  volumes:
    - name: hadoop
      persistentVolumeClaim:
        claimName: hadoop
Use kubectl to create an application.
$ kubectl create -f app.yaml
Check the file size.
$ kubectl exec -it demo-app -- bash
$ du -sh /data/hadoop/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2
210M    /data/hadoop/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2
Check the file copy time (18 seconds in this example).
$ time cp /data/hadoop/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2 /dev/null
real    0m18.386s
user    0m0.002s
sys     0m0.105s
Now check the dataset cache status. You can see that all 210 MiB of the data has been cached locally.
$ kubectl get dataset hadoop
NAME     UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
hadoop   210.00MiB        210.00MiB   180.00GiB        100.0%              Bound   1h
To prevent other factors (such as the page cache) from affecting the results, we delete the previous container, create an identical application, and access the same file again. Since the file has already been cached by GooseFS at this point, the second access takes much less time than the first.
$ kubectl delete -f app.yaml && kubectl create -f app.yaml
Check the file copy time again (48 ms in this example). The copy time is reduced by a factor of more than 300.
$ time cp /data/hadoop/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2 /dev/null
real    0m0.048s
user    0m0.001s
sys     0m0.046s
Finally, clean up the environment by deleting the Dataset and GooseFSRuntime.
$ kubectl delete -f resource.yaml
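If the demo application created from app.yaml in the previous steps is still running, you may want to delete it as well:
$ kubectl delete -f app.yaml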