Log in to the TKE console.
On the left sidebar, click Ops Feature Management to enter the feature management page.
Find the target cluster and click Settings.
In the pop-up window, click Edit in the Log Collection column.
Select Enable Log Collection and click OK.
Click Close.
Log in to the TKE console.
On the left sidebar, click Ops Feature Management to enter the feature management page.
Find the target cluster and click Settings.
In the pop-up window, click Edit in the Log Collection column.
Click Upgrade Component.
cls-provisioner, the component through which the TKE log collection component communicates with CLS, uses a TencentCloud API domain, which must be kept accessible. If the component deployment fails, or a cls-provisioner start exception is logged indicating that the cls.internal.tencentcloudapi.com domain name is inaccessible, the domain cannot be reached from the node.
The private and public domain names are accessible by default from servers in Tencent Cloud. A common cause of this problem is that the DNS configuration on the TKE node has been modified. You can fix it as follows:
Note: When a domain name is inaccessible in the TKE cluster, we recommend that you first check the DNS configuration on the relevant TKE nodes.
Log upload domain names are different from TencentCloud API domain names and are in the format of <region>.cls.tencentcs.com (public network domain) or <region>.cls.tencentyun.com (private network domain). For more information, see Available Regions.
Solution:
Make the domain name accessible on the corresponding cluster node server.
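To quickly verify domain name resolution, you can log in to the cluster node and run commands similar to the following (nslookup is used here as an example; any standard DNS lookup tool works):
nslookup cls.internal.tencentcloudapi.com
cat /etc/resolv.conf
If resolution fails, check whether /etc/resolv.conf still points to the default DNS servers provided by Tencent Cloud (typically 183.60.83.19 and 183.60.82.98).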
Sometimes an error may be reported during communication between cls-provisioner and CLS, typically because the required permissions are missing.
Solution:
Associate the QcloudAccessForTKERoleInOpsManagement policy with the TKE_QCSRole role under the account that created the TKE cluster.
In some cases, user logs are output to standard output, but the logs collected to CLS are truncated. This happens because json-file, the default logging driver of Docker, limits the size of a single log line; logs exceeding 16 KB in size will be truncated.
Solution:
Modify the log output configuration to make the size of printed single-line logs below 16 KB.
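As a quick check, you can count how many lines in a log file exceed 16 KB; the file path below is an example, and line length is measured in characters, which approximates bytes for ASCII logs:
awk 'length($0) > 16384' /path/to/app.log | wc -l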
If you find that some logs are collected repeatedly in the CLS console, you can check the log output path first to see whether logs are output to the persistent storage created by PV/PVC.
If logs are output to persistent storage, they will be collected again when a business Pod is recreated. You can run the following command to view the YAML definition of the Pod:
kubectl get pods <pod_name> -n <namespace> -o yaml | less
If information similar to the following is returned, logs are output to the persistent storage.
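For reference, a Pod that writes logs to storage created by PV/PVC usually contains definitions similar to the following excerpt in its YAML (the volume name, claim name, and mount path are illustrative):
volumes:
- name: log-volume
  persistentVolumeClaim:
    claimName: log-pvc
volumeMounts:
- name: log-volume
  mountPath: /logs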
Solution:
We recommend that you do not output logs to persistent storage created by PV/PVC; otherwise, the existing log files will be collected again after the Pod is recreated.
LogListener currently doesn't support collecting logs stored on NFS. It subscribes to Linux kernel events to get file update information instead of actively scanning target files. As NFS file updates occur on the NFS server and do not generate file update events in the local kernel, LogListener cannot perceive them. Therefore, NFS file logs cannot be collected in real time.
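To check whether a log directory is on an NFS mount, you can run a command such as the following on the node (the path is an example); a filesystem type of nfs or nfs4 indicates NFS:
df -T /data/logs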
You can troubleshoot as follows:
View the labels of the Pod:
kubectl get pods <pod_name> -n <namespace> --show-labels
Check whether the target Pod exists in the namespace (testa is a sample Pod name):
kubectl get pods -n <namespace> | grep testa
View the full YAML definition of the Pod:
kubectl get pods <pod_name> -n <namespace> -o yaml
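If your collection rule selects workloads by label, you can also list the Pods that match a given label selector (the label key and value below are examples):
kubectl get pods -n <namespace> -l app=nginx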
When collecting container files or host files, check whether the collection directory path is correct and contains logs complying with the collection rule.
In container file collection scenarios, matched log files cannot have soft links.
In Kubernetes scenarios, CLS collects logs by parsing the location of the container file on the host. As a container soft link points to a path within the container, if a matched file to be collected has a soft link, it cannot be reached correctly.
Solution:
Modify the collection path and file pattern in the collection rule so that they match the actual log file path and the actual log files to be collected, avoiding soft links.
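To check whether the matched files are soft links, you can run a command similar to the following in the log directory (the path and pattern are examples):
find /data/logs -name "*.log" -type l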
Because resources are restricted when LogListener collects container logs, the numbers of directories and files that can be listened on are also limited. You may encounter this problem when collecting container or host files, typically when expired log files are not cleared; in this case, corresponding messages are displayed in LogListener logs. Files and directories exceeding the limits will not be listened on by LogListener, so some target log files may not be collected. For more information on the limits, see LogListener Limits.
Solution:
In the log directory, run the tree command to check whether the current numbers of directories and files in the entire directory structure reach the LogListener limits:
tree -L 5
In container file collection scenarios, you can also run tree -L 5 under /var/log/tke-log-agent on the host of the business container to check whether the host limits are reached.
If the log collection component cannot find the correct log files when collecting container files, run the docker history $image command (on Docker nodes) or the crictl inspecti $image command (on containerd nodes) to view the image rebuild information. If the returned information shows that a volume such as /logs/live-srv is customized in the Dockerfile and happens to be the log directory, this prevents the log collection component from finding the correct log files.
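In the docker history output, such a declaration typically appears in the CREATED BY column as an entry similar to the following (reconstructed example; the exact output depends on how the image was built):
/bin/sh -c #(nop)  VOLUME [/logs/live-srv]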
Solution:
Do not declare the log output directory as a custom volume in the Dockerfile; remove the VOLUME declaration (or move the logs to a directory that is not covered by the custom volume) and rebuild the image.
In some Docker scenarios, bugs in earlier versions may be triggered, causing a log collection component start failure and generating panic logs. This happens mainly because the Docker configuration of TKE cluster nodes has been customized (for example, the storage driver setting).
Solution:
Add "storage-driver": "overlay2" to the /etc/docker/daemon.json configuration file.
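A minimal example of /etc/docker/daemon.json with this setting might look like the following (keep any other keys that already exist in your file, and restart Docker afterward, for example with systemctl restart docker, for the change to take effect):
{
  "storage-driver": "overlay2"
}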
If a subdirectory is set in the filePattern parameter, logs cannot be collected.
Solution:
Set the log file directory in the logPath parameter, and set only the file name pattern in the filePattern parameter.
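For example, to collect *.log files under /logs/app (the path is illustrative), configure:
logPath: /logs/app
filePattern: *.log
Do not configure:
logPath: /logs
filePattern: app/*.log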