Concurrency is the number of requests that a function can process at the same moment. As long as the other services your business depends on can sustain the load, you can raise function concurrency from a few instances to tens of thousands through simple configuration.
When a function is invoked, SCF will assign a concurrent instance to process the request or event. After the function code is executed and its response is returned, the instance will process other requests. If all instances are running when a request arrives, SCF will assign a new concurrent instance.
SCF follows the execution logic that one concurrent instance processes only one event at any time so as to ensure the processing efficiency and stability of each event.
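The allocation rule described above (reuse an idle instance if one exists, otherwise start a new one, with each instance handling one event at a time) can be sketched as a small simulation. This is an illustrative model only, not the platform's scheduler: the function name `simulate_instances` and the `(arrival_time, duration)` input format are assumptions made for this example, and instance retention and repossession are ignored.

```python
import heapq

def simulate_instances(requests):
    """Count how many instances must be started for a stream of requests.

    `requests` is a list of (arrival_time, duration) tuples sorted by
    arrival_time. Hypothetical model: one instance handles one event at
    a time, and an idle instance is always reused before a new one starts.
    """
    busy_until = []  # min-heap of the times at which each instance becomes idle
    started = 0
    for arrival, duration in requests:
        if busy_until and busy_until[0] <= arrival:
            # The earliest-finishing instance is idle: reuse it.
            heapq.heapreplace(busy_until, arrival + duration)
        else:
            # Every instance is still running: start a new one.
            started += 1
            heapq.heappush(busy_until, arrival + duration)
    return started

# Two overlapping requests need two instances; the third reuses the first.
print(simulate_instances([(0.0, 1.0), (0.5, 1.0), (1.2, 1.0)]))
```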
Async events first enter a queue on the SCF platform, where they are processed in FIFO order. Based on conditions such as the queue length and the function's current number of concurrent instances, the system selects an appropriate concurrency processing method, starts enough concurrent instances, and processes the events in sequence.
If an async invocation fails, SCF will retry according to certain rules. For more information, please see Error Types and Retry Policies.
After sync events arrive at the SCF platform, the platform will check whether there are any idle concurrent instances, and if so, the events will be immediately sent to instances for processing; otherwise, the platform will start new concurrent instances to process them.
If a sync invocation fails, you need to retry it yourself.
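Because retries for sync invocations are left to the caller, a client-side retry wrapper is one common approach. The following is a minimal sketch under stated assumptions: `invoke` is a hypothetical zero-argument callable standing in for whatever code performs the sync invocation, and the attempt count and exponential backoff delays are illustrative choices, not SCF recommendations.

```python
import time

def invoke_with_retry(invoke, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `invoke()` until it succeeds or `max_attempts` is reached.

    `invoke` is a hypothetical callable that raises an exception when the
    sync invocation fails. The delay doubles after each failed attempt.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception as exc:  # narrow this to your client's error type
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

In practice you would likely retry only errors that are known to be transient and make sure the function is idempotent before retrying it.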
SCF concurrency refers to the number of requests or invocations processed by the function code at a time, which can be estimated according to the following formula:
Concurrency = request rate * function execution duration = QPS * average time per request
The average time per request is shown as the execution duration in the function's monitoring data.
For example, if the QPS of a business is 2,000, and the average time per request is 0.02 seconds, then the concurrency during function execution will be 2000 * 0.02 = 40.
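The estimate above can be expressed as a small helper. This is only a sketch of the formula from this section; the function names are chosen for illustration, and rounding up to a whole instance count is an added assumption.

```python
import math

def estimate_concurrency(qps, avg_seconds_per_request):
    """Concurrency = QPS * average time per request (in seconds)."""
    return qps * avg_seconds_per_request

def instances_needed(qps, avg_seconds_per_request):
    """Round the estimate up to a whole number of concurrent instances.

    The product is rounded to 9 decimal places first so that
    floating-point noise does not push the result up by one.
    """
    return math.ceil(round(estimate_concurrency(qps, avg_seconds_per_request), 9))

# The example from the text: 2,000 QPS at 0.02 seconds per request.
print(instances_needed(2000, 0.02))
```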
After a concurrent instance processes a request event, it will not be repossessed immediately; instead, it will be retained for a certain period of time for reuse. During the retention duration, if there are new request events that need to be processed, the retained concurrent instance will be used first, so the events can be processed quickly with no need to start new concurrent instances.
After the retention duration elapses, if there are no requests that need to be processed by the instance, the SCF platform will repossess it.
The concurrent instance retention duration is dynamically adjusted by the SCF platform as needed; therefore, you cannot assume a certain retention duration when writing the function business code.
If a request arrives but no concurrent instance of that version is available to process it, the SCF platform will start a new concurrent instance. Once initialized, the new instance can process events. This process is called elastic concurrency expansion.
By default, new concurrent instances in one region under one account can be started at a maximum rate of 500 instances per minute. Once this limit is reached within a minute, no more new instances are started until that minute elapses; requests that would require a new instance during this time fail with an expansion limit error (429 ResourceLimit). For more information, please see Function Status Code.
For example, the default concurrency quota of an account in the Guangzhou region corresponds to 1,000 concurrent instances of a 128 MB function. If a large number of requests arrive, up to 500 concurrent instances can be started from 0 in the first minute. If requests are still waiting, up to 500 more can be started in the second minute, for 1,000 in total. Expansion continues in this way until there are enough concurrent instances for the requests or the upper limit is reached.
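The ramp-up in this example can be modeled with a one-line calculation. The sketch below is a simplified model that assumes expansion starts from 0 instances and that the full per-minute rate is available; the function name and parameters are illustrative.

```python
def instances_after(minutes, demand, quota=1000, rate_per_minute=500):
    """Upper bound on running instances after `minutes` full minutes.

    `demand` is the number of instances the traffic would need. The result
    is capped by the demand, the quota, and the expansion rate.
    """
    return min(demand, quota, rate_per_minute * minutes)

# A burst that would need 5,000 instances against the defaults above.
for minute in (1, 2, 3):
    print(minute, instances_after(minute, demand=5000))
```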
The elastic concurrency expansion speed of 500 instances per minute meets the requirements of most business scenarios. If your business is limited by this speed, you can use provisioned concurrency to start instances in advance or submit a ticket to increase the limit.
Concurrent instances started through elastic concurrency expansion must be initialized before use, which includes initializing the runtime environment and the business code.
You can use the provisioned concurrency feature to configure concurrent instances in advance. The SCF platform starts the instances once you configure them and does not actively repossess them, so that the configured number of concurrent instances is kept available as far as possible. If an error such as a memory leak in the code occurs on a concurrent instance, the SCF platform will replace it with a new instance. For more information, please see Provisioned Concurrency.
The SCF platform provides concurrency management capabilities at the function granularity for you to flexibly control the concurrency of different functions. For more information, please see Reserved Concurrency.
Each account has a total concurrency quota at the region level. The default value is 128,000 MB or 64,000 MB, depending on the region. For more information, please see Quota Limits. The concurrency quotas of different regions are independent of each other.
To help you manage concurrency more precisely, the SCF concurrency quota is calculated by memory; for example, a 256 MB concurrency quota represents one concurrent instance with 256 MB memory or two instances with 128 MB memory each.
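Because the quota is measured in memory, the number of instances it allows depends on the memory configured for the function. A minimal sketch of this conversion (the function name is illustrative):

```python
def max_instances(quota_mb, memory_mb):
    """Number of whole concurrent instances that fit into a memory quota."""
    return quota_mb // memory_mb

print(max_instances(256, 256))     # one 256 MB instance
print(max_instances(256, 128))     # two 128 MB instances
print(max_instances(128000, 128))  # a 128,000 MB quota with 128 MB functions
```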
When you set a reserved quota for a function, it will have the following two effects:
When a concurrent instance of a function is processing actual requests, it is marked as a running concurrent instance. In the SCF monitoring information, you can query the running concurrent instances of a function, a specific function version, or an alias. Because running instance information is collected at intervals, the monitoring data may not be completely accurate for a function whose execution time is very short and whose number of concurrent instances is high.
By using reserved quota and provisioned concurrency together, you can flexibly allocate resources among multiple functions and warm up functions as needed.
If nothing is configured, all functions share the account quota by default. If a function generates a surge of business invocations, it can make full use of the unused quota to ensure that the surge will not cause overrun errors.
If a specific function serves sensitive or critical business and needs the highest possible request success rate, you can use the reserved quota feature. A reserved quota gives the function an exclusive share of the quota, which guarantees its concurrency and avoids overruns caused by other functions preempting the shared quota.
If a function is sensitive to cold starts, its code initialization takes a long time, or it needs to load many libraries, you can set provisioned concurrency for a specific function version to start function instances in advance and ensure smooth execution.