Operators are referenced by operator objects in pipelines. You cannot create new operators yourself; the logic of each operator is provided by us. Some operators enable more complex logic in your pipeline, while others are useful for small data manipulations without creating a separate deployment for them.
Are you looking for tutorials for specific operators and how to work with them? Have a look at the how-to's.
The Function operator provides the ability to manipulate one of the request data fields:
This operator can have multiple input fields and has a single output field called `output`, for which you can select the data type (integer/double/boolean/etc.). The output value is the result of an expression that you define. The expression must be a one-line Python expression. Your expression may depend on variables: for example, the expression `"high" if my_variable > 5 else "low"` depends on the variable `my_variable`. These variables can be configured as the input fields of the Function operator:
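To illustrate the mechanism, here is a minimal sketch of how a one-line expression could be evaluated against its input fields. The helper name and the use of `eval` are illustrative only, not the platform's actual implementation:

```python
# Hypothetical sketch: each input field of the Function operator becomes
# a variable that the one-line Python expression can reference.
def evaluate_expression(expression, input_fields):
    # Restrict builtins so only the input-field variables are available.
    return eval(expression, {"__builtins__": {}}, input_fields)

result = evaluate_expression('"high" if my_variable > 5 else "low"',
                             {"my_variable": 7})
print(result)  # -> high
```

With `my_variable` set to 3 instead, the same expression would evaluate to `low`.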
Some examples of when this operator is useful:
- If the output of your source object is in the wrong unit, for example in centimeters instead of meters, you could use the Function operator with the expression `centimeters / 100` and an input field `centimeters` of a suitable numeric data type.
- If you have two models (A and B) that predict something and that both output a prediction and a probability, you could use the Function operator to pick the prediction with the highest probability, for example using the expression `prediction_a if probability_a > probability_b else prediction_b` with input fields `probability_a` coming from source A and `probability_b` coming from source B (together with `prediction_a` and `prediction_b`).
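The two example expressions above, written out as plain Python (the operator evaluates the same one-line expressions, with the input fields bound as variables):

```python
# Unit conversion: centimeters to meters.
centimeters = 250
meters = centimeters / 100

# Picking the prediction with the highest probability. The field values
# shown here are made-up examples.
prediction_a, probability_a = "cat", 0.92
prediction_b, probability_b = "dog", 0.85
best = prediction_a if probability_a > probability_b else prediction_b

print(meters, best)  # -> 2.5 cat
```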
Note that in pipelines, it is possible to combine the outputs of multiple source objects into a single destination object by selecting multiple sources when creating a pipeline connection. That way, you can combine the result of the Function operator with other pipeline object results as input for the next object. You could even use multiple Function operators to manipulate multiple data fields and combine them as source fields for the next pipeline object.
The Conditional Logic operator provides the ability to conditionally choose the next object in the pipeline:
As with the Function operator, you can specify an expression for the Conditional Logic operator. The outcome of the expression is evaluated as a boolean. If you use the Conditional Logic operator as one of the sources for the next object in your pipeline, the next object will not be triggered if the result of the Conditional Logic operator is `False`, but only when its result is `True`.
Some examples of when this operator is useful:
- If you only want to trigger the next object in your pipeline when a certain condition is met. For example, the expression `my_variable is not None` with Conditional Logic operator input field `my_variable`.
- If you want to route the data in your pipeline to a different object (e.g. a 'large' or 'small' model) depending on a condition. In that case, you can use two Conditional Logic operators: e.g. `my_variable > 10` connected to the 'large' model and `my_variable <= 10` connected to the 'small' model, with both operators having one input field `my_variable`.
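The routing example above can be sketched in plain Python. The function and branch names are illustrative; the point is that each operator's expression is evaluated as a boolean, and only the branch whose expression is `True` triggers its next object:

```python
# Hypothetical sketch of two Conditional Logic operators routing a
# request to either a 'large' or a 'small' model.
def route(my_variable):
    triggered = []
    fields = {"my_variable": my_variable}
    if eval("my_variable > 10", {"__builtins__": {}}, fields):
        triggered.append("large-model")
    if eval("my_variable <= 10", {"__builtins__": {}}, fields):
        triggered.append("small-model")
    return triggered

print(route(15))  # -> ['large-model']
print(route(4))   # -> ['small-model']
```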
Note that the Conditional Logic operator can only be used as a dependent source for the next object, deciding whether it runs or not; it cannot be used in the field mapping of the connection, because the operator has no output fields. If you want to feed a boolean into an input field of your object, use the Function operator with a boolean output field type instead.
The Raise Error operator provides the ability to raise an error somewhere in the pipeline.
This operator is particularly useful in combination with the Conditional Logic operator to conditionally raise an error. When the error is raised, the pipeline request fails with the defined error message; an example of this is shown in the corresponding how-to.
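The combined pattern, guard condition plus error, can be sketched in plain Python. The function name and message are illustrative, not part of the platform:

```python
# Hypothetical sketch of Conditional Logic + Raise Error: if the guard
# condition is met, the pipeline request fails with the defined message.
def check_input(my_variable):
    if my_variable is None:
        raise ValueError("Input field 'my_variable' is missing")
    return my_variable

try:
    check_input(None)
except ValueError as exc:
    print(exc)  # -> Input field 'my_variable' is missing
```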
The Create Subrequests operator (`one-to-many`) provides the ability to parallelize subrequests over multiple instances of a deployment object.
To understand the behavior of this operator, you first need to understand what subrequests are. In the standard situation, a deployment returns a single item from the `request()` method in the deployment package: a dictionary for the structured output type or a string for the plain output type. It is also possible to return a list of items, i.e. a list of dictionaries for structured output or a list of strings for plain output. This means that a single request can result in multiple output items. In pipelines, each output item of a deployment object creates a subrequest to the next object. In other words: pipeline subrequests are created by returning multiple results in your deployment.
An example use case could be video processing, in which the first deployment object of your pipeline splits the video into frames and the next object in the pipeline processes the frames one by one. Each frame is then a subrequest.
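A deployment that creates subrequests can be sketched as follows. The class shape follows the `request()` convention described above; the splitting logic and field names are made-up stand-ins for real video processing:

```python
# Sketch of a deployment whose request() returns a list. Each item in
# the returned list becomes a separate subrequest to the next object.
class Deployment:
    def request(self, data):
        # Pretend to split a video into frames; each dictionary becomes
        # one subrequest downstream.
        frame_count = data["frame_count"]
        return [{"frame": i} for i in range(frame_count)]

deployment = Deployment()
subrequests = deployment.request({"frame_count": 5})
print(len(subrequests))  # -> 5
```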
Since there can be many subrequests in your pipeline, it can be beneficial to parallelize them over multiple instances of your deployment, i.e., to process the video frames in parallel. This is exactly where this operator comes in handy. You can configure the batch size to choose the chunk size into which the subrequests are split. For example, if your video is split into 500 frames and you configure a batch size of 100, 5 parallel requests will be created for your deployment, each containing 100 subrequests. This allows you to parallelize over 5 instances of your deployment, each instance picking up one request of 100 subrequests.
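The batching arithmetic from the example works out as follows; the chunking helper is illustrative, not the operator's actual implementation:

```python
# Sketch of how a batch size of 100 splits 500 subrequests into
# 5 parallel requests of 100 subrequests each.
def chunk(subrequests, batch_size):
    return [subrequests[i:i + batch_size]
            for i in range(0, len(subrequests), batch_size)]

frames = [{"frame": i} for i in range(500)]
batches = chunk(frames, 100)
print(len(batches), len(batches[0]))  # -> 5 100
```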
Maximum number of instances
Note that the maximum number of instances to which your deployment can scale depends on the maximum instances setting configured for your deployment version.
Since subrequests are sent as a single request to a deployment instance, another concept comes into play. The standard `request()` method of your deployment handles only one input item at a time (a dictionary for structured or a string for plain input type). This means that we loop over all subrequests and send them to the `request()` method one by one, resulting in a list of output items. If you would rather receive the complete list of subrequests as input, so that you can optimize the parallel processing yourself (useful for GPUs!), you can do so by adding a `requests()` method (with an 's') to your deployment. This method receives a list as input. Note that it is up to you what you return from this method; both a single item (dictionary for structured or string for plain output type) and a list of items are valid. This allows you to either continue with subrequests or merge back into a single request. To stop subrequest parallelization and go back to a single deployment request containing all subrequests, you can use the Collect Subrequests operator, described below. In the video processing example, the Collect Subrequests operator in combination with a `requests()` method is useful to collect the results of all frames (subrequests) and combine them back into a single video.
The Collect Subrequests operator (`many-to-one`) provides the ability to wait for all parallelized requests to finish and send all subrequests as a single list to the next object. In other words, it stops the parallelization created by the Create Subrequests operator.
As mentioned above for the Create Subrequests operator, your deployment can have a `request()` and/or a `requests()` method. The `requests()` method allows you to merge all subrequests back into a single output item, e.g., combining frames back into a single video.
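In the video example, a `requests()` method placed after the Collect Subrequests operator could merge the subrequests back as follows; the merging logic and field names are illustrative:

```python
# Sketch of a requests() method that receives every subrequest in one
# list and merges them back into a single output item.
class Deployment:
    def requests(self, data):
        # Reassemble the processed frames into one "video" result.
        frames = sorted(item["frame"] for item in data)
        return {"video": frames}

deployment = Deployment()
print(deployment.requests([{"frame": 2}, {"frame": 0}, {"frame": 1}]))
# -> {'video': [0, 1, 2]}
```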
The Count Subrequests operator provides the ability to count the number of subrequests:
It has a single output field called `output`, whose value is an integer:
In the video processing example, if you put the operator directly after the deployment object that splits the video into 500 frames, the output of this operator would be 500.