This tutorial will walk through the steps required to create a simple copernicus module
Tutorial files can be found here
The components of a module¶
The components you need to create a copernicus module is
- An xml definition describing the inputs and outputs of the module.
- A run method that contains the logic of the module.
- A plugin for the executable that runs on the worker.
Defining the xml and the run method¶
We will create a module that takes integer and doubles their values
We first create a directory under cpc/lib that will hold these two files.
Create the directory cpc/lib/double
Create a file named _import.xml in this directory. The import xml is a shell that describes the input and output values and their types.
<?xml version="1.0"?> <cpc> <!-- A simple copernicus function that takes in an array of integers and doubles their values. --> <!--Inputs and outputs of our function will be Integer arrays Here we define the a type called int_array, and we specify that the member-type(contents) of the array should be ints --> <type id="int_array" base="array" member-type="int"/> <function id="double_value" type="python-extended"> <desc></desc> <inputs> <!--each field has a unique id and a type--> <field type="int_array" id="integer_inputs"> <desc>Integer inputs</desc> </field> </inputs> <outputs> <field type="int_array" id="integer_outputs"> <desc>Integer outpus</desc> </field> </outputs> <!-- when this function is called it will call the python function defined below. The path is the same as when one imports a module in python we also specify that we want to create a persistent directory for this module we will use this to keep track of states. --> <controller function="cpc.lib.math.double_script.run" import="cpc.lib.math.double_script" persistent_dir="true"/> </function> </cpc>
The xml contains comments that describes the different tags; type and function. There are 5 basic types; int, float, file, array, string. For our module we only want to have arrays of integers as inputs and outputs so we have created a custom type called int_array. You see that it’s base type is defined as array and it’s member-type is defined as int.
The function tag describes the input and output values of the module as well as a controller which links it to the actual method that will be executed. The controller contains two attributes
- function, which points to the method that will be executed
- import, which is the python file that contains the method
The function attribute is set to cpc.lib.double.run. notice the the method in run.py is also called run the import attribute is set to cpc.lib.double as it the python package holding the method. functions and imports are defined the same way as you import packages and files in python.
Now create a file name run.py under the same directory. It will contain the logic that will be executed once input values are added or updated. In this method we can:
- Create commands to send to workers
- Define logic for what happens when inputs are set or updated
- Create new module instances
import cpc from cpc.dataflow import IntValue, Resources __author__ = 'iman' import logging log=logging.getLogger(__name__) #our run functions #the incoming value is of type cpc.dataflow.run.FunctionRunInput def run(inp): if inp.testing(): ''' When an instance of a function is first created a test call is performed here you can test to see if certain prerequisites are met. for example if this function is ran on the server only it might need to access some binaries''' return fo = inp.getFunctionOutput() #get hold of the inputs # the name that is provided must match the id of the input in the xml file #find what changed or was added in the array val = inp.getInputValue('integer_inputs') updatedIndices = [ i for i,val in enumerate(val.value) if val.isUpdated()] log.debug(updatedIndices) # only one command is finished per call if inp.cmd: runResultLogic(inp,updatedIndices) return #run some login on the changes for i in updatedIndices: #THIS IS WHERE WE SHOULD PUT OUR LOGIC runLogic(inp,i) return fo def runLogic(inp,i): #Sending a job for a worker to compute val = inp.getInputValue('integer_inputs') arr = inp.getInput('integer_inputs') #1 create command storageDir = "%s/%s"%(inp.getPersistentDir(),i) #the command name should match the executable name of the plugin commandName = "demo/double" args = [arr[i].get()] cmd =cpc.command.Command(storageDir ,commandName ,args) #2 define how many cores we want for this job resources = Resources() resources.min.set('cores',1) resources.max.set('cores',1) resources.updateCmd(cmd) #2 add the command to the function output --> will be added to the queue fo = inp.getFunctionOutput() fo.addCommand(cmd) def runResultLogic(inp,index): #in this case we are getting the result directly from stdout # stdout = "%s/%s/stdout"%(inp.getBaseDir(),inp.cmd.dir) stdout = "%s/stdout"%(inp.cmd.getDir()) with open(stdout,"r") as f: result = int(f.readline().strip()) log.debug("result is %s"%result) fo = inp.getFunctionOutput() fo.setOut("integer_outputs[%s]"%index,IntValue(result)) return fo
the run method does three things. First it grabs the input value, it uses this input value to create a command that will be put on the copernicus queue and sent to a worker. Lastly it handles the returned data from the worker and sets it to the output.
Defining the worker plugin¶
We will also need to create a plugin which is what the worker runs The plugin is just an xml which defines a list of executables that this plugin can run. An executable can be anything that can be run on the command line.
There are two ways two create the xml. You can either define a static one named executable.xml or a dynamic using a python script both have the same output and should be located in cpc/plugins/executables.
create a folder called double under cpc/plugins/executables. And add the executables.xml file under it.
<?xml version="1.0"?> <executable-list> <!--the executable has a name which it matches to the command name on the queue, an executable might have support for --> <!--different types of platforms, for example smp or mpi. A version of the executable can be defined. This can be used --> <!--when creating a command to specify the minimum version required--> <executable name="math/double" platform="smp" arch="" version="1.0"> <!--the command that the executable will call is defined here. you can define a command, script --> <!--or a program in the same way that you call it on the command line--> <run in_path="yes" cmdline="double.py"/> </executable> </executable-list>
this xml calls the command double which is a python script that we have created. to make this runnable by a worker, either change the cmdline attribute to specifiy the absolute path to where the script is located, or add the path to your PATH environment variable.
We know have everything ready for our first module! Start up the copernicus server. To see if the module has loaded properly call cpcc list-modules. You should see a module name double.
Now you can create a project and start using the module. this premade script will create an instance of the module and start adding values.