eachElem {nws}    R Documentation
Apply a Function in Parallel over a Set of Lists and Vectors
Description
eachElem executes function fun multiple times in
parallel with a varying set of arguments, and returns the results in a
list. It is functionally similar to the standard R
lapply function, but is more flexible in the way that
the function arguments can be specified.
Usage
## S4 method for signature 'sleigh'
eachElem(.Object, fun, elementArgs=list(), fixedArgs=list(),
eo=NULL, DEBUG=FALSE)
Arguments
.Object |
sleigh class object. |
fun |
the function to be evaluated by the sleigh.
In the case of functions like |
elementArgs |
list of vectors, lists, matrices, and data frames that
specify (some of) the arguments to be passed to fun. |
fixedArgs |
list of additional arguments to be passed to fun. |
eo |
list specifying environment options. See the section Environment Options below. |
DEBUG |
logical; should |
Details
The eachElem function forms argument sets from objects passed in via
elementArgs and fixedArgs.
The elements of elementArgs specify the arguments that change, or
vary, from task to task, while the elements of fixedArgs specify
the arguments that stay the same for every task. The number of tasks
executed by a call to eachElem is equal to the length of the longest
vector (or list, etc.) in elementArgs; for matrices and data frames,
the length that counts depends on the eo$by option (rows, columns, or
cells). If any elements of elementArgs are shorter, their values are
recycled using the standard R rules.
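The task-count and recycling rule can be sketched sequentially in plain R, without a sleigh. buildArgSets is a hypothetical helper, not the nws implementation: it handles only vector and list elements (not matrices or data frames), and it appends fixedArgs after the varying arguments, which is what the eo$elementFunc description says eachElem does with fixedArgs.

```r
# Hypothetical helper (not part of nws): turn elementArgs and fixedArgs
# into one argument list per task. The task count is the length of the
# longest element; shorter elements are recycled with rep_len().
buildArgSets <- function(elementArgs, fixedArgs = list()) {
  n <- max(lengths(elementArgs))
  recycled <- lapply(elementArgs, rep_len, length.out = n)
  lapply(seq_len(n), function(i) {
    # i-th value of every varying element, followed by the fixed arguments
    c(lapply(recycled, `[[`, i), fixedArgs)
  })
}

argSets <- buildArgSets(list(1:4, 100))
length(argSets)                                   # [1] 4
unlist(lapply(argSets, function(a) do.call(`+`, a)))
# [1] 101 102 103 104
```

Calling buildArgSets(list(1:4), list(100)) yields the same four argument sets, mirroring the two equivalent eachElem calls that follow.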
The elements of elementArgs may be vectors, lists, matrices, or
data frames. The vectors and lists are always iterated over by
element, or "cell", but matrices and data frames can also be iterated
over by row or column. This is controlled by the by option,
specified via the eo argument. See below for more information.
For example:
eachElem(s, '+', elementArgs=list(1:4), fixedArgs=list(100))
This will submit four tasks, since the length of 1:4 is four. The four tasks will be to add the arguments 1 and 100, 2 and 100, 3 and 100, and 4 and 100. The result is a list containing the four values 101, 102, 103, and 104.
Another way to do the same thing is with:
eachElem(s, '+', elementArgs=list(1:4, 100))
Since the second element of elementArgs has length one, its
value is recycled four times, thus specifying the same set of tasks as
in the previous example. This method also has the advantage of making it
easy to put fixed values before varying values, without the need for
the eo$argPermute option, discussed later. For example:
eachElem(s, '-', elementArgs=list(100, 1:4))
is similar to the R statement:
100 - 1:4
Note that in simple examples like these, where the results are numeric
values, the standard R unlist function can be very
useful for converting the resulting list into a vector.
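Base R's mapply applies the same recycling rule, so it gives a quick sequential way to check what eachElem(s, '-', elementArgs=list(100, 1:4)) should return before starting a sleigh. This only illustrates the semantics; it is not how nws executes tasks.

```r
# Sequential check of eachElem(s, '-', elementArgs=list(100, 1:4)):
# the length-one 100 is recycled across the four elements of 1:4.
res <- mapply('-', 100, 1:4, SIMPLIFY = FALSE)  # a list, like eachElem's result
unlist(res)                        # [1] 99 98 97 96
identical(unlist(res), 100 - 1:4)  # [1] TRUE
```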
Environment Options
The eo argument is a list that can be used to specify various
options. The following options are recognized:
- elementFunc
The eo$elementFunc option can be used to specify a callback function that provides the varying arguments for fun in place of elementArgs (that is, you can't specify both eo$elementFunc and elementArgs). eachElem calls the eo$elementFunc function to get a list of arguments for one invocation of fun, and keeps calling it until eo$elementFunc signals that there are no more tasks to execute by calling the stop function with no arguments. eachElem appends any values specified by fixedArgs to the list returned by eo$elementFunc, just as if elementArgs had been specified. eachElem passes the number of the desired task (starting from 1) as the first argument to eo$elementFunc, and the value of the eo$by option as the second argument.
eo$elementFunc is an advanced feature, but it is very useful when executing a large number of tasks, or when the arguments come from a database query, for example. For that reason, the eo$loadFactor option should usually be used in conjunction with eo$elementFunc (see description below).
- accumulator
The eo$accumulator option can be used to specify a callback function that receives the results of the tasks as soon as they are complete, rather than having eachElem return all of the task results as a list when it completes. In other words, eachElem calls the eo$accumulator function with task results as soon as it receives them from the sleigh workers, rather than saving them in memory until all the tasks are complete. Note that if the tasks are chunked (using the eo$chunkSize option described below), the eo$accumulator function receives multiple task results at once, which is why the task results are always passed to the eo$accumulator function in a list.
The first argument to the eo$accumulator function is a list of results, whose length is at most eo$chunkSize. The second argument is a vector of task numbers, starting from 1, with the same length as the list of results. The task numbers are very important, because the results are not guaranteed to be returned in order. eo$accumulator is another advanced feature and, like eo$elementFunc, is very useful when executing a large number of tasks. It allows you to process each result as it finishes, rather than forcing you to wait until all of the tasks are complete. In conjunction with eo$elementFunc and eo$loadFactor, you can set up a pipeline that processes an unlimited number of tasks efficiently. Note that when eo$accumulator is specified, eachElem returns NULL, not the list of results, since eachElem doesn't save any of the results after passing them to the eo$accumulator function.
- by
The eo$by option specifies the iteration scheme to use for matrix and data frame elements in elementArgs. The default value is "row", but it can also be set to "column" or "cell". Vectors and lists in elementArgs are not affected by this option.
- chunkSize
The eo$chunkSize option is a tuning parameter that specifies the number of tasks that sleigh workers should allocate at a time. The default value is 1, but if the tasks are small, performance can be improved by specifying a larger value, which decreases the overhead per task.
If the fun function executes very quickly, you may not be able to keep your workers busy, giving you poor performance. In that case, consider setting the eo$chunkSize option to a large enough number to increase the effective task execution time.
- loadFactor
The eo$loadFactor option is a tuning parameter that specifies the maximum number of tasks per worker that are submitted to the sleigh at the same time. If set, no more than (loadFactor * workerCount) tasks will be submitted at the same time. This helps to control the resource demands made on the NetWorkSpaces server, which is especially important if there are a large number of tasks. Note that this option is ignored if blocking is set to FALSE, since the two options are incompatible with each other.
If in doubt, set the eo$loadFactor option to 10. That will almost certainly avoid putting a strain on the NetWorkSpaces server, and if that isn't enough to keep your workers busy, then you should really be using the eo$chunkSize option to give the workers more to do.
- blocking
The eo$blocking option is used to indicate whether to wait for the results, or to return as soon as the tasks have been submitted. If set to FALSE, eachElem will return a sleighPending object that is used to monitor the status of the tasks, and to eventually retrieve the results. You must wait for the results to be complete before executing any further tasks on the sleigh, or an exception will be raised. The default value is TRUE.
- argPermute
The eo$argPermute option is used to reorder the arguments passed to fun. It is generally only useful if the fixedArgs argument has been specified, and some of those arguments need to precede the arguments specified via elementArgs. Note that by using recycling of elements in elementArgs, the use of fixedArgs and argPermute can often be avoided entirely.
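To make the three eo$by schemes concrete, here is a small sequential sketch of how a matrix in elementArgs could be split into per-task values. taskCount and elementOf are hypothetical helpers, not part of nws, and the column-major order used for "cell" is an assumption based on how R stores matrices.

```r
# Hypothetical helpers (not part of nws) illustrating the eo$by schemes.
# Number of tasks a matrix contributes under each scheme.
taskCount <- function(m, by = c("row", "column", "cell")) {
  by <- match.arg(by)
  switch(by, row = nrow(m), column = ncol(m), cell = length(m))
}
# The i-th value handed to fun under each scheme
# ("cell" assumes R's column-major order).
elementOf <- function(m, i, by = c("row", "column", "cell")) {
  by <- match.arg(by)
  switch(by, row = m[i, ], column = m[, i], cell = m[[i]])
}

m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
perRow <- sapply(seq_len(taskCount(m, "row")),
                 function(i) sum(elementOf(m, i, "row")))
perCol <- sapply(seq_len(taskCount(m, "column")),
                 function(i) sum(elementOf(m, i, "column")))
perRow                 # [1]  9 12
perCol                 # [1]  3  7 11
taskCount(m, "cell")   # [1] 6
```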
Note
If elementArgs or fixedArgs isn't a list,
eachElem will automatically wrap it in a list. This is a
convenience that only works for passing in a single vector or matrix,
however.
If elementArgs or fixedArgs are named lists, then the
names are used to map the values to the appropriate argument of
fun. This can be used as another technique to avoid the use of
eo$argPermute.
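As a sequential illustration, R's do.call matches named list elements to a function's formal arguments, so a fixed value can effectively come "first" without reordering. This sketch assumes eachElem's name matching behaves the same way, as the paragraph above describes; f is a hypothetical function used only for the example.

```r
# Hypothetical fun whose first argument is the fixed one.
f <- function(base, offset) base - offset
fixed <- list(base = 100)
# Assumption: eachElem's name matching behaves like do.call's, so the
# named fixed argument lands on 'base' even though it is appended last.
res <- lapply(1:4, function(x) do.call(f, c(list(offset = x), fixed)))
unlist(res)   # [1] 99 98 97 96
```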
The elementArgs argument can be specified as a data frame.
This works just like a named list, and therefore, the column names of
the data frame must all correspond to arguments of fun. Note
that if the data frame has many rows, the performance may not be good
due to the overhead of subsetting data frames in R.
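A minimal sequential sketch of the data frame case, assuming each row supplies one named argument set whose names are matched to fun's formals. f and df are hypothetical; the df[i, ] subsetting in the loop is the kind of per-row overhead mentioned above.

```r
# Hypothetical fun; the data frame's column names match its formals.
f <- function(a, b) a * b
df <- data.frame(a = 1:3, b = c(10, 20, 30))
# One task per row: the row becomes a named argument list for f.
res <- lapply(seq_len(nrow(df)), function(i) do.call(f, as.list(df[i, ])))
unlist(res)   # [1] 10 40 90
```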
If you have a huge number of tasks, consider using the
eo$elementFunc, eo$accumulator, and eo$loadFactor
options.
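The callback protocol can be tried out without a NetWorkSpaces server using a sequential simulation. runTasks is a hypothetical stand-in for eachElem, written only to illustrate the behaviour described under Environment Options and in the notes that follow (task numbers passed to eo$elementFunc, stop() ending the task stream, non-list values wrapped in a list, optional second arguments, chunked results plus task numbers passed to eo$accumulator). It runs in one process, ignores loadFactor, and is not the nws implementation.

```r
# Hypothetical stand-in for eachElem (not the nws implementation):
# a sequential simulation of the eo$elementFunc / eo$accumulator protocol.
runTasks <- function(fun, elementFunc, accumulator, fixedArgs = list(),
                     chunkSize = 1, by = "row") {
  i <- 1
  repeat {
    results <- list()
    tasks <- numeric(0)
    for (k in seq_len(chunkSize)) {
      # elementFunc may omit its second (by) argument
      args <- tryCatch(
        if (length(formals(elementFunc)) >= 2) elementFunc(i, by)
        else elementFunc(i),
        error = function(e) e)
      # stop() ends the task stream; this sketch treats any error that way
      if (inherits(args, "error")) break
      if (!is.list(args)) args <- list(args)  # wrap non-list values
      results <- c(results, list(do.call(fun, c(args, fixedArgs))))
      tasks <- c(tasks, i)
      i <- i + 1
    }
    if (length(tasks) > 0) {
      # accumulator may omit its second (taskVector) argument
      if (length(formals(accumulator)) >= 2) accumulator(results, tasks)
      else accumulator(results)
    }
    if (length(tasks) < chunkSize) break
  }
  invisible(NULL)  # like eachElem with an accumulator: results are not kept
}

squares <- numeric(0)
acc <- function(resultList, taskVector) squares[taskVector] <<- unlist(resultList)
gen <- function(i) if (i <= 5) i else stop()  # one formal, returns a non-list
runTasks(function(x) x^2, gen, acc, chunkSize = 2)
squares   # [1]  1  4  9 16 25
```

With chunkSize = 2 and five tasks, the accumulator here is called three times, with chunks of two, two, and one result.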
If eo$elementFunc returns a value that isn't a list,
eachElem will automatically wrap that value in a list.
The eo$elementFunc function doesn't have to define a second
formal argument (the by argument) if it's not needed.
The eo$accumulator function doesn't have to define a second
formal argument (the taskVector argument) if it's not needed.
Just remember that the results are not guaranteed to come back in
order.
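Because results can arrive out of order, an accumulator can use the task numbers to store each result in its proper position. In this sketch the chunk deliveries are simulated by calling the accumulator by hand in a shuffled order (no sleigh involved), and it assumes the total number of tasks (6) is known in advance so the list can be preallocated.

```r
# Preallocate one slot per task (assumes the task count is known).
collected <- vector("list", 6)
accumulator <- function(resultList, taskVector) {
  # place each result at the index given by its task number
  collected[taskVector] <<- resultList
}
# Three chunks of two tasks each, delivered out of order
accumulator(list("e", "f"), c(5, 6))
accumulator(list("a", "b"), c(1, 2))
accumulator(list("c", "d"), c(3, 4))
unlist(collected)   # [1] "a" "b" "c" "d" "e" "f"
```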
See Also
Examples
## Not run:
# create a sleigh
s <- sleigh()
# compute the mean of each list element
x <- list(a=1:10, beta=exp(-3:3), logic=c(TRUE,FALSE,FALSE,TRUE))
eachElem(s, mean, list(x))
# median and quartiles for each list element
eachElem(s, quantile, elementArgs=list(x), fixedArgs=list(probs=1:3/4))
# use eo$elementFunc to supply 100 random values and eo$accumulator to
# receive the results
elementFunc <- function(i, by) {
if (i <= 100) list(i=i, x=runif(1)) else stop()
}
accumulator <- function(resultList, taskVector) {
if (resultList[[1]][[1]] != taskVector[1]) stop('assertion failure')
cat(paste(resultList[[1]], collapse=' '), '\n')
}
eo <- list(elementFunc=elementFunc, accumulator=accumulator)
eachElem(s, function(i, x) list(i=i, x=x, xsq=x*x), eo=eo)
# shut down the sleigh workers when finished
stopSleigh(s)
## End(Not run)