Developing new Turbinia Tasks and Evidence types
It’s easy!
Creating new Tasks for Turbinia is fairly easy, and if your Task is simple (like just executing an external command) it should only take a few lines of real code along with a bit of boilerplate code, and a few extra lines to connect things together.
Before you start
Check out the How it Works page to see how the different components work within Turbinia.
Make sure to follow the Turbinia developer contribution guide.
Task code
Task Setup
The Worker which runs the Tasks handles the following things before you even get to the run() method where most of our code will go:
- Running any pre-processors that need to run to prepare the Evidence.
- Updating evidence.local_path to be the local path of the evidence on the worker machine the Task is currently running on.
- Setting up temporary directories (available as self.output_dir and self.tmp_dir).
- Preparing a TurbiniaTaskResult object to save results into.
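Putting those pieces together, a new Task has roughly the shape sketched below. This is only a sketch: the stub base class stands in for the real TurbiniaTask (so the example is self-contained and runnable), and MyNewTask and its paths are hypothetical names, not part of Turbinia.

```python
import os


class TurbiniaTaskStub:
    """Stand-in for the real TurbiniaTask base class (assumption: the
    real class sets up output_dir/tmp_dir before run() is called)."""

    def __init__(self, output_dir='/tmp/turbinia-output'):
        self.output_dir = output_dir
        self.tmp_dir = os.path.join(output_dir, 'tmp')


class MyNewTask(TurbiniaTaskStub):
    """Sketch of a new Task: the per-evidence work goes in run()."""

    def run(self, evidence, result):
        # By the time run() is called, the pre-processors have set
        # evidence.local_path and the temp directories exist.
        result.log('Processing {0:s}'.format(evidence.local_path))
        output_path = os.path.join(self.output_dir, 'my_output.txt')
        # ... do the actual processing and write to output_path ...
        result.close(self, success=True, status='MyNewTask completed')
        return result
```

The sections below walk through what goes inside run(), how results are closed, and how the Task gets wired into a Job.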
Task execution
To see a relatively simple example of the code required for a new Task, see this pull request. This simply executes the strings binary on Disk-based Evidence types.
Here is the bulk of the actual Task code for the Ascii Strings Task:
# Create the new Evidence object that will be generated by this Task.
output_evidence = TextFile()

# Create a path that we can write the new file to.
base_name = os.path.basename(evidence.local_path)
output_file_path = os.path.join(
    self.output_dir, '{0:s}.ascii'.format(base_name))

# Add the output path to the evidence so we can automatically save it
# later.
output_evidence.source_path = output_file_path

# Generate the command we want to run.
cmd = 'strings -a -t d {0:s} > {1:s}'.format(
    evidence.local_path, output_file_path)

# Add a log line to the result that will be returned.
result.log('Running strings as [{0:s}]'.format(cmd))

# Actually execute the binary.
self.execute(
    cmd, result, new_evidence=[output_evidence], close=True, shell=True)
This is mostly self-explanatory from the comments, but the line that needs a little more explanation is this one:
self.execute(
    cmd, result, new_evidence=[output_evidence], close=True, shell=True)
This will:
- Run the command as specified
- Set the output evidence to be saved
- Save the stdout and stderr in the result object specified
- Close the Result in preparation for Task completion
Task Finalization and Saving Results
Before a Task completes and returns, the Result object must be "closed", which finalizes the results in preparation for them to be returned to the server. Closing a Result does a few things, like setting Task stats, saving all of the output files, and running the post-processor to free up the Evidence (e.g. unmount disks, etc). In the above example of self.execute(), close=True is set, which tells the method to handle closing the results. If you have other external commands that you want to run and save the output from, you should not close the results until after these are all complete (i.e. don't set close=True in self.execute() in this case). If you are not calling the execute() method and implicitly closing the results that way, you'll need to close them similar to this:
result.close(self, success=True, status='My message about the Task status')
One important parameter that was not set in this example call of result.close() is save_files. It takes a list of file paths that you want to save (no need to add the files you linked to the Evidence earlier; it will save those automatically). This is used for non-Evidence files that you want to save (for example, log files).
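As a sketch of save_files in use: the stub class below only mimics the close() signature described here (the real TurbiniaTaskResult does much more), and the paths are hypothetical.

```python
import os


class ResultStub:
    """Stub mimicking the result.close() signature described above
    (assumption: save_files is a list of extra file paths to keep)."""

    def __init__(self):
        self.saved_paths = []
        self.status = None

    def close(self, task, success, status=None, save_files=None):
        self.status = status
        # The real close() preserves these files with the Task output;
        # the stub just records which extra paths were requested.
        self.saved_paths = list(save_files or [])


output_dir = '/tmp/turbinia-task-output'   # stands in for self.output_dir
log_file_path = os.path.join(output_dir, 'my_task.log')

result = ResultStub()
# Save the extra (non-Evidence) log file when closing the result.
result.close(
    None, success=True, status='Task done, see my_task.log',
    save_files=[log_file_path])
```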
If you want to write files from your Task, you should do this relative to self.output_dir. If you have temporary files you want to write, you can write these to self.tmp_dir. These directories are unique for the given Task execution.
The run() method should return the result object, which will be serialized and returned to the server along with any associated Evidence that may have been created. The new Evidence created and included in the results will be checked by the Task Manager to see if there are other Jobs/Tasks that should be scheduled to process it.
Pre/Post-Processing
Each Task can set the Evidence state that is required (e.g. mounted, attached, etc) prior to execution by setting the state in Task.REQUIRED_STATES. Each Evidence object specifies which states it supports in the Evidence.POSSIBLE_STATES list attribute (e.g. see the GoogleCloudDisk possible states here - Line 717).
These states are set up by the pre-processors, and after the Task is completed, the post-processor will tear down this state (e.g. unmount or detach, etc). For more details on the different states and how the pre/post-processors set these up at runtime, see the Evidence.preprocess() docstrings - Line 405.
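A sketch of how the two declarations line up: the enum and classes below are illustrative stand-ins (the real EvidenceState lives in turbinia.evidence, and the compatibility check shown is a simplification of what the framework does).

```python
from enum import IntEnum


class EvidenceState(IntEnum):
    """Stand-in mirroring turbinia.evidence.EvidenceState (assumption:
    the real enum includes states such as MOUNTED and ATTACHED)."""
    MOUNTED = 1
    ATTACHED = 2


class GoogleCloudDiskStub:
    # The Evidence type declares which states its pre-processors can
    # actually set up.
    POSSIBLE_STATES = [EvidenceState.ATTACHED, EvidenceState.MOUNTED]


class MyDiskTask:
    # The Task declares which states it needs; the worker pre-processors
    # attach/mount the disk accordingly before run() is called.
    REQUIRED_STATES = [EvidenceState.ATTACHED]


# A simplified compatibility check: every state the Task requires must
# be one the Evidence type can provide.
compatible = all(
    state in GoogleCloudDiskStub.POSSIBLE_STATES
    for state in MyDiskTask.REQUIRED_STATES)
```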
Evidence Paths
As mentioned above, the pre-processor that runs before the Task is executed will set the path evidence.local_path to point to the local Evidence. If the Task generates any new output Evidence objects, you must set the .source_path attribute for each object before you add it to the results. The .source_path is the original path the Evidence is created with, and the .local_path is the path to access the Evidence after any pre-processors are run (e.g. the path it was mounted on if it was mounted, etc). In most cases, Tasks should use .local_path when processing or operating on the input Evidence, and .source_path for the output Evidence that is created by the Task (and will be processed by other Tasks).
Since not all Tasks can process all types/states of Evidence (e.g. device files and mounted directories), they can also reference other, more specific paths for the input evidence if needed (e.g. device_path or mount_path), but generally this should not be needed as long as you set TurbiniaTask.REQUIRED_STATES for the Task to match your actual requirements, since the local_path should always be created by the pre-processors. See the docstrings for these attributes in the Evidence object - Line 203 for more details.
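To make the source_path/local_path distinction concrete, here is a minimal sketch with a stand-in Evidence class (all paths are hypothetical examples):

```python
class EvidenceStub:
    """Stand-in showing the two path attributes described above."""

    def __init__(self, source_path=None):
        self.source_path = source_path  # path the Evidence was created with
        self.local_path = None          # set by the pre-processors at runtime


# Input Evidence: a pre-processor (e.g. a mount) sets local_path.
disk = EvidenceStub(source_path='/evidence/disk.dd')
disk.local_path = '/mnt/turbinia/mount-xyz'   # hypothetical mount point

# Tasks read the input via local_path...
input_path = disk.local_path

# ...and set source_path on any new output Evidence they create.
output = EvidenceStub()
output.source_path = '/tmp/turbinia-output/strings.ascii'
```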
Recipe configuration
Tasks can also specify a set of variables that can be configured and set through recipes. This allows users to pre-define configurations for the runtime behavior of Tasks, along with which Jobs/Tasks should be run. Each Task has a TASK_CONFIG dictionary set at the object level that defines each of the variables that can be used, along with the default values that the Task will use when the recipe does not specify that variable or no recipe is used. See the Plaso Task - Line 29 TASK_CONFIG as an example. Tasks can access these variables by referencing the dictionary at self.task_config.
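As a sketch of how the defaults and a recipe interact: the merge below is illustrative only (the real merging lives in Turbinia's recipe helpers), and the variable names are invented examples, not the actual Plaso TASK_CONFIG keys.

```python
class PlasoTaskSketch:
    # Default values used when a recipe does not set the variable
    # (example keys only; see the real Plaso Task for the actual ones).
    TASK_CONFIG = {
        'status_view': 'none',
        'hashers': 'all',
    }

    def __init__(self, recipe=None):
        # Recipe values override the TASK_CONFIG defaults; the merged
        # dictionary is what the Task reads at self.task_config.
        self.task_config = {**self.TASK_CONFIG, **(recipe or {})}


# No recipe: pure defaults.
default_task = PlasoTaskSketch()
# Recipe overriding one variable:
tuned_task = PlasoTaskSketch(recipe={'hashers': 'sha256'})
```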
Boilerplate and Glue
The only two interesting bits of the Job definition in turbinia/jobs/strings.py are this one, which sets the allowable input and output Evidence types for the Job (so the Task Manager knows what kinds of Tasks to schedule):
evidence_input = [RawDisk, GoogleCloudDisk, GoogleCloudDiskRawEmbedded]
evidence_output = [TextFile]
And this one, which just sets up the Tasks for both Task types (Ascii and Unicode):
tasks = [StringsAsciiTask() for _ in evidence]
tasks.extend([StringsUniTask() for _ in evidence])
return tasks
In this case we have two separate Tasks that we are executing for the Job, but there could be more or fewer depending on how much you want to split things up. Then you just need to add a reference to the new Job in turbinia/jobs/__init__.py.
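Putting the glue together, a Job definition has roughly this shape. This is a sketch: the Task classes here are empty stand-ins, evidence_input/evidence_output hold names instead of the real Evidence classes, and create_tasks follows the pattern shown above (assumption: the real Job base class lives in turbinia.jobs).

```python
class StringsAsciiTask:
    """Stand-in Task class; the real one lives in turbinia.workers."""


class StringsUniTask:
    """Stand-in for the Unicode strings Task."""


class StringsJobSketch:
    # Allowable input/output Evidence types (names only in this sketch;
    # the real Job lists the actual Evidence classes).
    evidence_input = ['RawDisk', 'GoogleCloudDisk']
    evidence_output = ['TextFile']

    def create_tasks(self, evidence):
        # One Task of each type per piece of evidence, as in the
        # strings example above.
        tasks = [StringsAsciiTask() for _ in evidence]
        tasks.extend([StringsUniTask() for _ in evidence])
        return tasks


job = StringsJobSketch()
tasks = job.create_tasks(['disk1', 'disk2'])
```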
Logging
Using result.log() is recommended for logging within a Task instead of the normal Python logger. result.log() will log to the standard logger, and will also record the data in the Task result and create a Task-specific log file named worker.txt in the Task output directory. This makes it easier to debug and find Task-specific logs. result.log() also has a level parameter that takes a log level (e.g. logging.INFO) for control.
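A sketch of result.log() in use: the stub below only records calls (assumption: it mirrors the message/level parameters described here; the real method also writes to the standard logger and worker.txt).

```python
import logging


class ResultStub:
    """Stand-in recording result.log() calls for illustration."""

    def __init__(self):
        self.entries = []

    def log(self, message, level=logging.INFO):
        # The real result.log() also forwards to the standard logger
        # and the Task's worker.txt file.
        self.entries.append((level, message))


result = ResultStub()
result.log('Found 3 artifacts')                        # default INFO level
result.log('Raw tool output: ...', level=logging.DEBUG)
```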
Reporting
Tasks can return report data in Markdown format by adding it as a string to result.report_data. If high-priority findings are found, you can change result.report_priority. Priorities range from 0 to 100, with 0 being the highest priority. This affects the ordering in the report, and if the priority value is less than what is set with --priority_filter (i.e. a higher priority), then the full report data will be printed out when --full_report is specified.
Tasks can use the helper methods in turbinia.lib.text_formatter to help format the text with formatting like bold and code. Note that when setting headings in a task report, do not use heading1 through heading3 because these are used in other sections of the report, but you can use heading4 and above.
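A sketch of the report fields in use: the stub class and its default priority of 80 are assumptions for illustration; only report_data, report_priority, and the 0-100 scale come from the description above.

```python
class ResultStub:
    """Stand-in carrying the report attributes described above."""

    def __init__(self):
        self.report_data = ''
        self.report_priority = 80   # assumed low default; 0 is highest


result = ResultStub()
findings = ['suspicious cron entry', 'modified sshd binary']
if findings:
    # Raise the priority (lower number == higher priority) and attach
    # Markdown report data, using heading4 or deeper per the note above.
    result.report_priority = 20
    result.report_data = '\n'.join(
        ['#### Suspicious findings'] +
        ['* {0:s}'.format(f) for f in findings])
```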
Tips
- If possible, set a meaningful status message that summarizes the Task execution or output. This can be done either by setting the status parameter when calling result.close(), or by explicitly setting the result.status attribute. This should be a single line, and it is the output that shows up for each task when running turbiniactl status. If a Task has a low report_priority, the full report data will not show up in turbiniactl status, so the status may be the only place that Task info bubbles up in the output by default; setting it to something useful can be important.
- If your Task executes an external command that can generate a log file, it's helpful to specify the appropriate flags to generate it and then automatically save it by setting the log_files parameter when calling self.execute(). Additionally, if there are flags that control the verbosity of this log file, it's helpful to check the config.DEBUG_TASKS config parameter and log accordingly; this way all Tasks can generate debug output when it is configured.
Testing
There is a TestTurbiniaTaskBase object that Task tests can sub-class for relatively easy testing of the basic run method. See the photorec test for a simple example. For a Task test with reporting output, see the sshd test as an example.
Notes
The reason we separate the strings processing into two separate Tasks is so we can run them in parallel and save on wall time.
One caveat about Task development is that it is possible to create a cycle in the Task Manager by generating Evidence types that your Task (or any of its parent Tasks) also listens to. Check out the Job and Evidence graph generator if you want to verify that there aren't any cycles in the graph.