Serialization of stored/cached data¶
By default, both cache and data files (created using the APIs described in
Persistent data) are serialized using pickle. This provides
a good compromise between speed and the ability to store arbitrary objects.
When changing or specifying a serializer, use the name under which the serializer is registered with the workflow.manager object.
Warning
When it comes to cache data, it is strongly recommended to stick with the
default. pickle is very fast and fully supports standard Python
data structures (dict, list, tuple, set etc.).
If you really must customise the cache data format, you can change the
default cache serializer, e.g. to the built-in json serializer, like this:
wf = Workflow()
wf.cache_serializer = 'json'
Unlike the stored data API, the cached data API can’t determine the format of the cached data. If you change the serializer without clearing the cache, errors will probably result as the serializer tries to load data in a foreign format.
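To see why this fails, here is a minimal sketch using only the standard json and pickle modules (an in-memory buffer stands in for a real cache file):

```python
import io
import json
import pickle

# Simulate a cache file written with the default pickle serializer.
cache = io.BytesIO()
pickle.dump({'query': 'alfred'}, cache)
cache.seek(0)

# Switching the serializer to JSON without clearing the cache means
# json.load() now sees pickle's binary format and fails.
try:
    json.load(cache)
    failed = False
except ValueError:  # decoding errors raised here are ValueError subclasses
    failed = True

print(failed)  # True: the JSON serializer cannot read pickled data
```

Clearing the cache before (or immediately after) changing `cache_serializer` avoids this mismatch.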
In the case of stored data, you are free to specify either a global default serializer or one for each individual datastore:
wf = Workflow()
# Use `pickle` as the global default serializer
wf.data_serializer = 'pickle'

# Use the JSON serializer only for these data
wf.store_data('name', data, serializer='json')
This is primarily so you can create files that are human-readable or usable by other software. The generated JSON is formatted to make it readable.
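For illustration, the standard json module produces similarly readable output when asked to indent (the library's exact formatting may differ):

```python
import json

data = {'name': 'Alfred', 'tags': ['cache', 'data']}

# An indented dump is human-readable, unlike pickle's binary format.
text = json.dumps(data, indent=2, sort_keys=True)
print(text)
```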
The stored_data()
method can
automatically determine the serialization of the stored data (based on the file
extension, which is the same as the name the serializer is registered under),
provided the corresponding serializer is registered. If it isn’t, a
ValueError
will be raised.
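The lookup works roughly as sketched below, using a plain dict in place of the real serializer registry (the names and structure here are illustrative, not the library's internals):

```python
import json
import pickle

# Hypothetical registry: serializer name -> object with load()/dump().
serializers = {'json': json, 'pickle': pickle}

def serializer_for(filename):
    """Pick a serializer from the file extension, as stored_data() does."""
    ext = filename.rsplit('.', 1)[-1]
    try:
        return serializers[ext]
    except KeyError:
        raise ValueError('no serializer registered for: %s' % ext)

print(serializer_for('settings.json') is json)  # True
```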
Built-in serializers¶
There are two built-in, pre-configured serializers:
pickle — the default serializer for both cached and stored data, with very good support for native Python data types; and
json — a very common data format, but with limited support for native Python data types.
See the built-in pickle
and json
libraries for
more information on the serialization formats.
Managing serializers¶
You can add your own serializer, or replace the built-in ones, using the
configured instance of SerializerManager
at
workflow.manager
, e.g. from workflow import manager
.
A serializer
object must have load()
and dump()
methods that work
the same way as in the built-in json
and pickle
libraries, i.e.:
# Reading
obj = serializer.load(open('filename', 'rb'))
# Writing
serializer.dump(obj, open('filename', 'wb'))
To register a new serializer, call the
register()
method of the
workflow.manager
object with the name of the serializer and the object
that performs serialization:
from workflow import Workflow, manager


class MySerializer(object):

    @classmethod
    def load(cls, file_obj):
        # load data from file_obj
        pass

    @classmethod
    def dump(cls, obj, file_obj):
        # serialize obj to file_obj
        pass


manager.register('myformat', MySerializer())
Note
The name you specify for your serializer will be the file extension of the stored files.
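As a self-contained sketch, here is what a working custom serializer might look like: a hypothetical 'jsonb' format (JSON encoded as UTF-8 bytes; not part of the library). Because the library opens files in binary mode, load() and dump() must handle bytes themselves:

```python
import io
import json


class JSONBytesSerializer(object):
    """Hypothetical serializer: JSON over binary file objects."""

    @classmethod
    def load(cls, file_obj):
        # Read bytes from the binary file and decode to a Python object.
        return json.loads(file_obj.read().decode('utf-8'))

    @classmethod
    def dump(cls, obj, file_obj):
        # Encode the object as JSON and write UTF-8 bytes to the file.
        file_obj.write(json.dumps(obj).encode('utf-8'))


# Round-trip through an in-memory binary file.
buf = io.BytesIO()
JSONBytesSerializer.dump({'hello': 'world'}, buf)
buf.seek(0)
restored = JSONBytesSerializer.load(buf)
print(restored)  # {'hello': 'world'}
```

With the real library, you would register it via manager.register('jsonb', JSONBytesSerializer()), and files stored with it would get a .jsonb extension.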
Serializer interface¶
A serializer must conform to this interface (like json
and
pickle
):
serializer.load(file_obj)
serializer.dump(obj, file_obj)
See the Serializers section of the API documentation for more information.