What is pickling?
Pickle is used for serializing and de-serializing Python object structures, also called marshalling or flattening. Serialization refers to the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network. Later on, this byte stream can be retrieved and de-serialized back into a Python object. Pickling is not to be confused with compression! The former is the conversion of an object from one representation (data in Random Access Memory (RAM)) to another (bytes on disk), while the latter is the process of encoding data with fewer bits, in order to save disk space.
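A minimal sketch of that round trip, using pickle.dumps() and pickle.loads(), which work on in-memory bytes objects instead of files. The sample dictionary is made up for illustration, and the zlib step is only there to show that compression is a separate, optional stage applied to the pickled bytes.

```python
import pickle
import zlib

# A small, made-up object to serialize
state = {'name': 'Ozzy', 'age': 3, 'tags': ['good boy', 'loud']}

# Serialization: object in memory -> bytes object
data = pickle.dumps(state)
print(type(data))  # <class 'bytes'>

# De-serialization: bytes -> a new, equal Python object
restored = pickle.loads(data)
print(restored == state)  # True

# Compression is a separate step that operates on the pickled bytes
compressed = zlib.compress(data)
print(pickle.loads(zlib.decompress(compressed)) == state)  # True
```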
What Can You Do With pickle?
Pickling is useful for applications where you need some degree of persistence in your data. Your program's state data can be saved to disk, so you can continue working on it later on. It can also be used to send data over a Transmission Control Protocol (TCP) or socket connection, or to store Python objects in a database. Pickle is especially useful when you're working with machine learning algorithms: you can save a trained model and use it to make new predictions later, without having to rewrite everything or train the model all over again.
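As a rough illustration of storing a Python object in a database, the sketch below writes pickled bytes into a BLOB column of an in-memory SQLite database using the standard-library sqlite3 module. The table name snapshots, its columns, and the state dictionary are hypothetical, chosen only for this example.

```python
import pickle
import sqlite3

# Hypothetical program state we want to persist
state = {'step': 42, 'scores': [0.5, 0.75, 0.9]}

conn = sqlite3.connect(':memory:')  # in-memory database, nothing touches disk
conn.execute('CREATE TABLE snapshots (id INTEGER PRIMARY KEY, payload BLOB)')

# Store the pickled bytes as a BLOB
conn.execute('INSERT INTO snapshots (payload) VALUES (?)', (pickle.dumps(state),))
conn.commit()

# Later: read the bytes back and unpickle them
(payload,) = conn.execute('SELECT payload FROM snapshots WHERE id = 1').fetchone()
restored = pickle.loads(payload)
print(restored)  # {'step': 42, 'scores': [0.5, 0.75, 0.9]}
conn.close()
```

Only unpickle rows like these if you trust whoever wrote them, for the reasons discussed in the next section.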
When Not To Use pickle
If you want to use data across different programming languages, pickle is not recommended. Its protocol is specific to Python, so cross-language compatibility is not guaranteed. Compatibility across Python versions is also limited: newer Python releases can read pickles made with older protocols, but a file pickled with a newer protocol may not load in an older version of Python, and data pickled under Python 2 may need extra handling in Python 3. When in doubt, use the same Python version for pickling and unpickling, or pass an older protocol to pickle.dump(). Finally, never unpickle data from an untrusted source: a malicious pickle can execute arbitrary code when it is loaded.
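To see why untrusted pickles are dangerous, consider the harmless sketch below. A class can define __reduce__() to tell pickle which callable to invoke when the object is rebuilt. Here the class name Sneaky is made up and the callable is just print, but a malicious file could name any importable function, including ones that run shell commands.

```python
import pickle

class Sneaky:
    # __reduce__ returns (callable, args); pickle.loads() will call callable(*args)
    def __reduce__(self):
        return (print, ('This ran during unpickling!',))

payload = pickle.dumps(Sneaky())

# Loading the bytes calls print(...) instead of rebuilding a Sneaky instance
result = pickle.loads(payload)
print(result)  # None, the return value of print()
```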
Storing data with pickle
What can be pickled?
You can pickle objects with the following data types:
- Booleans,
- Integers,
- Floats,
- Complex numbers,
- (normal and Unicode) Strings,
- Tuples,
- Lists,
- Sets, and
- Dictionaries that contain picklable objects.
All of the above can be pickled, and so can classes and functions, as long as they are defined at the top level of a module.
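A quick way to check the list above is a round trip through pickle.dumps() and pickle.loads(). This sketch assumes it is run as a script, so the function and class it defines live at the top level of the __main__ module; the names greet and Dog are illustrative only.

```python
import pickle

def greet(name):
    # Top-level function: pickled by reference to its module and name
    return f'Hello, {name}!'

class Dog:
    # Top-level class: instances are pickled along with their attribute dictionary
    def __init__(self, name, age):
        self.name = name
        self.age = age

samples = [True, 42, 3.14, 2 + 3j, 'Luna', (1, 2), [1, 2], {1, 2}, {'Ozzy': 3}]
for obj in samples:
    assert pickle.loads(pickle.dumps(obj)) == obj

# Functions come back as the very same object, looked up by name
assert pickle.loads(pickle.dumps(greet)) is greet

# Instances are rebuilt as new objects of the same class
buddy = pickle.loads(pickle.dumps(Dog('Barco', 12)))
print(type(buddy).__name__, buddy.name, buddy.age)  # Dog Barco 12
```

Because functions and classes are stored by name, the code that unpickles them must be able to import the same module and find the same names.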
Not everything can be pickled (easily), though: examples include generators, inner classes, lambda functions, and defaultdicts whose default factory is a lambda. To pickle lambda functions, you need an additional package named dill. With defaultdicts, create them with a module-level function (or a built-in such as int or list) as the default factory instead of a lambda.
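The sketch below illustrates the defaultdict point: a defaultdict whose default factory is a module-level function (here a hypothetical zero()) round-trips fine, while one built with a lambda fails. The exact exception type and message for the lambda case can vary between Python versions, so the sketch catches the likely candidates.

```python
import pickle
from collections import defaultdict

def zero():
    # Module-level factory: pickle stores a reference to its name
    return 0

counts = defaultdict(zero)
counts['Ozzy'] += 1
restored = pickle.loads(pickle.dumps(counts))
print(restored['Ozzy'], restored['Filou'])  # 1 0 (missing keys still default to 0)

# A lambda has no name pickle can look up, so this fails
try:
    pickle.dumps(defaultdict(lambda: 0))
except (pickle.PicklingError, AttributeError, TypeError) as err:
    print('Cannot pickle:', type(err).__name__)
```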
Pickling files
To use pickle, start by importing it in Python.
import pickle
As a running example, you'll pickle a simple dictionary, save it to a file, and load it again:
dogs_dict = {'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16}
To pickle this dictionary, you first need to specify the name of the file you will write it to, which is dogs in this case.
Note that the file does not have an extension; pickle doesn't require one, although .pkl and .pickle are common conventions.
To open the file for writing, use the open() function. The first argument is the name of your file, and the second is 'wb'. The w means that you'll be writing to the file, and b refers to binary mode, so the data will be written as bytes objects. If you forget the b, pickle.dump() raises a TypeError such as TypeError: write() argument must be str, not bytes (the exact message varies between Python versions). You may sometimes come across a slightly different notation, 'w+b'. It also opens the file for writing in binary mode (and additionally allows reading), so it works for pickling too.
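To see the difference binary mode makes, the sketch below tries pickle.dump() on a file opened with 'w' and then with 'wb'. It writes into a temporary directory so nothing is left behind; the file name dogs mirrors the tutorial, and the printed TypeError message may differ between Python versions.

```python
import os
import pickle
import tempfile

dogs_dict = {'Ozzy': 3, 'Filou': 8}
text_mode_failed = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dogs')

    # Text mode: pickle produces bytes, but a text-mode file only accepts str
    try:
        with open(path, 'w') as f:
            pickle.dump(dogs_dict, f)
    except TypeError as err:
        text_mode_failed = True
        print('Text mode failed:', err)

    # Binary mode: the pickled bytes are written as-is
    with open(path, 'wb') as f:
        pickle.dump(dogs_dict, f)
    size = os.path.getsize(path)
    print('Binary mode wrote', size, 'bytes')
```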
filename = 'dogs'
outfile = open(filename, 'wb')
Once the file is opened for writing, you can use pickle.dump(), which takes two arguments: the object you want to pickle and the file to which the object has to be saved. In this case, the former will be dogs_dict, while the latter will be outfile.
Don't forget to close the file with close()!
pickle.dump(dogs_dict, outfile)
outfile.close()
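A common, more idiomatic alternative is to open the file in a with statement, which closes it automatically when the block ends, even if pickle.dump() raises an exception. This minimal sketch reuses the tutorial's dogs_dict and file name.

```python
import pickle

dogs_dict = {'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10,
             'Barco': 12, 'Balou': 9, 'Laika': 16}
filename = 'dogs'

# The with block closes (and flushes) the file when it ends
with open(filename, 'wb') as outfile:
    pickle.dump(dogs_dict, outfile)

print(outfile.closed)  # True
```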
Now, a new file named dogs should have appeared in the current working directory (usually the directory you ran your Python script from), unless you specified a full file path as the file name.
Unpickling files
The process of loading a pickled file back into a Python program is similar to the one you saw previously: use the open() function again, but this time with 'rb' as the second argument (instead of 'wb'). The r stands for read mode and the b stands for binary mode, since you'll be reading a binary file. Assign the opened file to infile. Next, call pickle.load() with infile as its argument, and assign the result to new_dict. The contents of the file are now assigned to this new variable. Again, you'll need to close the file at the end.
infile = open(filename, 'rb')
new_dict = pickle.load(infile)
infile.close()
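The same with statement pattern works for loading. So that this sketch runs on its own, it first writes the dogs file and then reads it back; the file is closed automatically at the end of each block.

```python
import pickle

dogs_dict = {'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10,
             'Barco': 12, 'Balou': 9, 'Laika': 16}
filename = 'dogs'

# Write the file first so this snippet is self-contained
with open(filename, 'wb') as outfile:
    pickle.dump(dogs_dict, outfile)

# Read it back; the file is closed automatically when the block ends
with open(filename, 'rb') as infile:
    new_dict = pickle.load(infile)

print(new_dict == dogs_dict)  # True
```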
To make sure that you successfully unpickled it, you can print the dictionary, compare it to the previous dictionary and check its type with type().
print(new_dict)
print(new_dict == dogs_dict)
print(type(new_dict))
Output:
{'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16}
True
<class 'dict'>