klepto package documentation
klepto: persistent caching to memory, disk, or database
About Klepto
klepto
extends Python’s lru_cache
to utilize different keymaps and
alternate caching algorithms, such as lfu_cache
and mru_cache
.
While caching is meant for fast access to saved results, klepto
also
has archiving capabilities, for longer-term storage. klepto
uses a
simple dictionary-sytle interface for all caches and archives, and all
caches can be applied to any Python function as a decorator. Keymaps
are algorithms for converting a function’s input signature to a unique
dictionary, where the function’s results are the dictionary value.
Thus for y = f(x)
, y
will be stored in cache[x]
(e.g. {x:y}
).
klepto
provides both standard and “safe” caching, where “safe” caches
are slower but can recover from hashing errors. klepto
is intended
to be used for distributed and parallel computing, where several of
the keymaps serialize the stored objects. Caches and archives are
intended to be read/write accessible from different threads and
processes. klepto
enables a user to decorate a function, save the
results to a file or database archive, close the interpreter,
start a new session, and reload the function and it’s cache.
klepto
is part of pathos
, a Python framework for heterogeneous computing.
klepto
is in active development, so any user feedback, bug reports, comments,
or suggestions are highly appreciated. A list of issues is located at https://github.com/uqfoundation/klepto/issues, with a legacy list maintained at https://uqfoundation.github.io/project/pathos/query.
Major Features
klepto
has standard and “safe” variants of the following:
lfu_cache
- the least-frequently-used caching algorithm
lru_cache
- the least-recently-used caching algorithm
mru_cache
- the most-recently-used caching algorithm
rr_cache
- the random-replacement caching algorithm
no_cache
- a dummy caching interface to archiving
inf_cache
- an infinitely-growing cache
klepto
has the following archive types:
file_archive
- a dictionary-style interface to a file
dir_archive
- a dictionary-style interface to a folder of files
sqltable_archive
- a dictionary-style interface to a sql database table
sql_archive
- a dictionary-style interface to a sql database
hdfdir_archive
- a dictionary-style interface to a folder of hdf5 files
hdf_archive
- a dictionary-style interface to a hdf5 file
dict_archive
- a dictionary with an archive interface
null_archive
- a dictionary-style interface to a dummy archive
klepto
provides the following keymaps:
keymap
- keys are raw Python objects
hashmap
- keys are a hash for the Python object
stringmap
- keys are the Python object cast as a string
picklemap
- keys are the serialized Python object
klepto
also includes a few useful decorators providing:
simple, shallow, or deep rounding of function arguments
cryptographic key generation, with masking of selected arguments
Current Release
The latest released version of klepto
is available from:
klepto
is distributed under a 3-clause BSD license.
Development Version
You can get the latest development version with all the shiny new features at:
If you have a new contribution, please submit a pull request.
Installation
klepto
can be installed with pip
:
$ pip install klepto
To include optional archive backends, such as HDF5 and SQL, in the install:
$ pip install klepto[archives]
To include optional serializers, such as jsonpickle
, in the install:
$ pip install klepto[crypto]
Requirements
klepto
requires:
python
(orpypy
), >=3.8
setuptools
, >=42
dill
, >=0.3.7
pox
, >=0.3.3
Optional requirements:
h5py
, >=2.8.0
pandas
, >=0.17.0
sqlalchemy
, >=1.4.0
jsonpickle
, >=0.9.6
cloudpickle
, >=0.5.2
More Information
Probably the best way to get started is to look at the documentation at
http://klepto.rtfd.io. Also see klepto.tests
for a set of scripts that
test the caching and archiving functionalities in klepto
.
You can run the test suite with python -m klepto.tests
. The
source code is also generally well documented, so further questions may
be resolved by inspecting the code itself. Please feel free to submit
a ticket on github, or ask a question on stackoverflow (@Mike McKerns).
If you would like to share how you use klepto
in your work, please send
an email (to mmckerns at uqfoundation dot org).
Citation
If you use klepto
to do research that leads to publication, we ask that you
acknowledge use of klepto
by citing the following in your publication:
Michael McKerns and Michael Aivazis,
"pathos: a framework for heterogeneous computing", 2010- ;
https://uqfoundation.github.io/project/pathos
Please see https://uqfoundation.github.io/project/pathos or http://arxiv.org/pdf/1202.1056 for further information.
- _keygen(func, ignored, *args, **kwds)
generate a ‘key’ from the (*args,**kwds) suitable for use in caching
func is the function being called ignored is the list of names and/or indicies to ignore args and kwds are func’s input arguments and keywords
returns the archive ‘key’ – does not call the function
ignored can include names (e.g. ‘x’,’y’), indicies (e.g. 0,1), or ‘*’,’**’. if ‘*’ in ignored, all varargs are ignored. Similarly for ‘**’ and varkwds.` Note that for class methods, it may be useful to ignore ‘self’.
- citation()
print citation
- class inf_cache(maxsize=None, cache=None, keymap=None, ignore=None, tol=None, deep=False, purge=False)
Bases:
object
infinitely-growing (INF) cache decorator.
This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. This cache will grow without bound. To avoid memory issues, it is suggested to frequently dump and clear the cache. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size [fixed at maxsize=None] cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (fixed at False)
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- __reduce__()
Helper for pickle.
- isvalid(func, *args, **kwds)
check if func(*args,**kwds) is a valid call for function ‘func’
returns True if valid, returns False if an error is thrown
- keygen(*ignored, **kwds)
decorator for generating a cache key for a given function
ignored: names and/or indicies of the function’s arguments to ‘ignore’ tol: integer tolerance for rounding (default is None) deep: boolean for rounding depth (default is False, i.e. ‘shallow’)
The decorator accepts integers (for the index of positional args to ignore, or strings (the names of the kwds to ignore). A cache key is returned, built with the registered keymap. Ignored arguments are stored in the keymap with a value of klepto.NULL. Note that for class methods, it may be useful to ignore ‘self’.
- The decorated function will gain new methods for working with cache keys
call: __call__ the function with the most recently provided arguments
valid: True if the most recently provided arguments are valid
key: get the cache key for the most recently provided arguments
keymap: get the registered keymap [default: klepto.keymaps.keymap]
register: register a new keymap
The function is not evaluated until the ‘call’ method is called. Both generating the key and checking for validity avoid calling the function by inspecting the function’s input signature.
The default keymap is keymaps.keymap(flat=True). Alternate keymaps can be set with the ‘register’ method on the decorated function.
- class lfu_cache(*args, **kwds)
Bases:
object
least-frequenty-used (LFU) cache decorator.
This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the LFU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (default is False)
If maxsize is None, this cache will grow without bound.
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Frequently_Used
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- static __new__(cls, *args, **kwds)
- __reduce__()
Helper for pickle.
- license()
print license
- class lru_cache(*args, **kwds)
Bases:
object
least-recently-used (LRU) cache decorator.
This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the LRU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (default is False)
If maxsize is None, this cache will grow without bound.
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- static __new__(cls, *args, **kwds)
- __reduce__()
Helper for pickle.
- class mru_cache(*args, **kwds)
Bases:
object
most-recently-used (MRU) cache decorator.
This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the MRU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (default is False)
If maxsize is None, this cache will grow without bound.
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
See: http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- static __new__(cls, *args, **kwds)
- __reduce__()
Helper for pickle.
- class no_cache(maxsize=0, cache=None, keymap=None, ignore=None, tol=None, deep=False, purge=True)
Bases:
object
empty (NO) cache decorator.
Unlike other cache decorators, this decorator does not cache. It is a dummy that collects statistics and conforms to the caching interface. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size [fixed at maxsize=0] cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (fixed at True)
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’. Here, the keymap is only used to look up keys in an associated archive.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- __reduce__()
Helper for pickle.
- class rr_cache(*args, **kwds)
Bases:
object
random-replacement (RR) cache decorator.
This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the RR algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
maxsize = maximum cache size cache = storage hashmap (default is {}) keymap = cache key encoder (default is keymaps.hashmap(flat=True)) ignore = function argument names and indicies to ‘ignore’ (default is None) tol = integer tolerance for rounding (default is None) deep = boolean for rounding depth (default is False, i.e. ‘shallow’) purge = boolean for purge cache to archive at maxsize (default is False)
If maxsize is None, this cache will grow without bound.
If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.
If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
If ignore is given, the keymap will ignore the arguments with the names and/or positional indicies provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,’**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).
View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
http://en.wikipedia.org/wiki/Cache_algorithms#Random_Replacement
- __call__(user_function)
Call self as a function.
- __get__(obj, objtype)
support instance methods
- static __new__(cls, *args, **kwds)
- __reduce__()
Helper for pickle.
- signature(func, variadic=True, markup=True, safe=False)
get the input signature of a function
func: the function to inspect variadic: if True, also return names of (*args, **kwds) used in func markup: if True, show a “!” before any ‘unsettable’ parameters safe: if True, return (None,None,None,None) instead of throwing an error
Returns a tuple of variable names and a dict of keywords with defaults. If variadic=True, additionally return names of func’s (*args, **kwds).
Python functions, methods, lambdas, and partials can be inspected. If safe=False, non-python functions (e.g. builtins) will raise an error.
For partials, ‘fixed’ args correspond to positional arguments given in when the partial was defined. Partials have ‘unsettalble’ parameters, where, these parameters may be given as input but will throw errors. If markup=True, ‘unsettable’ parameters are denoted by a prepended ‘!’.
For example:
>>> def bar(x,y,z,a=1,b=2,*args): ... return x+y+z+a+b ... >>> signature(bar) (('x', 'y', 'z', 'a', 'b'), {'a': 1, 'b': 2}, 'args', '') >>> >>> # a partial with a 'fixed' x, thus x is 'unsettable' as a keyword >>> p = partial(bar, 0) >>> signature(p) (('y', 'z', 'a', 'b'), {'a': 1, '!x': 0, 'b': 2}, 'args', '') >>> p(0,1) 4 >>> p(0,1,2,3,4,5) 6 >>> >>> # a partial where y is 'unsettable' as a positional argument >>> p = partial(bar, y=10) >>> signature(p) (('x', '!y', 'z', 'a', 'b'), {'a': 1, 'y': 10, 'b': 2}, 'args', '') >>> p(0,1,2) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: bar() got multiple values for keyword argument 'y' >>> p(0,z=2) 15 >>> p(0,y=1,z=2) 6 >>> >>> # a partial with a 'fixed' x, and positionally 'unsettable' b >>> p = partial(bar, 0,b=10) >>> signature(p) (('y', 'z', 'a', '!b'), {'a': 1, '!x': 0, 'b': 10}, 'args', '') >>> >>> # apply some options that reduce information content >>> signature(p, markup=False) (('y', 'z', 'a', 'b'), {'a': 1, 'b': 10}, 'args', '') >>> signature(p, markup=False, variadic=False) (('y', 'z', 'a', 'b'), {'a': 1, 'b': 10})
- strip_markup(names, defaults)
strip markup (‘!’) from function argument names and defaults
- validate(func, *args, **kwds)
validate a function’s arguments and keywords against the call signature
Raises an exception when args and kwds do not match the call signature. If valid args and kwds are provided, “None” is returned.
NOTE: ‘validate’ does not call f(*args,**kwds), instead checks *args,**kwds against the call signature of func. Thus, ‘validate’ will fail when called to inspect builtins and other non-python functions.