klepto package documentation

klepto: persistent caching to memory, disk, or database

About Klepto

klepto extends Python’s lru_cache to utilize different keymaps and alternate caching algorithms, such as lfu_cache and mru_cache. While caching is meant for fast access to saved results, klepto also has archiving capabilities, for longer-term storage. klepto uses a simple dictionary-style interface for all caches and archives, and all caches can be applied to any Python function as a decorator. Keymaps are algorithms for converting a function’s input signature to a unique dictionary key, where the function’s result is the dictionary value. Thus for y = f(x), y will be stored in cache[x] (e.g. {x:y}).
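The idea that y = f(x) is stored in cache[x] can be sketched in plain Python. This is a minimal illustration, not klepto's implementation: the names memoize and raw_keymap are hypothetical, and only the standard library is used.

```python
def memoize(keymap):
    """Cache func(*args, **kwds) under the key produced by keymap."""
    def decorator(func):
        cache = {}
        def wrapper(*args, **kwds):
            key = keymap(*args, **kwds)
            if key not in cache:
                cache[key] = func(*args, **kwds)
            return cache[key]
        wrapper.cache = cache  # expose the dictionary for inspection
        return wrapper
    return decorator

def raw_keymap(*args, **kwds):
    # hypothetical keymap: the key is simply the call arguments
    return (args, tuple(sorted(kwds.items())))

calls = []  # records every real evaluation

@memoize(raw_keymap)
def square(x):
    calls.append(x)
    return x * x

square(3)
square(3)  # second call is served from the cache
```

After the two calls, square.cache holds one entry and calls records a single evaluation.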

klepto provides both standard and “safe” caching, where “safe” caches are slower but can recover from hashing errors. klepto is intended to be used for distributed and parallel computing, where several of the keymaps serialize the stored objects. Caches and archives are intended to be read/write accessible from different threads and processes. klepto enables a user to decorate a function, save the results to a file or database archive, close the interpreter, start a new session, and reload the function and its cache.
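The save-then-reload workflow described above can be approximated with the standard pickle module. The helper names dump_cache and load_cache are hypothetical; klepto's archives provide this through their own interface, which is not reproduced here.

```python
import os
import pickle
import tempfile

def dump_cache(cache, path):
    """Write the cache dictionary to a file."""
    with open(path, "wb") as fh:
        pickle.dump(cache, fh)

def load_cache(path):
    """Read a cache dictionary back from a file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

path = os.path.join(tempfile.mkdtemp(), "cache.pkl")

# "first session": results are computed and saved
first_session = {(2,): 4, (3,): 9}
dump_cache(first_session, path)

# "second session": results are reloaded instead of recomputed
second_session = load_cache(path)
```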

klepto is part of pathos, a Python framework for heterogeneous computing. klepto is in active development, so any user feedback, bug reports, comments, or suggestions are highly appreciated. A list of issues is located at https://github.com/uqfoundation/klepto/issues, with a legacy list maintained at https://uqfoundation.github.io/project/pathos/query.

Major Features

klepto has standard and “safe” variants of the following:

  • lfu_cache - the least-frequently-used caching algorithm

  • lru_cache - the least-recently-used caching algorithm

  • mru_cache - the most-recently-used caching algorithm

  • rr_cache - the random-replacement caching algorithm

  • no_cache - a dummy caching interface to archiving

  • inf_cache - an infinitely-growing cache

klepto has the following archive types:

  • file_archive - a dictionary-style interface to a file

  • dir_archive - a dictionary-style interface to a folder of files

  • sqltable_archive - a dictionary-style interface to a sql database table

  • sql_archive - a dictionary-style interface to a sql database

  • hdfdir_archive - a dictionary-style interface to a folder of hdf5 files

  • hdf_archive - a dictionary-style interface to a hdf5 file

  • dict_archive - a dictionary with an archive interface

  • null_archive - a dictionary-style interface to a dummy archive

klepto provides the following keymaps:

  • keymap - keys are raw Python objects

  • hashmap - keys are a hash for the Python object

  • stringmap - keys are the Python object cast as a string

  • picklemap - keys are the serialized Python object
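The four key styles listed above can be imitated with standard-library calls. This sketch only shows the kind of key each style produces; klepto's own keymap classes are not used, and their options are not modelled here.

```python
import pickle

args = (1, 2.0, "three")

raw_key = args                    # like keymap: the raw Python object
hash_key = hash(args)             # like hashmap: a hash of the object
string_key = str(args)            # like stringmap: the object cast as a string
pickle_key = pickle.dumps(args)   # like picklemap: the serialized object

# Note: hash() of a str varies between interpreter sessions unless
# PYTHONHASHSEED is fixed, so hash keys are not stable across processes.
```

String and pickle keys can be written to disk or a database, which is one reason serialized keys suit archives shared between processes.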

klepto also includes a few useful decorators providing:

  • simple, shallow, or deep rounding of function arguments

  • cryptographic key generation, with masking of selected arguments

Current Release

The latest released version of klepto is available from:

klepto is distributed under a 3-clause BSD license.

Development Version

You can get the latest development version with all the shiny new features at:

If you have a new contribution, please submit a pull request.

Installation

klepto can be installed with pip:

$ pip install klepto

To include optional archive backends, such as HDF5 and SQL, in the install:

$ pip install klepto[archives]

To include optional serializers, such as jsonpickle, in the install:

$ pip install klepto[crypto]

Requirements

klepto requires:

  • python (or pypy), >=3.8

  • setuptools, >=42

  • dill, >=0.3.8

  • pox, >=0.3.4

Optional requirements:

  • h5py, >=2.8.0

  • pandas, >=0.17.0

  • sqlalchemy, >=1.4.0

  • jsonpickle, >=0.9.6

  • cloudpickle, >=0.5.2

More Information

Probably the best way to get started is to look at the documentation at http://klepto.rtfd.io. Also see klepto.tests for a set of scripts that test the caching and archiving functionalities in klepto. You can run the test suite with python -m klepto.tests. The source code is also generally well documented, so further questions may be resolved by inspecting the code itself. Please feel free to submit a ticket on GitHub, or ask a question on Stack Overflow (@Mike McKerns). If you would like to share how you use klepto in your work, please send an email (to mmckerns at uqfoundation dot org).

Citation

If you use klepto to do research that leads to publication, we ask that you acknowledge use of klepto by citing the following in your publication:

Michael McKerns and Michael Aivazis,
"pathos: a framework for heterogeneous computing", 2010- ;
https://uqfoundation.github.io/project/pathos

Please see https://uqfoundation.github.io/project/pathos or http://arxiv.org/pdf/1202.1056 for further information.

_keygen(func, ignored, *args, **kwds)

generate a ‘key’ from the (*args,**kwds) suitable for use in caching

  • func is the function being called

  • ignored is the list of names and/or indices to ignore

  • args and kwds are func’s input arguments and keywords

returns the archive ‘key’ – does not call the function

ignored can include names (e.g. ‘x’, ‘y’), indices (e.g. 0, 1), or ‘*’, ‘**’. If ‘*’ is in ignored, all varargs are ignored. Similarly for ‘**’ and varkwds. Note that for class methods, it may be useful to ignore ‘self’.
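A rough sketch of key generation with ignored arguments, built on inspect.signature from the standard library. The names make_key and NULL are hypothetical stand-ins, and the handling of indices is simplified (an index refers to a named parameter only); this is not how _keygen itself is written.

```python
import inspect

NULL = object()  # hypothetical placeholder stored for ignored values

def make_key(func, ignored, *args, **kwds):
    """Build a key from the call arguments without calling func."""
    sig = inspect.signature(func)
    bound = sig.bind(*args, **kwds)  # raises TypeError on a bad call
    bound.apply_defaults()
    key = []
    for index, (name, value) in enumerate(bound.arguments.items()):
        kind = sig.parameters[name].kind
        if kind is inspect.Parameter.VAR_POSITIONAL:
            skip = "*" in ignored
        elif kind is inspect.Parameter.VAR_KEYWORD:
            skip = "**" in ignored
            value = tuple(sorted(value.items()))  # make varkwds hashable
        else:
            skip = name in ignored or index in ignored
        key.append((name, NULL if skip else value))
    return tuple(key)

def f(x, y, *args, **kwds):
    return x + y

same_by_index = make_key(f, (0,), 1, 2) == make_key(f, (0,), 3, 2)
same_by_name = make_key(f, ("y",), 3, y=4) == make_key(f, ("y",), 3, y=0)
same_varargs = make_key(f, ("*",), 1, 2, 7, 8) == make_key(f, ("*",), 1, 2, 9)
```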

citation()

print citation

class inf_cache(maxsize=None, cache=None, keymap=None, ignore=None, tol=None, deep=False, purge=False)

Bases: object

infinitely-growing (INF) cache decorator.

This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. This cache will grow without bound. To avoid memory issues, it is suggested to frequently dump and clear the cache. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.
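The tolerance example above can be reproduced with a small stand-alone decorator. The name rounded_cache is hypothetical and the sketch handles a single float argument only; it illustrates that rounding is applied to the cache key, not to the computation.

```python
def rounded_cache(tol):
    """Memoize a one-argument function, rounding the key to tol places."""
    def decorator(func):
        cache = {}
        def wrapper(x):
            key = round(x, tol)       # rounding affects the lookup key only
            if key not in cache:
                cache[key] = func(x)  # the function sees the unrounded input
            return cache[key]
        wrapper.cache = cache
        return wrapper
    return decorator

@rounded_cache(tol=0)
def f(x):
    return 2 * x

f(3.0)  # stored under key 3.0
f(3.1)  # key rounds to 3.0, so the cached value for f(3.0) is returned
f(3.6)  # key rounds to 4.0, so a new value is stored

@rounded_cache(tol=1)
def g(x):
    return 2 * x

g(3.0)
g(3.1)  # key 3.1: new value
g(3.6)  # key 3.6: new value
```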

  • maxsize = maximum cache size [fixed at maxsize=None]

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (fixed at False)

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.
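Why type information matters can be seen with ordinary dictionaries, where 3 and 3.0 compare equal and hash alike. The functions untyped_key and typed_key below are hypothetical illustrations, not klepto keymaps.

```python
def untyped_key(*args):
    # 3 and 3.0 are equal with equal hashes, so they collide as keys
    return args

def typed_key(*args):
    # pairing each value with its type keeps int and float calls apart
    return tuple((type(a), a) for a in args)

untyped = {}
untyped[untyped_key(3)] = "int result"
untyped[untyped_key(3.0)] = "float result"  # overwrites the int entry

typed = {}
typed[typed_key(3)] = "int result"
typed[typed_key(3.0)] = "float result"      # stored separately
```

The typed dictionary holds two entries where the untyped one holds a single entry, which is the memory penalty mentioned above.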

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().
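One way to picture the statistics and the cache/archive relationship is a two-level lookup: an in-memory cache in front of a longer-lived archive. The class ArchivedCache below is a hypothetical sketch, and its reading of ‘load’ as "served from the archive" is an assumption rather than a statement about klepto's internals.

```python
class ArchivedCache:
    """In-memory cache in front of a dictionary-style archive."""

    def __init__(self, func, archive=None):
        self.func = func
        self.cache = {}
        self.archive = {} if archive is None else archive
        self.hit = self.miss = self.load = 0

    def __call__(self, x):
        if x in self.cache:
            self.hit += 1                    # found in memory
        elif x in self.archive:
            self.load += 1                   # found in the archive
            self.cache[x] = self.archive[x]
        else:
            self.miss += 1                   # has to be computed
            self.cache[x] = self.func(x)
        return self.cache[x]

    def dump(self):
        self.archive.update(self.cache)      # cache -> archive

    def clear(self):
        self.cache.clear()
        self.hit = self.miss = self.load = 0

    def info(self):
        return {"hit": self.hit, "miss": self.miss,
                "load": self.load, "size": len(self.cache)}

square = ArchivedCache(lambda x: x * x)
square(2)                 # miss
square(2)                 # hit
before = square.info()
square.dump()
square.clear()
square(2)                 # load: recovered from the archive
after = square.info()
```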

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

__reduce__()

Helper for pickle.

isvalid(func, *args, **kwds)

check if func(*args,**kwds) is a valid call for function ‘func’

returns True if valid, returns False if an error is thrown

keygen(*ignored, **kwds)

decorator for generating a cache key for a given function

  • ignored: names and/or indices of the function’s arguments to ‘ignore’

  • tol: integer tolerance for rounding (default is None)

  • deep: boolean for rounding depth (default is False, i.e. ‘shallow’)

The decorator accepts integers (for the index of positional args to ignore) or strings (the names of the kwds to ignore). A cache key is returned, built with the registered keymap. Ignored arguments are stored in the keymap with a value of klepto.NULL. Note that for class methods, it may be useful to ignore ‘self’.

The decorated function will gain new methods for working with cache keys:
  • call: __call__ the function with the most recently provided arguments

  • valid: True if the most recently provided arguments are valid

  • key: get the cache key for the most recently provided arguments

  • keymap: get the registered keymap [default: klepto.keymaps.keymap]

  • register: register a new keymap

The function is not evaluated until the ‘call’ method is called. Both generating the key and checking for validity avoid calling the function by inspecting the function’s input signature.

The default keymap is keymaps.keymap(flat=True). Alternate keymaps can be set with the ‘register’ method on the decorated function.
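The deferred-evaluation behaviour described above can be sketched with a small wrapper class. DeferredCall is a hypothetical name; the sketch covers only the ‘key’ and ‘call’ ideas and uses inspect.signature in place of a registered keymap.

```python
import inspect

class DeferredCall:
    """Remember the latest arguments; evaluate only when call() is used."""

    def __init__(self, func):
        self.func = func
        self.signature = inspect.signature(func)
        self.args, self.kwds = (), {}

    def __call__(self, *args, **kwds):
        # store the arguments and return a key; func is NOT evaluated
        self.args, self.kwds = args, kwds
        return self.key()

    def key(self):
        bound = self.signature.bind(*self.args, **self.kwds)
        bound.apply_defaults()
        return tuple(bound.arguments.items())

    def call(self):
        return self.func(*self.args, **self.kwds)

evaluated = []  # records every real evaluation

@DeferredCall
def add(x, y=1):
    evaluated.append((x, y))
    return x + y

key = add(2)                      # builds the key only
untouched = list(evaluated)       # still empty at this point
result = add.call()               # now the function runs
```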

class lfu_cache(*args, **kwds)

Bases: object

least-frequently-used (LFU) cache decorator.

This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the LFU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.

  • maxsize = maximum cache size

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (default is False)

If maxsize is None, this cache will grow without bound.
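The ‘purge’ behaviour, where a full cache is dumped to its archive on reaching maxsize, might look roughly like the following. PurgingCache is a hypothetical class written for illustration under that reading of the description; it is not taken from klepto.

```python
class PurgingCache:
    """Dump the whole cache to an archive whenever it fills up."""

    def __init__(self, func, maxsize, archive):
        self.func = func
        self.maxsize = maxsize
        self.archive = archive
        self.cache = {}

    def __call__(self, x):
        if x in self.cache:
            return self.cache[x]
        if x in self.archive:
            value = self.archive[x]          # recover an archived result
        else:
            value = self.func(x)
        if len(self.cache) >= self.maxsize:
            self.archive.update(self.cache)  # dump the full cache
            self.cache.clear()               # and start again empty
        self.cache[x] = value
        return value

computed = []

def slow_square(x):
    computed.append(x)
    return x * x

archive = {}
f = PurgingCache(slow_square, maxsize=2, archive=archive)
for x in (1, 2, 3, 1):
    f(x)
```

In this run the fourth call finds 1 in the archive, so slow_square is evaluated only three times.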

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().

See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Frequently_Used
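A minimal sketch of least-frequently-used eviction, for orientation only. LFUCache is a hypothetical class using a use counter per key; ties are broken by insertion order, which is an arbitrary choice of this sketch.

```python
from collections import Counter

class LFUCache:
    """Evict the entry that has been used the fewest times."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = {}
        self.uses = Counter()

    def get(self, key, compute):
        if key not in self.data:
            if len(self.data) >= self.maxsize:
                victim = min(self.data, key=lambda k: self.uses[k])
                del self.data[victim]
                del self.uses[victim]
            self.data[key] = compute(key)
        self.uses[key] += 1
        return self.data[key]

lfu = LFUCache(maxsize=2)
lfu.get("a", str.upper)
lfu.get("a", str.upper)   # "a" has now been used twice
lfu.get("b", str.upper)   # "b" has been used once
lfu.get("c", str.upper)   # cache is full: "b" is evicted
```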

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

static __new__(cls, *args, **kwds)

__reduce__()

Helper for pickle.

license()

print license

class lru_cache(*args, **kwds)

Bases: object

least-recently-used (LRU) cache decorator.

This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the LRU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.

  • maxsize = maximum cache size

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (default is False)

If maxsize is None, this cache will grow without bound.

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().

See: http://en.wikipedia.org/wiki/Cache_algorithms#Least_Recently_Used
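Least-recently-used eviction can be sketched with collections.OrderedDict, which keeps entries in access order when move_to_end is applied on every hit. LRUCache is a hypothetical class for illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Evict the entry that was accessed longest ago."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, compute):
        if key in self.data:
            self.data.move_to_end(key)         # mark as most recently used
        else:
            if len(self.data) >= self.maxsize:
                self.data.popitem(last=False)  # drop the oldest entry
            self.data[key] = compute(key)
        return self.data[key]

lru = LRUCache(maxsize=2)
lru.get("a", str.upper)
lru.get("b", str.upper)
lru.get("a", str.upper)   # "a" becomes the most recently used
lru.get("c", str.upper)   # cache is full: "b" is evicted
```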

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

static __new__(cls, *args, **kwds)

__reduce__()

Helper for pickle.

class mru_cache(*args, **kwds)

Bases: object

most-recently-used (MRU) cache decorator.

This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the MRU algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.

  • maxsize = maximum cache size

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (default is False)

If maxsize is None, this cache will grow without bound.

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().

See: http://en.wikipedia.org/wiki/Cache_algorithms#Most_Recently_Used
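Most-recently-used eviction discards the entry that was touched last, which can help when the oldest items are the ones most likely to be needed again. MRUCache below is a hypothetical sketch built on collections.OrderedDict.

```python
from collections import OrderedDict

class MRUCache:
    """Evict the entry that was accessed most recently."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, compute):
        if key in self.data:
            self.data.move_to_end(key)        # mark as most recently used
        else:
            if len(self.data) >= self.maxsize:
                self.data.popitem(last=True)  # drop the newest entry
            self.data[key] = compute(key)
        return self.data[key]

mru = MRUCache(maxsize=2)
mru.get("a", str.upper)
mru.get("b", str.upper)
mru.get("a", str.upper)   # "a" becomes the most recently used
mru.get("c", str.upper)   # cache is full: "a" is evicted
```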

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

static __new__(cls, *args, **kwds)

__reduce__()

Helper for pickle.

class no_cache(maxsize=0, cache=None, keymap=None, ignore=None, tol=None, deep=False, purge=True)

Bases: object

empty (NO) cache decorator.

Unlike other cache decorators, this decorator does not cache. It is a dummy that collects statistics and conforms to the caching interface. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.

  • maxsize = maximum cache size [fixed at maxsize=0]

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (fixed at True)

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’. Here, the keymap is only used to look up keys in an associated archive.

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

__reduce__()

Helper for pickle.

class rr_cache(*args, **kwds)

Bases: object

random-replacement (RR) cache decorator.

This decorator memoizes a function’s return value each time it is called. If called later with the same arguments, the cached value is returned, and not re-evaluated. To avoid memory issues, a maximum cache size is imposed. For caches with an archive, the full cache dumps to archive upon reaching maxsize. For caches without an archive, the RR algorithm manages the cache. Caches with an archive will use the latter behavior when ‘purge’ is False. This decorator takes an integer tolerance ‘tol’, equal to the number of decimal places to which it will round off floats, and a bool ‘deep’ for whether the rounding on inputs will be ‘shallow’ or ‘deep’. Note that rounding is not applied to the calculation of new results, but rather as a simple form of cache interpolation. For example, with tol=0 and a cached value for f(3.0), f(3.1) will lookup f(3.0) in the cache while f(3.6) will store a new value; however if tol=1, both f(3.1) and f(3.6) will store new values.

  • maxsize = maximum cache size

  • cache = storage hashmap (default is {})

  • keymap = cache key encoder (default is keymaps.hashmap(flat=True))

  • ignore = function argument names and indices to ‘ignore’ (default is None)

  • tol = integer tolerance for rounding (default is None)

  • deep = boolean for rounding depth (default is False, i.e. ‘shallow’)

  • purge = boolean for purge cache to archive at maxsize (default is False)

If maxsize is None, this cache will grow without bound.

If keymap is given, it will replace the hashing algorithm for generating cache keys. Several hashing algorithms are available in ‘keymaps’. The default keymap requires arguments to the cached function to be hashable.

If the keymap retains type information, then arguments of different types will be cached separately. For example, f(3.0) and f(3) will be treated as distinct calls with distinct results. Cache typing has a memory penalty, and may also be ignored by some ‘keymaps’.

If ignore is given, the keymap will ignore the arguments with the names and/or positional indices provided. For example, if ignore=(0,), then the key generated for f(1,2) will be identical to that of f(3,2) or f(4,2). If ignore=(‘y’,), then the key generated for f(x=3,y=4) will be identical to that of f(x=3,y=0) or f(x=3,y=10). If ignore=(‘*’,‘**’), all varargs and varkwds will be ‘ignored’. Ignored arguments never trigger a recalculation (they only trigger cache lookups), and thus are ‘ignored’. When caching class methods, it may be useful to ignore=(‘self’,).

View cache statistics (hit, miss, load, maxsize, size) with f.info(). Clear the cache and statistics with f.clear(). Replace the cache archive with f.archive(obj). Load from the archive with f.load(), and dump from the cache to the archive with f.dump().

See: http://en.wikipedia.org/wiki/Cache_algorithms#Random_Replacement
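Random-replacement eviction needs no bookkeeping about usage: when the cache is full, an arbitrary entry is dropped. RRCache is a hypothetical sketch; the optional seed parameter is added here only to make runs repeatable.

```python
import random

class RRCache:
    """Evict a randomly chosen entry when the cache is full."""

    def __init__(self, maxsize, seed=None):
        self.maxsize = maxsize
        self.data = {}
        self.rng = random.Random(seed)

    def get(self, key, compute):
        if key not in self.data:
            if len(self.data) >= self.maxsize:
                victim = self.rng.choice(list(self.data))
                del self.data[victim]
            self.data[key] = compute(key)
        return self.data[key]

rr = RRCache(maxsize=2, seed=0)
for letter in ("a", "b", "c"):
    rr.get(letter, str.upper)  # the third insert evicts "a" or "b"
```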

__call__(user_function)

Call self as a function.

__get__(obj, objtype)

support instance methods

static __new__(cls, *args, **kwds)

__reduce__()

Helper for pickle.

signature(func, variadic=True, markup=True, safe=False)

get the input signature of a function

  • func: the function to inspect

  • variadic: if True, also return names of (*args, **kwds) used in func

  • markup: if True, show a “!” before any ‘unsettable’ parameters

  • safe: if True, return (None,None,None,None) instead of throwing an error

Returns a tuple of variable names and a dict of keywords with defaults. If variadic=True, additionally return names of func’s (*args, **kwds).

Python functions, methods, lambdas, and partials can be inspected. If safe=False, non-python functions (e.g. builtins) will raise an error.

For partials, ‘fixed’ args correspond to positional arguments given when the partial was defined. Partials have ‘unsettable’ parameters: these parameters may be given as input but will throw errors. If markup=True, ‘unsettable’ parameters are denoted by a prepended ‘!’.

For example:

>>> from functools import partial
>>> from klepto import signature
>>> def bar(x,y,z,a=1,b=2,*args):
...   return x+y+z+a+b
...
>>> signature(bar)
(('x', 'y', 'z', 'a', 'b'), {'a': 1, 'b': 2}, 'args', '')
>>>
>>> # a partial with a 'fixed' x, thus x is 'unsettable' as a keyword
>>> p = partial(bar, 0)
>>> signature(p)
(('y', 'z', 'a', 'b'), {'a': 1, '!x': 0, 'b': 2}, 'args', '')
>>> p(0,1)
4
>>> p(0,1,2,3,4,5)
6
>>>
>>> # a partial where y is 'unsettable' as a positional argument
>>> p = partial(bar, y=10)
>>> signature(p)
(('x', '!y', 'z', 'a', 'b'), {'a': 1, 'y': 10, 'b': 2}, 'args', '')
>>> p(0,1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() got multiple values for argument 'y'
>>> p(0,z=2)
15
>>> p(0,y=1,z=2)
6
>>>
>>> # a partial with a 'fixed' x, and positionally 'unsettable' b
>>> p = partial(bar, 0,b=10)
>>> signature(p)
(('y', 'z', 'a', '!b'), {'a': 1, '!x': 0, 'b': 10}, 'args', '')
>>>
>>> # apply some options that reduce information content
>>> signature(p, markup=False)
(('y', 'z', 'a', 'b'), {'a': 1, 'b': 10}, 'args', '')
>>> signature(p, markup=False, variadic=False)
(('y', 'z', 'a', 'b'), {'a': 1, 'b': 10})

strip_markup(names, defaults)

strip markup (‘!’) from function argument names and defaults

validate(func, *args, **kwds)

validate a function’s arguments and keywords against the call signature

Raises an exception when args and kwds do not match the call signature. If valid args and kwds are provided, “None” is returned.

NOTE: ‘validate’ does not call f(*args,**kwds), instead checks *args,**kwds against the call signature of func. Thus, ‘validate’ will fail when called to inspect builtins and other non-python functions.
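Checking a call against a signature without running the function can be sketched with inspect.signature(...).bind from the standard library. The helpers check_call and is_valid_call are hypothetical analogues of ‘validate’ and ‘isvalid’, not their actual source.

```python
import inspect

def check_call(func, *args, **kwds):
    """Raise TypeError if the arguments do not fit func's signature."""
    inspect.signature(func).bind(*args, **kwds)  # func is never called

def is_valid_call(func, *args, **kwds):
    """Return True if the call would be accepted, False otherwise."""
    try:
        check_call(func, *args, **kwds)
    except (TypeError, ValueError):
        # ValueError: inspect.signature found no signature to inspect
        return False
    return True

def explode(x, y=2):
    raise RuntimeError("the function body should never run here")

ok = is_valid_call(explode, 1)            # fits the signature
missing = is_valid_call(explode)          # x is missing
too_many = is_valid_call(explode, 1, 2, 3)
```

Because only the signature is inspected, the RuntimeError in the function body is never raised.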
