Efficient caching in Python: Building a @decorator to cache function calls
Introduction
In the realm of software development, efficiency is key. One crucial aspect of enhancing application performance is caching — a technique that stores data so future requests for that data can be served faster. This article delves into creating a custom Python decorator to cache the results of function calls, a method particularly beneficial in Django-based applications.
Understanding @decorators in Python
Python decorators are a powerful feature that allows you to modify the behavior of functions. They act as wrappers, enabling you to add functionality to existing code. For instance, a simple decorator might log information each time a function is called.
More information can be found here.
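As an illustration, here is a minimal sketch of such a logging decorator (log_calls and add are made-up names used only for this example, not part of the caching code below):

import functools

def log_calls(f):
    """Logs every call before delegating to the wrapped function."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        print(f"Calling {f.__name__} with args={args}, kwargs={kwargs}")
        return f(*args, **kwargs)
    return wrapper

@log_calls
def add(a: int, b: int) -> int:
    return a + b

add(2, 3)  # Prints: Calling add with args=(2, 3), kwargs={}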
Why cache function calls?
Caching function calls can significantly reduce the time and resources needed for repetitive and costly operations, such as database queries or complex calculations. However, it’s important to balance these benefits against potential drawbacks like increased memory usage.
Implementing the cache @decorator
The full code can be found here.
Our focus is the cache_function decorator. It uses functools.wraps to preserve the original function's metadata and derives a unique cache key from the function's arguments. The decorator supports a configurable cache timeout and handles cache misses out of the box.
import base64
import functools
import hashlib
import os
from typing import Callable, Dict, Union

from django.core.cache import cache

SAFE_HASH_LENGTH = 16  # Length (in characters) of the hashed part of the cache key
TIMEOUT_DEFAULT = 600  # Default cache timeout in seconds


def cache_function(timeout: int = None, prefix: str = os.getenv("ENV_NAME")):
    """
    Decorator to cache the result of a function call, using the function's parameters
    as part of the cache key. This approach helps in maintaining a clean and efficient
    caching strategy, especially for frequently called functions with varying arguments.

    Note: If the environment variable 'IS_TEST' is set, caching is bypassed.

    :param timeout: The duration (in seconds) to keep the result in cache. If None,
        the result is cached indefinitely. Default is None.
    :param prefix: An optional prefix for the cache key, typically used for namespacing.
        Defaults to the value of the 'ENV_NAME' environment variable.
    :return: The cached result of the function if available; otherwise, the actual function
        result is computed, cached, and returned.
    """
    def decorator(f: Callable):
        @functools.wraps(f)
        def decorated_function(*args, **kwargs):
            if os.getenv("IS_TEST"):
                return f(*args, **kwargs)

            cache_key = make_function_cache_key(f, *args, **kwargs)
            if prefix:
                cache_key = f"{prefix}_{cache_key}"

            # On cache hit: return the cached result
            cached_result = cache.get(cache_key)
            if cached_result is not None:
                return cached_result

            # On cache miss: actually calculate the result, cache it, then return it
            result = f(*args, **kwargs)
            if result is not None:
                cache.set(cache_key, result, timeout=timeout)
            else:
                # https://docs.djangoproject.com/en/3.0/topics/cache/#django.core.caches.cache.get
                # > We advise against storing the literal value None in the cache,
                # > because you won’t be able to distinguish between your stored None
                # > value and a cache miss signified by a return value of None.
                print("WARNING: functions returning None can't be cached!")
            return result

        return decorated_function
    return decorator
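With the decorator in place, applying it is a one-line change. The sketch below uses a hypothetical expensive_report function to show a custom timeout and prefix:

@cache_function(timeout=60, prefix="myapp")
def expensive_report(year: int) -> dict:
    # Imagine a slow aggregation over the database here.
    return {"year": year, "total": 42}

expensive_report(2023)  # First call computes the result and stores it in the cache
expensive_report(2023)  # Subsequent calls within 60 seconds are served from the cache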
The helper functions make_function_cache_key and hash_data do the heavy lifting for key generation: the former combines the function's namespace with a hash of its arguments, and the latter turns those arguments into a short, fixed-length string so keys stay compact and practically unique per call signature.
def make_function_cache_key(f: Callable, *args, **kwargs) -> str:
    """
    Generate a unique cache key for a function call, incorporating the function's
    namespace, name, and hashed arguments. This key is used to store and retrieve
    function results in the cache.

    :param f: The function for which the cache key is being generated.
    :param args: The arguments passed to the function.
    :param kwargs: The keyword arguments passed to the function.
    :return: A string representing the unique cache key.
    """
    arg_hash = hash_data(f"{args}{kwargs}") if args or kwargs else None
    cache_key = function_namespace(f)
    if arg_hash:
        cache_key = f"{cache_key}.{arg_hash}"
    return cache_key
def function_namespace(f: Callable) -> str:
    """
    Generate a namespace string for a given function. This namespace is a combination
    of the function's module and its qualified name, ensuring uniqueness across different
    modules and classes.

    :param f: The function for which the namespace is to be generated.
    :return: A string representing the namespace of the function.
    """
    module = f.__module__
    if hasattr(f, "__qualname__"):
        name = f.__qualname__
    else:
        name = f.__name__
    namespace = ".".join((module, name))
    return namespace
def hash_data(s: Union[bytes, str], hash_length: int = SAFE_HASH_LENGTH) -> str:
    """
    Compute a fixed-length hash of the given data (either in bytes or string format).
    This is particularly useful for generating a compact and unique representation of
    larger data structures.

    :param s: The data to be hashed. Can be a byte string or a regular string.
    :param hash_length: The length of the hash to be returned. Default is defined by SAFE_HASH_LENGTH.
    :return: A string representing the hashed data.
    """
    if isinstance(s, str):
        s = s.encode()
    return convert_base64_string(hashlib.md5(s).digest(), hash_length)
def convert_base64_string(s: bytes, length: int) -> str:
    """
    Convert the given byte string to a Base64-encoded string, truncated to a specified length.

    :param s: The byte string to be encoded.
    :param length: The desired length of the resulting Base64 string.
    :return: A Base64-encoded string, truncated to the specified length.
    """
    return base64.b64encode(s, b"-_")[:length].decode()
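To get a feel for what these helpers produce, here is a small illustration with a hypothetical greet function (the exact hash suffix depends on the arguments; the placeholder shown is not a real value):

def greet(name: str) -> str:
    return f"Hello, {name}!"

print(function_namespace(greet))              # e.g. "__main__.greet"
print(make_function_cache_key(greet, "Bob"))  # e.g. "__main__.greet.<16-character hash>"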
Using the example of a BookRepository class in a Django application, we demonstrate how to apply the caching decorator to real-world scenarios. This example shows how caching can be used to optimize database queries.
@cache_function(timeout=TIMEOUT_DEFAULT)
def list_raw(author_id: int) -> list:
    """
    List the titles of all books by an author.

    :param author_id: The ID of the author.
    :return: A list of the author's book titles.
        Example: ["Harry Potter", "The Lord of the Rings"].
    """
    return list(Book.objects.filter(author_id=author_id).values_list("title", flat=True))
class BookRepository:
    @staticmethod
    @cache_function(timeout=TIMEOUT_DEFAULT)
    def list_raw(author_id: int) -> Dict[str, str]:
        """
        List all books of an author, in a dictionary format.

        :param author_id: The ID of the author.
        :return: A dictionary of books, with the book title as key and category as value.
            Example: {"Harry Potter": "Fantasy", "The Lord of the Rings": "Fantasy"}.
        """
        books = Book.objects.filter(author_id=author_id).values("author_id", "title", "category")
        books_dict = {}
        for book in books:
            books_dict[book["title"]] = book["category"]
        return books_dict
print(list_raw(1))
print(BookRepository.list_raw(1))
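These examples assume that Django's cache framework is configured in settings.py. As a minimal sketch for local experimentation, the in-memory backend is enough (in production you would typically point this at Redis or Memcached):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}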
Conclusion
Caching is a powerful tool in a developer’s arsenal, capable of significantly improving application performance. The cache_function decorator provides a flexible and efficient way to implement caching in Python, particularly within Django applications.
While caching can greatly improve performance, it also introduces complexity regarding memory management and cache invalidation. It’s important to understand these trade-offs to make informed decisions about when and how to use caching.
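For instance, invalidating a single cached entry by hand comes down to rebuilding the same key the decorator uses. The helper below is a hypothetical sketch that reuses make_function_cache_key from above and assumes the same prefix convention:

import os

from django.core.cache import cache

def invalidate_list_raw(author_id: int) -> None:
    """Hypothetical helper: drop the cached result of list_raw for one author."""
    cache_key = make_function_cache_key(list_raw, author_id)
    prefix = os.getenv("ENV_NAME")
    if prefix:
        cache_key = f"{prefix}_{cache_key}"
    cache.delete(cache_key)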