Warm Cache, Cool Dash: Streamline Your Superset Experience
Just as a warmup lets you exert yourself with minimal damage during physical exercise, most web applications benefit from one too.
Case in point: I have been working with Superset for a while now and have built some pretty hefty dashboards, with up to 30 charts on a single tab. As a result, I have recently hit load time limits, which Superset reports as timeout errors, on some of these larger dashboards. If you, like me, are running into timeout issues with your charts, a cache warmup might be something to consider. It runs the chart queries ahead of time and stores the results in Superset's cache, so users get a faster experience because their requests no longer pay the cost of the round trip to the database.
The good people at Superset also describe some other measures you can implement.
Prerequisites:
- Apache Superset: a deployed, running instance of Superset. For this article, I am using an image deployed in containers on AWS using Docker.
- Some working knowledge of how APIs work.
- Python knowledge is assumed.
Warming up
To start off, make sure your deployed Superset instance has a user with a role that has the permission can warmup cache and SecurityRestApi access assigned.
Apache Superset provides two options for warming up the cache, depending on the level of warmup desired:
1) Warm up the cache for a given chart, along with any additional dashboard or filter context, via the /api/v1/chart/warm_up_cache endpoint
2) Warm up all charts powered by a given dataset, along with any additional filter context, via the /api/v1/dataset/warm_up_cache endpoint
For this example, I will be using the latter option, as it is more efficient when many charts use the same dataset.
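To make the difference between the two options concrete, here is a minimal sketch of the request body each endpoint takes. The helper names are hypothetical, and the field names (chart_id, db_name, table_name, dashboard_id, extra_filters) reflect my reading of the Superset API schema, so check them against the API documentation for your Superset version.

```python
from typing import Any, Dict, Optional

CHART_WARMUP_ENDPOINT = "/api/v1/chart/warm_up_cache"
DATASET_WARMUP_ENDPOINT = "/api/v1/dataset/warm_up_cache"


def chart_warmup_payload(
    chart_id: int,
    dashboard_id: Optional[int] = None,
    extra_filters: Optional[str] = None,
) -> Dict[str, Any]:
    """Body for option 1: warm up a single chart (hypothetical helper)."""
    payload: Dict[str, Any] = {"chart_id": chart_id}
    if dashboard_id is not None:
        payload["dashboard_id"] = dashboard_id
    if extra_filters:
        payload["extra_filters"] = extra_filters
    return payload


def dataset_warmup_payload(
    db_name: str,
    table_name: str,
    dashboard_id: Optional[int] = None,
    extra_filters: Optional[str] = None,
) -> Dict[str, Any]:
    """Body for option 2: warm up every chart built on a dataset (hypothetical helper)."""
    payload: Dict[str, Any] = {"db_name": db_name, "table_name": table_name}
    if dashboard_id is not None:
        payload["dashboard_id"] = dashboard_id
    if extra_filters:
        payload["extra_filters"] = extra_filters
    return payload
```

With option 1, a dashboard of 30 charts needs 30 requests; with option 2, it needs one request per dataset behind those charts.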
Performing a warmup will involve three steps:
1. Identify the datasets that have changed and the dashboards affected
Since I am using Airflow to create and update datasets, I walk the file system for each pipeline looking for files ending in .sql, as their names translate to the table names stored in my database. With these dataset names in hand, I can then fetch the dashboards that depend on each dataset using the following query:
-- {schema_name} and {table_name} are placeholders filled in per dataset
SELECT
    DISTINCT c.id
FROM
    public.dashboard_slices a
    LEFT JOIN slices b ON a.slice_id = b.id
    LEFT JOIN dashboards c ON c.id = a.dashboard_id
WHERE
    b.datasource_name = '{schema_name}.{table_name}'
Now that I have both the dataset name and the matching dashboard IDs, I store them in a data structure named warmup_items.
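As a rough illustration of this step, the sketch below collects table names from .sql files and assembles warmup_items in the shape used later in this article (a list of dicts with dataset_name and dashboard_ids keys). The function names and the fetch_dashboard_ids callback are hypothetical; the callback stands in for whatever code runs the query above against the Superset metadata database.

```python
from pathlib import Path
from typing import Callable, Dict, List, Union

WarmupItem = Dict[str, Union[str, List[int]]]


def dataset_names_from_sql_files(pipeline_dir: str) -> List[str]:
    """Return the unique, sorted file stems of every .sql file under pipeline_dir."""
    return sorted({path.stem for path in Path(pipeline_dir).rglob("*.sql")})


def build_warmup_items(
    dataset_names: List[str],
    fetch_dashboard_ids: Callable[[str], List[int]],
) -> List[WarmupItem]:
    """Pair each dataset with the dashboards that depend on it.

    fetch_dashboard_ids is an assumed callback that runs the dashboard
    lookup query for one dataset and returns the matching dashboard IDs.
    """
    items: List[WarmupItem] = []
    for name in dataset_names:
        dashboard_ids = fetch_dashboard_ids(name)
        # Datasets that no dashboard uses have nothing to warm up
        if dashboard_ids:
            items.append({"dataset_name": name, "dashboard_ids": dashboard_ids})
    return items
```

Passing the lookup in as a callback keeps the file discovery and the database access separate, which makes each part easier to test on its own.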
2. Set up a Superset Session
Remember the user with the specific permissions mentioned earlier? With this user's details, I can create a login session and then set up a CSRF (cross-site request forgery) token in the session object, which will be used for the rest of the tasks that interact with Superset. Please note that the next step, or any other interaction with the Superset API, depends on this token!
import requests

# superset_base_url, login_endpoint_fragment, superset_user_name and
# superset_user_password are settings defined elsewhere in the module.


def setup_superset_session() -> requests.Session:
    """
    Log in to Superset and attach the access token to the session.

    Returns:
        requests.Session: A session object with a bearer token header
    """
    login_url = superset_base_url + login_endpoint_fragment
    client = requests.Session()
    login_request = {
        "password": superset_user_password,
        "provider": "db",
        "refresh": True,
        "username": superset_user_name,
    }
    login = client.post(login_url, json=login_request)
    if login.status_code != 200:
        # Include the response body so a failed login is easier to diagnose
        raise Exception(login.status_code, login.text)
    access_token = login.json()["access_token"]
    client.headers["Authorization"] = f"Bearer {access_token}"
    return client
def get_superset_csrf(client: requests.Session) -> requests.Session:
    """
    Fetch a CSRF token and attach it to the session headers.

    Args:
        client (requests.Session): a logged in session object
    Returns:
        requests.Session: session object with a CSRF token header
    """
    csrf_url = superset_base_url + csrf_endpoint_fragment
    csrf_response = client.get(csrf_url)
    if csrf_response.status_code != 200:
        raise Exception(csrf_response.status_code, csrf_response.text)
    # Send a Referer header along with the CSRF token
    client.headers["Referer"] = superset_base_url
    csrf_token = csrf_response.json()["result"]
    client.headers["X-CSRFToken"] = csrf_token
    return client
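The functions above rely on a handful of module-level settings. Here is one way they might be defined, as a sketch only: the environment variable names and the localhost default are my own placeholders, while the endpoint paths are the Superset REST API's login, CSRF token and dataset warmup routes as I understand them.

```python
import os

# Connection details, read from the environment so that credentials stay
# out of the code. The variable names and defaults are placeholders.
superset_base_url = os.environ.get("SUPERSET_BASE_URL", "http://localhost:8088").rstrip("/")
superset_user_name = os.environ.get("SUPERSET_USER_NAME", "")
superset_user_password = os.environ.get("SUPERSET_USER_PASSWORD", "")
superset_dataset_database_name = os.environ.get("SUPERSET_DATABASE_NAME", "")

# Endpoint fragments appended to the base URL
login_endpoint_fragment = "/api/v1/security/login"
csrf_endpoint_fragment = "/api/v1/security/csrf_token/"
cache_warmup_endpoint_fragment = "/api/v1/dataset/warm_up_cache"
```

Stripping any trailing slash from the base URL avoids a double slash when the fragments are concatenated onto it.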
3. Warm up the cache
Now that I have the datasets and dashboards that need warming up, as well as an authenticated session on the Superset API, I am ready to make the request to warm up the cache!
warmup_url = superset_base_url + cache_warmup_endpoint_fragment
superset_session = setup_superset_session()
superset_session_with_csrf = get_superset_csrf(superset_session)

# warmup_items is the list built in step 1; an empty list simply skips the loop
for item in warmup_items:
    dashboard_ids = item["dashboard_ids"]
    dataset_name = item["dataset_name"]
    for dashboard_id in dashboard_ids:
        data = {
            "dashboard_id": dashboard_id,
            "db_name": superset_dataset_database_name,
            "extra_filters": "",
            "table_name": dataset_name,
        }
        warmup_response = superset_session_with_csrf.put(warmup_url, json=data)
        if warmup_response.status_code != 200:
            raise Exception(warmup_response.status_code, warmup_response.text)
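One thing to note about the loop above is that it stops at the first failed request, so a single slow dashboard leaves everything after it cold. A possible variation, sketched here with a hypothetical warm_up_all function and an arbitrary 300 second timeout, keeps going and reports the failures at the end.

```python
from typing import Any, Dict, List, Tuple


def warm_up_all(
    session: Any,
    warmup_url: str,
    db_name: str,
    warmup_items: List[Dict[str, Any]],
    timeout_seconds: float = 300,
) -> List[Tuple[str, int, int]]:
    """Send one warmup request per dataset and dashboard pair.

    session is expected to behave like an authenticated requests.Session.
    Returns a list of (dataset_name, dashboard_id, status_code) tuples for
    every request that did not come back with a 200.
    """
    failures: List[Tuple[str, int, int]] = []
    for item in warmup_items:
        for dashboard_id in item["dashboard_ids"]:
            data = {
                "dashboard_id": dashboard_id,
                "db_name": db_name,
                "extra_filters": "",
                "table_name": item["dataset_name"],
            }
            response = session.put(warmup_url, json=data, timeout=timeout_seconds)
            if response.status_code != 200:
                # Record the failure and carry on with the remaining items
                failures.append((item["dataset_name"], dashboard_id, response.status_code))
    return failures
```

It would be called as warm_up_all(superset_session_with_crsf, warmup_url, superset_dataset_database_name, warmup_items), and the caller can then decide whether a non-empty result should fail the Airflow task.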
Conclusion:
Admittedly, this is a quick and dirty first version of this warmup task, and there are additional options to explore, such as the extra_filters parameter, for a more efficient warmup.
I am working on benchmarking this against my previous performance and look forward to updating this article with the improvement seen (or not seen) as a result!
Let me know if you have any questions about this particular Apache Superset optimization.