Quickstart

Basic usage

Define a SQLAlchemy model whose columns mirror the parameters your simulation takes, then write a runner that saves its output to params["result_file"]. Pass both to a Store:

from pathlib import Path

import numpy as np

from entropic import Store, Base, Mapped


class SimResult(Base):
    __tablename__ = "results"

    n: Mapped[int]
    steps: Mapped[int]
    dt: Mapped[float]


def my_sim(params: dict) -> None:
    data = np.random.randn(params["n"], params["steps"])
    np.save(params["result_file"], data)


store = Store(
    runner=my_sim,
    result_cls=SimResult,
    results_dir="./results",
    db_url="sqlite:///./runs.sqlite3",
    file_suffix=".npy",
)

record = store.run_or_retrieve({"n": 100, "steps": 5000, "dt": 0.01})
data = np.load(record.result_file)

The first call runs the simulation. Every subsequent call with the same parameters returns the cached row without re-running.
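The run-or-retrieve behaviour described above can be pictured with a small in-memory stand-in. This is only a sketch of the pattern, not entropic's implementation: TinyCache and its runs counter are hypothetical names, and the real Store persists rows in a database rather than a dict.

```python
from typing import Callable


class TinyCache:
    """Hypothetical in-memory stand-in for the run-or-retrieve pattern."""

    def __init__(self, runner: Callable[[dict], object]) -> None:
        self.runner = runner
        self.rows: dict[tuple, object] = {}
        self.runs = 0  # how many times the runner was actually invoked

    def run_or_retrieve(self, params: dict) -> object:
        # Sorting the items makes the key independent of dict ordering.
        key = tuple(sorted(params.items()))
        if key not in self.rows:
            self.runs += 1
            self.rows[key] = self.runner(params)
        return self.rows[key]


cache = TinyCache(runner=lambda p: p["n"] * p["steps"])
first = cache.run_or_retrieve({"n": 100, "steps": 5000, "dt": 0.01})
second = cache.run_or_retrieve({"dt": 0.01, "steps": 5000, "n": 100})
```

After both calls, cache.runs is 1: the second call is served from the stored result even though the keys were given in a different order.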

The result record

Every Store method that returns a record returns an instance of your result_cls. The four reserved columns from Base are always present; the rest come from your model.

record.id            # "a3f8c1d2e4b6f7a8" — 16-char hash, primary key
record.result_file   # "./results/a3f8c1d2e4b6f7a8.npy"
record.created_at    # datetime — UTC, set on insert
record.custom_data   # {"elapsed_seconds": 0.042}
record.n             # 100
record.dt            # 0.01
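One plausible way to obtain a stable 16-character ID from a parameter dict is to hash a canonical serialisation of it. The sketch below assumes JSON with sorted keys and SHA-256; params_id is a hypothetical helper, and entropic's actual hashing scheme may differ.

```python
import hashlib
import json


def params_id(params: dict, length: int = 16) -> str:
    """Hypothetical: derive a short, stable ID from a parameter dict."""
    # Sorted keys and fixed separators give the same string for the same
    # parameters regardless of insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = params_id({"n": 100, "steps": 5000, "dt": 0.01})
b = params_id({"dt": 0.01, "steps": 5000, "n": 100})
c = params_id({"n": 100, "steps": 5000, "dt": 0.005})
```

Here a and b are identical because only key order differs, while c changes because dt does.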

Retrieving without running

record = store.retrieve({"n": 100, "steps": 5000, "dt": 0.01})

Returns the model instance on a hit, None on a miss.

Forcing a re-run

run always invokes the runner. Identical parameters hash to the same row, so a forced re-run overwrites the existing record (and its result file) for that hash:

record = store.run({"n": 100, "steps": 5000, "dt": 0.01})

Deleting runs

store.delete({"n": 100, "steps": 5000, "dt": 0.01})

Pass remove_file=True to also delete the result file from disk:

store.delete({"n": 100, "steps": 5000, "dt": 0.01}, remove_file=True)

Returns True if a row was removed, False otherwise.

Registering external files

If a result file was produced outside entropic, index it via register:

store.register(
    {"n": 100, "steps": 5000, "dt": 0.01},
    result_file="./results/my_existing_run.npy",
)

The file must already exist. After registration the row is reachable via retrieve like any other run.

Parameter sweeps

sweep is the batch counterpart to run_or_retrieve: it takes an iterable of param dicts, reuses cached entries, and only invokes the runner for new parameter sets. It makes no assumption about how the sets relate, so any sweep shape is just an iterable.

For the common full-product case, build the iterable with expand_grid — each key maps to a list of candidate values, and it returns one dict per combination:

from entropic import expand_grid

records = store.sweep(
    expand_grid({"n": [100], "steps": [5000], "dt": [0.01, 0.005, 0.001]})
)
# runs 3 combinations: (n=100, steps=5000, dt=0.01), (…, dt=0.005), (…, dt=0.001)

For a multi-axis product:

records = store.sweep(
    expand_grid({"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]})
)
# runs 4 combinations (steps=5000 throughout):
# (n=50, dt=0.01), (50, 0.005), (100, 0.01), (100, 0.005)
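For intuition, a full-product expansion can be written in a few lines with itertools.product. This is a hedged stand-in, not entropic's source: expand_grid_sketch is a hypothetical name, and whether the real expand_grid returns a list or a lazy iterator is not stated here.

```python
from itertools import product
from typing import Any, Iterator


def expand_grid_sketch(grid: dict[str, list[Any]]) -> Iterator[dict[str, Any]]:
    """Hypothetical stand-in: yield one dict per combination of grid values."""
    keys = list(grid)
    # product varies the last key fastest, like nested for-loops.
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


combos = list(expand_grid_sketch({"n": [50, 100], "dt": [0.01, 0.005]}))
```

With this two-axis grid, combos holds four dicts in the order (50, 0.01), (50, 0.005), (100, 0.01), (100, 0.005).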

Because sweep takes a plain iterable, non-product sweeps need no special support — build the dicts however you like:

# zipped / diagonal sweep
records = store.sweep(
    [{"n": n, "steps": 5000, "dt": dt} for n, dt in zip([50, 100], [0.01, 0.005])]
)

# filtered product (drop unstable regions)
grid = {"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]}
records = store.sweep(p for p in expand_grid(grid) if p["dt"] * p["n"] < 1.0)

To parallelise with Dask:

from dask.distributed import Client

with Client() as dask_client:
    records = store.sweep(
        expand_grid({"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]}),
        client=dask_client,
    )

Custom metadata

Any extra keyword argument passed to run, run_or_retrieve, or register is stored in the row’s custom_data JSON column:

record = store.run_or_retrieve(
    {"n": 100, "steps": 5000, "dt": 0.01},
    git_sha="abc123",
    note="initial sweep",
)
record.custom_data
# {"elapsed_seconds": 0.042, "git_sha": "abc123", "note": "initial sweep"}

elapsed_seconds is added automatically on actual runs.
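The combination of automatic timing and user-supplied keys can be sketched as follows. run_with_metadata is a hypothetical helper written for illustration; how entropic itself measures the time or resolves a clash between a user key and elapsed_seconds is an assumption here.

```python
import time
from typing import Any, Callable


def run_with_metadata(
    runner: Callable[[dict], None], params: dict, **custom: Any
) -> dict:
    """Hypothetical: time a runner call and merge the timing with extra keys."""
    start = time.perf_counter()
    runner(params)
    elapsed = time.perf_counter() - start
    # In this sketch a user-supplied "elapsed_seconds" would override the
    # measured value; the real library may behave differently.
    return {"elapsed_seconds": elapsed, **custom}


meta = run_with_metadata(
    lambda p: None, {"n": 100}, git_sha="abc123", note="initial sweep"
)
```

The resulting dict has the same three keys as the custom_data example above, with elapsed_seconds holding the measured wall-clock time.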

Logging

entropic attaches a NullHandler by default, so it stays silent until you configure logging. To enable logging:

import logging
logging.getLogger("entropic").addHandler(logging.StreamHandler())
logging.getLogger("entropic").setLevel(logging.INFO)

This logs cache hits, run completions, ingestion, and file operations.
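If you want timestamps or level names in the output, a formatter can be attached to the handler using only the standard logging module. The sketch below writes to an in-memory buffer so the result can be inspected; the message text is simulated, since the library's actual log lines are not shown in this guide.

```python
import io
import logging

stream = io.StringIO()  # stand-in for sys.stderr so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("entropic")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Simulated messages, emitted here only to demonstrate the configuration.
logger.info("example message")
logger.debug("hidden at INFO level")

output = stream.getvalue()
```

Only the INFO message reaches the buffer; lowering the level to logging.DEBUG would let the second message through as well.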