Quickstart

Basic usage

Define a SQLAlchemy model whose columns mirror the parameters your simulation takes, then write a runner that saves its output to params["result_file"]. Pass both to a Store:

from pathlib import Path

import numpy as np

from entropic import Store, Base, Mapped


class SimResult(Base):
    __tablename__ = "results"

    n: Mapped[int]
    steps: Mapped[int]
    dt: Mapped[float]


def my_sim(params: dict) -> None:
    data = np.random.randn(params["n"], params["steps"])
    np.save(params["result_file"], data)


store = Store(
    runner=my_sim,
    result_cls=SimResult,
    results_dir="./results",
    db_url="sqlite:///./runs.sqlite3",
    file_suffix=".npy",
)

record = store.run_or_retrieve({"n": 100, "steps": 5000, "dt": 0.01})
data = np.load(record.result_file)

The first call runs the simulation. Every subsequent call with the same parameters returns the cached row without re-running.

The result record

Every Store method that returns a record returns an instance of your result_cls. The four reserved columns from Base (id, result_file, created_at, custom_data) are always present; the rest come from your model.

record.id            # "a3f8c1d2e4b6f7a8" — 16-char hash, primary key
record.result_file   # "./results/a3f8c1d2e4b6f7a8.npy"
record.created_at    # datetime — UTC, set on insert
record.custom_data   # {"elapsed_seconds": 0.042}
record.n             # 100
record.dt            # 0.01

Retrieving without running

record = store.retrieve({"n": 100, "steps": 5000, "dt": 0.01})

Returns the model instance on a hit, None on a miss.

Forcing a re-run

run always invokes the runner. Identical parameters hash to the same id, so a forced re-run overwrites the existing record (and its result file) for that hash:

record = store.run({"n": 100, "steps": 5000, "dt": 0.01})
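
To make the "same parameters, same row" behaviour concrete, here is a minimal sketch of how a 16-character id could be derived from a parameter dict. It is an illustration only: params_hash is a hypothetical helper, and the choice of canonical JSON plus SHA-256 is an assumption, not necessarily what entropic does internally.

```python
import hashlib
import json


def params_hash(params: dict, length: int = 16) -> str:
    """Hypothetical helper: derive a short, stable id from a params dict."""
    # sort_keys makes the encoding independent of key insertion order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # Assumption: SHA-256 truncated to `length` hex characters
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = params_hash({"n": 100, "steps": 5000, "dt": 0.01})
b = params_hash({"dt": 0.01, "steps": 5000, "n": 100})  # same values, different key order
c = params_hash({"n": 100, "steps": 5000, "dt": 0.005})  # one value changed

print(a == b)  # same parameters give the same id
print(a == c)  # different parameters give a different id
```

Under a scheme like this, any change to a parameter value yields a different id and therefore a separate row and result file, while repeating a parameter set always addresses the same row.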

Deleting runs

store.delete({"n": 100, "steps": 5000, "dt": 0.01})

Pass remove_file=True to also delete the result file from disk:

store.delete({"n": 100, "steps": 5000, "dt": 0.01}, remove_file=True)

Returns True if a row was removed, False otherwise.

Registering external files

If a result file was produced outside entropic, index it via register:

store.register(
    {"n": 100, "steps": 5000, "dt": 0.01},
    result_file="./results/my_existing_run.npy",
)

The file must already exist. After registration the row is reachable via retrieve like any other run.

Parameter sweeps

Pass a grid dict in which each key maps to a list of candidate values. sweep expands all combinations via itertools.product, reuses cached entries, and invokes the runner only for parameter sets it has not seen before:

records = store.sweep({"n": [100], "steps": [5000], "dt": [0.01, 0.005, 0.001]})
# runs 3 combinations: (n=100, steps=5000, dt=0.01), (…, dt=0.005), (…, dt=0.001)

For a multi-axis sweep:

records = store.sweep({"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]})
# runs 4 combinations of (n, dt): (50, 0.01), (50, 0.005), (100, 0.01), (100, 0.005)

To parallelise with Dask:

from dask.distributed import Client

with Client() as dask_client:
    records = store.sweep(
        {"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]},
        client=dask_client,
    )
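
The grid expansion described above can be sketched with itertools.product directly. expand_grid is a hypothetical helper written for illustration; it shows the Cartesian-product step only and is not entropic's own code (caching and runner dispatch are left out).

```python
from itertools import product


def expand_grid(grid: dict) -> list[dict]:
    """Hypothetical helper: build one params dict per combination in the grid."""
    keys = list(grid)
    # product yields one tuple of values per combination, last key varying fastest
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


combos = expand_grid({"n": [50, 100], "steps": [5000], "dt": [0.01, 0.005]})
for params in combos:
    print(params)
```

The number of combinations is the product of the list lengths (here 2 × 1 × 2 = 4), so adding an axis with several values multiplies the number of runs.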

Custom metadata

Any extra keyword argument passed to run, run_or_retrieve, or register is stored in the row’s custom_data JSON column:

record = store.run_or_retrieve(
    {"n": 100, "steps": 5000, "dt": 0.01},
    git_sha="abc123",
    note="initial sweep",
)
record.custom_data
# {"elapsed_seconds": 0.042, "git_sha": "abc123", "note": "initial sweep"}

elapsed_seconds is added automatically on actual runs.

Logging

By default entropic attaches a NullHandler, so it emits no log output. To enable logging:

import logging
logging.getLogger("entropic").addHandler(logging.StreamHandler())
logging.getLogger("entropic").setLevel(logging.INFO)

This logs cache hits, run completions, ingestion, and file operations.
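
If timestamps and level names are useful, a formatter can be attached to the handler. This sketch uses only the standard logging module; the format string is an arbitrary choice, and the guard against adding the handler twice is a convenience for scripts or notebooks that re-run the setup.

```python
import logging

logger = logging.getLogger("entropic")

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

# Avoid stacking duplicate stream handlers if this setup code runs more than once
if not any(type(h) is logging.StreamHandler for h in logger.handlers):
    logger.addHandler(handler)

logger.setLevel(logging.INFO)
```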