# API Reference ## `Store` ```python class Store(Generic[ModelT]): def __init__( self, runner: Callable[[dict[str, Any]], None], result_cls: type[ModelT], results_dir: str | Path = "./results", file_suffix: str = ".h5", db_url: str = "sqlite:///db.sqlite3", ) -> None ``` The main entry point. Creates `results_dir` if it does not exist and runs `metadata.create_all` on the engine derived from `db_url`. | Parameter | Description | | ------------- | ---------------------------------------------------------------------------------------------- | | `runner` | Callable invoked as `runner(params)`. The Store injects `params["result_file"]` before calling. | | `result_cls` | User-defined SQLAlchemy model subclassing `entropic.Base`. Columns must mirror `params` keys. | | `results_dir` | Directory where result files (and ingest sidecars) live. Created if missing. | | `file_suffix` | Extension appended to auto-generated result filenames (e.g. `".h5"`, `".npy"`, `".csv"`). | | `db_url` | SQLAlchemy URL for the backing database. SQLite by default; any dialect SQLAlchemy supports. | `Store` is generic in `ModelT`; methods that return a record are typed as `ModelT` so your editor sees the user-defined columns. ### Methods #### `run_or_retrieve` ```python def run_or_retrieve( self, params: dict[str, Any], **custom_data: Any, ) -> ModelT ``` Returns the cached row if `params` hashes to an existing primary key. Otherwise calls `run` and persists the new row. `custom_data` is forwarded to the runner and stored on the row's `custom_data` column when a run actually happens. #### `run` ```python def run( self, params: dict[str, Any], **custom_data: Any, ) -> ModelT ``` Always executes the runner and persists. Same params hash to the same primary key, so a re-run overwrites the existing row (and the file at the same path). `elapsed_seconds` is automatically added to `custom_data`. #### `retrieve` ```python def retrieve(self, params: dict[str, Any]) -> ModelT | None ``` Look up a row by exact parameter match. Returns `None` on a miss. If `params` contains an explicit `id` it is used verbatim and hashing is skipped; otherwise the reserved keys (`result_file`, `created_at`, `custom_data`) are stripped from a copy and the rest is hashed. #### `register` ```python def register( self, params: dict[str, Any], result_file: str | Path, **custom_data: Any, ) -> ModelT ``` Index an externally-produced result file. Raises `FileNotFoundError` if `result_file` does not exist. #### `sweep` ```python def sweep( self, params: Iterable[dict[str, Any]], client: Client | None = None, ) -> list[ModelT] ``` Batch counterpart to `run_or_retrieve`: run or retrieve a result for every parameter set in `params`. `sweep` makes no assumption about how the sets relate — full Cartesian products, zipped/diagonal sweeps, sampled or filtered sets are all just iterables of dicts. For the common full-product case, build the iterable with [`expand_grid`](#expand_grid). `params` is consumed once, so generators are fine. Duplicate parameter sets (same hash) are de-duplicated. Cached entries are reused; only misses invoke the runner. If `client` is a Dask `distributed.Client`, new runs are dispatched as futures via `client.map` and gathered before returning. On any error the client falls back to serial execution. #### `delete` ```python def delete(self, params: dict[str, Any], remove_file: bool = False) -> bool ``` Delete a row by exact parameter match. If `remove_file=True`, also unlinks the result file. Returns `True` if a row was removed. ## `expand_grid` ```python def expand_grid(grid: dict[str, list[Any]]) -> list[dict[str, Any]] ``` Convenience builder for the common full-product sweep: expands a grid (each key mapped to a list of candidate values) into the full Cartesian product, in `itertools.product` order. Feed the result straight to [`sweep`](#sweep): ```python from entropic import expand_grid store.sweep(expand_grid({"alpha": [1, 2, 3], "beta": [0.1, 0.2]})) ``` `expand_grid` is the only product-expansion helper entropic ships; non-product sweeps (zip, sampling, filtering) are expressed directly as iterables of dicts. ## `Base` — record schema ```python from entropic import Base, Mapped, mapped_column class SimResult(Base): __tablename__ = "results" # your columns — must match keys in your params dicts n: Mapped[int] dt: Mapped[float] ``` `Base` is a SQLAlchemy `DeclarativeBase` subclass that defines four reserved columns: | Column | Type | Description | | ------------- | ------------------- | ---------------------------------------------------------------------- | | `id` | `str` (PK) | 16-character hex hash of params. | | `result_file` | `str` | Path to the result file on disk. | | `created_at` | `datetime` | UTC timestamp, default `datetime.utcnow` at insert. | | `custom_data` | `dict[str, Any]` | Mutable JSON column. Always non-null; defaults to `{}`. | The four reserved column names cannot be redefined as user columns. `Base` also provides `apply_patch(data)` and `_apply_custom_data_patch(patch)` for partial updates: a `None` value on `custom_data` keys removes them, an empty dict clears the column, otherwise keys are merged. ## Runner contract ```python Runner = Callable[[dict[str, Any]], None] def my_runner(params: dict[str, Any]) -> None: # params["result_file"] is the path to write to (auto-injected by the Store) # everything else is your simulation parameters ... ``` entropic is format-agnostic — HDF5, NumPy, Parquet, CSV, anything works. ## Parameter hashing Parameters are normalized before hashing to ensure stability across Python runs: - **Dict keys** are sorted recursively. - **Floats** are rounded to 12 decimal digits (suppresses IEEE 754 noise). - **Enums** are replaced by their `.value`. - **Lists and tuples** preserve order; tuples become lists; each element is normalized. - **Everything else** falls back to `str()`. The normalized structure is serialized to compact JSON and hashed with SHA-256. The first 16 hex characters (64 bits) are used as the row's primary key. `{"dt": 0.1, "n": 100}` and `{"n": 100, "dt": 0.1}` produce the same hash. ## Reserved keys in `params` `id`, `result_file`, `created_at`, and `custom_data` are stripped from a copy of `params` before hashing (so passing them is harmless — they don't pollute the hash). An explicit `id` short-circuits hashing and is used verbatim as the primary key. User-defined params keys must match column names on `result_cls`; extra keys will fail the SQLAlchemy insert.