# API Reference

## `Store`

```python
class Store(Generic[ModelT]):
    def __init__(
        self,
        runner: Callable[[dict[str, Any]], None],
        result_cls: type[ModelT],
        results_dir: str | Path = "./results",
        file_suffix: str = ".h5",
        db_url: str = "sqlite:///db.sqlite3",
    ) -> None
```

The main entry point. Creates `results_dir` if it does not exist and runs
`metadata.create_all` on the engine derived from `db_url`.

| Parameter     | Description                                                                                    |
| ------------- | ---------------------------------------------------------------------------------------------- |
| `runner`      | Callable invoked as `runner(params)`. The Store injects `params["result_file"]` before calling. |
| `result_cls`  | User-defined SQLAlchemy model subclassing `entropic.Base`. Columns must mirror `params` keys.   |
| `results_dir` | Directory where result files (and ingest sidecars) live. Created if missing.                    |
| `file_suffix` | Extension appended to auto-generated result filenames (e.g. `".h5"`, `".npy"`, `".csv"`).       |
| `db_url`      | SQLAlchemy URL for the backing database. SQLite by default; any dialect SQLAlchemy supports.    |

`Store` is generic in `ModelT`; methods that return a record are typed as
`ModelT` so your editor sees the user-defined columns.

### Methods

#### `run_or_retrieve`

```python
def run_or_retrieve(
    self,
    params: dict[str, Any],
    **custom_data: Any,
) -> ModelT
```

Returns the cached row if `params` hashes to an existing primary key. Otherwise
calls `run` and persists the new row. `custom_data` is forwarded to `run` and
stored on the row's `custom_data` column only when a run actually happens.

#### `run`

```python
def run(
    self,
    params: dict[str, Any],
    **custom_data: Any,
) -> ModelT
```

Always executes the runner and persists the resulting row. Identical params
hash to the same primary key, so a re-run overwrites the existing row (and the
file at the same path).

`elapsed_seconds` is automatically added to `custom_data`.

#### `retrieve`

```python
def retrieve(self, params: dict[str, Any]) -> ModelT | None
```

Look up a row by exact parameter match. Returns `None` on a miss.

If `params` contains an explicit `id` it is used verbatim and hashing is
skipped; otherwise the reserved keys (`result_file`, `created_at`, `custom_data`)
are stripped from a copy and the rest is hashed.

#### `register`

```python
def register(
    self,
    params: dict[str, Any],
    result_file: str | Path,
    **custom_data: Any,
) -> ModelT
```

Index an externally-produced result file. Raises `FileNotFoundError` if
`result_file` does not exist.

#### `sweep`

```python
def sweep(
    self,
    params: Iterable[dict[str, Any]],
    client: Client | None = None,
) -> list[ModelT]
```

Batch counterpart to `run_or_retrieve`: run or retrieve a result for every
parameter set in `params`. `sweep` makes no assumption about how the sets
relate — full Cartesian products, zipped/diagonal sweeps, sampled or filtered
sets are all just iterables of dicts. For the common full-product case, build
the iterable with [`expand_grid`](#expand_grid).

`params` is consumed once, so generators are fine. Duplicate parameter sets
(same hash) are de-duplicated. Cached entries are reused; only misses invoke
the runner.
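
As a rough illustration of the "just iterables of dicts" point, the sketch below builds a zipped sweep and a filtered sweep as plain generators. The parameter names `alpha` and `beta`, the filter condition, and both function names are made up for this example; only the shape of the output (an iterable of `params` dicts) comes from the description above.

```python
from typing import Any, Iterator

ALPHAS = [1, 2, 3]
BETAS = [0.1, 0.2, 0.3]


def zipped_sweep() -> Iterator[dict[str, Any]]:
    """Diagonal sweep: pair the i-th alpha with the i-th beta."""
    for alpha, beta in zip(ALPHAS, BETAS):
        yield {"alpha": alpha, "beta": beta}


def filtered_sweep() -> Iterator[dict[str, Any]]:
    """Full product, keeping only combinations that pass a (hypothetical) filter."""
    for alpha in ALPHAS:
        for beta in BETAS:
            if alpha * beta < 0.5:
                yield {"alpha": alpha, "beta": beta}


# Either generator could be handed to sweep directly, e.g.
#   store.sweep(zipped_sweep())
# assuming `store` is an existing Store instance.
```

Chaining the two generators with `itertools.chain` would yield some parameter sets twice; per the description above, `sweep` de-duplicates those by hash.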

If `client` is a Dask `distributed.Client`, new runs are dispatched as futures
via `client.map` and gathered before returning. On any error from the client,
`sweep` falls back to serial execution.

#### `delete`

```python
def delete(self, params: dict[str, Any], remove_file: bool = False) -> bool
```

Delete a row by exact parameter match. If `remove_file=True`, also unlinks
the result file. Returns `True` if a row was removed.

## `expand_grid`

```python
def expand_grid(grid: dict[str, list[Any]]) -> list[dict[str, Any]]
```

Convenience builder for the common full-product sweep: expands a grid (each key
mapped to a list of candidate values) into the full Cartesian product, in
`itertools.product` order. Feed the result straight to [`sweep`](#sweep):

```python
from entropic import expand_grid

store.sweep(expand_grid({"alpha": [1, 2, 3], "beta": [0.1, 0.2]}))
```

`expand_grid` is the only product-expansion helper entropic ships; non-product
sweeps (zip, sampling, filtering) are expressed directly as iterables of dicts.
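
For readers who want to see what "full Cartesian product, in `itertools.product` order" means concretely, here is a minimal stand-in written against the documented signature. It is not entropic's implementation, and the name `expand_grid_sketch` is hypothetical; it only shows the expected ordering, with the last key varying fastest.

```python
from itertools import product
from typing import Any


def expand_grid_sketch(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Expand {key: [candidates]} into one dict per combination."""
    keys = list(grid)  # dicts preserve insertion order
    combos = product(*(grid[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


points = expand_grid_sketch({"alpha": [1, 2, 3], "beta": [0.1, 0.2]})
# 3 * 2 = 6 parameter sets; `beta` cycles before `alpha` advances.
```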

## `Base` — record schema

```python
from entropic import Base, Mapped, mapped_column

class SimResult(Base):
    __tablename__ = "results"

    # your columns — must match keys in your params dicts
    n: Mapped[int]
    dt: Mapped[float]
```

`Base` is a SQLAlchemy `DeclarativeBase` subclass that defines four reserved
columns:

| Column        | Type                | Description                                                            |
| ------------- | ------------------- | ---------------------------------------------------------------------- |
| `id`          | `str` (PK)          | 16-character hex hash of params.                                       |
| `result_file` | `str`               | Path to the result file on disk.                                       |
| `created_at`  | `datetime`          | UTC timestamp, default `datetime.utcnow` at insert.                    |
| `custom_data` | `dict[str, Any]`    | Mutable JSON column. Always non-null; defaults to `{}`.                |

The four reserved column names cannot be redefined as user columns.

`Base` also provides `apply_patch(data)` and `_apply_custom_data_patch(patch)`
for partial updates. For `custom_data`, a key whose value is `None` is removed,
an empty dict clears the column, and any other keys are merged into the
existing value.
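
The patch rules for `custom_data` can be pictured with a small pure function. This is a sketch of the described semantics only, assuming a shallow (top-level) merge; `patch_custom_data` is a hypothetical name and does not touch SQLAlchemy or the real `Base` methods.

```python
from typing import Any


def patch_custom_data(
    current: dict[str, Any], patch: dict[str, Any]
) -> dict[str, Any]:
    """Return a new dict with `patch` applied to `current`."""
    if not patch:
        # An empty patch clears the column.
        return {}
    merged = dict(current)  # copy so the input is left untouched
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)  # None removes the key if present
        else:
            merged[key] = value  # anything else is merged in
    return merged
```

Under these assumptions, patching `{"tag": "a", "note": "x"}` with `{"note": None, "seed": 7}` leaves `{"tag": "a", "seed": 7}`.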

## Runner contract

```python
Runner = Callable[[dict[str, Any]], None]

def my_runner(params: dict[str, Any]) -> None:
    # params["result_file"] is the path to write to (auto-injected by the Store)
    # everything else is your simulation parameters
    ...
```

entropic is format-agnostic — HDF5, NumPy, Parquet, CSV, anything works.
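
To make the contract concrete, here is a toy runner that writes CSV using only the standard library. The simulation itself (forward Euler on dx/dt = -x) and the parameter names `n` and `dt` are placeholders chosen for this sketch; the only parts taken from the contract are the single `params` argument, the `None` return, and writing to `params["result_file"]`.

```python
import csv
from pathlib import Path
from typing import Any


def csv_runner(params: dict[str, Any]) -> None:
    """Integrate dx/dt = -x for n steps of size dt and write the trajectory."""
    out_path = Path(params["result_file"])  # injected by the Store
    n, dt = params["n"], params["dt"]

    x = 1.0
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["step", "t", "x"])
        for step in range(n + 1):
            writer.writerow([step, step * dt, x])
            x += -x * dt  # forward Euler update
```

A runner like this would presumably be paired with `file_suffix=".csv"` so the auto-generated filename matches the format being written.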

## Parameter hashing

Parameters are normalized before hashing to ensure stability across Python runs:

- **Dict keys** are sorted recursively.
- **Floats** are rounded to 12 decimal digits (suppresses IEEE 754 noise).
- **Enums** are replaced by their `.value`.
- **Lists and tuples** preserve order; tuples become lists; each element is normalized.
- **Everything else** falls back to `str()`.

The normalized structure is serialized to compact JSON and hashed with SHA-256.
The first 16 hex characters (64 bits) are used as the row's primary key.

`{"dt": 0.1, "n": 100}` and `{"n": 100, "dt": 0.1}` produce the same hash.
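
The rules above can be sketched in a few lines of standard-library Python. This follows the listed steps but is not guaranteed to reproduce entropic's actual ids: the function names are hypothetical, and the sketch assumes JSON-native scalars (`str`, `int`, `bool`, `None`) pass through unchanged, which the list above does not spell out.

```python
import hashlib
import json
from enum import Enum
from typing import Any


def normalize(value: Any) -> Any:
    """Reduce a parameter value to a stable, JSON-serializable form."""
    if isinstance(value, Enum):
        return normalize(value.value)
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: str(kv[0]))
        return {str(k): normalize(v) for k, v in items}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]  # order kept, tuples become lists
    if isinstance(value, float):
        return round(value, 12)
    if value is None or isinstance(value, (bool, int, str)):
        return value  # assumption: JSON-native scalars are left as-is
    return str(value)


def params_hash(params: dict[str, Any]) -> str:
    """Compact JSON -> SHA-256 -> first 16 hex characters."""
    payload = json.dumps(normalize(params), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```

With this sketch, key order does not matter, `(1, 2)` and `[1, 2]` hash identically, and `0.1 + 0.2` hashes the same as `0.3` because both round to the same 12-digit value.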

## Reserved keys in `params`

`result_file`, `created_at`, and `custom_data` are stripped from a copy of
`params` before hashing, so passing them is harmless: they do not affect the
hash. `id` is also reserved: an explicit `id` short-circuits hashing and is
used verbatim as the primary key.
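
A small sketch of that handling, under the assumption that the Store first checks for an explicit `id` and otherwise hashes whatever remains after the reserved keys are dropped. `split_reserved` and `RESERVED_KEYS` are hypothetical names used only for this illustration.

```python
from __future__ import annotations

from typing import Any

RESERVED_KEYS = frozenset({"id", "result_file", "created_at", "custom_data"})


def split_reserved(params: dict[str, Any]) -> tuple[str | None, dict[str, Any]]:
    """Return (explicit_id, hashable_params) without mutating `params`."""
    explicit_id = params.get("id")  # None when no id was supplied
    hashable = {k: v for k, v in params.items() if k not in RESERVED_KEYS}
    return explicit_id, hashable
```

If `explicit_id` is not `None` it would be used as the primary key directly; otherwise only `hashable` would feed the hash, so two calls that differ only in `result_file` or `custom_data` resolve to the same row.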

User-defined params keys must match column names on `result_cls`; extra keys
will fail the SQLAlchemy insert.