Evaluation¶
Read this first¶
- Quick run: flash_ansr evaluate-run -c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml --experiment flash_ansr_fastsrb_choices_00032 -v
- Outputs: pickles with entries containing expression, log_prob, fits (per-dataset metrics), and optional placeholder entries when data generation fails but counts must stay aligned.
- Scope: shared engine covers FlashANSR, PySR, NeSymReS, SkeletonPool, BruteForce, and E2E baselines via a single YAML config.
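A result pickle can be inspected with a few lines of Python. The sketch below assumes the pickle deserializes to a list of dicts and that a placeholder is marked by a truthy placeholder key; the real on-disk layout may differ, and the helper names summarize_entries and load_entries are hypothetical, not part of flash_ansr.

```python
import pickle
from pathlib import Path


def summarize_entries(entries):
    """Count real results vs. placeholders in a list of result entries.

    Assumption: each entry is a dict, and a placeholder carries a truthy
    "placeholder" key. Adjust this to the layout your pickles actually use.
    """
    placeholders = sum(1 for entry in entries if entry.get("placeholder"))
    return {
        "total": len(entries),
        "placeholders": placeholders,
        "evaluated": len(entries) - placeholders,
    }


def load_entries(path):
    """Load a pickled list of entries (only unpickle files you trust)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Toy entries standing in for a real results file.
demo = [
    {"expression": "x1 + 2.0", "log_prob": -1.3, "fits": []},
    {"placeholder": True},
]
print(summarize_entries(demo))
```

For a real run, pass the output of load_entries(<path to results pickle>) to summarize_entries instead of the toy list.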
General workflow¶
- Use flash_ansr evaluate-run -c <config> as the single entrypoint; configs live under configs/evaluation/ (with scaling/, noise_sweep/, and support_sweep/ families).
- Each config wires a data_source, a model_adapter, and a runner (persistence/resume). The same structure covers FlashANSR, PySR, NeSymReS, and baselines.
- runner.resume allows checkpointed pickles to continue; placeholders are inserted when sample generation fails so counts stay consistent.
- datasets_per_expression controls how many deterministic datasets are generated per skeleton/equation; the sampling mode has been removed.
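The three-block structure can be checked before launching a long run. This is a minimal sketch that works on an already-parsed config dict for a single run definition; the helper missing_blocks and the example values are illustrative assumptions, not part of the flash_ansr API.

```python
REQUIRED_BLOCKS = ("data_source", "model_adapter", "runner")


def missing_blocks(config):
    """Return the required top-level blocks absent from a parsed config dict."""
    return [name for name in REQUIRED_BLOCKS if name not in config]


# Hypothetical parsed config; only the block names and the two fields shown
# (model_adapter.type, runner.resume) come from the text above.
example = {
    "data_source": {},
    "model_adapter": {"type": "skeleton_pool"},
    "runner": {"resume": True},
}
print(missing_blocks(example))         # []
print(missing_blocks({"runner": {}}))  # ['data_source', 'model_adapter']
```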
Models and baselines¶
- FlashANSR: Default adapter; supports generation overrides (beam/softmax/MCTS) and prompt options. See the scaling configs under configs/evaluation/scaling/v23.0-*/.
- PySR: Adapter expects PySR to be installed; config fields mirror PySR runtime knobs (timeout, iterations, parsimony). Watchdog helper: scripts/evaluate_PySR.py.
- NeSymReS: Adapter expects an external checkout plus checkpoint paths; exposes beam width/restarts. See run_nesymres.yaml and the scaling configs.
- SkeletonPoolModel (baseline): Transformer-free baseline that samples skeletons from a provided pool and only refines constants. Configure via model_adapter.type: skeleton_pool (or add a dedicated config entry) with pool path/config, samples, unique, ignore_holdouts, and seed. Useful for ablations and replication.
- BruteForceModel (baseline): Exhaustive baseline over a provided skeleton pool. Configs live alongside the scaling files.
- E2E: External transformer baseline. Requires the authors' model1.pt, a working symbolicregression install, and the e2e_fastsrb scaling config.
External model setup (one-time)¶
PySR
1. Install PySR into the same environment as flash-ansr: pip install pysr.
2. Trigger Julia precompilation (first import is slow): python -c "from pysr import PySRRegressor".
3. Optional but recommended for long sweeps: use the watchdog wrapper python scripts/evaluate_PySR.py -c <config> --experiment <name> -v to auto-restart if PySR stalls.
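The auto-restart idea can be illustrated with a generic loop around subprocess. This is a sketch of the pattern only, not the contents of scripts/evaluate_PySR.py: it treats "stalled" as "exceeded a wall-clock timeout", and the function name run_with_restarts and its parameters are hypothetical.

```python
import subprocess
import sys


def run_with_restarts(cmd, timeout_s, max_restarts=3):
    """Run cmd; restart it if it exceeds timeout_s or exits non-zero.

    Returns the number of attempts used, or raises RuntimeError once the
    initial attempt and all restarts have failed.
    """
    for attempt in range(1, max_restarts + 2):
        try:
            # subprocess.run kills the child when the timeout expires.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out after {timeout_s}s, restarting")
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit code {result.returncode}, restarting")
    raise RuntimeError(f"command failed after {max_restarts + 1} attempts")


# Demo with a trivial command; in practice cmd would be the evaluation call.
print(run_with_restarts([sys.executable, "-c", "pass"], timeout_s=30))
```

A restart loop like this is only useful when the restarted process can pick up where it left off, which is what the runner's resume behaviour provides.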
NeSymReS
1. Clone their repo (see README) and install: pip install -e nesymres/NeuralSymbolicRegressionThatScales/src.
2. Install Lightning compatible with the checkpoint loader: pip install pytorch-lightning==2.5.6.
3. Patch Python 3.13 incompatibilities: python scripts/patch_typing_io.py then python scripts/patch_nesymres.py nesymres/NeuralSymbolicRegressionThatScales.
4. Place the checkpoint triplet under models/nesymres/: eq_setting.json, config.yaml, 100M.ckpt.
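Before running the NeSymReS adapter, it can help to confirm that the checkpoint triplet is in place. The sketch below uses only the file names and directory listed in step 4; the helper missing_checkpoint_files is a hypothetical convenience, not part of the repository.

```python
from pathlib import Path

NESYMRES_FILES = ("eq_setting.json", "config.yaml", "100M.ckpt")


def missing_checkpoint_files(root="models/nesymres"):
    """Return the expected NeSymReS files that are not present under root."""
    root = Path(root)
    return [name for name in NESYMRES_FILES if not (root / name).is_file()]


missing = missing_checkpoint_files()
if missing:
    print("Missing NeSymReS files:", ", ".join(missing))
else:
    print("NeSymReS checkpoint triplet found.")
```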
E2E (End-to-end symbolic regression)
1. From e2e/symbolicregression, install dependencies (pip install -r requirements.txt or use the authors' environment.yml).
2. Patch for modern numpy + scaler guard + tree_idx compatibility (idempotent): python scripts/patch_symbolicregression.py e2e/symbolicregression.
3. Install the method in editable mode from the same directory: pip install -e . (the trailing dot is part of the command).
4. Install the required sympytorch fork: pip install git+https://github.com/pakamienny/sympytorch.git.
5. Download the pretrained checkpoint to e2e/model1.pt (mirror of https://dl.fbaipublicfiles.com/symbolicregression/model1.pt). Keep the filename as-is; the scaling config points there.
Configs at a glance¶
- Evaluation configs live under configs/evaluation/ (families: scaling/, noise_sweep/, support_sweep/).
- Each file is a single run definition: data_source, model_adapter, and runner blocks.
- Multi-experiment configs run all experiments when --experiment is omitted; pass a name to isolate one.
- Outputs default to results/evaluation/... as specified in the config; override with -o/--output-file.
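The --experiment selection rule amounts to "all unless one is named". A minimal sketch of that rule, assuming experiments are held in a name-to-definition mapping; the function select_experiments and the second experiment name are hypothetical and do not reflect how flash_ansr stores experiments internally.

```python
def select_experiments(experiments, name=None):
    """Return the experiments to run: all when name is None, else only that one."""
    if name is None:
        return dict(experiments)
    if name not in experiments:
        raise KeyError(
            f"unknown experiment {name!r}; available: {sorted(experiments)}"
        )
    return {name: experiments[name]}


# Toy mapping; "other_experiment" is a made-up name for illustration.
experiments = {
    "flash_ansr_fastsrb_choices_00032": {},
    "other_experiment": {},
}
print(list(select_experiments(experiments)))
print(list(select_experiments(experiments, "flash_ansr_fastsrb_choices_00032")))
```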
Step-by-step run guide¶
0. Benchmark data¶
Fetch the FastSRB benchmark once (if you do not already have data/ansr-data/test_set/fastsrb/expressions.yaml):
mkdir -p "{{ROOT}}/data/ansr-data/test_set/fastsrb"
wget -O "{{ROOT}}/data/ansr-data/test_set/fastsrb/expressions.yaml" \
"https://raw.githubusercontent.com/viktmar/FastSRB/refs/heads/main/src/expressions.yaml"
This writes skeleton_pool.yaml and skeletons.pkl under the specified output directory.
1. Run evaluation¶
flash_ansr evaluate-run -c configs/evaluation/scaling/v23.0-20M_fastsrb.yaml --experiment flash_ansr_fastsrb_choices_00032 -v
- Adjust -c to any file under configs/evaluation/ and optionally set --experiment.
- Override on the fly: -n/--limit, --save-every, -o/--output-file, --no-resume.
- The runner loads existing partial pickles, skips processed items, and appends new results. If sample generation fails within max_trials, a placeholder entry is written to preserve counts.
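The resume-and-placeholder behaviour described above can be sketched as a small loop. This is an illustration under stated assumptions, not the runner's actual code: it assumes results are stored in item order, that a failed generation returns None, and that a placeholder is a dict with a placeholder key. The names resume_run, generate, and evaluate are hypothetical.

```python
def resume_run(items, generate, evaluate, existing=None, max_trials=3):
    """Continue a partially completed run, keeping one entry per item."""
    results = list(existing or [])       # entries loaded from a partial pickle
    for item in items[len(results):]:    # skip items that were already processed
        sample = None
        for _ in range(max_trials):      # retry sample generation up to max_trials
            sample = generate(item)
            if sample is not None:
                break
        if sample is None:
            results.append({"placeholder": True})  # keeps counts aligned
        else:
            results.append(evaluate(sample))
    return results


# Toy run: item "c" can never be generated, and "a" was processed earlier.
items = ["a", "b", "c", "d"]
existing = [{"expression": "a"}]
out = resume_run(
    items,
    generate=lambda item: None if item == "c" else item,
    evaluate=lambda sample: {"expression": sample},
    existing=existing,
)
print(len(out), out[2])
```

The point of the placeholder is visible in the toy run: the result list has one entry per input item, so index-based comparisons across models stay valid even when an item fails.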
2. Example configs¶
- FlashANSR v23.0-20M scaling: configs/evaluation/scaling/v23.0-20M_fastsrb.yaml
- PySR scaling: configs/evaluation/scaling/pysr_fastsrb.yaml
- NeSymReS scaling: configs/evaluation/scaling/nesymres_fastsrb.yaml
- E2E baseline: configs/evaluation/scaling/e2e_fastsrb.yaml