# Evaluation & Simulation Framework

The d/dx research stack consists of two main components:

1. **BacktestHarness** (`forecasting.simulation.backtest_harness`): Runs the C++ engine to generate features and targets.
2. **AnalysisHarness** (`forecasting.simulation.analysis_harness`): Analyzes the generated results (R^2, PnL, etc.) using Polars.

---

## 1. BacktestHarness (Feature Generation)

The **BacktestHarness** orchestrates the execution of the C++ Event Engine. It is responsible for:

1. **Ingestion**: Loading raw MBO data (Databento `.dbn.zst`) from S3.
2. **Simulation**: Feeding events into the C++ `OrderBook` and `Feature` engines.
3. **Target Generation**: Calculating future returns (e.g. `target_60s`) in Python.
4. **Persistence**: Saving the resulting Feature+Target dataframe to S3 as Parquet.

### Workflow

The BacktestHarness is configuration-driven (`BacktestConfig`). It supports two execution modes:

* **Sequential Execution**: Loops through days locally, for debugging.
* **Distributed Execution**: Using `run_distributed(executor)`, fans out processing of days to hundreds of Modal workers.

---

## 2. AnalysisHarness (Research & Metrics)

The **AnalysisHarness** is the core orchestration layer for *analyzing* the outputs of the backtest. It is designed to solve the common bottlenecks of quantitative research: repetitiveness, I/O bottlenecks, and lack of reproducibility.

### Why AnalysisHarness?

Traditional research workflows often involve ad-hoc scripts that load data, modify it, and print results. This approach breaks down at scale:

- **Memory Bound**: Loading 1 TB of Parquet files into pandas crashes the driver.
- **I/O Bound**: Re-reading the same S3 files for every slightly different experiment is slow.
- **Sequential**: Running 50 backtests one after another takes days.

The Harness addresses these with three core principles:

### 1. Lazy Evaluation (Polars)

Instead of executing operations immediately, the Harness builds a **computation graph** (DAG).
- **Benefit**: You can define complex pipelines over massive datasets without triggering any execution. The data is only touched when you explicitly call `collect()` or `sink_parquet()`.
- **Optimization**: Polars optimizes this graph (predicate pushdown, projection pushdown) to minimize the amount of data read from S3.

### 2. Computational Locality

Experiments are defined as **transformations** of a base LazyFrame ($f(LF) \rightarrow LF$).

- **Benefit**: This allows us to "fan out" multiple experiments from a single data source.
- **Efficiency**: When running distributed, we can send just the *function definition* (a few bytes) to the workers, rather than shuffling gigabytes of DataFrames.

### 3. Distributed Parallelism (Modal)

The `get_metrics()` method automatically detects whether a `ModalExecutor` is provided.

- **Benefit**: It transitions seamlessly from local debugging (sequential) to serverless cloud execution (parallel).
- **Scale**: If you define 100 experiments, the Harness distributes them across hundreds of Modal containers. Each worker pulls its own data slice from S3, computes the metric, and returns only the small result to the driver.

## Architecture

```{mermaid}
graph TD
    A[S3 Data Lake] -->|Lazy Scan| B(Base LazyFrame)
    B --> C{Experiment 1}
    B --> D{Experiment 2}
    B --> E{Experiment 3}
    subgraph Cloud Execution
        C -->|Worker 1| F[Metric Result]
        D -->|Worker 2| G[Metric Result]
        E -->|Worker 3| H[Metric Result]
    end
    F --> I[Aggregated Report]
    G --> I
    H --> I
```