A real-time, job-tracked quantitative pipeline that processes equity data at scale — with live WebSocket updates, formal QA validation, and seamless demo/live modes.
Most financial data projects stop at "I got results." That approach leaves four real gaps.
01
Scalability
Data doesn't stay small. Pandas struggles once data outgrows a single machine's memory — PySpark handles millions of rows across tickers without breaking a sweat.
02
Correctness
Rolling metrics like volatility and drawdown are easy to get wrong. Window functions need care — especially drawdown, whose running peak requires an unbounded cumulative max.
03
Validation
No one validates whether the numbers are actually right. This project does — with a formal QA layer that scores alignment against a trusted benchmark.
04
Observability
Most pipelines are black boxes. This one streams live status via WebSocket so you can see exactly what’s happening at every step.
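The correctness gap above is concrete. As a minimal pandas sketch of the same logic (column names and the sample prices are illustrative; the actual pipeline expresses this with PySpark `Window` specs, noted in the comments), 20-day rolling volatility and drawdown look like this — note that the running peak is an *unbounded* cumulative max, not a fixed window:

```python
import numpy as np
import pandas as pd

def rolling_vol_and_drawdown(close: pd.Series, window: int = 20) -> pd.DataFrame:
    """Benchmark-style rolling metrics for a single ticker's close prices."""
    returns = close.pct_change()
    # Rolling standard deviation of returns over a fixed window.
    # PySpark equivalent: Window.partitionBy("ticker").orderBy("date").rowsBetween(-(window - 1), 0)
    vol = returns.rolling(window).std()
    # Running peak: an UNBOUNDED cumulative max from the first row to the current row.
    # PySpark equivalent: rowsBetween(Window.unboundedPreceding, Window.currentRow)
    peak = close.cummax()
    drawdown = close / peak - 1.0      # <= 0 by construction
    max_drawdown = drawdown.cummin()   # worst peak-to-trough decline so far
    return pd.DataFrame({"volatility": vol, "drawdown": drawdown,
                         "max_drawdown": max_drawdown})

prices = pd.Series([100.0, 110.0, 99.0, 104.5, 93.0])
metrics = rolling_vol_and_drawdown(prices, window=3)
```

Getting `rowsBetween` bounds wrong (e.g. a fixed window for the peak) silently understates drawdown — exactly the class of bug the QA layer is built to catch.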
03 — Architecture Overview
Four clean layers, one cohesive real-time system.
Layer 01
UI Layer
React dashboard for visualization and pipeline control
Layer 02
API Layer
FastAPI backend that decouples the frontend from processing logic
Normalized price change series for cross-asset comparison.
〜
Rolling Volatility
20-day standard deviation of returns for risk measurement.
⚡
Momentum
Rate of price change signal for trend detection.
↘
Maximum Drawdown
Peak-to-trough decline tracking for downside risk.
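The first two metrics above reduce to short transforms. A minimal pandas sketch (the lookback length and sample prices are illustrative choices, not the project's exact parameters):

```python
import pandas as pd

def normalized_series(close: pd.Series) -> pd.Series:
    # Rebase prices to 1.0 at the first observation so different tickers
    # can be compared on a single axis.
    return close / close.iloc[0]

def momentum(close: pd.Series, lookback: int = 10) -> pd.Series:
    # Rate of price change over the lookback window: P_t / P_{t-k} - 1.
    return close.pct_change(periods=lookback)

prices = pd.Series([50.0, 55.0, 52.0, 60.0])
norm = normalized_series(prices)    # starts at 1.0
mom = momentum(prices, lookback=2)  # NaN for the first `lookback` rows
```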
🔴
Real-time Pipeline Timeline
WebSocket-driven live status updates with job_id tracking — watch the pipeline breathe in real time.
🔄
Seamless Demo ↔ Live Mode
Same UI, zero code change. Works instantly on Vercel (demo) or connected to real Spark backend (live).
Key Differentiator
✓
QA Validation System
Every Spark output benchmarked against Pandas/SciPy with relative error thresholds and an alignment score (>95% target).
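The QA idea can be sketched as: compute the same metric independently in the benchmark library, then score element-wise agreement under a relative-error threshold. A minimal NumPy sketch (the tolerance, the NaN handling, and the sample values are illustrative assumptions, not the project's exact settings):

```python
import numpy as np

def alignment_score(spark_out: np.ndarray, benchmark: np.ndarray,
                    rel_tol: float = 1e-6) -> float:
    """Fraction of finite, comparable values whose relative error is within rel_tol."""
    mask = np.isfinite(spark_out) & np.isfinite(benchmark)
    rel_err = (np.abs(spark_out[mask] - benchmark[mask])
               / np.maximum(np.abs(benchmark[mask]), 1e-12))
    return float((rel_err <= rel_tol).mean())

spark_vals = np.array([0.0200001, 0.015, np.nan, 0.031])
bench_vals = np.array([0.0200000, 0.015, np.nan, 0.030])
score = alignment_score(spark_vals, bench_vals, rel_tol=1e-3)
# A run passes QA when score exceeds the target, e.g. score > 0.95
```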
05 — Tech Stack
Every tool chosen with purpose.
Layer
Technology
Role
Frontend
React, Recharts
Dashboard & visualization
Backend
FastAPI, Python
API layer & pipeline orchestration
Processing
PySpark
Distributed feature engineering
QA
Pandas, NumPy
Benchmark validation
Storage
Parquet
Columnar data storage
Data Source
yfinance
Equity market data ingestion
Infrastructure
AWS EC2
Single-instance deployment
Job Management
FastAPI Background Tasks + UUID
job_id tracking & multi-user safety
Real-time
FastAPI WebSockets
Live pipeline timeline updates
Demo Layer
Static JSON + env switch
Vercel-ready zero-backend preview
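The Job Management and Real-time rows above describe a common pattern: mint a UUID per run, track its status in a registry, and fan status events out to subscribers. A framework-free asyncio sketch of that pattern (`JobRegistry` and the stage names are illustrative; the real implementation uses FastAPI's `BackgroundTasks` and WebSocket endpoints):

```python
import asyncio
import uuid

class JobRegistry:
    """Per-job status plus a queue of live events (stand-in for a WebSocket feed)."""
    def __init__(self):
        self.jobs: dict[str, str] = {}
        self.events: dict[str, asyncio.Queue] = {}

    def create(self) -> str:
        job_id = str(uuid.uuid4())  # unique id keeps concurrent users isolated
        self.jobs[job_id] = "queued"
        self.events[job_id] = asyncio.Queue()
        return job_id

    async def update(self, job_id: str, status: str):
        self.jobs[job_id] = status
        await self.events[job_id].put(status)  # a WebSocket handler would forward this

async def run_pipeline(reg: JobRegistry, job_id: str):
    # Stand-in for the Spark stages; each step pushes a live status update.
    for step in ("ingest", "features", "qa", "done"):
        await reg.update(job_id, step)

async def main() -> list[str]:
    reg = JobRegistry()
    job_id = reg.create()
    asyncio.create_task(run_pipeline(reg, job_id))  # like a FastAPI background task
    seen = []
    while (status := await reg.events[job_id].get()) != "done":
        seen.append(status)
    return seen + ["done"]

statuses = asyncio.run(main())
```

Keying everything on the `job_id` is what makes the pipeline multi-user safe: two concurrent runs never share a status record or an event queue.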
06 — Why This Stands Out
Built for two audiences.
Most portfolio projects compute results and stop. This one questions them in real time. Together, the QA validation layer, live WebSocket timeline, and job_id tracking reflect the kind of rigor you see in production financial systems. The architecture mirrors real quant research and risk teams.
07 — AI Full-Stack Finance Expertise
I own the entire stack — AI, Full-Stack & Quant Finance.
This project is deliberately architected as a complete AI-powered quantitative platform. Here’s my depth across the three core pillars.
🧠
AI & Predictive Intelligence
Extending the PySpark pipeline with Spark MLlib and, in future, PyTorch models for predictive signals and anomaly detection.
→ Built modular feature store ready for ML training
→ QA layer designed to benchmark both classical quant metrics AND ML model outputs
→ Momentum & volatility features engineered to feed directly into LSTM/Transformer forecasters
→ Anomaly detection hook using isolation forest (planned Phase 6)