Quant Finance · Data Engineering · Full-Stack · 2026

Distributed Market Intelligence
& Self-Validating Quant Platform

A self-validating quant pipeline with live job tracking and a Next.js control plane.

Most projects stop at results. This one validates them in real time, streams every stage over WebSocket, and ships signal generation, backtesting, and versioned runs in a single local research workstation.

Next.js · FastAPI · PySpark · DuckDB · WebSocket · QA
View repository (coming soon)
01 — The Problem

Four gaps most financial projects ignore.

Most projects stop at "I got results." These four gaps matter in production.

01 · Scalability
Pandas runs in memory on a single machine and struggles as multi-ticker datasets grow; PySpark distributes the workload.
02 · Correctness
Rolling metrics and drawdown need window functions partitioned per ticker and ordered by date, so no state leaks between tickers.
03 · Validation
Formal QA scores the Spark output against a Pandas/SciPy benchmark.
04 · Observability
WebSocket streams every pipeline stage, so no run is a black box.
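The correctness gap comes down to partitioning: every rolling window and running peak has to restart at each ticker boundary and follow date order, which is what a PySpark spec such as `Window.partitionBy("ticker").orderBy("date")` expresses. The sketch below reproduces those semantics in plain Python so it can double as a small reference implementation. The row layout `(ticker, date, close)`, the default 3-row window, and the function name `per_ticker_metrics` are assumptions for illustration, not the project's code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, float]  # (ticker, ISO date, close): assumed layout


def per_ticker_metrics(rows: List[Row], window: int = 3) -> Dict[str, List[dict]]:
    """Rolling mean of daily returns and drawdown, computed independently per ticker.

    Mirrors a window partitioned by ticker and ordered by date: the return
    buffer and the running peak never carry over from one ticker to another.
    """
    by_ticker: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for ticker, date, close in rows:
        by_ticker[ticker].append((date, close))

    out: Dict[str, List[dict]] = {}
    for ticker, series in by_ticker.items():
        series.sort()  # order by date; ISO date strings sort chronologically
        returns: List[float] = []
        peak = float("-inf")
        prev: Optional[float] = None
        records = []
        for date, close in series:
            ret = None if prev is None else close / prev - 1.0
            if ret is not None:
                returns.append(ret)
            recent = returns[-window:]
            # Emit a rolling value only once a full window of returns exists.
            rolling = sum(recent) / window if len(recent) == window else None
            peak = max(peak, close)
            drawdown = close / peak - 1.0  # 0.0 at a new high, negative below it
            records.append({"date": date, "return": ret,
                            "rolling_mean": rolling, "drawdown": drawdown})
            prev = close
        out[ticker] = records
    return out
```

Note that the input rows are deliberately allowed to arrive unsorted and interleaved across tickers; grouping and sorting before any window math is the step that a naive single-pass Pandas `rolling()` over the whole frame would skip.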
02 — How It Works

Four layers, seven pipeline stages.

Layer 01 · UI Layer

Next.js dashboard: run the pipeline and browse jobs, QA, signals, backtest, and reports

Layer 02 · API Layer

FastAPI: POST /run-pipeline, WebSocket /ws/{job_id}, and /api/v1/*

Layer 03 · Processing Layer

PySpark: ingest, validate, features, QA, signals, backtest, versioned export

Layer 04 · Job & Persistence

FastAPI BackgroundTasks run each job under a job_id, run history is stored in DuckDB, and a single active-run lock rejects concurrent runs with HTTP 409 Conflict.
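The single active-run lock in Layer 04 can be sketched without any web framework: a mutex makes the check-and-set atomic, so two near-simultaneous POST /run-pipeline requests cannot both start a job. The class name `ActiveRunLock`, the `uuid4` job ids, and the 202 status for an accepted run are assumptions; in a FastAPI route the 409 branch would typically become `raise HTTPException(status_code=409, ...)`.

```python
import threading
import uuid
from typing import Optional, Tuple


class ActiveRunLock:
    """Allow at most one pipeline run at a time (hypothetical helper).

    start() returns (202, job_id) when a run is accepted and (409, None)
    while another run is still active, mirroring HTTP 409 Conflict.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._active: Optional[str] = None

    def start(self) -> Tuple[int, Optional[str]]:
        # Check-and-set under the mutex so concurrent requests cannot both win.
        with self._guard:
            if self._active is not None:
                return 409, None
            self._active = uuid.uuid4().hex
            return 202, self._active

    def finish(self, job_id: str) -> None:
        # Only the job that holds the lock may release it.
        with self._guard:
            if self._active == job_id:
                self._active = None

    @property
    def active_job(self) -> Optional[str]:
        return self._active
```

In a real service the background task would presumably call `finish(job_id)` in a `finally` block, so a crashed run cannot hold the lock forever.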

Core pipeline stages
  1. FETCHING_DATA
  2. VALIDATION
  3. FEATURE_ENGINEERING
  4. RUNNING_QA
  5. SIGNAL_GENERATION
  6. BACKTEST
  7. REPORT_EXPORT

After the seven core stages the job moves through SAVING_DATA → COMPLETED, and the UI then fetches results from /features, /qa, and /api/v1/*.
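The stage sequence behaves like a linear state machine, with each transition reported on the job's WebSocket timeline. A minimal sketch, assuming every transition is announced as one JSON message on /ws/{job_id}: the `Stage` enum mirrors the stage names on this page, while `next_stage`, `stage_event`, and the message fields `step` and `total` are hypothetical.

```python
import json
from enum import Enum
from typing import List


class Stage(str, Enum):
    FETCHING_DATA = "FETCHING_DATA"
    VALIDATION = "VALIDATION"
    FEATURE_ENGINEERING = "FEATURE_ENGINEERING"
    RUNNING_QA = "RUNNING_QA"
    SIGNAL_GENERATION = "SIGNAL_GENERATION"
    BACKTEST = "BACKTEST"
    REPORT_EXPORT = "REPORT_EXPORT"
    SAVING_DATA = "SAVING_DATA"
    COMPLETED = "COMPLETED"


# Iterating an Enum yields members in definition order.
ORDER: List[Stage] = list(Stage)


def next_stage(current: Stage) -> Stage:
    """Return the stage that follows `current`; COMPLETED is terminal."""
    if current is Stage.COMPLETED:
        raise ValueError("pipeline already completed")
    return ORDER[ORDER.index(current) + 1]


def stage_event(job_id: str, stage: Stage) -> str:
    """Serialize one timeline event as it might be pushed over /ws/{job_id}."""
    return json.dumps({
        "job_id": job_id,
        "stage": stage.value,
        "step": ORDER.index(stage) + 1,
        "total": len(ORDER),
    })
```

Keeping the order in one enum means the backend and any client rendering the timeline can agree on progress without hard-coding stage counts in two places.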

03 — Highlights

What ships today.

Key Differentiator
QA Validation System
Spark outputs are benchmarked against a Pandas/SciPy reference; per-ticker alignment percentages are written to qa_report.json, with a target above 95%.
Pipeline & Research Outputs
Rule-based signals, simple backtest, versioned run export, Parquet storage.
Real-time Execution
A WebSocket timeline covers every stage, each stream is routed by job_id, and concurrent run requests are rejected with HTTP 409.
Next.js Control Plane
Dashboard, Run pipeline, Jobs, Explorer, QA, Signals, Regime, Backtest, Reports, Settings.
Local Research Platform
backend/ and web/ directories plus config.json; snapshot or live ingestion; run with Docker Compose or start_dev.sh.
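The QA card gives the metric (per-ticker alignment % against a Pandas/SciPy benchmark, target above 95%) but not the rule for when a row counts as aligned. The sketch below assumes values keyed by `(ticker, date)`, closeness checked with `math.isclose` at a relative tolerance of 1e-6, and Spark rows missing from the output counted as mismatches. `alignment_report` and the shape of its result are hypothetical, not the actual qa_report.json schema.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (ticker, date): assumed join key


def alignment_report(spark: Dict[Key, float], benchmark: Dict[Key, float],
                     rel_tol: float = 1e-6, abs_tol: float = 1e-9,
                     target_pct: float = 95.0) -> dict:
    """Per-ticker % of benchmark rows whose Spark value matches within tolerance."""
    matched: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for (ticker, date), expected in benchmark.items():
        total[ticker] += 1
        actual = spark.get((ticker, date))  # missing row -> mismatch
        if actual is not None and math.isclose(actual, expected,
                                               rel_tol=rel_tol, abs_tol=abs_tol):
            matched[ticker] += 1
    tickers = {}
    for ticker in sorted(total):
        pct = 100.0 * matched[ticker] / total[ticker]
        # Strict comparison: exactly 95% does not meet a ">95%" target.
        tickers[ticker] = {"alignment_pct": round(pct, 2), "passed": pct > target_pct}
    return {"target_pct": target_pct, "tickers": tickers}
```

The result is a plain dict, so `json.dump(report, f, indent=2)` could write it to a file such as qa_report.json. The `abs_tol` term matters for values near zero (returns, drawdown at a new high), where a purely relative tolerance would reject tiny floating-point differences.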
04 — Status

Build status and what's next.

Phase 1 · PySpark pipeline: validation, features, QA, signals, backtest, versioning
Phase 2 · FastAPI backend & data serving layer
Phase 3 · Next.js dashboard & full UI (web/)
Phase 4 · WebSocket real-time timeline + job_id tracking
Phase 5 · Demo mode + Vercel deployment (static JSON preview)
Phase 6 · MLlib/Torch, Recharts, auth, pytest

Phases 1–4 are complete (PySpark pipeline, FastAPI, Next.js UI, WebSocket + job_id). Phase 5 adds a Vercel-friendly static demo mode. Phase 6 covers MLlib/Torch models, Recharts, auth, and pytest. ML models are on the roadmap only; today the platform ships engineered features, rule-based signals, a backtest, and formal QA.
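The page does not specify the signal rules or the backtest mechanics. As a hedged illustration of what "rule-based signals plus a simple backtest" commonly looks like, the sketch below swaps in a moving-average crossover (an assumed rule, not necessarily the project's) and a long/flat backtest with no costs or slippage. The function names and the 2/3-day windows are hypothetical.

```python
from typing import List, Optional


def sma(values: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average; None until n values are available."""
    return [sum(values[i - n + 1:i + 1]) / n if i + 1 >= n else None
            for i in range(len(values))]


def crossover_signals(closes: List[float], fast: int = 2, slow: int = 3) -> List[int]:
    """1 = long, 0 = flat. Long when the fast SMA is strictly above the slow SMA."""
    f, s = sma(closes, fast), sma(closes, slow)
    return [1 if a is not None and b is not None and a > b else 0
            for a, b in zip(f, s)]


def backtest(closes: List[float], signals: List[int]) -> List[float]:
    """Equity curve starting at 1.0, ignoring costs and slippage.

    The position held over day t is the signal computed at the close of
    day t-1, so no signal trades on the return it was derived from.
    """
    equity = [1.0]
    for t in range(1, len(closes)):
        daily_ret = closes[t] / closes[t - 1] - 1.0
        equity.append(equity[-1] * (1.0 + signals[t - 1] * daily_ret))
    return equity
```

For closes `[10, 10, 10, 12, 13, 11]` the signals are `[0, 0, 0, 1, 1, 0]`: the strategy misses the jump to 12 because the crossover only appears at that close, then rides 12 → 13 → 11 for a final equity of 11/12. The one-day lag is the design choice that keeps the backtest free of look-ahead bias.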
