Quant Finance · Data Engineering · AI Full-Stack · 2026

Distributed Market Intelligence
& Self-Validating Quant Platform

A real-time, job-tracked quantitative pipeline that processes equity data at scale — with live WebSocket updates, formal QA validation, and seamless demo/live modes.

PySpark · FastAPI · React · WebSocket · job_id · AWS EC2 · Parquet · yfinance
02 — The Problem It Solves

Four gaps most financial projects ignore.

Most financial data projects stop at "I got results." That approach leaves four real gaps.

01
Scalability
Data doesn't stay small. Pandas breaks at scale — PySpark handles millions of rows across tickers without breaking a sweat.
02
Correctness
Rolling metrics like volatility and drawdown are expensive to compute correctly. Window functions need care — especially drawdown's unbounded cumulative max.
03
Validation
No one validates whether the numbers are actually right. This project does — with a formal QA layer that scores alignment against a trusted benchmark.
04
Observability
Most pipelines are black boxes. This one streams live status via WebSocket so you can see exactly what’s happening at every step.
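The drawdown computation called out under Correctness can be sketched with a Pandas reference implementation of the kind the QA layer benchmarks against — a minimal sketch, with an illustrative function name; in Spark, the same running peak would come from a window bounded by Window.unboundedPreceding.

```python
import pandas as pd

def max_drawdown(prices: pd.Series) -> float:
    # Running peak: an "unbounded preceding" cumulative max over the price series
    running_peak = prices.cummax()
    # Drawdown at each point: fractional decline from the peak seen so far
    drawdown = prices / running_peak - 1.0
    # Maximum drawdown is the worst (most negative) such decline
    return float(drawdown.min())

prices = pd.Series([100.0, 110.0, 95.0, 120.0, 90.0])
mdd = max_drawdown(prices)  # peak-to-trough 120 -> 90, i.e. -25%
```

The cumulative max is what makes this expensive in a distributed setting: every row depends on all earlier rows, so the window cannot be bounded.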
03 — Architecture Overview

Four clean layers, one cohesive real-time system.

Layer 01
UI Layer
React dashboard for visualization and pipeline control
Layer 02
API Layer
FastAPI backend that decouples the frontend from processing logic
Layer 03
Processing Layer
PySpark pipeline handling ingestion, feature engineering, QA validation, and Parquet storage
Layer 04
Job & Real-time Layer
Background tasks + WebSocket + job_id routing for safe multi-user live execution and timeline streaming
End-to-end data flow (with job_id & WebSocket)
User opens UI → Select ticker & date → POST /run-pipeline → Generate job_id → Spark Ingestion (yfinance) → Feature Engineering → QA Validation → Parquet Storage → WebSocket updates → Fetch results & QA → Render dashboard
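The job_id step in the flow above can be sketched as a minimal in-memory registry — purely illustrative (the class and field names are assumptions, not the project's actual API), but it shows how a per-run UUID keeps concurrent users' pipelines isolated:

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class JobRegistry:
    # Hypothetical in-memory store; a real backend would scope this per process
    jobs: dict = field(default_factory=dict)

    def create(self, ticker: str) -> str:
        job_id = str(uuid.uuid4())  # unique id isolates each user's run
        self.jobs[job_id] = {"ticker": ticker, "status": "queued", "timeline": []}
        return job_id

    def update(self, job_id: str, step: str) -> None:
        job = self.jobs[job_id]
        job["status"] = step
        job["timeline"].append(step)  # a WebSocket push would mirror this to the UI

registry = JobRegistry()
jid = registry.create("AAPL")
for step in ["ingestion", "features", "qa", "parquet", "done"]:
    registry.update(jid, step)
```

In the actual system, the update hook would broadcast each step over the FastAPI WebSocket keyed by the same job_id.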
04 — Key Features

Seven capabilities, two major differentiators.

Log Returns
Normalized price change series for cross-asset comparison.
Rolling Volatility
20-day standard deviation of returns for risk measurement.
Momentum
Rate of price change signal for trend detection.
Maximum Drawdown
Peak-to-trough decline tracking for downside risk.
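As a rough sketch, the first three metrics above map onto standard Pandas/NumPy operations — the trusted reference side of the QA layer. The 20-day window follows the volatility definition above; the 10-day momentum period and the synthetic price series are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices for one ticker (60 trading days)
prices = pd.Series(np.linspace(100.0, 120.0, 60))

log_returns = np.log(prices / prices.shift(1))      # normalized price change series
rolling_vol = log_returns.rolling(window=20).std()  # 20-day rolling volatility
momentum = prices.pct_change(periods=10)            # 10-day rate of price change
```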
🔴
Real-time Pipeline Timeline
WebSocket-driven live status updates with job_id tracking — watch the pipeline breathe in real time.
🔄
Seamless Demo ↔ Live Mode
Same UI, zero code change. Works instantly on Vercel (demo) or connected to real Spark backend (live).
Key Differentiator
QA Validation System
Every Spark output benchmarked against Pandas/SciPy with relative error thresholds and an alignment score (>95% target).
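A minimal sketch of that alignment score, assuming a simple row-wise relative-error check (the function name and tolerance are illustrative, not the project's exact formula):

```python
import numpy as np

def alignment_score(spark_vals, benchmark_vals, rel_tol=1e-6):
    # Fraction of rows where the Spark output matches the Pandas/SciPy benchmark
    spark_vals = np.asarray(spark_vals, dtype=float)
    benchmark_vals = np.asarray(benchmark_vals, dtype=float)
    # Guard against division by zero on near-zero benchmark values
    rel_err = np.abs(spark_vals - benchmark_vals) / np.maximum(np.abs(benchmark_vals), 1e-12)
    return float(np.mean(rel_err <= rel_tol))

score = alignment_score([1.0, 2.0, 3.0000001], [1.0, 2.0, 3.0])
```

A run passes when the score clears the >95% target cited above.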
05 — Tech Stack

Every tool chosen with purpose.

Layer | Technology | Role
Frontend | React, Recharts | Dashboard & visualization
Backend | FastAPI, Python | API layer & pipeline orchestration
Processing | PySpark | Distributed feature engineering
QA | Pandas, NumPy | Benchmark validation
Storage | Parquet | Columnar data storage
Data Source | yfinance | Equity market data ingestion
Infrastructure | AWS EC2 | Single-instance deployment
Job Management | FastAPI Background Tasks + UUID | job_id tracking & multi-user safety
Real-time | FastAPI WebSockets | Live pipeline timeline updates
Demo Layer | Static JSON + env switch | Vercel-ready zero-backend preview
06 — Why This Stands Out

Built for two audiences.

Most portfolio projects compute results and stop. This one questions them in real time. The combination of a QA validation layer, a live WebSocket timeline, and job_id tracking brings the kind of rigor you see in production financial systems. The architecture mirrors how real quant research and risk teams work.

07 — AI Full-Stack Finance Expertise

I own the entire stack — AI, Full-Stack & Quant Finance.

This project is deliberately architected as a complete AI-powered quantitative platform. Here’s my depth across the three core pillars.

🧠

AI & Predictive Intelligence

Extending the PySpark pipeline with MLlib and future Torch models for predictive signals and anomaly detection.

  • Built modular feature store ready for ML training
  • QA layer designed to benchmark both classical quant metrics AND ML model outputs
  • Momentum & volatility features engineered to feed directly into LSTM/Transformer forecasters
  • Anomaly detection hook using isolation forest (planned Phase 6)
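The planned anomaly-detection hook could look roughly like this scikit-learn sketch — purely illustrative of the Phase 6 idea (the feature matrix, contamination rate, and injected outliers are all assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical feature matrix: [rolling_volatility, momentum] per trading day
features = rng.normal(0.0, 1.0, size=(200, 2))
features[:3] += 8.0  # inject three obvious outlier days

model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(features)  # -1 flags anomalous rows, 1 marks normal rows
anomalies = np.where(labels == -1)[0]
```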
08 — Roadmap

Current build status.

Phase 1
PySpark pipeline — ingestion, feature engineering & QA
Phase 2
FastAPI backend & data serving layer
Phase 3
React dashboard & full UI
Phase 4
WebSocket real-time timeline + job_id tracking
Phase 5
Demo mode + Vercel deployment (same UI, different data source)