Plain-language technical choices that prime the build — a living spec for an AI-powered hiring platform designed to reduce bias, accelerate screening, and surface the most qualified candidates objectively.
"Hire on merit. Every time. At scale."
| Layer | Choice | Rationale |
|---|---|---|
| Frontend | React 18 + TypeScript | Component reuse for resume cards and score widgets; TypeScript catches data-shape bugs early when handling variable resume formats. |
| Backend API | FastAPI (Python 3.11) | Async-native, auto-generates OpenAPI docs, and keeps all ML code in the same ecosystem as the models — no language boundary. |
| AI / LLM | Claude claude-sonnet-4-20250514 via Anthropic API | Strong instruction following for structured extraction; Constitutional AI training reduces model-level demographic bias; JSON mode for deterministic outputs. |
| Embeddings | text-embedding-3-small (OpenAI) | High semantic recall at low cost; used for skill-to-JD similarity; dimensions tunable to 256 for fast ANN search. |
| Vector Store | Pinecone (serverless) | Managed ANN search; serverless tier eliminates cold-start ops; supports metadata filters for department/level namespace isolation. |
| Database | PostgreSQL 16 + pgvector | Primary store for jobs, candidates, scores, audit logs; pgvector as fallback for low-volume similarity search without Pinecone cost. |
| Auth | Auth0 (OIDC + RBAC) | Roles: admin, hiring_manager, recruiter; SSO for enterprise clients; audit trail for who accessed which candidate. |
| File Parsing | Apache Tika + PyMuPDF | Tika handles docx/odt/rtf; PyMuPDF for layout-aware PDF extraction; preserves section structure used by the skill extractor. |
| Task Queue | Celery + Redis | Async resume processing jobs; Redis as broker + result backend; prevents blocking API responses when processing bulk uploads of 500+ resumes. |
| Observability | LangSmith + Datadog | LangSmith traces every LLM call for debugging extraction quality; Datadog monitors API latency, queue depth, and cost-per-screening-run. |
| Infra / Deploy | AWS (ECS Fargate + RDS + ElastiCache) | Containerized services; RDS for managed Postgres; ElastiCache for Redis; VPC-isolated for GDPR data residency (eu-west-1 for EU customers). |
| CI/CD | GitHub Actions + ECR | On merge to main: lint → test → build Docker image → push to ECR → deploy to ECS; staging environment mirrors prod. |
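The Embeddings row above relies on skill-to-JD cosine similarity over the tuned 256-dimension vectors. A minimal sketch of that comparison, using toy 4-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for the 256-dim embeddings
jd_vec = [0.1, 0.3, 0.5, 0.2]
skill_vec = [0.1, 0.3, 0.5, 0.2]
print(round(cosine_similarity(jd_vec, skill_vec), 3))  # identical vectors -> 1.0
```

In production the vectors come from the OpenAI embeddings API and the search itself runs inside Pinecone; this sketch only shows the metric being approximated by the ANN index.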
PII removal happens before any LLM call — the model never sees a name, address, or photo. Bias mitigation is structural, not just instructional.
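A minimal sketch of that scrubbing step. The regexes and the `scrub_pii` helper are illustrative, not the shipped implementation; a production scrubber would add NER-based detection for names and addresses rather than relying on a known-names list:

```python
import re

# Illustrative patterns only -- real PII detection needs more than regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pii(text: str, known_names: list[str]) -> str:
    """Replace emails, phone numbers, and known candidate names
    before the text is ever sent to the LLM."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in known_names:
        text = text.replace(name, "[NAME]")
    return text

resume = "Jane Doe, jane.doe@example.com, +44 20 7946 0958. Python, SQL."
print(scrub_pii(resume, ["Jane Doe"]))  # -> [NAME], [EMAIL], [PHONE]. Python, SQL.
```

Only the scrubbed text reaches the model; the raw resume stays encrypted in S3 for audit, exactly as the PII-handling decision below describes.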
Bulk uploads go through /api/v1/resumes/batch; Celery workers process them asynchronously, and the recruiter is notified via webhook when the batch completes.

| Decision | Choice Made | Why — and What We Rejected |
|---|---|---|
| LLM API vs. Self-Hosted | Claude API (hosted) | Self-hosting a 70B+ model requires GPU infra we can't justify at launch. API cost per resume screening run is ~$0.003 — well under budget. Data stays encrypted in transit; no fine-tune needed for extraction quality. |
| Vector DB choice | Pinecone serverless | Weaviate and Qdrant considered. Pinecone's serverless tier is zero-ops and scales to millions of vectors; no cluster to manage. Accepted trade-off: less control over index internals. |
| PII handling strategy | Strip before LLM, store raw encrypted | Some vendors blind the model but retain PII in structured fields for recruiter UX. We strip aggressively — the model never processes names. Raw resumes stored AES-256 in S3 for audit; model-processed data is always anonymised. |
| Scoring architecture | Weighted rubric (not end-to-end ML) | An end-to-end trained ranker would require thousands of labeled "good hire" outcomes we don't have at launch. A transparent rubric with configurable weights is auditable, explainable, and doesn't perpetuate historical hiring bias baked into training labels. |
| Sync vs. async processing | Async Celery jobs | Parsing + embedding + scoring 500 resumes takes roughly 10 minutes; a synchronous request would time out. Celery jobs + webhook callbacks keep the frontend responsive; recruiters get a push notification when results are ready. |
| Monolith vs. microservices | Modular monolith (Phase 1) | Three bounded contexts (ingestion, scoring, reporting) share a codebase but use internal service interfaces. Can extract to microservices when scale demands it — without the networking complexity at MVP stage. |
| Interview question generation | RAG from internal playbooks | Generic interview questions waste time. We embed company-specific interview playbooks in Pinecone; the LLM generates role-calibrated questions grounded in the JD and candidate's unique background — not hallucinated generic prompts. |
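The weighted-rubric decision above is easy to make concrete. A sketch of the scorer, with illustrative dimension names and weights (the real `scoring_weights` are configurable per Job):

```python
# Transparent weighted-rubric scorer: no trained ranker, just auditable math.
# Dimension names and weights below are examples, not the shipped config.
def score_candidate(dimension_scores: dict[str, float],
                    weights: dict[str, float]) -> dict:
    """Combine per-dimension scores (0-100) into a weighted total
    plus the per-dimension breakdown stored on the Application."""
    total_weight = sum(weights.values())
    breakdown = {
        dim: dimension_scores.get(dim, 0.0) * (w / total_weight)
        for dim, w in weights.items()
    }
    return {"score": round(sum(breakdown.values()), 2),
            "score_breakdown": breakdown}

weights = {"skills_match": 0.5, "experience": 0.3, "education": 0.2}
result = score_candidate({"skills_match": 80, "experience": 70, "education": 90}, weights)
print(result["score"])  # 80*0.5 + 70*0.3 + 90*0.2 -> 79.0
```

Because every contribution is a visible multiplication, the breakdown can be shown to a hiring manager verbatim, which is the auditability the decision table argues for.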
| Entity | Key Fields | Notes |
|---|---|---|
| Job | id, title, jd_text, jd_embedding, scoring_weights, status | JD embedding computed once on publish; weights customisable per role; status controls visibility to recruiters. |
| Resume | id, raw_s3_key, parsed_json, pii_scrubbed_text, uploaded_by, created_at | Raw file is immutable in S3. parsed_json contains structured extraction. pii_scrubbed_text is the only version sent to the LLM. |
| Candidate | id, resume_id, skills[], embedding_id, anonymised_profile | Decoupled from Resume to support one person applying to multiple roles. Anonymised profile is the model's view — never contains PII. |
| Application | id, job_id, candidate_id, score, score_breakdown, explanation, status | Junction between Job and Candidate. score_breakdown is JSON with per-dimension scores. explanation is Claude-generated prose. |
| AuditLog | id, event_type, actor_id, target_id, payload_hash, timestamp | Append-only table. Payload is hashed, not stored, to avoid PII in logs. Covers scoring runs, shortlist views, status changes, and exports. |
| BiasReport | id, job_id, run_at, group_rates, flag_triggered, reviewed_by | Generated automatically after each scoring batch. flag_triggered gates shortlist release until an admin acknowledges the report. |
| Dimension | Target | Implementation Approach |
|---|---|---|
| API Latency | p95 < 800ms (sync calls) | Resume parsing is async. Sync endpoints (job search, shortlist fetch) hit Postgres + Redis cache; no LLM in the hot path. |
| Throughput | 500 resumes / batch run | Celery workers auto-scale on ECS; 10 concurrent workers process ~50 resumes/min. Full 500-resume run completes in ~10 min. |
| Uptime | 99.5% monthly SLA | ECS service with min 2 tasks across 2 AZs; RDS Multi-AZ; health check auto-restart. During Anthropic API outages the service degrades gracefully, serving cached scores. |
| Data Security | SOC 2 Type II (target) | AES-256 at rest (S3 + RDS); TLS 1.3 in transit; VPC isolation; secrets in AWS Secrets Manager; audit log for all data access events. |
| Cost per Run | < $0.05 per resume | Claude extraction: ~$0.003; embedding: ~$0.0001; Pinecone upsert: ~$0.001; infra amortised: ~$0.01. Total well within target at volume. |
| GDPR / CCPA | Right to erasure in < 30 days | DELETE /candidates/{id} cascades to S3, Postgres, and Pinecone namespace. Cryptographic shredding on S3 keys. Audit log entry retained (no PII content) per legal requirement. |
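The erasure cascade behind `DELETE /candidates/{id}` can be sketched store-agnostically. `InMemoryStore` is a stand-in for the real S3, Postgres, and Pinecone clients; only the delete-everywhere-then-audit shape mirrors the spec:

```python
# Sketch of the right-to-erasure cascade. InMemoryStore is a hypothetical
# stand-in for boto3 (S3), the Postgres driver, and the Pinecone SDK.
class InMemoryStore:
    def __init__(self, keys):
        self.keys = set(keys)

    def delete(self, key):
        self.keys.discard(key)

def erase_candidate(candidate_id: str, stores: list, audit_log: list) -> None:
    """Delete the candidate from every store, then append a PII-free
    audit entry (the entry itself is retained per legal requirement)."""
    for store in stores:
        store.delete(candidate_id)
    audit_log.append({"event_type": "erasure", "target_id": candidate_id})

s3 = InMemoryStore({"cand-42"})
postgres = InMemoryStore({"cand-42"})
pinecone_ns = InMemoryStore({"cand-42"})
audit = []
erase_candidate("cand-42", [s3, postgres, pinecone_ns], audit)
# All three stores are now empty; audit holds a single PII-free entry.
```

In production the S3 step is cryptographic shredding of the object key rather than a plain delete, as the table notes.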