Plain-language technical choices that prime the build — a living spec for an AI-powered hiring platform designed to reduce bias, accelerate screening, and surface the most qualified candidates objectively.
"Hire on merit. Every time. At scale."
| Layer | Choice | Rationale |
|---|---|---|
| Frontend | React 18 + TypeScript | Component reuse for resume cards and score widgets; TypeScript catches data-shape bugs early when handling variable resume formats. |
| Backend API | FastAPI (Python 3.11) | Async-native, auto-generates OpenAPI docs, and keeps all ML code in the same ecosystem as the models — no language boundary. |
| AI / LLM | Claude claude-sonnet-4-20250514 via Anthropic API | Strong instruction following for structured extraction; Constitutional AI training reduces model-level demographic bias; JSON mode for deterministic outputs. |
| Embeddings | text-embedding-3-small (OpenAI) | High semantic recall at low cost; used for skill-to-JD similarity; dimensions tunable to 256 for fast ANN search. |
| Vector Store | Pinecone (serverless) | Managed ANN search; serverless tier eliminates cold-start ops; supports metadata filters for department/level namespace isolation. |
| Database | PostgreSQL 16 + pgvector | Primary store for jobs, candidates, scores, audit logs; pgvector as fallback for low-volume similarity search without Pinecone cost. |
| Auth | Auth0 (OIDC + RBAC) | Roles: admin, hiring_manager, recruiter; SSO for enterprise clients; audit trail for who accessed which candidate. |
| File Parsing | Apache Tika + PyMuPDF | Tika handles docx/odt/rtf; PyMuPDF for layout-aware PDF extraction; preserves section structure used by the skill extractor. |
| Task Queue | Celery + Redis | Async resume processing jobs; Redis as broker + result backend; prevents blocking API responses when processing bulk uploads of 500+ resumes. |
| Observability | LangSmith + Datadog | LangSmith traces every LLM call for debugging extraction quality; Datadog monitors API latency, queue depth, and cost-per-screening-run. |
| Infra / Deploy | AWS (ECS Fargate + RDS + ElastiCache) | Containerized services; RDS for managed Postgres; ElastiCache for Redis; VPC-isolated for GDPR data residency (eu-west-1 for EU customers). |
| CI/CD | GitHub Actions + ECR | On merge to main: lint → test → build Docker image → push to ECR → deploy to ECS; staging environment mirrors prod. |
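The Embeddings row above relies on skill-to-JD cosine similarity over the tuned 256-dimension vectors. A minimal sketch of that comparison, using toy 4-dimensional vectors in place of real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for the 256-dim embeddings
jd_vec = [0.1, 0.3, 0.5, 0.2]
skill_vec = [0.1, 0.3, 0.5, 0.2]
print(round(cosine_similarity(jd_vec, skill_vec), 3))  # identical vectors -> 1.0
```

In production the vectors come from the OpenAI embeddings API and the search itself runs inside Pinecone; this sketch only shows the metric being approximated by the ANN index.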
PII removal happens before any LLM call — the model never sees a name, address, or photo. Bias mitigation is structural, not just instructional.
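A minimal sketch of that scrubbing step. The regexes and the `scrub_pii` helper are illustrative, not the shipped implementation; a production scrubber would add NER-based detection for names and addresses rather than relying on a known-names list:

```python
import re

# Illustrative patterns only -- real PII detection needs more than regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_pii(text: str, known_names: list[str]) -> str:
    """Replace emails, phone numbers, and known candidate names
    before the text is ever sent to the LLM."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    for name in known_names:
        text = text.replace(name, "[NAME]")
    return text

resume = "Jane Doe, jane.doe@example.com, +44 20 7946 0958. Python, SQL."
print(scrub_pii(resume, ["Jane Doe"]))  # -> [NAME], [EMAIL], [PHONE]. Python, SQL.
```

Only the scrubbed text reaches the model; the raw resume stays encrypted in S3 for audit, exactly as the PII-handling decision below describes.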
Bulk uploads go through /api/v1/resumes/batch; Celery workers process them asynchronously, and the recruiter is notified via webhook when the batch completes.

| Decision | Choice Made | Why — and What We Rejected |
|---|---|---|
| LLM API vs. Self-Hosted | Claude API (hosted) | Self-hosting a 70B+ model requires GPU infra we can't justify at launch. API cost per resume screening run is ~$0.003 — well under budget. Data stays encrypted in transit; no fine-tune needed for extraction quality. |
| Vector DB choice | Pinecone serverless | Weaviate and Qdrant considered. Pinecone's serverless tier is zero-ops and scales to millions of vectors; no cluster to manage. Accepted trade-off: less control over index internals. |
| PII handling strategy | Strip before LLM, store raw encrypted | Some vendors blind the model but retain PII in structured fields for recruiter UX. We strip aggressively — the model never processes names. Raw resumes stored AES-256 in S3 for audit; model-processed data is always anonymised. |
| Scoring architecture | Weighted rubric (not end-to-end ML) | An end-to-end trained ranker would require thousands of labeled "good hire" outcomes we don't have at launch. A transparent rubric with configurable weights is auditable, explainable, and doesn't perpetuate historical hiring bias baked into training labels. |
| Sync vs. async processing | Async Celery jobs | Parsing + embedding + scoring 500 resumes takes roughly 10 minutes; a synchronous request would time out. Celery jobs + webhook callbacks keep the frontend responsive; recruiters get a push notification when results are ready. |
| Monolith vs. microservices | Modular monolith (Phase 1) | Three bounded contexts (ingestion, scoring, reporting) share a codebase but use internal service interfaces. Can extract to microservices when scale demands it — without the networking complexity at MVP stage. |
| Interview question generation | RAG from internal playbooks | Generic interview questions waste time. We embed company-specific interview playbooks in Pinecone; the LLM generates role-calibrated questions grounded in the JD and candidate's unique background — not hallucinated generic prompts. |
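The weighted-rubric decision above is easy to make concrete. A sketch of the scorer, with illustrative dimension names and weights (the real `scoring_weights` are configurable per Job):

```python
# Transparent weighted-rubric scorer: no trained ranker, just auditable math.
# Dimension names and weights below are examples, not the shipped config.
def score_candidate(dimension_scores: dict[str, float],
                    weights: dict[str, float]) -> dict:
    """Combine per-dimension scores (0-100) into a weighted total
    plus the per-dimension breakdown stored on the Application."""
    total_weight = sum(weights.values())
    breakdown = {
        dim: dimension_scores.get(dim, 0.0) * (w / total_weight)
        for dim, w in weights.items()
    }
    return {"score": round(sum(breakdown.values()), 2),
            "score_breakdown": breakdown}

weights = {"skills_match": 0.5, "experience": 0.3, "education": 0.2}
result = score_candidate({"skills_match": 80, "experience": 70, "education": 90}, weights)
print(result["score"])  # 80*0.5 + 70*0.3 + 90*0.2 -> 79.0
```

Because every contribution is a visible multiplication, the breakdown can be shown to a hiring manager verbatim, which is the auditability the decision table argues for.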
| Entity | Key Fields | Notes |
|---|---|---|
| Job | id, title, jd_text, jd_embedding, scoring_weights, status | JD embedding computed once on publish; weights customisable per role; status controls visibility to recruiters. |
| Resume | id, raw_s3_key, parsed_json, pii_scrubbed_text, uploaded_by, created_at | Raw file is immutable in S3. parsed_json contains structured extraction. pii_scrubbed_text is the only version sent to the LLM. |
| Candidate | id, resume_id, skills[], embedding_id, anonymised_profile | Decoupled from Resume to support one person applying to multiple roles. Anonymised profile is the model's view — never contains PII. |
| Application | id, job_id, candidate_id, score, score_breakdown, explanation, status | Junction between Job and Candidate. score_breakdown is JSON with per-dimension scores. explanation is Claude-generated prose. |
| AuditLog | id, event_type, actor_id, target_id, payload_hash, timestamp | Append-only table. Payload is hashed, not stored, to avoid PII in logs. Covers scoring runs, shortlist views, status changes, and exports. |
| BiasReport | id, job_id, run_at, group_rates, flag_triggered, reviewed_by | Generated automatically after each scoring batch. flag_triggered gates shortlist release until an admin acknowledges the report. |
| Dimension | Target | Implementation Approach |
|---|---|---|
| API Latency | p95 < 800ms (sync calls) | Resume parsing is async. Sync endpoints (job search, shortlist fetch) hit Postgres + Redis cache; no LLM in the hot path. |
| Throughput | 500 resumes / batch run | Celery workers auto-scale on ECS; 10 concurrent workers process ~50 resumes/min. Full 500-resume run completes in ~10 min. |
| Uptime | 99.5% monthly SLA | ECS service with min 2 tasks across 2 AZs; RDS Multi-AZ; health check auto-restart. During Anthropic API outages the service degrades gracefully, serving cached scores. |
| Data Security | SOC 2 Type II (target) | AES-256 at rest (S3 + RDS); TLS 1.3 in transit; VPC isolation; secrets in AWS Secrets Manager; audit log for all data access events. |
| Cost per Run | < $0.05 per resume | Claude extraction: ~$0.003; embedding: ~$0.0001; Pinecone upsert: ~$0.001; infra amortised: ~$0.01. Total well within target at volume. |
| GDPR / CCPA | Right to erasure in < 30 days | DELETE /candidates/{id} cascades to S3, Postgres, and Pinecone namespace. Cryptographic shredding on S3 keys. Audit log entry retained (no PII content) per legal requirement. |
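The erasure cascade behind `DELETE /candidates/{id}` can be sketched store-agnostically. `InMemoryStore` is a stand-in for the real S3, Postgres, and Pinecone clients; only the delete-everywhere-then-audit shape mirrors the spec:

```python
# Sketch of the right-to-erasure cascade. InMemoryStore is a hypothetical
# stand-in for boto3 (S3), the Postgres driver, and the Pinecone SDK.
class InMemoryStore:
    def __init__(self, keys):
        self.keys = set(keys)

    def delete(self, key):
        self.keys.discard(key)

def erase_candidate(candidate_id: str, stores: list, audit_log: list) -> None:
    """Delete the candidate from every store, then append a PII-free
    audit entry (the entry itself is retained per legal requirement)."""
    for store in stores:
        store.delete(candidate_id)
    audit_log.append({"event_type": "erasure", "target_id": candidate_id})

s3 = InMemoryStore({"cand-42"})
postgres = InMemoryStore({"cand-42"})
pinecone_ns = InMemoryStore({"cand-42"})
audit = []
erase_candidate("cand-42", [s3, postgres, pinecone_ns], audit)
# All three stores are now empty; audit holds a single PII-free entry.
```

In production the S3 step is cryptographic shredding of the object key rather than a plain delete, as the table notes.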