Thrive Career Wellness:
Automated Outplacement Platform
Engineering an agentic system to mitigate corporate liability by maximizing interview conversion rates for offboarded employees.
Executive Summary
01 // The Liability Problem
Client: Thrive Career Wellness (Outplacement Provider)
The Context: Companies conducting layoffs face high legal risks. To mitigate wrongful termination lawsuits, they must demonstrate they provided meaningful support to help ex-employees land new roles.
The Constraint: We cannot use generic "resume builders." The system needs to map a candidate's specific Impact Stories to open market opportunities with high precision. Furthermore, due to strict enterprise data agreements, we have Zero-Trust PII constraints: no candidate names or contact info may touch the public LLM layer.
02 // The 3-Workflow Architecture
To match the nuance of a human career coach, I architected three distinct graph workflows. This ensures separation of concerns between Strategy, Execution, and Validation.
A. The Resume Graph (Supervisor-Worker-Editor)
Uses a Supervisor (DeepSeek-R1) to plan the pivot strategy, a Writer to draft the content, and an Editor to enforce the "Liability Shield," rejecting any hallucinated skills.
import os
import json
import re
from typing import TypedDict, Optional, List, Any
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.messages import HumanMessage
from langgraph.graph import StateGraph, END
from schemas import CandidateProfile, ResumeStrategy, JobMatchReport
load_dotenv()
# =============================================================================
# 1. SHARED STATE DEFINITIONS
# =============================================================================
class AgentState(TypedDict):
"""Represents the state for the resume generation workflow."""
profile: CandidateProfile
target_role: str
strategy: ResumeStrategy
final_draft: str
match_report: JobMatchReport
    editor_feedback: Optional[str]  # None once the editor approves the draft
class ApplicationState(TypedDict):
"""Represents the state for the cover letter tailoring workflow."""
job_title: str
job_description: str
candidate_profile: str
cover_letter: str
status: str
class JobSearchState(TypedDict):
"""Represents the state for the job market search and ranking workflow."""
profile: CandidateProfile
raw_jobs: List[dict]
ranked_jobs: List[dict]
search_criteria: str
# =============================================================================
# 2. MODEL FACTORY
# =============================================================================
# DeepSeek Reasoner for high-level strategy (Supervisor/Profiler)
supervisor_llm = ChatOpenAI(
model="deepseek-reasoner",
base_url="https://api.deepseek.com",
api_key=os.getenv("DEEPSEEK_API_KEY")
)
# DeepSeek Chat for content generation and scrubbing (Writer/Editor/Matcher)
worker_llm = ChatOpenAI(
model="deepseek-chat",
base_url="https://api.deepseek.com",
api_key=os.getenv("DEEPSEEK_API_KEY"),
temperature=0.7
)
# =============================================================================
# 3. HELPER FUNCTIONS
# =============================================================================
def get_anonymized_profile(profile: CandidateProfile) -> str:
"""
Redacts Personally Identifiable Information (PII) from the profile.
"""
safe_profile = profile.model_copy(deep=True)
safe_profile.full_name = "[CANDIDATE_NAME]"
safe_profile.contact_email = "[CONTACT_EMAIL]"
safe_profile.phone = "[PHONE_NUMBER]"
safe_profile.linkedin = "[LINKEDIN_URL]"
return safe_profile.model_dump_json()
def extract_json_from_text(text: str) -> Optional[Any]:
"""
Extracts and parses JSON content from a string.
"""
try:
match = re.search(r"```json\n(.*?)\n```", text, re.DOTALL)
if match: return json.loads(match.group(1))
match = re.search(r"(\{.*\})", text, re.DOTALL)
if match: return json.loads(match.group(1))
return None
except (json.JSONDecodeError, AttributeError):
return None
def sanitize_latex(latex_code: str) -> str:
"""
Cleans and validates LaTeX source code generated by the LLM.
"""
# 1. Clean Markdown and extra whitespace
clean = latex_code.replace("```latex", "").replace("```", "").strip()
# 2. ESCAPE DOLLAR SIGNS: Critical for currency in text
clean = re.sub(r'(?<!\\)\$([0-9])', r'\\$\1', clean)
    # 3. FORCE ITEM MAPPING: Converts raw \item bullets to the template's \resumeItem command
clean = re.sub(r'\\item\s+\\textbf\{(.*?)\}:\s+(.*)', r'\\resumeItem{\1}{\2}', clean)
# 4. REMOVE EMPTY STRUCTURES: Prevents Tectonic from hanging
clean = re.sub(r"\\resumeItemListStart\s*\\resumeItemListEnd", "", clean)
clean = re.sub(r"\\resumeSubHeadingListStart\s*\\resumeSubHeadingListEnd", "", clean)
return clean
# =============================================================================
# 4. WORKFLOW 1: RESUME GENERATION
# =============================================================================
def supervisor_node(state: AgentState):
"""
Analyzes candidate profile against target role to define a pivot strategy.
"""
prompt = ChatPromptTemplate.from_messages([
("system", """You are a Career Strategy Architect.
Analyze the profile and generate a strategy.
Return ONLY valid JSON matching the schema:
{{
"reasoning_summary": "Pivot justification",
"gaps_identified": ["skill1", "skill2"],
"instruction_to_writer": "How to tailor bullets",
"next_action": "delegate_to_writer"
}}"""),
("user", "Target: {target_role}\nProfile: {profile}")
])
chain = prompt | supervisor_llm
response = chain.invoke({
"target_role": state['target_role'],
"profile": get_anonymized_profile(state['profile'])
})
    data = extract_json_from_text(response.content)
    if not data:
        raise ValueError("Supervisor returned no parseable strategy JSON.")
    # Normalize key drift occasionally produced by the reasoning model
    if "strategy_overview" in data:
        data["reasoning_summary"] = data.pop("strategy_overview")
    data.setdefault("gaps_identified", [])
    return {"strategy": ResumeStrategy(**data)}
def writer_node(state: AgentState):
"""
Generates tailored LaTeX resume content based on the defined strategy.
"""
strategy = state['strategy']
    # Include editor feedback if this is a retry loop
feedback_context = ""
if state.get("editor_feedback"):
feedback_context = f"\nCRITICAL FEEDBACK FROM PREVIOUS DRAFT: {state['editor_feedback']}\nFix these issues immediately."
latex_template = r"""
\documentclass[letterpaper,10pt]{article}
\usepackage{latexsym, fullpage, titlesec, marvosym, verbatim, enumitem, hyperref, fancyhdr, times, xcolor}
\pagestyle{fancy} \fancyhf{} \fancyfoot{} \renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{0pt}
\addtolength{\oddsidemargin}{-0.55in} \addtolength{\evensidemargin}{-0.55in} \addtolength{\textwidth}{1.1in}
\addtolength{\topmargin}{-0.6in} \addtolength{\textheight}{1.2in}
\titleformat{\section}{\vspace{-8pt}\scshape\raggedright\large}{}{0em}{}[\color{black}\titlerule \vspace{-4pt}]
\newcommand{\resumeItem}[2]{\item\small{\textbf{#1}{: #2 \vspace{-2pt}}}}
\newcommand{\resumeSubheading}[4]{\vspace{-2pt}\item[]\begin{tabular*}{0.98\textwidth}{l@{\extracolsep{\fill}}r}\hspace{-10pt}\textbf{#1} & #2 \\ \hspace{-10pt}\textit{\small#3} & \textit{\small #4} \end{tabular*}\vspace{-6pt}}
\newcommand{\resumeSubHeadingListStart}{\begin{itemize}[leftmargin=*]} \newcommand{\resumeSubHeadingListEnd}{\end{itemize}}
\newcommand{\resumeItemListStart}{\begin{itemize}} \newcommand{\resumeItemListEnd}{\end{itemize}\vspace{-6pt}}
\begin{document}
\begin{center}\huge \textbf{[CANDIDATE_NAME]} \\ \vspace{4pt} \small [CONTACT_EMAIL] $\vert$ [PHONE_NUMBER] $\vert$ [LINKEDIN_URL] \end{center}
\vspace{-20pt}
% CONTENT_START
\end{document}
"""
system_instruction = """
You are an Expert LaTeX Resume Writer.
1. STRICTLY follow the Action-Context-Result (ACR) framework for bullets.
2. DO NOT use standard '\\item'. YOU MUST USE '\\resumeItem{{Heading}}{{Content}}'.
    3. ESCAPE ALL CURRENCY SYMBOLS. Write '\\$5M', never '$5M'.
    4. DO NOT invent commands. Use ONLY the commands provided in the template.
    5. Output the FULL LaTeX code starting from \\documentclass.
"""
prompt = ChatPromptTemplate.from_messages([
("system", system_instruction),
("user", "STRATEGY: {instruction}\nDATA: {profile_data}\nFEEDBACK: {feedback}\nTEMPLATE: {template}")
])
chain = prompt | worker_llm
response = chain.invoke({
"instruction": strategy.instruction_to_writer,
"profile_data": get_anonymized_profile(state['profile']),
"feedback": feedback_context,
"template": latex_template
})
return {"final_draft": sanitize_latex(response.content)}
def editor_node(state: AgentState):
"""
Fact-checks the resume draft against the source profile.
Decides whether to APPROVE the draft or REJECT it for hallucination.
"""
prompt = ChatPromptTemplate.from_messages([
("system", """
You are a Strict Background Checker.
Compare the Draft against the Source Profile.
Rules:
1. If the Draft contains skills or metrics NOT in Source, REJECT.
2. If the Draft has broken LaTeX syntax, REJECT.
3. If the Draft is faithful, APPROVE.
Return JSON:
{{
"status": "APPROVE" | "REJECT",
"feedback": "Specific instructions on what to fix (if REJECT)",
"corrected_latex": "Optional minor fixes"
}}
"""),
("user", "SOURCE: {profile}\nDRAFT: {draft}")
])
chain = prompt | worker_llm
response = chain.invoke({
"profile": get_anonymized_profile(state['profile']),
"draft": state['final_draft']
})
data = extract_json_from_text(response.content)
if data:
# If the editor wants to reject, we pass the feedback back to the graph state
if data.get("status") == "REJECT":
return {
"editor_feedback": data.get("feedback", "General hallucination detected."),
"final_draft": state['final_draft'] # Keep old draft to show history if needed
}
# If approved, check if there are minor auto-fixes
if data.get("corrected_latex"):
return {
"editor_feedback": None, # Clear feedback
"final_draft": sanitize_latex(data['corrected_latex'])
}
# Default Approve
return {"editor_feedback": None, "final_draft": state['final_draft']}
def should_continue(state: AgentState):
"""
Conditional Edge Logic:
If editor_feedback is present, loop back to 'writer'.
Else, go to END.
"""
if state.get("editor_feedback"):
return "writer"
return END
def build_graph():
"""Compiles the primary resume generation state machine."""
workflow = StateGraph(AgentState)
# Add Nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("writer", writer_node)
workflow.add_node("editor", editor_node)
# Set Entry
workflow.set_entry_point("supervisor")
# Standard Edges
workflow.add_edge("supervisor", "writer")
workflow.add_edge("writer", "editor")
# Conditional Edge (The Loop)
workflow.add_conditional_edges(
"editor",
should_continue,
{
"writer": "writer", # Loop back if rejected
END: END # Finish if approved
}
)
return workflow.compile()
# =============================================================================
# 5. WORKFLOW 2: APPLICATION TAILORING
# =============================================================================
def cover_letter_node(state: ApplicationState):
"""Generates a contextual cover letter (Now PII-Safe)."""
raw_profile = CandidateProfile.model_validate_json(state['candidate_profile'])
safe_profile_json = get_anonymized_profile(raw_profile)
prompt = f"""
Write a punchy, 3-paragraph cover letter for the role: {state['job_title']}.
JOB DESCRIPTION: {state['job_description']}
CANDIDATE PROFILE: {safe_profile_json}
Use [CANDIDATE_NAME] and [CONTACT_EMAIL] placeholders.
"""
msg = worker_llm.invoke([HumanMessage(content=prompt)])
return {"cover_letter": msg.content}
def cover_letter_scrubber_node(state: ApplicationState):
"""Verifies cover letter accuracy (Now PII-Safe)."""
raw_profile = CandidateProfile.model_validate_json(state['candidate_profile'])
safe_profile_json = get_anonymized_profile(raw_profile)
prompt = f"""
Compare the following cover letter against the profile.
Delete any skills or achievements that are not directly supported by the profile.
PROFILE: {safe_profile_json}
LETTER: {state['cover_letter']}
"""
msg = worker_llm.invoke([HumanMessage(content=prompt)])
return {"cover_letter": msg.content}
def build_application_graph():
"""Compiles the application packet state machine."""
workflow = StateGraph(ApplicationState)
workflow.add_node("writer", cover_letter_node)
workflow.add_node("scrubber", cover_letter_scrubber_node)
workflow.set_entry_point("writer")
workflow.add_edge("writer", "scrubber")
workflow.add_edge("scrubber", END)
return workflow.compile()
# =============================================================================
# 6. WORKFLOW 3: JOB DISCOVERY
# =============================================================================
def profiler_node(state: JobSearchState):
"""Identifies cross-sector skill clusters to broaden search parameters."""
prompt = f"Identify core skill clusters for a cross-sector pivot. PROFILE: {get_anonymized_profile(state['profile'])}"
response = supervisor_llm.invoke([HumanMessage(content=prompt)])
return {"search_criteria": response.content}
def matcher_node(state: JobSearchState):
"""Ranks and selects the top 10 matches from a raw vector search pool."""
job_list_str = "\n".join([f"ID: {j['id']} | Title: {j['title']} | Desc: {j.get('description', '')[:200]}" for j in state['raw_jobs']])
prompt = f"""
Select the top 10 Job IDs based on the strategy.
STRATEGY: {state['search_criteria']}
POOL: {job_list_str}
Return ONLY a JSON list of integers.
"""
response = worker_llm.invoke([HumanMessage(content=prompt)])
try:
raw_data = extract_json_from_text(response.content)
if isinstance(raw_data, dict):
for key in ["ids", "selected_ids", "matches"]:
if key in raw_data and isinstance(raw_data[key], list):
raw_data = raw_data[key]
break
ai_ids = [str(x) for x in raw_data] if isinstance(raw_data, list) else []
final_jobs = [j for j in state['raw_jobs'] if str(j['id']) in ai_ids]
if not final_jobs:
final_jobs = state['raw_jobs'][:10]
return {"ranked_jobs": final_jobs}
except Exception:
return {"ranked_jobs": state['raw_jobs'][:10]}
def build_job_search_graph():
"""Compiles the market discovery state machine."""
workflow = StateGraph(JobSearchState)
workflow.add_node("profiler", profiler_node)
workflow.add_node("matcher", matcher_node)
workflow.set_entry_point("profiler")
workflow.add_edge("profiler", "matcher")
workflow.add_edge("matcher", END)
return workflow.compile()
B. The Application Graph (Writer-Scrubber)
Generates tailored cover letters while strictly adhering to PII protocols. The Scrubber node verifies that no hallucinated contact details or unsupported claims make it into the final PDF packet.
C. The Job Search Graph (Profiler-Matcher)
Instead of simple keyword matching, the Profiler analyzes the candidate's transferable skills (e.g., "Project Management" -> "Product Owner") and the Matcher ranks vector search results based on that strategic pivot.
03 // Synthetic Market Simulation
We could not train on real client data due to privacy laws. To validate the system's ability to handle diverse roles (from Junior DevOps to Staff Product Managers), I built a Synthetic Data Engine.
This async pipeline generates thousands of realistic "Impact Stories" and "Job Descriptions," allowing us to stress-test the Agents' reasoning capabilities before a single real user logs in.
"""
Candidate Persona Generator
Utilizes asynchronous concurrency to generate detailed, story-based
professional profiles for system testing and database seeding.
"""
import os
import json
import random
import asyncio
from typing import Optional
from openai import AsyncOpenAI
from dotenv import load_dotenv
from tqdm.asyncio import tqdm
load_dotenv()
# --- CONFIGURATION ---
NUM_TO_GENERATE = 30
MAX_CONCURRENT_REQUESTS = 10 # Lowered concurrency to handle larger JSON payloads
OUTPUT_FILE = "candidates_database.json"
client = AsyncOpenAI(
base_url="https://api.deepseek.com",
api_key=os.getenv("DEEPSEEK_API_KEY")
)
# --- SEED DATA ---
ROLES = ["Backend Engineer", "Data Scientist", "Product Manager", "DevOps Engineer"]
SENIORITY = ["Junior", "Senior", "Staff"]
async def generate_single_profile(profile_id: int, semaphore: asyncio.Semaphore) -> Optional[dict]:
"""
Generates a single comprehensive candidate profile via asynchronous LLM call.
Args:
profile_id: Unique integer ID for the candidate.
semaphore: Concurrency controller to prevent API rate-limiting.
Returns:
A validated dictionary representing a candidate profile or None on failure.
"""
async with semaphore:
role = random.choice(ROLES)
level = random.choice(SENIORITY)
prompt = f"""
Generate a detailed Resume Profile for a {level} {role} in JSON format.
CRITICAL RULE: In the 'achievements' and 'description_bullets' fields,
DO NOT provide short bullets. Provide a 5-sentence story describing:
1. A specific technical crisis or project goal.
2. The architecture/tools the candidate chose to solve it.
3. The measurable result (e.g., latency reduced by 50%, $2M saved).
JSON SCHEMA:
{{
"full_name": "Name",
"contact_email": "email",
"phone": "phone",
"linkedin": "url",
"summary": "Summary",
"skills": ["Python", "SQL", "Docker", "AWS"],
"experience_history": [
{{ "company": "Co", "role": "Role", "achievements": ["5-sentence impact story..."] }}
],
"projects": [
{{ "title": "Name", "description_bullets": ["4-sentence narrative..."] }}
],
"education": [{{ "school": "Uni", "degree": "Degree", "year": "2024" }}]
}}
"""
try:
response = await client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": prompt}],
temperature=0.8
)
# Extract and clean JSON content from markdown wrappers
content = response.choices[0].message.content.replace("```json", "").replace("```", "").strip()
data = json.loads(content)
# Inject metadata for system consistency
data['id'] = profile_id
data['level'] = level
return data
        except Exception as e:  # Exception already covers json.JSONDecodeError
            print(f"❌ Failed to generate profile {profile_id}: {e}")
return None
async def main():
"""
Orchestrates the parallel generation of candidate personas.
"""
print(f"π Generating {NUM_TO_GENERATE} Narratively-Dense Candidates...")
sem = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
# Initialize task list
tasks = [generate_single_profile(i, sem) for i in range(NUM_TO_GENERATE)]
# Execute with visual progress tracking
results = await tqdm.gather(*tasks, desc="Building Personas")
# Filter out failed requests and save to local JSON
valid_profiles = [p for p in results if p is not None]
with open(OUTPUT_FILE, "w") as f:
json.dump(valid_profiles, f, indent=2)
print(f"\nβ
Success: Saved {len(valid_profiles)} narrative profiles to {OUTPUT_FILE}.")
if __name__ == "__main__":
asyncio.run(main())
"""
Job Database Generator
Orchestrates high-concurrency asynchronous API calls to DeepSeek
to build a diverse dataset of job descriptions.
"""
import os
import json
import random
import asyncio
from typing import Optional
from openai import AsyncOpenAI
from dotenv import load_dotenv
from tqdm.asyncio import tqdm
load_dotenv()
# --- CONFIGURATION ---
NUM_TO_GENERATE = 1000
MAX_CONCURRENT_REQUESTS = 50 # Limits simultaneous API hits to prevent rate limiting
OUTPUT_FILE = "jobs_database.json"
client = AsyncOpenAI(
base_url="https://api.deepseek.com",
api_key=os.getenv("DEEPSEEK_API_KEY")
)
# --- SEED DATA ---
SECTORS = ["FinTech", "Healthcare", "E-commerce", "Cybersecurity", "Green Energy",
"Gaming", "Logistics", "EdTech", "LegalTech", "AgriTech"]
ROLES = ["Machine Learning Engineer", "Backend Developer", "Data Scientist",
"DevOps Engineer", "Full Stack Developer", "Product Manager",
"Security Analyst", "Cloud Architect"]
SENIORITY = ["Junior", "Mid-Level", "Senior", "Staff", "Lead"]
async def generate_single_jd(job_id: int, semaphore: asyncio.Semaphore) -> Optional[dict]:
"""
Generates a single job description using an asynchronous API call.
Args:
job_id: Unique identifier for the job record.
semaphore: Controller to limit the number of concurrent tasks.
Returns:
A dictionary containing job metadata or None if the request fails.
"""
async with semaphore:
sector = random.choice(SECTORS)
role = random.choice(ROLES)
level = random.choice(SENIORITY)
prompt = (
f"Write a brief Job Description for a {level} {role} at a {sector} company. "
"Format: Plain text. Include 1. Role Summary, 2. Key Responsibilities, 3. Tech Stack. "
"Keep it under 100 words."
)
try:
response = await client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": prompt}],
temperature=0.9
)
content = response.choices[0].message.content
return {
"id": job_id,
"title": f"{level} {role}",
"sector": sector,
"description": content
}
except Exception as e:
print(f"β Error on ID {job_id}: {e}")
return None
async def main():
"""
Main orchestrator to schedule and execute batch job generation.
"""
print(f"π Starting Async Generation: {NUM_TO_GENERATE} records...")
sem = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
# Schedule all tasks into the event loop
tasks = [generate_single_jd(i, sem) for i in range(NUM_TO_GENERATE)]
# Execute tasks in parallel with a progress bar
results = await tqdm.gather(*tasks, desc="Generating Jobs")
# Filter failures and write to local storage
valid_jobs = [job for job in results if job is not None]
with open(OUTPUT_FILE, "w") as f:
json.dump(valid_jobs, f, indent=2)
print(f"\nβ
Success: Saved {len(valid_jobs)} jobs to {OUTPUT_FILE}")
if __name__ == "__main__":
asyncio.run(main())
04 // Pydantic Typing & PII Protection
In enterprise software, unstructured JSON is a liability. I enforced strict Pydantic Schemas for every data exchange. This guarantees that the AI cannot output malformed data that would break the downstream PDF compiler.
Crucially, the `CandidateProfile` schema includes fields for PII (`contact_email`, `phone`) that are programmatically redacted before entering the Agent Graph, ensuring compliance with GDPR/CCPA.
"""
Data schemas for the Thrive Wellness Career Platform.
Defines Pydantic models for candidate profiles, AI agent strategies,
and job matching analytics to ensure strict type validation across the system.
"""
from pydantic import BaseModel, Field
from typing import List, Optional, Union
# --- 1. CANDIDATE PROFILE COMPONENTS ---
class Experience(BaseModel):
"""Represents a single professional role in a candidate's work history."""
company: str
location: str
role: str
duration: str
achievements: List[str]
class Project(BaseModel):
"""Represents a technical or professional project in a candidate's portfolio."""
title: str
tech_stack: str
date: str
description_bullets: List[str]
class Education(BaseModel):
"""Represents a formal academic degree or certification."""
school: str
location: str
degree: str
year: str
class CandidateProfile(BaseModel):
"""The central data model for a candidate's full professional identity."""
full_name: str
contact_email: str
phone: str
linkedin: str
summary: str
skills: List[str]
experience_history: List[Experience]
projects: List[Project]
education: List[Education]
target_role: Optional[str] = None
job_description: Optional[str] = None
# --- 2. AI AGENT STRATEGY MODELS ---
class ResumeStrategy(BaseModel):
"""Output schema for the Supervisor Agent's strategic planning phase."""
reasoning_summary: str = Field(
description="The internal logic explaining the pivot strategy."
)
gaps_identified: List[Union[str, dict]] = Field(
description="List of detected skill deficiencies or narrative weaknesses."
)
instruction_to_writer: str = Field(
description="Technical directives for the Writer Agent to follow."
)
next_action: str = Field(
description="Determines the next node in the graph (e.g., 'delegate_to_writer')."
)
# --- 3. ANALYTICS & SCORING MODELS ---
class JobMatchReport(BaseModel):
"""Schema for ATS-style alignment analysis against a specific job role."""
score: int = Field(
description="Numerical match assessment ranging from 0 to 100."
)
missing_keywords: List[str] = Field(
description="Specific technical or soft skills missing from the resume."
)
hiring_manager_tip: str = Field(
description="Actionable advice to make the candidate more competitive."
)
05 // Production Roadmap
The current architecture is an MVP. The roadmap to scale this to 100k+ users involves four critical infrastructure upgrades:
We will replace manual review with CI/CD for Agents. Using a "Golden Dataset" of perfect resumes approved by human recruiters, we will run `deepeval` on every model update. If the "Hallucination Score" exceeds 0.1%, the deployment automatically rolls back.
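A minimal sketch of that gate, assuming deepeval's HallucinationMetric and LLMTestCase APIs; the golden-dataset path and the generate_resume callable are placeholders for the real harness:
import json
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase
HALLUCINATION_BUDGET = 0.001  # 0.1% of golden cases may fail before we roll back
def regression_gate(generate_resume, golden_path: str = "golden_resumes.json") -> bool:
    """Returns True if the candidate model may be promoted."""
    with open(golden_path) as f:
        golden = json.load(f)  # [{"profile": "...", "approved_resume": "..."}, ...]
    metric = HallucinationMetric(threshold=0.5)
    failures = 0
    for case in golden:
        draft = generate_resume(case["profile"])  # candidate model under test
        metric.measure(LLMTestCase(
            input=case["profile"],
            actual_output=draft,
            context=[case["profile"]],  # ground truth the draft must not exceed
        ))
        if not metric.is_successful():
            failures += 1
    return failures / len(golden) <= HALLUCINATION_BUDGET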
To reduce inference costs while maintaining reasoning quality, we will use our Synthetic Data Engine to create a training dataset. We will distill the complex reasoning patterns of DeepSeek-R1 (Teacher) into a smaller, quantized Llama-3-8B (Student) model, hosted on our own vLLM clusters.
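A sketch of the harvesting step, reusing the DeepSeek endpoint from the generators above. The prompt shape and the fixed "Product Owner" target are illustrative, not the final training recipe; the reasoning_content field is DeepSeek's documented channel for the reasoner's chain of thought.
import json
import os
from openai import OpenAI
teacher = OpenAI(base_url="https://api.deepseek.com",
                 api_key=os.getenv("DEEPSEEK_API_KEY"))
def harvest_trace(profile_json: str, target_role: str) -> dict:
    """One chat-format SFT record pairing a pivot prompt with the teacher's answer."""
    prompt = (f"Plan a resume pivot strategy.\n"
              f"Target: {target_role}\nProfile: {profile_json}")
    resp = teacher.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": prompt}],
    )
    msg = resp.choices[0].message
    # Keep the chain of thought so the Student learns the reasoning, not just answers;
    # fall back gracefully if the field is absent.
    thought = getattr(msg, "reasoning_content", "") or ""
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": f"{thought}\n\n{msg.content}".strip()},
    ]}
if __name__ == "__main__":
    with open("candidates_database.json") as f:
        profiles = json.load(f)
    with open("distillation_set.jsonl", "w") as out:
        for p in profiles:
            record = harvest_trace(json.dumps(p), target_role="Product Owner")
            out.write(json.dumps(record) + "\n")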
We will implement an A/B testing framework where a "Challenger" model runs in shadow mode on 5% of traffic. We measure success not by latency, but by the business KPI: "Did the user get an interview?"
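A sketch of the deterministic shadow router, with the champion and challenger models passed in as plain callables standing in for the real serving clients:
import hashlib
SHADOW_TRAFFIC_PCT = 5  # challenger shadows 5% of traffic; its output is never served
def bucket(user_id: str) -> str:
    """Deterministic hash bucketing so each user stays in one cohort across sessions."""
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "challenger_shadow" if h < SHADOW_TRAFFIC_PCT else "champion"
def generate_with_shadow(user_id: str, payload: dict,
                         champion, challenger, log_outcome) -> str:
    """Always serve the champion; run the challenger off the hot path for comparison."""
    result = champion(payload)
    if bucket(user_id) == "challenger_shadow":
        # Logged outcomes are later joined with the interview KPI, not latency.
        log_outcome(user_id, challenger(payload))
    return result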
We will operationalize PII redaction by deploying Microsoft Presidio as a sidecar proxy. This ensures that even if a developer accidentally logs a payload, names and emails are tokenized at the network edge before they ever hit the database logs or the model.
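A sketch of the redaction hook such a sidecar would expose, assuming the presidio-analyzer and presidio-anonymizer packages; the entity list is trimmed to the fields we redact today:
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
def redact_for_logs(payload: str) -> str:
    """Tokenizes names, emails, and phone numbers before anything is persisted."""
    findings = analyzer.analyze(
        text=payload,
        entities=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
        language="en",
    )
    return anonymizer.anonymize(text=payload, analyzer_results=findings).text
# redact_for_logs("Reach Jane Doe at jane@x.com")
# -> "Reach <PERSON> at <EMAIL_ADDRESS>"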
"""
Thrive Wellness Career Transition Platform - Application Entry Point
------------------------------------------------------------------
This module serves as the frontend interface for the multi-agent career
transition system. It handles:
1. User input and profile management via Streamlit sidebar.
2. Local job database loading and vector search indexing.
3. Orchestration of Agent Graphs (Resume, Application, Search).
4. Local LaTeX compilation and PDF rendering.
5. PII Restoration and data privacy enforcement.
"""
import streamlit as st
import subprocess
import os
import json
import base64
import re
import urllib.parse
import chromadb
from chromadb.utils import embedding_functions
from dotenv import load_dotenv
load_dotenv()
from schemas import CandidateProfile, Experience, Education, Project
from agent_graph import build_graph, build_application_graph, build_job_search_graph
# --- PAGE CONFIGURATION ---
st.set_page_config(page_title="Thrive Career Wellness", page_icon="🌿", layout="wide")
# Initialize session state counters for dynamic UI elements
if 'exp_count' not in st.session_state: st.session_state.exp_count = 1
if 'proj_count' not in st.session_state: st.session_state.proj_count = 1
# --- DATA LOADING ---
@st.cache_data
def load_jobs():
"""
Loads the job market database from local storage.
Returns:
list: A list of job dictionaries. Returns a default placeholder if
the database file is missing.
"""
try:
with open("jobs_database.json", "r") as f:
return json.load(f)
except FileNotFoundError:
return [{"id": 0, "title": "Default", "sector": "Tech", "description": "N/A"}]
@st.cache_data
def load_candidates():
"""
Loads pre-defined candidate personas for demonstration purposes.
Returns:
list: A list of candidate profile dictionaries.
"""
try:
with open("candidates_database.json", "r") as f:
return json.load(f)
except FileNotFoundError:
return []
jobs_db = load_jobs()
candidates_db = load_candidates()
# --- VECTOR DATABASE ---
@st.cache_resource
class JobBoard:
"""
Manages vector search operations for job market discovery.
Attributes:
client: The ChromaDB persistent client.
collection: The specific document collection for job embeddings.
"""
def __init__(self):
self.client = chromadb.PersistentClient(path="./chroma_db")
self.collection = self.client.get_or_create_collection(
name="job_market",
embedding_function=embedding_functions.DefaultEmbeddingFunction()
)
if self.collection.count() == 0:
self._index_jobs()
def _index_jobs(self):
"""
Populates the vector database with job descriptions from the local store.
"""
ids = [str(j['id']) for j in jobs_db]
docs = [f"{j['title']} - {j['description']}" for j in jobs_db]
metadatas = [{"title": j['title'], "sector": j.get('sector', 'General'), "id": j['id']} for j in jobs_db]
self.collection.add(documents=docs, metadatas=metadatas, ids=ids)
def recommend_jobs(self, resume_text, top_k=50):
"""
Performs semantic similarity search against the job database.
Args:
resume_text (str): The candidate's resume content or summary.
top_k (int): Number of results to retrieve.
Returns:
list: A list of job dictionaries matching the semantic query.
"""
results = self.collection.query(query_texts=[resume_text], n_results=top_k)
jobs = []
if results['ids']:
for i in range(len(results['ids'][0])):
meta = results['metadatas'][0][i]
jobs.append({
"id": meta['id'],
"title": meta['title'],
"sector": meta['sector'],
"description": results['documents'][0][i]
})
return jobs
# --- LOCAL LATEX COMPILER ---
def compile_latex_local(latex_code):
"""
Compiles LaTeX source into PDF using the local Tectonic engine.
Args:
latex_code (str): The raw LaTeX source string.
Returns:
tuple: (bytes, None) on success, or (None, str) containing stderr on failure.
"""
with open("resume.tex", "w", encoding="utf-8") as f:
f.write(latex_code)
try:
res = subprocess.run(
[os.path.join(os.getcwd(), "tectonic.exe"), "resume.tex"],
capture_output=True,
text=True
)
if res.returncode == 0:
with open("resume.pdf", "rb") as f:
return f.read(), None
else:
return None, res.stderr
except Exception as e:
return None, str(e)
# --- SIDEBAR: PROFILE MANAGEMENT ---
with st.sidebar:
st.header("1. Member Profile")
persona_names = ["New Blank Profile"] + [f"{p.get('level', 'N/A')} - {p['full_name']}" for p in candidates_db]
selected_persona = st.selectbox("Select Candidate:", persona_names)
if selected_persona == "New Blank Profile":
p_data = {"full_name": "", "contact_email": "", "phone": "", "linkedin": "",
"experience_history": [], "projects": [],
"education": [{"school": "", "degree": "", "year": ""}]}
st.session_state.exp_count = 1
st.session_state.proj_count = 1
else:
p_data = candidates_db[persona_names.index(selected_persona) - 1]
st.session_state.exp_count = max(1, len(p_data.get('experience_history', [])))
st.session_state.proj_count = max(1, len(p_data.get('projects', [])))
full_name = st.text_input("Full Name", p_data['full_name'], key=f"fn_{selected_persona}")
email = st.text_input("Email", p_data['contact_email'], key=f"em_{selected_persona}")
phone = st.text_input("Phone", p_data['phone'], key=f"ph_{selected_persona}")
linkedin = st.text_input("LinkedIn", p_data.get('linkedin', ""), key=f"li_{selected_persona}")
with st.expander("πΌ Work History", expanded=True):
for i in range(st.session_state.exp_count):
st.markdown(f"**Job #{i+1}**")
d_exp = p_data['experience_history'][i] if i < len(p_data['experience_history']) else {"company": "", "role": "", "achievements": [""]}
st.text_input("Company", key=f"comp_{selected_persona}_{i}", value=d_exp['company'])
st.text_input("Role", key=f"role_{selected_persona}_{i}", value=d_exp['role'])
st.text_area("Impact Story", key=f"ach_{selected_persona}_{i}", value="\n".join(d_exp['achievements']))
if st.button("β Add Job"):
st.session_state.exp_count += 1
st.rerun()
with st.expander("π Projects", expanded=False):
for j in range(st.session_state.proj_count):
st.markdown(f"**Project #{j+1}**")
d_pj = p_data['projects'][j] if j < len(p_data['projects']) else {"title": "", "description_bullets": [""]}
st.text_input("Title", key=f"ptit_{selected_persona}_{j}", value=d_pj['title'])
st.text_area("Story", key=f"pdesc_{selected_persona}_{j}", value="\n".join(d_pj['description_bullets']))
if st.button("β Add Project"):
st.session_state.proj_count += 1
st.rerun()
with st.expander("π Education", expanded=False):
d_edu = p_data['education'][0] if p_data.get('education') else {"school": "", "degree": "", "year": ""}
school = st.text_input("University", d_edu['school'], key=f"sch_{selected_persona}")
degree = st.text_input("Degree", d_edu['degree'], key=f"deg_{selected_persona}")
grad_year = st.text_input("Year", d_edu['year'], key=f"yr_{selected_persona}")
st.header("2. Target Role")
sects = sorted(list(set(j['sector'] for j in jobs_db)))
sel_sect = st.selectbox("Industry Sector", sects)
filt_jobs = [j for j in jobs_db if j['sector'] == sel_sect]
sel_job_t = st.selectbox("Role Template", [f"{j['title']} (ID: {j['id']})" for j in filt_jobs])
sel_job_obj = next(j for j in filt_jobs if f"{j['title']} (ID: {j['id']})" == sel_job_t)
target_role = st.text_input("Target Role Name", sel_job_obj['title'])
job_desc = st.text_area("Job Description", sel_job_obj['description'], height=150)
    submit = st.button("🚀 Generate Strategic Resume", type="primary")
# --- EXECUTION LAYER ---
if submit:
# Collect UI Data
exps = []
for i in range(st.session_state.exp_count):
k_comp, k_role, k_ach = f"comp_{selected_persona}_{i}", f"role_{selected_persona}_{i}", f"ach_{selected_persona}_{i}"
c = st.session_state[k_comp] if k_comp in st.session_state else p_data['experience_history'][i]['company']
r = st.session_state[k_role] if k_role in st.session_state else p_data['experience_history'][i]['role']
a = st.session_state[k_ach] if k_ach in st.session_state else "\n".join(p_data['experience_history'][i]['achievements'])
exps.append(Experience(company=c, location="N/A", role=r, duration="N/A", achievements=[a]))
projs = []
for j in range(st.session_state.proj_count):
k_tit, k_desc = f"ptit_{selected_persona}_{j}", f"pdesc_{selected_persona}_{j}"
t = st.session_state[k_tit] if k_tit in st.session_state else p_data['projects'][j]['title']
d = st.session_state[k_desc] if k_desc in st.session_state else "\n".join(p_data['projects'][j]['description_bullets'])
projs.append(Project(title=t, tech_stack="N/A", date="N/A", description_bullets=[d]))
k_sch, k_deg, k_yr = f"sch_{selected_persona}", f"deg_{selected_persona}", f"yr_{selected_persona}"
edu_school = st.session_state[k_sch] if k_sch in st.session_state else p_data['education'][0]['school']
edu_degree = st.session_state[k_deg] if k_deg in st.session_state else p_data['education'][0]['degree']
edu_year = st.session_state[k_yr] if k_yr in st.session_state else p_data['education'][0]['year']
profile = CandidateProfile(
full_name=full_name, contact_email=email, phone=phone, linkedin=linkedin,
summary=f"Strategic {target_role} transition.", skills=[],
experience_history=exps, projects=projs,
education=[Education(school=edu_school, location="N/A", degree=edu_degree, year=edu_year)],
target_role=target_role, job_description=job_desc
)
st.session_state.profile = profile
# Orchestrate Multi-Agent Workflow
    with st.status("🤖 AI Agents Initializing...", expanded=True) as status:
st.write("π§ Reasoning through career pivot strategy...")
app = build_graph()
st.session_state.result = app.invoke({"profile": profile, "target_role": target_role})
st.write("π― Mapping broad market opportunities...")
board = JobBoard()
raw = board.recommend_jobs(target_role, top_k=50)
search_agent = build_job_search_graph()
matches = search_agent.invoke({"profile": profile, "raw_jobs": raw, "ranked_jobs": []})
st.session_state.ranked_results = matches['ranked_jobs']
        status.update(label="✅ Ready!", state="complete")
st.rerun()
# --- PRESENTATION LAYER ---
if 'result' in st.session_state and st.session_state.result:
res = st.session_state.result
prof = st.session_state.profile
st.divider()
st.header("π Strategic Career Assets")
with st.expander("π§ View AI Reasoning & Pivot Strategy", expanded=True):
st.write(res['strategy'].reasoning_summary)
# PII RESTORATION: Double-pass loop to handle AI-escaped placeholders
raw_latex = res['final_draft']
pii_map = {
"NAME": prof.full_name,
"EMAIL": prof.contact_email,
"NUMBER": prof.phone,
"URL": prof.linkedin
}
clean_latex = raw_latex
for key, value in pii_map.items():
# Pass 1: Standard Placeholders
clean_latex = clean_latex.replace(f"[CANDIDATE_{key}]", value)\
.replace(f"[CONTACT_{key}]", value)\
.replace(f"[PHONE_{key}]", value)\
.replace(f"[LINKEDIN_{key}]", value)
# Pass 2: Escaped Placeholders (Latex safe)
clean_latex = clean_latex.replace(f"[CANDIDATE\\_{key}]", value)\
.replace(f"[CONTACT\\_{key}]", value)\
.replace(f"[PHONE\\_{key}]", value)\
.replace(f"[LINKEDIN\\_{key}]", value)
c1, c2 = st.columns(2)
with c1:
st.subheader("LaTeX Source Editor")
edited = st.text_area("Source Code", value=clean_latex, height=600, key="latex_editor")
with c2:
st.subheader("Resume Preview")
if st.button("π Compile PDF Document"):
with st.spinner("Generating PDF..."):
pdf_bytes, error_log = compile_latex_local(edited)
if pdf_bytes:
b64 = base64.b64encode(pdf_bytes).decode()
st.markdown(f'<iframe src="data:application/pdf;base64,{b64}" width="100%" height="600"></iframe>', unsafe_allow_html=True)
else:
st.error("LaTeX Compilation Failed")
with st.expander("π View Compiler Error Log"):
st.code(error_log)
if 'ranked_results' in st.session_state:
st.divider()
st.header("π― Thrive Job Match (Top 10)")
m1, m2 = st.columns(2)
for i, job in enumerate(st.session_state.ranked_results):
job_id = job['id']
with (m1 if i % 2 == 0 else m2):
with st.container(border=True):
st.subheader(job['title'])
st.caption(f"Sector: {job['sector']} | ID: {job_id}")
with st.expander("π Full Responsibilities & Details"):
st.write(job['description'])
ce, cp = st.columns([1, 1.2])
with ce:
cl_base = st.session_state.get(f"packet_{job_id}", f"Hi, I'm {prof.full_name}, I'm writing to express interest in the {job['title']} role...")
cl_clean = cl_base.replace("[CANDIDATE_NAME]", prof.full_name)\
.replace("[CONTACT_EMAIL]", prof.contact_email)
subj = urllib.parse.quote(f"Interest in {job['title']}")
mailto = f"mailto:hiring@thrive.com?subject={subj}&body={urllib.parse.quote(cl_clean)}"
                    st.link_button("📧 Email Manager", mailto, use_container_width=True)
with cp:
if st.button(f"π€ Prep Cover Letter", key=f"prep_{job_id}", use_container_width=True):
with st.spinner("Tailoring cover letter..."):
                            # 🛡️ PII GUARD: Create anonymized profile for the agent
safe_prof = prof.model_copy()
safe_prof.full_name = "[CANDIDATE_NAME]"
safe_prof.contact_email = "[CONTACT_EMAIL]"
safe_prof.phone = "[PHONE_NUMBER]"
safe_prof.linkedin = "[LINKEDIN_URL]"
app_res = build_application_graph().invoke({
"job_title": job['title'],
"job_description": job['description'],
"candidate_profile": safe_prof.model_dump_json(), # Safe JSON
"cover_letter": "", "status": "pending"
})
st.session_state[f"packet_{job_id}"] = app_res['cover_letter']
st.rerun()
if f"packet_{job_id}" in st.session_state:
st.info("π Tailored Application Ready")
final_cl = st.session_state[f"packet_{job_id}"]
for key, value in pii_map.items():
final_cl = final_cl.replace(f"[CANDIDATE_{key}]", value)\
.replace(f"[CONTACT_{key}]", value)
st.text_area("Cover Letter", value=final_cl, height=200, key=f"text_{job_id}")
06 // Unit Economics: The "Staff Engineer" Pivot
The difference between a hobby project and a sustainable business is cost structure. Scaling to 100k users on a raw API wrapper (Scenario A) creates a dangerous variable-cost liability. Distillation (Scenario B) converts it into a predictable fixed cost.
The Variable Cost Trap
Passing every prompt to DeepSeek/OpenAI.
- Cost per Resume: $0.02
- 100k Users/Mo: $2,000/mo
- Risk: Uncapped
The Fixed Cost Win
Self-hosting Llama-3 (Student) on vLLM.
- Training Cost: $60 (One-time)
- Hosting (T4 GPU): $250/mo (Flat)
- 100k Users/Mo: $0.0025 each
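The crossover is easy to sanity-check: at $0.02 per resume, the flat $250/mo GPU pays for itself once volume passes 12,500 resumes per month. A quick check of the arithmetic, using the figures above:
# Back-of-envelope comparison of the two scenarios.
api_cost_per_resume = 0.02     # Scenario A: variable API spend per resume
gpu_fixed_monthly = 250.0      # Scenario B: flat T4 hosting
users_per_month = 100_000
scenario_a_total = users_per_month * api_cost_per_resume   # $2,000/mo, uncapped
scenario_b_per_user = gpu_fixed_monthly / users_per_month  # $0.0025, falls with scale
breakeven = gpu_fixed_monthly / api_cost_per_resume        # 12,500 resumes/mo
print(f"A: ${scenario_a_total:,.0f}/mo | B: ${scenario_b_per_user:.4f}/user | "
      f"break-even at {breakeven:,.0f} resumes/mo")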
Tech Stack
- > Orchestration: LangGraph (Multi-Agent State Machine)
- > Validation: Pydantic (Strict Schemas)
- > Simulation: Asyncio + Faker (Synthetic Data)
- > Vector Search: ChromaDB (Prototype) -> Pinecone (Prod)
- > Inference: vLLM / Ray Serve (Planned)