TalentSync


Overview

Purpose and Scope

TalentSync is an AI-powered hiring intelligence platform that automates resume analysis, candidate evaluation, and job application workflows for both job seekers and recruiters. This document provides a high-level introduction to the system architecture, target users, core capabilities, and technology stack.

System Architecture Overview

TalentSync implements a three-tier architecture consisting of a Next.js frontend, FastAPI backend, and PostgreSQL database, with extensive integration of AI/ML services.
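
As a rough sketch of how the tiers connect (module names and the frontend origin below are assumptions, not copied from backend/server.py), the FastAPI backend exposes a REST API that the Next.js frontend calls over HTTP, with CORS configured for the frontend origin and PostgreSQL reached from the route handlers:

# Illustrative wiring of the backend tier; names and origins are assumptions.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="TalentSync API")

# Allow the Next.js frontend (tier 1) to call this API (tier 2).
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # assumed dev origin for the Next.js app
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/health")
def health() -> dict:
    # Tier 3 (PostgreSQL via Prisma) is accessed from route handlers; omitted here.
    return {"status": "ok"}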


System Components

Frontend Application Stack

| Component | Technology | Location | Purpose |
|---|---|---|---|
| Framework | Next.js 14+ | frontend/app/ | Server-side rendering, routing |
| Authentication | NextAuth.js | frontend/app/api/auth/ | Multi-provider auth (email, OAuth) |
| Database ORM | Prisma Client | frontend/prisma/ | Type-safe database access |
| PWA Support | Workbox | Service worker | Offline capabilities, caching |
| Analytics | PostHog | Client-side | Product analytics |
| UI Framework | React + TypeScript | frontend/components/ | Component architecture |

Backend Service Stack

| Component | Technology | Location | Purpose |
|---|---|---|---|
| API Framework | FastAPI | backend/server.py | REST API endpoints |
| LLM Integration | LangChain + Gemini 2.0 | backend/server.py:68-86 | Prompt engineering, LLM calls |
| Workflow Engine | LangGraph | Backend services | Multi-step AI workflows |
| NLP Processing | spaCy + NLTK | backend/server.py:702-711 | Text cleaning, lemmatization |
| ML Classifier | scikit-learn | backend/app/model/best_model.pkl | Job category prediction |
| Document Parsing | PyPDF2, python-docx | backend/server.py:752-793 | Resume text extraction |
| GitHub Analysis | gitingest | backend/app/agents/github_agent.py | Repository ingestion |

Target Users and Use Cases

TalentSync serves two distinct user segments with different feature sets:

Job Seeker Features

| Feature | Description | Primary Backend Endpoint |
|---|---|---|
| Resume Analysis | ML-based job field prediction, skills extraction, work experience parsing | POST /analyze_resume |
| ATS Evaluation | Score resume against job descriptions, provide improvement suggestions | POST /evaluate_resume_ats |
| Cold Mail Generation | AI-generated personalized cold emails with company research | POST /v2/generate_cold_mail |
| Hiring Assistant | Generate interview answers based on resume and job context | POST /generate_answer |
| LinkedIn Posts | Create LinkedIn content from projects/achievements with GitHub integration | POST /generate_linkedin_posts |
| Career Tips | Personalized resume and interview tips based on job category | GET /tips |
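
To illustrate how a client might call the resume analysis endpoint (the multipart field name and localhost base URL are assumptions; the actual request contract is defined in backend/server.py):

# Hypothetical client call to POST /analyze_resume.
import requests

with open("resume.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/analyze_resume",
        files={"file": ("resume.pdf", f, "application/pdf")},  # field name assumed
    )

response.raise_for_status()
print(response.json())  # e.g. predicted job field, extracted skills, work experience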

Recruiter Features

| Feature | Description | Primary Backend Endpoint |
|---|---|---|
| Bulk Resume Processing | Upload ZIP files with multiple resumes for parallel processing | POST /bulk_upload |
| Candidate Dashboard | Structured view of parsed candidates with filtering and ranking | Database query via Prisma |
| ATS Evaluation | Evaluate candidate resumes against job requirements | POST /evaluate_resume_ats |

Core Processing Pipeline

The resume analysis pipeline demonstrates the integration of traditional ML and modern LLM techniques:

Processing Stages

  1. Text Extraction (backend/server.py:752-793): Handles PDF, DOCX, and TXT formats using the PyPDF2 and python-docx libraries
  2. Text Cleaning (backend/server.py:738-749): Applies spaCy lemmatization and NLTK stopword removal
  3. ML Classification (backend/server.py:714-735): Uses the pre-trained best_model.pkl (25 job categories) with TF-IDF features
  4. Field Extraction (backend/server.py:923-1029): Regex-based extraction of name, email, skills, work experience, projects
  5. LLM Enhancement (backend/server.py:373-462): Google Gemini 2.0 Flash generates structured analysis via the ComprehensiveAnalysisData schema
  6. Storage (Prisma ORM): Persists both raw text and structured analysis in PostgreSQL

Processing Time: Typically 4-8 seconds for complete analysis including LLM calls.
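
A condensed sketch of stages 1-3 follows; helper names are illustrative and the real implementations live at the server.py line ranges listed above:

# Illustrative sketch of stages 1-3 (extraction, cleaning, classification).
# Helper names are assumptions; best_model.pkl is assumed here to be a fitted
# pipeline (TF-IDF vectorizer + classifier), which may differ from the repo.
import pickle
import nltk
import spacy
from PyPDF2 import PdfReader
from nltk.corpus import stopwords

nlp = spacy.load("en_core_web_sm")
nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("english"))

def extract_text(pdf_path: str) -> str:
    # Stage 1: pull raw text out of the PDF (DOCX/TXT handled similarly).
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def clean_text(text: str) -> str:
    # Stage 2: lemmatize with spaCy and drop NLTK stopwords.
    doc = nlp(text.lower())
    return " ".join(t.lemma_ for t in doc if t.is_alpha and t.text not in STOPWORDS)

def predict_category(cleaned: str) -> str:
    # Stage 3: feed TF-IDF features into the pre-trained classifier.
    with open("backend/app/model/best_model.pkl", "rb") as f:
        model = pickle.load(f)
    return model.predict([cleaned])[0]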


AI/ML Technology Stack

TalentSync employs a hybrid approach combining traditional machine learning with large language models:

| Component | Technology | Model/Version | Use Case |
|---|---|---|---|
| Text Classification | scikit-learn | GradientBoostingClassifier | Predict job field from resume text |
| Feature Extraction | scikit-learn | TfidfVectorizer | Convert text to ML features |
| NLP Processing | spaCy | en_core_web_sm-3.8.0 | Lemmatization, tokenization |
| Text Cleaning | NLTK | stopwords corpus | Remove common words |
| LLM Inference | Google Generative AI | gemini-2.0-flash | Structured data extraction |
| Prompt Engineering | LangChain | v0.3.25+ | Template management, output parsing |
| Workflow Orchestration | LangGraph | v0.2.38+ | Multi-step agent workflows |
| Web Search | Tavily | v0.7.12+ | Company research, job insights |
| Content Extraction | Jina AI | r.jina.ai API | Markdown from URLs |
| Repository Analysis | gitingest | v0.3.1+ | GitHub code ingestion |
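
For the web-search piece used in company research, a minimal Tavily call might look like the following sketch (the query text and environment-variable name are assumptions; the production query construction lives in the cold-mail workflow):

# Minimal Tavily search sketch for company research (illustrative only).
import os
from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
results = client.search("Acme Corp engineering culture and recent news", max_results=5)
for item in results["results"]:
    print(item["title"], item["url"])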

LLM Configuration

The system uses Google's Gemini 2.0 Flash model with the following configuration:

# From backend/server.py:76-80
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=google_api_key,
    temperature=0.1,
)

Temperature: Set to 0.1 for near-deterministic, factual outputs suited to structured data extraction.
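
Given that configuration, the structured extraction step presumably binds a Pydantic schema to the model; the following is a hedged sketch using a stand-in schema (the actual prompt and the ComprehensiveAnalysisData wiring are in backend/server.py:373-462):

# Sketch of binding a Pydantic schema to the Gemini model for structured output.
# The schema and prompt here are illustrative, not the production versions.
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel

class ResumeSummary(BaseModel):  # stand-in for ComprehensiveAnalysisData
    skills: list[str]
    recommended_roles: list[str]

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.1)  # API key from env
structured_llm = llm.with_structured_output(ResumeSummary)
summary = structured_llm.invoke("Extract skills and suitable roles from this resume: ...")
print(summary.skills, summary.recommended_roles)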


Data Models

Pydantic Schemas

The backend uses Pydantic for request/response validation and data structuring:

| Schema | Purpose | Location | Key Fields |
|---|---|---|---|
| ComprehensiveAnalysisData | Structured resume analysis output | backend/server.py:198-208 | skills_analysis, recommended_roles, work_experience, projects, education |
| WorkExperienceEntry | Work history entry | backend/server.py:88-92 | role, company, duration, description |
| ProjectEntry | Project details | backend/server.py:95-98 | title, technologies_used, description |
| SkillProficiency | Skill with proficiency level | backend/server.py:173-175 | skill_name, percentage |
| LanguageEntry | Language proficiency | backend/server.py:190-191 | language |
| EducationEntry | Educational qualification | backend/server.py:194-195 | education_detail |
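
Based on the key fields above, the core schemas plausibly look something like the sketch below; field types are inferred from the table, not copied from backend/server.py:

# Inferred shapes of the analysis schemas; field types are assumptions.
from pydantic import BaseModel

class WorkExperienceEntry(BaseModel):
    role: str
    company: str
    duration: str
    description: str

class ProjectEntry(BaseModel):
    title: str
    technologies_used: list[str]
    description: str

class SkillProficiency(BaseModel):
    skill_name: str
    percentage: int

class ComprehensiveAnalysisData(BaseModel):
    skills_analysis: list[SkillProficiency]
    recommended_roles: list[str]
    work_experience: list[WorkExperienceEntry]
    projects: list[ProjectEntry]
    education: list[str]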

Database Schema

The PostgreSQL database uses Prisma ORM for type-safe access. Key tables include:

  • User: Authentication and profile data
  • Resume: Uploaded resume files and metadata
  • Analysis: Structured analysis results linked to resumes
  • Session: NextAuth session management
  • VerificationToken: Email verification tokens

For complete schema details, see Database & Data Models.


Deployment Architecture

TalentSync is deployed as a multi-container Docker Compose application:

Container Specifications

| Container | Base Image | Runtime | Build Command | Startup Command |
|---|---|---|---|---|
| Frontend | oven/bun:1-slim | Bun | bun install && bun run build | bun run start |
| Backend | python:3.13-slim | Python 3.13 | uv sync | uvicorn backend.server:app --host 0.0.0.0 --port 8000 |
| Database | postgres:16 | PostgreSQL 16 | N/A (official image) | N/A (managed by Docker) |

CI/CD Pipeline

GitHub Actions automatically deploys on every push to the main branch:

  1. SSH into VPS
  2. Pull latest code
  3. Rebuild Docker images
  4. Restart containers with zero downtime

For detailed deployment configuration, see Deployment & Infrastructure.


Key Differentiators

TalentSync distinguishes itself through several technical capabilities:

| Feature | Implementation | Benefit |
|---|---|---|
| Hybrid ML+LLM Pipeline | scikit-learn classifier → Gemini 2.0 | Fast categorization + deep semantic understanding |
| Context-Aware Analysis | LangChain prompt templates | Domain-specific extraction and recommendations |
| GitHub Integration | gitingest library + Jina AI | Analyze projects from repositories for LinkedIn content |
| Multi-Step Workflows | LangGraph state machines | Complex agent behaviors (ATS evaluation, research) |
| Dual-Sided Platform | Seeker + Recruiter features | Network effects from both supply and demand |
| PWA Capabilities | Workbox service worker | Offline access, app-like experience |
| Type Safety | TypeScript + Pydantic | Reduced runtime errors, better DX |
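
As an illustration of the LangGraph-style workflows, the sketch below wires a two-step research-then-draft graph; node names and state fields are invented for the example, and the real graphs live in the backend services:

# Minimal LangGraph sketch of a two-step workflow (research -> draft).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class MailState(TypedDict):
    company: str
    research_notes: str
    draft: str

def research(state: MailState) -> dict:
    # In TalentSync this step would call a web-search tool such as Tavily.
    return {"research_notes": f"Notes about {state['company']}"}

def draft(state: MailState) -> dict:
    # In TalentSync this step would call the Gemini model via LangChain.
    return {"draft": f"Hello {state['company']} team... ({state['research_notes']})"}

graph = StateGraph(MailState)
graph.add_node("research", research)
graph.add_node("draft", draft)
graph.set_entry_point("research")
graph.add_edge("research", "draft")
graph.add_edge("draft", END)

workflow = graph.compile()
result = workflow.invoke({"company": "Acme Corp", "research_notes": "", "draft": ""})
print(result["draft"])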

System Metrics

Based on platform usage as displayed on landing pages:

| Metric | Value | Context |
|---|---|---|
| Resumes Parsed | 12,000+ | Total documents processed |
| Average Time Saved | 6 hours/week | Per-user estimate |
| Generated Assets | 30,000+ | Cold emails, LinkedIn posts, tips |
| Job Categories | 25 | ML classifier output classes |
| Processing Time | 4-8 seconds | Complete resume analysis |

Market Context (from the about page):

  • AI in HR market: $6.05B (2024) → $14.08B (2029), 19.1% CAGR
  • Resume parsing market: $20.19B (2024) → $43.20B (2029), 114% growth

Next Steps

For detailed information on specific subsystems, see the dedicated pages, such as Database & Data Models and Deployment & Infrastructure.