Overview
Purpose and Scope
TalentSync is an AI-powered hiring intelligence platform that automates resume analysis, candidate evaluation, and job application workflows for both job seekers and recruiters. This document provides a high-level introduction to the system architecture, target users, core capabilities, and technology stack.
For detailed information on specific subsystems:
- Backend service implementation details, see Backend Services
- Frontend application structure, see Frontend Application
- Database schema and data models, see Database & Data Models
- Deployment infrastructure, see Deployment & Infrastructure
System Architecture Overview
TalentSync implements a three-tier architecture consisting of a Next.js frontend, FastAPI backend, and PostgreSQL database, with extensive integration of AI/ML services.
System Components
Frontend Application Stack
| Component | Technology | Location | Purpose |
|---|---|---|---|
| Framework | Next.js 14+ | frontend/app/ | Server-side rendering, routing |
| Authentication | NextAuth.js | frontend/app/api/auth/ | Multi-provider auth (email, OAuth) |
| Database ORM | Prisma Client | frontend/prisma/ | Type-safe database access |
| PWA Support | Workbox | Service worker | Offline capabilities, caching |
| Analytics | PostHog | Client-side | Product analytics |
| UI Framework | React + TypeScript | frontend/components/ | Component architecture |
Backend Service Stack
| Component | Technology | Location | Purpose |
|---|---|---|---|
| API Framework | FastAPI | backend/server.py | REST API endpoints |
| LLM Integration | LangChain + Gemini 2.0 | backend/server.py:68-86 | Prompt engineering, LLM calls |
| Workflow Engine | LangGraph | Backend services | Multi-step AI workflows |
| NLP Processing | spaCy + NLTK | backend/server.py:702-711 | Text cleaning, lemmatization |
| ML Classifier | scikit-learn | backend/app/model/best_model.pkl | Job category prediction |
| Document Parsing | PyPDF2, python-docx | backend/server.py:752-793 | Resume text extraction |
| GitHub Analysis | gitingest | backend/app/agents/github_agent.py | Repository ingestion |
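The Document Parsing row can be illustrated with a minimal extraction helper. This is a hedged sketch, not the actual code at backend/server.py:752-793: the dispatch-by-extension shape follows the description above, but the function and variable names are illustrative.

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Dispatch on file extension, mirroring the PDF/DOCX/TXT support
    described above. Imports are deferred so each branch only needs
    its own third-party dependency."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # third-party
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if suffix == ".docx":
        from docx import Document  # python-docx, third-party
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="ignore")
    raise ValueError(f"Unsupported resume format: {suffix}")
```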
Target Users and Use Cases
TalentSync serves two distinct user segments with different feature sets:
Job Seeker Features
| Feature | Description | Primary Backend Endpoint |
|---|---|---|
| Resume Analysis | ML-based job field prediction, skills extraction, work experience parsing | POST /analyze_resume |
| ATS Evaluation | Score resume against job descriptions, provide improvement suggestions | POST /evaluate_resume_ats |
| Cold Mail Generation | AI-generated personalized cold emails with company research | POST /v2/generate_cold_mail |
| Hiring Assistant | Generate interview answers based on resume and job context | POST /generate_answer |
| LinkedIn Posts | Create LinkedIn content from projects/achievements with GitHub integration | POST /generate_linkedin_posts |
| Career Tips | Personalized resume and interview tips based on job category | GET /tips |
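As an illustration of how a client might call one of these endpoints, the following sketch builds a request to POST /generate_answer using only the standard library. The base URL and the payload field names (`question`, `resume_text`, `job_description`) are assumptions; the actual FastAPI request schema lives in backend/server.py.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed local deployment

def build_generate_answer_request(question: str, resume_text: str, job_desc: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /generate_answer.
    Field names are illustrative; check the Pydantic request model
    in backend/server.py for the real payload shape."""
    payload = json.dumps({
        "question": question,
        "resume_text": resume_text,
        "job_description": job_desc,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/generate_answer",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is a single `urllib.request.urlopen(req)` call once the backend is running.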
Recruiter Features
| Feature | Description | Primary Backend Endpoint |
|---|---|---|
| Bulk Resume Processing | Upload ZIP files with multiple resumes for parallel processing | POST /bulk_upload |
| Candidate Dashboard | Structured view of parsed candidates with filtering and ranking | Database query via Prisma |
| ATS Evaluation | Evaluate candidate resumes against job requirements | POST /evaluate_resume_ats |
Core Processing Pipeline
The resume analysis pipeline demonstrates the integration of traditional ML and modern LLM techniques:
Processing Stages
- Text Extraction (backend/server.py:752-793): Handles PDF, DOCX, and TXT formats using the PyPDF2 and python-docx libraries
- Text Cleaning (backend/server.py:738-749): Applies spaCy lemmatization and NLTK stopword removal
- ML Classification (backend/server.py:714-735): Uses the pre-trained best_model.pkl (25 job categories) with TF-IDF features
- Field Extraction (backend/server.py:923-1029): Regex-based extraction of name, email, skills, work experience, and projects
- LLM Enhancement (backend/server.py:373-462): Google Gemini 2.0 Flash generates structured analysis via the ComprehensiveAnalysisData schema
- Storage (Prisma ORM): Persists both raw text and structured analysis in PostgreSQL
Processing Time: Typically 4-8 seconds for complete analysis including LLM calls.
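The text-cleaning stage can be sketched as follows. This is a simplified stand-in: a small hand-rolled stopword list and a regex tokenizer replace NLTK's corpus and spaCy's lemmatizer, which the real implementation at backend/server.py:738-749 uses.

```python
import re

# Tiny illustrative stopword list; the real pipeline uses NLTK's English corpus.
STOPWORDS = {"a", "an", "the", "and", "or", "in", "of", "to", "with", "for"}

def clean_text(raw: str) -> str:
    """Lowercase, keep alphabetic tokens, and drop stopwords.
    spaCy lemmatization is omitted to keep the sketch dependency-free."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)
```

The cleaned string would then be vectorized with the fitted TfidfVectorizer and passed to best_model.pkl for the 25-way category prediction.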
AI/ML Technology Stack
TalentSync employs a hybrid approach combining traditional machine learning with large language models:
| Component | Technology | Model/Version | Use Case |
|---|---|---|---|
| Text Classification | scikit-learn | GradientBoostingClassifier | Predict job field from resume text |
| Feature Extraction | scikit-learn | TfidfVectorizer | Convert text to ML features |
| NLP Processing | spaCy | en_core_web_sm-3.8.0 | Lemmatization, tokenization |
| Text Cleaning | NLTK | stopwords corpus | Remove common words |
| LLM Inference | Google Generative AI | gemini-2.0-flash | Structured data extraction |
| Prompt Engineering | LangChain | v0.3.25+ | Template management, output parsing |
| Workflow Orchestration | LangGraph | v0.2.38+ | Multi-step agent workflows |
| Web Search | Tavily | v0.7.12+ | Company research, job insights |
| Content Extraction | Jina AI | r.jina.ai API | Markdown from URLs |
| Repository Analysis | gitingest | v0.3.1+ | GitHub code ingestion |
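The Jina AI row refers to the r.jina.ai reader endpoint, which returns a markdown rendering of a page when the target URL is appended to it. A minimal sketch (the actual fetch is left commented out to avoid a live network call):

```python
def jina_reader_url(target_url: str) -> str:
    """Compose an r.jina.ai reader URL that returns the target page as markdown."""
    return "https://r.jina.ai/" + target_url

# import urllib.request
# markdown = urllib.request.urlopen(jina_reader_url("https://example.com")).read().decode()
```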
LLM Configuration
The system uses Google's Gemini 2.0 Flash model with the following configuration:
```python
# From backend/server.py:76-80
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key=google_api_key,
    temperature=0.1,
)
```
Temperature: Set to 0.1 for near-deterministic, factual outputs suited to structured data extraction.
Data Models
Pydantic Schemas
The backend uses Pydantic for request/response validation and data structuring:
| Schema | Purpose | Location | Key Fields |
|---|---|---|---|
| ComprehensiveAnalysisData | Structured resume analysis output | backend/server.py:198-208 | skills_analysis, recommended_roles, work_experience, projects, education |
| WorkExperienceEntry | Work history entry | backend/server.py:88-92 | role, company, duration, description |
| ProjectEntry | Project details | backend/server.py:95-98 | title, technologies_used, description |
| SkillProficiency | Skill with proficiency level | backend/server.py:173-175 | skill_name, percentage |
| LanguageEntry | Language proficiency | backend/server.py:190-191 | language |
| EducationEntry | Educational qualification | backend/server.py:194-195 | education_detail |
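To make the table concrete, here is a sketch of two of these schemas. The real code defines Pydantic BaseModel classes; stdlib dataclasses are substituted here so the sketch runs without dependencies, and the field types are inferred from the Key Fields column rather than taken from the source.

```python
from dataclasses import dataclass

@dataclass
class WorkExperienceEntry:
    role: str
    company: str
    duration: str
    description: str

@dataclass
class SkillProficiency:
    skill_name: str
    percentage: int  # proficiency as 0-100; the exact type is an assumption
```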
Database Schema
The PostgreSQL database uses Prisma ORM for type-safe access. Key tables include:
- User: Authentication and profile data
- Resume: Uploaded resume files and metadata
- Analysis: Structured analysis results linked to resumes
- Session: NextAuth session management
- VerificationToken: Email verification tokens
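A hedged sketch of how two of these tables might be declared in frontend/prisma/schema.prisma; the field names beyond the table list above are illustrative, not the project's actual schema:

```prisma
model Resume {
  id        String     @id @default(cuid())
  userId    String
  user      User       @relation(fields: [userId], references: [id])
  fileName  String
  rawText   String?
  createdAt DateTime   @default(now())
  analyses  Analysis[]
}

model Analysis {
  id       String @id @default(cuid())
  resumeId String
  resume   Resume @relation(fields: [resumeId], references: [id])
  result   Json   // structured output from ComprehensiveAnalysisData
}
```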
For complete schema details, see Database & Data Models.
Deployment Architecture
TalentSync is deployed as a multi-container Docker Compose application:
Container Specifications
| Container | Base Image | Runtime | Build Command | Startup Command |
|---|---|---|---|---|
| Frontend | oven/bun:1-slim | Bun | bun install && bun run build | bun run start |
| Backend | python:3.13-slim | Python 3.13 | uv sync | uvicorn backend.server:app --host 0.0.0.0 --port 8000 |
| Database | postgres:16 | PostgreSQL 16 | N/A (official image) | N/A (managed by Docker) |
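The three containers above would be wired together in a compose file along these lines. This is a hedged sketch, not the project's actual docker-compose.yml; service names, ports, and environment variables are assumptions:

```yaml
services:
  frontend:
    build: ./frontend          # oven/bun:1-slim base
    ports: ["3000:3000"]
    depends_on: [backend]
  backend:
    build: ./backend           # python:3.13-slim base
    command: uvicorn backend.server:app --host 0.0.0.0 --port 8000
    ports: ["8000:8000"]
    depends_on: [db]
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```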
CI/CD Pipeline
GitHub Actions automatically deploys on every push to the main branch:
- SSH into VPS
- Pull latest code
- Rebuild Docker images
- Restart containers with zero downtime
For detailed deployment configuration, see Deployment & Infrastructure.
Key Differentiators
TalentSync distinguishes itself through several technical capabilities:
| Feature | Implementation | Benefit |
|---|---|---|
| Hybrid ML+LLM Pipeline | scikit-learn classifier → Gemini 2.0 | Fast categorization + deep semantic understanding |
| Context-Aware Analysis | LangChain prompt templates | Domain-specific extraction and recommendations |
| GitHub Integration | gitingest library + Jina AI | Analyze projects from repositories for LinkedIn content |
| Multi-Step Workflows | LangGraph state machines | Complex agent behaviors (ATS evaluation, research) |
| Dual-Sided Platform | Seeker + Recruiter features | Network effects from both supply and demand |
| PWA Capabilities | Workbox service worker | Offline access, app-like experience |
| Type Safety | TypeScript + Pydantic | Reduced runtime errors, better DX |
System Metrics
Based on platform usage as displayed on landing pages:
| Metric | Value | Context |
|---|---|---|
| Resumes Parsed | 12,000+ | Total documents processed |
| Average Time Saved | 6 hours/week | Per user estimate |
| Generated Assets | 30,000+ | Cold emails, LinkedIn posts, tips |
| Job Categories | 25 | ML classifier output classes |
| Processing Time | 4-8 seconds | Complete resume analysis |
Market Context (from about page):
- AI in HR market: $6.05B (2024) → $14.08B (2029), 19.1% CAGR
- Resume parsing market: $20.19B (2024) → $43.20B (2029), ~114% total growth over the period
Next Steps
For detailed information on specific subsystems:
- Backend Implementation: See Backend Services for API endpoints, ML services, and agent architecture
- Frontend Features: See Frontend Application for page structure, authentication, and UI components
- Data Layer: See Database & Data Models for schema design and relationships
- Operations: See Deployment & Infrastructure for Docker setup, CI/CD, and monitoring