Manager Rolling Operations Architecture

Overview

The Manager Rolling Operations system transfers sites between manager departments. Before changing anything, it snapshots the affected data so that a failed operation can be rolled back. This document describes the high-level architecture, data flows, and system components involved in rolling operations.

System Architecture

graph TB
    subgraph "Client Layer"
        UI[Web UI/Admin Portal]
    end

    subgraph "API Gateway"
        Route["/departments/managers/departments/rolling/*"]
    end

    subgraph "Controller Layer"
        MC[ManagerController]
    end

    subgraph "Business Logic Layer"
        MRU[ManagerRollingUsecase]
        MU[ManagerUsecase]
        RSM[RollbackStateManager]
    end

    subgraph "Repository Layer"
        ROR[RollingOperationRepo]
        DR[DepartmentRepo]
        SR[SiteRepo]
        DIR[DepartmentIndexRepo]
        SSR[SiteScheduleIndexRepo]
        QR[QuestionnaireRepo]
        QIR[QuestionnaireIndexRepo]
        UR[UserRepo]
        CTR[CloudTaskRepo]
    end

    subgraph "Data Layer"
        MongoDB[(MongoDB Collections)]
        GCT[Google Cloud Tasks]
    end

    UI --> Route
    Route --> MC
    MC --> MRU
    MC --> MU
    MRU --> RSM
    MRU --> ROR
    MRU --> DR
    MRU --> SR
    MRU --> DIR
    MRU --> SSR
    MRU --> QR
    MRU --> QIR
    MRU --> UR
    MRU --> CTR

    ROR --> MongoDB
    DR --> MongoDB
    SR --> MongoDB
    DIR --> MongoDB
    SSR --> MongoDB
    QR --> MongoDB
    QIR --> MongoDB
    UR --> MongoDB
    CTR --> GCT

API Endpoints

Manager Department APIs

| Endpoint | Method | Purpose | Collections Touched |
| --- | --- | --- | --- |
| /departments/managers | GET | List manager departments | departments, sites, users |
| /departments/managers | POST | Create manager department | departments, departmentIndex, sites, users |
| /departments/managers/:id | GET | Get department details | departments, sites, users |
| /departments/managers/:id | PUT | Update department | departments, departmentIndex, questionnaires |
| /departments/managers/:id/toggle-status | PUT | Toggle status | departments, departmentIndex |
| /departments/managers/:id/sites | GET | Get department sites | sites, departments |
| /departments/managers/department/makemanager/:id | PUT | Make department manager | departments |
| /departments/managers/departments/makemanager | POST | Bulk convert departments | departments |
| /departments/managers/users/makemanager/:id | PUT | Make user manager | users |
| /departments/managers/users/makemanager | POST | Bulk convert users | users |

Rolling Operation APIs

| Endpoint | Method | Purpose | Collections Touched |
| --- | --- | --- | --- |
| /departments/managers/departments/rolling/execute | POST | Execute rolling | All (see phases below) |
| /departments/managers/departments/rolling/status | GET | Get current status | rollingOperations |
| /departments/managers/departments/rolling/eligible | POST | Get eligible departments | sites, departments |
| /departments/managers/departments/rolling/status/:id | GET | Get operation status | rollingOperations |
| /departments/managers/departments/rolling/:id/cancel | DELETE | Cancel operation | rollingOperations, cloudTasks |

Rolling Operation Flow

sequenceDiagram
    participant Client
    participant API
    participant Controller
    participant Usecase
    participant Repos
    participant DB
    participant CloudTask

    Client->>API: POST /rolling/execute
    API->>Controller: executeManagerRolling()
    Controller->>Usecase: executeRolling()

    alt Immediate Execution
        Usecase->>DB: Create operation record
        Usecase->>Usecase: processRollingAsync()
        Usecase-->>Controller: Return operationID
        Controller-->>Client: { status: "processing" }

        Note over Usecase: Background Processing
        Usecase->>Usecase: processRolling()
        loop 6 Phases
            Usecase->>Repos: Update collections
            Repos->>DB: Persist changes
        end
    else Scheduled Execution
        Usecase->>CloudTask: Schedule task
        CloudTask-->>Usecase: Task name
        Usecase-->>Controller: Return operationID
        Controller-->>Client: { status: "queued" }
    end
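
The branch in the sequence diagram can be sketched as a small dispatcher. This is an illustration only, not the actual `ManagerRollingUsecase` code: the `RollingDeps` interface, the `scheduledAt` field, and the synchronous signatures are assumptions made for brevity (real repository and Cloud Tasks calls would return promises). The sketch also assumes an operation record is created in both branches so the status endpoints can find queued operations; the diagram shows that step only for the immediate branch.

```typescript
type RollingStatus = "processing" | "queued";

interface RollingRequest {
  organizationID: string;
  scheduledAt?: Date; // hypothetical field: absent means "run immediately"
}

// Hypothetical collaborators standing in for the repositories and Cloud Tasks.
interface RollingDeps {
  createOperation(req: RollingRequest): string; // returns the operationID
  processRollingAsync(operationID: string): void; // starts background processing
  scheduleTask(operationID: string, runAt: Date): string; // returns the task name
}

function executeRolling(
  req: RollingRequest,
  deps: RollingDeps
): { operationID: string; status: RollingStatus } {
  const operationID = deps.createOperation(req);
  if (req.scheduledAt === undefined) {
    // Immediate execution: kick off the background run and answer right away.
    deps.processRollingAsync(operationID);
    return { operationID, status: "processing" };
  }
  // Scheduled execution: hand the work to the task queue.
  deps.scheduleTask(operationID, req.scheduledAt);
  return { operationID, status: "queued" };
}
```

Either way the caller gets an `operationID` back without waiting for the phases to run, which is what lets the client poll the status endpoint afterwards.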

Processing Phases

Phase Architecture

graph LR
    subgraph "Phase 1: State Collection"
        CS[Collect Current State]
        CS --> SS[Sites State]
        CS --> DS[DepartmentIndex State]
        CS --> SCS[SiteScheduleIndex State]
        CS --> QS[Questionnaire State]
    end

    subgraph "Phase 2-5: Updates"
        US[Update Sites]
        UD[Update DepartmentIndex]
        USS[Update SiteScheduleIndex]
        UQ[Update Questionnaires]
    end

    subgraph "Phase 6: Completion"
        MC[Mark Complete]
    end

    CS --> US
    US --> UD
    UD --> USS
    USS --> UQ
    UQ --> MC
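
The strictly sequential ordering in the diagram can be sketched as a small phase runner. The `Phase` interface and `runPhases` function are hypothetical names, and the phases are synchronous here only to keep the sketch short; real phases would be asynchronous.

```typescript
interface Phase {
  name: string;
  run(): void; // assumption: synchronous for illustration
}

// Runs phases strictly in order and reports progress after each one.
// A phase that throws stops the run, so later phases never execute.
function runPhases(
  phases: Phase[],
  onProgress: (completed: number, total: number) => void
): void {
  phases.forEach((phase, index) => {
    phase.run();
    onProgress(index + 1, phases.length);
  });
}
```

With the six phases supplied in diagram order (state collection, the four collection updates, completion), an exception from any update phase leaves the caller free to trigger the rollback before anything is marked complete.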

Phase Details

Phase 1: Current State Collection

Purpose: Back up all data that will be modified, so it can be restored on rollback

graph TB
    subgraph "State Collection"
        A[collectCurrentState] --> B[Sites Collection]
        A --> C[DepartmentIndex Collection]
        A --> D[SiteScheduleIndex Collection]
        A --> E[Questionnaire Collection]

        B --> B1[departments mapping]
        B --> B2[departmentList array]
        B --> B3[owner field]
        B --> B4[supervisors array]

        C --> C1[Roll-out dept sites]
        C --> C2[Roll-in dept sites]
        C --> C3[User assignments]

        D --> D1[Schedule assignments]
        D --> D2[Email targets]
        D --> D3[Default issue owner]

        E --> E1[Auto-assignments]
        E --> E2[Questionnaire versions]
        E --> E3[Latest pointers]
    end
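
As a sketch of what collecting a site's state might involve, the function below copies the four backed-up fields so that later updates cannot alter the snapshot. The `Site` shape follows the data model in this document, but the value type of `departments` and the function name `snapshotSite` are assumptions.

```typescript
interface Site {
  siteID: string;
  departments: Record<string, unknown>; // assumed shape: keyed by departmentID
  departmentList: string[];
  owner: string;
  supervisors: string[];
}

type SiteState = Omit<Site, "siteID">;

// Copies the backed-up fields one level deep; nested objects inside
// `departments` would still be shared with the original.
function snapshotSite(site: Site): SiteState {
  return {
    departments: { ...site.departments },
    departmentList: [...site.departmentList],
    owner: site.owner,
    supervisors: [...site.supervisors],
  };
}
```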

Phase 2: Sites Collection Update

Collections: sites

Operations:

  • Transfer department ownership from roll-out to roll-in
  • Update departmentList arrays
  • Maintain site integrity
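
The ownership transfer can be illustrated with a pure function over a single site. This is a sketch under assumptions: the document does not specify the shape of the `departments` mapping, so it is treated here as an object keyed by department ID, and `transferSite` is a hypothetical name.

```typescript
interface SiteDepartments {
  departments: Record<string, unknown>; // assumed shape: keyed by departmentID
  departmentList: string[];
}

// Returns a new object with the roll-out department replaced by the roll-in
// department; sites not assigned to the roll-out department are returned as-is.
function transferSite(
  site: SiteDepartments,
  rollOutID: string,
  rollInID: string
): SiteDepartments {
  if (!site.departmentList.includes(rollOutID)) return site;
  const { [rollOutID]: moved, ...rest } = site.departments;
  const departmentList = site.departmentList
    .map((id) => (id === rollOutID ? rollInID : id))
    // Drop the duplicate that appears if roll-in was already on the list.
    .filter((id, i, all) => all.indexOf(id) === i);
  return {
    // Keep an existing roll-in entry if there is one, else move the old entry.
    departments: { ...rest, [rollInID]: rest[rollInID] ?? moved },
    departmentList,
  };
}
```

Returning a new object rather than mutating the input keeps the Phase 1 snapshot and the updated document clearly separate.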

Phase 3: DepartmentIndex Collection Update

Collections: departmentIndex

Operations:

  • Remove sites from roll-out department
  • Add sites to roll-in department
  • Create the roll-in department's index if it doesn't exist
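
The three operations above can be sketched against an in-memory map that stands in for the `departmentIndex` collection. The `moveSites` function and the use of a `Map` are illustrative assumptions; the real repository would issue database updates instead.

```typescript
interface DepartmentIndex {
  departmentID: string;
  sites: string[];
  users: string[];
}

function moveSites(
  indexes: Map<string, DepartmentIndex>,
  rollOutID: string,
  rollInID: string,
  siteIDs: string[]
): void {
  const moving = new Set(siteIDs);

  // Remove the sites from the roll-out department.
  const rollOut = indexes.get(rollOutID);
  if (rollOut) rollOut.sites = rollOut.sites.filter((id) => !moving.has(id));

  // Create the roll-in index if it does not exist yet.
  let rollIn = indexes.get(rollInID);
  if (!rollIn) {
    rollIn = { departmentID: rollInID, sites: [], users: [] };
    indexes.set(rollInID, rollIn);
  }

  // Add the sites to the roll-in department without duplicating entries.
  for (const id of siteIDs) {
    if (!rollIn.sites.includes(id)) rollIn.sites.push(id);
  }
}
```

Skipping sites already present makes the step safe to repeat, which matters if a phase is retried.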

Phase 4: SiteScheduleIndex Collection Update

Collections: siteScheduleIndex

Operations:

  • Update departmentID for schedules
  • Update email targets
  • Update default issue owners
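
A rough sketch of the three schedule updates follows. The document does not say how email targets and default issue owners are rewritten, so this assumes the simplest rule: the roll-out department's email and owner are swapped for the roll-in department's. `DepartmentContact`, `ownerUserID`, and `reassignSchedule` are hypothetical names.

```typescript
interface SiteSchedule {
  scheduleID: string;
  siteID: string;
  departmentID: string;
  emailTargets: string[];
  defaultIssueOwner: string;
}

// Hypothetical view of a department for this phase.
interface DepartmentContact {
  departmentID: string;
  email: string;
  ownerUserID: string; // assumption: the user who owns issues by default
}

function reassignSchedule(
  schedule: SiteSchedule,
  rollOut: DepartmentContact,
  rollIn: DepartmentContact
): SiteSchedule {
  // Schedules belonging to other departments are left untouched.
  if (schedule.departmentID !== rollOut.departmentID) return schedule;
  return {
    ...schedule,
    departmentID: rollIn.departmentID,
    emailTargets: schedule.emailTargets.map((email) =>
      email === rollOut.email ? rollIn.email : email
    ),
    defaultIssueOwner:
      schedule.defaultIssueOwner === rollOut.ownerUserID
        ? rollIn.ownerUserID
        : schedule.defaultIssueOwner,
  };
}
```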

Phase 5: Questionnaire AutoAssignments Update

Collections: questionnaires, questionnaireIndex

Operations:

  • Update questionnaire-level assignments
  • Update category-level assignments
  • Update question-level assignments
  • Create new questionnaire versions
  • Update questionnaire indices

Rollback Implementation:

  • Questionnaire states are backed up in Phase 1 including versions array and latest pointer
  • On rollback, questionnaire indices are reverted to original versions
  • New questionnaire versions created during update are orphaned but not deleted (repository limitation)
  • Original questionnaire assignments are effectively restored through index reversion
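
The index reversion described above can be sketched as follows. It restores the backed-up `versions` array and `latest` pointer and reports which version IDs are left orphaned. The function name `revertIndex` and the assumption that versions are identified by strings are illustrative.

```typescript
interface QuestionnaireIndex {
  questionnaireIndexID: string;
  versions: string[];
  latest: string;
}

interface QuestionnaireBackup {
  versions: string[];
  latest: string;
}

function revertIndex(
  index: QuestionnaireIndex,
  backup: QuestionnaireBackup
): { reverted: QuestionnaireIndex; orphaned: string[] } {
  const original = new Set(backup.versions);
  // Versions created during the update stay in storage but are no longer referenced.
  const orphaned = index.versions.filter((v) => !original.has(v));
  return {
    reverted: { ...index, versions: [...backup.versions], latest: backup.latest },
    orphaned,
  };
}
```

Returning the orphaned IDs would give a later cleanup job something to work from once the questionnaire repository gains a delete method.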

Phase 6: Operation Completion

Collections: rollingOperations

Operations:

  • Mark operation as completed
  • Record completion timestamp
  • Update progress metrics

Data Models

Collections Structure

erDiagram
    DEPARTMENTS ||--o{ SITES : "manages"
    DEPARTMENTS ||--|| USERS : "has manager"
    DEPARTMENTS ||--|| DEPARTMENT_INDEX : "indexed by"
    SITES ||--o{ SITE_SCHEDULE_INDEX : "has schedules"
    QUESTIONNAIRES ||--o{ QUESTIONNAIRE_INDEX : "versioned in"
    ROLLING_OPERATIONS ||--|| ORGANIZATIONS : "belongs to"

    DEPARTMENTS {
        string departmentID PK
        string organizationID
        string name
        string email
        boolean isManager
        string status
    }

    SITES {
        string siteID PK
        string organizationID
        object departments
        array departmentList
        string owner
        array supervisors
    }

    DEPARTMENT_INDEX {
        string departmentIndexID PK
        string departmentID FK
        array sites
        array users
    }

    SITE_SCHEDULE_INDEX {
        string scheduleID PK
        string siteID FK
        string departmentID FK
        array emailTargets
        string defaultIssueOwner
    }

    QUESTIONNAIRES {
        string questionnaireID PK
        object autoAssignmentV2
        string dateUpdated
    }

    QUESTIONNAIRE_INDEX {
        string questionnaireIndexID PK
        array versions
        string latest
    }

    ROLLING_OPERATIONS {
        string rollingOperationID PK
        string operationId UK
        string organizationID
        string status
        object siteOperation
        object progress
    }

Rollback System

RollbackStateManager Architecture

classDiagram
    class RollbackStateManager {
        -siteStates: Record~string, SiteState~
        -departmentIndexStates: Record~string, DepartmentIndexState~
        -siteScheduleIndexStates: Record~string, SiteScheduleIndexState~
        -questionnaireStates: Record~string, QuestionnaireState~
        +addSiteState(siteID, state)
        +addDepartmentIndexState(deptID, state)
        +addSiteScheduleIndexState(siteID, state)
        +addQuestionnaireState(qID, state)
        +getAllStates()
        +executeRollback()
    }

    class SiteState {
        departments: object
        departmentList: array
        owner: string
        supervisors: array
    }

    class DepartmentIndexState {
        sites: array
        users: array
    }

    class SiteScheduleIndexState {
        schedules: array
    }

    class QuestionnaireState {
        questionnaireID: string
        versions: array
        latest: string
        autoAssignmentV2: object
    }

    RollbackStateManager --> SiteState
    RollbackStateManager --> DepartmentIndexState
    RollbackStateManager --> SiteScheduleIndexState
    RollbackStateManager --> QuestionnaireState
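
The class diagram translates fairly directly into TypeScript. The sketch below is one possible reading, not the actual implementation: the `Restorers` callbacks are a hypothetical way to inject the repository writes, `executeRollback` is synchronous for brevity, and the restore order is an assumption.

```typescript
interface SiteState {
  departments: Record<string, unknown>;
  departmentList: string[];
  owner: string;
  supervisors: string[];
}
interface DepartmentIndexState { sites: string[]; users: string[] }
interface SiteScheduleIndexState { schedules: unknown[] }
interface QuestionnaireState {
  questionnaireID: string;
  versions: string[];
  latest: string;
  autoAssignmentV2: unknown;
}

// Hypothetical write-back callbacks, one per collection.
interface Restorers {
  site(siteID: string, state: SiteState): void;
  departmentIndex(deptID: string, state: DepartmentIndexState): void;
  siteScheduleIndex(siteID: string, state: SiteScheduleIndexState): void;
  questionnaire(qID: string, state: QuestionnaireState): void;
}

class RollbackStateManager {
  private siteStates: Record<string, SiteState> = {};
  private departmentIndexStates: Record<string, DepartmentIndexState> = {};
  private siteScheduleIndexStates: Record<string, SiteScheduleIndexState> = {};
  private questionnaireStates: Record<string, QuestionnaireState> = {};

  addSiteState(siteID: string, state: SiteState): void { this.siteStates[siteID] = state; }
  addDepartmentIndexState(deptID: string, state: DepartmentIndexState): void { this.departmentIndexStates[deptID] = state; }
  addSiteScheduleIndexState(siteID: string, state: SiteScheduleIndexState): void { this.siteScheduleIndexStates[siteID] = state; }
  addQuestionnaireState(qID: string, state: QuestionnaireState): void { this.questionnaireStates[qID] = state; }

  getAllStates() {
    return {
      siteStates: this.siteStates,
      departmentIndexStates: this.departmentIndexStates,
      siteScheduleIndexStates: this.siteScheduleIndexStates,
      questionnaireStates: this.questionnaireStates,
    };
  }

  // Writes every saved state back and returns how many entries were restored.
  executeRollback(restore: Restorers): number {
    let restored = 0;
    for (const [id, s] of Object.entries(this.siteStates)) { restore.site(id, s); restored += 1; }
    for (const [id, s] of Object.entries(this.departmentIndexStates)) { restore.departmentIndex(id, s); restored += 1; }
    for (const [id, s] of Object.entries(this.siteScheduleIndexStates)) { restore.siteScheduleIndex(id, s); restored += 1; }
    for (const [id, s] of Object.entries(this.questionnaireStates)) { restore.questionnaire(id, s); restored += 1; }
    return restored;
  }
}
```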

Rollback Execution Flow

graph TD
    A[Error Detected] --> B[executeRollback]
    B --> C[Rollback Sites]
    B --> D[Rollback DepartmentIndex]
    B --> E[Rollback SiteScheduleIndex]
    B --> F[Rollback Questionnaires]

    C --> G[Restore department mappings]
    D --> H[Restore site assignments]
    E --> I[Restore schedule configs]
    F --> J[Revert questionnaire indices]

    G --> K[Mark Operation Failed]
    H --> K
    I --> K
    J --> K

    K --> L[Save Rollback Data]

Asynchronous Processing

Background Processing Flow

graph TB
    A[executeRolling] --> B{Immediate?}
    B -->|Yes| C[Create DB Record]
    B -->|No| D[Schedule Cloud Task]

    C --> E[processRollingAsync]
    E --> F[setImmediate]
    F --> G[Start Heartbeat]
    G --> H[processRolling]

    H --> I{Success?}
    I -->|Yes| J[Mark Completed]
    I -->|No| K[Execute Rollback]
    K --> L[Mark Failed]

    D --> M[Cloud Task Queue]
    M --> N[Task Execution]
    N --> H

    subgraph "Heartbeat System"
        G --> O[Every 30s]
        O --> P[Update Heartbeat]
        P --> O
    end
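
The heartbeat loop can be sketched as a start/stop pair. The `Timers` interface is an assumption introduced so the example does not depend on a particular runtime's timer types; in practice it would be a thin wrapper around the runtime's `setInterval` and `clearInterval`. `startHeartbeat` is a hypothetical name.

```typescript
// Minimal timer abstraction (hypothetical), so a fake can be used in tests.
interface Timers<H> {
  setInterval(fn: () => void, ms: number): H;
  clearInterval(handle: H): void;
}

const HEARTBEAT_INTERVAL_MS = 30_000; // 30s, as in the diagram

// Starts calling `updateHeartbeat` on a fixed interval and returns a
// function that stops it again.
function startHeartbeat<H>(
  updateHeartbeat: () => void,
  timers: Timers<H>,
  intervalMs: number = HEARTBEAT_INTERVAL_MS
): () => void {
  const handle = timers.setInterval(updateHeartbeat, intervalMs);
  return () => timers.clearInterval(handle);
}
```

The returned stop function is best called from a `finally` block around the awaited processing call, so the interval is cleared whether the operation completes or rolls back.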

Security & Validation

Authorization Flow

graph TD
    A[API Request] --> B{JWT Valid?}
    B -->|No| C[401 Unauthorized]
    B -->|Yes| D{Role Check}
    D -->|Not Admin/Holder| E[403 Forbidden]
    D -->|Admin/Holder| F[Organization Check]
    F -->|Different Org| G[404 Not Found]
    F -->|Same Org| H[Process Request]
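
The decision tree above maps onto a short function that returns the HTTP status to use. In this sketch a `null` caller stands in for a missing or invalid JWT, and the `"member"` role is a hypothetical example of a role that is neither admin nor holder.

```typescript
type Role = "admin" | "holder" | "member"; // "member" is a hypothetical non-privileged role

interface Caller {
  role: Role;
  organizationID: string;
}

function authorize(
  caller: Caller | null,
  resourceOrganizationID: string
): 200 | 401 | 403 | 404 {
  if (caller === null) return 401; // JWT missing or invalid
  if (caller.role !== "admin" && caller.role !== "holder") return 403;
  // Resources in another organization are reported as not found.
  if (caller.organizationID !== resourceOrganizationID) return 404;
  return 200;
}
```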

Validation Layers

  1. Route Level: JWT authentication middleware
  2. Controller Level: Request structure validation (TODO: add validator)
  3. Usecase Level:
    • Role authorization
    • Business logic validation
    • Entity existence checks
    • Relationship validation
  4. Repository Level: Data integrity constraints
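
Since controller-level request validation is still marked as a TODO, the following is a sketch of what such a validator could look like. The request shape is not specified in this document, so the field names `rollOutDepartmentID`, `rollInDepartmentID`, and `siteIDs` are hypothetical.

```typescript
// Hypothetical request body for the execute endpoint.
interface RollingRequestBody {
  rollOutDepartmentID: string;
  rollInDepartmentID: string;
  siteIDs: string[];
}

type ValidationResult =
  | { ok: true; value: RollingRequestBody }
  | { ok: false; errors: string[] };

const isId = (v: unknown): v is string => typeof v === "string" && v.trim() !== "";
const isIdList = (v: unknown): v is string[] =>
  Array.isArray(v) && v.length > 0 && v.every(isId);

function validateRollingRequest(body: unknown): ValidationResult {
  const fields = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  const { rollOutDepartmentID, rollInDepartmentID, siteIDs } = fields;
  const errors: string[] = [];

  if (!isId(rollOutDepartmentID)) errors.push("rollOutDepartmentID must be a non-empty string");
  if (!isId(rollInDepartmentID)) errors.push("rollInDepartmentID must be a non-empty string");
  if (isId(rollOutDepartmentID) && rollOutDepartmentID === rollInDepartmentID) {
    errors.push("roll-out and roll-in departments must differ");
  }
  if (!isIdList(siteIDs)) errors.push("siteIDs must be a non-empty array of non-empty strings");

  // The guards are repeated here so the compiler can narrow the field types.
  if (errors.length > 0 || !isId(rollOutDepartmentID) || !isId(rollInDepartmentID) || !isIdList(siteIDs)) {
    return { ok: false, errors };
  }
  return { ok: true, value: { rollOutDepartmentID, rollInDepartmentID, siteIDs } };
}
```

This covers structure only; role checks, entity existence, and relationship validation stay in the usecase layer as listed above.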

Performance Characteristics

Metrics

| Metric | Value | Notes |
| --- | --- | --- |
| Sites/minute | ~20 | Estimated throughput |
| Heartbeat interval | 30s | Keep-alive for long operations |
| Max operation time | 10 min | Cloud Task timeout |
| Concurrent operations | 1 | Per organization |
| Rollback time | <30s | All phases reversible |
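
Taken together, the throughput and timeout figures imply a rough upper bound on the number of sites one operation can handle. The helper below works through that arithmetic; it treats the ~20 sites/minute estimate as exact, which is an optimistic simplification, and the function names are hypothetical.

```typescript
const SITES_PER_MINUTE = 20; // estimated throughput from the metrics table
const MAX_OPERATION_MINUTES = 10; // Cloud Task timeout from the metrics table

function estimatedMinutes(siteCount: number): number {
  return siteCount / SITES_PER_MINUTE;
}

function fitsWithinTimeout(siteCount: number): boolean {
  return estimatedMinutes(siteCount) <= MAX_OPERATION_MINUTES;
}
```

At these figures the estimated ceiling is 20 × 10 = 200 sites per operation; a 150-site transfer would take roughly 7.5 minutes.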

Optimization Strategies

  1. Batch Processing: Process multiple sites in parallel where possible
  2. Async Execution: Non-blocking API responses
  3. State Caching: Minimize database reads during processing
  4. Selective Updates: Only update changed fields
  5. Index Usage: Leverage MongoDB indices for queries
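
As an illustration of the batch processing strategy, sites can be split into fixed-size batches before being processed. The `chunk` helper is a generic sketch, and the batch size is left to the caller because the document does not state one.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be processed concurrently (for example with `Promise.all`) before moving on to the next, which bounds the number of simultaneous database writes.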

Monitoring & Observability

Logging Points

graph LR
    A[Request Start] --> B[Authorization]
    B --> C[Validation]
    C --> D[Phase Start]
    D --> E[Phase Complete]
    E --> F[Next Phase]
    F --> D
    E --> G[Operation Complete]

    H[Error] --> I[Rollback Start]
    I --> J[Rollback Complete]
    J --> K[Operation Failed]

Key Metrics to Monitor

  1. Operation Metrics:
    • Total operations per day
    • Success/failure rate
    • Average duration
    • Sites processed per operation
  2. Performance Metrics:
    • Phase durations
    • Database query times
    • Memory usage
    • CPU utilization
  3. Error Metrics:
    • Error types and frequencies
    • Rollback success rate
    • Cloud Task failures
    • Timeout occurrences

Future Enhancements

Planned Features

  1. Batch Rolling Operations
    • Multiple department pairs in a single operation
    • Parallel processing for independent transfers
  2. Partial Rollback
    • Checkpoint-based recovery
    • Resume from failure point
  3. Real-time Updates
    • WebSocket progress notifications
    • Live status dashboard
  4. Enhanced Validation
    • Pre-flight validation endpoint
    • Dry-run capability
  5. Audit Trail
    • Complete operation history
    • Change tracking per entity
    • Compliance reporting

Technical Debt

  1. Add validation middleware for rolling requests
  2. Implement questionnaire state reversion on rollback ✅ Completed
  3. Add operation metrics collection
  4. Enhance error recovery mechanisms
  5. Implement operation queueing system
  6. Add delete method to questionnaire repository for complete cleanup during rollback

Conclusion

The Manager Rolling Operations system transfers sites between manager departments in six phases, backed by layered validation, snapshot-based rollback, and asynchronous processing. It supports both immediate and scheduled operations. Request validation middleware, operation metrics collection, and cleanup of orphaned questionnaire versions remain open items, as listed under Technical Debt.