Manager Rolling Operations Architecture
Overview
The Manager Rolling Operations system provides a robust, fault-tolerant mechanism for transferring sites between manager departments. This document describes the high-level architecture, data flows, and system components involved in rolling operations.
System Architecture
graph TB
  subgraph "Client Layer"
    UI[Web UI/Admin Portal]
  end
  subgraph "API Gateway"
    Route["/departments/managers/departments/rolling/*"]
  end
  subgraph "Controller Layer"
    MC[ManagerController]
  end
  subgraph "Business Logic Layer"
    MRU[ManagerRollingUsecase]
    MU[ManagerUsecase]
    RSM[RollbackStateManager]
  end
  subgraph "Repository Layer"
    ROR[RollingOperationRepo]
    DR[DepartmentRepo]
    SR[SiteRepo]
    DIR[DepartmentIndexRepo]
    SSR[SiteScheduleIndexRepo]
    QR[QuestionnaireRepo]
    QIR[QuestionnaireIndexRepo]
    UR[UserRepo]
    CTR[CloudTaskRepo]
  end
  subgraph "Data Layer"
    MongoDB[(MongoDB Collections)]
    GCT[Google Cloud Tasks]
  end
  UI --> Route
  Route --> MC
  MC --> MRU
  MC --> MU
  MRU --> RSM
  MRU --> ROR
  MRU --> DR
  MRU --> SR
  MRU --> DIR
  MRU --> SSR
  MRU --> QR
  MRU --> QIR
  MRU --> UR
  MRU --> CTR
  ROR --> MongoDB
  DR --> MongoDB
  SR --> MongoDB
  DIR --> MongoDB
  SSR --> MongoDB
  QR --> MongoDB
  QIR --> MongoDB
  UR --> MongoDB
  CTR --> GCT
API Endpoints
Manager Department APIs
| Endpoint | Method | Purpose | Collections Touched |
|---|---|---|---|
| /departments/managers | GET | List manager departments | departments, sites, users |
| /departments/managers | POST | Create manager department | departments, departmentIndex, sites, users |
| /departments/managers/:id | GET | Get department details | departments, sites, users |
| /departments/managers/:id | PUT | Update department | departments, departmentIndex, questionnaires |
| /departments/managers/:id/toggle-status | PUT | Toggle status | departments, departmentIndex |
| /departments/managers/:id/sites | GET | Get department sites | sites, departments |
| /departments/managers/department/makemanager/:id | PUT | Make department manager | departments |
| /departments/managers/departments/makemanager | POST | Bulk convert departments | departments |
| /departments/managers/users/makemanager/:id | PUT | Make user manager | users |
| /departments/managers/users/makemanager | POST | Bulk convert users | users |
Rolling Operation APIs
| Endpoint | Method | Purpose | Collections Touched |
|---|---|---|---|
| /departments/managers/departments/rolling/execute | POST | Execute rolling | All (see phases below) |
| /departments/managers/departments/rolling/status | GET | Get current status | rollingOperations |
| /departments/managers/departments/rolling/eligible | POST | Get eligible departments | sites, departments |
| /departments/managers/departments/rolling/status/:id | GET | Get operation status | rollingOperations |
| /departments/managers/departments/rolling/:id/cancel | DELETE | Cancel operation | rollingOperations, cloudTasks |
Rolling Operation Flow
sequenceDiagram
  participant Client
  participant API
  participant Controller
  participant Usecase
  participant Repos
  participant DB
  participant CloudTask
  Client->>API: POST /rolling/execute
  API->>Controller: executeManagerRolling()
  Controller->>Usecase: executeRolling()
  alt Immediate Execution
    Usecase->>DB: Create operation record
    Usecase->>Usecase: processRollingAsync()
    Usecase-->>Controller: Return operationID
    Controller-->>Client: { status: "processing" }
    Note over Usecase: Background Processing
    Usecase->>Usecase: processRolling()
    loop 6 Phases
      Usecase->>Repos: Update collections
      Repos->>DB: Persist changes
    end
  else Scheduled Execution
    Usecase->>CloudTask: Schedule task
    CloudTask-->>Usecase: Task name
    Usecase-->>Controller: Return operationID
    Controller-->>Client: { status: "queued" }
  end
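The branch between immediate and scheduled execution can be sketched roughly as follows. This is a minimal illustration, not the actual usecase code: the request fields, the `RollingDepsSketch` interface, and the rule that a `scheduledAt` value selects the scheduled branch are all assumptions. The sequence only shows a record being created in the immediate branch, so the sketch mirrors that.

```typescript
// Hypothetical shapes for illustration; these names are not taken from the real codebase.
interface RollingRequestSketch {
  rollOutDepartmentID: string;
  rollInDepartmentID: string;
  scheduledAt?: string; // assumed: an ISO timestamp here means "scheduled execution"
}

interface RollingDepsSketch {
  newOperationID(): string;
  createRecord(operationID: string, req: RollingRequestSketch): void; // "Create operation record"
  startBackground(operationID: string): void; // stands in for processRollingAsync()
  scheduleTask(operationID: string, runAt: string): string; // returns the Cloud Task name
}

interface RollingResponseSketch {
  operationID: string;
  status: "processing" | "queued";
  taskName?: string;
}

function executeRollingSketch(
  req: RollingRequestSketch,
  deps: RollingDepsSketch
): RollingResponseSketch {
  const operationID = deps.newOperationID();
  if (req.scheduledAt === undefined) {
    // Immediate branch: persist the record, start background work, answer right away.
    deps.createRecord(operationID, req);
    deps.startBackground(operationID);
    return { operationID, status: "processing" };
  }
  // Scheduled branch: hand the work to the task queue and report it as queued.
  const taskName = deps.scheduleTask(operationID, req.scheduledAt);
  return { operationID, status: "queued", taskName };
}
```

Injecting the dependencies keeps the branching logic testable without MongoDB or Cloud Tasks; the real usecase may wire these differently.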
Processing Phases
Phase Architecture
graph LR
  subgraph "Phase 1: State Collection"
    CS[Collect Current State]
    CS --> SS[Sites State]
    CS --> DS[DepartmentIndex State]
    CS --> SCS[SiteScheduleIndex State]
    CS --> QS[Questionnaire State]
  end
  subgraph "Phase 2-5: Updates"
    US[Update Sites]
    UD[Update DepartmentIndex]
    USS[Update SiteScheduleIndex]
    UQ[Update Questionnaires]
  end
  subgraph "Phase 6: Completion"
    MC[Mark Complete]
  end
  CS --> US
  US --> UD
  UD --> USS
  USS --> UQ
  UQ --> MC
Phase Details
Phase 1: Current State Collection
Purpose: Back up all data that will be modified so that it can be restored on rollback
graph TB
  subgraph "State Collection"
    A[collectCurrentState] --> B[Sites Collection]
    A --> C[DepartmentIndex Collection]
    A --> D[SiteScheduleIndex Collection]
    A --> E[Questionnaire Collection]
    B --> B1[departments mapping]
    B --> B2[departmentList array]
    B --> B3[owner field]
    B --> B4[supervisors array]
    C --> C1[Roll-out dept sites]
    C --> C2[Roll-in dept sites]
    C --> C3[User assignments]
    D --> D1[Schedule assignments]
    D --> D2[Email targets]
    D --> D3[Default issue owner]
    E --> E1[Auto-assignments]
    E --> E2[Questionnaire versions]
    E --> E3[Latest pointers]
  end
Phase 2: Sites Collection Update
Collections: sites
Operations:
- Transfer department ownership from roll-out to roll-in
- Update departmentList arrays
- Maintain site integrity
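As a rough illustration of the site update, the sketch below moves one site from the roll-out department to the roll-in department. It assumes that `departments` is a mapping keyed by department ID and that `departmentList` holds the same IDs as an array; the real document shapes may differ, and `transferSiteDepartment` is a hypothetical helper name.

```typescript
interface SiteDocSketch {
  siteID: string;
  departments: Record<string, unknown>; // assumed: a mapping keyed by departmentID
  departmentList: string[];             // assumed: the same department IDs as a flat array
}

function transferSiteDepartment(
  site: SiteDocSketch,
  rollOutID: string,
  rollInID: string
): SiteDocSketch {
  if (site.departmentList.indexOf(rollOutID) === -1) {
    return site; // the site does not belong to the roll-out department
  }
  // Move the mapping entry from the roll-out key to the roll-in key.
  const departments: Record<string, unknown> = {};
  for (const id of Object.keys(site.departments)) {
    if (id !== rollOutID) departments[id] = site.departments[id];
  }
  if (!(rollInID in departments)) {
    departments[rollInID] = site.departments[rollOutID];
  }
  // Swap the ID in the array, dropping the duplicate if roll-in was already listed.
  const swapped = site.departmentList.map((id) => (id === rollOutID ? rollInID : id));
  const departmentList = swapped.filter((id, i) => swapped.indexOf(id) === i);
  return { ...site, departments, departmentList };
}
```

The function returns a new object instead of mutating its input, so the Phase 1 snapshot of the same site stays untouched.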
Phase 3: DepartmentIndex Collection Update
Collections: departmentIndex
Operations:
- Remove sites from roll-out department
- Add sites to roll-in department
- Create the roll-in department's index if it doesn’t exist
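The three operations above can be sketched as one pure function over an in-memory map of index documents. This is a sketch under stated assumptions: the map keyed by department ID and the `moveSitesBetweenIndexes` name are hypothetical, and the real repository would issue MongoDB updates instead.

```typescript
interface DepartmentIndexSketch {
  departmentID: string;
  sites: string[];
  users: string[];
}

type DepartmentIndexMap = Record<string, DepartmentIndexSketch | undefined>;

function moveSitesBetweenIndexes(
  indexes: DepartmentIndexMap,
  rollOutID: string,
  rollInID: string,
  siteIDs: string[]
): DepartmentIndexMap {
  const result: DepartmentIndexMap = { ...indexes };

  // Remove the transferred sites from the roll-out department.
  const rollOut = indexes[rollOutID];
  if (rollOut !== undefined) {
    result[rollOutID] = {
      ...rollOut,
      sites: rollOut.sites.filter((id) => siteIDs.indexOf(id) === -1),
    };
  }

  // Create the roll-in index if it is missing, then add the sites without duplicates.
  const rollIn = indexes[rollInID] ?? { departmentID: rollInID, sites: [], users: [] };
  const toAdd = siteIDs.filter(
    (id, i) => siteIDs.indexOf(id) === i && rollIn.sites.indexOf(id) === -1
  );
  result[rollInID] = { ...rollIn, sites: rollIn.sites.concat(toAdd) };
  return result;
}
```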
Phase 4: SiteScheduleIndex Collection Update
Collections: siteScheduleIndex
Operations:
- Update departmentID for schedules
- Update email targets
- Update default issue owners
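A minimal sketch of the schedule update follows. How email targets and default issue owners relate to a department is not spelled out here, so the sketch assumes that `emailTargets` contains the department's email address and that `defaultIssueOwner` may hold the department ID; both are assumptions, as are the `DepartmentRefSketch` type and the `reassignSchedule` name.

```typescript
interface ScheduleDocSketch {
  scheduleID: string;
  siteID: string;
  departmentID: string;
  emailTargets: string[];
  defaultIssueOwner: string;
}

// Assumed minimal view of a department: its ID and its email address.
interface DepartmentRefSketch {
  departmentID: string;
  email: string;
}

function reassignSchedule(
  schedule: ScheduleDocSketch,
  rollOut: DepartmentRefSketch,
  rollIn: DepartmentRefSketch
): ScheduleDocSketch {
  if (schedule.departmentID !== rollOut.departmentID) {
    return schedule; // schedule belongs to another department, leave it alone
  }
  return {
    ...schedule,
    departmentID: rollIn.departmentID,
    // Replace only the roll-out department's address; other recipients stay.
    emailTargets: schedule.emailTargets.map((e) => (e === rollOut.email ? rollIn.email : e)),
    // Assumption: the owner is swapped only when it pointed at the roll-out department.
    defaultIssueOwner:
      schedule.defaultIssueOwner === rollOut.departmentID
        ? rollIn.departmentID
        : schedule.defaultIssueOwner,
  };
}
```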
Phase 5: Questionnaire AutoAssignments Update
Collections: questionnaires, questionnaireIndex
Operations:
- Update questionnaire-level assignments
- Update category-level assignments
- Update question-level assignments
- Create new questionnaire versions
- Update questionnaire indices
Rollback Implementation:
- Questionnaire states are backed up in Phase 1, including the versions array and the latest pointer
- On rollback, questionnaire indices are reverted to their original versions
- New questionnaire versions created during the update are orphaned rather than deleted, because the questionnaire repository has no delete method
- Original questionnaire assignments are effectively restored through the index reversion
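The index reversion described above might look like the following sketch. It assumes that `versions` is an array of questionnaire IDs and that `latest` is one of them; `revertQuestionnaireIndex` is a hypothetical helper. The function also reports which versions end up orphaned, since they cannot be deleted.

```typescript
interface QuestionnaireIndexSketch {
  questionnaireIndexID: string;
  versions: string[]; // assumed: questionnaire IDs, oldest first
  latest: string;     // assumed: the ID of the current version
}

interface QuestionnaireBackupSketch {
  versions: string[];
  latest: string;
}

function revertQuestionnaireIndex(
  current: QuestionnaireIndexSketch,
  backup: QuestionnaireBackupSketch
): { index: QuestionnaireIndexSketch; orphaned: string[] } {
  // Versions created after the backup are no longer referenced by the index.
  const orphaned = current.versions.filter((v) => backup.versions.indexOf(v) === -1);
  const index: QuestionnaireIndexSketch = {
    ...current,
    versions: backup.versions.slice(),
    latest: backup.latest,
  };
  return { index, orphaned };
}
```

Returning the orphaned IDs makes it possible to log them now and clean them up later once the repository gains a delete method.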
Phase 6: Operation Completion
Collections: rollingOperations
Operations:
- Mark operation as completed
- Record completion timestamp
- Update progress metrics
Data Models
Collections Structure
erDiagram
  DEPARTMENTS ||--o{ SITES : "manages"
  DEPARTMENTS ||--|| USERS : "has manager"
  DEPARTMENTS ||--|| DEPARTMENT_INDEX : "indexed by"
  SITES ||--o{ SITE_SCHEDULE_INDEX : "has schedules"
  QUESTIONNAIRES ||--o{ QUESTIONNAIRE_INDEX : "versioned in"
  ROLLING_OPERATIONS ||--|| ORGANIZATIONS : "belongs to"
  DEPARTMENTS {
    string departmentID PK
    string organizationID
    string name
    string email
    boolean isManager
    string status
  }
  SITES {
    string siteID PK
    string organizationID
    object departments
    array departmentList
    string owner
    array supervisors
  }
  DEPARTMENT_INDEX {
    string departmentIndexID PK
    string departmentID FK
    array sites
    array users
  }
  SITE_SCHEDULE_INDEX {
    string scheduleID PK
    string siteID FK
    string departmentID FK
    array emailTargets
    string defaultIssueOwner
  }
  QUESTIONNAIRES {
    string questionnaireID PK
    object autoAssignmentV2
    string dateUpdated
  }
  QUESTIONNAIRE_INDEX {
    string questionnaireIndexID PK
    array versions
    string latest
  }
  ROLLING_OPERATIONS {
    string rollingOperationID PK
    string operationId UK
    string organizationID
    string status
    object siteOperation
    object progress
  }
Rollback System
RollbackStateManager Architecture
classDiagram
  class RollbackStateManager {
    -siteStates: Record~string, SiteState~
    -departmentIndexStates: Record~string, DepartmentIndexState~
    -siteScheduleIndexStates: Record~string, SiteScheduleIndexState~
    -questionnaireStates: Record~string, QuestionnaireState~
    +addSiteState(siteID, state)
    +addDepartmentIndexState(deptID, state)
    +addSiteScheduleIndexState(siteID, state)
    +addQuestionnaireState(qID, state)
    +getAllStates()
    +executeRollback()
  }
  class SiteState {
    departments: object
    departmentList: array
    owner: string
    supervisors: array
  }
  class DepartmentIndexState {
    sites: array
    users: array
  }
  class SiteScheduleIndexState {
    schedules: array
  }
  class QuestionnaireState {
    questionnaireID: string
    versions: array
    latest: string
    autoAssignmentV2: object
  }
  RollbackStateManager --> SiteState
  RollbackStateManager --> DepartmentIndexState
  RollbackStateManager --> SiteScheduleIndexState
  RollbackStateManager --> QuestionnaireState
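The storage half of this class can be sketched in TypeScript as below. The field and method names follow the diagram; everything else is an assumption, in particular the element types, the JSON-based deep copy, and the "first write wins" rule that prevents a later call from overwriting an original snapshot. `executeRollback()` is left out of this sketch.

```typescript
interface SiteStateSketch {
  departments: Record<string, unknown>;
  departmentList: string[];
  owner: string;
  supervisors: string[];
}
interface DepartmentIndexStateSketch { sites: string[]; users: string[]; }
interface SiteScheduleIndexStateSketch { schedules: unknown[]; }
interface QuestionnaireStateSketch {
  questionnaireID: string;
  versions: string[];
  latest: string;
  autoAssignmentV2: unknown;
}

// JSON round-trip copy; adequate only if snapshots are plain data (an assumption).
function snapshotCopy<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

class RollbackStateManagerSketch {
  private siteStates: Record<string, SiteStateSketch> = {};
  private departmentIndexStates: Record<string, DepartmentIndexStateSketch> = {};
  private siteScheduleIndexStates: Record<string, SiteScheduleIndexStateSketch> = {};
  private questionnaireStates: Record<string, QuestionnaireStateSketch> = {};

  // First write wins: a later call for the same key cannot replace the original snapshot.
  private remember<T>(store: Record<string, T>, key: string, state: T): void {
    if (!(key in store)) store[key] = snapshotCopy(state);
  }

  addSiteState(siteID: string, state: SiteStateSketch): void {
    this.remember(this.siteStates, siteID, state);
  }
  addDepartmentIndexState(deptID: string, state: DepartmentIndexStateSketch): void {
    this.remember(this.departmentIndexStates, deptID, state);
  }
  addSiteScheduleIndexState(siteID: string, state: SiteScheduleIndexStateSketch): void {
    this.remember(this.siteScheduleIndexStates, siteID, state);
  }
  addQuestionnaireState(qID: string, state: QuestionnaireStateSketch): void {
    this.remember(this.questionnaireStates, qID, state);
  }

  // Returns copies so callers cannot alter the stored snapshots.
  getAllStates() {
    return snapshotCopy({
      sites: this.siteStates,
      departmentIndexes: this.departmentIndexStates,
      siteScheduleIndexes: this.siteScheduleIndexStates,
      questionnaires: this.questionnaireStates,
    });
  }
}
```

Copying on both write and read trades some memory for the guarantee that update phases cannot accidentally modify the data they would later need to restore.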
Rollback Execution Flow
graph TD
  A[Error Detected] --> B[executeRollback]
  B --> C[Rollback Sites]
  B --> D[Rollback DepartmentIndex]
  B --> E[Rollback SiteScheduleIndex]
  B --> F[Rollback Questionnaires]
  C --> G[Restore department mappings]
  D --> H[Restore site assignments]
  E --> I[Restore schedule configs]
  F --> J[Revert questionnaire indices]
  G --> K[Mark Operation Failed]
  H --> K
  I --> K
  J --> K
  K --> L[Save Rollback Data]
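One way to read this flow is sketched below: each collection gets its own restore step, every step is attempted even if an earlier one throws, and the operation is then marked failed and the report saved. The continue-on-error behavior, the `RollbackStepSketch` and `RollbackReportSketch` shapes, and the synchronous signatures are assumptions made to keep the example short; the real restores would be asynchronous database writes.

```typescript
interface RollbackStepSketch {
  name: string;          // e.g. "sites", "departmentIndex"
  restore(): void;       // restores one collection from its Phase 1 snapshot
}

interface RollbackReportSketch {
  status: "failed";      // the operation is marked failed after any rollback
  restored: string[];
  errors: { step: string; message: string }[];
}

function executeRollbackSketch(
  steps: RollbackStepSketch[],
  saveReport: (report: RollbackReportSketch) => void
): RollbackReportSketch {
  const report: RollbackReportSketch = { status: "failed", restored: [], errors: [] };
  for (const step of steps) {
    try {
      step.restore();
      report.restored.push(step.name);
    } catch (err) {
      // Record the failure and keep going so the other collections are still restored.
      const message = err instanceof Error ? err.message : String(err);
      report.errors.push({ step: step.name, message });
    }
  }
  saveReport(report); // corresponds to "Save Rollback Data"
  return report;
}
```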
Asynchronous Processing
Background Processing Flow
graph TB
  A[executeRolling] --> B{Immediate?}
  B -->|Yes| C[Create DB Record]
  B -->|No| D[Schedule Cloud Task]
  C --> E[processRollingAsync]
  E --> F[setImmediate]
  F --> G[Start Heartbeat]
  G --> H[processRolling]
  H --> I{Success?}
  I -->|Yes| J[Mark Completed]
  I -->|No| K[Execute Rollback]
  K --> L[Mark Failed]
  D --> M[Cloud Task Queue]
  M --> N[Task Execution]
  N --> H
  subgraph "Heartbeat System"
    G --> O[Every 30s]
    O --> P[Update Heartbeat]
    P --> O
  end
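The heartbeat part of this flow can be sketched as a small helper that starts a repeating timer and returns a function to stop it. The `HeartbeatTimersSketch` interface and the `startHeartbeat` name are hypothetical; the timers are injected only so the helper can be exercised without waiting 30 seconds.

```typescript
// Minimal timer surface, injected so the helper can run against fake timers in tests.
interface HeartbeatTimersSketch {
  setInterval(fn: () => void, ms: number): unknown;
  clearInterval(handle: unknown): void;
}

function startHeartbeat(
  beat: () => void,                 // e.g. writes a heartbeat timestamp to the operation record
  timers: HeartbeatTimersSketch,
  intervalMs: number = 30000        // 30s, matching the interval in the diagram
): () => void {
  const handle = timers.setInterval(beat, intervalMs);
  let stopped = false;
  // The returned function is idempotent, so it is safe to call from a finally block.
  return () => {
    if (!stopped) {
      stopped = true;
      timers.clearInterval(handle);
    }
  };
}
```

In a real process the global timer functions would be passed in, and the stop function would be called in a `finally` block around the awaited `processRolling()` call so the heartbeat ends on both success and failure.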
Security & Validation
Authorization Flow
graph TD
  A[API Request] --> B{JWT Valid?}
  B -->|No| C[401 Unauthorized]
  B -->|Yes| D{Role Check}
  D -->|Not Admin/Holder| E[403 Forbidden]
  D -->|Admin/Holder| F[Organization Check]
  F -->|Different Org| G[404 Not Found]
  F -->|Same Org| H[Process Request]
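The decision order in this flow can be written as a single function returning the HTTP status. This is a sketch: the `CallerTokenSketch` shape and the `"member"` role are hypothetical, and `200` simply stands in for "process request".

```typescript
type CallerRoleSketch = "admin" | "holder" | "member"; // "member" is a placeholder for any other role

interface CallerTokenSketch {
  valid: boolean;            // result of JWT verification
  role: CallerRoleSketch;
  organizationID: string;
}

function authorizeRollingRequest(
  token: CallerTokenSketch | null,
  resourceOrganizationID: string
): 200 | 401 | 403 | 404 {
  if (token === null || !token.valid) return 401;                          // JWT missing or invalid
  if (token.role !== "admin" && token.role !== "holder") return 403;       // role check
  if (token.organizationID !== resourceOrganizationID) return 404;         // different organization
  return 200;                                                              // same organization: proceed
}
```

Answering 404 instead of 403 for a different organization avoids confirming that the resource exists to callers outside that organization.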
Validation Layers
- Route Level: JWT authentication middleware
- Controller Level: Request structure validation (TODO: add validator)
- Usecase Level:
  - Role authorization
  - Business logic validation
  - Entity existence checks
  - Relationship validation
- Repository Level: Data integrity constraints
Performance Characteristics
Metrics
| Metric | Value | Notes |
|---|---|---|
| Sites/minute | ~20 | Estimated throughput |
| Heartbeat interval | 30s | Keep-alive for long operations |
| Max operation time | 10 min | Cloud Task timeout |
| Concurrent operations | 1 | Per organization |
| Rollback time | <30s | All phases reversible; questionnaire versions created in Phase 5 remain orphaned |
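Taken together, the throughput and timeout figures imply a rough upper bound of about 20 × 10 = 200 sites per operation. The sketch below only restates that arithmetic; the throughput is an estimate, so the result is a planning aid and not a guarantee, and the function names are hypothetical.

```typescript
const SITES_PER_MINUTE = 20;        // estimated throughput from the table
const MAX_OPERATION_MINUTES = 10;   // Cloud Task timeout from the table

// Estimated duration of a rolling operation, in minutes.
function estimateRollingMinutes(siteCount: number): number {
  return siteCount / SITES_PER_MINUTE;
}

// Whether the estimate fits inside the maximum operation time.
function fitsInOneOperation(siteCount: number): boolean {
  return estimateRollingMinutes(siteCount) <= MAX_OPERATION_MINUTES;
}
```

For example, 150 sites are estimated at 7.5 minutes and fit, while 250 sites are estimated at 12.5 minutes and would need to be split across operations.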
Optimization Strategies
- Batch Processing: Process multiple sites in parallel where possible
- Async Execution: Non-blocking API responses
- State Caching: Minimize database reads during processing
- Selective Updates: Only update changed fields
- Index Usage: Leverage MongoDB indices for queries
Monitoring & Observability
Logging Points
graph LR
  A[Request Start] --> B[Authorization]
  B --> C[Validation]
  C --> D[Phase Start]
  D --> E[Phase Complete]
  E --> F[Next Phase]
  F --> D
  E --> G[Operation Complete]
  H[Error] --> I[Rollback Start]
  I --> J[Rollback Complete]
  J --> K[Operation Failed]
Key Metrics to Monitor
- Operation Metrics:
  - Total operations per day
  - Success/failure rate
  - Average duration
  - Sites processed per operation
- Performance Metrics:
  - Phase durations
  - Database query times
  - Memory usage
  - CPU utilization
- Error Metrics:
  - Error types and frequencies
  - Rollback success rate
  - Cloud Task failures
  - Timeout occurrences
Future Enhancements
Planned Features
- Batch Rolling Operations
  - Multiple department pairs in single operation
  - Parallel processing for independent transfers
- Partial Rollback
  - Checkpoint-based recovery
  - Resume from failure point
- Real-time Updates
  - WebSocket progress notifications
  - Live status dashboard
- Enhanced Validation
  - Pre-flight validation endpoint
  - Dry-run capability
- Audit Trail
  - Complete operation history
  - Change tracking per entity
  - Compliance reporting
Technical Debt
- Add validation middleware for rolling requests
- Implement questionnaire state reversion on rollback ✅ Completed
- Add operation metrics collection
- Enhance error recovery mechanisms
- Implement operation queueing system
- Add delete method to questionnaire repository for complete cleanup during rollback
Conclusion
Manager rolling operations transfer sites between manager departments through a six-phase pipeline: Phase 1 backs up the current state, Phases 2 through 5 update the sites, departmentIndex, siteScheduleIndex, and questionnaire collections, and Phase 6 marks the operation complete. A failure in any phase triggers a rollback from the backed-up state. Operations run asynchronously, either immediately in the background or as scheduled Google Cloud Tasks, and the logging points and metrics described above cover monitoring in production.