Domain IV: Manage AI Model Development and Evaluation — Study Game
How to Play
Pick a game mode and test yourself. Cover answers and try to recall before peeking.
GAME MODE 1: Rapid Fire Flashcards
The Two Gates (IV.5 + IV.6)
Card 1 — Front: What's the IV.5 gate question?
Answer: Is the prepared data quality sufficient to train on? End of Phase III.
Card 2 — Front: What's the IV.6 gate question?
Answer: Is the model ready to operate in production? End of Phase V.
Card 3 — Front: Distinguish III.8 from IV.5.
Answer: III.8 = end of Phase II ("do we have what we need?"). IV.5 = end of Phase III ("is prepared data sufficient to train?"). Different artifacts, different boundaries.
Card 4 — Front: What does QCBVR stand for?
Answer: Quality dimensions, Coverage of attributes, Bias within tolerance, Volume sufficient, Reproducibility verified (IV.5 criteria).
Card 5 — Front: What does PBRBARTAO stand for?
Answer: Performance, Bias, Robustness, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (IV.6's 8 criteria).
Card 6 — Front: Three outcomes at the IV.5 and IV.6 gates?
Answer: GO / ITERATE / DESCOPE.
Card 7 — Front: What does IV.6 GO authorize?
Answer: Domain V (deployment) work to begin.
Technique Selection (IV.1)
Card 8 — Front: What's the PM's role in IV.1?
Answer: Oversee — ensure the technique is documented, justified against the AI pattern + success criteria, and aligned with operational constraints. The PM doesn't pick the technique.
Card 9 — Front: Three ML categories?
Answer: Supervised, Unsupervised, Reinforcement.
Card 10 — Front: Difference between algorithm and model?
Answer: Algorithm = procedure. Model = trained artifact. You train an algorithm on data to produce a model (see the code sketch after this card group).
Card 11 — Front: Three patterns of pretrained AI?
Answer: Pretrained model (adapt for your task), Foundation model (very large pretrained model), GenAI (generates new content).
Card 12 — Front: What's transfer learning?
Answer: Start from a pretrained model and fine-tune it on your task data.
Card 13 — Front: What's RAG?
Answer: Retrieval-Augmented Generation — retrieve relevant context, then generate from a foundation model grounded in that context.
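Bonus (not scored): a minimal Python sketch of Card 10's algorithm-vs-model distinction, assuming scikit-learn is available. The dataset and variable names are illustrative only.

```python
# Minimal sketch: the algorithm (LogisticRegression) is a procedure;
# the fitted object returned by .fit() is the trained model artifact.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

algorithm = LogisticRegression(max_iter=1000)   # procedure + hyperparameters
model = algorithm.fit(X_train, y_train)         # training produces the model artifact

print("Held-out accuracy:", model.score(X_test, y_test))
```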
Training (IV.3)
Card 14 — Front: What does DTHR stand for in training triage?
Answer: Data, Technique, Hardware, Results — review when training overruns.
Card 15 — Front: Overfit vs underfit?
Answer: Overfit = memorizes training data, fails on new data. Underfit = doesn't learn even on training data.
Card 16 — Front: Typical train/validation/test split?
Answer: ~70% / 15% / 15% (see the split sketch after these cards).
Card 17 — Front: What does generalization mean?
Answer: Model performs well on data it hasn't seen — the goal of training.
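Bonus (not scored): a minimal sketch of Cards 15 and 16, assuming scikit-learn. Two chained train_test_split calls give the ~70/15/15 split, and a large train-vs-validation gap is a crude overfitting signal; all names and numbers are illustrative.

```python
# Minimal sketch: ~70/15/15 split, then a crude overfit check
# (large gap between training and validation accuracy).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 70% train, 30% temp; then split temp in half -> 15% validation, 15% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # deep trees tend to overfit

train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
print(f"train={train_acc:.2f}  val={val_acc:.2f}  gap={train_acc - val_acc:.2f}")
# Near-perfect training accuracy with much lower validation accuracy = overfitting;
# the untouched test split is reserved for the final, unbiased estimate.
```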
Data Preparation (IV.4)
Card 18 — Front: What does TRIM stand for?
Answer: Transform formats, Reconcile inconsistencies, Impute missing values, Map fields (see the TRIM sketch below).
Card 19 — Front: What % of project time is typically spent on data prep?
Answer: 70-80%.
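Bonus (not scored): a minimal pandas sketch of Card 18's TRIM categories. The columns, values, and mappings are hypothetical, chosen only to show one example of each step.

```python
# Minimal TRIM sketch: Transform formats, Reconcile inconsistencies,
# Impute missing values, Map fields. All names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "signup_date": ["2024-01-05", "2024-02-05"],   # dates stored as strings
    "country": ["USA", "United States"],           # inconsistent labels
    "age": [34.0, None],                           # missing value
    "plan_code": ["P1", "P2"],                     # source-system codes
})

# Transform: convert string dates into a proper datetime dtype
df["signup_date"] = pd.to_datetime(df["signup_date"])

# Reconcile: collapse inconsistent labels to one canonical value
df["country"] = df["country"].replace({"United States": "USA"})

# Impute: fill missing ages with the median
df["age"] = df["age"].fillna(df["age"].median())

# Map: translate source-system codes to the target schema's values
df["plan"] = df["plan_code"].map({"P1": "basic", "P2": "premium"})

print(df)
```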
QA/QC (IV.2)
Card 20 — Front: What does IV.2 QA/QC cover?
Answer: Configuration management + performance verification + bias measurement + documentation throughout development (a simple bias-rate check is sketched after these cards).
Card 21 — Front: Three transparency dimensions?
Answer: Systemic (how the system was built), Decision (why this prediction), Algorithmic (how the algorithm works).
Card 22 — Front: XAI vs interpretability?
Answer: XAI = post-hoc explanations for any model. Interpretability = inherently understandable models. High-stakes use cases prefer interpretability.
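Bonus (not scored): a minimal Python sketch of one simple bias measurement that IV.2 QA/QC might include: per-group selection rates and their disparity ratio. The groups, predictions, and any tolerance are hypothetical; real projects use the fairness metric and threshold agreed with stakeholders.

```python
# Minimal sketch: per-group positive-prediction (selection) rates and their
# disparity ratio. Groups, predictions, and thresholds are hypothetical.
from collections import defaultdict

predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]           # model outputs (1 = positive)
groups      = ["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"]

totals, positives = defaultdict(int), defaultdict(int)
for pred, group in zip(predictions, groups):
    totals[group] += 1
    positives[group] += pred

rates = {g: positives[g] / totals[g] for g in totals}
disparity = min(rates.values()) / max(rates.values())   # 1.0 = parity

print("selection rates:", rates)
print(f"disparity ratio: {disparity:.2f}")
# A ratio outside the agreed tolerance is documented, escalated, and blocks
# a GO at the gate until remediated.
```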
GAME MODE 2: Scenario Showdown — What Should the PM Do?
Scenario 1: The Training Overrun
- Training planned for 2 days
- Already running 5 days
- Data scientist says "one more day should do it"
Reveal
Pause training. Conduct DTHR (Data/Technique/Hardware/Results) root-cause review. Document decision: continue, change approach, or escalate. 2.5x overrun = project event, not technical hiccup.
Scenario 2: The Black-Box Healthcare Decision
- Data scientist proposes deep learning for high-stakes medical-imaging classification
- Healthcare client requires AI decisions be explainable
Reveal
Document technique selection; ensure the trade-off between performance and explainability is presented to stakeholders for decision; consider interpretable-by-design alternatives. IV.1 + Domain I.2 cross-pull.
Scenario 3: The Operational Mismatch
- IV.6 review: model performance meets success criteria
- Chosen technique requires GPU compute
- Production environment is CPU-only
Reveal
ITERATE — operational fit failure (PBRBARTAO criterion). Loop back to V.1 (infrastructure) or IV.1 (technique change) with stakeholder decision.
Scenario 4: The Bias Discovery During QA
- QA/QC reveals demographic bias in recommendation model
- Data scientist suggests fairness post-processing layer
Reveal
Treat as IV.2 + I.3 issue: document, escalate per accountability, engage stakeholders for remediation, do not authorize IV.6 GO until bias is within tolerance.
Scenario 5: The Parallel Work Request
- Phase IV complete
- Data scientist asks to begin Domain V work in parallel with IV.6 gate
Reveal
Confirm IV.6 must complete before Domain V work begins — sequential, not parallel. The gate authorizes the transition.
Scenario 6: The Plateau Concern
- Training completed; loss curve plateaued early
- Data scientist reports "acceptable" final accuracy
Reveal
Coordinate investigation of the early plateau (data quality, technique fit, hyperparameter tuning). Engage IV.2 QA/QC. Don't blindly accept "acceptable" without root-cause analysis.
Scenario 7: The Iteration Trap
- 4 training runs over 2 weeks
- Each improving slightly but not meeting criteria
- Data scientist suggests 5th iteration
Reveal
Pause and review the iteration trajectory: are improvements converging or plateauing? Is the technique a fit? Is data sufficient? Document decision: continue, change technique, descope, or escalate.
Scenario 8: The Quality Gate with Bias
- IV.5: all quality dimensions met
- Bias measurement reveals demographic disparity
- Data scientist suggests proceeding and addressing at training time
Reveal
ITERATE — bias is a QCBVR criterion. Outside tolerance = no GO. Loop to IV.4 to remediate or III.1 to redefine.
GAME MODE 3: Pattern Match Challenge
| # | Scenario | ECO Task |
|---|---|---|
| 1 | Overseeing model technique selection | IV.1 |
| 2 | Overseeing model QA/QC | IV.2 |
| 3 | Managing model training execution | IV.3 |
| 4 | Managing data transformation | IV.4 |
| 5 | Verifying data quality (gate) | IV.5 |
| 6 | Verifying model ready for ops (gate) | IV.6 |
| 7 | The PBRBARTAO gate | IV.6 |
| 8 | The QCBVR gate | IV.5 |
| 9 | DTHR triage when training overruns | IV.3 |
| 10 | TRIM categories of data prep | IV.4 |
GAME MODE 4: Fill-in-the-Blank Speed Round
- IV.5 evaluates ________, Coverage, Bias, Volume, Reproducibility (QCBVR).
- IV.6 evaluates Performance, Bias, ________, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (PBRBARTAO).
- III.8 = "do we have what we need?" IV.5 = "is the ________ data sufficient to train?"
- AutoML automates the technical decision but doesn't replace ________ documentation (IV.1).
- The PM doesn't pick the technique — the ________ does.
- Training overrun by 2.5x = project event. Apply DTHR review: Data, Technique, ________, Results.
- ~70-80% of project time is typically spent on ________ ________.
- IV.6 GO authorizes ________ ________ work to begin.
- RAG = ________-Augmented Generation.
- Reproducibility means same data + same pipeline = ________ output.
Reveal answers
- Quality
- Robustness
- prepared
- governance
- data scientist
- Hardware
- data preparation
- Domain V
- Retrieval
- same
GAME MODE 5: True or False Lightning Round
| # | Statement | Correct |
|---|---|---|
| 1 | III.8 and IV.5 are the same gate | FALSE — different gates at adjacent boundaries |
| 2 | The PM picks the model technique | FALSE — data scientist picks; PM oversees governance |
| 3 | IV.6 has 8 criteria (PBRBARTAO) | TRUE |
| 4 | Domain V work can begin in parallel with IV.6 gate | FALSE — sequential |
| 5 | AutoML bypasses IV.1 documentation requirement | FALSE — automation ≠ governance |
| 6 | Operational fit is part of IV.6 PBRBARTAO | TRUE |
| 7 | Production validation substitutes for IV.6 evaluation | FALSE — gate is pre-deployment |
| 8 | Reproducibility means inference reproducibility only | FALSE — training reproducibility too |
| 9 | The PM declares IV.6 GO unilaterally | FALSE — multi-stakeholder sign-off |
| 10 | A failed contingency test still satisfies V.7 if documented | FALSE — V.7 requires tested plans |
| 11 | Performance vs baseline is part of IV.6 | TRUE |
| 12 | "Reflecting real-world differences" excuses bias | FALSE — amplification/perpetuation matter |
GAME MODE 6: Mnemonic Speed Recall
| Mnemonic | Expand it |
|---|---|
| TRIM | Transform, Reconcile, Impute, Map (data prep) |
| QCBVR | Quality, Coverage, Bias, Volume, Reproducibility (IV.5) |
| DTHR | Data, Technique, Hardware, Results (training triage) |
| PBRBARTAO | Performance, Bias, Robustness, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (IV.6) |
| 3 ML Categories | Supervised, Unsupervised, Reinforcement |
| Algorithm vs Model | Algorithm = procedure. Model = trained artifact. |
| 3 Gates | III.8 (data ↔ needs) · IV.5 (prepared data quality) · IV.6 (model ↔ ops) |
| Overfit vs Underfit | Overfit = memorize. Underfit = doesn't learn. |
Scoring Summary
| Mode | Score | Max |
|---|---|---|
| Flashcards | ___/22 | 22 |
| Scenarios | ___/8 | 8 |
| Pattern Match | ___/10 | 10 |
| Fill-in | ___/10 | 10 |
| True/False | ___/12 | 12 |
| Mnemonic | ___/8 | 8 |
| TOTAL | ___/70 | 70 |