Domain IV: Manage AI Model Development and Evaluation — Study Game
How to Play
Pick a game mode and test yourself. Cover answers and try to recall before peeking.
GAME MODE 1: Rapid Fire Flashcards
The Two Gates (IV.5 + IV.6)
Card 1 — Front: What's the IV.5 gate question?
Answer: Is the prepared data quality sufficient to train on? End of Phase III.
Card 2 — Front: What's the IV.6 gate question?
Answer: Is the model ready to operate in production? End of Phase V.
Card 3 — Front: Distinguish III.8 from IV.5.
Answer: III.8 = end of Phase II ("do we have what we need?"). IV.5 = end of Phase III ("is prepared data sufficient to train?"). Different artifacts, different boundaries.
Card 4 — Front: What does QCBVR stand for?
Answer: Quality dimensions, Coverage of attributes, Bias within tolerance, Volume sufficient, Reproducibility verified (IV.5 criteria).
Card 5 — Front: What does PBRBARTAO stand for?
Answer: Performance, Bias, Robustness, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (IV.6's 8 criteria).
Card 6 — Front: Three outcomes at the IV.5 and IV.6 gates?
Answer: GO / ITERATE / DESCOPE.
Card 7 — Front: What does IV.6 GO authorize?
Answer: Domain V (deployment) work to begin.
Technique Selection (IV.1)
Card 8 — Front: What's the PM's role in IV.1?
Answer: Oversee — ensure the technique is documented, justified against the AI pattern + success criteria, and aligned with operational constraints. The PM doesn't pick the technique.
Card 9 — Front: Three ML categories?
Answer: Supervised, Unsupervised, Reinforcement.
Card 10 — Front: Difference between algorithm and model?
Answer: Algorithm = procedure. Model = trained artifact. You train an algorithm on data to produce a model (see the code sketch after this card group).
Card 11 — Front: Three patterns of pretrained AI?
Answer: Pretrained model (adapt for your task), Foundation model (very large pretrained model), GenAI (generates new content).
Card 12 — Front: What's transfer learning?
Answer: Start from a pretrained model and fine-tune it on your task data.
Card 13 — Front: What's RAG?
Answer: Retrieval-Augmented Generation — retrieve relevant context, then generate from a foundation model grounded in that context.
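Bonus (not scored): a minimal Python sketch of Card 10's algorithm-vs-model distinction, assuming scikit-learn is available. The dataset and variable names are illustrative only.

```python
# Minimal sketch: the algorithm (LogisticRegression) is a procedure;
# the fitted object returned by .fit() is the trained model artifact.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

algorithm = LogisticRegression(max_iter=1000)   # procedure + hyperparameters
model = algorithm.fit(X_train, y_train)         # training produces the model artifact

print("Held-out accuracy:", model.score(X_test, y_test))
```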
Training (IV.3)
Card 14 — Front: What does DTHR stand for in training triage?
Answer: Data, Technique, Hardware, Results — review when training overruns.
Card 15 — Front: Overfit vs underfit?
Answer: Overfit = memorizes training data, fails on new data. Underfit = doesn't learn even on training data.
Card 16 — Front: Typical train/validation/test split?
Answer: ~70% / 15% / 15% (see the split sketch after these cards).
Card 17 — Front: What does generalization mean?
Answer: Model performs well on data it hasn't seen — the goal of training.
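Bonus (not scored): a minimal sketch of Cards 15 and 16, assuming scikit-learn. Two chained train_test_split calls give the ~70/15/15 split, and a large train-vs-validation gap is a crude overfitting signal; all names and numbers are illustrative.

```python
# Minimal sketch: ~70/15/15 split, then a crude overfit check
# (large gap between training and validation accuracy).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 70% train, 30% temp; then split temp in half -> 15% validation, 15% test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # deep trees tend to overfit

train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
print(f"train={train_acc:.2f}  val={val_acc:.2f}  gap={train_acc - val_acc:.2f}")
# Near-perfect training accuracy with much lower validation accuracy = overfitting;
# the untouched test split is reserved for the final, unbiased estimate.
```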
Data Preparation (IV.4)
Card 18 — Front: What does TRIM stand for?
Answer: Transform formats, Reconcile inconsistencies, Impute missing values, Map fields (see the TRIM sketch below).
Card 19 — Front: What % of project time is typically spent on data prep?
Answer: 70-80%.
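Bonus (not scored): a minimal pandas sketch of Card 18's TRIM categories. The columns, values, and mappings are hypothetical, chosen only to show one example of each step.

```python
# Minimal TRIM sketch: Transform formats, Reconcile inconsistencies,
# Impute missing values, Map fields. All names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "signup_date": ["2024-01-05", "2024-02-05"],   # dates stored as strings
    "country": ["USA", "United States"],           # inconsistent labels
    "age": [34.0, None],                           # missing value
    "plan_code": ["P1", "P2"],                     # source-system codes
})

# Transform: convert string dates into a proper datetime dtype
df["signup_date"] = pd.to_datetime(df["signup_date"])

# Reconcile: collapse inconsistent labels to one canonical value
df["country"] = df["country"].replace({"United States": "USA"})

# Impute: fill missing ages with the median
df["age"] = df["age"].fillna(df["age"].median())

# Map: translate source-system codes to the target schema's values
df["plan"] = df["plan_code"].map({"P1": "basic", "P2": "premium"})

print(df)
```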
QA/QC (IV.2)
Card 20 — Front: What does IV.2 QA/QC cover?
Answer: Configuration management + performance verification + bias measurement + documentation throughout development (a simple bias-rate check is sketched after these cards).
Card 21 — Front: Three transparency dimensions?
Answer: Systemic (how the system was built), Decision (why this prediction), Algorithmic (how the algorithm works).
Card 22 — Front: XAI vs interpretability?
Answer: XAI = post-hoc explanations for any model. Interpretability = inherently understandable models. High-stakes use cases prefer interpretability.
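Bonus (not scored): a minimal Python sketch of one simple bias measurement that IV.2 QA/QC might include: per-group selection rates and their disparity ratio. The groups, predictions, and any tolerance are hypothetical; real projects use the fairness metric and threshold agreed with stakeholders.

```python
# Minimal sketch: per-group positive-prediction (selection) rates and their
# disparity ratio. Groups, predictions, and thresholds are hypothetical.
from collections import defaultdict

predictions = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]           # model outputs (1 = positive)
groups      = ["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"]

totals, positives = defaultdict(int), defaultdict(int)
for pred, group in zip(predictions, groups):
    totals[group] += 1
    positives[group] += pred

rates = {g: positives[g] / totals[g] for g in totals}
disparity = min(rates.values()) / max(rates.values())   # 1.0 = parity

print("selection rates:", rates)
print(f"disparity ratio: {disparity:.2f}")
# A ratio outside the agreed tolerance is documented, escalated, and blocks
# a GO at the gate until remediated.
```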
GAME MODE 2: Scenario Showdown — What Should the PM Do?
Scenario 1: The Training Overrun
- Training planned for 2 days
- Already running 5 days
- Data scientist says "one more day should do it"
Reveal
Pause training. Conduct DTHR (Data/Technique/Hardware/Results) root-cause review. Document decision: continue, change approach, or escalate. 2.5x overrun = project event, not technical hiccup.
Scenario 2: The Black-Box Healthcare Decision
- Data scientist proposes deep learning for high-stakes medical-imaging classification
- Healthcare client requires AI decisions be explainable
Reveal
Document technique selection; ensure the trade-off between performance and explainability is presented to stakeholders for decision; consider interpretable-by-design alternatives. IV.1 + Domain I.2 cross-pull.
Scenario 3: The Operational Mismatch
- IV.6 review: model performance meets success criteria
- Chosen technique requires GPU compute
- Production environment is CPU-only
Reveal
ITERATE — operational fit failure (PBRBARTAO criterion). Loop back to V.1 (infrastructure) or IV.1 (technique change) with stakeholder decision.
Scenario 4: The Bias Discovery During QA
- QA/QC reveals demographic bias in recommendation model
- Data scientist suggests fairness post-processing layer
Reveal
Treat as IV.2 + I.3 issue: document, escalate per accountability, engage stakeholders for remediation, do not authorize IV.6 GO until bias is within tolerance.
Scenario 5: The Parallel Work Request
- Phase IV complete
- Data scientist asks to begin Domain V work in parallel with IV.6 gate
Reveal
Confirm IV.6 must complete before Domain V work begins — sequential, not parallel. The gate authorizes the transition.
Scenario 6: The Plateau Concern
- Training completed; loss curve plateaued early
- Data scientist reports "acceptable" final accuracy
Reveal
Coordinate investigation of the early plateau (data quality, technique fit, hyperparameter tuning). Engage IV.2 QA/QC. Don't blindly accept "acceptable" without root-cause analysis.
Scenario 7: The Iteration Trap
- 4 training runs over 2 weeks
- Each improving slightly but not meeting criteria
- Data scientist suggests 5th iteration
Reveal
Pause and review the iteration trajectory: are improvements converging or plateauing? Is the technique a fit? Is data sufficient? Document decision: continue, change technique, descope, or escalate.
Scenario 8: The Quality Gate with Bias
- IV.5: all quality dimensions met
- Bias measurement reveals demographic disparity
- Data scientist suggests proceeding and addressing at training time
Reveal
ITERATE — bias is a QCBVR criterion. Outside tolerance = no GO. Loop to IV.4 to remediate or III.1 to redefine.
GAME MODE 3: Pattern Match Challenge
| # | Scenario | ECO Task |
|---|---|---|
| 1 | Overseeing model technique selection | IV.1 |
| 2 | Overseeing model QA/QC | IV.2 |
| 3 | Managing model training execution | IV.3 |
| 4 | Managing data transformation | IV.4 |
| 5 | Verifying data quality (gate) | IV.5 |
| 6 | Verifying model ready for ops (gate) | IV.6 |
| 7 | The PBRBARTAO gate | IV.6 |
| 8 | The QCBVR gate | IV.5 |
| 9 | DTHR triage when training overruns | IV.3 |
| 10 | TRIM categories of data prep | IV.4 |
GAME MODE 4: Fill-in-the-Blank Speed Round
- IV.5 evaluates ________, Coverage, Bias, Volume, Reproducibility (QCBVR).
- IV.6 evaluates Performance, Bias, ________, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (PBRBARTAO).
- III.8 = "do we have what we need?" IV.5 = "is the ________ data sufficient to train?"
- AutoML automates the technical decision but doesn't replace ________ documentation (IV.1).
- The PM doesn't pick the technique — the ________ does.
- Training overrun by 2.5x = project event. Apply DTHR review: Data, Technique, ________, Results.
- ~70-80% of project time is typically spent on ________ ________.
- IV.6 GO authorizes ________ ________ work to begin.
- RAG = ________-Augmented Generation.
- Reproducibility means same data + same pipeline = ________ output.
Reveal answers
- Quality
- Robustness
- prepared
- governance
- data scientist
- Hardware
- data preparation
- Domain V
- Retrieval
- same
GAME MODE 5: True or False Lightning Round
| # | Statement | Correct |
|---|---|---|
| 1 | III.8 and IV.5 are the same gate | FALSE — different gates at adjacent boundaries |
| 2 | The PM picks the model technique | FALSE — data scientist picks; PM oversees governance |
| 3 | IV.6 has 8 criteria (PBRBARTAO) | TRUE |
| 4 | Domain V work can begin in parallel with IV.6 gate | FALSE — sequential |
| 5 | AutoML bypasses IV.1 documentation requirement | FALSE — automation ≠ governance |
| 6 | Operational fit is part of IV.6 PBRBARTAO | TRUE |
| 7 | Production validation substitutes for IV.6 evaluation | FALSE — gate is pre-deployment |
| 8 | Reproducibility means inference reproducibility only | FALSE — training reproducibility too |
| 9 | The PM declares IV.6 GO unilaterally | FALSE — multi-stakeholder sign-off |
| 10 | A failed contingency test still satisfies V.7 if documented | FALSE — V.7 requires tested plans |
| 11 | Performance vs baseline is part of IV.6 | TRUE |
| 12 | "Reflecting real-world differences" excuses bias | FALSE — amplification/perpetuation matter |
GAME MODE 6: Mnemonic Speed Recall
| Mnemonic | Expand it |
|---|---|
| TRIM | Transform, Reconcile, Impute, Map (data prep) |
| QCBVR | Quality, Coverage, Bias, Volume, Reproducibility (IV.5) |
| DTHR | Data, Technique, Hardware, Results (training triage) |
| PBRBARTAO | Performance, Bias, Robustness, Baseline, Audit, Reproducibility, Trustworthy AI, Operational fit (IV.6) |
| 3 ML Categories | Supervised, Unsupervised, Reinforcement |
| Algorithm vs Model | Algorithm = procedure. Model = trained artifact. |
| 3 Gates | III.8 (data ↔ needs) · IV.5 (prepared data quality) · IV.6 (model ↔ ops) |
| Overfit vs Underfit | Overfit = memorize. Underfit = doesn't learn. |
Scoring Summary
| Mode | Score | Max |
|---|---|---|
| Flashcards | ___/22 | 22 |
| Scenarios | ___/8 | 8 |
| Pattern Match | ___/10 | 10 |
| Fill-in | ___/10 | 10 |
| True/False | ___/12 | 12 |
| Mnemonic | ___/8 | 8 |
| TOTAL | ___/70 | 70 |