Domain III: Identify Data Needs — Study Game
How to Play
Pick a game mode and test yourself. Cover the answers and try to recall before peeking.
GAME MODE 1: Rapid Fire Flashcards
Defining Required Data (III.1)
Card 1 - Front: What's the output of ECO Task III.1?
Answer: A documented data requirements specification (not a dataset).
Card 2 - Front: What does the DRIP mnemonic stand for?
Answer: Determine pattern, Required attributes, Identify sources, Plan aggregation.
Card 3 - Front: Required data is defined ____ collection, not during it.
Answer: before.
Card 4 - Front: What drives the data type required?
Answer: The AI pattern from Phase I (Recognition → images, Predictive → structured, Conversational → text).
Sources, SMEs, Infrastructure (III.2-III.5)
Card 5 - Front: What's the SCALE mnemonic for source identification?
Answer: Source type, Cost & access, Accuracy & cadence, Legal/license, Endpoint.
Card 6 - Front: Distinguish data steward from data custodian.
Answer: Steward = strategic, policy-enforcing, cross-functional. Custodian = operational, NOT the data owner.
Card 7 - Front: What does ECO III.4 cover?
Answer: Coordinate AI workspace and infrastructure (compute, storage, pipelines, environments, security, access).
Card 8 - Front: Who executes data gathering in III.5?
Answer: Data engineering team. PM coordinates and tracks.
Privacy, Compliance, Bias (III.6 + cross-pulls)
Card 9 - Front: Privacy/compliance checks belong in which phase?
Answer: Phase II (Data Understanding) per ECO III.6, NOT Phase III.
Card 10 - Front: Difference between anonymization and pseudonymization?
Answer: Anonymization is irreversible (no longer PII). Pseudonymization is reversible (still PII under GDPR).
Card 11 - Front: Three types of bias in AI?
Answer: Neural-network (math), Variance (fitting), Informational (fairness; the type the exam focuses on). Mnemonic: NVI.
Card 12 - Front: Three types of informational bias?
Answer: Reporting (only some aspects are recorded), Recall (recent and older data are weighted differently), Classification (data is categorized in a way that misrepresents groups).
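The reversible/irreversible distinction on Card 10 can be made concrete with a minimal Python sketch. Everything here is illustrative: the `patients` records, the field names, and both helper functions are hypothetical. Note that dropping a direct identifier is only one step toward anonymization; remaining fields can still allow re-identification, so the second function is named for what it does rather than for the legal outcome.

```python
import secrets


def pseudonymize(records, id_field):
    """Swap the identifier for a random token and return the lookup table.

    Whoever holds the lookup table can reverse the swap, which is why
    pseudonymized data is still treated as personal data.
    """
    id_to_token = {}
    token_to_id = {}
    out = []
    for rec in records:
        original = rec[id_field]
        if original not in id_to_token:
            token = secrets.token_hex(8)  # random, not derived from the ID
            id_to_token[original] = token
            token_to_id[token] = original
        out.append({**rec, id_field: id_to_token[original]})
    return out, token_to_id


def drop_identifier(records, id_field):
    """Remove the identifier outright; no lookup table is kept."""
    return [{k: v for k, v in rec.items() if k != id_field} for rec in records]


# Hypothetical sample records (assumption, not from the study guide).
patients = [
    {"patient_id": "P-001", "age_band": "40-49", "visits": 3},
    {"patient_id": "P-002", "age_band": "30-39", "visits": 1},
    {"patient_id": "P-001", "age_band": "40-49", "visits": 2},
]

pseudo, lookup = pseudonymize(patients, "patient_id")
stripped = drop_identifier(patients, "patient_id")

# With the lookup table, the original IDs come back.
restored = [lookup[rec["patient_id"]] for rec in pseudo]
```

With `lookup` in hand, `restored` recovers the original IDs; `stripped` has no equivalent path back, which is the property the flashcard is testing.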
The 4 Vs of Big Data
Card 13 - Front: What are the 4 Vs?
Answer: Volume, Velocity, Variety, Veracity. Mnemonic: VVVV.
Card 14 - Front: Volume challenge in big data?
Answer: Massive amounts of data spread across locations (we are in the zettabyte era).
Card 15 - Front: Velocity challenge?
Answer: Rapidly changing data OR moving data quickly between locations.
Card 16 - Front: Variety challenge?
Answer: Different formats: structured, unstructured, semi-structured. One system can't handle all three.
Card 17 - Front: Veracity challenge?
Answer: Different levels of quality, accuracy, trustworthiness. Hard to assess at scale.
Data Quality Dimensions
Card 18 - Front: What does ACCTUVI stand for?
Answer: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity, Integrity. "A Cat Caught Two Unwary Voles Inside."
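To make the dimensions less abstract, here is a minimal Python sketch that scores three of the seven (Completeness, Uniqueness, Validity) on a tiny in-memory table. The sample rows, field names, the 0 to 120 age rule, and the scoring formulas are all assumptions for illustration; a real project would take its definitions and thresholds from the III.1 requirements specification.

```python
def completeness(records, required_fields):
    """Share of records with every required field present and non-empty."""
    if not records:
        return 0.0
    ok = sum(
        all(rec.get(f) not in (None, "") for f in required_fields)
        for rec in records
    )
    return ok / len(records)


def uniqueness(records, key_field):
    """Distinct key values divided by record count (1.0 means no duplicates)."""
    if not records:
        return 0.0
    keys = [rec.get(key_field) for rec in records]
    return len(set(keys)) / len(keys)


def validity(records, field, is_valid):
    """Share of records whose field passes the supplied rule."""
    if not records:
        return 0.0
    return sum(1 for rec in records if is_valid(rec.get(field))) / len(records)


# Hypothetical sample rows: one empty email, one duplicate id, one bad age.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 51},
    {"id": 2, "email": "c@example.com", "age": -4},
    {"id": 4, "email": "d@example.com", "age": 29},
]

scores = {
    "completeness": completeness(rows, ["id", "email", "age"]),
    "uniqueness": uniqueness(rows, "id"),
    "validity": validity(
        rows, "age", lambda a: isinstance(a, int) and 0 <= a <= 120
    ),
}
```

Each planted defect affects exactly one dimension, so all three scores come out at 0.75. The remaining dimensions (Accuracy, Consistency, Timeliness, Integrity) usually need reference data or cross-table checks and are left out of this sketch.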
Data Types
Card 19 - Front: What % of organizational data is unstructured?
Answer: ~80%.
Card 20 - Front: Three data type categories?
Answer: Structured (defined schema), Unstructured (no schema), Semi-structured (partial schema).
Card 21 - Front: What's training data?
Answer: Prepared, cleaned, labeled data used to train an ML model.
Card 22 - Front: What's ground truth data?
Answer: Definitive reference data the model is measured against.
The Gate (III.8)
Card 23 - Front: Three areas evaluated at III.8 gate?
Answer: Sources, Description, Quality. Mnemonic: SDQ.
Card 24 - Front: Three outcomes at III.8?
Answer: GO (proceed to Phase III) / ITERATE (loop back) / DESCOPE (reduce scope).
Card 25 - Front: PMI's key gate concept?
Answer: "If you can confidently say 'We have the data and know the problem,' move to Phase III. If not, pause."
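The gate cards can be read as a small checklist: three areas (Sources, Description, Quality) plus any open governance sign-offs, with GO only when nothing is outstanding. The Python sketch below is one way to picture that, not PMI's procedure. In particular, the `gaps_closable` flag is an assumption standing in for the stakeholder judgment that separates ITERATE from DESCOPE; this guide does not define that rule.

```python
from dataclasses import dataclass, field


@dataclass
class GateReview:
    """Hypothetical record of a III.8 gate review (names are assumptions)."""
    sources_ok: bool
    description_ok: bool
    quality_ok: bool
    open_signoffs: list = field(default_factory=list)


def gate_gaps(review):
    """List every outstanding item that blocks a GO."""
    gaps = []
    if not review.sources_ok:
        gaps.append("sources")
    if not review.description_ok:
        gaps.append("description")
    if not review.quality_ok:
        gaps.append("quality")
    gaps.extend(f"sign-off: {item}" for item in review.open_signoffs)
    return gaps


def gate_decision(review, gaps_closable=True):
    """Return (outcome, gaps).

    GO needs zero gaps. Otherwise ITERATE if stakeholders judge the gaps
    closable, else DESCOPE. That split is an illustrative assumption.
    """
    gaps = gate_gaps(review)
    if not gaps:
        return "GO", gaps
    return ("ITERATE" if gaps_closable else "DESCOPE"), gaps


# Quality passes, but legal has not signed off on the privacy plan.
review = GateReview(True, True, True, open_signoffs=["legal privacy plan"])
decision, gaps = gate_decision(review)
```

Here `decision` is "ITERATE" even though all three SDQ areas pass, because an open sign-off still counts as a gap.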
Iteration Triggers
Card 26 - Front: How many iteration triggers does PMI document for Phase II → Phase I loops?
Answer: 12 distinct scenarios (business shift, infeasible data, wrong type, etc.).
Card 27 - Front: Is iterating back to a prior phase a project failure?
Answer: No. It's methodology-correct. CPMAI's iterative design specifically allows backing up "without penalty."
Conveying to Leadership (III.9)
Card 28 - Front: What's the III.9 deliverable?
Answer: Leadership briefing covering data state, gate decision, key risks, recommendations, scope/schedule impacts.
Card 29 - Front: Is III.9 optional?
Answer: No. It is a mandatory leadership communication before Phase III work begins.
Card 30 - Front: What's the cross-pull from III.9 → V.5?
Answer: III.9 conveyance feeds Domain V's final report; gate decisions become lessons-learned content.
GAME MODE 2: Scenario Showdown — What Should the PM Do?
Scenario 1: The Pattern Mismatch
- Team is defining required data for a project where Phase I selected the Recognition AI pattern
- Available data is structured tabular, not labeled images
- Data scientist suggests changing the pattern to Predictive Analytics
- Team is in Phase II
Reveal
Loop back to Phase I (II.7) to revisit the AI pattern selection with stakeholders. The pattern is a Phase I deliverable; changing it inside Domain III bypasses governance. Engage stakeholders, document the revision, and assess scope impact (II.4).
Scenario 2: The Compliance Surprise
- A Business Associate Agreement (BAA) is required under HIPAA before partner data sharing
- Legal estimates 6 weeks for execution
- Project schedule wants to start data gathering immediately
- Stakeholders not yet engaged
Reveal
Document the delay as a risk, escalate to leadership with options, and engage stakeholders. Cross-pull III.6 + III.8 + I.4. Options: proceed-and-wait, iterate to alternative sources, descope. Don't proceed unilaterally.
Scenario 3: The Volume Shortfall
- Required volume is 1M records; team gathered 200K
- Data scientist suggests synthetic data augmentation
- Mid-Phase II
- The original technique requires high data volume
Reveal
Pause and engage stakeholders. PMI iteration trigger #4/5. Options: synthetic data (with QA + governance), additional sourcing, technique change, descope. Document the decision before proceeding.
Scenario 4: The Quality Gate
- All 7 ACCTUVI dimensions met
- Privacy plan not signed off by legal
- Team wants GO at III.8
- Stakeholder pressure to keep schedule
Reveal
ITERATE. III.8 includes trustworthy-AI alignment. The missing privacy plan sign-off is a governance gap (I.1 cross-pull). Don't GO with conditional sign-offs that defer governance.
Scenario 5: The Source Disappears
- Mid-gathering, an internal source is decommissioned
- Data scientist suggests workaround
- 25% of required data was on this source
Reveal
Loop back to III.3 to identify alternative sources. This is an iteration trigger. Document the inventory revision and assess the III.8 gate impact.
Scenario 6: The Bias Discovery
- Required dataset shows demographic skew during evaluation
- Success criteria from II.8 require fairness across user segments
- Data scientist suggests proceeding and adjusting at training time
Reveal
Cross-domain issue: III.7 + I.3 + II.8. Document it and engage stakeholders on mitigation options (re-source, rebalance training data, in-processing fairness, post-processing calibration). Don't approve "fix at training time" without governance.
Scenario 7: The PoC Confusion
- Mid-Phase II, the team realizes they're doing a Proof-of-Concept (PoC), not a Pilot
- Data is sufficient for technical demonstration but not production value proof
- Data scientist proposes proceeding as if it's a pilot
Reveal
PMI iteration trigger #12 (PoC vs Pilot misalignment). Loop back to Phase I to redefine project scope as PoC with explicit follow-up plan, OR descope to confirm pilot data requirements.
Scenario 8: The Consent Gap
- During privacy check, team discovers training data was collected without consent for AI/ML use
- Data scientist suggests anonymizing and proceeding
Reveal
Treat this as a serious compliance and accountability incident. Escalate to legal/compliance/leadership. Anonymization may not cure consent defects. Cross-pull I.1 + I.4 + I.5.
GAME MODE 3: Pattern Match Challenge
Match each scenario to the right ECO Domain III task:
| # | Scenario | Your Answer | Correct ECO Task |
|---|---|---|---|
| 1 | Defining required data attributes | _____ | III.1 |
| 2 | Identifying data SMEs | _____ | III.2 |
| 3 | Identifying data sources | _____ | III.3 |
| 4 | Coordinating AI workspace | _____ | III.4 |
| 5 | Gathering required data | _____ | III.5 |
| 6 | Privacy/compliance/access check | _____ | III.6 |
| 7 | Overseeing data evaluation | _____ | III.7 |
| 8 | Determining if data meets needs (THE GATE) | _____ | III.8 |
| 9 | Conveying findings to leadership | _____ | III.9 |
| 10 | Defining what 7 quality dimensions mean | _____ | ACCTUVI (III.7) |
| 11 | Documenting source inventory | _____ | III.3 |
| 12 | Engaging data steward | _____ | III.2 |
GAME MODE 4: Fill-in-the-Blank Speed Round
- The 4 Vs of Big Data are ________, Velocity, Variety, ________.
- The III.8 gate has three outcomes: ________, ITERATE, ________.
- A data ________ enforces policy; a data ________ ensures safe storage and is NOT the data owner.
- The AI pattern from Phase I drives the ________ in Phase II.
- ECO III.6 (privacy/compliance/access) cross-pulls primarily to Domain I.1 and Domain ________.
- Anonymization is irreversible; ________ is reversible (still PII under GDPR).
- PMI lists ________ iteration triggers from Phase II back to Phase I.
- Three types of informational bias: ________, recall bias, classification bias.
- The III.8 gate evaluates Sources, ________, Quality.
- ECO III.5 produces an updated Data Source Inventory + ________ dataset.
- Domain III's PM Oversight Angle includes: PM owns / Deliverable / Iteration trigger / Escalation trigger / Wrong-answer trap / Question pattern signal / ________.
- Required data spans training, validation, test, and ________ data.
- ~80% of organizational data is ________.
- Ground truth data is the definitive ________ data.
- The PM's job in III.7 is to ________ the evaluation, not perform it.
Reveal answers
- Volume / Veracity
- GO / DESCOPE
- steward / custodian
- data type
- I.4
- pseudonymization
- 12
- Reporting bias
- Description
- staged
- ECO task tag
- production-inference
- unstructured
- reference
- oversee
GAME MODE 5: True or False Lightning Round
| # | Statement | Your Answer | Correct |
|---|---|---|---|
| 1 | III.1 is complete when the data scientist has a list of required fields | _____ | FALSE: needs documented spec with multi-stakeholder sign-off |
| 2 | The III.8 gate has three outcomes (GO/ITERATE/DESCOPE) | _____ | TRUE |
| 3 | Privacy checks belong in Phase III | _____ | FALSE: Phase II per III.6 |
| 4 | Stewards and custodians are the same role | _____ | FALSE: strategic vs operational; custodian is NOT the data owner |
| 5 | PMI documents 8 iteration triggers from Phase II to Phase I | _____ | FALSE: 12 triggers |
| 6 | The 4 Vs are Volume, Velocity, Variety, Validity | _____ | FALSE: Veracity, not Validity |
| 7 | Anonymization is reversible; pseudonymization is irreversible | _____ | FALSE: the reverse is true |
| 8 | The PM coordinates infrastructure but doesn't build it | _____ | TRUE |
| 9 | Iterating back to Phase I is project failure | _____ | FALSE: methodology-correct |
| 10 | Required data is defined before collection, not during | _____ | TRUE |
| 11 | III.9 is an optional formality | _____ | FALSE: mandatory leadership communication |
| 12 | The data scientist owns the III.7 evaluation | _____ | FALSE: team executes; PM oversees |
| 13 | Ground truth data is needed for objective evaluation | _____ | TRUE |
| 14 | Synthetic data substitution requires governance review | _____ | TRUE |
| 15 | The III.8 gate is point-in-time and irreversible | _____ | FALSE: CPMAI is iterative; new info can warrant revisit |
GAME MODE 6: Mnemonic Speed Recall
| Mnemonic | Expand it |
|---|---|
| DRIP | Determine pattern, Required attributes, Identify sources, Plan aggregation |
| VVVV | Volume, Velocity, Variety, Veracity (4 Vs of Big Data) |
| ACCTUVI | Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity, Integrity (Data Quality Dimensions) |
| SCALE | Source type, Cost & access, Accuracy & cadence, Legal/license, Endpoint (Source Identification) |
| NVI | Neural-net bias, Variance bias, Informational bias (3 Bias Types) |
| SDQ | Sources, Description, Quality (III.8 Gate Areas) |
| GO/ITERATE/DESCOPE | Three outcomes for III.8 (and IV.5, IV.6) |
| Steward vs Custodian | Stewards Set policy. Custodians Carry it (operational, NOT data owner) |
| Data Life Cycle (10 stages) | Generation, Collection, Storage, Access, Usage, Transfer, Security, Deletion, Archival, Privacy |
Scoring Summary
| Game Mode | Your Score | Max |
|---|---|---|
| Flashcards | ___/30 | 30 |
| Scenario Showdown | ___/8 | 8 |
| Pattern Match | ___/12 | 12 |
| Fill-in-the-Blank | ___/15 | 15 |
| True/False | ___/15 | 15 |
| Mnemonic Speed Recall | ___/9 | 9 |
| TOTAL | ___/89 | 89 |
- 80-89: Domain III mastered — move to next domain
- 65-79: Strong foundation. Review weak modes once more.
- 50-64: Re-read the study guide, then try again.
- Below 50: Revisit Module 02 source PDF.
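If you track scores across several attempts, the tally and banding above are simple enough to script. This is a small Python sketch: the per-mode maximums and band cut-offs are copied from the table and list above, while the function name, the shortened band labels, and the sample scores are made up for illustration.

```python
# Per-mode maximums from the Scoring Summary table.
MAX_SCORES = {
    "Flashcards": 30,
    "Scenario Showdown": 8,
    "Pattern Match": 12,
    "Fill-in-the-Blank": 15,
    "True/False": 15,
    "Mnemonic Speed Recall": 9,
}


def total_and_band(scores):
    """Validate per-mode scores, then return (total, band label)."""
    for mode, score in scores.items():
        if mode not in MAX_SCORES:
            raise KeyError(f"unknown game mode: {mode}")
        if not 0 <= score <= MAX_SCORES[mode]:
            raise ValueError(f"{mode} score must be 0..{MAX_SCORES[mode]}")
    total = sum(scores.values())
    if total >= 80:
        band = "Domain III mastered"
    elif total >= 65:
        band = "Strong foundation"
    elif total >= 50:
        band = "Re-read the study guide"
    else:
        band = "Revisit Module 02 source PDF"
    return total, band


# Hypothetical attempt.
attempt = {
    "Flashcards": 26,
    "Scenario Showdown": 6,
    "Pattern Match": 11,
    "Fill-in-the-Blank": 12,
    "True/False": 13,
    "Mnemonic Speed Recall": 7,
}
result = total_and_band(attempt)
```

For this sample attempt the total is 75, which lands in the 65-79 band.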