Domain III: Identify Data Needs — Study Game

How to Play

Pick a game mode and test yourself. Cover the answers and try to recall before peeking.


GAME MODE 1: Rapid Fire Flashcards

Defining Required Data (III.1)

Card 1 — Front: What's the output of ECO Task III.1?
Answer: A documented data requirements specification (not a dataset).
Card 2 — Front: What does the DRIP mnemonic stand for?
Answer: Determine pattern, Required attributes, Identify sources, Plan aggregation.
Card 3 — Front: Required data is defined ____ collection, not during it.
Answer: before.
Card 4 — Front: What drives the data type required?
Answer: The AI pattern from Phase I (Recognition→images, Predictive→structured, Conversational→text).

Sources, SMEs, Infrastructure (III.2-III.5)

Card 5 — Front: What's the SCALE mnemonic for source identification?
Answer: Source type, Cost & access, Accuracy & cadence, Legal/license, Endpoint.
Card 6 — Front: Distinguish data steward from data custodian.
Answer: Steward = strategic, policy-enforcing, cross-functional. Custodian = operational, NOT the data owner.
Card 7 — Front: What does ECO III.4 cover?
Answer: Coordinate AI workspace and infrastructure (compute, storage, pipelines, environments, security, access).
Card 8 — Front: Who executes data gathering in III.5?
Answer: Data engineering team. PM coordinates and tracks.

Privacy, Compliance, Bias (III.6 + cross-pulls)

Card 9 — Front: Privacy/compliance checks belong in which phase?
Answer: Phase II (Data Understanding) per ECO III.6 — NOT Phase III.
Card 10 — Front: Difference between anonymization and pseudonymization?
Answer: Anonymization is irreversible (no longer PII). Pseudonymization is reversible (still PII under GDPR).
Card 11 — Front: Three types of bias in AI?
Answer: Neural-network (math), Variance (fitting), Informational (fairness, the one the exam tests). Mnemonic: NVI.
Card 12 — Front: Three types of informational bias?
Answer: Reporting (only some aspects get recorded), Recall (recent data weighted over older data), Classification (data categorized in ways that misrepresent groups).
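The Card 10 distinction is easier to remember in code. A minimal sketch (illustrative only; these function names are mine, not exam content): pseudonymization keeps a key that maps tokens back to the person, so the data stays PII; anonymization retains no key.

```python
import secrets

# Pseudonymization: reversible. A lookup table maps tokens back to the
# original identity, so the data is still PII under GDPR.
pseudonym_map = {}

def pseudonymize(name: str) -> str:
    token = "user-" + secrets.token_hex(4)
    pseudonym_map[token] = name   # retained key = reversibility
    return token

def re_identify(token: str) -> str:
    return pseudonym_map[token]

# Anonymization: irreversible. Identifiers are dropped or generalized
# and no key is kept, so the record is no longer PII.
def anonymize(record: dict) -> dict:
    return {"age_band": f"{(record['age'] // 10) * 10}s"}  # name dropped, age coarsened

token = pseudonymize("Ada Lovelace")
assert re_identify(token) == "Ada Lovelace"            # reversible
print(anonymize({"name": "Ada Lovelace", "age": 36}))  # {'age_band': '30s'}
```

The retained `pseudonym_map` is the whole point: delete it (and any way to rebuild it) and the same tokens become anonymized data.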

The 4 Vs of Big Data

Card 13 — Front: What are the 4 Vs?
Answer: Volume, Velocity, Variety, Veracity. Mnemonic: VVVV.
Card 14 — Front: Volume challenge in big data?
Answer: Massive amounts spread across locations (we're in the zettabyte era).
Card 15 — Front: Velocity challenge?
Answer: Rapidly changing data OR moving data quickly between locations.
Card 16 — Front: Variety challenge?
Answer: Different formats — structured, unstructured, semistructured. One system can't handle all three.
Card 17 — Front: Veracity challenge?
Answer: Different levels of quality, accuracy, trustworthiness. Hard to assess at scale.

Data Quality Dimensions

Card 18 — Front: What does ACCTUVI stand for?
Answer: Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity, Integrity. "A Cat Caught Two Unwary Voles Inside."
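Three of the ACCTUVI dimensions lend themselves to quick automated spot checks. A hedged sketch (the helper names are hypothetical, not from PMI material), scoring Completeness, Uniqueness, and Validity on toy records:

```python
# Toy records with deliberate quality defects.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None,            "age": 29},   # incomplete email
    {"id": 2, "email": "c@example.com", "age": -5},   # duplicate id, invalid age
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """True if the field has no duplicate values."""
    values = [r[field] for r in rows]
    return len(values) == len(set(values))

def validity(rows, field, ok):
    """Share of rows whose field value passes the rule."""
    return sum(ok(r[field]) for r in rows) / len(rows)

print(round(completeness(records, "email"), 2))                      # 0.67
print(uniqueness(records, "id"))                                     # False
print(round(validity(records, "age", lambda a: 0 <= a <= 120), 2))   # 0.67
```

Accuracy, Consistency, Timeliness, and Integrity generally need reference data or cross-system checks, which is why III.7 is oversight of a team evaluation rather than a one-script task.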

Data Types

Card 19 — Front: What % of organizational data is unstructured?
Answer: ~80%.
Card 20 — Front: Three data type categories?
Answer: Structured (defined schema), Unstructured (no schema), Semi-structured (partial schema).
Card 21 — Front: What's training data?
Answer: Prepared, cleaned, labeled data used to train an ML model.
Card 22 — Front: What's ground truth data?
Answer: Definitive reference data the model is measured against.
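Cards 21-22 connect directly: ground truth is the reference the trained model is scored against. A trivial sketch of that relationship (the labels and keys here are made up for illustration):

```python
# Ground truth: the definitive reference labels.
ground_truth = {"img1": "cat", "img2": "dog", "img3": "cat"}

# Model predictions to be measured against the reference.
predictions = {"img1": "cat", "img2": "cat", "img3": "cat"}

# Accuracy = share of predictions matching ground truth.
accuracy = sum(predictions[k] == ground_truth[k] for k in ground_truth) / len(ground_truth)
print(round(accuracy, 2))  # 0.67
```

Without trustworthy ground truth the accuracy number is meaningless, which is why it appears in the III.8 gate's quality criteria.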

The Gate (III.8)

Card 23 — Front: Three areas evaluated at III.8 gate?
Answer: Sources, Description, Quality. Mnemonic: SDQ.
Card 24 — Front: Three outcomes at III.8?
Answer: GO (proceed to Phase III) / ITERATE (loop back) / DESCOPE (reduce scope).
Card 25 — Front: PMI's key gate concept?
Answer: "If you can confidently say 'We have the data and know the problem,' move to Phase III. If not, pause."

Iteration Triggers

Card 26 — Front: How many iteration triggers does PMI document for Phase II → Phase I loops?
Answer: 12 distinct scenarios (business shift, infeasible data, wrong type, etc.).
Card 27 — Front: Is iterating back to a prior phase a project failure?
Answer: No — it's methodology-correct. CPMAI's iterative design specifically allows backing up "without penalty."

Conveying to Leadership (III.9)

Card 28 — Front: What's the III.9 deliverable?
Answer: Leadership briefing covering data state, gate decision, key risks, recommendations, scope/schedule impacts.
Card 29 — Front: Is III.9 optional?
Answer: No — mandatory leadership communication before Phase III work begins.
Card 30 — Front: What's the cross-pull from III.9 → V.5?
Answer: III.9 conveyance feeds Domain V's final report; gate decisions become lessons-learned content.

GAME MODE 2: Scenario Showdown — What Should the PM Do?

Scenario 1: The Pattern Mismatch

Answer: Loop back to Phase I (II.7) to revisit the AI pattern selection with stakeholders. Pattern is a Phase I deliverable; changing it inside Domain III bypasses governance. Engage stakeholders, document the revision, assess scope impact (II.4).

Scenario 2: The Compliance Surprise

Answer: Document the delay as a risk, escalate to leadership with options, engage stakeholders. Cross-pull III.6 + III.8 + I.4. Options: proceed-and-wait, iterate to alternative sources, descope. Don't proceed unilaterally.

Scenario 3: The Volume Shortfall

Answer: Pause and engage stakeholders. PMI iteration trigger #4/5. Options: synthetic data (with QA + governance), additional sourcing, technique change, descope. Document decision before proceeding.

Scenario 4: The Quality Gate

Answer: ITERATE — III.8 includes trustworthy-AI alignment. The missing privacy plan sign-off is a governance gap (I.1 cross-pull). Don't GO with conditional sign-offs that defer governance.

Scenario 5: The Source Disappears

Answer: Loop back to III.3 to identify alternative sources. Iteration trigger. Document inventory revision and assess III.8 gate impact.

Scenario 6: The Bias Discovery

Answer: Cross-domain issue: III.7 + I.3 + II.8. Document, engage stakeholders for mitigation options (re-source, rebalance training data, in-processing fairness, post-processing calibration). Don't approve "fix at training time" without governance.

Scenario 7: The PoC Confusion

Answer: PMI iteration trigger #12 (PoC vs Pilot misalignment). Loop back to Phase I to redefine project scope as PoC with explicit follow-up plan, OR descope to confirm pilot data requirements.

Scenario 8: The Consent Gap

Answer: Treat as a serious compliance and accountability incident. Escalate to legal/compliance/leadership. Anonymization may not cure consent defects. Cross-pull I.1 + I.4 + I.5.


GAME MODE 3: Pattern Match Challenge

Match each scenario to the right ECO Domain III task:

#  | Scenario                                    | Your Answer | Correct ECO Task
1  | Defining required data attributes           | _____       | III.1
2  | Identifying data SMEs                       | _____       | III.2
3  | Identifying data sources                    | _____       | III.3
4  | Coordinating AI workspace                   | _____       | III.4
5  | Gathering required data                     | _____       | III.5
6  | Privacy/compliance/access check             | _____       | III.6
7  | Overseeing data evaluation                  | _____       | III.7
8  | Determining if data meets needs (THE GATE)  | _____       | III.8
9  | Conveying findings to leadership            | _____       | III.9
10 | Defining what the 7 quality dimensions mean | _____       | ACCTUVI (III.7)
11 | Documenting source inventory                | _____       | III.3
12 | Engaging data steward                       | _____       | III.2
Scoring: 11-12 correct = Expert | 8-10 = Solid | 5-7 = Review needed | Below 5 = Re-study Module 1

GAME MODE 4: Fill-in-the-Blank Speed Round

  1. The 4 Vs of Big Data are ________, Velocity, Variety, ________.
  2. The III.8 gate has three outcomes: ________, ITERATE, ________.
  3. A data ________ enforces policy; a data ________ ensures safe storage and is NOT the data owner.
  4. The AI pattern from Phase I drives the ________ in Phase II.
  5. ECO III.6 (privacy/compliance/access) cross-pulls primarily to Domain I.1 and Domain ________.
  6. Anonymization is irreversible; ________ is reversible (still PII under GDPR).
  7. PMI lists ________ iteration triggers from Phase II back to Phase I.
  8. Three types of informational bias: ________, recall bias, classification bias.
  9. The III.8 gate evaluates Sources, ________, Quality.
  10. ECO III.5 produces an updated Data Source Inventory + ________ dataset.
  11. Domain III's PM Oversight Angle includes: PM owns / Deliverable / Iteration trigger / Escalation trigger / Wrong-answer trap / Question pattern signal / ________.
  12. Required data spans training, validation, test, and ________ data.
  13. ~80% of organizational data is ________.
  14. Ground truth data is the definitive ________ data.
  15. The PM's job in III.7 is to ________ the evaluation, not perform it.

Answers
  1. Volume / Veracity
  2. GO / DESCOPE
  3. steward / custodian
  4. data type
  5. I.4
  6. pseudonymization
  7. 12
  8. Reporting bias
  9. Description
  10. staged
  11. ECO task tag
  12. production-inference
  13. unstructured
  14. reference
  15. oversee


GAME MODE 5: True or False Lightning Round

#  | Statement                                                                | Your Answer | Correct
1  | III.1 is complete when the data scientist has a list of required fields  | _____       | FALSE — needs documented spec with multi-stakeholder sign-off
2  | The III.8 gate has three outcomes (GO/ITERATE/DESCOPE)                   | _____       | TRUE
3  | Privacy checks belong in Phase III                                       | _____       | FALSE — Phase II per III.6
4  | Stewards and custodians are the same role                                | _____       | FALSE — strategic vs operational; custodian is NOT the data owner
5  | PMI documents 8 iteration triggers from Phase II to Phase I              | _____       | FALSE — 12 triggers
6  | The 4 Vs are Volume, Velocity, Variety, Validity                         | _____       | FALSE — Veracity, not Validity
7  | Anonymization is reversible; pseudonymization is irreversible            | _____       | FALSE — it's the reverse
8  | The PM coordinates infrastructure but doesn't build it                   | _____       | TRUE
9  | Iterating back to Phase I is project failure                             | _____       | FALSE — methodology-correct
10 | Required data is defined before collection, not during                   | _____       | TRUE
11 | III.9 is an optional formality                                           | _____       | FALSE — mandatory leadership communication
12 | The data scientist owns the III.7 evaluation                             | _____       | FALSE — team executes; PM oversees
13 | Ground truth data is needed for objective evaluation                     | _____       | TRUE
14 | Synthetic data substitution requires governance review                   | _____       | TRUE
15 | The III.8 gate is point-in-time and irreversible                         | _____       | FALSE — CPMAI is iterative; new info can warrant revisit
Scoring: 14-15 = Exam ready | 11-13 = Almost there | Below 11 = Review the guide

GAME MODE 6: Mnemonic Speed Recall

Mnemonic                    | Expand it
DRIP                        | Determine pattern, Required attributes, Identify sources, Plan aggregation
VVVV                        | Volume, Velocity, Variety, Veracity (4 Vs of Big Data)
ACCTUVI                     | Accuracy, Completeness, Consistency, Timeliness, Uniqueness, Validity, Integrity (Data Quality Dimensions)
SCALE                       | Source type, Cost & access, Accuracy & cadence, Legal/license, Endpoint (Source Identification)
NVI                         | Neural-net bias, Variance bias, Informational bias (3 Bias Types)
SDQ                         | Sources, Description, Quality (III.8 Gate Areas)
GO/ITERATE/DESCOPE          | Three outcomes for III.8 (and IV.5, IV.6)
Steward vs Custodian        | Stewards Set policy. Custodians Carry it (operational, NOT data owner)
Data Life Cycle (10 stages) | Generation, Collection, Storage, Access, Usage, Transfer, Security, Deletion, Archival, Privacy

Scoring Summary

Game Mode             | Your Score | Max
Flashcards            | ___/30     | 30
Scenario Showdown     | ___/8      | 8
Pattern Match         | ___/12     | 12
Fill-in-the-Blank     | ___/15     | 15
True/False            | ___/15     | 15
Mnemonic Speed Recall | ___/9      | 9
TOTAL                 | ___/89     | 89
Rating: