PMI-CPMAI Glossary

291 entries from the PMI-CPMAI glossary.

DIKUW pyramid

A visual model illustrating the increasing value derived as data is transformed into information, knowledge, understanding, and ultimately, wisdom.
accuracy (machine learning definition)

A measure of model performance that is defined as the ratio of correctly predicted data points to the total number of data points evaluated.
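
The ratio above can be sketched in a few lines of Python. The function name `accuracy` and the sample labels are illustrative assumptions, not part of the glossary.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty data set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels: 4 of the 5 predictions match the true labels.
example_accuracy = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
```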
ACID

A set of properties used to guarantee data integrity by ensuring that data remains in a consistent state before and after a transaction, even when errors or failures occur. It includes:
• Atomicity: All changes occur as a single operation (all or nothing).
• Consistency: Data remains uniform and unambiguous.
• Isolation: Individual operations are shielded from one another’s effects.
• Durability: Once a transaction is completed, its results persist permanently.
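
Atomicity in particular can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch: the `accounts` table, the `transfer` helper, and the balances are hypothetical, chosen only to show an all-or-nothing transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; undo every change if any step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, so both are undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to `bob` runs first and is then rolled back when the debit fails, so no partial transfer is ever visible.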
action (reinforcement learning)

A discrete operation or step performed by an agent in response to its current state within an environment.
action space (reinforcement learning)

The complete set of all possible actions available to an agent in a given environment.
activation function

A function in neural networks that transforms the weighted sum of inputs (plus bias) into an output signal. It determines the threshold at which a neuron “activates” (e.g., ReLU, sigmoid) and introduces the nonlinearity necessary for modeling complex relationships.
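
A minimal sketch of the two activation functions named above, applied to a single neuron. The helper `neuron_output` and the sample weights are assumptions made for illustration.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Weighted sum is 0.5*1.0 + (-1.0)*2.0 + 0.5 = -1.0, which ReLU clips to 0.
relu_out = neuron_output([1.0, 2.0], [0.5, -1.0], 0.5, relu)
# Weighted sum is 0.0, the midpoint of the sigmoid curve.
sigmoid_out = neuron_output([0.0, 0.0], [0.5, -1.0], 0.0, sigmoid)
```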
adversarial attack

A malicious attempt to fool a machine learning model by introducing intentionally modified inputs—often images—that cause the model to make incorrect predictions.
agent (reinforcement learning)

An entity that interacts with an environment by taking actions to maximize cumulative rewards.
agentic AI

AI systems that exhibit autonomous, goal-oriented behavior utilizing AI agents, often with a degree of self-direction and decision-making capability.
agile

A development approach that breaks projects into short, iterative sprints; emphasizes continuous feedback and adaptive planning; and contrasts with the traditional predictive approach.
AI agents

Software systems that use artificial intelligence to perform tasks. They can perceive their environment, make decisions, and take actions to achieve specific goals. AI agents can range from simple rule-based systems to complex models using machine learning and deep learning.
AI winter

A period of declining funding, interest, and investment in artificial intelligence research and applications, typically following cycles of overhyped expectations and subsequent disappointment.
algorithm

A precise, step-by-step set of instructions for processing data and solving problems. In machine learning, an algorithm learns patterns from data to produce a predictive model.
algorithmic discrimination

Bias in outcomes caused by biased training data; when the data used to train an algorithm is skewed, it can result in unfair or prejudiced decisions.
AlphaGo

An AI system developed by DeepMind that uses advanced reinforcement learning and deep neural networks to play the game “Go”—famously defeating the world’s top human player in 2016.
AlphaZero

An AI system from DeepMind that, using self-play and reinforcement learning, rapidly achieved superhuman performance in games such as chess and “Go.”
analytical models

Models that employ algorithmic techniques to discover patterns and extract insights from data.
analytics

The use of statistical and computational methods to extract meaningful insights from data (related: predictive, descriptive, hot path, cold path, and prescriptive/projective analytics).
anonymization

Techniques for removing or modifying personally identifiable information (PII) from data sets to protect privacy while retaining data usefulness.
Apache Spark

An open-source, distributed computing framework designed for large-scale data processing and analytics, initially released in 2014.
artificial general intelligence (AGI)

The capability of a machine to perform any cognitive task at human or superhuman levels; also known as strong AI.
artificial intelligence (AI)

The simulation of human cognitive functions—such as learning, reasoning, and problem-solving—by machines. One view describes it as machine behavior that exhibits human-like intelligence.
artificial neural networks (ANN)

Computational models inspired by the human brain, consisting of interconnected neurons with learnable weights and biases that are adjusted during training.
attended bots

Software automation tools that work alongside humans (typically in front-office roles) to assist with tasks and boost productivity.
augmented intelligence

A collaborative approach where human expertise is enhanced by machine assistance, enabling humans to perform tasks that were previously too difficult or time-consuming.
autoencoder/encoder-decoder

A pair of neural networks where one network encodes input data into a compact representation and the other decodes it back to its original form. This technique is used in unsupervised learning for tasks such as denoising, anomaly detection, and dimensionality reduction.
automated machine learning (AutoML)

Platforms and tools that automate aspects of the machine learning workflow—such as model selection, hyperparameter tuning, and data preprocessing—making ML more accessible to both experts and nonexperts.
automatic speech recognition (ASR)

Technology that converts spoken language (sound waves) into text, forming a key component of applications like voice assistants and transcription services. It is often used interchangeably with speech recognition or speech-to-text (STT).
automation

The use of technology to perform repetitive tasks automatically, thereby increasing speed and accuracy without requiring human intervention.
autonomous retail

A retail model that employs technologies, such as automated checkout systems, to enable a fully self-service shopping experience where payment is processed automatically (often via a companion app).
autonomous systems

Systems—either physical or virtual—that perform tasks and make decisions with minimal human intervention.
autonomous vehicle

A vehicle equipped with technology that allows it to operate and navigate without human input. These are typically classified into levels ranging from 0 (no automation) to 5 (full automation).
backpropagation

A neural network training algorithm in which errors are propagated backward from the output layer to update weights and biases, thereby reducing prediction errors.
BASE

A set of properties used in distributed systems as an alternative to ACID, trading strict consistency for higher availability and fault tolerance. It includes:
• Basically available: Some data is available even if not fully current.
• Soft-state: The system state may be temporarily inconsistent.
• Eventually consistent: The system works to ensure data becomes consistent over time.
batch (training)

A method of processing training data in groups (batches) so that gradients are computed and aggregated across these groups, increasing training efficiency.
batch prediction

The generation of predictions for large data sets in a non-real-time (offline) mode, used when immediate responses are not required.
Bayes’ theorem

A statistical formula that calculates the probability of an event based on prior knowledge of conditions related to the event.
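
As a worked sketch, the formula P(A|B) = P(B|A) × P(A) / P(B) can be applied to a hypothetical screening test. The function name `posterior` and all rates below are illustrative assumptions.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' theorem for a binary condition A and evidence B.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    # Total probability of observing the evidence B.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 90% sensitivity, 5% false positives.
p_condition_given_positive = posterior(0.01, 0.90, 0.05)
```

With these assumed rates, a positive result raises the probability of the condition from 1% to roughly 15%, because false positives from the large unaffected group outnumber true positives.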
Bayesian classifier

A classification algorithm that uses Bayes’ theorem and probability distributions to assign class labels based on prior information.
bias (model fitting)

The degree to which a model’s predictions systematically deviate from the true target values, often indicating an overly simplistic model.
bias (neural network parameter)

A learnable constant added to the weighted sum of inputs in a neuron, which helps adjust the neuron’s output.
bias/variance trade-off

The balance between reducing bias (error from incorrect assumptions) and reducing variance (error from sensitivity to fluctuations) to optimize model performance.
big data

Extremely large, complex, and variable data sets that require advanced methods for storage, processing, and analysis.
big data management

The practice of organizing, storing, and processing large and complex sets of data to make it accessible and usable for various business purposes.
binary (or binomial) classification

A classification task where data is categorized into one of two classes (e.g., spam vs. not spam).
black box

A system whose internal workings are not transparent, making it difficult to understand how inputs are transformed into outputs.
Boltzmann machine

A type of fully connected neural network where every neuron is connected to every other neuron and probabilistic methods are used to model data distributions (restricted Boltzmann machines are a faster variant).
boosted trees

An ensemble learning method that sequentially combines multiple decision trees to improve overall predictive accuracy.
bounding box

A rectangular (or 3D) box drawn around an object in an image or video to indicate the area of interest for object detection.
brute-force search

An exhaustive search method that generates and tests every possible solution until a correct one is found; often used as a baseline approach despite its inefficiency.
chatbot

A software application that converses with humans using natural language via text or voice.
ChatGPT

A conversational large language model (LLM) developed by OpenAI that generates human-like text responses.
citizen data scientist/citizen developer

An individual without formal data science training who uses no-code or low-code tools to build data models or applications.
classification

The process of assigning data inputs to predefined categories or classes.
classifier

An algorithm or model that predicts the category or class to which a given data input belongs.
cloud machine learning (cloud ML)

The use of cloud-based platforms and services hosted by third-party providers for developing, training, and deploying machine learning models.
cluster analysis

A technique for identifying natural groupings within a data set by clustering similar data points together.
clustering

An unsupervised process that partitions data into groups (clusters) based on similarity without preassigned labels.
Cognitive Project Management for AI (CPMAI)

A vendor-neutral project management methodology for AI, machine learning, and advanced analytics projects, typically following iterative phases such as business understanding, data understanding, data preparation, modeling, evaluation, and operationalization.
cognitive technology

Technologies that simulate human thought processes, often used as an alternative term to AI when applied to specific tasks.
cold path analytics

An approach that emphasizes batch processing of data for analysis and reporting, prioritizing accuracy over real-time speed.
collaborative robot (cobot)

A robot designed to work safely alongside humans, assisting with tasks rather than replacing human labor.
computer vision

A field of AI that enables computers to interpret, understand, and extract information from visual inputs such as images and videos.
confusion matrix

A table that displays the performance of a classification model by showing counts of true positives, true negatives, false positives, and false negatives, which are used to calculate metrics like precision, recall, and F1 score.
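
The four counts and the metrics derived from them can be sketched as follows for a binary classifier. The function names and the sample labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Derive precision, recall, and F1 from the four confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only.
counts = confusion_counts([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
precision, recall, f1 = precision_recall_f1(counts)
```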
content summarization

The process of using AI/ML techniques to generate a concise overview of a larger body of text or multimedia content.
context window

The amount of surrounding text or data that is considered when processing or analyzing a specific piece of information.
continuous improvement and respect for people

An organizational principle that emphasizes ongoing process enhancement while valuing every team member’s contributions.
convergence

The process by which a neural network’s parameters stabilize as the training error approaches a minimum.
conversational systems/pattern

Frameworks that enable interactions between humans and machines via voice, text, or images, typically leveraging natural language processing (NLP).
convolutional neural network (CNN)

A deep learning architecture that uses convolutional layers to automatically and adaptively learn spatial hierarchies of features from grid-like data such as images.
cost function

A function that aggregates the errors (losses) made by a model during training, serving as a measure of overall prediction error.
Cross-Industry Standard Process for Data Mining (CRISP-DM)

A structured methodology for data mining projects that includes phases such as business understanding, data preparation, modeling, evaluation, and deployment.
cross-validation

A statistical method for evaluating how well a model generalizes by partitioning data into training and validation subsets (e.g., k-fold cross-validation).
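
A minimal sketch of the k-fold partitioning step, assuming the samples are addressed by index and are not shuffled. The generator name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
```

Each sample lands in exactly one validation fold, so every data point is used for validation once and for training in the remaining folds. In practice the indices are usually shuffled first.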
cybernetics

The interdisciplinary study of control and communication in living beings and machines, focusing on feedback systems and self-regulation.
DALL-E

A transformer-based AI model developed by OpenAI that generates images from textual descriptions.
data

The basic unit of discrete values—facts, quantities, or observations—that has no intrinsic meaning until processed and analyzed.
data analyst

A professional who collects, cleans, analyzes, visualizes, and interprets data to support decision-making.
data anonymization

Techniques for removing or modifying personally identifiable information (PII) from data sets to protect privacy.
data augmentation

Methods used to increase the quantity or diversity of data by applying transformations (e.g., rotation, scaling) or combining data sources.
data cleaning

The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in data prior to analysis.
data collection

The systematic gathering of information from various sources for analysis or model training.
data consistency

Ensuring that data remains uniform and accurate across systems and over time.
data custodian

An individual or team responsible for the safe storage, transfer, and administrative management of data.
data debt

The accumulation of inefficiencies and issues in data systems over time, often due to legacy practices, that can hinder future data quality and governance.
data drift

The gradual change in data characteristics over time that can lead to degraded model performance if not addressed.
data ecosystem

The complete infrastructure—including hardware, software, processes, and personnel—that an organization uses to collect, store, process, and analyze its data.
data engineer

A professional who develops and manages data pipelines and architectures to ensure data is accessible and reliable.
data engineering

The discipline of designing, building, and maintaining systems for collecting, storing, and processing large volumes of data.
data feed

A method for delivering data from one or more sources to a system for processing or analysis.
data governance

The processes, procedures, standards, roles, and tools an organization implements to ensure its data is properly stored, managed, accurate, available, secured, and controlled over the data’s life cycle.
data ingestion

The process of importing data from various sources into a storage or processing system.
data integration

The process of combining data from different sources into a unified view for analysis.
data integrity

The assurance that data remains accurate, consistent, and reliable throughout its life cycle.
data labeling

The process of adding descriptive tags or metadata to data—especially training data— to enable supervised learning.
data lake

A centralized repository that stores large volumes of raw data in its native format until it is needed for analysis.
data management

The practice of organizing, storing, and maintaining data so that it can be efficiently accessed and analyzed.
data mining

The process of discovering patterns and extracting insights from large data sets using statistical and computational techniques.
data multiplication

Techniques used to increase the effective quantity of training data by transforming or augmenting existing data.
data normalization

Methods for standardizing data values and formats to reduce redundancy and improve data integrity.
data operations (DataOps)

A set of practices and technologies for managing the data life cycle in an agile, automated manner to improve quality and speed.
data pipeline

A series of interconnected steps that transport data from source systems to target destinations, often involving extraction, transformation, and loading (ETL).
data preparation

All steps—such as cleaning, transforming, and formatting—performed on raw data to make it suitable for analysis or machine learning.
data privacy

Practices and technologies aimed at protecting personal and sensitive information from unauthorized access or misuse.
data quality management

The ongoing process of monitoring and improving data to ensure it is accurate, complete, and reliable.
data science

The interdisciplinary field that uses scientific methods, statistics, and algorithms to extract knowledge and insights from data.
data science notebook

An interactive computing environment (such as Jupyter Notebook) that combines live code, visualizations, and narrative text for data exploration.
data scientist

A professional skilled in statistical analysis, machine learning, and domain expertise who extracts actionable insights from data.
data security

Measures taken to protect data against unauthorized access, corruption, or loss.
data set

A collection of related data items organized into a logical structure.
data splitting

Dividing a data set into subsets (e.g., training, validation, test) for model development and evaluation.
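
One way to sketch such a split is shown below. The 70/15/15 proportions, the fixed seed, and the name `split_dataset` are assumptions for illustration; suitable proportions depend on the project.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and cut them into training, validation, and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, validation, test


train, validation, test = split_dataset(range(100))
```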
data stewardship

The practice of ensuring an organization’s data is accessible, trustworthy, usable, and secure.
data storage

The systems and methods used to store digital data securely and accessibly.
data transformation

The process of converting data from one format or structure to another, typically as part of an extraction, transformation, and loading (ETL) process.
data visualization

The use of graphical representations (charts, graphs, diagrams) to convey patterns and insights from data.
data warehouse

A centralized repository that aggregates data from multiple sources and is optimized for query and analysis using extraction, transformation, and loading (ETL) processes.
data warehousing

The process of collecting, storing, and managing large volumes of data from different sources in a centralized repository (called a data warehouse) so it can be easily used for reporting, analysis, and decision-making.
data wrangling

The process of cleaning, structuring, and enriching raw data into a desired format for analysis.
database

An organized collection of electronic data stored in a structured format that enables efficient retrieval and management.
deep learning

An algorithmic approach to machine learning that uses neural networks with multiple hidden layers to learn from data. Also referred to as deep learning neural networks. These models can learn complex patterns and representations, making them effective for tasks such as image and speech recognition, natural language processing, and predictive analytics.
deterministic

A system that always produces the same output given the same input.
digital transformation

The integration of digital technology into all areas of an organization, fundamentally changing how it operates and delivers value.
dimension

A measurable attribute or feature (such as age, income, location) used to describe data for analysis or in machine learning.
dimensionality reduction

Techniques that reduce the number of input variables in a data set while preserving the essential information.
distributed file system

A file system that stores data across multiple servers or locations to improve accessibility and reliability.
edge device

A hardware device (e.g., sensor, smartphone, camera) that collects and processes data at the network’s edge, often with limited resources.
encoder-decoder (neural network)

A pair of neural networks where one encodes input data into a compact representation and the other decodes it back to its original form, used in unsupervised learning.
ensemble models

Techniques that combine predictions from multiple models trained on the same data to achieve improved accuracy and robustness.
environment (reinforcement learning)

The external system or context with which an agent interacts, providing states and rewards based on the agent’s actions.
episode (reinforcement learning)

A complete sequence of interactions (states, actions, rewards) between an agent and its environment, from start to finish.
epoch

One complete pass through the entire training data set during model training.
exabyte

A unit of digital information equal to one billion gigabytes.
expert system

An AI system that mimics the decision-making abilities of a human expert using a knowledge base and inference rules.
explainable AI (XAI)

AI systems designed to provide clear, understandable explanations for their predictions, thereby increasing trust and transparency.
exploration vs. exploitation trade-off

A reinforcement learning dilemma where an agent must decide between exploring new actions to discover higher rewards and exploiting known actions that yield high rewards.
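
A common way to balance the two is an epsilon-greedy rule, sketched here as one possible strategy rather than the only one. The function name, the `rng` parameter, and the sample value estimates are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any action at random
    # Exploit: the action with the highest estimated value so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
estimates = [0.1, 0.9, 0.4]  # hypothetical value estimates for three actions
greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
explored = {epsilon_greedy(estimates, epsilon=1.0, rng=rng) for _ in range(200)}
```

With `epsilon=0.0` the agent always exploits the best-known action; with `epsilon=1.0` it always explores. Intermediate values trade off between the two.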
extract, transform, load (ETL)

A process that extracts raw data from sources, transforms it (e.g., cleaning, deduplication) into a suitable format, and loads it into a target system for analysis.
F1 score

A metric that combines precision and recall into a single value (ranging from 0 to 1) to evaluate classification accuracy.
feature

Any measurable property or characteristic of data used as input for a predictive model.
feature engineering/extraction

Techniques to create, enhance, or select features from raw data to improve model performance.
feature reduction

The process of decreasing the number of features in a data set to simplify the model and reduce training time while retaining essential information.
feature selection

The process of identifying the most relevant features from a data set for a given predictive task.
federated learning

A machine learning approach where a model is trained across multiple decentralized devices or servers while keeping data local to preserve privacy.
feed-forward neural network

A basic neural network in which data flows in one direction—from input, through hidden layers, to output—without cycles.
foundation models

Large-scale, pretrained models (often deep-learning-based models) focused on a general domain (e.g., language, vision) that can be fine-tuned for specific tasks.
fuzzy logic

A form of logic that allows reasoning with degrees of truth rather than binary true/false values, enabling the handling of uncertainty.
Gaussian mixture model (GMM)

A probabilistic model representing a data set as a mixture of multiple Gaussian distributions, used for clustering or classification.
generalization

The ability of a machine learning model to perform well on unseen data after training.
generalization error

The error a model makes when predicting on new, unseen data, reflecting its ability to generalize.
generalization performance

An overall measure of how well a model performs on data outside its training set.
generative adversarial network (GAN)

A deep learning framework where two neural networks (a generator and a discriminator) are trained simultaneously to generate realistic synthetic data.
generative AI (GenAI)

AI systems that create new data (e.g., text, images, music) based on patterns learned from existing data.
GPT models

A family of transformer-based language models (such as GPT-3 and GPT-4) developed by OpenAI that generate human-like text from short prompts.
gradient descent

An optimization algorithm that iteratively adjusts model parameters by moving in the direction of the steepest decrease in the cost function.
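
A minimal one-dimensional sketch of the update rule. The quadratic cost, the starting point, and the default learning rate are illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)  # move in the direction of steepest decrease
    return x


# Illustrative cost: (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```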
graph

A visual diagram that represents relationships between variables, often used for data analysis.
graph database

A database that uses graph structures (nodes, edges, and properties) to store and query data based on relationships.
graphics processing units (GPUs)

Specialized hardware originally designed for graphics rendering, now widely used to accelerate machine learning model training and inference.
ground truth data

Data collected from real-world observations that serves as the definitive reference for training and validating models.
Hadoop

An open-source framework that enables distributed storage and processing of large data sets across clusters of computers.
heuristic

A practical, experience-based technique for problem-solving that may not guarantee an optimal solution but offers a quick approximation.
hidden layer

A layer of neurons in a neural network that lies between the input and output layers, enabling the model to learn complex patterns.
hot path analytics

An approach to data processing that focuses on real-time or near-real-time analysis, prioritizing speed over absolute accuracy.
hyperpersonalization

A pattern of AI focused on creating and evolving an individualized profile for each user to provide highly tailored recommendations.
ImageNet

A large, publicly available repository of labeled images organized by the WordNet hierarchy, used for training and benchmarking computer vision models.
inference

The process of using a trained model to make predictions or decisions on new, unseen data.
internet minute

A metric measuring the volume of information created, processed, or transmitted on the internet in a 60-second interval.
interpretable AI

Systems that offer insights into how a model arrived at its predictions, increasing transparency and trust.
jailbreaks

Techniques or methods used to bypass or circumvent the restrictions, safeguards, or guidelines built into AI models. These restrictions are typically designed to prevent the AI from generating harmful, inappropriate, or unsafe content.
Julia

A high-level, high-performance programming language designed for technical and scientific computing.
Jupyter

An open-source interactive computing environment that allows for the creation and sharing of documents containing live code, equations, visualizations, and narrative text.
K-means

An unsupervised algorithm that partitions data into K clusters by minimizing within-cluster variance.
K-nearest neighbor (KNN)

A simple, instance-based algorithm that classifies data points based on the majority label among their K closest neighbors.
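
The majority vote can be sketched as below, assuming Euclidean distance and small in-memory data. The function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training examples by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two illustrative clusters of labeled points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```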
Kaggle

A platform for predictive modeling competitions and a community where data scientists collaborate and share knowledge.
Keras

An open-source neural network library written in Python that provides a simple interface for building deep learning models, now integrated primarily with TensorFlow.
kernel method

A technique used in algorithms such as support vector machines to map input data into a higher-dimensional space using kernel functions, enabling linear separation of nonlinear data.
key performance indicator (KPI)

A measurable value that indicates how effectively an organization or project is achieving its key objectives.
large language models (LLMs)

Deep learning models trained on massive amounts of text data, capable of generating, summarizing, and understanding human language.
layer (network layer)

A collection of nodes in a neural network that collectively transform input data into output signals; includes input, hidden, and output layers.
lazy learning

A machine learning approach where computation is deferred until a prediction is requested, rather than building a generalized model in advance.
Lean methodology

A set of principles aimed at maximizing value while minimizing waste in business processes and project management.
learning curve

A graphical representation showing how a model’s performance improves with additional training data over time.
learning rate

A hyperparameter that determines the size of the steps taken during gradient descent, scaling the magnitude of weight updates.
levels of autonomy

A classification system that describes the degree of automation in a system, from level 0 (no automation) to level 5 (full automation).
linear regression

A statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the data.
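
For a single independent variable, the fitted line can be sketched with ordinary least squares, one common fitting criterion. The function name `fit_line` and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```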
long short-term memory (LSTM)

A type of recurrent neural network designed to capture long-term dependencies using mechanisms like input, output, and forget gates.
loss function

A function that quantifies the error between predicted outputs and actual target values, guiding the optimization process.
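
Squared error is one widely used loss, sketched here as an example rather than the only choice. The function names and sample values are illustrative. Averaging the per-example losses gives an aggregate of the kind described under "cost function".

```python
def squared_error(target, prediction):
    """Loss for a single example: squared difference from the target."""
    return (target - prediction) ** 2


def mean_squared_error(targets, predictions):
    """Average the per-example losses into a single number."""
    losses = [squared_error(t, p) for t, p in zip(targets, predictions)]
    return sum(losses) / len(losses)


targets = [3.0, 5.0, 2.0]       # illustrative target values
predictions = [2.0, 5.0, 4.0]   # illustrative model outputs
example_losses = [squared_error(t, p) for t, p in zip(targets, predictions)]
example_mse = mean_squared_error(targets, predictions)
```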
low code

Platforms that allow non-developers to build software applications with minimal or no coding.
machine learning (ML)

The ability of a machine to learn from data, improve with experience, and apply that learning to make predictions. It typically includes supervised, unsupervised, and reinforcement learning.
machine learning operations (MLOps)

Practices and tools for managing the life cycle of machine learning models, from development to deployment and monitoring, similar to DevOps for ML.
machine learning-as-a-service (MLaaS)

Cloud-based solutions that offer a range of machine learning services (such as data preprocessing, model training, and inference) on a subscription or usage-based model.
machine translation

The use of AI to automatically translate text or speech from one language to another.
malicious AI

The intentional use of artificial intelligence for unethical, dangerous, or criminal purposes.
MapReduce

A programming model that processes large data sets by partitioning tasks across multiple parallel systems and aggregating the results.
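
The model can be sketched with a word count. This version runs all phases sequentially in a single process purely for illustration; a real MapReduce system distributes the map and reduce work across machines. The function names and documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big models", "data pipelines move data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
```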
Markov model

A stochastic model where the probability of each event depends only on the state attained in the previous event; a Markov chain is its simplest form.
master data management (MDM)

A process and set of tools used by organizations to create and maintain a single, consistent, and accurate view of their critical business data, such as customer, product, supplier, or employee information, across all systems and departments.
megabyte

A unit of data storage approximately equal to one million bytes.
methodology

A defined set of processes and frameworks followed to achieve consistent, repeatable outcomes; for example, CPMAI for AI projects.
microservice

An architectural approach that decomposes a large application into small, loosely coupled services that communicate over a network.
microservice “on demand”

An architectural design that breaks a large system into smaller, independent services that can be deployed and scaled individually based on demand.
model

The final output of training for a machine learning algorithm—a function that makes predictions on new data.
model drift

The degradation in a model’s performance over time as the underlying data distribution changes.
model retraining

The process of updating a deployed model by retraining it on new data to maintain or improve its performance.
model tuning

The process of adjusting a model’s hyperparameters and settings to optimize its performance and generalization.
model validation

The evaluation of a model’s performance on a separate validation data set to ensure it generalizes well.
multiclass classification

A classification task where data is assigned to one of more than two classes.
naive Bayes

A family of simple probabilistic classifiers based on Bayes’ theorem with the “naive” assumption of feature independence.
narrow AI

Also known as weak AI, referring to systems designed for specific tasks rather than general intelligence.
natural language generation (NLG)

The use of AI to automatically produce human-like text or speech from structured data.
natural language processing (NLP)

A field of AI that focuses on enabling machines to understand, interpret, and generate human language.
natural language understanding (NLU)

A subset of NLP that enables machines to comprehend human language, including its intent and context.
neural network

A machine learning algorithmic approach that consists of layers of interconnected nodes or “neurons” that can be trained on input data. Neural networks are particularly useful for tasks like image recognition, speech processing, and natural language understanding.
no code

Platforms that allow individuals with no coding experience to build software applications.
node

The basic computational unit in a neural network that receives inputs, applies a transfer function, and produces an output.
nondeterministic

Referring to systems where the same input can yield different outputs due to inherent randomness.
off-policy learning algorithm

A reinforcement learning algorithm that evaluates and improves a policy different from the one currently used by the agent.
on-policy learning algorithm

A reinforcement learning algorithm that updates its policy based on the actions taken by the current policy.
on-premise

Infrastructure, software, or systems hosted and managed within an organization’s own facilities rather than in the cloud.
OpenAI

An organization dedicated to advancing artificial intelligence research, known for models such as GPT-3, GPT-4, and DALL-E.
operationalization

The process of deploying a machine learning model into a real-world environment for live predictions or inferences.
optimizer (algorithm)

An algorithm (e.g., Adam, SGD, RMSprop) used to adjust model parameters during training to minimize the loss function.
overfitting

A modeling error where a model learns the training data too well, including its noise, resulting in poor performance on new data.
pattern recognition

The process by which machine learning systems identify and learn patterns from data to make predictions or classifications.
Perceptron

One of the earliest artificial neural network models, consisting of a single layer of neurons that laid the groundwork for later architectures.
personalization

The use of technology to tailor products, services, or content based on individual user characteristics or behavior.
personally identifiable health information (PHI)

Sensitive healthcare data that can uniquely identify an individual and requires special protection.
personally identifiable information (PII)

Data such as names, social security numbers, or addresses that uniquely identify an individual.
prediction

The process of using a trained model to forecast an outcome based on new input data.
predictive analytics

Techniques that use historical data to forecast future outcomes or trends.
prescriptive/projective analytics

Analytics that determine the potential impact of decisions, often answering “what if” scenarios.
pretrained model

A machine learning model that has been previously trained on a large data set and can be fine-tuned for related tasks.
principal component analysis (PCA)

A dimensionality reduction technique that transforms a data set into a new set of uncorrelated variables (principal components) that capture most of the variance.
prompt engineering

The process of crafting and refining input prompts to optimize the performance of language models.
pseudo AI

Products or companies that claim to use AI but rely primarily on human input or simple algorithms without genuine intelligence.
Python

A popular, high-level programming language widely used for data science, machine learning, and general-purpose programming.
Q-learning

A reinforcement learning algorithm that learns the value of actions in a state without requiring a model of the environment, enabling it to handle stochastic transitions.
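The sketch below shows one step of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α[r + γ max Q(s', a') − Q(s, a)]. The state names, action names, and the values of `alpha` and `gamma` are assumptions chosen only for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
new_value = q_update(q_table, "s0", "right", reward=1.0,
                     next_state="s1", actions=actions)
```

No model of the environment appears anywhere in the update; it uses only the observed reward and next state.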
R

A programming language and environment widely used for statistical computing, data analysis, and visualization.
random forest

An ensemble learning method that constructs multiple decision trees and aggregates their predictions for improved accuracy and robustness.
real-time prediction

The generation of predictions instantly as new data is received, which is crucial for time-sensitive applications.
receiver operating characteristic (ROC) curve

A graph that plots the true positive rate against the false positive rate at various threshold settings to evaluate classifier performance.
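To make the plotted quantities concrete, this sketch computes the (false positive rate, true positive rate) pair at a few thresholds. The labels and scores are made-up values, the function name `roc_points` is hypothetical, and the code assumes both classes are present in the data.

```python
def roc_points(labels, scores, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A sample is predicted positive when its score reaches the threshold.
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

labels = [1, 1, 0, 0]            # 1 = actual positive, 0 = actual negative
scores = [0.9, 0.6, 0.7, 0.2]    # model scores for the same four samples
points = roc_points(labels, scores, thresholds=[1.0, 0.65, 0.0])
```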
recognition systems

AI systems that identify and categorize patterns or objects within data, such as in facial or handwriting recognition.
recommendation system

A system that suggests products, services, or content to users based on their behavior and profile data.
rectified linear unit (ReLU)

A fast and simple nonlinear activation function defined as ReLU(x) = max(0, x), widely used in deep learning.
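The formula above translates directly into code; the sample inputs here are arbitrary.

```python
def relu(x):
    """Return x for positive inputs and 0.0 otherwise."""
    return max(0.0, x)

# Negative inputs are clipped to zero; positive inputs pass through unchanged.
outputs = [relu(v) for v in [-2.0, 0.0, 3.5]]
```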
recurrent neural network (RNN)

A neural network designed for sequential data with loops that allow information to persist; includes variants such as long short-term memory (LSTM).
regression

A statistical method that models the relationship between input and output variables to predict continuous outcomes.
reinforcement learning (RL)

A type of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards.
relational database

A database that organizes data into tables (rows and columns) and is managed by a relational database management system (RDBMS).
return on investment (ROI)

A metric that measures the financial benefit of an investment relative to its cost.
reward (reinforcement learning)

The feedback provided to an agent after taking an action, which guides the learning process.
roboadvisor

An automated, algorithm-driven service that provides financial planning and investment advice with minimal human intervention.
robot

A hardware or software system that performs tasks automatically on behalf of a human.
robotic process automation (RPA)

The use of software robots to automate repetitive tasks, often involving user interface interactions, either in attended or unattended modes.
robotics

The engineering discipline focused on designing, constructing, operating, and applying robots.
scikit-learn

A free, open-source machine learning library for Python that supports a wide range of algorithms and tools.
self-supervised learning

A machine learning approach in which models learn from vast amounts of unlabeled data by deriving their training signals from the data itself.
semi-structured data

Data that contains both a defined schema and elements of variability, falling between fully structured data (e.g., databases) and unstructured data (e.g., plain text). Examples include JSON and XML.
sensitivity (recall)

A performance measure that quantifies the proportion of actual positives correctly identified by a model.
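As a small worked example, recall can be computed as true positives divided by the sum of true positives and false negatives. The label lists below are invented for illustration, and the function assumes at least one actual positive exists.

```python
def recall(y_true, y_pred):
    """Return the fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

y_true = [1, 1, 1, 0, 0]   # three actual positives
y_pred = [1, 0, 1, 0, 1]   # the model finds two of them
value = recall(y_true, y_pred)
```

Here two of the three actual positives are found, so the recall is 2/3; the false positive in the last position does not affect it.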
sensor fusion

The process of combining data from multiple sensors (such as LiDAR, radar, and cameras) to improve situational awareness.
sentiment analysis

The use of machine learning and algorithms to identify and categorize opinions in text (or other data) as positive, negative, or neutral.
seven patterns of AI

A framework that groups AI applications into seven categories, including hyperpersonalization, autonomous systems, predictive analytics, conversational interactions, pattern recognition, recognition systems, and goal-driven systems.
streaming predictions

The process by which a model produces immediate, real-time predictions as data is received, rather than processing data in batches.
structured data

Data organized into a defined format with a schema, such as databases, tables, or spreadsheets.
structured query language (SQL)

A specialized programming language used for managing and querying relational databases.
support vector

The data points in an SVM that are closest to the decision boundary and determine the margin width.
support vector machine (SVM)

A supervised learning algorithm that classifies data by finding the optimal hyperplane that maximizes the margin between classes.
symbolic approaches

Techniques that rely on symbolic representations and logical inference rather than statistical methods.
symbolic systems

An approach to machine learning that uses explicit, human-understandable rules and logic for reasoning.
synthetic data

Artificially generated data that mimics real-world data, used when actual data is scarce or sensitive.
synthetic minority over-sampling technique (SMOTE)

A technique in machine learning used to address class imbalance in data sets, especially when the number of instances in one class (typically the minority class) is much lower than in another (the majority class). It generates synthetic minority-class examples by interpolating between existing ones.
social engineering

A manipulation technique that exploits human psychology to gain access to confidential information or systems.
t-distributed stochastic neighbor embedding (t-SNE)

A dimensionality reduction technique that projects high-dimensional data into 2D or 3D space while preserving local relationships.
tensor processing units (TPUs)

Specialized hardware developed by Google to accelerate the training and inference of machine learning models, particularly with TensorFlow.
terabyte

A unit of digital storage equal to approximately 1,000 gigabytes.
test data set (holdout data set)

A portion of data set aside from training and validation to verify a model’s performance on unseen data.
the curse of dimensionality

The phenomenon where increasing the number of features makes data sparse, requiring exponentially more data to achieve reliable modeling.
tokenization

A preprocessing step in which input data (such as text) is split into smaller meaningful units (tokens), like words or phrases.
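A minimal word-level tokenizer might look like the sketch below. The regular expression is one simple assumption about what counts as a token; production language models typically use subword tokenizers instead.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    # Keep runs of letters, digits, and apostrophes together as one token.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Models don't read text; they read tokens.")
```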
training data

A data set of cleaned and labeled data used to train a machine learning model.
transfer learning

A technique that leverages a pretrained model (trained on a large, relevant data set) as a starting point for a new, related task.
transformer models/transformer network

Deep learning architectures that process sequential data using self-attention mechanisms instead of recurrence, enabling efficient handling of long sequences.
Turing test

A test proposed by Alan Turing to determine whether a machine exhibits behavior indistinguishable from a human.
unattended bots

Software automation systems that operate in the background without human intervention, commonly used in robotic process automation (RPA).
underfitting

A modeling error in which a model is too simple to capture the underlying structure of the data, resulting in poor performance on both training and new data.
unstructured data

Data that lacks a predefined schema and is highly variable, such as images, videos, text, and emails.
unsupervised learning

A type of machine learning that identifies patterns in unlabeled data, such as through clustering or dimensionality reduction.
V’s of big data

The defining characteristics of big data, typically including volume, velocity, variety, and veracity (and sometimes additional V’s).
validation data

A subset of data set aside during model development to fine-tune and validate model performance.
variance (model)

The degree to which a model’s predictions vary for different subsets of the training data; high variance often leads to overfitting.
variety

The challenge of handling data in multiple formats, structures, and sources.
vectorization

In natural language processing (NLP), the process of converting words or phrases into numerical vectors that capture their meaning.
vectorization and word embedding

A method in natural language processing (NLP) where words or phrases are mapped to high-dimensional vectors such that similar words are close together in vector space.
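One common way to measure how "close together" two word vectors are is cosine similarity. The sketch below uses hypothetical three-dimensional vectors invented for illustration; real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, not taken from any trained model.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.1, 0.2, 0.95],
}

related = cosine_similarity(embeddings["king"], embeddings["queen"])
unrelated = cosine_similarity(embeddings["king"], embeddings["banana"])
```

With these toy values, `related` comes out much higher than `unrelated`, which is the behavior the definition describes.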
velocity

One of the V’s of big data, referring to the speed at which data is generated and must be processed.
veracity

One of the V’s of big data, concerning the accuracy, reliability, and trustworthiness of data.
voice assistant

A conversational system or device that uses natural language processing (NLP) to understand and respond to voice commands.
volume

One of the V’s of big data, referring to the massive amounts of data generated and stored by organizations.
waterfall

A sequential project management approach where each phase (requirements, design, coding, testing, deployment) must be completed before the next begins; typically contrasted with adaptive or agile approaches. The term “predictive” is also used to describe project management approaches that follow a waterfall-style sequence.
weights

Fundamental parameters in a neural network that determine the strength of connections between neurons, adjusted during training.
Yet Another Resource Negotiator (YARN)

A core component of Hadoop that manages and monitors cluster resources and schedules jobs across nodes.
yottabyte

A unit of digital storage equal to 1,000 zettabytes, one billion petabytes, or one quadrillion gigabytes.
zero-day discovery

A vulnerability in software or a system that is found before the developer knows about it and can create defenses against it.
zettabyte

A unit of digital storage approximately equal to 1,000 exabytes or one billion terabytes.
4D task

Work or activities with four attributes that make them more suitable for machines than for people: tasks that are dull, dear, dangerous/dirty, and/or demeaning. These tasks are considered undesirable due to their monotonous and uninteresting nature (dull), high costs or resource requirements (dear), risks to health or safety and unpleasant working conditions (dangerous/dirty), and their potential to be humiliating or degrading (demeaning).

© 2025 Project Management Institute, Inc. All rights reserved.