PMI-CPMAI Glossary
291 entries from the PMI-CPMAI glossary.
- DIKUW pyramidA visual model illustrating the increasing value derived as data is transformed into information, knowledge, understanding, and ultimately, wisdom.
- accuracy (machine learning definition)A measure of model performance that is defined as the ratio of correctly predicted data points to the total number of data points evaluated.
- ACIDA set of properties used to guarantee data integrity by ensuring that data remains in a consistent state before and after a transaction—even when errors or failures occur. It includes: • Atomicity: All changes occur as a single operation (all or nothing). • Consistency: Data remains uniform and unambiguous. • Isolation: Individual operations are shielded from one another’s effects. • Durability: Once a transaction is completed, its results persist permanently.
- action (reinforcement learning)A discrete operation or step performed by an agent in response to its current state within an environment.
- action space (reinforcement learning)The complete set of all possible actions available to an agent in a given environment.
- activation functionA function in neural networks that transforms the weighted sum of inputs (plus bias) into an output signal. It determines the threshold at which a neuron “activates” (e.g., ReLU, sigmoid) and introduces the nonlinearity necessary for modeling complex relationships.
- adversarial attackA malicious attempt to fool a machine learning model by introducing intentionally modified inputs—often images—that cause the model to make incorrect predictions.
- agent (reinforcement learning)An entity that interacts with an environment by taking actions to maximize cumulative rewards.
- agentic AIAI systems that exhibit autonomous, goal-oriented behavior utilizing AI agents, often with a degree of self-direction and decision-making capability.
- agileA development approach that breaks projects into short, iterative sprints; emphasizes continuous feedback and adaptive planning; and contrasts with the traditional predictive approach.
- AI agentsSoftware systems that use artificial intelligence to perform tasks. They can perceive their environment, make decisions, and take actions to achieve specific goals. AI agents can range from simple rule-based systems to complex models using machine learning and deep learning.
- AI winterA period of declining funding, interest, and investment in artificial intelligence research and applications, typically following cycles of overhyped expectations and subsequent disappointment.
- algorithmA precise, step-by-step set of instructions for processing data and solving problems. In machine learning, an algorithm learns patterns from data to produce a predictive model.
- algorithmic discriminationBias in outcomes caused by biased training data; when the data used to train an algorithm is skewed, it can result in unfair or prejudiced decisions.
- AlphaGoAn AI system developed by DeepMind that uses advanced reinforcement learning and deep neural networks to play the game “Go.” It famously defeated top professional player Lee Sedol in 2016.
- AlphaZeroAn AI system from DeepMind that, using self-play and reinforcement learning, rapidly achieved superhuman performance in games such as chess and “Go.”
- analytical modelsModels that employ algorithmic techniques to discover patterns and extract insights from data.
- analyticsThe use of statistical and computational methods to extract meaningful insights from data (related: predictive, descriptive, hot path, cold path, and prescriptive/projective analytics).
- anonymizationTechniques for removing or modifying personally identifiable information (PII) from data sets to protect privacy while retaining data usefulness.
- Apache SparkAn open-source, distributed computing framework designed for large-scale data processing and analytics, initially released in 2014.
- artificial general intelligence (AGI)The capability of a machine to perform any cognitive task at human or superhuman levels; also known as strong AI.
- artificial intelligence (AI)The simulation of human cognitive functions—such as learning, reasoning, and problem-solving—by machines. One view describes it as machine behavior that exhibits human-like intelligence.
- artificial neural networks (ANN)Computational models inspired by the human brain, consisting of interconnected neurons with learnable weights and biases that are adjusted during training.
- attended botsSoftware automation tools that work alongside humans (typically in front-office roles) to assist with tasks and boost productivity.
- augmented intelligenceA collaborative approach where human expertise is enhanced by machine assistance, enabling humans to perform tasks that were previously too difficult or time-consuming.
- autoencoder/encoder-decoderA pair of neural networks where one network encodes input data into a compact representation and the other decodes it back to its original form. This technique is used in unsupervised learning for tasks such as denoising, anomaly detection, and dimensionality reduction.
- automated machine learning (AutoML)Platforms and tools that automate aspects of the machine learning workflow—such as model selection, hyperparameter tuning, and data preprocessing—making ML more accessible to both experts and nonexperts.
- automatic speech recognition (ASR)Technology that converts spoken language (sound waves) into text, forming a key component of applications like voice assistants and transcription services. It is often used interchangeably with speech recognition or speech-to-text (STT).
- automationThe use of technology to perform repetitive tasks automatically, thereby increasing speed and accuracy without requiring human intervention.
- autonomous retailA retail model that employs technologies, such as automated checkout systems, to enable a fully self-service shopping experience where payment is processed automatically (often via a companion app).
- autonomous systemsSystems—either physical or virtual—that perform tasks and make decisions with minimal human intervention.
- autonomous vehicleA vehicle equipped with technology that allows it to operate and navigate without human input. These are typically classified into levels ranging from 0 (no automation) to 5 (full automation).
- backpropagationA neural network training algorithm in which errors are propagated backward from the output layer to update weights and biases, thereby reducing prediction errors.
- BASEA set of properties used in distributed systems as an alternative to ACID, trading strict consistency for higher availability and fault tolerance. It includes: • Basically available: Some data is available even if not fully current. • Soft-state: The system state may be temporarily inconsistent. • Eventually consistent: The system works to ensure data becomes consistent over time.
- batch (training)A method of processing training data in groups (batches) so that gradients are computed and aggregated across these groups, increasing training efficiency.
- batch predictionThe generation of predictions for large data sets in a non-real-time (offline) mode, used when immediate responses are not required.
- Bayes’ theoremA statistical formula that calculates the probability of an event based on prior knowledge of conditions related to the event.
- Bayesian classifierA classification algorithm that uses Bayes’ theorem and probability distributions to assign class labels based on prior information.
- bias (model fitting)The degree to which a model’s predictions systematically deviate from the true target values, often indicating an overly simplistic model.
- bias (neural network parameter)A learnable constant added to the weighted sum of inputs in a neuron, which helps adjust the neuron’s output.
- bias/variance trade-offThe balance between reducing bias (error from incorrect assumptions) and reducing variance (error from sensitivity to fluctuations) to optimize model performance.
- big dataExtremely large, complex, and variable data sets that require advanced methods for storage, processing, and analysis.
- big data managementThe practice of organizing, storing, and processing large and complex sets of data to make it accessible and usable for various business purposes.
- binary (or binomial) classificationA classification task where data is categorized into one of two classes (e.g., spam vs. not spam).
- black boxA system whose internal workings are not transparent, making it difficult to understand how inputs are transformed into outputs.
- Boltzmann machineA type of fully connected neural network where every neuron is connected to every other neuron and probabilistic methods are used to model data distributions (restricted Boltzmann machines are a faster variant).
- boosted treesAn ensemble learning method that sequentially combines multiple decision trees to improve overall predictive accuracy.
- bounding boxA rectangular (or 3D) box drawn around an object in an image or video to indicate the area of interest for object detection.
- brute-force searchA method that exhaustively generates and tests every possible solution until a correct one is found; often used as a baseline approach despite its inefficiency.
- chatbotA software application that converses with humans using natural language via text or voice.
- ChatGPTA conversational large language model (LLM) developed by OpenAI that generates human-like text responses.
- citizen data scientist/citizen developerAn individual without formal data science training who uses no-code or low-code tools to build data models or applications.
- classificationThe process of assigning data inputs to predefined categories or classes.
- classifierAn algorithm or model that predicts the category or class to which a given data input belongs.
- cloud machine learning (cloud ML)The use of cloud-based platforms and services hosted by third-party providers for developing, training, and deploying machine learning models.
- cluster analysisA technique for identifying natural groupings within a data set by clustering similar data points together.
- clusteringAn unsupervised process that partitions data into groups (clusters) based on similarity without preassigned labels.
- Cognitive Project Management for AI (CPMAI)A vendor-neutral project management methodology for AI, machine learning, and advanced analytics projects, typically following iterative phases such as business understanding, data understanding, data preparation, modeling, evaluation, and operationalization.
- cognitive technologyTechnologies that simulate human thought processes, often used as an alternative term to AI when applied to specific tasks.
- cold path analyticsAn approach that emphasizes batch processing of data for analysis and reporting, prioritizing accuracy over real-time speed.
- collaborative robot (cobot)A robot designed to work safely alongside humans, assisting with tasks rather than replacing human labor.
- computer visionA field of AI that enables computers to interpret, understand, and extract information from visual inputs such as images and videos.
- confusion matrixA table that displays the performance of a classification model by showing counts of true positives, true negatives, false positives, and false negatives, which are used to calculate metrics like precision, recall, and F1 score.
- content summarizationThe process of using AI/ML techniques to generate a concise overview of a larger body of text or multimedia content.
- context windowThe amount of surrounding text or data that is considered when processing or analyzing a specific piece of information.
- continuous improvement and respect for peopleAn organizational principle that emphasizes ongoing process enhancement while valuing every team member’s contributions.
- convergenceThe process by which a neural network’s parameters stabilize as the training error approaches a minimum.
- conversational systems/patternFrameworks that enable interactions between humans and machines via voice, text, or images, typically leveraging natural language processing (NLP).
- convolutional neural network (CNN)A deep learning architecture that uses convolutional layers to automatically and adaptively learn spatial hierarchies of features from grid-like data such as images.
- cost functionA function that aggregates the errors (losses) made by a model during training, serving as a measure of overall prediction error.
- Cross-Industry Standard Process for Data Mining (CRISP-DM)A structured methodology for data mining projects that includes phases such as business understanding, data preparation, modeling, evaluation, and deployment.
- cross-validationA statistical method for evaluating how well a model generalizes by partitioning data into training and validation subsets (e.g., k-fold cross-validation).
- cyberneticsThe interdisciplinary study of control and communication in living beings and machines, focusing on feedback systems and self-regulation.
- DALL-EA transformer-based AI model developed by OpenAI that generates images from textual descriptions.
- dataThe basic unit of discrete values—facts, quantities, or observations—that has no intrinsic meaning until processed and analyzed.
- data analystA professional who collects, cleans, analyzes, visualizes, and interprets data to support decision-making.
- data anonymizationTechniques for removing or modifying personally identifiable information (PII) from data sets to protect privacy.
- data augmentationMethods used to increase the quantity or diversity of data by applying transformations (e.g., rotation, scaling) or combining data sources.
- data cleaningThe process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in data prior to analysis.
- data collectionThe systematic gathering of information from various sources for analysis or model training.
- data consistencyEnsuring that data remains uniform and accurate across systems and over time.
- data custodianAn individual or team responsible for the safe storage, transfer, and administrative management of data.
- data debtThe accumulation of inefficiencies and issues in data systems over time, often due to legacy practices, that can hinder future data quality and governance.
- data driftThe gradual change in data characteristics over time that can lead to degraded model performance if not addressed.
- data ecosystemThe complete infrastructure—including hardware, software, processes, and personnel—that an organization uses to collect, store, process, and analyze its data.
- data engineerA professional who develops and manages data pipelines and architectures to ensure data is accessible and reliable.
- data engineeringThe discipline of designing, building, and maintaining systems for collecting, storing, and processing large volumes of data.
- data feedA method for delivering data from one or more sources to a system for processing or analysis.
- data governanceThe processes, procedures, standards, roles, and tools an organization implements to ensure its data is properly stored, managed, accurate, available, secured, and controlled over the data’s life cycle.
- data ingestionThe process of importing data from various sources into a storage or processing system.
- data integrationThe process of combining data from different sources into a unified view for analysis.
- data integrityThe assurance that data remains accurate, consistent, and reliable throughout its life cycle.
- data labelingThe process of adding descriptive tags or metadata to data—especially training data— to enable supervised learning.
- data lakeA centralized repository that stores large volumes of raw data in its native format until it is needed for analysis.
- data managementThe practice of organizing, storing, and maintaining data so that it can be efficiently accessed and analyzed.
- data miningThe process of discovering patterns and extracting insights from large data sets using statistical and computational techniques.
- data multiplicationTechniques used to increase the effective quantity of training data by transforming or augmenting existing data.
- data normalizationMethods for standardizing data values and formats to reduce redundancy and improve data integrity.
- data operations (DataOps)A set of practices and technologies for managing the data life cycle in an agile, automated manner to improve quality and speed.
- data pipelineA series of interconnected steps that transport data from source systems to target destinations, often involving extraction, transformation, and loading (ETL).
- data preparationAll steps—such as cleaning, transforming, and formatting—performed on raw data to make it suitable for analysis or machine learning.
- data privacyPractices and technologies aimed at protecting personal and sensitive information from unauthorized access or misuse.
- data quality managementThe ongoing process of monitoring and improving data to ensure it is accurate, complete, and reliable.
- data scienceThe interdisciplinary field that uses scientific methods, statistics, and algorithms to extract knowledge and insights from data.
- data science notebookAn interactive computing environment (such as Jupyter Notebook) that combines live code, visualizations, and narrative text for data exploration.
- data scientistA professional skilled in statistical analysis, machine learning, and domain expertise who extracts actionable insights from data.
- data securityMeasures taken to protect data against unauthorized access, corruption, or loss.
- data setA collection of related data items organized into a logical structure.
- data splittingDividing a data set into subsets (e.g., training, validation, test) for model development and evaluation.
- data stewardshipThe practice of ensuring an organization’s data is accessible, trustworthy, usable, and secure.
- data storageThe systems and methods used to store digital data securely and accessibly.
- data transformationThe process of converting data from one format or structure to another, typically as part of an extraction, transformation, and loading (ETL) process.
- data visualizationThe use of graphical representations (charts, graphs, diagrams) to convey patterns and insights from data.
- data warehouseA centralized repository that aggregates data from multiple sources and is optimized for query and analysis using extraction, transformation, and loading (ETL) processes.
- data warehousingThe process of collecting, storing, and managing large volumes of data from different sources in a centralized repository (called a data warehouse) so it can be easily used for reporting, analysis, and decision-making.
- data wranglingThe process of cleaning, structuring, and enriching raw data into a desired format for analysis.
- databaseAn organized collection of electronic data stored in a structured format that enables efficient retrieval and management.
- deep learningAn algorithmic approach to machine learning that uses neural networks with multiple hidden layers to learn from data. Also referred to as deep learning neural networks. These models can learn complex patterns and representations, making them effective for tasks such as image and speech recognition, natural language processing, and predictive analytics.
- deterministicA system that always produces the same output given the same input.
- digital transformationThe integration of digital technology into all areas of an organization, fundamentally changing how it operates and delivers value.
- dimensionA measurable attribute or feature (such as age, income, location) used to describe data for analysis or in machine learning.
- dimensionality reductionTechniques that reduce the number of input variables in a data set while preserving the essential information.
- distributed file systemA file system that stores data across multiple servers or locations to improve accessibility and reliability.
- edge deviceA hardware device (e.g., sensor, smartphone, camera) that collects and processes data at the network’s edge, often with limited resources.
- encoder-decoder (neural network)A pair of neural networks where one encodes input data into a compact representation and the other decodes it back to its original form, used in unsupervised learning.
- ensemble modelsTechniques that combine predictions from multiple models trained on the same data to achieve improved accuracy and robustness.
- environment (reinforcement learning)The external system or context with which an agent interacts, providing states and rewards based on the agent’s actions.
- episode (reinforcement learning)A complete sequence of interactions (states, actions, rewards) between an agent and its environment, from start to finish.
- epochOne complete pass through the entire training data set during model training.
- exabyteA unit of digital information equal to one billion gigabytes.
- expert systemAn AI system that mimics the decision-making abilities of a human expert using a knowledge base and inference rules.
- explainable AI (XAI)AI systems designed to provide clear, understandable explanations for their predictions, thereby increasing trust and transparency.
- exploration vs. exploitation trade-offA reinforcement learning dilemma where an agent must decide between exploring new actions to discover higher rewards and exploiting known actions that yield high rewards.
- extract, transform, load (ETL)A process that extracts raw data from sources, transforms it (e.g., cleaning, deduplication) into a suitable format, and loads it into a target system for analysis.
- F1 scoreA metric that combines precision and recall into a single value (ranging from 0 to 1) to evaluate classification accuracy.
- featureAny measurable property or characteristic of data used as input for a predictive model.
- feature engineering/extractionTechniques to create, enhance, or select features from raw data to improve model performance.
- feature reductionThe process of decreasing the number of features in a data set to simplify the model and reduce training time while retaining essential information.
- feature selectionThe process of identifying the most relevant features from a data set for a given predictive task.
- federated learningA machine learning approach where a model is trained across multiple decentralized devices or servers while keeping data local to preserve privacy.
- feed-forward neural networkA basic neural network in which data flows in one direction—from input, through hidden layers, to output—without cycles.
- foundation modelsLarge-scale, pretrained models (often deep-learning-based models) focused on a general domain (e.g., language, vision) that can be fine-tuned for specific tasks.
- fuzzy logicA form of logic that allows reasoning with degrees of truth rather than binary true/false values, enabling the handling of uncertainty.
- Gaussian mixture model (GMM)A probabilistic model representing a data set as a mixture of multiple Gaussian distributions, used for clustering or classification.
- generalizationThe ability of a machine learning model to perform well on unseen data after training.
- generalization errorThe error a model makes when predicting on new, unseen data, reflecting its ability to generalize.
- generalization performanceAn overall measure of how well a model performs on data outside its training set.
- generative adversarial network (GAN)A deep learning framework where two neural networks (a generator and a discriminator) are trained simultaneously to generate realistic synthetic data.
- generative AI (GenAI)AI systems that create new data (e.g., text, images, music) based on patterns learned from existing data.
- GPT modelsA family of transformer-based language models (such as GPT-3 and GPT-4) developed by OpenAI that generate human-like text from short prompts.
- gradient descentAn optimization algorithm that iteratively adjusts model parameters by moving in the direction of the steepest decrease in the cost function.
- graphA visual diagram that represents relationships between variables, often used for data analysis.
- graph databaseA database that uses graph structures (nodes, edges, and properties) to store and query data based on relationships.
- graphics processing units (GPUs)Specialized hardware originally designed for graphics rendering, now widely used to accelerate machine learning model training and inference.
- ground truth dataData collected from real-world observations that serves as the definitive reference for training and validating models.
- HadoopAn open-source framework that enables distributed storage and processing of large data sets across clusters of computers.
- heuristicA practical, experience-based technique for problem-solving that may not guarantee an optimal solution but offers a quick approximation.
- hot path analyticsAn approach to data processing that focuses on real-time or near-real-time analysis, prioritizing speed over absolute accuracy.
- hyperpersonalizationA pattern of AI focused on creating and evolving an individualized profile for each user to provide highly tailored recommendations.
- ImageNetA large, publicly available repository of labeled images organized by the WordNet hierarchy, used for training and benchmarking computer vision models.
- inferenceThe process of using a trained model to make predictions or decisions on new, unseen data.
- internet minuteA metric measuring the volume of information created, processed, or transmitted on the internet in a 60-second interval.
- interpretable AISystems that offer insights into how a model arrived at its predictions, increasing transparency and trust.
- jailbreaksTechniques or methods used to bypass or circumvent the restrictions, safeguards, or guidelines built into AI models. These restrictions are typically designed to prevent the AI from generating harmful, inappropriate, or unsafe content.
- JuliaA high-level, high-performance programming language designed for technical and scientific computing.
- JupyterAn open-source interactive computing environment that allows for the creation and sharing of documents containing live code, equations, visualizations, and narrative text.
- K-meansAn unsupervised algorithm that partitions data into K clusters by minimizing within-cluster variance.
- K-nearest neighbor (KNN)A simple, instance-based algorithm that classifies data points based on the majority label among their K closest neighbors.
- KaggleA platform for predictive modeling competitions and a community where data scientists collaborate and share knowledge.
- KerasAn open-source neural network library written in Python that provides a simple interface for building deep learning models, now integrated primarily with TensorFlow.
- kernel methodA technique used in algorithms such as support vector machines to map input data into a higher-dimensional space using kernel functions, enabling linear separation of nonlinear data.
- key performance indicator (KPI)A measurable value that indicates how effectively an organization or project is achieving its key objectives.
- large language models (LLMs)Deep learning models trained on massive amounts of text data, capable of generating, summarizing, and understanding human language.
- layer (network layer)A collection of nodes in a neural network that collectively transform input data into output signals; includes input, hidden, and output layers.
- lazy learningA machine learning approach where computation is deferred until a prediction is requested, rather than building a generalized model in advance.
- Lean methodologyA set of principles aimed at maximizing value while minimizing waste in business processes and project management.
- learning curveA graphical representation showing how a model’s performance improves with additional training data over time.
- learning rateA hyperparameter that determines the size of the steps taken during gradient descent, scaling the magnitude of weight updates.
- levels of autonomyA classification system that describes the degree of automation in a system, from level 0 (no automation) to level 5 (full automation).
- linear regressionA statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the data.
- long short-term memory (LSTM)A type of recurrent neural network designed to capture long-term dependencies using mechanisms like input, output, and forget gates.
- loss functionA function that quantifies the error between predicted outputs and actual target values, guiding the optimization process.
- low codePlatforms that allow non-developers to build software applications with minimal or no coding.
- machine learning (ML)The ability of a machine to learn from data, improve with experience, and apply that learning to make predictions. It typically includes supervised, unsupervised, and reinforcement learning.
- machine learning operations (MLOps)Practices and tools for managing the life cycle of machine learning models, from development to deployment and monitoring, similar to DevOps for ML.
- machine learning-as-a-service (MLaaS)Cloud-based solutions that offer a range of machine learning services (such as data preprocessing, model training, and inference) on a subscription or usage-based model.
- machine translationThe use of AI to automatically translate text or speech from one language to another.
- malicious AIThe intentional use of artificial intelligence for unethical, dangerous, or criminal purposes.
- MapReduceA programming model that processes large data sets by partitioning tasks across multiple parallel systems and aggregating the results.
- Markov modelA stochastic model where the probability of each event depends only on the state attained in the previous event; a Markov chain is its simplest form.
- master data management (MDM)A process and set of tools used by organizations to create and maintain a single, consistent, and accurate view of their critical business data, such as customer, product, supplier, or employee information, across all systems and departments.
- megabyteA unit of data storage approximately equal to one million bytes.
- methodologyA defined set of processes and frameworks followed to achieve consistent, repeatable outcomes; for example, CPMAI for AI projects.
- microserviceAn architectural approach that decomposes a large application into small, loosely coupled services that communicate over a network.
- microservice “on demand”An architectural design that breaks a large system into smaller, independent services that can be deployed and scaled individually based on demand.
- modelThe final output of training for a machine learning algorithm—a function that makes predictions on new data.
- model driftThe degradation in a model’s performance over time as the underlying data distribution changes.
- model retrainingThe process of updating a deployed model by retraining it on new data to maintain or improve its performance.
- model tuningThe process of adjusting a model’s hyperparameters and settings to optimize its performance and generalization.
- model validationThe evaluation of a model’s performance on a separate validation data set to ensure it generalizes well.
- multiclass classificationA classification task where data is assigned to one of more than two classes.
- naive BayesA family of simple probabilistic classifiers based on Bayes’ theorem with the “naive” assumption of feature independence.
- narrow AIAlso known as weak AI, referring to systems designed for specific tasks rather than general intelligence.
- natural language generation (NLG)The use of AI to automatically produce human-like text or speech from structured data.
- natural language processing (NLP)A field of AI that focuses on enabling machines to understand, interpret, and generate human language.
- natural language understanding (NLU)A subset of NLP that enables machines to comprehend human language, including its intent and context.
- neural networkA machine learning algorithmic approach that consists of layers of interconnected nodes or “neurons” that can be trained on input data. Neural networks are particularly useful for tasks like image recognition, speech processing, and natural language understanding.
- no codePlatforms that allow individuals with no coding experience to build software applications.
- nodeThe basic computational unit in a neural network that receives inputs, applies a transfer function, and produces an output.
- nondeterministicReferring to systems where the same input can yield different outputs due to inherent randomness.
- off-policy learning algorithmA reinforcement learning algorithm that evaluates and improves a policy different from the one currently used by the agent.
- on-policy learning algorithmA reinforcement learning algorithm that updates its policy based on the actions taken by the current policy.
- on-premiseInfrastructure, software, or systems hosted and managed within an organization’s own facilities rather than in the cloud.
- OpenAIAn organization dedicated to advancing artificial intelligence research, known for models such as GPT-3, GPT-4, and DALL-E.
- operationalizationThe process of deploying a machine learning model into a real-world environment for live predictions or inferences.
- optimizer (algorithm)An algorithm (e.g., Adam, SGD, RMSprop) used to adjust model parameters during training to minimize the loss function.
- overfittingA modeling error where a model learns the training data too well, including its noise, resulting in poor performance on new data.
- pattern recognitionThe process by which machine learning systems identify and learn patterns from data to make predictions or classifications.
- PerceptronOne of the earliest artificial neural network models, consisting of a single layer of neurons that laid the groundwork for later architectures.
- personalizationThe use of technology to tailor products, services, or content based on individual user characteristics or behavior.
- personally identifiable health information (PHI)Sensitive healthcare data that can uniquely identify an individual and requires special protection.
- personally identifiable information (PII)Data such as names, social security numbers, or addresses that uniquely identify an individual.
- predictionThe process of using a trained model to forecast an outcome based on new input data.
- predictive analyticsTechniques that use historical data to forecast future outcomes or trends.
- prescriptive/projective analyticsAnalytics that determine the potential impact of decisions, often answering “what if” scenarios.
- pretrained modelA machine learning model that has been previously trained on a large data set and can be fine-tuned for related tasks.
- principal component analysis (PCA)A dimensionality reduction technique that transforms a data set into a new set of uncorrelated variables (principal components) that capture most of the variance.
- prompt engineeringThe process of crafting and refining input prompts to optimize the performance of language models.
- pseudo AIProducts or companies that claim to use AI but rely primarily on human input or simple algorithms without genuine intelligence.
- PythonA popular, high-level programming language widely used for data science, machine learning, and general-purpose programming.
- Q-learningA reinforcement learning algorithm that learns the value of actions in a state without requiring a model of the environment, enabling it to handle stochastic transitions.
- RA programming language and environment widely used for statistical computing, data analysis, and visualization.
- random forestAn ensemble learning method that constructs multiple decision trees and aggregates their predictions for improved accuracy and robustness.
- real-time predictionThe generation of predictions instantly as new data is received, which is crucial for time-sensitive applications.
- receiver operating characteristic (ROC) curveA graph that plots the true positive rate against the false positive rate at various threshold settings to evaluate classifier performance.
- recognition systemsAI systems that identify and categorize patterns or objects within data, such as in facial or handwriting recognition.
- recommendation systemSystems that suggest products, services, or content to users based on their behavior and profile data.
- rectified linear unit (ReLU)A fast and simple nonlinear activation function defined as ReLU(x) = max(0, x), widely used in deep learning.
- recurrent neural network (RNN)A neural network designed for sequential data with loops that allow information to persist; includes variants such as long short-term memory (LSTM).
- regressionA statistical method that models the relationship between input and output variables to predict continuous outcomes.
- reinforcement learning (RL)A type of machine learning where an agent learns to make decisions by interacting with an environment and receiving rewards.
- relational databaseA database that organizes data into tables (rows and columns) and is managed by a relational database management system (RDBMS).
- return on investment (ROI)A metric that measures the financial benefit of an investment relative to its cost.
- reward (reinforcement learning)The feedback provided to an agent after taking an action, which guides the learning process.
- roboadvisorAn automated, algorithm-driven service that provides financial planning and investment advice with minimal human intervention.
- robotA hardware or software system that performs tasks automatically on behalf of a human.
- robotic process automation (RPA)The use of software robots to automate repetitive tasks, often involving user interface interactions, either in attended or unattended modes.
- roboticsThe engineering discipline focused on designing, constructing, operating, and applying robots.
- scikit-learnA free, open-source machine learning library for Python that supports a wide range of algorithms and tools.
- self-supervised learningA machine learning approach in which models learn from vast amounts of unlabeled data by deriving their training signals from the data itself.
- semi-structured dataData that contains both a defined schema and elements of variability, falling between fully structured data (e.g., databases) and unstructured data (e.g., plain text). Examples include JSON and XML.
- sensitivity (recall)A performance measure that quantifies the proportion of actual positives correctly identified by a model.
- sensor fusionThe process of combining data from multiple sensors (such as LiDAR, radar, and cameras) to improve situational awareness.
- sentiment analysisThe use of machine learning and algorithms to identify and categorize opinions in text (or other data) as positive, negative, or neutral.
- seven patterns of AIA framework that groups AI applications into seven categories, including hyperpersonalization, autonomous systems, predictive analytics, conversational interactions, pattern recognition, recognition systems, and goal-driven systems.
- streaming predictionsThe process by which a model produces immediate, real-time predictions as data is received, rather than processing data in batches.
- structured dataData organized into a defined format with a schema, such as databases, tables, or spreadsheets.
- structured query language (SQL)A specialized programming language used for managing and querying relational databases.
- support vectorThe data points in an SVM that are closest to the decision boundary and determine the margin width.
- support vector machine (SVM)A supervised learning algorithm that classifies data by finding the optimal hyperplane that maximizes the margin between classes.
- symbolic approachesTechniques that rely on symbolic representations and logical inference rather than statistical methods.
- symbolic systemsAn approach to machine learning that uses explicit, human-understandable rules and logic for reasoning.
- synthetic dataArtificially generated data that mimics real-world data, used when actual data is scarce or sensitive.
- synthetic minority over-sampling technique (SMOTE)A technique in machine learning used to address class imbalance in data sets, especially when the number of instances in one class (typically the minority class) is much lower than in another (the majority class). It generates synthetic minority-class examples by interpolating between existing ones.
- social engineeringA manipulation technique that exploits human psychology to gain access to confidential information or systems.
- t-distributed stochastic neighbor embedding (t-SNE)A dimensionality reduction technique that projects high-dimensional data into 2D or 3D space while preserving local relationships.
- tensor processing units (TPUs)Specialized hardware developed by Google to accelerate the training and inference of machine learning models, particularly with TensorFlow.
- terabyteA unit of digital storage equal to approximately 1,000 gigabytes.
- test data set (holdout data set)A portion of data set aside from training and validation to verify a model’s performance on unseen data.
- the curse of dimensionalityThe phenomenon where increasing the number of features makes data sparse, requiring exponentially more data to achieve reliable modeling.
- tokenizationA preprocessing step in which input data (such as text) is split into smaller meaningful units (tokens), like words or phrases.
- training dataA data set of cleaned and labeled data used to train a machine learning model.
- transfer learningA technique that leverages a pretrained model (trained on a large, relevant data set) as a starting point for a new, related task.
- transformer models/transformer networkDeep learning architectures that process sequential data using self-attention mechanisms instead of recurrence, enabling efficient handling of long sequences.
- Turing testA test proposed by Alan Turing to determine whether a machine exhibits behavior indistinguishable from a human.
- unattended botsSoftware automation systems that operate in the background without human intervention, commonly used in robotic process automation (RPA).
- underfittingA modeling error in which a model is too simple to capture the underlying structure of the data, resulting in poor performance on both training and new data.
- unstructured dataData that lacks a predefined schema and is highly variable, such as images, videos, text, and emails.
- unsupervised learningA type of machine learning that identifies patterns in unlabeled data, such as through clustering or dimensionality reduction.
- V’s of big dataThe defining characteristics of big data, typically including volume, velocity, variety, and veracity (and sometimes additional V’s).
- validation dataA subset of data set aside during model development to fine-tune and validate model performance.
- variance (model)The degree to which a model’s predictions vary for different subsets of the training data; high variance often leads to overfitting.
- varietyThe challenge of handling data in multiple formats, structures, and sources.
- vectorizationIn natural language processing (NLP), the process of converting words or phrases into numerical vectors that capture their meaning.
- vectorization and word embeddingA method in natural language processing (NLP) where words or phrases are mapped to high-dimensional vectors such that similar words are close together in vector space.
- velocityOne of the V’s of big data, referring to the speed at which data is generated and must be processed.
- veracityOne of the V’s of big data, concerning the accuracy, reliability, and trustworthiness of data.
- voice assistantA conversational system or device that uses natural language processing (NLP) to understand and respond to voice commands.
- volumeOne of the V’s of big data, referring to the massive amounts of data generated and stored by organizations.
- waterfallA sequential project management approach where each phase (requirements, design, coding, testing, deployment) must be completed before the next begins; typically contrasted with adaptive or agile approaches. The term “predictive” is also used to describe project management approaches that follow a waterfall-style sequence.
- weightsFundamental parameters in a neural network that determine the strength of connections between neurons, adjusted during training.
- Yet Another Resource Negotiator (YARN)A core component of Hadoop that manages and monitors cluster resources and schedules jobs across nodes.
- yottabyteA unit of digital storage equal to 1,000 zettabytes, one billion petabytes, or one quadrillion gigabytes.
- zero-day discoveryA vulnerability in software or a system that is found before the developer knows about it and can create a defense against it.
- zettabyteA unit of digital storage approximately equal to 1,000 exabytes or one billion terabytes.
- 4D taskWork or activities better suited to machines than to people because they are dull, dear, dangerous/dirty, and/or demeaning. These tasks are considered undesirable due to their monotonous and uninteresting nature (dull), high costs or resource requirements (dear), risks to health or safety and unpleasant working conditions (dangerous/dirty), and their potential to be humiliating or degrading (demeaning).
© 2025 Project Management Institute, Inc. All rights reserved.