MDM Glossary for Stakeholder Understanding
Attribute:
A specific piece of data representing a particular aspect of an entity, such as a product's colour or a customer's address.
Augmented Data Modeling:
An approach that uses AI and machine learning techniques to assist in the creation and management of data models.
Big Data:
Large and complex data sets that traditional data processing software cannot manage. Often includes data from social media, IoT devices, and other high-volume sources.
Canonical Data Model:
A design pattern for communication between different data formats: a common data format is defined, and all other formats are mapped to it.
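As a rough illustration of the pattern, each source system needs only one mapping into the shared canonical format. The source formats and field names below are invented for the example:

```python
# Two hypothetical source formats mapped onto one canonical customer format.
# Adding a new source then requires only one new mapper, not one per target.

def from_crm(record):
    # Hypothetical CRM export format.
    return {"id": record["customer_id"], "name": record["full_name"], "email": record["email"]}

def from_webshop(record):
    # Hypothetical webshop export format.
    return {"id": record["uid"], "name": f"{record['first']} {record['last']}", "email": record["mail"]}

canonical = [
    from_crm({"customer_id": "C-1", "full_name": "Ada Lovelace", "email": "ada@example.com"}),
    from_webshop({"uid": "W-9", "first": "Alan", "last": "Turing", "mail": "alan@example.com"}),
]
```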
Change Data Capture (CDC):
A technique used to track and capture changes in data, such as inserts, updates, and deletes, enabling synchronization between databases or data warehouses.
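Production CDC tools typically read the database's transaction log; as a rough illustration of the idea, the sketch below derives the same insert, update, and delete events by comparing two snapshots of a table (the records are invented):

```python
# A minimal snapshot-comparison sketch of change data capture.

def capture_changes(old_snapshot, new_snapshot):
    """Diff two {key: row} snapshots into insert/update/delete events."""
    changes = []
    for key, row in new_snapshot.items():
        if key not in old_snapshot:
            changes.append(("insert", key, row))
        elif old_snapshot[key] != row:
            changes.append(("update", key, row))
    for key, row in old_snapshot.items():
        if key not in new_snapshot:
            changes.append(("delete", key, row))
    return changes

before = {1: {"name": "Acme", "city": "Oslo"}, 2: {"name": "Globex", "city": "Berlin"}}
after = {1: {"name": "Acme", "city": "Copenhagen"}, 3: {"name": "Initech", "city": "Paris"}}

events = capture_changes(before, after)
```

Replaying such events against a second database keeps it synchronized with the source.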
CluedIn Clean:
A feature in CluedIn that cleans data by fixing inaccuracies, filling in missing values, and ensuring data consistency.
CluedIn Enrich:
A feature in CluedIn that enhances data by adding more context, details, or attributes from external sources.
CluedIn Discovery:
CluedIn's capability to help users discover new insights, relationships, and patterns within their data.
Centralized MDM:
Centralized MDM involves establishing a centralized repository for master data while allowing individual systems to maintain local copies for operational purposes.
Co-existence MDM:
Co-existence MDM recognizes that different domains or business units within an organization may have unique requirements for managing master data.
Consolidated MDM:
Consolidated MDM, also known as Analytical MDM, focuses on integrating and harmonizing master data from multiple sources into a single, authoritative view.
Data Catalog:
A centralized repository that allows for efficient data management and data discovery, often containing metadata, data profiles, and data lineage.
Data Cleansing (or Data Cleaning):
The process of identifying and rectifying (or removing) errors and inconsistencies in data to improve its quality.
Data Consolidation:
The process of integrating and unifying data from various sources into a centralized data repository to achieve a single, accurate, and consistent view of master data. This process involves cleaning, transforming, and rationalizing data to ensure quality and consistency, thereby supporting better decision-making and compliance with governance standards.
Data Dictionary:
A centralized repository of metadata definitions used to describe the structure, type, and usage of data within a database, data warehouse, or data lake.
Data Domain:
A category of data, such as customer, product, or supplier data.
Data Governance:
The overall management of data availability, usability, integrity, and security. It involves a set of processes, roles, policies, and metrics to ensure the effective and efficient use of information.
Data Integration:
The process of combining data from different sources, providing a unified view or dataset.
Data Lake:
A storage repository that holds a vast amount of raw data in its native format until it's needed, providing more flexible data processing options compared to traditional databases.
Data Lifecycle:
The stages through which data passes, typically characterized as creation, storage, use, sharing, archiving, and deletion.
Data Lineage:
The process of tracking data from its source to its final form, including every touchpoint and transformation in between. It helps in understanding how data flows and transforms across systems.
Data Migration:
The process of transferring data from one system to another, which may involve transforming the data to a new format.
Data Model:
An abstract representation of organizational data, often represented visually.
Data Privacy:
The management of data to ensure the protection of personal information, complying with regulations such as GDPR or CCPA.
Data Profiling:
The process of examining data to gather statistics and information about its quality and the nature of the information contained.
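As an illustration, a minimal profile of a single column might gather row counts, completeness, and distinct values (the sample country codes below are invented):

```python
# A tiny profiling sketch for one column: counts, completeness, distinct values.
from collections import Counter

def profile_column(values):
    # Treat None and empty strings as missing.
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "completeness": len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(2),
    }

profile = profile_column(["DK", "DK", "SE", None, "NO", "DK", ""])
```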
Data Quality:
A measure of the condition of data based on factors such as accuracy, completeness, reliability, and relevance.
Data Silo:
A repository of fixed data that remains under the control of one department or team and is isolated from the rest of the organization, leading to disjointed data management.
Data Steward:
An individual responsible for managing and maintaining specific data in an organization.
Data Stewardship:
The management and oversight of an organization's data assets to provide business users with high-quality data that is easily accessible in a consistent manner.
Data Virtualization:
The ability to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.
Data Warehouse:
A large, centralized database that integrates data from different sources, making it accessible for business intelligence activities.
Dimensional Modeling:
A design technique used in data warehousing to separate measurement data (facts) from descriptive data (dimensions), facilitating the understanding of data and improving query performance.
Deduplication:
The process of identifying duplicate records within a dataset.
ELT (Extract, Load, Transform):
ELT is a data integration approach where data is extracted from source systems and loaded into a target database or data warehouse; only after loading is the data transformed to meet operational or analytical requirements.
ETL (Extract, Transform, Load):
ETL is a data integration process where data is extracted from source systems, then transformed to meet specific operational needs or to comply with certain standards, and finally loaded into a target database or data warehouse.
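As a rough sketch of the ETL order of operations, the example below transforms records (trimming and standardizing fields) before loading them; the source records are invented, and a plain list stands in for a warehouse table:

```python
# Illustrative ETL sketch: extract, then transform, then load.

def extract():
    # Stand-in for reading from a source system.
    return [{"name": " ada LOVELACE ", "country": "dk"},
            {"name": "alan turing", "country": "GB"}]

def transform(rows):
    # Standardize casing and strip stray whitespace before loading.
    return [{"name": r["name"].strip().title(), "country": r["country"].upper()}
            for r in rows]

def load(rows, target):
    # Stand-in for writing to a warehouse table.
    target.extend(rows)

warehouse_table = []
load(transform(extract()), warehouse_table)
```

Under ELT, by contrast, the raw rows would be loaded first and the `transform` step would run inside the target system afterwards.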
Entity:
A distinct object (like a customer, product, or location) about which data is stored.
Federated MDM:
An approach where master data is managed across multiple systems and processes but can be accessed and used in a unified manner.
Golden Record:
The most trusted version of a record in a system. In MDM, it refers to the most accurate and complete version of a data entity.
Graph Database:
A database that uses graph structures (nodes and relationships) to represent and store data without restricting it to a predefined model, so that the connections between the data are as important as the data itself.
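As a toy illustration of the idea (not any particular graph database's API), data can be held as nodes plus typed relationships that are cheap to traverse; the people and company below are invented:

```python
# Edges as (source, relationship, target) triples; relationships are
# first-class data, not joins reconstructed at query time.

edges = [
    ("ada", "WORKS_AT", "acme"),
    ("alan", "WORKS_AT", "acme"),
    ("ada", "KNOWS", "alan"),
]

def neighbours(node, relation):
    """Follow one relationship type outward from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

# Traversal example: everyone who works at "acme".
colleagues = [src for src, rel, dst in edges if rel == "WORKS_AT" and dst == "acme"]
```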
Hierarchy Management:
The process of managing and organizing data in a hierarchical structure, such as product categories or organizational charts.
Hub and Spoke Model:
A type of MDM architecture where the MDM solution (hub) is centrally connected to different data sources (spokes).
Identity Resolution:
The process of ensuring that an entity (like a customer) is consistently and uniquely identified across different datasets.
Master Data:
The consistent and uniform set of identifiers and extended attributes that describe the core entities of an enterprise.
Match & Merge:
The process of identifying duplicate records and merging them into a single, most accurate version.
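A minimal sketch of the idea, matching records on a normalized email address and merging by keeping the first non-empty value per field; the matching key, records, and precedence rule are invented for the example:

```python
# Match: group records sharing a normalized email.
# Merge: first non-empty value per field wins within a group.

def merge(records):
    merged = {}
    for record in records:
        for field, value in record.items():
            if field not in merged or not merged[field]:
                merged[field] = value
    return merged

def match_and_merge(records):
    groups = {}
    for record in records:
        key = record["email"].strip().lower()  # normalized matching key
        groups.setdefault(key, []).append(record)
    return [merge(group) for group in groups.values()]

golden = match_and_merge([
    {"email": "Ada@Example.com", "name": "Ada Lovelace", "phone": ""},
    {"email": "ada@example.com ", "name": "", "phone": "+44 123"},
])
```

Real MDM platforms use far richer matching (fuzzy comparison, weighted scores) and configurable survivorship rules in place of the "first non-empty" rule here.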
Metadata:
Data that describes other data. It provides information about a certain item's content.
Metadata Management:
The management of data that describes other data, ensuring consistency, accessibility, and proper usage of metadata across the organization.
Modern MDM (Master Data Management):
An evolved approach to MDM that leverages new technologies and methodologies to manage master data. Modern MDM often incorporates:
- Real-Time Processing: Data is processed as it's generated or received, allowing for real-time insights and actions.
- Schema-On-Read: Data schema is defined when it's read, offering flexibility in handling diverse data sources.
- Decentralized Architecture: Data can be managed and accessed across multiple systems, often leveraging cloud technologies.
- Collaborative Implementation: A more inclusive approach where business and IT teams collaborate closely, ensuring the MDM solution meets real-world needs.
- Flexible Data Models: Adaptable data models that can evolve with business changes.
- AI and Machine Learning: Leveraging AI technologies to improve data quality, match/merge processes, and derive insights.
Normalization:
A database design technique that reduces redundancy and dependency by organizing data into separate tables based on logical relationships.
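As a small illustration, a flat order table that repeats customer details on every row can be split into separate customer and order tables linked by a key (the tables below are invented):

```python
# Denormalized input: customer name repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": "C-1", "customer_name": "Acme", "amount": 100},
    {"order_id": 2, "customer_id": "C-1", "customer_name": "Acme", "amount": 250},
    {"order_id": 3, "customer_id": "C-2", "customer_name": "Globex", "amount": 80},
]

# Normalized output: customer details stored once, orders reference them by key.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer_id"]] = {"customer_id": row["customer_id"], "name": row["customer_name"]}
    orders.append({"order_id": row["order_id"], "customer_id": row["customer_id"], "amount": row["amount"]})
```

A name change for "Acme" now touches one customer row instead of every order.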
Ontology:
A data model that defines the structure of data, including the types of data, the relationships between them, and the rules governing data.
Operational Data:
Data that is generated as a result of day-to-day business operations.
Reference Data:
Data used to classify or categorize other data, such as country codes or product codes.
Registry MDM:
An MDM approach where the master data is stored as a reference or index, rather than consolidating data into a physical master record.
Schema Evolution:
The ability to adapt the schema of a database as the model or requirements change over time.
Schema Mapping:
The process of creating associations or mappings between two or more data models.
Semantic Consistency:
Ensuring that data is consistently categorized and labelled across the organization.
Single View (or 360-Degree View):
A holistic, single view of an entity (like a customer) derived from data consolidated across various sources.
Source System:
A software system that is used as the source of data for the MDM system.
Structured Data:
Data that is organized in a specific manner or model, often in relational databases, making it easily searchable.
Survivorship Rules:
Rules that determine which data attributes will be chosen in the final master record during the match and merge process.
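As an illustrative sketch, a simple survivorship rule might let the value from the highest-priority source win for each attribute; the source names, priorities, and records below are invented:

```python
# Lower priority number wins; empty values never survive.
SOURCE_PRIORITY = {"crm": 1, "webshop": 2, "legacy": 3}

def survive(records):
    """Build a master record, taking each field from the best-ranked source that has it."""
    golden = {}
    for record in sorted(records, key=lambda r: SOURCE_PRIORITY[r["source"]]):
        for field, value in record.items():
            if field != "source" and field not in golden and value:
                golden[field] = value
    return golden

golden_record = survive([
    {"source": "legacy", "name": "ACME Corp", "phone": "+45 11 22 33"},
    {"source": "crm", "name": "Acme Corporation", "phone": ""},
])
```

Here the CRM's name outranks the legacy name, but the phone number survives from the legacy system because the CRM's value is empty. Real platforms also offer recency, completeness, and per-attribute rules.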
Syndication:
The process of distributing and sharing master data across different IT systems and departments.
Traditional MDM (Master Data Management):
A data management approach that centralizes, cleanses, and synchronizes a company's master data across systems and applications. Traditional MDM often relies on:
- Batch Processing: Data is processed in large batches rather than in real-time or near-real-time.
- Schema-On-Write: Data schema is defined before data ingestion.
- Centralized Architecture: A single, central repository where master data is stored and managed.
- Top-Down Implementation: MDM solutions are often driven by IT departments, with business stakeholders providing requirements.
- Rigid Data Models: Predefined data models that might not easily adapt to changing business needs.
Transactional Data:
Data related to business transactions, such as purchases. Unlike master data, which remains fairly consistent, transactional data changes frequently.
Unified Data Model:
A framework for representing data objects and their relationships in a single, consistent structure.
Unstructured Data:
Data that doesn't have a specific form or structure, such as emails, social media posts, and documents.
Versioning:
The process of tracking and managing changes to data over time.
Zero Upfront Modeling:
An approach in modern MDM solutions that allows data to be ingested in its raw form without predefined models, deriving structure and meaning from it over time.