In the past few years, the volume of data used by organizations has grown exponentially. Let’s face it: it has become vital for organizations to understand their data sets and organize them. In parallel, a new series of data integration disciplines has developed, with a whole new jargon: data migration, data warehousing, data synchronization, big data and business intelligence (just to name a few) have all become buzzwords. I’ve been wanting to put together a glossary of the most common data integration terms for a while... Here it is.
Data integration is the action of combining data from different sources and providing users with a unified view of that data. It has a clear commercial use for companies that want to integrate different systems (for example a CRM and an ERP) to avoid silos of information. The expression is common today, but in reality it isn’t that old: the first data integration system driven by structured metadata was designed at the University of Minnesota in 1991, for the Integrated Public Use Microdata Series.
Data migration is the process of transferring data from one system to another while changing the storage, database or application. Organizations typically use data migration to move to or from a hardware platform, upgrade a database or switch to new software. Another typical scenario is when two companies merge and need to consolidate their parallel systems into one.
Data synchronization is about establishing consistency between a source and a target data store, in both directions, and continuously harmonizing this data over time. Organizations typically use data synchronization to keep their different systems consistent. Generally, data synchronization should not be considered a “one-off” task: a continuous effort is required to maintain consistency.
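To make the idea more concrete, here is a minimal sketch of a two-way synchronization between two in-memory stores. Everything in it is an assumption made for illustration: the `crm` and `erp` dictionaries, the `updated` counter used as a timestamp, and the "most recent change wins" rule. Real synchronization tools usually need more careful conflict handling than this.

```python
def synchronize(a, b):
    """Two-way sync: for each key, copy the most recently updated record
    to the store that is missing it or holds an older version."""
    for key in set(a) | set(b):
        rec_a, rec_b = a.get(key), b.get(key)
        if rec_a is None or (rec_b is not None and rec_b["updated"] > rec_a["updated"]):
            a[key] = dict(rec_b)  # b has the only or the newer version
        elif rec_b is None or rec_a["updated"] > rec_b["updated"]:
            b[key] = dict(rec_a)  # a has the only or the newer version
        # equal timestamps: the records are treated as already consistent


# Hypothetical customer records held in two separate systems.
crm = {
    "C1": {"name": "Acme Ltd", "updated": 2},
    "C2": {"name": "Globex", "updated": 1},
}
erp = {
    "C1": {"name": "Acme", "updated": 1},
    "C3": {"name": "Initech", "updated": 3},
}

synchronize(crm, erp)
print(crm == erp)
```

Because synchronization is a continuous effort, a function like this would be run repeatedly (on a schedule or on every change) rather than once.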
Data warehousing refers to systems used for reporting and analyzing data, and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports (for example, annual or quarterly comparisons and trends, daily sales analysis, etc.) for users throughout an organization.
ETL stands for Extract-Transform-Load and refers to the process in which data is loaded from a source system to a data warehouse system in three steps:
- Step 1 Extract: In this step, data is extracted from the source system and made accessible for further processing.
- Step 2 Transform: This step transforms the data from the source to the target according to a set of rules. This includes converting any measured data to the same dimension using the same units so that they can later be joined.
- Step 3 Load: This step loads the data into the end target, which can be a file, a database or a data warehouse.
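The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the `products` table and the unit conversion rule are made up, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source export: weights recorded in mixed units.
SOURCE_CSV = """id,product,weight,unit
1,Widget,2.5,kg
2,Gadget,500,g
3,Gizmo,1.2,kg
"""

# Transformation rule (assumed): express every weight in kilograms.
TO_KG = {"kg": 1.0, "g": 0.001}


def extract(text):
    """Step 1: read raw rows from the source and make them accessible."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: convert measured data to the same unit so rows are comparable."""
    return [
        (int(r["id"]), r["product"], float(r["weight"]) * TO_KG[r["unit"]])
        for r in rows
    ]


def load(records, conn):
    """Step 3: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(id INTEGER PRIMARY KEY, name TEXT, weight_kg REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real target system
load(transform(extract(SOURCE_CSV)), conn)
rows = conn.execute("SELECT name, weight_kg FROM products ORDER BY id").fetchall()
print(rows)
```

Keeping each step in its own function mirrors the definition above and makes it easy to swap the source or the target without touching the transformation rules.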
Data visualization provides a visual representation of data. The goal here is to communicate information clearly and efficiently via statistical graphics, plots, and information graphics.
Data cleansing (data cleaning is also used) is the process of detecting and correcting/removing corrupt or inaccurate records from a record set, table, or database. The data cleansing process often consists of identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
Data cleansing should be a focus of any organization, because a system whose data cannot be trusted will not be used. It is also a necessary step before any system integration project.
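As a rough illustration of what detecting and correcting or removing records can look like, here is a small sketch. The sample records and the rules (trim and normalize case, require an "@" in the email, reject repeated ids) are assumptions chosen for the example, not a complete cleansing policy.

```python
# Hypothetical customer records containing typical "dirty" data.
records = [
    {"id": 1, "email": "ann@example.com", "country": "DK"},
    {"id": 2, "email": "  BOB@Example.com ", "country": "dk"},   # inconsistent formatting
    {"id": 3, "email": "", "country": "US"},                     # incomplete
    {"id": 4, "email": "not-an-email", "country": "US"},         # incorrect
    {"id": 1, "email": "ann@example.com", "country": "DK"},      # duplicate
]


def cleanse(rows):
    """Return (clean, rejected): corrected records and the ones set aside."""
    clean, rejected, seen_ids = [], [], set()
    for row in rows:
        # Correct what can be corrected: whitespace and letter case.
        email = row["email"].strip().lower()
        country = row["country"].strip().upper()
        # Deliberately simplistic validity rule, for illustration only.
        if "@" not in email:
            rejected.append(row)
            continue
        # Remove records whose id has already been accepted.
        if row["id"] in seen_ids:
            rejected.append(row)
            continue
        seen_ids.add(row["id"])
        clean.append({"id": row["id"], "email": email, "country": country})
    return clean, rejected


clean, rejected = cleanse(records)
print(len(clean), "clean,", len(rejected), "rejected")
```

Returning the rejected records instead of silently dropping them lets someone review them, which helps build the trust in the data mentioned above.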
Data harmonization consists of combining data from different sources and providing users with a comparable view of data from different studies. Thanks to data harmonization, organizations can combine data from heterogeneous sources into integrated, consistent and unambiguous information products.
Data standardization is the process of bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies.
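A common small-scale example of bringing data into a common format is date handling. The sketch below assumes that incoming dates arrive in one of a few known layouts and converts them all to ISO 8601 (`YYYY-MM-DD`). The list of accepted formats is an assumption for illustration, and day-first order is assumed for the slash and dot notations.

```python
from datetime import datetime

# Formats assumed to occur in the source systems (day-first where ambiguous).
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%Y%m%d"]


def standardize_date(value):
    """Return the date as an ISO 8601 string, trying each known format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next one
    raise ValueError(f"Unrecognized date format: {value!r}")


for raw in ["2016-03-05", "05/03/2016", "5.3.2016", "20160305"]:
    print(raw, "->", standardize_date(raw))
```

Raising an error for unknown layouts, rather than guessing, keeps badly formatted values visible so they can be handled explicitly.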
Master Data Management
Master Data Management is the process of identifying the most critical information within an organization and linking all of this data into one single source, called a master file, that provides a common point of reference. As a result, Master Data Management streamlines data sharing among personnel and departments, and facilitates the use of different system architectures, platforms, and applications. It helps organizations address the challenges they experience around consistent reporting and regulatory compliance, as well as the strong interest in Service-Oriented Architecture (SOA) and Software as a Service (SaaS).
Big data is significantly changing the way people within organizations work together, giving employees deeper insight to make better decisions. Big data is a term that describes a volume of data so large or complex that traditional data processing applications are inadequate to deal with it, and the data becomes difficult to analyze, capture, search, share, store, transfer, visualize, or update. One of the challenges is that today, big data is being generated by everything around us at all times. It is produced by digital processes and social media exchanges, but also by systems, sensors and mobile devices. These increasing volumes make it difficult to extract meaningful value from the generated data.
Any successful big data project has to include big data integration, which includes:
- Finding and discovering information sources.
- Profiling information.
- Understanding the data and the value of the data.
- Tracking it through metadata.
- Analyzing and improving data quality.
- Transforming data into a form that can be used in big data analysis.
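The profiling step in the list above can be illustrated with a small sketch. It assumes the data is available as a list of dictionaries and reports, per field, how often a value is present, how many distinct values occur and which Python types appear. The sample sensor readings and the choice of metrics are hypothetical, and a real big data project would run this kind of check with tools built for much larger volumes.

```python
from collections import Counter


def profile(rows):
    """Return basic quality metrics for each field found in the rows."""
    if not rows:
        return {}
    fields = sorted({key for row in rows for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "fill_rate": len(present) / len(rows),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sensor readings with gaps and inconsistent types.
readings = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s2", "temp": None},
    {"sensor": "s1", "temp": "22"},
    {"sensor": "", "temp": 19.0},
]

report = profile(readings)
for field, stats in report.items():
    print(field, stats)
```

A report like this points at where data quality needs to be analyzed and improved, for example a field that mixes numbers and text, before the data is transformed for analysis.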
At a time when data integration is more and more often performed on demand, I could not avoid including SSL encryption in this list. SSL (Secure Sockets Layer) is the security technology for establishing an encrypted link between a web server and a browser; today the name is still widely used, although the protocol itself has been succeeded by TLS (Transport Layer Security). At RapidiOnline, we have specialized in providing safe and secure data integration solutions by using SSL encrypted data transmission without storing data outside the systems we integrate (for example Salesforce and Microsoft Dynamics NAV). You can have a look at our technology and our integration solutions to learn more.
This list is not comprehensive, and I might have missed a couple, but it is a start. Do feel free to drop me a comment and add the data integration term(s) you believe should be a part of this list. I’d be happy to include them in an update.
See how to get started with your data integration project in 10 easy steps: