The term "data pipeline" refers to a series of processes that gather raw data and transform it into a format that software applications can use. Pipelines can be batch-based or real-time, can run in the cloud or on-premises, and can be built on commercial or open-source software.
Data pipelines work much like physical pipelines that carry water from a river to your home: they move data from one layer to another, such as from source systems into data lakes or warehouses, where it can be analyzed for insights. In the past, transferring data relied on manual processes such as daily file uploads, followed by long waits for results. Data pipelines replace these manual procedures and allow organizations to move data between layers more efficiently and with less risk.
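The extract-transform-load pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation; the function names, the sample CSV data, and the in-memory "warehouse" list are all invented for the example.

```python
# Minimal batch pipeline sketch: extract raw records, transform them
# into an analysis-ready shape, and load them into a target layer.
import csv
import io

# Illustrative raw input; one record ("bob") is deliberately malformed.
RAW_CSV = "user,amount\nalice,10\nbob,oops\ncarol,25\n"

def extract(raw_text):
    """Read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Clean records: cast amounts to int, dropping malformed rows."""
    clean = []
    for row in rows:
        try:
            clean.append({"user": row["user"], "amount": int(row["amount"])})
        except ValueError:
            continue  # discard bad records instead of failing the whole batch
    return clean

def load(rows, warehouse):
    """Append cleaned rows to the warehouse layer (a plain list here)."""
    warehouse.extend(rows)
    return warehouse

warehouse = load(transform(extract(RAW_CSV)), [])
```

A real pipeline would read from source systems and write to a data lake or warehouse, but the stages compose the same way.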
Accelerate development by using a virtual data pipeline
A virtual data pipeline can significantly reduce infrastructure costs: storage in the data center and in remote offices, plus the hardware, network, and management costs of deploying non-production environments such as test environments. Automating data refresh, masking, and role-based access control, along with the ability to customize and integrate databases, further shortens setup time.
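Role-based masking of the kind mentioned above might look like the following sketch. The field names, roles, and hashing scheme are hypothetical illustrations, not the mechanism any specific product uses; the idea is that sensitive fields are replaced with deterministic pseudonyms so test data stays consistent across refreshes without exposing real values.

```python
# Hypothetical sketch of role-based data masking for non-production copies.
import hashlib

# Fields treated as sensitive in this example.
SENSITIVE_FIELDS = {"email", "ssn"}

def mask_value(value):
    """Deterministically pseudonymize a value via a truncated hash,
    so the same input always maps to the same masked output."""
    return hashlib.sha256(value.encode()).hexdigest()[:10]

def mask_row(row, role):
    """Privileged roles see real data; all other roles get masked copies."""
    if role == "dba":  # example privileged role
        return dict(row)
    return {key: (mask_value(val) if key in SENSITIVE_FIELDS else val)
            for key, val in row.items()}

record = {"user": "alice", "email": "alice@example.com", "ssn": "123-45-6789"}
masked = mask_row(record, role="tester")
```

Deterministic masking preserves referential integrity: the same email masks to the same token in every table, so joins in test environments still work.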
IBM InfoSphere Virtual Data Pipeline (VDP) is a multi-cloud copy data management solution that decouples development and test environments from production infrastructure. It uses patented snapshot and changed-block tracking technology to capture application-consistent copies of databases and other files. From VDP, users can quickly provision masked virtual copies of databases, mount them on VMs in non-production environments, and begin testing within minutes. This is particularly useful for supporting DevOps and agile methodologies and for shortening time to market.