Difference between Server-side and Client-side Deduplication

Tivoli Storage Manager provides two options for performing data deduplication: server-side data deduplication and client-side data deduplication. Both methods use the same process to identify redundant data; however, the time and location of the deduplication processing differ. In server-side data deduplication, processing takes place exclusively on the server after the data is backed up. In client-side data deduplication, the processing is distributed between the server and the backup-archive client during the backup process.

When restoring or retrieving files, the client node queries for and displays files as it typically does. If a user selects a file that exists in a deduplicated storage pool, the server reconstructs the file before restoring it to the client machine.

What is Data Deduplication?

In addition to whole files, IBM Tivoli Storage Manager can also deduplicate parts of files that are common with parts of other files. Data becomes eligible for duplicate identification as volumes in the storage pool are filled. A volume does not have to be full before duplicate identification starts.


The ability to deduplicate data on either the backup-archive client or the server provides flexibility in terms of resource utilization, policy management, and security. You can also combine both client-side and server-side data deduplication in the same production environment. For example, you can specify certain nodes for client-side data deduplication and certain nodes for server-side data deduplication. You can store the data for both sets of nodes in the same deduplicated storage pool.

Backup-archive clients that can deduplicate data can also access data that was deduplicated by server-side processes. Similarly, data that was deduplicated by client-side processes can be accessed by the server. Furthermore, duplicate data can be identified across objects regardless of whether the data deduplication is performed on the client or the server.

Server-Side Data Deduplication

Server-side data deduplication is a two-phase process. In the first phase, the server identifies duplicate data. In the second phase, duplicate data is removed by one of the following server processes:
  • Reclaiming volumes in the primary storage pool, copy storage pool, or active-data pool.
  • Backing up a primary storage pool to a copy storage pool that is also set up for data deduplication.
  • Copying active data in the primary storage pool to an active-data pool that is also set up for data deduplication.
  • Migrating data from the primary storage pool to another primary storage pool that is also set up for data deduplication.
  • Moving data from the primary storage pool to a different primary storage pool that is also set up for data deduplication.
  • Moving data within the same copy storage pool or moving data within the same active-data pool.
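
The two phases can be illustrated with a minimal Python sketch. It is a toy model, not the server's implementation: the in-memory list standing in for a storage pool, the function names identify_duplicates and reclaim, and the use of SHA-256 as the digest are all assumptions made for illustration. The point it shows is that identification only records which extents are duplicates, and space is freed later when a process such as reclamation rewrites the data.

```python
import hashlib


def identify_duplicates(pool):
    """Phase 1: record which extents duplicate an earlier one.

    `pool` is a list of bytes objects standing in for stored extents
    (a hypothetical model). Nothing is removed in this phase.
    Returns a list of (duplicate_index, first_copy_index) pairs.
    """
    first_seen = {}  # digest -> index of the first extent with that digest
    duplicates = []
    for index, extent in enumerate(pool):
        digest = hashlib.sha256(extent).hexdigest()  # digest choice is an assumption
        if digest in first_seen:
            duplicates.append((index, first_seen[digest]))
        else:
            first_seen[digest] = index
    return duplicates


def reclaim(pool, duplicates):
    """Phase 2: rewrite the pool, keeping a single copy of each extent.

    Returns (stored, layout): `stored` holds the unique extents and
    `layout` lists, for each original position, the index into `stored`.
    """
    dup_of = dict(duplicates)  # duplicate index -> index of its first copy
    stored = []
    new_position = {}  # original index of a kept extent -> index in `stored`
    layout = []
    for index, extent in enumerate(pool):
        if index in dup_of:
            # Point at the copy that was already kept; the first copy
            # always appears earlier in the pool than its duplicates.
            layout.append(new_position[dup_of[index]])
        else:
            new_position[index] = len(stored)
            stored.append(extent)
            layout.append(new_position[index])
    return stored, layout
```

The original sequence can be rebuilt from the result with [stored[i] for i in layout], which mirrors how a deduplicated file is reconstructed from its extents on restore.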

Client-Side Data Deduplication 

In client-side data deduplication, the backup-archive client and the server work together to identify duplicate data.

Client-side data deduplication is a three-phase process:
  • The client creates extents.
  • The client and server work together to identify duplicate extents.
  • The client sends non-duplicate extents to the server.
Subsequent client data-deduplication operations create new extents. Some or all of those extents might match the extents that were created in previous data-deduplication operations and sent to the server. Matching extents are not sent to the server again.
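
The three phases above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the product's protocol: the Server class, the backup and restore functions, the fixed 4-byte extent size, and the SHA-256 digest are all hypothetical choices made to keep the example small.

```python
import hashlib

EXTENT_SIZE = 4  # hypothetical fixed extent size, chosen only for the demo


class Server:
    """Toy stand-in for the server's store of deduplicated extents."""

    def __init__(self):
        self.extents = {}  # digest -> extent bytes

    def missing(self, digests):
        """Return the digests the server does not hold yet."""
        return {d for d in digests if d not in self.extents}

    def store(self, digest, data):
        self.extents[digest] = data


def make_extents(data, size=EXTENT_SIZE):
    """Phase 1: the client splits the data into extents."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, server):
    """Run all three phases; return (recipe, bytes_sent).

    The recipe is the ordered list of digests needed to rebuild the data.
    """
    extents = make_extents(data)
    digests = [hashlib.sha256(e).hexdigest() for e in extents]
    # Phase 2: client and server together work out which extents are new.
    needed = server.missing(digests)
    sent = 0
    # Phase 3: only non-duplicate extents travel to the server.
    for digest, extent in zip(digests, extents):
        if digest in needed:
            server.store(digest, extent)
            needed.discard(digest)  # do not resend a repeat within this backup
            sent += len(extent)
    return digests, sent


def restore(recipe, server):
    """Reassemble the original data from the stored extents."""
    return b"".join(server.extents[d] for d in recipe)
```

With an empty Server, backing up b"AAAABBBBAAAACCCC" sends 12 of the 16 bytes, because the repeated AAAA extent is sent once. A later backup of b"AAAABBBBDDDD" sends only the 4 bytes of DDDD, since the other extents match ones sent earlier.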

When configuring client-side data deduplication, the following requirements must be met:
  • The client and server must be at version 6.2.0 or later. The latest maintenance version should always be used.
  • When a client backs up or archives a file, the data is written to the primary storage pool that is specified by the copy group of the management class that is bound to the data. To deduplicate the client data, the primary storage pool must be a sequential-access disk (FILE) storage pool that is enabled for data deduplication.
  • The value of the DEDUPLICATION option on the client must be set to YES. You can set the DEDUPLICATION option in the client options file, in the preference editor of the TSM client GUI, or in the client option set on the Tivoli Storage Manager server. Use the DEFINE CLIENTOPT command to set the DEDUPLICATION option in a client option set. To prevent the client from overriding the value in the client option set, specify FORCE=YES.
  • Client-side data deduplication must be enabled on the server. To enable client-side data deduplication, use the DEDUPLICATION parameter on the REGISTER NODE or UPDATE NODE server command. Set the value of the parameter to CLIENTORSERVER.
         register node <nodename> <nodepasswd> domain=<domainname> deduplication=clientorserver
  • Ensure files on the client are not excluded from client-side data deduplication processing. By default, all files are included. You can optionally exclude specific files from client-side data deduplication with the exclude.dedup client option.
  • Files on the client must not be encrypted. Encrypted files and files from encrypted file systems cannot be deduplicated.
  • Files must be larger than 2 KB and transactions must be below the value that is specified by the CLIENTDEDUPTXNLIMIT option. Files that are 2 KB or smaller are not deduplicated.
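
The per-file conditions in the last three requirements can be summarized as a small Python check. This is only a sketch of the rules as stated above: the FileInfo type, the function name, and the parameters are hypothetical; exclusions are modeled as a plain set of paths rather than real exclude.dedup patterns; and 2 KB is assumed to mean 2048 bytes.

```python
from dataclasses import dataclass

MIN_DEDUP_SIZE = 2 * 1024  # 2 KB, assumed here to mean 2048 bytes


@dataclass
class FileInfo:
    """Hypothetical description of a file that is about to be backed up."""
    path: str
    size_bytes: int
    encrypted: bool = False


def eligible_for_client_dedup(info, excluded_paths, txn_bytes, txn_limit_bytes):
    """Return True if the file passes the per-file checks listed above.

    excluded_paths  -- set of paths excluded from deduplication (a stand-in
                       for exclude.dedup, not its real pattern syntax)
    txn_bytes       -- size of the current transaction, in bytes
    txn_limit_bytes -- the CLIENTDEDUPTXNLIMIT value, already converted to
                       bytes by the caller (the unit handling is an assumption)
    """
    if info.path in excluded_paths:
        return False  # explicitly excluded from client-side deduplication
    if info.encrypted:
        return False  # encrypted files cannot be deduplicated
    if info.size_bytes <= MIN_DEDUP_SIZE:
        return False  # files of 2 KB or smaller are not deduplicated
    return txn_bytes < txn_limit_bytes  # transaction must be below the limit
```

A file that fails the check is still backed up; it is simply sent without client-side deduplication.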

Advantages of Client-Side Data Deduplication

  • It can reduce the amount of data that is sent over the local area network (LAN).
  • The processing power that is required to identify duplicate data is offloaded from the server to client nodes. 
  • Server-side data deduplication is always enabled for deduplication-enabled storage pools. However, files in those storage pools that were already deduplicated by the client do not require additional processing. The processing power that is required to remove duplicate data on the server is eliminated, allowing space savings on the server to occur immediately.
A possible disadvantage of client-side data deduplication is that the server does not have whole copies of client files until you back up the primary storage pools that contain client extents to a non-deduplicated copy storage pool. (Extents are parts of a file that are created during the data-deduplication process.) During storage pool backup to a non-deduplicated storage pool, client extents are reassembled into contiguous files.

By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to non-deduplicated copy storage pools before they can be reclaimed and before duplicate data can be removed. The default ensures that the server has copies of whole files at all times, in either a primary storage pool or a copy storage pool.
