Impact of enabling TSM Deduplication on TSM database and storage pools

Q1. What is TSM deduplication?

  • It is an optional TSM feature that removes redundant data from a disk-based TSM storage pool. Reducing the amount of backup data can reduce the cost of storage associated with backup and may allow more data to be stored on disk for faster access.
  • It is important to consider that deduplication is just one method for data reduction. TSM also uses a progressive incremental backup methodology, which only backs up changed data, and supports client-side compression. Additionally, TSM allows exclusion of individual files from backup operations, which further reduces the data involved in these operations.
Q2. How effective is TSM deduplication?

  • Deduplication effectiveness is usually measured as the ratio of the amount of data before deduplication to the amount of data after deduplication, called the "deduplication ratio". It can also be expressed as a percentage of data reduction. What matters more than deduplication alone, however, is the total reduction of the backup data, which for TSM can include progressive incremental backup, deduplication, and, optionally, compression.
  • TSM deduplication is as effective as any deduplication technology that is available on the market. Deduplication effectiveness is mostly determined by the type of data that is being backed up, and whether the data is unique or repeated. For example, repeated full backups of the same data result in high deduplication ratios, but backing up only changed data (such as with the progressive incremental methodology) results in a lower deduplication ratio. However, with progressive incremental backups the overall data reduction ratio will remain high. Data that is mostly unique and not backed up repeatedly will typically not benefit from deduplication.
  • TSM deduplication ratios typically range from 2:1 (50% reduction) to 15:1 (93% reduction), and are data dependent. Lower ratios are associated with backups of unique data, and higher ratios are associated with backups that are repeated, such as repeated full backups of databases or virtual machine images. Mixtures of unique and repeated data will result in ratios within that range. If you are not sure what type of data you have and how well it will reduce, use 3:1 for planning purposes when comparing with non-deduplicated TSM storage pool occupancy. This ratio corresponds to an overall data reduction ratio of over 15:1 when factoring in the data reduction benefits of progressive incremental backups.
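
The relationship between a deduplication ratio and a percentage reduction quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the idea that separate reduction stages (progressive incremental, deduplication) multiply together is a simplifying assumption, not something TSM reports.

```python
def ratio_to_reduction_percent(ratio: float) -> float:
    """Convert a deduplication ratio N:1 into a percentage of data reduction."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 (1:1 means no reduction)")
    return (1 - 1 / ratio) * 100


def reduction_percent_to_ratio(percent: float) -> float:
    """Convert a percentage of data reduction back into a ratio N:1."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in the range [0, 100)")
    return 1 / (1 - percent / 100)


def combined_ratio(*ratios: float) -> float:
    """Combine reduction stages.

    Assumption for illustration: each stage reduces the output of the
    previous one independently, so the ratios multiply.
    """
    result = 1.0
    for r in ratios:
        result *= r
    return result


if __name__ == "__main__":
    # The ratios mentioned in the text: 2:1, the 3:1 planning value, and 15:1.
    for r in (2, 3, 15):
        print(f"{r}:1 -> {ratio_to_reduction_percent(r):.1f}% reduction")
    # A hypothetical 5:1 gain from progressive incremental combined with
    # 3:1 deduplication gives the 15:1 overall figure.
    print(f"overall: {combined_ratio(5, 3):.0f}:1")
```

Under that multiplicative assumption, an overall ratio of more than 15:1 at 3:1 deduplication implies that progressive incremental backup alone contributes more than 5:1.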
Q3. How does TSM deduplication affect backup and restore performance?

  • With client-side deduplication, client backup elapsed times can be longer compared to backups to a disk storage pool that is not deduplicated. However, when the backup network is constrained, backup elapsed times can be shorter when using client-side deduplication. The use of server-side deduplication does not directly affect backup throughput.
  • Throughput for storage pool backup operations from a deduplicated storage pool to a storage pool that is not deduplicated is slower when compared to backing up a storage pool that is not deduplicated.
  • Restore throughput from a deduplicated storage pool is generally slower when compared to restore from a disk-based storage pool that is not deduplicated. However, when compared to restore performance from physical tape, restore from a disk-based deduplicated storage pool can be much faster.
Q4. What are the hardware prerequisites for using TSM deduplication?
Q5. How do I decide between using TSM's server-side or client-side deduplication?
Here are some circumstances when you should consider using TSM client-side deduplication:
  • You wish to achieve the highest potential data reduction, since client-side deduplication can be combined with compression.
  • You wish to distribute the workload across client systems rather than perform deduplication processing in the TSM server.
  • Bandwidth between the client and server is constrained.
Here are some circumstances when you should consider using server-side deduplication:
  • The fastest possible backup time is required to meet service-level agreements.
  • You require the shortest possible window for producing non-deduplicating storage pool copies (such as for shipping offsite).
  • CPU resources on the client host system are inadequate to support the additional processing required by client-side deduplication during scheduled backup processing.
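
The criteria above can be turned into a small decision helper. The sketch below is not an official TSM tool: the `BackupEnvironment` fields, the `suggest_dedup_side` function, and especially the order in which the criteria are checked are assumptions chosen for illustration. Adjust the precedence to match your own priorities.

```python
from dataclasses import dataclass


@dataclass
class BackupEnvironment:
    """Hypothetical description of a backup environment."""
    network_constrained: bool        # limited client-to-server bandwidth
    client_cpu_adequate: bool        # clients can absorb deduplication work
    fastest_backup_required: bool    # SLA demands the shortest backup time
    want_compression: bool = False   # aiming for the highest data reduction


def suggest_dedup_side(env: BackupEnvironment) -> str:
    """Return "client-side" or "server-side" for the given environment.

    The precedence of the checks is an assumption of this sketch.
    """
    # Client-side deduplication is not viable without spare client CPU.
    if not env.client_cpu_adequate:
        return "server-side"
    # A constrained network favors reducing data before it is sent.
    if env.network_constrained:
        return "client-side"
    # Server-side deduplication does not directly affect backup throughput.
    if env.fastest_backup_required:
        return "server-side"
    # Client-side deduplication can be combined with compression.
    if env.want_compression:
        return "client-side"
    # No strong signal either way; default chosen arbitrarily for the sketch.
    return "server-side"


if __name__ == "__main__":
    branch_office = BackupEnvironment(
        network_constrained=True,
        client_cpu_adequate=True,
        fastest_backup_required=False,
    )
    print(suggest_dedup_side(branch_office))
```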
Q6. Which TSM features and options are incompatible or not supported with deduplication?

  • Client-side encryption is incompatible with TSM deduplication. However, TSM deduplication can be used together with SSL (encryption of data in flight) or encryption by the storage device.
  • LAN-free backup is not supported for client-side deduplication. However, LAN-free backup can be used with server-side deduplication.
  • Simultaneous write
  • Subfile backup
  • UNIX HSM
  • Client-side compression should not be used with server-side deduplication (since compressed objects do not deduplicate well). However, client-side compression used in conjunction with client-side deduplication can provide an effective means to further reduce storage pool data.
Q7. How do I determine how much storage I have saved by using TSM deduplication?
The easiest way to determine deduplication storage savings is to use the administrative command "query stgpool f=d".

The value of the "Duplicate data not stored" field shows the number of bytes saved and the percentage of savings. Note that for server-side deduplication this value is not updated until after reclamation processing occurs.
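
To relate the bytes-saved figure to the deduplication ratios discussed under Q2, the arithmetic can be sketched as follows. The function name and the sample numbers are hypothetical, and the sketch assumes the percentage is taken relative to the amount of data before deduplication (bytes still stored plus bytes not stored).

```python
def dedup_savings(stored_bytes: float, saved_bytes: float) -> tuple[float, float]:
    """Return (percent_saved, dedup_ratio) for a storage pool.

    stored_bytes: data physically kept in the pool after deduplication
    saved_bytes:  duplicate data that was not stored
    Assumption: savings are measured against stored_bytes + saved_bytes.
    """
    if stored_bytes <= 0 or saved_bytes < 0:
        raise ValueError("stored_bytes must be > 0 and saved_bytes >= 0")
    logical_bytes = stored_bytes + saved_bytes   # size before deduplication
    percent_saved = saved_bytes / logical_bytes * 100
    dedup_ratio = logical_bytes / stored_bytes   # expressed as N:1
    return percent_saved, dedup_ratio


if __name__ == "__main__":
    # Made-up example: 250 GB kept in the pool, 750 GB of duplicates not stored.
    percent, ratio = dedup_savings(250, 750)
    print(f"{percent:.0f}% saved, ratio {ratio:.0f}:1")
```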
