Optimizing Backup Performance Using Data Science Techniques


One of the most important tasks for a database administrator is taking (and testing!) backups. As databases get larger and larger, the time it takes to perform a backup can grow as well, to the point where your backups take longer than your available backup window. There are several settings we can use to optimize backup performance, such as buffer count, maximum transfer size, and the number of backup files, but trying every combination of settings on a single production-sized database could take weeks or even months. In this talk, we will apply data science techniques to the problem of optimizing backup settings and look at different models for approaching the problem and analyzing the data. Some statistics background would be helpful but is not required; the big requirement is a desire to speed up backups.



I presented this for the Microsoft Data Platform Business Continuity Virtual Group on 2023-04-11.