Working Effectively With Legacy SQL

Kevin Feasel (@feaselkl)

http://CSmore.info/on/legacysql

Who Am I? What Am I Doing Here?

Catallaxy Services
@feaselkl
Curated SQL
We Speak Linux

What Is Legacy Code?

Neutral connotations:

  • Code I inherited
  • Older code I need to maintain

Negative connotations:

  • Code without unit tests (Michael Feathers)
  • "Rotten" code (not well maintained)
  • Code stuck on an older/obsolete platform

My working definition is "rotten" code: code which has not been maintained very well. This code often lacks tests, but the key characteristic is the amount of pain necessary to make changes.

Motivation

Legacy SQL will de-motivate you over time. You will spend too much time debugging terrible procedures. In order to fulfill even simple business requests, you will change lots of difficult code, including code you don't even know is obsolete. Even after you leave, your successor will curse your name.

Older code doesn't need to be legacy code. Older code does need TLC, however.

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Prerequisites

Source Control

You need source control for your SQL code. Source control gives you the ability safely to experiment. If your experiment fails, roll back your changes.

Use a distributed source control system like Git or Mercurial, or a centralized system like SVN or Team Foundation Version Control. What you use is less important than simply having something.

Prerequisites

Repeatable, Working Tests

Object-oriented languages have fully-featured test frameworks like NUnit to create unit tests. Developers can break apart small sections of functionality and test those sections controlling for that code's depdendencies.

T-SQL has a very limited version of database unit and integration testing. The most popular tools for these kinds of tests are tSQLt and Visual Studio database test projects.

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Format Your SQL

Formatting SQL is the lowest-risk form of refactoring, as you aren't making any logical code changes. If you format your SQL well, however, you can make your code more readable. Readable code is (potentially) understandable code.

Code is for humans as much as it is for compilers and interpreters!

Format Your SQL

Demo Time

Format Your SQL

Benefits

  • Lower risk of bugs due to misunderstanding code
  • Lower risk of bugs due to forgetfulness (e.g., nested IF statements)
  • Faster and easier maintenance
  • The process helps you understand the code better
  • Formatting holy wars (okay, not a benefit)

Format Your SQL

Basic Recommendations

  • Consistent casing for keywords and attributes
  • Use capital letters (LEFT OUTER JOIN) to offset keywords from attributes
  • Consistent indentation levels
  • One identifier per line
  • Use spacing to clarify intent--be flexible but (generally) consistent
  • Save time by using a tool

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Simplify Your Code

Simple is better than complex.
Complex is better than complicated.

A few examples of complicated code:

  • Cursors and WHILE loops generating and executing dynamic SQL
  • Chains of stored procedures with temp tables holding state
  • Unrepeatable processes which kick off other processes which...
  • Deeply nested IF statements

Simplify Your Code

Tests help us simplify code because we can modify code inside a procedure while knowing that we maintain the same contract: given a particular set of inputs, we generate a particular set of outputs.

We will now look at a series of methods to simplify code.

  • Use the APPLY operator to simplify calculations
  • Use static SQL in place of dynamic SQL
  • Turn cursors and WHILE loops into set-based statements
  • Simplify filters
  • Simplify overall procedure design

Simplify Your Code

Simplify Calculations Using the APPLY Operator

We have a fact table and we need to calculate measures based on the values in the fact table.

Simplify Your Code

Simplify Calculations Using the APPLY Operator

Demo Time

Simplify Your Code

Simplify Calculations Using the APPLY Operator

The results are the same and the execution plans are almost the same, but the version with the APPLY operator is easier to read and easier to maintain.

With good database tests in place, this kind of change is extremely low-risk but potentially high-reward.

Simplify Your Code

Use Static SQL

Dynamic SQL is sometimes necessary, but using it when you don't need to complicates things needlessly.

The problem with dynamic SQL is that it's harder to read (red text), does not provide syntax checking or Intellisense, and is harder to format: you either format the dynamic SQL or the procedure's static SQL, but you can't get both.

Simplify Your Code

Use Static SQL

Demo Time

Simplify Your Code

Avoid RBAR Operations

Cursors and WHILE loops are examples of RBAR (Row By Agonizing Row). Not only do they have a negative performance impact, they also have the ability to make code needlessly complicated.

Simplify Your Code

Avoid RBAR Operations

Demo Time

Simplify Your Code

Avoid RBAR Operations

Ways in which RBAR complicates our code:

  • Need to keep track of more variables
  • Queries end up taking more lines of code
  • Cursors especially have extra syntactical cruft which draws attention from important code
  • We often need extra temp tables or table variables to store intermediate results

Cursors and WHILE loops may sometimes be necessary, but ditching them when they're unnecessary will simplify your code base and improve performance to boot.

Simplify Your Code

Simplify Filters

Look for "filters" (in the WHERE or SELECT clauses) which are needlessly complicated.

Simplify Your Code

Simplify Filters

Demo Time

Simplify Your Code

Simplify Filters

In this scenario, our query became a little less readable for a complete neophyte, but much shorter and easier to read overall. Even a relatively new T-SQL developer can run this code and quickly learn a new trick.

Look for walls of text, duplicated statements, and logically equivalent values in your code. Another example: unnecessary nested CASE statements.

Simplify Your Code

Simplify Overall Procedure Design

Here is a grab bag of procedure design tips:

  • Eliminate unnecessary temporary tables
  • Eliminate unnecessary code; use your source control system!
  • Make sure your variables and aliases are clear and make sense
  • Take advantage of language constructs

Simplify Your Code

Simplify Overall Procedure Design

Demo Time

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Simplify Your Design

Once we have solid tests in place, we can look at refactoring entire designs instead of simply refactoring code within a single procedure.

This type of scenario will be most common if you have nested stored procedures. Then, treat the internal procedures as the equivalent of private methods and refactor around a contract.

Retain your outward-facing procedures' input and output signatures, and focus mostly on refactoring those "private" procedures.

Simplify Your Design

Summary of Issues:

  1. CostLog table serves multiple, independent purposes
  2. GetforService is not idempotent
  3. Get procedure is obsolete
  4. Insert procedure "weird"
  5. Garbage collection procedure doesn't really GC
  6. Column named "GUID"
  7. Surrogate date field (JDate)
  8. SendAlerts attribute unused

Simplify Your Design

Summary of Changes:

  1. CostLog split into two tables
  2. All Get procedures made idempotent
  3. Unused procedures dropped
  4. Insert procedure handled by external code
  5. All deleting handled by calling code
  6. "GUID" renamed to "FileName"
  7. Real date field used
  8. SendAlerts attribute dropped

Simplify Your Design

Step-by-Step Process:

  1. Get list of affected objects
  2. Generate high-level diagram
  3. Document existing design
  4. Incremental changes on paper
  5. Collaborate: consumers and peers
  6. Iterate, iterate, iterate
  7. Simplify code in the new design

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Remove Obsolete Code

Code is a liability: every line of code requires somebody to maintain it; code takes time to parse, compile, interpret, or run; and more mental overhead for developers.

If that line of code is not solving a problem, get rid of it!

Example: eliminating 20% of the code base with zero effect on users

Remove Obsolete Code

Demo: Obsolete Code Inside A Procedure

Remove Obsolete Code

We tend to hoard code, saving things "just in case." Instead of that, use source control and let a server hoard behind the scenes. Be aggressive in cleaning up code in comments or which is logically impossible.

Remove Obsolete Code

Finding obsolete code:

  • Plan cache and procedure cache
  • SQL Agent jobs
  • Searches in source control
  • SSIS packages
  • sys.sql_modules

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Modernize Your Code

Modernization On The Small

Don't forget to take advantage of new functionality in SQL Server: window functions instead of cursors for running totals, data types like DATE, and functions like CONCAT to put together strings.

Modernize Your Code

Modernization On The Small

Demo Time

Modernize Your Code

Modernization On The Large

Look back at previous designs and figure out when choices were made based on no-longer-existent constraints: too restrictive (or slow) parts of hardware, older versions of SQL Server that didn't have new features, or even developers and architects who had less knowledge and experience than they now have.

Refactoring Legacy Code

  1. Prerequisites
  2. SQL Formatting
  3. Simplify Your Code
  4. Simplify Your Design
  5. Remove Obsolete Code
  6. Modernize Your Code
  7. Improve Durability and Visibility

Improve Durability and Visibility

Error Handling

Microsoft introduced the TRY-CATCH formulation in SQL Server 2005. This offers you a central location for error-handling logic.

Modernize Your Code

Error Handling

Demo Time

Improve Durability and Visibility

Logging Errors

Once you have captured an error, we need to log that error.

Improve Durability and Visibility

Logging Errors

Demo Time

Improve Durability and Visibility

Capture Performance Data

Errors aren't enough. When users complain about slowly-performing queries, we need metrics, and the finer-grained, the better.

Improve Durability and Visibility

Capture Performance Data

Demo Time

Improve Durability and Visibility

Handle Transactions

Transaction handling in SQL Server is a little tricky; nested transactions don't work right, so you want to avoid those whenever possible. How do you avoid them when you don't know the call chain?

Improve Durability and Visibility

Handle Transactions

Demo Time

Improve Durability and Visibility

The net result from these changes is:

  • Lots more code (mostly boilerplate and approximately fixed cost)
  • Appropriate error handling and logging
  • Visibility into performance metrics by parameter selection
  • Transaction handling

Wrapping Up

Legacy code is painful. Get tests in place and refactor that code.

To learn more, go here: http://CSmore.info/on/legacysql

And for help, contact me: feasel@catallaxyservices.com | @feaselkl