Working with legacy code
What?
“source code that relates to a no-longer supported or manufactured operating system or other computer technology.”
Legacy code as code without tests
For some developers, legacy code is the code that was written before they arrived at the company.
Code that: 1) is hard for you to understand, 2) you’re not comfortable changing, and 3) you’re somehow concerned with.
Legacy code is what helped the codebase grow
Without legacy code, you would not have a job
People wrote legacy code without realising it would be legacy code
Best practices change
Technology has changed, e.g. language features and hardware
What was required of the codebase back then has since changed
There were other constraints: time, business, money
Only some parts of a codebase may be legacy
If there are no tests around a piece of code, you won’t be comfortable changing it.
You have to be very cautious when you add a feature, because it could have an impact on the existing code.
You have to spend time writing tests, maybe even refactoring existing code in order to write them, and dealing with regressions.
You need to understand someone else’s writing, or the combined writing of plenty of people.
Think about why they did what they did.
Understand the structures in the code and how they relate to each other.
Legacy code becomes tangled and difficult to understand because of an inconsistent accumulation of changes, made by many people, who sometimes weren’t even employed by the company at the same time.
How to approach legacy code
Attitude
you’re paid for being rational, not for being primal
the best code we can write now might be laughed at in ten years
When viewing code, think:
had I been given the chance, would I have done so much better at that time?
Take ownership
don’t complain if you are not intending to improve the code.
consider that the code you’re working on is your code.
acknowledging that you are the one in charge now of the code
Have a role model
With the right mindset
Learning from bad code
in legacy, what you learn tends to be specific knowledge about how the codebase is organized, rather than how to get better at development
not improving your skills is dangerous in the long term
you still can (and should) evaluate objectively the quality of your code and code you work with
explain why you don’t like it.
What’s wrong with that particular piece of code?
If you don’t like it, there must be something wrong with it, right?
If it’s just a matter of style, then don’t dwell on it
be precise
becoming more familiar with the code, seeing how code is structured and, most importantly: learning what to pay attention to.
to explain it to other people
think about why the code is now better than before you changed it.
elaborate
Have a good reason why
Working with bad code helps you recognise it again and fix it in other places
Work with and read good code
a lot of what writing good code is about comes down to making proper use of abstraction levels, and deciding which abstractions to put in place is the first step to a good design.
design issues stand out to you with more accuracy when reading bad code
Where to find good code
standard library of language
Documentation
implementation
read the code that uses it
Popular and mature external libraries
Open source projects
Understand Legacy code
Get an overview of the code
Stronghold in the code
Pick a place in the code that you understand very well (or at least, better than the rest of the code).
e.g. the inputs and outputs of that feature in the application
It is the starting point for building your map of the code, with directions to other code
You can use a debugger from there
It can be a small unit or line of code that you understand perfectly in the context of the application.
Once you have that reduced scope of where to look in codebase, explore it by reading the code or stepping through it with a debugger.
focus on cues that look related to the unit you’re after, until you step through to the area you were looking for
The stronghold needs to be specific.
A commonly used method is not a useful stronghold
Some business code that calls this function in a context that you know in the application is a better stronghold.
Find someone with some experience of the code to give you a starting point for a stronghold
Starting from the inputs and outputs of the program
If no one has worked in the code, find someone who has used it, i.e. its interfaces
Ask for a common use case and how they interact with the app, e.g. GUI, Postman, etc.
If there is no such person, look at the integration tests
Look at the documentation of the libraries in use, e.g. how to start a server and accept requests
The code must take some input; find where
Use grep
Once you have found a trace of the UI input or output in the code, follow it like a thread until you reach the code of the test case that the business person showed you
To work from the UI layer to the business layer, you need to understand the UI framework
study how information is carried from graphical events to business code and back
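The grep step above can be sketched in Python when a portable script is handier than the command line. This is a minimal sketch, not a replacement for grep: the function name `find_in_sources`, the file names and the UI label "Export invoice" are all made up for illustration.

```python
import tempfile
from pathlib import Path

def find_in_sources(root, needle, pattern="*.py"):
    """Return (file name, line number, line) for every line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.name, number, line.strip()))
    return hits

# Demo on a throwaway directory; the files and the label are hypothetical.
with tempfile.TemporaryDirectory() as root:
    Path(root, "ui.py").write_text(
        'button = Button(label="Export invoice")\n', encoding="utf-8"
    )
    Path(root, "billing.py").write_text(
        "def export_invoice():\n    pass\n", encoding="utf-8"
    )
    hits = find_in_sources(root, "Export invoice")

print(hits)
```

Searching for the exact text a user sees on screen tends to land in the UI layer, which is the end of the thread to start pulling from.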
Analysing code stacks
Fire up the debugger, find a “judicious” place in the code to put a breakpoint, and launch a use case in the application.
A judicious breakpoint is one that is deep in a stack of a typical use case of the application.
the call stack displays in one shot all the layers of the application involved in that use case.
this snapshot provides insights on the architecture of your software: what the main modules and the frameworks are and how they relate together.
Repeat this experiment for several call stacks in the same use case in order to get a grasp of the sequencing of the calls
And for other use cases
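When no interactive debugger is at hand, a similar snapshot can be taken from inside the code. The sketch below uses Python's standard `traceback` module; the three application functions (`handle_click`, `place_order`, `save_order`) are hypothetical layers invented for the example.

```python
import traceback

def snapshot_stack():
    """Return the names of the functions on the call stack, outermost first."""
    # extract_stack() lists this helper as the last frame, so drop it.
    return [frame.name for frame in traceback.extract_stack()[:-1]]

# Hypothetical layers of an application, for illustration only.
def save_order(order):
    # A "judicious" spot: deep in the stack of a typical use case.
    return snapshot_stack()

def place_order(order):
    return save_order(order)

def handle_click(order):
    return place_order(order)

layers = handle_click({"id": 1})
print(" -> ".join(layers[-3:]))
```

Each name in the result is one layer involved in the use case, in calling order.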
Flame graphs
They aggregate the data produced by a performance analysis tool (such as perf on Linux) into a picture representing the stacks that a program went through during its execution.
Display several call stacks at the same time
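To make the aggregation concrete, here is a minimal sketch of the counting that sits behind such a picture: each box corresponds to a call-stack prefix, and its width is proportional to the number of samples that passed through it. The sample stacks are made up; a real tool would read them from a profiler's output.

```python
from collections import Counter

def frame_widths(samples):
    """Map each call-stack prefix to the number of samples that passed through it."""
    widths = Counter()
    for stack in samples:
        for depth in range(1, len(stack) + 1):
            widths[tuple(stack[:depth])] += 1
    return widths

# Hypothetical stack samples, outermost function first.
samples = [
    ["main", "parse", "read_file"],
    ["main", "parse", "read_file"],
    ["main", "render"],
]
widths = frame_widths(samples)

# Print an indented text version of the graph.
for prefix, count in sorted(widths.items()):
    print("  " * (len(prefix) - 1) + prefix[-1], count)
```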
Reading code faster
Speed reading
Code is non fiction
An inspectional reading consists in skimming through the book, looking for places that sum up the information (table of contents, beginning and end of chapters, main messages...).
This skimming of the code allows you to achieve two things:
deciding whether this piece of code is relevant for you and deserves a deeper analysis,
getting a general idea of its meaning before getting into the details.
you do not want to start by reading a function “cover to cover”,
only when you have performed the inspectional reading should you read the code in more detail
Read in detail only the code that is important, thus reducing the amount of code to read
Working your way backwards from the function’s outputs
Check the function name first
should be well named
the name, parameters and return type should be enough to indicate everything you need to know about this function.
If the function signature is poor, look inside
Check the outputs first
For a single return, it is generally the last line
For multiple return statements
find commonality between them all
For void methods
Their output is their side effects
Check whether they modify their parameters or, for object methods, the state of the object.
Modifying global variables
Throwing exceptions
Identifying the terms that occur frequently
Use a word count frequency tool in function/class
look for the objects that appear the most frequently in its code.
The more frequent an object, the more central it is to the function
Highlight them using the IDE
If an object is spread throughout the function, it is important
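A word-frequency count like the one described above takes only a few lines. This is a rough sketch that treats every identifier-like token as a word and ignores Python keywords; the `SAMPLE` function and the helper name `frequent_terms` are invented for the example.

```python
import keyword
import re
from collections import Counter

def frequent_terms(source, top=3):
    """Return the most frequent identifier-like words in a piece of source code."""
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    counts = Counter(word for word in words if not keyword.iskeyword(word))
    return counts.most_common(top)

# Hypothetical function to analyse.
SAMPLE = '''
def bill(order, rates):
    total = order.subtotal()
    if order.is_taxable():
        total += order.subtotal() * rates.tax
    order.set_total(total)
    return order
'''

print(frequent_terms(SAMPLE))
```

Here `order` comes out on top, which matches the intuition that the function is mostly about the order object.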
Understanding how inputs are used
examine what it does with its inputs.
If an input is only used at the beginning to get another object, then that other object may be more useful to follow
Another cue is the intensive use of a word in one portion of the code, with very few usages outside of that portion
This portion of code is focused on using a particular object, which can clarify the responsibilities of the portion of code
Filtering on control flow
A way to instantly squeeze a long function into a few lines of code is to filter on its control flow keywords: if, else, for, while, switch, case, try, catch
Hide all the lines of the function except those that contain the control flow keywords
This shows the skeleton of the function, i.e. its table of contents
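If the editor has no such filter, a small script can do it. The sketch below is deliberately naive: it matches keywords with a regular expression, so a keyword inside a comment or string would also match. `elif` and `except` are added to the list for Python sources, and the `SAMPLE` function is made up.

```python
import re

# Keywords from the list above, plus Python's elif/except.
CONTROL_FLOW = re.compile(
    r"\b(if|else|elif|for|while|switch|case|try|catch|except)\b"
)

def control_flow_skeleton(source):
    """Keep only the lines that contain a control flow keyword."""
    return [line for line in source.splitlines() if CONTROL_FLOW.search(line)]

# Hypothetical function to skim.
SAMPLE = """\
def process(items):
    results = []
    for item in items:
        value = load(item)
        if value is None:
            log("missing")
        else:
            results.append(value)
    return results
"""

print("\n".join(control_flow_skeleton(SAMPLE)))
```

Keeping the original indentation in the output preserves the nesting, which is most of what the skeleton is meant to show.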
Distinguishing the main action of the function
in a function, not all lines contain the main action
Some lines are merely secondary quests, like getting a value, logging a piece of information, or preparing a secondary character.
These are not what you should focus on
To locate the main action, you can quickly scan every line of the function, and determine if it looks like the main action, even if only with a gut feeling.
Don’t spend too long on this
achieving a 100% understanding of a long function the first time you encounter it is not always a good objective.
It comes at a cost in time
And you may not need a full understanding to change the code or add a feature
When to spend time understanding
The functions worth the time can be functions that have a bug in a corner case that you need to fix, functions that you choose to refactor, or high-level functions that show the structure of an important feature of the application
Understand code in detail
Using “practice” functions to improve your code-reading skills
A “practice” function is a big function that has a complex implementation but that has little to no dependency on anything else
It is self contained
It helps you get familiar with the code and its style, which are not always of a good standard of quality
Decoupling the code
Start refactoring the code
This changes the structure of the code, or puts a structure in place
Decoupling entities
Break down a function into sub-functions
Keep only the control flow in the original function and factor out each sub-step or special case into its own separate, well-named function
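As an illustration of that kind of extraction, here is a small before/after sketch. The order-shipping logic, the discount rule and all function names are invented for the example; the point is only the shape of the refactoring.

```python
def ship_order_before(order):
    """Original shape: control flow and details mixed together."""
    if order["items"]:
        total = 0
        for price, quantity in order["items"]:
            total += price * quantity
        if total > 100:
            total -= 10
        return {"status": "shipped", "total": total}
    return {"status": "rejected", "total": 0}

def order_total(items):
    """Sub-step: sum up the price of all items."""
    return sum(price * quantity for price, quantity in items)

def apply_bulk_discount(total):
    """Special case: flat discount on large orders (hypothetical rule)."""
    return total - 10 if total > 100 else total

def ship_order(order):
    """Refactored shape: only the control flow remains here."""
    if not order["items"]:
        return {"status": "rejected", "total": 0}
    total = apply_bulk_discount(order_total(order["items"]))
    return {"status": "shipped", "total": total}
```

Keeping the old function around while comparing both on a few inputs is a cheap way to check that behaviour did not change during the extraction.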
decouple data processing from objects
Refactoring is a highly time-intensive activity
You need to see what gives the most bang for the buck
Work with others
Working on your own to understand code you know little about will take longer than working with someone who knows it or has more experience.
Pair programming
Use of a rubber duck
Just explaining it can also help
Knowledge
Knowledge is Power
How to make knowledge flow in your team
The Dailies
Cutting through legacy code
How to find the source of a bug without knowing a lot of code
Maintenance is associated with using a debugger to find the error/bug in the code
The natural way of finding a bug (inefficient):
You receive a bug report related to feature X
You look around the code of feature X
You step through the codebase with the debugger, looking for the cause of the problem.
if you start by looking at the code, you don’t know what you’re looking for
At best, you stumble into the bug
don’t start by looking at the code.
begin by spending time analysing the application while it is running.
Better approach
Step 0: Don’t look at the code
Step 1: Reproduce the issue
Check the bug is there
Check the bug also occurs on the dev machine
If not, it could be environment specific (config, etc.)
Step 2: Perform differential testing to locate the issue
After reproducing bug
Reduce the test case
trying slight variations of the original test case in order to refine the scope of the bug
The aim is to find the simplest test case that reproduces the bug
Step 2a: Start with a tiny difference
Change something small between two configs; if the behaviour changes, the delta is where the issue is
Step 2b: Continue with larger differences
Maybe the differences between the new feature and the old version
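Reducing a test case can be partly mechanised. The sketch below is a simple greedy reduction (drop one element at a time and keep the change if the bug still reproduces), which is a simpler cousin of full delta debugging rather than that algorithm itself. The `still_fails` predicate stands in for "run the scenario and check whether the bug shows up" and is an assumption of the example.

```python
def reduce_test_case(items, still_fails):
    """Greedily remove elements while the failure still reproduces."""
    reduced = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(reduced)):
            candidate = reduced[:i] + reduced[i + 1:]
            if still_fails(candidate):
                reduced = candidate
                changed = True
                break  # restart the scan on the smaller test case
    return reduced

# Hypothetical bug: it only shows up when steps "b" and "e" are both present.
def still_fails(steps):
    return "b" in steps and "e" in steps

print(reduce_test_case(list("abcdef"), still_fails))
```

The elements that survive are the ones whose removal makes the bug disappear, which is exactly the delta to investigate.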
Step 3: Formulate and validate a hypothesis
After step 2, we end up with a probable location for the bug and a method for reliably reproducing it
We formulate a hypothesis about what is causing the incorrect behaviour.
If several things could be wrong, go with your gut feeling
Now look at the code to confirm it; use the debugger here
If this validates the hypothesis, great; if not, repeat this step
Binary search for root cause of bug
Use divide and conquer to get to a bug quickly
Narrow the search by repeatedly splitting up the search space into a good half and a bad half, then looking further into the bad half for the problem.
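The halving described above can be sketched as a binary search over an ordered list of versions, which is the same idea that `git bisect` automates over commits. The sketch assumes the first version is known good, the last is known bad, and everything after the first bad version is also bad; `is_bad` stands in for "build this version and run the reproduction".

```python
def first_bad(versions, is_bad):
    """Find the first bad version, given versions[0] is good and versions[-1] is bad."""
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(versions[middle]):
            bad = middle   # the problem is in the first half
        else:
            good = middle  # the problem is in the second half
    return versions[bad]

# Hypothetical history of 100 versions where the bug appeared in version 73.
checked = []

def is_bad(version):
    checked.append(version)
    return version >= 73

culprit = first_bad(list(range(1, 101)), is_bad)
print(culprit, "found after", len(checked), "checks")
```

The number of checks grows with the logarithm of the search space, which is why this beats stepping through versions one by one.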
What to fix and not fix
Legacy code can be like a bully
Big code base, inconsistent, duplicates, big functions/objects
When you fix something, a regression test fails
Need a strategy to help fix code
The value-based approach
The problem is that there are many areas to fix, but little time
Doing lots of refactoring or a rewrite is not feasible
Need to assess both the value and the cost, for each refactoring that you are considering.
act on the best value/cost ratios
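A rough way to apply this is to write down an estimate of value and cost for each candidate and sort by the ratio. The candidates and numbers below are entirely made up, and the units (hours saved per month, hours of effort) are only one possible choice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Refactoring:
    name: str
    value: float  # estimated benefit, e.g. hours saved per month
    cost: float   # estimated effort, e.g. hours of work

def rank_by_ratio(candidates):
    """Order candidates from best to worst value/cost ratio."""
    return sorted(candidates, key=lambda r: r.value / r.cost, reverse=True)

# Hypothetical candidates with made-up estimates.
candidates = [
    Refactoring("rewrite the billing module", value=30, cost=60),
    Refactoring("split the big function", value=10, cost=5),
    Refactoring("rename confusing variables", value=4, cost=1),
]

for refactoring in rank_by_ratio(candidates):
    print(refactoring.name, refactoring.value / refactoring.cost)
```

In this made-up data the largest refactoring has the highest value but the worst ratio, so it ends up last.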
Sources of costs to think about
Changing prod code
fixing regressions
adding tests
handling conflicts
There will be unknown unknowns that will add to the costs
To refactor, you need to understand the code, which takes time
The cost of the regression depends on two things:
how long it takes to identify what caused the regression
how long it takes to fix it
Both those amounts of time are correlated with how close your tests are to the code.
The farther a test is from the code, the longer it takes to launch and the less often it’s launched
Unit tests that run often reduce the time to fix a regression and increase confidence that the fix is correct
The test that is the farthest from the code is when a client uses the software
Regressions discovered by a user are amongst the longest to identify and fix.
Due to context switching, bad reputation, a new release, research, etc.
Adding tests
time needed to write tests includes thinking of the scenarios to test and implementing them
Most of the time taken in writing tests is doing that one last thing to cut off a dependency so that the code under test can be put into a test harness.
refactoring code to make it more testable
Handling conflicts
good candidates for refactoring are regions of code that get in your way
Hot spots are areas of code where lots of current dev work is occurring.
This can lead to merge conflicts
One way to identify hot spots is to look in your version control system for the parts of the code that tend to be modified very often, and to understand why they need more fixes than the rest of the code
Other potential hot spots are those that come up in bug reports on a regular basis, as well as those where a lot of regressions appear.
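With Git, the list of files touched by each commit can be obtained with `git log --name-only --pretty=format:`, which prints one path per line with blank lines between commits. The sketch below counts how often each path appears in such output; the log text is a made-up sample rather than real history.

```python
from collections import Counter

def change_counts(log_text):
    """Count how many commits touched each file, from name-only log output."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())

# Hypothetical output of: git log --name-only --pretty=format:
LOG = """\
src/billing.py
src/ui/form.py

src/billing.py

src/billing.py
README.md
"""

for path, count in change_counts(LOG).most_common():
    print(count, path)
```

The files at the top of the list are candidates, not verdicts: the next step is still to understand why they change so often.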
Refactoring hotspots takes coordination between devs
Try allowing only one developer at a time to work on a refactoring of a hot spot.
Developers aren’t allowed to exceed one or two days of lock time
Every developer on the team marks off places in the code every time something related to the quality of the code slows them down, while debugging or fixing a bug, or in any other development activity
Valuable refactorings
What annoys you the most in your codebase on a daily basis?
Get the team involved in deciding
Slice up a big function
A big function forces devs to get bogged down in details and keeps them from quickly getting the big picture of what is going on
Identify the responsibilities of that function and split it into sub-functions with good names
Or delegate to another object
If the function is used a lot, it is a prime candidate
Slice up a big object
Object might have lots of responsibilities
Splitting their members allows you to manipulate lighter structures that take up less mental space in the mind of a reader.
Make side effects visible
Making it clear what effects a function has on objects helps following along and being less surprised when debugging code.
Use built-in language features to make objects immutable
Pass in the objects to be modified as function arguments instead of using global variables
Bundle the function that makes a side effect on a piece of data with that data into a class
Make it clear in a function’s name what side effects it has
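Two of these techniques are sketched below in Python: an immutable object built with a frozen dataclass, and a function whose name and explicit argument announce what it modifies. The `Settings` class and both function names are invented for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Settings:
    """Immutable: any attempt to assign to a field raises an error."""
    retries: int

def with_retries(settings, retries):
    """Return a modified copy; the original is left untouched."""
    return replace(settings, retries=retries)

def append_audit_entry_to(audit_log, entry):
    """The name and the explicit argument both announce what gets modified."""
    audit_log.append(entry)

original = Settings(retries=1)
updated = with_retries(original, 3)

try:
    original.retries = 5
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

audit_log = []
append_audit_entry_to(audit_log, "retries changed")
```

With the frozen dataclass, a reader who sees a `Settings` object passed around knows it cannot change underneath them, which removes one thing to track while debugging.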
Use names that make sense
Poorly named objects can send you on a wrong track and make you waste a lot of time.
Don’t Repeat Yourself
Two (or more) identical pieces of code in the codebase means
more code to become familiar with
more places for bugs to settle in
more intellectual strain to fit everything in your head
Duplicated pieces of code that start off as identical tend to evolve in separate directions
Merging two exact duplicates, that haven’t had the time to diverge yet.
This prevents them from diverging
Merging two non-identical duplicates, that have already diverged.
This is harder; you need to find the commonality
A first step is to place them next to each other in the code
Talk about the code in terms of the target design, even before you carry out a refactoring project
Refactoring techniques to reduce function size
Long functions/classes make legacy code hard to work with
So does over-engineered code (lots of patterns/abstractions)