A Python programme that contains all of its code in a single script with thousands of lines,
a long code snippet being repeated dozens of times throughout the source code,
super elegant code that accomplishes miracles in just a few symbols (and might as well be black magic),
functions that contain a ton of different actions under the most generic names
– we all encounter at least some of these on a regular basis in code of our colleagues or in our own.

Why those points are issues might be obvious to many of you, but let’s reiterate the problems here:

  1. Large and complex files with long and complex functions are hard to understand, and thus to maintain and debug.
  2. Large code parts that are repeated multiple times are also harder to maintain — and provide more nesting grounds for bugs.
  3. A short one-liner might feel genius in the moment — but is a pain to understand for anyone who isn’t trained in the dark arts of code golfing.
  4. Unclear, generic or misleading function and variable names are another reason why nobody, not even the authors, understands the code in a few months’ time.

Ultimately, there is a pipeline:

And this is made worse by some of these issues bringing along their own bugs.

This is absolutely not to say that you’re to blame for bad software. Sure, in the search for immediate culprits, fingers may point to you.
But you likely didn’t set out to create bad code. Maybe time constraints required a quick and dirty solution. Or you thought it was just going to be an intermediate solution anyway, but as they say: There is nothing more permanent than a temporary solution. And finally, especially in the case of the gigantic Python script, you might just have thought that the scripting language you’re using doesn’t allow for or doesn’t deserve a lot of structure and thought put into it.

How to fix this?

If a code base already looks messy and over-complex, it is time for ✨refactoring✨.
But this is not the topic of this article, since my colleague Diana Hille has already written a three-part series on this topic.
This blog article is a sort of follow-up, or rather a prequel, to that series.
Refactoring might be inevitable because of code growing more complex over time or larger changes being required for implementing new features or upgrading to newer dependency versions.

And even if a whole refactoring is not necessary, virtually no programmes can just be programmed and never touched again – as soon as other humans or software and hardware components come into play, change will be necessary over and over again (see Lehman’s categorisation of software).
But we can make the code toucher’s job a whole lot easier by

  1. programming as cleanly and clearly as possible from the beginning on,
  2. continuously improving the code base,
  3. and making sure that additions to the code are up to the same standards as the rest.

Here’s where clean code principles come in. You may have heard that term thrown around in software engineering circles. While this term and the accompanying concepts were made popular by Robert C. Martin, it is now an entire movement [It should be noted here that his book is not without controversy. It is perceived as dogmatic and, on some points, wrong. Furthermore, Robert C. Martin himself has faced significant criticism for his sexist and racism-apologetic remarks. This is why we will focus on generally agreed-upon principles rather than taking his word as gospel.].
The principles encompass several Do’s and Don’ts for producing more readable and less error-prone code.

In the following, we will dive into those that I personally find the most helpful and impactful (and am therefore actually applying in my day-to-day life).
But if you are interested in learning more, you can check out the Clean Coders’ movement website (caution, it is a bit confusingly structured), or a list of all principles mentioned in Martin’s book. Alternatively, you can of course check out the actual source of it all, the Clean Code book. If you don’t feel like supporting its author with your money (because of the aforementioned sexism and politics), you might find it in your local library, or you can borrow it online in the Internet Archive’s Open Library.

Use expressive names

I maybe don’t have to tell you this one. Name your classes, your functions, your variables, in ways that convey their purpose. Thinking of these expressive names can be pretty hard. As the saying goes:

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

Therefore, it’s important to keep in mind another one of the clean code principles: you don’t have to put equal effort and expressiveness into the naming of all your code components. The further down the hierarchy you go, the shorter and more generic your names can become.

The hierarchy:

This is closely tied to the scope of your named things. A module may be exposed to the whole world and should therefore have a fairly unique name. A class has to have a unique name at least within its module or project. A function only strictly has to be unique within the file it is contained in (at the very least, you will likely have some form of main function). With variables, it depends on the programming language you’re using. Global variables on the file or class level of course have to be unique within that file or class. Variables defined in functions only have to be unique within that function. But whether variables defined in smaller scopes such as loops exist only in the block in which they are defined, or whether they are automatically defined for the enclosing function, varies from language to language.
Additionally, the lower down the hierarchy you go, the clearer the meaning of a component becomes, based on its context.

For example, if you name your module mServer, nobody except you will know what sort of server this is. But if you have a function called connect_to_mailserver and in there a variable called server, you don’t need to make it more explicit by naming it mailserver. It is pretty clear from context that it is.

This is why you can put less effort into making up unique names and stick to pretty generic ones for those lower components.
The same goes for variables used as iterators in loops. Given that they are really just used to count, you don’t have to name them iterator. You will likely stick to the commonly used name i. And that’s perfectly fine.
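Python is one of the languages in which a loop does not get a scope of its own, so a short name like i stays visible for the rest of the function. The following is a minimal sketch to illustrate that behaviour; the function name sum_of_squares is made up for this example.

```python
def sum_of_squares(limit):
    total = 0
    for i in range(limit):
        square = i * i
        total += square
    # Python's for loop does not open a new scope: i and square are still
    # defined here (as long as the loop body ran at least once).
    return total, i, square


print(sum_of_squares(3))
```

This is one more reason to keep functions short: the fewer lines a generic name like i lives through, the smaller the chance that it gets reused by accident.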

Besides explicit and contextual meaning, you can also apply formatting. Most languages have conventions for how to write class names compared to function and variable names. But there is also a convention that, for the most part, nothing enforces: writing your constants’ names in uppercase letters. I really recommend doing that, especially for languages like Python, where nothing stops a constant from being reassigned at runtime.
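As a small sketch of the uppercase convention: the names MAX_LOGIN_ATTEMPTS and is_locked_out below are invented for illustration and not taken from any real project.

```python
# Uppercase tells the reader: this value is meant to stay fixed.
MAX_LOGIN_ATTEMPTS = 3


def is_locked_out(failed_attempts: int) -> bool:
    # The constant's name explains the threshold better than a bare 3 would.
    return failed_attempts >= MAX_LOGIN_ATTEMPTS


print(is_locked_out(3))
```

Type checkers additionally understand the typing.Final annotation, but the interpreter itself does not prevent reassignment, so the uppercase name remains the main signal for human readers.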

Another important principle is to avoid using obscure suffixes and prefixes. The previous example mServer might be a relatively unique name and express sufficient meaning to you and your team mates. But as soon as somebody else has to get familiar with your code, this will be yet another hurdle. To say nothing of readability – some small letters in the front or back can easily be overlooked.
Even when you think an acronym is universally known, think about whether you really need the additional brevity and whether there is potential for misreading.
This is not to say that there isn’t a use case for ctrl instead of control or conf instead of configuration. Some abbreviations are so commonly used and hard to mistake, that I would absolutely not recommend you stop using them. But in case of doubt, I’d prefer the longer variant.

Finally, don’t be afraid to generously apply whatever syntactic dividers your programming language has to offer: snake_case, camelCase, kebab-case – you name it. Especially if you come from a language like German, it can be tempting to just leave long compound words sticking together. But I’d argue, readability is more important than grammatical correctness, at least in the context of programming.

One Functionality per Component

This one is a pretty famous, and maybe controversial, clean code principle. But I do find it quite helpful, even when you don’t apply it dogmatically. It advocates against putting a bunch of different and diverse functions into your components and instead to apply a strict division of labour. The main principle is that each component should only do one thing.

Of course this means different things depending on the hierarchy level. A module should focus on one topic, a class on one software component, a function on one function. You can keep it more or less strict, but that’s the gist of it.

There are some exceptions to this, although whether you consider them exceptions depends on your interpretation of what focusing on a single functionality means.
init-functions are meant to initialize multiple different things – therefore they might contain a wide range of functions. Or you might want to group all of your utility functions in one place with a utils file.

An init-function is also a good example to expand on what functionality separation means. You might, for example, have to initialize a database connection. You may just write all of that code into the init-function. Or you could move all that code to a separate function and keep the init-function as an overview of which things are initialized.

Example code:

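def __init__(self):
    # Store the logger on the instance so that the helper methods can use it.
    self.LOG = ...
    self.__connect_to_db(self.DB_URL)
    self.__connect_to_mailserver(...)
    ...

def __connect_to_db(self, url):
    # All connection details live here; __init__ stays a short overview.
    try:
        some_database_library.open_connection(url=url)
        self.LOG.info('Connected to database', url=url)
    except ConnectionRefusedError as e:
        self.LOG.error('Database says no', exception=e)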
This example snippet also shows how to avoid a sudden explosion of externally exposed functions that mainly serve an internal purpose. Two leading underscores in front of a name are Python’s way of marking something as private to its class (the interpreter mangles the name; a single leading underscore is the lighter, convention-only variant). In other cases, you might want to expose the smaller functions because it is useful to have them. Depending on what the project structure is, they might rather be utility functions, to be used by multiple classes. Or you might want to write unit tests for the smaller steps of a process.
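Strictly speaking, the two leading underscores do not lock a method away; Python stores it under a mangled name that includes the class name. Here is a minimal sketch of that behaviour, with a made-up class called Service:

```python
class Service:
    def __init__(self):
        # Inside the class, the short name works as written.
        self.__connect()

    def __connect(self):
        self.connected = True


service = Service()

# From outside, the method is not reachable under its short name ...
has_plain_name = hasattr(service, "__connect")
# ... because Python stored it as _Service__connect.
has_mangled_name = hasattr(service, "_Service__connect")

print(has_plain_name, has_mangled_name)
```

None of this changes the structure shown above: a short __init__ that delegates to small helper methods.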

Structuring your programs like this has multiple advantages:

  • It results in a more organized and readable codebase.
  • By encapsulating similar logic into separate functions, you reduce code duplication and improve maintainability.
  • It makes your code easier to test and debug.
  • Combined with the next principle, this approach can greatly support you in building new code from the ground up.

Construct your programme from top to bottom

What do I mean by this? This is a general approach to coding. Instead of sitting in front of a specification sheet or feature request and thinking hard about how to approach solving it, I start writing. The overall structure is usually given by the framework and language used.

You start out with the highest level you need to program – be that a module, a class or a function. Then you think about what rough functionalities it has to include. And instead of immediately working on them, you first create new sub structures (classes or functions) for them. You then move on to those sub-structures. You think again what actions they might have to perform or what information they may need to store. If you feel like it’s overwhelming and might become very complex very quickly, it might still be too much for one function (or class). So you create new functions for each thing it has to do.
And so on. Until at some point you land at low-level logic or interacting with libraries or calculating some stuff. You implement that and ta-da – you got yourself a pretty structured programme with the nice side-effect of getting it done in a much less overwhelming way than by trial and error.
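To make the top-down approach more tangible, here is a small sketch. The task (summing up amounts per category from comma-separated lines) and all function names are invented for this example. The top-level function is written first and reads like a table of contents; the helpers below it are filled in afterwards.

```python
def generate_report(raw_lines):
    # Written first: it only names the steps, without implementing them.
    records = parse_records(raw_lines)
    totals = sum_by_category(records)
    return format_report(totals)


def parse_records(raw_lines):
    # Each line is assumed to look like "category, amount".
    records = []
    for line in raw_lines:
        category, amount = line.split(",")
        records.append((category.strip(), float(amount)))
    return records


def sum_by_category(records):
    totals = {}
    for category, amount in records:
        totals[category] = totals.get(category, 0.0) + amount
    return totals


def format_report(totals):
    # One line per category, sorted alphabetically.
    lines = [f"{category}: {total:.2f}" for category, total in sorted(totals.items())]
    return "\n".join(lines)


print(generate_report(["food, 2.5", "rent, 10", "food, 1.5"]))
```

While the helpers are still missing, each of them can start out as a stub that raises NotImplementedError, so the overall structure stands before any low-level detail is written.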

Don’t repeat yourself or DRY

This is one of the most popular concepts from clean code. In fact, I’d argue it pretty much exists independently of the clean code principles. I already mentioned it as a nice side effect of the one-functionality-per-component principle, and I believe they go hand in hand. What does it mean? That you do not repeat the same lines of code over and over again.

There are multiple ways to achieve this and they all somewhat depend on context. The biggest distinction to make is: Is the repetition actually necessary? Maybe it can be avoided by performing the necessary actions on a higher level and passing the result through to the lower-level components. Or maybe it requires a bigger restructuring because the repetition is caused by the programme’s components overlapping too much.

If the repetition cannot be avoided, and it’s more than a single line, it makes sense to encapsulate the logic in its own function and expose it for example via a utils module. Then you make sure that you can easily edit it, and you minimise your mistakes.
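A minimal sketch of such an extraction could look like this. The scenario (cleaning up e-mail addresses before comparing them) and all names are assumptions made for illustration:

```python
def normalise_email(address: str) -> str:
    # The one place that defines what "the same address" means.
    return address.strip().lower()


def register_user(address: str, known_addresses: set) -> bool:
    email = normalise_email(address)
    if email in known_addresses:
        return False
    known_addresses.add(email)
    return True


def is_registered(address: str, known_addresses: set) -> bool:
    return normalise_email(address) in known_addresses


known = set()
print(register_user(" Ada@Example.org ", known))
print(is_registered("ada@example.org", known))
```

If the rule for cleaning up addresses ever changes, there is exactly one function to edit, instead of several copies of strip().lower() scattered across the code.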

Keep it simple

As with the previous concept, you may have heard of this paradigm. There’s often a “, stupid” attached, to form the acronym “KISS”.
It’s a very broad principle that provides a rough guidance for how to approach problem solving and software creation in general: Of all the available potential solutions, choose the simplest one. And of all the ways you could write down that solution in your code, choose the simplest expressions. Don’t overcomplicate things. Don’t be overly fancy with esoteric shorthands or one-liners.

If there is a built-in way of doing something in your programming language that is only slightly more complex than using a third-party module, probably choose that way. But on the other hand, also don’t attempt to re-implement existing solutions. Always look for the least complex solution in terms of code and dependencies.
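As a small illustration, counting words needs neither a hand-written counting loop nor an extra dependency, because Python’s standard library already ships collections.Counter. The function name most_common_word is made up, and the sketch assumes that the text contains at least one word.

```python
from collections import Counter


def most_common_word(text: str) -> str:
    # Counter does the counting; most_common(1) returns the top entry.
    counts = Counter(text.lower().split())
    word, _ = counts.most_common(1)[0]
    return word


print(most_common_word("the cat and the dog"))
```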

Don’t overuse comments

If the goal of writing clean code is to write legible, understandable code – then there shouldn’t be a need for comments explaining every bit of code. Following the previous concept of Don’t repeat yourself, the main issue with comments is that they introduce another item to keep up-to-date when the code changes. And thus, another point of possible divergence. In the best case, the comment might be superfluous, in the worst case it might mislead and confuse the reader.

So, rather than naming a variable or function some generic way and attaching a comment to it to explain what it does, Clean Code operates on the principle that the code should speak for itself. Instead of choosing the shortest but least readable way to implement something and having to explain it with a comment that is longer than the code itself, you work to make the code itself understandable. This should eliminate the need for comments in most cases. However, this is not to say that there is no time and place for comments in your code. Rather, a comment should not be the default building block of your code, but more of a special notice: to warn people of side effects or unexpected behaviour, or to remind yourself of your TODOs.
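The difference can be sketched with two versions of the same check. Both functions and the rule they implement are invented for this example:

```python
# Version 1: generic names, so a comment has to carry the meaning.
def check(a, t):
    # allowed if the person is at least 18 and has accepted the terms
    return a >= 18 and t


# Version 2: the names carry the meaning, so no comment is needed.
MINIMUM_AGE = 18


def may_register(age: int, has_accepted_terms: bool) -> bool:
    is_adult = age >= MINIMUM_AGE
    return is_adult and has_accepted_terms


print(may_register(18, True))
```

Both versions behave the same, but only the first one can go stale: if the age limit changes and somebody forgets to update the comment, comment and code contradict each other.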

Moreover, writing legible and self-explanatory code does not mean that we don’t need to document our code with docstrings or external documentation. A programmer wanting to use your library should not have to inspect your code — even though it might now be possible to do so. Another plus of documenting your code is that it serves as an additional sanity check:

  • Does my code function logically enough that I can explain what it does in simple terms?
  • Are the input parameters sensible?
  • Does my function do too many things?
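A docstring that answers these questions does not have to be long. The following is a minimal sketch with a made-up function; the layout of the docstring is just one common style, not a requirement.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Return the arithmetic mean of the given values.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


print(average([1.0, 2.0, 3.0]))
```

If the summary line needed an “and” to describe everything the function does, that would be a hint that the function does too many things.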

Tests are awesome – Don’t be afraid to write tests

Automated tests are a great way to ensure that you can easily change your code without being afraid to lose the original functionality. And for writing tests, you can basically follow the same clean code rules as for the rest of your program: Keep them easily readable and make each one test a single concept.
Additionally, Robert C. Martin recommends that you make your tests fast, independent from each other, repeatable in any environment, and self-validating (telling you if it failed or not, so that you don’t need to analyse that yourself), and that you write the tests shortly before the production code that makes them pass. This is all to reduce barriers to executing the tests regularly, and to make them useful enough that they are cared about.
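Here is a small sketch of what such tests can look like with Python’s built-in unittest module. The function under test, slugify, is a made-up helper for this example. Each test checks one concept, needs no external resources, and reports by itself whether it passed.

```python
import unittest


def slugify(title: str) -> str:
    # Lowercase the title and join its words with hyphens.
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_title(self):
        self.assertEqual(slugify("Clean"), "clean")

    def test_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Clean Code  Principles"), "clean-code-principles")


# Run the tests programmatically; the result object says whether all passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

In a real project you would usually let a test runner such as python -m unittest discover and execute the test classes instead of building the suite by hand.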

Where I (and others) disagree with Robert C. Martin:

Despite the general acceptance of many of Martin’s principles, there are some takes that I, and also others, find unhelpful.

One of these, that I like to emphasise, is his rejection of types.

Types are your friends. Use type hints in dynamically typed languages when they are available (as in Python).

They make your code more understandable by giving useful context and a guide of usage.
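A minimal sketch of what type hints add, using a made-up lookup function (the dict[str, int] syntax assumes Python 3.9 or newer):

```python
from typing import Optional


def find_user_id(username: str, user_ids: dict[str, int]) -> Optional[int]:
    # The signature alone tells the caller: a name goes in, and either
    # an ID or None comes out.
    return user_ids.get(username)


print(find_user_id("ada", {"ada": 1}))
```

Python does not check these hints at runtime. They pay off for human readers, for editors, and for separate type checkers, which can flag a caller that forgets to handle the None case.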

Another principle that has received some overhauling is DRY (Don’t Repeat Yourself).
Martin advocates that repetition should be avoided by encapsulating the repeating code into its own function. But the prevailing, more pragmatic view says that repetition is not the worst thing in the world, and that there are cases where this kind of abstraction absolutely overcomplicates things.

Critics also disagree with the strictness of some of the principles, especially when it comes to nesting and function parameters.

All in all, I wouldn’t recommend reading the original Clean Code book. I find many principles helpful, and learned a lot when I originally read the book as a programming novice.
But as the author of the aforementioned article rightfully points out, there are lots of mistakes and bad practices in the code examples, and much of the advice should not be taken at face value.

Addendum: Clean Code and ChatGPT

I personally view ChatGPT (and other Large Language Models) as some of the greatest producers of dirty code. The whole idea of using an LLM to generate some code for you seems to me a textbook example of quick and dirty programming. The end result frequently reflects that. ChatGPT doesn’t care about legibility and adaptability. It doesn’t care about correctness. It doesn’t care about finding the easiest, most straightforward and secure solutions. It just uses the code that has been fed into it during its training phase to generate new code. And since we know that its learning material comes from the internet, it’s no stretch to assume that most of its programming knowledge therefore comes from Stack Overflow and the like.

Everyone who visits such sites regularly knows that the quality is not always the best. Moreover, you often encounter quite old questions and answers, using methods and frameworks that no longer work or are deprecated. Add to this a solid bit of hallucination, especially for less well-known frameworks and programming languages.

All of this is not to say that there is no good use case for ChatGPT in programming – it can be great when you just need to generate some data, or for simple tasks using popular technologies. But most of the time, I’d recommend you read through the answers, try to understand them, and reimplement them with cleaner code. And especially with more niche technologies, it will likely be much quicker to read some documentation…

Conclusion

I hope this guide was some help to you, as the reader and likely programmer. But as my final words, I’d also like to emphasise the importance of a general culture change. I believe that most of the dirty code in the world is not due to malice or lack of knowledge, but rather due to time constraints and potentially lack of awareness. So let’s work together and advocate for not only getting the time to complete our work, but to do it well. And let’s hold each other accountable in writing readable code. And let’s finally stop treating tests as a nice-to-have and an afterthought!

Uta Lemke
Uta Lemke has been part of B1 Systems since 2020, developing and testing software. It all began with an internship during their Bachelor studies. After an Erasmus semester in Wroclaw came working alongside studying. Now, Uta works full-time at B1 as a developer, automating software testing.
Uta's Linux journey started a couple years earlier - and actually was the main motivation behind studying Informatics. Discovering that operating systems can be fun was pretty life-changing!
After work, Uta likes to do volunteering work and sings in a choir.