Why Regular Expressions Are Super Powerful, But A Terrible Coding Decision

We’ve all been there. You have a string input and need a fast, efficient way to parse something important out of it. Your options are limited, since manually parsing a string for a pattern is tedious and often inefficient as the string gets larger. So what do you do? You turn to regular expressions! What’s the problem with that? Let’s explore.

Impossible To Read or Debug

A huge assumption people make when writing regular expressions is that the schema they are programming against won’t change. If it does, the regular expression may have to be rewritten to keep producing the same usable output. Now let’s say you are tasked with fixing a broken regular expression that fell victim to a changing schema. You first have to understand how the regex worked with the old schema, and then understand how the new schema differs. Only then can you rewrite the regular expression to account for the new input. That’s a tedious and error-prone process, and the difficulty climbs quickly with the length and complexity of the expression. I would hate to be the only one in charge of fixing this 6.2 KB monster that validates RFC 822 email addresses.
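To make the schema problem concrete, here is a minimal sketch in Python. The log format, the field names, and the name `OLD_PATTERN` are all made up for illustration; the point is only that a pattern written against one layout stops matching without any error once a field is added.

```python
import re

# Hypothetical "old schema" (an assumption for this example): "id=42 user=alice"
OLD_PATTERN = re.compile(r"^id=(\d+) user=(\w+)$")

old_line = "id=42 user=alice"
new_line = "id=42 role=admin user=alice"  # the schema gained a "role" field

print(OLD_PATTERN.match(old_line).groups())  # ('42', 'alice')
print(OLD_PATTERN.match(new_line))           # None: no exception, just no match
```

Nothing crashes here. The regex simply returns no match, which is part of what makes this kind of breakage tedious to track down.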

Regex Abuse

A common use case for regular expressions is something like the following:
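As a hypothetical stand-in for that kind of pattern, here is a sketch in Python rather than C# (Python's named groups work in much the same way). The payload and the field names `name` and `age` are assumptions chosen only for illustration.

```python
import re

# Hypothetical JSON-like payload; the field names are illustrative assumptions.
payload = '{"name": "Ada", "age": 36}'

# Named capture groups pull individual fields straight out of the raw text.
PATTERN = re.compile(r'"name":\s*"(?P<name>[^"]*)",\s*"age":\s*(?P<age>\d+)')

match = PATTERN.search(payload)
if match:
    print(match.group("name"), match.group("age"))  # Ada 36
```

Note that this only works while `name` comes before `age` and the name contains no escaped quotes.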

This regular expression tries to emulate a parser, ripping useful information out of a structured data set like JSON and into named capture groups. The benefit of this is that (in C# at least) you can then refer to exactly what each part of the regular expression matched.

The downside is that you are using the wrong tool for the job. As much as it might seem like a quick and easy solution, it causes more problems than it solves. Parsing JSON, XML, or even HTML with regular expressions is a terrible idea, and parsing these formats is mostly a solved problem. Check out this HTML Python Parser. Using a tool like this will make your code easier to write and easier to maintain in the future.
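For JSON, for instance, Python's standard-library `json` module already does the work. The sketch below uses a made-up payload (the field names are assumptions) in which the keys arrive in a different order and the value contains escaped quotes, two things that tend to trip up a hand-written regex.

```python
import json

# Hypothetical payload: keys in a different order, value with escaped quotes.
payload = '{"age": 36, "name": "Ada \\"Countess\\" Lovelace"}'

# json.loads handles ordering, whitespace, and escaping for us.
record = json.loads(payload)

print(record["name"])  # Ada "Countess" Lovelace
print(record["age"])   # 36
```

The lookup code stays the same if new fields are added to the payload later.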

Balancing Act

I know most of this article has been bashing regular expressions, but there are real benefits to using them when they are used correctly. All developers and engineers should learn basic regular expressions, because they’ll produce better, more flexible, more maintainable code with them. When used responsibly, regular expressions are a huge net positive. For example, writing a regular expression to validate a phone number is relatively straightforward.
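A rough sketch of such a validator in Python might look like this. It assumes a North American style 10-digit number; the accepted separators and the helper name `is_valid_phone` are illustrative choices, not a general standard.

```python
import re

# Assumed formats: 555-123-4567, 555.123.4567, 555 123 4567,
# 5551234567, or (555) 123-4567.
PHONE = re.compile(r"(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}")

def is_valid_phone(text: str) -> bool:
    """Return True if the whole string matches the assumed phone format."""
    return PHONE.fullmatch(text) is not None

print(is_valid_phone("555-123-4567"))    # True
print(is_valid_phone("(555) 123-4567"))  # True
print(is_valid_phone("555-1234"))        # False
```

The pattern is short enough to read at a glance, which is the kind of use where a regex earns its keep.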

Conclusion

Regular expressions are extremely powerful and useful in the right situation. When abused and used in incorrect situations, they can lead to ugly and unmaintainable code. So use them wisely!

Comment below with your opinion on regular expressions, and whether you use them regularly!