

Using UDFs, you can ensure that, given the same data, your calculations will always produce the same result.
#Redshift tutorial code#
You can basically take several lines of code that produce one value from your SELECT statement, give it a name, and keep it for future use. They are a great way to simplify your SQL queries and make them more reproducible at the same time. User-Defined Functions (UDFs) are simply a way of saving one or more calculations or expressions with a name so that you can refer to it as a SQL function for further use. User-Defined Functions can be used just like any other function in SQL like SUBSTRING or ROUND except you get to define what the output of the function is, given the input. And what happens when you have to perform the same analysis weeks later? You better hope you use the same iteration of your SQL query the second time as the first!Īnd that is exactly where User-Defined Functions become so valuable! SQL is iterative by nature! Think about it, just be adding and removing “WHEN” conditions, you’re liable to drastically change your results.Īs you iterate on a numerical calculation or classification in a CASE expression you are likely to change your query results. Whenever you have to write complex SQL queries to get an answer, your analytical method (the SQL query) becomes a big variable. This post is more concerned with the second factor of reproducibility, the analytical method. The lawyers would do this with the intent to get two different answers. If we use the court case example again, this would be like the prosecution and the defense asking a witness the same question in two different ways. The second factor is the analytical methods. A good example would be a court case: if you ask two witnesses the same question, each one will probably tell you something similar but likely slightly different.


The first is the data-different data for the same analysis is going to produce different results. I’ve learned that there are two broad factors to reproducibility. This tutorial is going to show you how you can use Redshift User Defined Functions (UDFs) to do just that. That’s why you must be careful to integrate reproducibility into your SQL analyses. And to maintain your credibility, it’s important to be able to answer questions correctly and consistently. As a data analyst, your credibility is as valuable as your analytical skills.
