A Means to an End

Choosing the right mean for an estimator.

As data scientists we often need to estimate a value from a sample data set in order to answer a business question about the whole population. In most cases we get a sample data set, and we wish to estimate the mean of some value and so the sample mean is a good choice for an estimator.

Gouge Away

Incorporating SQOOP in your Data Pipeline

You have just completed setting up your new and shiny EMR cluster, and want to unleash the full power of Spark on the nearest data-source.

Taming the Hydra

The Proper Way to Launch a Spark Cluster

Whenever you need to slice and dice a massive dataset, and pandas or SQL just can’t do the trick - pam pam pam, have no fear Spark is here.

Pagination