Applied Data Science with Python and Jupyter, Alex Galea, 2018 – Chapter 1

The basic Jupyter system is a web front end to small cells of code that execute on a backend server; setup means getting that server running.

Does this mean each notebook has its own kernel running on the server, or is it something more like a session? (Both, as far as I can tell: each open notebook gets a session that ties it to its own running kernel process.)

Notebooks can be saved out as Python or HTML files.

DataFrames: created with the pandas DataFrame constructor. The describe() method gives summary statistics for each individual variable; corr() gives the pairwise correlation matrix.
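
A minimal sketch of those two methods, using a small made-up frame (the column names x, y and z are illustrative only, not from the book):

```python
import pandas as pd

# Toy data: y is an exact multiple of x, z decreases as x increases.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
})

summary = df.describe()  # count, mean, std, min, quartiles, max per column
corr = df.corr()         # pairwise Pearson correlations between columns

print(summary)
print(corr)
```

Here corr should show x and y as perfectly correlated and x and z as perfectly anti-correlated, since the toy columns are exact linear functions of each other.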

Seaborn pairplot: exactly what I wanted when working on reports at VCS – pairwise plots of variables against each other.

ndarray.reshape: returns the same data with a new shape (the total number of elements must stay the same); passing -1 for one dimension means its size is inferred from the array's length and the other dimensions.
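
A quick sketch of the -1 behaviour on a throwaway array (the variable names are my own):

```python
import numpy as np

a = np.arange(6)         # six elements, shape (6,)

col = a.reshape(-1, 1)   # one column; the row count is inferred
grid = a.reshape(2, -1)  # two rows; the column count is inferred

print(col.shape)
print(grid.shape)
```

The (-1, 1) form is the one that turns a single column of values into the two-dimensional input that scikit-learn estimators expect.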

sklearn.preprocessing.PolynomialFeatures: returns a transformer object that expands input features into polynomial terms. For example, with degree 2 and one-dimensional input, the output has one row per input value, containing that value to the powers zero, one and two (i.e. the number one, the input value, and the input value squared).
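
A small sketch of the degree-2, single-feature case described above, with made-up input values:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two samples of one feature; reshape gives the (n_samples, 1) layout.
x = np.array([2.0, 3.0]).reshape(-1, 1)

poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)  # columns: x**0, x**1, x**2

print(x_poly)
```

Each row should hold 1, the input value, and its square, so the input 3.0 becomes the row 1, 3, 9.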

sklearn.linear_model.LinearRegression: gives an estimator object that fits a linear regression (multiple linear regression, with several input features, in the book's example).
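
A minimal sketch of a two-feature fit. The data is synthetic and noise-free, generated from y = 1 + 2*x1 + 3*x2, so it is not the book's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic design matrix with two features per sample.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

print(model.coef_)       # fitted slope for each feature
print(model.intercept_)  # fitted constant term

pred = model.predict(np.array([[3.0, 2.0]]))
print(pred)
```

Because the data is exactly linear, the fitted coefficients should recover the 2 and 3 used to generate it, up to floating-point error.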

There’s a bug in the last section, about categorical features. The cell that starts, “# Color-segmented pair plot” contains this:

sns.pairplot(df[cols], hue='AGE_category',
             hue_order=['Relatively New', 'Relatively Old',
                        'Very Old'],
             plot_kws={'alpha': 0.5},
             diag_kws={'bins': 30})

But this throws an AttributeError: 'Line2D' has no property 'bins'. Removing the parameter diag_kws={'bins': 30} lets the call run properly.