Thanks for your comment, and thanks for the update! I'll add this to the post (with a reference to your comment, of course).

]]>Thanks for the great post. I wanted to offer an update to part 2 (Python-based ANOVA) for when the groups have different sample sizes.

First, rewrite the calculation for n:

n = data.groupby(var).size().values

Then the calculation for SSbetween and SSwithin needs to be modified:

SSbetween = (sum(data.groupby(var).sum()['LogSalePrice'].values**2/n)) - (data['LogSalePrice'].sum()**2)/N

SSwithin = sum_y_squared - sum(data.groupby(var).sum()['LogSalePrice'].values**2/n)

The only change is that the division by n now happens element-wise inside the outer sum in both cases, so each group's squared total is divided by that group's own size.

I tested this by comparing with the output from f_oneway and it seems to work. It should also generalize well to the case where n is the same for all groups.
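
For anyone who wants to reproduce that check, here is a minimal sketch of the modified calculation next to scipy's f_oneway. The toy data frame, the "Group" column, and the helper name anova_unequal_n are made up for illustration and are not from the original post; only 'LogSalePrice', var, n, N, SSbetween, SSwithin and sum_y_squared follow the names used above.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Toy data (made up for illustration): three groups of unequal size.
data = pd.DataFrame({
    "Group": ["a"] * 3 + ["b"] * 5 + ["c"] * 4,
    "LogSalePrice": [11.8, 12.1, 12.0,
                     12.4, 12.6, 12.3, 12.7, 12.5,
                     11.5, 11.9, 11.7, 11.6],
})
var = "Group"


def anova_unequal_n(data, var, y="LogSalePrice"):
    """One-way ANOVA sums of squares and F, allowing different group sizes.

    Hypothetical helper written for this comment, not part of the post.
    """
    N = len(data)
    groups = data.groupby(var)[y]
    k = groups.ngroups
    n = groups.size().values          # per-group sample sizes
    group_sums = groups.sum().values  # per-group totals, same order as n
    sum_y_squared = (data[y] ** 2).sum()

    # Each squared group total is divided by its own n before summing.
    between_term = (group_sums ** 2 / n).sum()
    SSbetween = between_term - data[y].sum() ** 2 / N
    SSwithin = sum_y_squared - between_term

    F = (SSbetween / (k - 1)) / (SSwithin / (N - k))
    return SSbetween, SSwithin, F


SSbetween, SSwithin, F = anova_unequal_n(data, var)

# Cross-check against SciPy's one-way ANOVA.
samples = [g.values for _, g in data.groupby(var)["LogSalePrice"]]
F_scipy, p_scipy = stats.f_oneway(*samples)
print(F, F_scipy)
```

The two F values should agree up to floating-point error. Selecting the column before calling sum() is a small variation on the lines above that avoids summing unrelated columns.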

Thanks again for the write-up!

]]>With Spyder 3.3+, you can open any number of consoles in different Python environments (venvs, conda envs, separate Python installs, or even kernels on remote machines) and switch between them on the fly. The only requirement is that the spyder-kernels package is installed in the target environment, which enables Spyder’s advanced functionality (Variable Explorer, etc.). Spyder 4 will bring a number of major improvements in this area, with built-in GUI package and environment management integrated with an expanded version of the existing Projects system.
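
As a quick sanity check, something like the following can be run with the target environment's interpreter to see whether the package is importable there. This is only a sketch: the helper name has_spyder_kernels is made up, and it assumes the package exposes the import name spyder_kernels.

```python
import importlib.util
import sys


def has_spyder_kernels():
    """Return True if the spyder_kernels module can be found in this environment."""
    return importlib.util.find_spec("spyder_kernels") is not None


# Show which interpreter is being checked, then the result.
print(sys.executable)
print("spyder-kernels available:", has_spyder_kernels())
```

If it reports False, installing spyder-kernels into that environment with its own package manager should be all that is needed before connecting a console to it.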

