
How To Give Name To A Size Column In Python

It is quite common to call size() on grouped data in pandas. On a groupby object, size() gives you the number of rows in each group.

Now, if it is that easy and straightforward, why am I writing about it?

Well, getting the output of size() is straightforward, but labeling that value takes an extra step.

Let's understand this with the help of an example.

Input Data

Here is what our sample data looks like. It is stored as a CSV file named data.csv, which includes the columns 'type_school' and 'interest'.

Scenario Explained

The idea is to group the data by two columns, 'type_school' and 'interest', and then show the number of rows in each group in a separate column.

Here is the sample code to achieve this:

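import pandas as pd

df = pd.read_csv('data.csv')
# groupby() returns a DataFrameGroupBy object, not a DataFrame
data = df.groupby(['type_school','interest'])
# Attempt to attach the group sizes as a new column (this line raises the error)
data['size'] = data.size()
print(data)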

The above code looks fine, but running it raises an error:

TypeError: 'DataFrameGroupBy' object does not support item assignment

Is there anything wrong with the above code? Any guesses?

Analysis

The size() method of a DataFrameGroupBy object returns a Series holding the group sizes. The DataFrameGroupBy object itself is not a DataFrame, so you cannot assign a new column to it.
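To make the distinction concrete, here is a minimal sketch that reproduces the error and inspects what size() returns. It builds a small in-memory DataFrame in place of data.csv; the row values are made up for illustration and are not the article's actual data.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the values are invented for illustration.
df = pd.DataFrame({
    "type_school": ["Academic", "Academic", "Vocational", "Vocational", "Academic"],
    "interest": ["Interested", "Uncertain", "Interested", "Interested", "Interested"],
})

grouped = df.groupby(["type_school", "interest"])

# A DataFrameGroupBy object does not support item assignment, so this fails.
try:
    grouped["size"] = grouped.size()
    assignment_failed = False
except TypeError as exc:
    assignment_failed = True
    print(exc)

# size() itself works fine and returns a Series indexed by the group keys.
sizes = grouped.size()
print(type(grouped).__name__)
print(type(sizes).__name__)
print(list(sizes.index.names))
print(sizes.name)  # the Series carries no name yet, hence the missing column title
```

The last line hints at the remaining problem: the Series of group sizes is unnamed until you give it a name yourself.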

So, to get the group sizes, call size() on the result of groupby() and work with the Series it returns. Change your code as follows:

import pandas as pd

df = pd.read_csv('data.csv')
# size() returns a Series of group sizes, indexed by the grouping columns
data = df.groupby(['type_school','interest']).size()
print(data)

Running this prints a Series indexed by 'type_school' and 'interest', with the group sizes as its values.

At this point, the data looks correct, and the only thing missing is a title for the size column.

Giving Column Title

Because we want a data frame with a named column of group sizes, we can call to_frame() on the Series and pass the desired column name as its parameter. Below is the code to do this:

import pandas as pd

df = pd.read_csv('data.csv')
# to_frame('size') converts the Series into a DataFrame with one column named 'size'
data = df.groupby(['type_school','interest']).size().to_frame('size')
print(data)

On execution of the above lines of code, you will get the expected output: a data frame whose group-size column is titled 'size'.
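As a rough sketch of the naming step, the snippet below applies to_frame('size') to a small made-up DataFrame (a hypothetical stand-in for data.csv). It also shows reset_index(name='size'), an alternative that is not used in this article, in case you prefer the group keys as ordinary columns rather than as the index.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the values are invented for illustration.
df = pd.DataFrame({
    "type_school": ["Academic", "Academic", "Vocational", "Vocational", "Academic"],
    "interest": ["Interested", "Uncertain", "Interested", "Interested", "Interested"],
})

sizes = df.groupby(["type_school", "interest"]).size()

# to_frame(name) gives a one-column DataFrame; the group keys stay in the index.
framed = sizes.to_frame("size")

# Alternative (not from the article): reset_index(name=...) names the column
# and moves the group keys out of the index into regular columns.
flat = sizes.reset_index(name="size")

print(framed)
print(flat)
```

Which form to use depends on what comes next: the indexed version is convenient for lookups by group, while the flat version is often easier to export or merge.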

I hope you find this write-up useful. Do not forget to check out the recording of this article on my YouTube channel, Shweta Lodha.


