5.4. Modifying a DataFrame in Pandas#

In data analysis, modifying a DataFrame is a fundamental operation. This section covers common methods for renaming columns, dropping columns or rows, and other essential modifications.

5.4.1. Renaming Columns#

Renaming columns in a DataFrame can be accomplished using the rename method. This method allows for specifying a dictionary where keys are the current column names and values are the new column names. This is particularly useful when you need to make your DataFrame more readable or align it with a specific format [Pandas Developers, 2023].

Example: The data in this example is sourced from Alberta Regional Dashboard [Government of Alberta, 2024].

import pandas as pd

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]
                   })
print('Original:')
display(df)

# Renaming columns
df.rename(columns={'Municipality': 'City', 'Population 2022': 'Population'}, inplace=True)

print('Renamed:')
display(df)
Original:
Municipality Population 2022
0 Calgary 1413800
1 Edmonton 1087803
2 Red Deer 105883
3 Strathcona County 104330
4 Lethbridge 104254
5 Airdrie 80222
6 Wood Buffalo 74532
Renamed:
City Population
0 Calgary 1413800
1 Edmonton 1087803
2 Red Deer 105883
3 Strathcona County 104330
4 Lethbridge 104254
5 Airdrie 80222
6 Wood Buffalo 74532

The inplace=True argument modifies the DataFrame directly without creating a copy.

5.4.2. Renaming using Axis-Style Parameters#

The rename method also supports using axis-style parameters to rename columns and index labels in a DataFrame. Here are examples using the population data from Alberta cities.

  • Renaming Columns with a Function: You can use a function to rename columns dynamically. For instance, converting column names to lowercase:

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]
                   })
print('Original:')
display(df)

# Renaming columns using a function
df.rename(str.lower, axis='columns', inplace=True)

print('Renamed:')
display(df)
Original:
Municipality Population 2022
0 Calgary 1413800
1 Edmonton 1087803
2 Red Deer 105883
3 Strathcona County 104330
4 Lethbridge 104254
5 Airdrie 80222
6 Wood Buffalo 74532
Renamed:
municipality population 2022
0 Calgary 1413800
1 Edmonton 1087803
2 Red Deer 105883
3 Strathcona County 104330
4 Lethbridge 104254
5 Airdrie 80222
6 Wood Buffalo 74532

Renaming Index Labels: You can also rename index labels by specifying a dictionary where keys are the current index labels and values are the new index labels:

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                                    'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]},
                                    index=[1, 2, 3, 4, 5, 6, 7])
print('Original:')
display(df)

# Renaming index labels
df.rename({1: 101, 2: 102, 3: 103, 4: 104, 5: 105, 6: 106, 7: 107}, axis='index', inplace=True)

print('Renamed:')
display(df)
Original:
Municipality Population 2022
1 Calgary 1413800
2 Edmonton 1087803
3 Red Deer 105883
4 Strathcona County 104330
5 Lethbridge 104254
6 Airdrie 80222
7 Wood Buffalo 74532
Renamed:
Municipality Population 2022
101 Calgary 1413800
102 Edmonton 1087803
103 Red Deer 105883
104 Strathcona County 104330
105 Lethbridge 104254
106 Airdrie 80222
107 Wood Buffalo 74532

These examples demonstrate different ways to rename columns and index labels, providing flexibility depending on the context and requirements of your DataFrame operations. For more details, you can refer to the Pandas documentation on renaming columns.

5.4.3. Dropping Columns or Rows#

The drop method is used to remove columns or rows from a DataFrame. This can be useful for cleaning up your data by removing unnecessary or irrelevant information.

Example - Dropping Columns: We will use the following data on vehicle registrations in Alberta cities [Government of Alberta, 2024]:

import pandas as pd

# Sample DataFrame with vehicle registrations data
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]  # Additional column for demonstration
                   })

print('Original:')
display(df)

# Dropping a single column
df.drop(columns=['Population 2020'], inplace=True)

print('After dropping one column:')
display(df)

# Dropping multiple columns
df.drop(columns=['Municipality', 'Vehicle Registrations 2020'], inplace=True)

print('After dropping multiple columns:')
display(df)
Original:
Municipality Vehicle Registrations 2020 Population 2020
0 Calgary 1007282 1413800
1 Edmonton 725796 1087803
2 Strathcona County 79271 104330
3 Red Deer 75533 105883
4 Lethbridge 73964 104254
5 Airdrie 56838 80222
6 Wood Buffalo 55905 74532
After dropping one column:
Municipality Vehicle Registrations 2020
0 Calgary 1007282
1 Edmonton 725796
2 Strathcona County 79271
3 Red Deer 75533
4 Lethbridge 73964
5 Airdrie 56838
6 Wood Buffalo 55905
After dropping multiple columns:
0
1
2
3
4
5
6

Example - Dropping Rows:

Rows can be dropped based on their index. This is useful when you need to remove specific entries from your dataset.

# Recreate the DataFrame for row dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905]},
                   index=[1, 2, 3, 4, 5, 6, 7])

print('Original:')
display(df)

# Dropping a single row
df.drop(index=1, inplace=True)

print('After dropping one row:')
display(df)

# Dropping multiple rows
df.drop(index=[2, 3], inplace=True)

print('After dropping multiple rows:')
display(df)
Original:
Municipality Vehicle Registrations 2020
1 Calgary 1007282
2 Edmonton 725796
3 Strathcona County 79271
4 Red Deer 75533
5 Lethbridge 73964
6 Airdrie 56838
7 Wood Buffalo 55905
After dropping one row:
Municipality Vehicle Registrations 2020
2 Edmonton 725796
3 Strathcona County 79271
4 Red Deer 75533
5 Lethbridge 73964
6 Airdrie 56838
7 Wood Buffalo 55905
After dropping multiple rows:
Municipality Vehicle Registrations 2020
4 Red Deer 75533
5 Lethbridge 73964
6 Airdrie 56838
7 Wood Buffalo 55905

Example - Dropping Rows and Columns Simultaneously:

The drop method also allows for simultaneously dropping both rows and columns by specifying both index and columns parameters.

# Recreate the DataFrame for combined dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]}, index=[1, 2, 3, 4, 5, 6, 7])
print('Original:')
display(df)

# Dropping rows and columns simultaneously
df.drop(index=[4, 5], columns=['Population 2020'], inplace=True)

print('After dropping specific rows and a column:')
display(df)
Original:
Municipality Vehicle Registrations 2020 Population 2020
1 Calgary 1007282 1413800
2 Edmonton 725796 1087803
3 Strathcona County 79271 104330
4 Red Deer 75533 105883
5 Lethbridge 73964 104254
6 Airdrie 56838 80222
7 Wood Buffalo 55905 74532
After dropping specific rows and a column:
Municipality Vehicle Registrations 2020
1 Calgary 1007282
2 Edmonton 725796
3 Strathcona County 79271
6 Airdrie 56838
7 Wood Buffalo 55905

Example - Dropping Rows Based on Condition:

You can also drop rows based on a condition using boolean indexing.

# Recreate the DataFrame for conditional dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]},
                    index=[1, 2, 3, 4, 5, 6, 7])

print('Original:')
display(df)

# Dropping rows where Vehicle Registrations 2020 is less than 100,000
df = df[df['Vehicle Registrations 2020'] >= 100000]

print('After dropping rows based on condition:')
display(df)
Original:
Municipality Vehicle Registrations 2020 Population 2020
1 Calgary 1007282 1413800
2 Edmonton 725796 1087803
3 Strathcona County 79271 104330
4 Red Deer 75533 105883
5 Lethbridge 73964 104254
6 Airdrie 56838 80222
7 Wood Buffalo 55905 74532
After dropping rows based on condition:
Municipality Vehicle Registrations 2020 Population 2020
1 Calgary 1007282 1413800
2 Edmonton 725796 1087803

For more information, check out the Pandas documentation on dropping rows and columns.

5.4.4. Adding/Modifying Columns#

New columns can be added directly by assigning a new series to a DataFrame. Existing columns can also be modified similarly. This is useful for creating new features or updating existing ones based on your analysis needs

# Setting new values, creating rows if needed
df.loc[new_row_label] = new_data
df['new_column'] = new_data

Advantages:

  • Convenient for adding new rows or columns without explicitly modifying the DataFrame shape.

  • Enables quick DataFrame expansion without explicit resizing.

Disadvantages:

  • Can lead to the unintentional creation of new rows or columns if not used carefully.

  • Might not be suitable for cases where strict control over data insertion is required.

Example:

import pandas as pd

# This example is from
# https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html
# The Air Quality NO2 dataset if from
# http://dhhagan.github.io/py-openaq/index.html
df = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2.csv')
print(r'Air Quality NO2:')
df.head(8)
Air Quality NO2:
datetime station_antwerp station_paris station_london
0 2019-05-07 02:00:00 NaN NaN 23.0
1 2019-05-07 03:00:00 50.5 25.0 19.0
2 2019-05-07 04:00:00 45.0 27.7 19.0
3 2019-05-07 05:00:00 NaN 50.4 16.0
4 2019-05-07 06:00:00 NaN 61.9 NaN
5 2019-05-07 07:00:00 NaN 72.4 26.0
6 2019-05-07 08:00:00 NaN 77.7 32.0
7 2019-05-07 09:00:00 NaN 67.9 32.0

The goal is to express the \(NO_2\) concentration at the London station in milligrams per cubic meter (mg/m³). This conversion is achieved under the conditions of 25 degrees Celsius and 1013 hPa pressure, using the specific conversion factor of 1.882 (Further information can be found here).

df["london_mg_per_cubic"] = df["station_london"] * 1.882
df.head(8)
datetime station_antwerp station_paris station_london london_mg_per_cubic
0 2019-05-07 02:00:00 NaN NaN 23.0 43.286
1 2019-05-07 03:00:00 50.5 25.0 19.0 35.758
2 2019-05-07 04:00:00 45.0 27.7 19.0 35.758
3 2019-05-07 05:00:00 NaN 50.4 16.0 30.112
4 2019-05-07 06:00:00 NaN 61.9 NaN NaN
5 2019-05-07 07:00:00 NaN 72.4 26.0 48.932
6 2019-05-07 08:00:00 NaN 77.7 32.0 60.224
7 2019-05-07 09:00:00 NaN 67.9 32.0 60.224

For more details on modifying DataFrames, you can refer to the Pandas documentation on DataFrame operations.

Example - Daily Weather Data Report for October 2023 (Calgary Int’l Airport, Alberta):

This example is meant to assist in summarizing the above.

import pandas as pd

# Read the HTML data from the specified URL and store it as a list of DataFrames.
df_list = pd.read_html('https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50430')

# Select the first DataFrame from the list, which contains the climate data.
df = df_list[0]

# Display the DataFrame containing the climate data for analysis.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Snow on Grnd Definition cm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 NaN 32 64
1 02 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 NaN 2 53
2 03 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 NaN 32 54
3 04 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 NaN 36 46
4 05 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 NaN 12 36
5 06 23.3 13.6 18.5 0.0 0.5 LegendTT 0.0 LegendTT NaN 1 38
6 07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 NaN 16 40
7 08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 NaN 33 35
8 09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 NaN 14 47
9 10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 NaN 25 33
10 11 LegendMM LegendMM LegendMM LegendMM LegendMM 0.0 0.0 0.0 NaN 35 45
11 12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 NaN 9 64
12 13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 NaN 4 42
13 14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 NaN 4 40
14 15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 NaN 3 42
15 16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 NaN 17 38
16 17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 NaN 15 34
17 18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 NaN 23 44
18 19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 NaN 34 49
19 20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 Sum NaN NaN NaN 11.4LegendCarer^ 47.2LegendCarer^ 45.4LegendCarer^ 0.0LegendCarer^ 45.4LegendCarer^ NaN NaN NaN
21 Avg 26.5LegendCarer^ 13.5LegendCarer^ 20.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN NaN
22 Xtrm 34.2LegendCarer^ 9.6LegendCarer^ NaN NaN NaN 19.8LegendCarer^ 0.0LegendCarer^ 19.8LegendCarer^ NaN 9LegendCarer^ 64LegendCarer^LegendSS
23 Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ...
  1. Dropping Columns:

    Dropping columns in Pandas is accomplished using the drop method, specifying the columns to be removed. The primary parameters for this operation are as follows:

    - `labels`: A list or single label representing the column(s) to be dropped.
    - `axis`: Specifies the axis along which the drop operation will occur. Use `axis=1` for columns.
    - `inplace`: An optional parameter, when set to `True`, modifies the original DataFrame; otherwise, it returns a new DataFrame with the specified columns removed.
    
    Here is a code example to drop a column named 'ColumnName':
    
    ```python
    df.drop(columns='ColumnName', inplace=True)
    ```
    

    Please see this link for more details.

# Remove the column 'Snow on Grnd Definition cm' from the DataFrame.
df.drop(columns='Snow on Grnd Definition cm', inplace=True)

# Display the DataFrame after removing the specified column.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 32 64
1 02 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 2 53
2 03 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 32 54
3 04 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 36 46
4 05 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 12 36
5 06 23.3 13.6 18.5 0.0 0.5 LegendTT 0.0 LegendTT 1 38
6 07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 16 40
7 08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 33 35
8 09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 14 47
9 10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 25 33
10 11 LegendMM LegendMM LegendMM LegendMM LegendMM 0.0 0.0 0.0 35 45
11 12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 9 64
12 13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 4 42
13 14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 4 40
14 15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 3 42
15 16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 17 38
16 17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 15 34
17 18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 23 44
18 19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 34 49
19 20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 Sum NaN NaN NaN 11.4LegendCarer^ 47.2LegendCarer^ 45.4LegendCarer^ 0.0LegendCarer^ 45.4LegendCarer^ NaN NaN
21 Avg 26.5LegendCarer^ 13.5LegendCarer^ 20.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN
22 Xtrm 34.2LegendCarer^ 9.6LegendCarer^ NaN NaN NaN 19.8LegendCarer^ 0.0LegendCarer^ 19.8LegendCarer^ 9LegendCarer^ 64LegendCarer^LegendSS
23 Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ...
  1. Dropping Rows:

    Similarly, dropping rows in Pandas is achieved using the drop method, with the primary parameters as follows:

    • labels: A list or single label representing the row(s) to be dropped.

    • axis: Use axis=0 (which is the default) to drop rows. You don’t need to specify axis explicitly for dropping rows.

    • inplace: As before, setting this to True will modify the original DataFrame.

    To drop rows by index, you can use the following code as an example:

    df.drop(index=[0, 1, 2], inplace=True)
    

    This code will remove the rows with index 0, 1, and 2 from the DataFrame ‘df’.

    Please see this link for more details.

# Remove the row with index 7 from the DataFrame.
df.drop(df.index[-1], inplace=True)

# Display the DataFrame after removing the specified row.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 32 64
1 02 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 2 53
2 03 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 32 54
3 04 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 36 46
4 05 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 12 36
5 06 23.3 13.6 18.5 0.0 0.5 LegendTT 0.0 LegendTT 1 38
6 07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 16 40
7 08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 33 35
8 09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 14 47
9 10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 25 33
10 11 LegendMM LegendMM LegendMM LegendMM LegendMM 0.0 0.0 0.0 35 45
11 12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 9 64
12 13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 4 42
13 14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 4 40
14 15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 3 42
15 16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 17 38
16 17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 15 34
17 18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 23 44
18 19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 34 49
19 20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 Sum NaN NaN NaN 11.4LegendCarer^ 47.2LegendCarer^ 45.4LegendCarer^ 0.0LegendCarer^ 45.4LegendCarer^ NaN NaN
21 Avg 26.5LegendCarer^ 13.5LegendCarer^ 20.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN
22 Xtrm 34.2LegendCarer^ 9.6LegendCarer^ NaN NaN NaN 19.8LegendCarer^ 0.0LegendCarer^ 19.8LegendCarer^ 9LegendCarer^ 64LegendCarer^LegendSS

In this code:

  • df is your DataFrame.

  • df.index[-1] retrieves the index label of the last row in the DataFrame.

  • drop is used to remove the row with the specified index label.

  • inplace=True is used to modify the DataFrame in place.

  1. Column Names: To access the column names in a Pandas DataFrame, you can use the .columns attribute. Here’s how to do it:

    # Assuming df is your DataFrame
    column_names = df.columns
    

    The df.columns attribute returns an Index object containing the names of all the columns in the DataFrame. You can then convert this Index object to a list if needed:

    column_names_list = list(column_names)
    

    Please see this link for more details.

# Retrieve and display the column names of the DataFrame.
df.columns
Index(['DAY', 'Max Temp Definition °C', 'Min Temp Definition °C',
       'Mean Temp Definition °C', 'Heat Deg Days Definition',
       'Cool Deg Days Definition', 'Total Rain Definition mm',
       'Total Snow Definition cm', 'Total Precip Definition mm',
       'Dir of Max Gust Definition 10's deg',
       'Spd of Max Gust Definition km/h'],
      dtype='object')
  1. Replace each occurrence of pattern/regex (Optional Content):

    The .str.replace method in Pandas is used to perform string replacement within each element of a Pandas Series or DataFrame column containing string values. It allows for the search and substitution of specific substrings or patterns in strings.

    Here are the primary parameters of the .str.replace method:

    • pat: The pattern or substring to search for.

    • repl: The replacement string or value.

    • regex: An optional boolean parameter (default is False) that specifies whether the pattern provided is a regular expression.

    • n: An optional parameter to control the maximum number of replacements. If not specified, it replaces all occurrences by default.

    Please see this link for more details.

# Iterate through each column in the DataFrame.
for Col in df.columns:
    # Replace 'LegendCarer^' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendCarer^', '')
    
    # Replace 'LegendT' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendT', '')
    
    # Replace 'LegendM' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendM', '')

# Display the DataFrame after the specified replacements.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 32 64
1 02 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 2 53
2 03 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 32 54
3 04 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 36 46
4 05 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 12 36
5 06 23.3 13.6 18.5 0.0 0.5 T 0.0 T 1 38
6 07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 16 40
7 08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 33 35
8 09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 14 47
9 10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 25 33
10 11 M M M M M 0.0 0.0 0.0 35 45
11 12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 9 64
12 13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 4 42
13 14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 4 40
14 15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 3 42
15 16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 17 38
16 17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 15 34
17 18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 23 44
18 19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 34 49
19 20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
20 Sum NaN NaN NaN 11.4 47.2 45.4 0.0 45.4 NaN NaN
21 Avg 26.5 13.5 20.0 NaN NaN NaN NaN NaN NaN NaN
22 Xtrm 34.2 9.6 NaN NaN NaN 19.8 0.0 19.8 9 64LegendSS
  1. Pandas set_index Method:

    The set_index method in Pandas is used to set one or more columns as the index of a DataFrame. When you set an index, it means you are designating one or more columns to serve as the row labels, providing a way to uniquely identify and access rows in the DataFrame.

    Here’s the primary parameter of the set_index method:

    • keys: This parameter specifies the column or columns you want to use as the new index.

    df.set_index(keys, inplace=False)
    
    • df: Refers to the Pandas DataFrame.

    • keys: The column or columns that you want to set as the new index. This can be a single column name or a list of column names.

# Set the 'DAY' column as the index for the DataFrame.
df.set_index('DAY', inplace=True)

# Display the DataFrame with 'DAY' as the new index.
display(df)
Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
DAY
01 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 32 64
02 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 2 53
03 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 32 54
04 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 36 46
05 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 12 36
06 23.3 13.6 18.5 0.0 0.5 T 0.0 T 1 38
07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 16 40
08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 33 35
09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 14 47
10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 25 33
11 M M M M M 0.0 0.0 0.0 35 45
12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 9 64
13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 4 42
14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 4 40
15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 3 42
16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 17 38
17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 15 34
18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 23 44
19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 34 49
20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Sum NaN NaN NaN 11.4 47.2 45.4 0.0 45.4 NaN NaN
Avg 26.5 13.5 20.0 NaN NaN NaN NaN NaN NaN NaN
Xtrm 34.2 9.6 NaN NaN NaN 19.8 0.0 19.8 9 64LegendSS
  1. DataFrame.values: To access the underlying data as a NumPy array, you can do the following:

# Access the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C']
DAY
01      21.7
02      20.8
03      17.2
04      21.4
05      23.6
06      23.3
07      26.9
08      29.8
09      29.2
10      34.2
11         M
12      24.1
13      28.0
14      22.4
15      28.4
16      29.1
17      31.6
18      32.2
19      32.7
20       NaN
Sum      NaN
Avg     26.5
Xtrm    34.2
Name: Max Temp Definition °C, dtype: object
# Retrieve the values from the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C'].values
array(['21.7', '20.8', '17.2', '21.4', '23.6', '23.3', '26.9', '29.8',
       '29.2', '34.2', 'M', '24.1', '28.0', '22.4', '28.4', '29.1',
       '31.6', '32.2', '32.7', nan, nan, '26.5', '34.2'], dtype=object)
  1. Renaming Columns and Indices: To rename columns and indices, you can use the rename method. Here’s an example:

# Import the pprint function for pretty printing.
from pprint import pprint

# Create a dictionary for renaming columns by removing 'Definition' from their names.
column_rename_dict = dict(zip(df.columns, [x.replace('Definition', '') for x in df.columns]))

# Print the title for the column renaming dictionary.
print("Column Renaming Dictionary:")
pprint(column_rename_dict)

# Rename the columns in the DataFrame using the created dictionary.
df.rename(columns=column_rename_dict, inplace=True)

# Rename specific index labels for clarity.
df.rename(index={'01': '1st', '02': '2nd', '03': '3rd', '04': '4th', '05': '5th'}, inplace=True)

# Print the title for the DataFrame with renamed columns and index labels.
print("\nDataFrame with Renamed Columns and Index Labels:")
display(df)
Column Renaming Dictionary:
{'Cool Deg Days Definition': 'Cool Deg Days ',
 "Dir of Max Gust Definition 10's deg": "Dir of Max Gust  10's deg",
 'Heat Deg Days Definition': 'Heat Deg Days ',
 'Max Temp Definition °C': 'Max Temp  °C',
 'Mean Temp Definition °C': 'Mean Temp  °C',
 'Min Temp Definition °C': 'Min Temp  °C',
 'Spd of Max Gust Definition km/h': 'Spd of Max Gust  km/h',
 'Total Precip Definition mm': 'Total Precip  mm',
 'Total Rain Definition mm': 'Total Rain  mm',
 'Total Snow Definition cm': 'Total Snow  cm'}

DataFrame with Renamed Columns and Index Labels:
Max Temp °C Min Temp °C Mean Temp °C Heat Deg Days Cool Deg Days Total Rain mm Total Snow cm Total Precip mm Dir of Max Gust 10's deg Spd of Max Gust km/h
DAY
1st 21.7 12.3 17.0 1.0 0.0 1.6 0.0 1.6 32 64
2nd 20.8 10.0 15.4 2.6 0.0 19.8 0.0 19.8 2 53
3rd 17.2 10.3 13.8 4.2 0.0 5.2 0.0 5.2 32 54
4th 21.4 10.2 15.8 2.2 0.0 18.4 0.0 18.4 36 46
5th 23.6 9.6 16.6 1.4 0.0 0.4 0.0 0.4 12 36
06 23.3 13.6 18.5 0.0 0.5 T 0.0 T 1 38
07 26.9 11.6 19.3 0.0 1.3 0.0 0.0 0.0 16 40
08 29.8 14.0 21.9 0.0 3.9 0.0 0.0 0.0 33 35
09 29.2 15.6 22.4 0.0 4.4 0.0 0.0 0.0 14 47
10 34.2 16.8 25.5 0.0 7.5 0.0 0.0 0.0 25 33
11 M M M M M 0.0 0.0 0.0 35 45
12 24.1 15.2 19.7 0.0 1.7 0.0 0.0 0.0 9 64
13 28.0 11.5 19.8 0.0 1.8 0.0 0.0 0.0 4 42
14 22.4 13.8 18.1 0.0 0.1 0.0 0.0 0.0 4 40
15 28.4 13.0 20.7 0.0 2.7 0.0 0.0 0.0 3 42
16 29.1 16.6 22.9 0.0 4.9 0.0 0.0 0.0 17 38
17 31.6 16.3 24.0 0.0 6.0 0.0 0.0 0.0 15 34
18 32.2 16.0 24.1 0.0 6.1 0.0 0.0 0.0 23 44
19 32.7 15.8 24.3 0.0 6.3 0.0 0.0 0.0 34 49
20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Sum NaN NaN NaN 11.4 47.2 45.4 0.0 45.4 NaN NaN
Avg 26.5 13.5 20.0 NaN NaN NaN NaN NaN NaN NaN
Xtrm 34.2 9.6 NaN NaN NaN 19.8 0.0 19.8 9 64LegendSS