Modifying a DataFrame in Pandas

5.4. Modifying a DataFrame in Pandas#

In data analysis, modifying a DataFrame is a fundamental operation. This section covers common methods for renaming columns, dropping columns or rows, and other essential modifications.

5.4.1. Renaming Columns#

Renaming columns in a DataFrame can be accomplished using the rename method. This method allows for specifying a dictionary where keys are the current column names and values are the new column names. This is particularly useful when you need to make your DataFrame more readable or align it with a specific format [Pandas Developers, 2023].

Example: The data in this example is sourced from Alberta Regional Dashboard [Government of Alberta, 2024].

import pandas as pd

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]
                   })
print('Original:')
display(df)

# Renaming columns
df.rename(columns={'Municipality': 'City', 'Population 2022': 'Population'}, inplace=True)

print('Renamed:')
display(df)

Original:

	Municipality	Population 2022
0	Calgary	1413800
1	Edmonton	1087803
2	Red Deer	105883
3	Strathcona County	104330
4	Lethbridge	104254
5	Airdrie	80222
6	Wood Buffalo	74532

Renamed:

	City	Population
0	Calgary	1413800
1	Edmonton	1087803
2	Red Deer	105883
3	Strathcona County	104330
4	Lethbridge	104254
5	Airdrie	80222
6	Wood Buffalo	74532

The inplace=True argument modifies the DataFrame directly without creating a copy.

5.4.2. Renaming using Axis-Style Parameters#

The rename method also supports using axis-style parameters to rename columns and index labels in a DataFrame. Here are examples using the population data from Alberta cities.

Renaming Columns with a Function: You can use a function to rename columns dynamically. For instance, converting column names to lowercase:

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]
                   })
print('Original:')
display(df)

# Renaming columns using a function
df.rename(str.lower, axis='columns', inplace=True)

print('Renamed:')
display(df)

Original:

	Municipality	Population 2022
0	Calgary	1413800
1	Edmonton	1087803
2	Red Deer	105883
3	Strathcona County	104330
4	Lethbridge	104254
5	Airdrie	80222
6	Wood Buffalo	74532

Renamed:

	municipality	population 2022
0	Calgary	1413800
1	Edmonton	1087803
2	Red Deer	105883
3	Strathcona County	104330
4	Lethbridge	104254
5	Airdrie	80222
6	Wood Buffalo	74532

Renaming Index Labels: You can also rename index labels by specifying a dictionary where keys are the current index labels and values are the new index labels:

# Sample DataFrame with population data from Alberta cities
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Red Deer', 'Strathcona County', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                                    'Population 2022': [1413800, 1087803, 105883, 104330, 104254, 80222, 74532]},
                                    index=[1, 2, 3, 4, 5, 6, 7])
print('Original:')
display(df)

# Renaming index labels
df.rename({1: 101, 2: 102, 3: 103, 4: 104, 5: 105, 6: 106, 7: 107}, axis='index', inplace=True)

print('Renamed:')
display(df)

Original:

	Municipality	Population 2022
1	Calgary	1413800
2	Edmonton	1087803
3	Red Deer	105883
4	Strathcona County	104330
5	Lethbridge	104254
6	Airdrie	80222
7	Wood Buffalo	74532

Renamed:

	Municipality	Population 2022
101	Calgary	1413800
102	Edmonton	1087803
103	Red Deer	105883
104	Strathcona County	104330
105	Lethbridge	104254
106	Airdrie	80222
107	Wood Buffalo	74532

These examples demonstrate different ways to rename columns and index labels, providing flexibility depending on the context and requirements of your DataFrame operations. For more details, you can refer to the Pandas documentation on renaming columns.

5.4.3. Dropping Columns or Rows#

The drop method is used to remove columns or rows from a DataFrame. This can be useful for cleaning up your data by removing unnecessary or irrelevant information.

Example - Dropping Columns: We will use the following data on vehicle registrations in Alberta cities [Government of Alberta, 2024]:

import pandas as pd

# Sample DataFrame with vehicle registrations data
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]  # Additional column for demonstration
                   })

print('Original:')
display(df)

# Dropping a single column
df.drop(columns=['Population 2020'], inplace=True)

print('After dropping one column:')
display(df)

# Dropping multiple columns
df.drop(columns=['Municipality', 'Vehicle Registrations 2020'], inplace=True)

print('After dropping multiple columns:')
display(df)

Original:

	Municipality	Vehicle Registrations 2020	Population 2020
0	Calgary	1007282	1413800
1	Edmonton	725796	1087803
2	Strathcona County	79271	104330
3	Red Deer	75533	105883
4	Lethbridge	73964	104254
5	Airdrie	56838	80222
6	Wood Buffalo	55905	74532

After dropping one column:

	Municipality	Vehicle Registrations 2020
0	Calgary	1007282
1	Edmonton	725796
2	Strathcona County	79271
3	Red Deer	75533
4	Lethbridge	73964
5	Airdrie	56838
6	Wood Buffalo	55905

After dropping multiple columns:


0
1
2
3
4
5
6

Example - Dropping Rows:

Rows can be dropped based on their index. This is useful when you need to remove specific entries from your dataset.

# Recreate the DataFrame for row dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905]},
                   index=[1, 2, 3, 4, 5, 6, 7])

print('Original:')
display(df)

# Dropping a single row
df.drop(index=1, inplace=True)

print('After dropping one row:')
display(df)

# Dropping multiple rows
df.drop(index=[2, 3], inplace=True)

print('After dropping multiple rows:')
display(df)

Original:

	Municipality	Vehicle Registrations 2020
1	Calgary	1007282
2	Edmonton	725796
3	Strathcona County	79271
4	Red Deer	75533
5	Lethbridge	73964
6	Airdrie	56838
7	Wood Buffalo	55905

After dropping one row:

	Municipality	Vehicle Registrations 2020
2	Edmonton	725796
3	Strathcona County	79271
4	Red Deer	75533
5	Lethbridge	73964
6	Airdrie	56838
7	Wood Buffalo	55905

After dropping multiple rows:

	Municipality	Vehicle Registrations 2020
4	Red Deer	75533
5	Lethbridge	73964
6	Airdrie	56838
7	Wood Buffalo	55905

Example - Dropping Rows and Columns Simultaneously:

The drop method also allows for simultaneously dropping both rows and columns by specifying both index and columns parameters.

# Recreate the DataFrame for combined dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]}, index=[1, 2, 3, 4, 5, 6, 7])
print('Original:')
display(df)

# Dropping rows and columns simultaneously
df.drop(index=[4, 5], columns=['Population 2020'], inplace=True)

print('After dropping specific rows and a column:')
display(df)

Original:

	Municipality	Vehicle Registrations 2020	Population 2020
1	Calgary	1007282	1413800
2	Edmonton	725796	1087803
3	Strathcona County	79271	104330
4	Red Deer	75533	105883
5	Lethbridge	73964	104254
6	Airdrie	56838	80222
7	Wood Buffalo	55905	74532

After dropping specific rows and a column:

	Municipality	Vehicle Registrations 2020
1	Calgary	1007282
2	Edmonton	725796
3	Strathcona County	79271
6	Airdrie	56838
7	Wood Buffalo	55905

Example - Dropping Rows Based on Condition:

You can also drop rows based on a condition using boolean indexing.

# Recreate the DataFrame for conditional dropping examples
df = pd.DataFrame({'Municipality': ['Calgary', 'Edmonton', 'Strathcona County', 'Red Deer', 'Lethbridge', 'Airdrie', 'Wood Buffalo'],
                   'Vehicle Registrations 2020': [1007282, 725796, 79271, 75533, 73964, 56838, 55905],
                   'Population 2020': [1413800, 1087803, 104330, 105883, 104254, 80222, 74532]},
                    index=[1, 2, 3, 4, 5, 6, 7])

print('Original:')
display(df)

# Dropping rows where Vehicle Registrations 2020 is less than 100,000
df = df[df['Vehicle Registrations 2020'] >= 100000]

print('After dropping rows based on condition:')
display(df)

Original:

	Municipality	Vehicle Registrations 2020	Population 2020
1	Calgary	1007282	1413800
2	Edmonton	725796	1087803
3	Strathcona County	79271	104330
4	Red Deer	75533	105883
5	Lethbridge	73964	104254
6	Airdrie	56838	80222
7	Wood Buffalo	55905	74532

After dropping rows based on condition:

	Municipality	Vehicle Registrations 2020	Population 2020
1	Calgary	1007282	1413800
2	Edmonton	725796	1087803

For more information, check out the Pandas documentation on dropping rows and columns.

5.4.4. Adding/Modifying Columns#

New columns can be added directly by assigning a new series to a DataFrame. Existing columns can also be modified similarly. This is useful for creating new features or updating existing ones based on your analysis needs

# Setting new values, creating rows if needed
df.loc[new_row_label] = new_data
df['new_column'] = new_data

Advantages:

Convenient for adding new rows or columns without explicitly modifying the DataFrame shape.
Enables quick DataFrame expansion without explicit resizing.

Disadvantages:

Can lead to the unintentional creation of new rows or columns if not used carefully.
Might not be suitable for cases where strict control over data insertion is required.

Example:

import pandas as pd

# This example is from
# https://pandas.pydata.org/docs/getting_started/intro_tutorials/05_add_columns.html
# The Air Quality NO2 dataset if from
# http://dhhagan.github.io/py-openaq/index.html
df = pd.read_csv('https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2.csv')
print(r'Air Quality NO2:')
df.head(8)

Air Quality NO2:

	datetime	station_antwerp	station_paris	station_london
0	2019-05-07 02:00:00	NaN	NaN	23.0
1	2019-05-07 03:00:00	50.5	25.0	19.0
2	2019-05-07 04:00:00	45.0	27.7	19.0
3	2019-05-07 05:00:00	NaN	50.4	16.0
4	2019-05-07 06:00:00	NaN	61.9	NaN
5	2019-05-07 07:00:00	NaN	72.4	26.0
6	2019-05-07 08:00:00	NaN	77.7	32.0
7	2019-05-07 09:00:00	NaN	67.9	32.0

The goal is to express the \(NO_2\) concentration at the London station in milligrams per cubic meter (mg/m³). This conversion is achieved under the conditions of 25 degrees Celsius and 1013 hPa pressure, using the specific conversion factor of 1.882 (Further information can be found here).

df["london_mg_per_cubic"] = df["station_london"] * 1.882
df.head(8)

	datetime	station_antwerp	station_paris	station_london	london_mg_per_cubic
0	2019-05-07 02:00:00	NaN	NaN	23.0	43.286
1	2019-05-07 03:00:00	50.5	25.0	19.0	35.758
2	2019-05-07 04:00:00	45.0	27.7	19.0	35.758
3	2019-05-07 05:00:00	NaN	50.4	16.0	30.112
4	2019-05-07 06:00:00	NaN	61.9	NaN	NaN
5	2019-05-07 07:00:00	NaN	72.4	26.0	48.932
6	2019-05-07 08:00:00	NaN	77.7	32.0	60.224
7	2019-05-07 09:00:00	NaN	67.9	32.0	60.224

For more details on modifying DataFrames, you can refer to the Pandas documentation on DataFrame operations.

Example - Daily Weather Data Report for October 2023 (Calgary Int’l Airport, Alberta):

This example is meant to assist in summarizing the above.

import pandas as pd

# Read the HTML data from the specified URL and store it as a list of DataFrames.
df_list = pd.read_html('https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50430')

# Select the first DataFrame from the list, which contains the climate data.
df = df_list[0]

# Display the DataFrame containing the climate data for analysis.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Snow on Grnd Definition cm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	NaN	32	64
1	02	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	NaN	2	53
2	03	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	NaN	32	54
3	04	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	NaN	36	46
4	05	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	NaN	12	36
5	06	23.3	13.6	18.5	0.0	0.5	LegendTT	0.0	LegendTT	NaN	1	38
6	07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	NaN	16	40
7	08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	NaN	33	35
8	09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	NaN	14	47
9	10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	NaN	25	33
10	11	LegendMM	LegendMM	LegendMM	LegendMM	LegendMM	0.0	0.0	0.0	NaN	35	45
11	12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	NaN	9	64
12	13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	NaN	4	42
13	14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	NaN	4	40
14	15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	NaN	3	42
15	16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	NaN	17	38
16	17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	NaN	15	34
17	18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	NaN	23	44
18	19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	NaN	34	49
19	20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20	Sum	NaN	NaN	NaN	11.4LegendCarer^	47.2LegendCarer^	45.4LegendCarer^	0.0LegendCarer^	45.4LegendCarer^	NaN	NaN	NaN
21	Avg	26.5LegendCarer^	13.5LegendCarer^	20.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
22	Xtrm	34.2LegendCarer^	9.6LegendCarer^	NaN	NaN	NaN	19.8LegendCarer^	0.0LegendCarer^	19.8LegendCarer^	NaN	9LegendCarer^	64LegendCarer^LegendSS
23	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...

Dropping Columns:

Dropping columns in Pandas is accomplished using the drop method, specifying the columns to be removed. The primary parameters for this operation are as follows:

- `labels`: A list or single label representing the column(s) to be dropped.
- `axis`: Specifies the axis along which the drop operation will occur. Use `axis=1` for columns.
- `inplace`: An optional parameter, when set to `True`, modifies the original DataFrame; otherwise, it returns a new DataFrame with the specified columns removed.

Here is a code example to drop a column named 'ColumnName':

```python
df.drop(columns='ColumnName', inplace=True)
```

Please see this link for more details.

# Remove the column 'Snow on Grnd Definition cm' from the DataFrame.
df.drop(columns='Snow on Grnd Definition cm', inplace=True)

# Display the DataFrame after removing the specified column.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	32	64
1	02	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	2	53
2	03	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	32	54
3	04	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	36	46
4	05	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	12	36
5	06	23.3	13.6	18.5	0.0	0.5	LegendTT	0.0	LegendTT	1	38
6	07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	16	40
7	08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	33	35
8	09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	14	47
9	10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	25	33
10	11	LegendMM	LegendMM	LegendMM	LegendMM	LegendMM	0.0	0.0	0.0	35	45
11	12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	9	64
12	13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	4	42
13	14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	4	40
14	15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	3	42
15	16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	17	38
16	17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	15	34
17	18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	23	44
18	19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	34	49
19	20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20	Sum	NaN	NaN	NaN	11.4LegendCarer^	47.2LegendCarer^	45.4LegendCarer^	0.0LegendCarer^	45.4LegendCarer^	NaN	NaN
21	Avg	26.5LegendCarer^	13.5LegendCarer^	20.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN
22	Xtrm	34.2LegendCarer^	9.6LegendCarer^	NaN	NaN	NaN	19.8LegendCarer^	0.0LegendCarer^	19.8LegendCarer^	9LegendCarer^	64LegendCarer^LegendSS
23	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...

Dropping Rows:

Similarly, dropping rows in Pandas is achieved using the drop method, with the primary parameters as follows:
- labels: A list or single label representing the row(s) to be dropped.
- axis: Use axis=0 (which is the default) to drop rows. You don’t need to specify axis explicitly for dropping rows.
- inplace: As before, setting this to True will modify the original DataFrame.
To drop rows by index, you can use the following code as an example:
```
df.drop(index=[0, 1, 2], inplace=True)
```
This code will remove the rows with index 0, 1, and 2 from the DataFrame ‘df’.

Please see this link for more details.

# Remove the row with index 7 from the DataFrame.
df.drop(df.index[-1], inplace=True)

# Display the DataFrame after removing the specified row.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	32	64
1	02	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	2	53
2	03	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	32	54
3	04	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	36	46
4	05	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	12	36
5	06	23.3	13.6	18.5	0.0	0.5	LegendTT	0.0	LegendTT	1	38
6	07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	16	40
7	08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	33	35
8	09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	14	47
9	10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	25	33
10	11	LegendMM	LegendMM	LegendMM	LegendMM	LegendMM	0.0	0.0	0.0	35	45
11	12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	9	64
12	13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	4	42
13	14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	4	40
14	15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	3	42
15	16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	17	38
16	17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	15	34
17	18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	23	44
18	19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	34	49
19	20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20	Sum	NaN	NaN	NaN	11.4LegendCarer^	47.2LegendCarer^	45.4LegendCarer^	0.0LegendCarer^	45.4LegendCarer^	NaN	NaN
21	Avg	26.5LegendCarer^	13.5LegendCarer^	20.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN
22	Xtrm	34.2LegendCarer^	9.6LegendCarer^	NaN	NaN	NaN	19.8LegendCarer^	0.0LegendCarer^	19.8LegendCarer^	9LegendCarer^	64LegendCarer^LegendSS

In this code:

df is your DataFrame.
df.index[-1] retrieves the index label of the last row in the DataFrame.
drop is used to remove the row with the specified index label.
inplace=True is used to modify the DataFrame in place.

Column Names: To access the column names in a Pandas DataFrame, you can use the .columns attribute. Here’s how to do it:
```
# Assuming df is your DataFrame
column_names = df.columns
```
The df.columns attribute returns an Index object containing the names of all the columns in the DataFrame. You can then convert this Index object to a list if needed:
```
column_names_list = list(column_names)
```
Please see this link for more details.

# Retrieve and display the column names of the DataFrame.
df.columns

Index(['DAY', 'Max Temp Definition °C', 'Min Temp Definition °C',
       'Mean Temp Definition °C', 'Heat Deg Days Definition',
       'Cool Deg Days Definition', 'Total Rain Definition mm',
       'Total Snow Definition cm', 'Total Precip Definition mm',
       'Dir of Max Gust Definition 10's deg',
       'Spd of Max Gust Definition km/h'],
      dtype='object')

Replace each occurrence of pattern/regex (Optional Content):

The .str.replace method in Pandas is used to perform string replacement within each element of a Pandas Series or DataFrame column containing string values. It allows for the search and substitution of specific substrings or patterns in strings.

Here are the primary parameters of the .str.replace method:
- pat: The pattern or substring to search for.
- repl: The replacement string or value.
- regex: An optional boolean parameter (default is False) that specifies whether the pattern provided is a regular expression.
- n: An optional parameter to control the maximum number of replacements. If not specified, it replaces all occurrences by default.
Please see this link for more details.

# Iterate through each column in the DataFrame.
for Col in df.columns:
    # Replace 'LegendCarer^' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendCarer^', '')
    
    # Replace 'LegendT' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendT', '')
    
    # Replace 'LegendM' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendM', '')

# Display the DataFrame after the specified replacements.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	32	64
1	02	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	2	53
2	03	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	32	54
3	04	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	36	46
4	05	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	12	36
5	06	23.3	13.6	18.5	0.0	0.5	T	0.0	T	1	38
6	07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	16	40
7	08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	33	35
8	09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	14	47
9	10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	25	33
10	11	M	M	M	M	M	0.0	0.0	0.0	35	45
11	12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	9	64
12	13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	4	42
13	14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	4	40
14	15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	3	42
15	16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	17	38
16	17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	15	34
17	18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	23	44
18	19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	34	49
19	20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
20	Sum	NaN	NaN	NaN	11.4	47.2	45.4	0.0	45.4	NaN	NaN
21	Avg	26.5	13.5	20.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
22	Xtrm	34.2	9.6	NaN	NaN	NaN	19.8	0.0	19.8	9	64LegendSS

Pandas set_index Method:

The set_index method in Pandas is used to set one or more columns as the index of a DataFrame. When you set an index, it means you are designating one or more columns to serve as the row labels, providing a way to uniquely identify and access rows in the DataFrame.

Here’s the primary parameter of the set_index method:
- keys: This parameter specifies the column or columns you want to use as the new index.
```
df.set_index(keys, inplace=False)
```
- df: Refers to the Pandas DataFrame.
- keys: The column or columns that you want to set as the new index. This can be a single column name or a list of column names.

# Set the 'DAY' column as the index for the DataFrame.
df.set_index('DAY', inplace=True)

# Display the DataFrame with 'DAY' as the new index.
display(df)

	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
DAY
01	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	32	64
02	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	2	53
03	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	32	54
04	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	36	46
05	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	12	36
06	23.3	13.6	18.5	0.0	0.5	T	0.0	T	1	38
07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	16	40
08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	33	35
09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	14	47
10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	25	33
11	M	M	M	M	M	0.0	0.0	0.0	35	45
12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	9	64
13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	4	42
14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	4	40
15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	3	42
16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	17	38
17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	15	34
18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	23	44
19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	34	49
20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Sum	NaN	NaN	NaN	11.4	47.2	45.4	0.0	45.4	NaN	NaN
Avg	26.5	13.5	20.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Xtrm	34.2	9.6	NaN	NaN	NaN	19.8	0.0	19.8	9	64LegendSS

DataFrame.values: To access the underlying data as a NumPy array, you can do the following:

# Access the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C']

DAY
    21.7
    20.8
    17.2
    21.4
    23.6
    23.3
    26.9
    29.8
    29.2
    34.2
       M
    24.1
    28.0
    22.4
    28.4
    29.1
    31.6
    32.2
    32.7
     NaN
Sum      NaN
Avg     26.5
Xtrm    34.2
Name: Max Temp Definition °C, dtype: object

# Retrieve the values from the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C'].values

array(['21.7', '20.8', '17.2', '21.4', '23.6', '23.3', '26.9', '29.8',
       '29.2', '34.2', 'M', '24.1', '28.0', '22.4', '28.4', '29.1',
       '31.6', '32.2', '32.7', nan, nan, '26.5', '34.2'], dtype=object)

Renaming Columns and Indices: To rename columns and indices, you can use the rename method. Here’s an example:

# Import the pprint function for pretty printing.
from pprint import pprint

# Create a dictionary for renaming columns by removing 'Definition' from their names.
column_rename_dict = dict(zip(df.columns, [x.replace('Definition', '') for x in df.columns]))

# Print the title for the column renaming dictionary.
print("Column Renaming Dictionary:")
pprint(column_rename_dict)

# Rename the columns in the DataFrame using the created dictionary.
df.rename(columns=column_rename_dict, inplace=True)

# Rename specific index labels for clarity.
df.rename(index={'01': '1st', '02': '2nd', '03': '3rd', '04': '4th', '05': '5th'}, inplace=True)

# Print the title for the DataFrame with renamed columns and index labels.
print("\nDataFrame with Renamed Columns and Index Labels:")
display(df)

Column Renaming Dictionary:
{'Cool Deg Days Definition': 'Cool Deg Days ',
 "Dir of Max Gust Definition 10's deg": "Dir of Max Gust  10's deg",
 'Heat Deg Days Definition': 'Heat Deg Days ',
 'Max Temp Definition °C': 'Max Temp  °C',
 'Mean Temp Definition °C': 'Mean Temp  °C',
 'Min Temp Definition °C': 'Min Temp  °C',
 'Spd of Max Gust Definition km/h': 'Spd of Max Gust  km/h',
 'Total Precip Definition mm': 'Total Precip  mm',
 'Total Rain Definition mm': 'Total Rain  mm',
 'Total Snow Definition cm': 'Total Snow  cm'}

DataFrame with Renamed Columns and Index Labels:

	Max Temp °C	Min Temp °C	Mean Temp °C	Heat Deg Days	Cool Deg Days	Total Rain mm	Total Snow cm	Total Precip mm	Dir of Max Gust 10's deg	Spd of Max Gust km/h
DAY
1st	21.7	12.3	17.0	1.0	0.0	1.6	0.0	1.6	32	64
2nd	20.8	10.0	15.4	2.6	0.0	19.8	0.0	19.8	2	53
3rd	17.2	10.3	13.8	4.2	0.0	5.2	0.0	5.2	32	54
4th	21.4	10.2	15.8	2.2	0.0	18.4	0.0	18.4	36	46
5th	23.6	9.6	16.6	1.4	0.0	0.4	0.0	0.4	12	36
06	23.3	13.6	18.5	0.0	0.5	T	0.0	T	1	38
07	26.9	11.6	19.3	0.0	1.3	0.0	0.0	0.0	16	40
08	29.8	14.0	21.9	0.0	3.9	0.0	0.0	0.0	33	35
09	29.2	15.6	22.4	0.0	4.4	0.0	0.0	0.0	14	47
10	34.2	16.8	25.5	0.0	7.5	0.0	0.0	0.0	25	33
11	M	M	M	M	M	0.0	0.0	0.0	35	45
12	24.1	15.2	19.7	0.0	1.7	0.0	0.0	0.0	9	64
13	28.0	11.5	19.8	0.0	1.8	0.0	0.0	0.0	4	42
14	22.4	13.8	18.1	0.0	0.1	0.0	0.0	0.0	4	40
15	28.4	13.0	20.7	0.0	2.7	0.0	0.0	0.0	3	42
16	29.1	16.6	22.9	0.0	4.9	0.0	0.0	0.0	17	38
17	31.6	16.3	24.0	0.0	6.0	0.0	0.0	0.0	15	34
18	32.2	16.0	24.1	0.0	6.1	0.0	0.0	0.0	23	44
19	32.7	15.8	24.3	0.0	6.3	0.0	0.0	0.0	34	49
20	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Sum	NaN	NaN	NaN	11.4	47.2	45.4	0.0	45.4	NaN	NaN
Avg	26.5	13.5	20.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Xtrm	34.2	9.6	NaN	NaN	NaN	19.8	0.0	19.8	9	64LegendSS