An Introduction to Pandas

6.1. An Introduction to Pandas#

Pandas is a Python library, open-source in nature, that offers robust data structures and tools for data analysis. It is specifically designed to simplify and streamline data manipulation and analysis tasks, making them more efficient and straightforward. The name “Pandas” is derived from “Panel Data,” which refers to multi-dimensional structured datasets commonly used in econometrics and finance [Pandas Developers, 2023]. Pandas comes equipped with a diverse set of functions and methods for data handling, including reading and writing data from various file formats like CSV, Excel, and SQL databases. It facilitates data cleaning, transformation, aggregation, and addressing missing values. Furthermore, Pandas supports merging and joining datasets and offers functionality for time series operations [Pandas Developers, 2023]. Widely adopted in data science, machine learning, finance, economics, and numerous other fields, Pandas is favored for its ability to simplify and expedite data processing and analysis workflows. Its clear and intuitive syntax makes it accessible to both novices and experienced data scientists [Pandas Developers, 2023].

>>> pip install pandas

Here is a brief summary of key Python Pandas features:

6.1.1. Data Structures:#

Series: The Series is akin to a one-dimensional labeled array, similar to a NumPy array but with an associated index. This index provides meaningful labels for each element in the Series, allowing for effortless data alignment and retrieval [Pandas Developers, 2023].
DataFrame: The DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It consists of rows and columns, where each column can accommodate various data types. DataFrames provide a versatile and potent method for working with structured data, enabling operations such as filtering, joining, grouping, and more [Pandas Developers, 2023].

Example - Series: A Pandas Series object can be instantiated through the implementation of the pd.Series constructor.

import pandas as pd

# Create a Pandas Series with custom index
data = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Print the Pandas Series
print("Pandas Series:")
print(data)

Pandas Series:
A    10
B    20
C    30
D    40
dtype: int64

Example - DataFrame: A Pandas DataFrame object can be created by utilizing the pd.DataFrame constructor.

import pandas as pd

# Create a DataFrame from a dictionary
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['Calgary', 'Edmonton', 'Red Deer']
        }

df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame:")
display(df)

DataFrame:

	Name	Age	City
0	Alice	25	Calgary
1	Bob	30	Edmonton
2	Charlie	22	Red Deer

6.1.2. Reading and Writing Data:#

Pandas is a powerful Python library that provides data manipulation and analysis tools. It’s widely used for tasks like reading and writing data in various formats. Here’s how you can use Pandas to read and write data [Pandas Developers, 2023]:

6.1.2.1. Reading Data#

Pandas can read data from various file formats like CSV, Excel, SQL databases, and more. The most commonly used method is pandas.read_csv() for reading CSV files.

import pandas as pd

Link = 'https://download.microsoft.com/download/4/C/8/4C830C0C-101F-4BF2-8FCB-32D9A8BA906A/Import_User_Sample_en.csv'
# Read a CSV file into a DataFrame
data = pd.read_csv(Link)

# Display the DataFrame
print("The DataFrame:")
display(data)

The DataFrame:

	User Name	First Name	Last Name	Display Name	Job Title	Department	Office Number	Office Phone	Mobile Phone	Fax	Address	City	State or Province	ZIP or Postal Code	Country or Region
0	chris@contoso.com	Chris	Green	Chris Green	IT Manager	Information Technology	123451	123-555-1211	123-555-6641	123-555-9821	1 Microsoft way	Redmond	Wa	98052	United States
1	ben@contoso.com	Ben	Andrews	Ben Andrews	IT Manager	Information Technology	123452	123-555-1212	123-555-6642	123-555-9822	1 Microsoft way	Redmond	Wa	98052	United States
2	david@contoso.com	David	Longmuir	David Longmuir	IT Manager	Information Technology	123453	123-555-1213	123-555-6643	123-555-9823	1 Microsoft way	Redmond	Wa	98052	United States
3	cynthia@contoso.com	Cynthia	Carey	Cynthia Carey	IT Manager	Information Technology	123454	123-555-1214	123-555-6644	123-555-9824	1 Microsoft way	Redmond	Wa	98052	United States
4	melissa@contoso.com	Melissa	MacBeth	Melissa MacBeth	IT Manager	Information Technology	123455	123-555-1215	123-555-6645	123-555-9825	1 Microsoft way	Redmond	Wa	98052	United States

You can also read Excel files using pandas.read_excel():

import pandas as pd

# Read an Excel file into a DataFrame
# Here the excel file is hosted on the web
data_excel = pd.read_excel('https://go.microsoft.com/fwlink/?LinkID=521962', sheet_name='Sheet1')

# Display the first five rows of the DataFrame
print("First five rows of the DataFrame:")
display(data_excel.head())

# The default behavior of Pandas' `head()` method is to display the first 5 rows of a DataFrame.

First five rows of the DataFrame:

	Segment	Country	Product	Discount Band	Units Sold	Manufacturing Price	Sale Price	Gross Sales	Sales	COGS	Profit	Date	Month Number	Month Name	Year
0	Government	Canada	Carretera	NaN	1618.5	3	20	32370.0	32370.0	16185.0	16185.0	2014-01-01	1	January	2014
1	Government	Germany	Carretera	NaN	1321.0	3	20	26420.0	26420.0	13210.0	13210.0	2014-01-01	1	January	2014
2	Midmarket	France	Carretera	NaN	2178.0	3	15	32670.0	32670.0	21780.0	10890.0	2014-06-01	6	June	2014
3	Midmarket	Germany	Carretera	NaN	888.0	3	15	13320.0	13320.0	8880.0	4440.0	2014-06-01	6	June	2014
4	Midmarket	Mexico	Carretera	NaN	2470.0	3	15	37050.0	37050.0	24700.0	12350.0	2014-06-01	6	June	2014

6.1.2.2. Writing and Exporting Data#

Pandas provides a versatile set of tools for exporting data to a variety of formats. One of the frequently employed techniques is using the DataFrame.to_csv() method, which facilitates the export of data to a CSV (Comma-Separated Values) file:

Example:

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['Calgary', 'Edmonton', 'Red Deer']
        }

df = pd.DataFrame(data)

# Write the DataFrame to a CSV file
csv_filename = 'data.csv'
df.to_csv(csv_filename, index=False)

# Print a message indicating that the data has been written
print(f"Data written to {csv_filename}")

Data written to data.csv

For writing to Excel files, you can use DataFrame.to_excel():

# Write DataFrame to an Excel file
df.to_excel('new_data.xlsx', index=False)

Pandas can also write data to various other formats like SQL databases, JSON, and more [Pandas Developers, 2023].

Additional Options:

When reading or writing data, Pandas offers various options to customize the behavior [Pandas Developers, 2023]:

You can specify delimiter, encoding, header presence, and more when reading CSV files using read_csv().
When writing CSV files, you can control the delimiter, whether to include headers, and more using the to_csv() method’s parameters.
For Excel files, you can specify the sheet name using the sheet_name parameter with to_excel() and read_excel().

Remember that Pandas provides extensive documentation with examples for these methods, so you can refer to the official documentation for more details: Pandas IO documentation

Keep in mind that Pandas is just one tool in the data manipulation and analysis ecosystem. Depending on your needs, other libraries like NumPy, Matplotlib/Seaborn for visualization, and scikit-learn for machine learning might be useful as well.

6.1.2.3. Pandas Basics#

Some basic Pandas commands along with their descriptions:

pd.DataFrame(data): Create a DataFrame from data like a dictionary, array, or list.
data.info(): Display basic information about the DataFrame, including data types and non-null counts.
data.head(n): Display the first n rows of the DataFrame (default is 5).
data.tail(n): Display the last n rows of the DataFrame (default is 5).
data.describe(): Display summary statistics of numerical columns (count, mean, std, min, max, quartiles).
data.shape: Returns the number of rows and columns in the DataFrame as a tuple.
data.columns: Access the column labels of the DataFrame.

Command	Description
`pd.DataFrame(data)`	Create a DataFrame from data like a dictionary, array, or list.
`data.info()`	Display basic information about the DataFrame, including data types and non-null counts.
`data.head(n)`	Display the first n rows of the DataFrame (default is 5).
`data.tail(n)`	Display the last n rows of the DataFrame (default is 5).
`data.describe()`	Display summary statistics of numerical columns (count, mean, std, min, max, quartiles).
`data.shape`	Returns the number of rows and columns in the DataFrame as a tuple.
`data.columns`	Access the column labels of the DataFrame.

Example:

import pandas as pd
import numpy as np
 
# Create the DataFrame
data = pd.DataFrame({'A': np.arange(0, 100),
                     'B': np.arange(1000, 900, -1)})

# Display basic DataFrame information
print("Displaying DataFrame Information:")
display(data)

# Get DataFrame information using data.info()
print("\nDataFrame Info:")
data.info()

# Display the first 10 rows
n = 10
print(f"\nDisplaying First {n} Rows:")
display(data.head(n))

# Display the last 10 rows
print(f"\nDisplaying Last {n} Rows:")
display(data.tail(n))

# Display summary statistics
print("\nSummary Statistics:")
display(data.describe())

# Get the shape of the DataFrame
print("\nDataFrame Shape:")
print("Number of rows and columns:", data.shape)

# Get column labels
print("\nColumn Labels:")
print("Column labels:", data.columns)

Displaying DataFrame Information:

	A	B
0	0	1000
1	1	999
2	2	998
3	3	997
4	4	996
...	...	...
95	95	905
96	96	904
97	97	903
98	98	902
99	99	901

100 rows × 2 columns

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       100 non-null    int32
 1   B       100 non-null    int32
dtypes: int32(2)
memory usage: 932.0 bytes

Displaying First 10 Rows:

	A	B
0	0	1000
1	1	999
2	2	998
3	3	997
4	4	996
5	5	995
6	6	994
7	7	993
8	8	992
9	9	991

Displaying Last 10 Rows:

	A	B
90	90	910
91	91	909
92	92	908
93	93	907
94	94	906
95	95	905
96	96	904
97	97	903
98	98	902
99	99	901

Summary Statistics:

	A	B
count	100.000000	100.000000
mean	49.500000	950.500000
std	29.011492	29.011492
min	0.000000	901.000000
25%	24.750000	925.750000
50%	49.500000	950.500000
75%	74.250000	975.250000
max	99.000000	1000.000000

DataFrame Shape:
Number of rows and columns: (100, 2)

Column Labels:
Column labels: Index(['A', 'B'], dtype='object')

Note

By default, the head() and tail() functions display the first or last 5 rows of the DataFrame. However, you can specify a different number of rows to display by passing an argument to these functions. You can change the number of rows displayed by passing an integer argument to the head() and tail() functions. For example, data.head(10) would display the first 10 rows of the DataFrame.

These are just a few basic commands in Pandas. The library offers a wide range of functions for data manipulation, exploration, and analysis. You can refer to the official Pandas documentation for more details and examples: Pandas Documentation.

6.1.3. Read html with Pandas#

pandas.read_html is a handy function that comes with the Pandas library in Python. It makes it easy to pull tables out of web pages in HTML format and turn them into Pandas DataFrames. This is super useful for people who work with data because it lets them grab structured information from websites without a lot of hassle. If you want to dig into all the details and options, you can check out the full syntax here.

Examples:

# Import the Pandas library and alias it as 'pd'
import pandas as pd

# Use pd.read_html to extract tables from the specified URL
# [0] selects the first table on the webpage since read_html returns a list of DataFrames
Player_Stats = pd.read_html('https://www.premierleague.com/stats/top/players/goals?se=274')[0]

# Display the DataFrame containing player statistics
display(Player_Stats)

	Rank	Player	Club	Nationality	Stat	Unnamed: 5
0	1.0	Erling Haaland	Manchester City	Norway	8	NaN
1	2.0	Son Heung-Min	Tottenham Hotspur	South Korea	6	NaN
2	3.0	Jarrod Bowen	West Ham United	England	5	NaN
3	4.0	Odsonne Édouard	Crystal Palace	France	4	NaN
4	4.0	Evan Ferguson	Brighton & Hove Albion	Ireland	4	NaN
5	4.0	Hwang Hee-Chan	Wolverhampton Wanderers	South Korea	4	NaN
6	4.0	Alexander Isak	Newcastle United	Sweden	4	NaN
7	4.0	Bryan Mbeumo	Brentford	Cameroon	4	NaN
8	4.0	Bukayo Saka	Arsenal	England	4	NaN
9	4.0	Ollie Watkins	Aston Villa	England	4	NaN

# Import the Pandas library and alias it as 'pd'
import pandas as pd

# Use pd.read_html to extract tables from the specified URL
# [0] selects the first table on the webpage since read_html returns a list of DataFrames
Grades_UofC = pd.read_html('https://conted.ucalgary.ca/info/grades.jsp')[0]

# Display the DataFrame containing grades information from the University of Calgary
display(Grades_UofC)

	0	1	2
0	A+	95 – 100%	Outstanding
1	A	90 – 94%	Excellent Superior performance, showing compre...
2	A-	85 – 89%	Approaching Excellent
3	B+	80 – 84%	Exceeding Good
4	B	75 – 79%	Good Clearly above average performance with kn...
5	B-	70 – 74%	Approaching Good
6	C+	67 – 69%	Exceeding Satisfactory
7	C	64 – 66%	Satisfactory (minimal pass) Basic understandin...
8	C-	60 – 63%	Approaching Satisfactory Receipt of a C- or le...
9	D+	55 – 59%	Marginal Performance
10	D	50 – 54%	Minimal Performance
11	F	0 – 49%	Fail
12	AU	NaN	Course Audit No course credit. Permission to A...
13	CR	NaN	Completed Requirements
14	NC	NaN	Not Complete The NC grade is assigned to stude...
15	AT	NaN	Attended
16	NS	NaN	Not Subject to Grading

Example - Daily Weather Data Report for October 2023 (Calgary Int’l Airport, Alberta):

import pandas as pd

# Read the HTML data from the specified URL and store it as a list of DataFrames.
df_list = pd.read_html('https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50430')

# Select the first DataFrame from the list, which contains the climate data.
df = df_list[0]

# Display the DataFrame containing the climate data for analysis.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Snow on Grnd Definition cm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	NaN	18	38
1	02	18.0	1.5	9.8	8.2	0.0	LegendTT	0.0	LegendTT	NaN	31	33
2	03	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	NaN	LegendMM	LegendMM
3	04	13.1	3.6	8.4	9.6	0.0	LegendTT	0.0	LegendTT	NaN	33	57
4	05	10.2	1.2	5.7	12.3	0.0	LegendTT	0.0	LegendTT	NaN	2	37
5	Sum	NaN	NaN	NaN	50.0LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	NaN	NaN	NaN
6	Avg	13.8LegendCarer^	2.2LegendCarer^	8.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	Xtrm	18.0LegendCarer^	1.2LegendCarer^	NaN	NaN	NaN	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	NaN	33LegendCarer^	57LegendCarer^
8	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...

Dropping Columns:

Dropping columns in Pandas is accomplished using the drop method, specifying the columns to be removed. The primary parameters for this operation are as follows:

- `labels`: A list or single label representing the column(s) to be dropped.
- `axis`: Specifies the axis along which the drop operation will occur. Use `axis=1` for columns.
- `inplace`: An optional parameter, when set to `True`, modifies the original DataFrame; otherwise, it returns a new DataFrame with the specified columns removed.

Here is a code example to drop a column named 'ColumnName':

```python
df.drop(columns='ColumnName', inplace=True)
```

Please see this link for more details.

# Remove the column 'Snow on Grnd Definition cm' from the DataFrame.
df.drop(columns='Snow on Grnd Definition cm', inplace=True)

# Display the DataFrame after removing the specified column.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	18	38
1	02	18.0	1.5	9.8	8.2	0.0	LegendTT	0.0	LegendTT	31	33
2	03	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	LegendMM	LegendMM
3	04	13.1	3.6	8.4	9.6	0.0	LegendTT	0.0	LegendTT	33	57
4	05	10.2	1.2	5.7	12.3	0.0	LegendTT	0.0	LegendTT	2	37
5	Sum	NaN	NaN	NaN	50.0LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	NaN	NaN
6	Avg	13.8LegendCarer^	2.2LegendCarer^	8.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	Xtrm	18.0LegendCarer^	1.2LegendCarer^	NaN	NaN	NaN	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	33LegendCarer^	57LegendCarer^
8	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...	Summary, average and extreme values are based ...

Dropping Rows:

Similarly, dropping rows in Pandas is achieved using the drop method, with the primary parameters as follows:
- labels: A list or single label representing the row(s) to be dropped.
- axis: Use axis=0 (which is the default) to drop rows. You don’t need to specify axis explicitly for dropping rows.
- inplace: As before, setting this to True will modify the original DataFrame.
To drop rows by index, you can use the following code as an example:
```
df.drop(index=[0, 1, 2], inplace=True)
```
This code will remove the rows with index 0, 1, and 2 from the DataFrame ‘df’.

Please see this link for more details.

# Remove the row with index 7 from the DataFrame.
df.drop(df.index[-1], inplace=True)

# Display the DataFrame after removing the specified row.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	18	38
1	02	18.0	1.5	9.8	8.2	0.0	LegendTT	0.0	LegendTT	31	33
2	03	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	LegendMM	LegendMM
3	04	13.1	3.6	8.4	9.6	0.0	LegendTT	0.0	LegendTT	33	57
4	05	10.2	1.2	5.7	12.3	0.0	LegendTT	0.0	LegendTT	2	37
5	Sum	NaN	NaN	NaN	50.0LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	NaN	NaN
6	Avg	13.8LegendCarer^	2.2LegendCarer^	8.0LegendCarer^	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	Xtrm	18.0LegendCarer^	1.2LegendCarer^	NaN	NaN	NaN	0.4LegendCarer^	0.0LegendCarer^	0.4LegendCarer^	33LegendCarer^	57LegendCarer^

In this code:

df is your DataFrame.
df.index[-1] retrieves the index label of the last row in the DataFrame.
drop is used to remove the row with the specified index label.
inplace=True is used to modify the DataFrame in place.

Column Names: To access the column names in a Pandas DataFrame, you can use the .columns attribute. Here’s how to do it:
```
# Assuming df is your DataFrame
column_names = df.columns
```
The df.columns attribute returns an Index object containing the names of all the columns in the DataFrame. You can then convert this Index object to a list if needed:
```
column_names_list = list(column_names)
```
Please see this link for more details.

# Retrieve and display the column names of the DataFrame.
df.columns

Index(['DAY', 'Max Temp Definition °C', 'Min Temp Definition °C',
       'Mean Temp Definition °C', 'Heat Deg Days Definition',
       'Cool Deg Days Definition', 'Total Rain Definition mm',
       'Total Snow Definition cm', 'Total Precip Definition mm',
       'Dir of Max Gust Definition 10's deg',
       'Spd of Max Gust Definition km/h'],
      dtype='object')

Replace each occurrence of pattern/regex (Optional Content):

The .str.replace method in Pandas is used to perform string replacement within each element of a Pandas Series or DataFrame column containing string values. It allows for the search and substitution of specific substrings or patterns in strings.

Here are the primary parameters of the .str.replace method:
- pat: The pattern or substring to search for.
- repl: The replacement string or value.
- regex: An optional boolean parameter (default is False) that specifies whether the pattern provided is a regular expression.
- n: An optional parameter to control the maximum number of replacements. If not specified, it replaces all occurrences by default.
Please see this link for more details.

# Iterate through each column in the DataFrame.
for Col in df.columns:
    # Replace 'LegendCarer^' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendCarer^', '')
    
    # Replace 'LegendT' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendT', '')
    
    # Replace 'LegendM' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendM', '')

# Display the DataFrame after the specified replacements.
display(df)

	DAY	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
0	01	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	18	38
1	02	18.0	1.5	9.8	8.2	0.0	T	0.0	T	31	33
2	03	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	M	M
3	04	13.1	3.6	8.4	9.6	0.0	T	0.0	T	33	57
4	05	10.2	1.2	5.7	12.3	0.0	T	0.0	T	2	37
5	Sum	NaN	NaN	NaN	50.0	0.0	0.4	0.0	0.4	NaN	NaN
6	Avg	13.8	2.2	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	Xtrm	18.0	1.2	NaN	NaN	NaN	0.4	0.0	0.4	33	57

Pandas set_index Method:

The set_index method in Pandas is used to set one or more columns as the index of a DataFrame. When you set an index, it means you are designating one or more columns to serve as the row labels, providing a way to uniquely identify and access rows in the DataFrame.

Here’s the primary parameter of the set_index method:
- keys: This parameter specifies the column or columns you want to use as the new index.
```
df.set_index(keys, inplace=False)
```
- df: Refers to the Pandas DataFrame.
- keys: The column or columns that you want to set as the new index. This can be a single column name or a list of column names.

# Set the 'DAY' column as the index for the DataFrame.
df.set_index('DAY', inplace=True)

# Display the DataFrame with 'DAY' as the new index.
display(df)

	Max Temp Definition °C	Min Temp Definition °C	Mean Temp Definition °C	Heat Deg Days Definition	Cool Deg Days Definition	Total Rain Definition mm	Total Snow Definition cm	Total Precip Definition mm	Dir of Max Gust Definition 10's deg	Spd of Max Gust Definition km/h
DAY
01	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	18	38
02	18.0	1.5	9.8	8.2	0.0	T	0.0	T	31	33
03	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	M	M
04	13.1	3.6	8.4	9.6	0.0	T	0.0	T	33	57
05	10.2	1.2	5.7	12.3	0.0	T	0.0	T	2	37
Sum	NaN	NaN	NaN	50.0	0.0	0.4	0.0	0.4	NaN	NaN
Avg	13.8	2.2	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Xtrm	18.0	1.2	NaN	NaN	NaN	0.4	0.0	0.4	33	57

DataFrame.values: To access the underlying data as a NumPy array, you can do the following:

# Access the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C']

DAY
01      12.5
02      18.0
03      15.1
04      13.1
05      10.2
Sum      NaN
Avg     13.8
Xtrm    18.0
Name: Max Temp Definition °C, dtype: object

# Retrieve the values from the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C'].values

array(['12.5', '18.0', '15.1', '13.1', '10.2', nan, '13.8', '18.0'],
      dtype=object)

Renaming Columns and Indices: To rename columns and indices, you can use the rename method. Here’s an example:

# Import the pprint function for pretty printing.
from pprint import pprint

# Create a dictionary for renaming columns by removing 'Definition' from their names.
column_rename_dict = dict(zip(df.columns, [x.replace('Definition', '') for x in df.columns]))

# Print the title for the column renaming dictionary.
print("Column Renaming Dictionary:")
pprint(column_rename_dict)

# Rename the columns in the DataFrame using the created dictionary.
df.rename(columns=column_rename_dict, inplace=True)

# Rename specific index labels for clarity.
df.rename(index={'01': '1st', '02': '2nd', '03': '3rd', '04': '4th', '05': '5th'}, inplace=True)

# Print the title for the DataFrame with renamed columns and index labels.
print("\nDataFrame with Renamed Columns and Index Labels:")
display(df)

Column Renaming Dictionary:
{'Cool Deg Days Definition': 'Cool Deg Days ',
 "Dir of Max Gust Definition 10's deg": "Dir of Max Gust  10's deg",
 'Heat Deg Days Definition': 'Heat Deg Days ',
 'Max Temp Definition °C': 'Max Temp  °C',
 'Mean Temp Definition °C': 'Mean Temp  °C',
 'Min Temp Definition °C': 'Min Temp  °C',
 'Spd of Max Gust Definition km/h': 'Spd of Max Gust  km/h',
 'Total Precip Definition mm': 'Total Precip  mm',
 'Total Rain Definition mm': 'Total Rain  mm',
 'Total Snow Definition cm': 'Total Snow  cm'}

DataFrame with Renamed Columns and Index Labels:

	Max Temp °C	Min Temp °C	Mean Temp °C	Heat Deg Days	Cool Deg Days	Total Rain mm	Total Snow cm	Total Precip mm	Dir of Max Gust 10's deg	Spd of Max Gust km/h
DAY
1st	12.5	3.3	7.9	10.1	0.0	0.4	0.0	0.4	18	38
2nd	18.0	1.5	9.8	8.2	0.0	T	0.0	T	31	33
3rd	15.1	1.2	8.2	9.8	0.0	0.0	0.0	0.0	M	M
4th	13.1	3.6	8.4	9.6	0.0	T	0.0	T	33	57
5th	10.2	1.2	5.7	12.3	0.0	T	0.0	T	2	37
Sum	NaN	NaN	NaN	50.0	0.0	0.4	0.0	0.4	NaN	NaN
Avg	13.8	2.2	8.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Xtrm	18.0	1.2	NaN	NaN	NaN	0.4	0.0	0.4	33	57

	A	B
90	90	910
91	91	909
92	92	908
93	93	907
94	94	906
95	95	905
96	96	904
97	97	903
98	98	902
99	99	901

	A	B
90	90	910
91	91	909
92	92	908
93	93	907
94	94	906
95	95	905
96	96	904
97	97	903
98	98	902
99	99	901