6.1. An Introduction to Pandas#

Pandas is a Python library, open-source in nature, that offers robust data structures and tools for data analysis. It is specifically designed to simplify and streamline data manipulation and analysis tasks, making them more efficient and straightforward. The name “Pandas” is derived from “Panel Data,” which refers to multi-dimensional structured datasets commonly used in econometrics and finance [Pandas Developers, 2023]. Pandas comes equipped with a diverse set of functions and methods for data handling, including reading and writing data from various file formats like CSV, Excel, and SQL databases. It facilitates data cleaning, transformation, aggregation, and addressing missing values. Furthermore, Pandas supports merging and joining datasets and offers functionality for time series operations [Pandas Developers, 2023]. Widely adopted in data science, machine learning, finance, economics, and numerous other fields, Pandas is favored for its ability to simplify and expedite data processing and analysis workflows. Its clear and intuitive syntax makes it accessible to both novices and experienced data scientists [Pandas Developers, 2023].

>>> pip install pandas

Here is a brief summary of key Python Pandas features:

6.1.1. Data Structures:#

  1. Series: The Series is akin to a one-dimensional labeled array, similar to a NumPy array but with an associated index. This index provides meaningful labels for each element in the Series, allowing for effortless data alignment and retrieval [Pandas Developers, 2023].

  2. DataFrame: The DataFrame is a two-dimensional labeled data structure, similar to a spreadsheet or SQL table. It consists of rows and columns, where each column can accommodate various data types. DataFrames provide a versatile and potent method for working with structured data, enabling operations such as filtering, joining, grouping, and more [Pandas Developers, 2023].

Example - Series: A Pandas Series object can be instantiated through the implementation of the pd.Series constructor.

import pandas as pd

# Create a Pandas Series with custom index
data = pd.Series([10, 20, 30, 40], index=['A', 'B', 'C', 'D'])

# Print the Pandas Series
print("Pandas Series:")
print(data)
Pandas Series:
A    10
B    20
C    30
D    40
dtype: int64

Example - DataFrame: A Pandas DataFrame object can be created by utilizing the pd.DataFrame constructor.

import pandas as pd

# Create a DataFrame from a dictionary
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['Calgary', 'Edmonton', 'Red Deer']
        }

df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame:")
display(df)
DataFrame:
Name Age City
0 Alice 25 Calgary
1 Bob 30 Edmonton
2 Charlie 22 Red Deer

6.1.2. Reading and Writing Data:#

Pandas is a powerful Python library that provides data manipulation and analysis tools. It’s widely used for tasks like reading and writing data in various formats. Here’s how you can use Pandas to read and write data [Pandas Developers, 2023]:

6.1.2.1. Reading Data#

Pandas can read data from various file formats like CSV, Excel, SQL databases, and more. The most commonly used method is pandas.read_csv() for reading CSV files.

import pandas as pd

Link = 'https://download.microsoft.com/download/4/C/8/4C830C0C-101F-4BF2-8FCB-32D9A8BA906A/Import_User_Sample_en.csv'
# Read a CSV file into a DataFrame
data = pd.read_csv(Link)

# Display the DataFrame
print("The DataFrame:")
display(data)
The DataFrame:
User Name First Name Last Name Display Name Job Title Department Office Number Office Phone Mobile Phone Fax Address City State or Province ZIP or Postal Code Country or Region
0 chris@contoso.com Chris Green Chris Green IT Manager Information Technology 123451 123-555-1211 123-555-6641 123-555-9821 1 Microsoft way Redmond Wa 98052 United States
1 ben@contoso.com Ben Andrews Ben Andrews IT Manager Information Technology 123452 123-555-1212 123-555-6642 123-555-9822 1 Microsoft way Redmond Wa 98052 United States
2 david@contoso.com David Longmuir David Longmuir IT Manager Information Technology 123453 123-555-1213 123-555-6643 123-555-9823 1 Microsoft way Redmond Wa 98052 United States
3 cynthia@contoso.com Cynthia Carey Cynthia Carey IT Manager Information Technology 123454 123-555-1214 123-555-6644 123-555-9824 1 Microsoft way Redmond Wa 98052 United States
4 melissa@contoso.com Melissa MacBeth Melissa MacBeth IT Manager Information Technology 123455 123-555-1215 123-555-6645 123-555-9825 1 Microsoft way Redmond Wa 98052 United States

You can also read Excel files using pandas.read_excel():

import pandas as pd

# Read an Excel file into a DataFrame
# Here the excel file is hosted on the web
data_excel = pd.read_excel('https://go.microsoft.com/fwlink/?LinkID=521962', sheet_name='Sheet1')

# Display the first five rows of the DataFrame
print("First five rows of the DataFrame:")
display(data_excel.head())

# The default behavior of Pandas' `head()` method is to display the first 5 rows of a DataFrame.
First five rows of the DataFrame:
Segment Country Product Discount Band Units Sold Manufacturing Price Sale Price Gross Sales Discounts Sales COGS Profit Date Month Number Month Name Year
0 Government Canada Carretera NaN 1618.5 3 20 32370.0 0.0 32370.0 16185.0 16185.0 2014-01-01 1 January 2014
1 Government Germany Carretera NaN 1321.0 3 20 26420.0 0.0 26420.0 13210.0 13210.0 2014-01-01 1 January 2014
2 Midmarket France Carretera NaN 2178.0 3 15 32670.0 0.0 32670.0 21780.0 10890.0 2014-06-01 6 June 2014
3 Midmarket Germany Carretera NaN 888.0 3 15 13320.0 0.0 13320.0 8880.0 4440.0 2014-06-01 6 June 2014
4 Midmarket Mexico Carretera NaN 2470.0 3 15 37050.0 0.0 37050.0 24700.0 12350.0 2014-06-01 6 June 2014

6.1.2.2. Writing and Exporting Data#

Pandas provides a versatile set of tools for exporting data to a variety of formats. One of the frequently employed techniques is using the DataFrame.to_csv() method, which facilitates the export of data to a CSV (Comma-Separated Values) file:

Example:

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['Calgary', 'Edmonton', 'Red Deer']
        }

df = pd.DataFrame(data)

# Write the DataFrame to a CSV file
csv_filename = 'data.csv'
df.to_csv(csv_filename, index=False)

# Print a message indicating that the data has been written
print(f"Data written to {csv_filename}")
Data written to data.csv

For writing to Excel files, you can use DataFrame.to_excel():

# Write DataFrame to an Excel file
df.to_excel('new_data.xlsx', index=False)

Pandas can also write data to various other formats like SQL databases, JSON, and more [Pandas Developers, 2023].

Additional Options:

When reading or writing data, Pandas offers various options to customize the behavior [Pandas Developers, 2023]:

  • You can specify delimiter, encoding, header presence, and more when reading CSV files using read_csv().

  • When writing CSV files, you can control the delimiter, whether to include headers, and more using the to_csv() method’s parameters.

  • For Excel files, you can specify the sheet name using the sheet_name parameter with to_excel() and read_excel().

Remember that Pandas provides extensive documentation with examples for these methods, so you can refer to the official documentation for more details: Pandas IO documentation

Keep in mind that Pandas is just one tool in the data manipulation and analysis ecosystem. Depending on your needs, other libraries like NumPy, Matplotlib/Seaborn for visualization, and scikit-learn for machine learning might be useful as well.

6.1.2.3. Pandas Basics#

Some basic Pandas commands along with their descriptions:

  1. pd.DataFrame(data): Create a DataFrame from data like a dictionary, array, or list.

  2. data.info(): Display basic information about the DataFrame, including data types and non-null counts.

  3. data.head(n): Display the first n rows of the DataFrame (default is 5).

  4. data.tail(n): Display the last n rows of the DataFrame (default is 5).

  5. data.describe(): Display summary statistics of numerical columns (count, mean, std, min, max, quartiles).

  6. data.shape: Returns the number of rows and columns in the DataFrame as a tuple.

  7. data.columns: Access the column labels of the DataFrame.

Command

Description

pd.DataFrame(data)

Create a DataFrame from data like a dictionary, array, or list.

data.info()

Display basic information about the DataFrame, including data types and non-null counts.

data.head(n)

Display the first n rows of the DataFrame (default is 5).

data.tail(n)

Display the last n rows of the DataFrame (default is 5).

data.describe()

Display summary statistics of numerical columns (count, mean, std, min, max, quartiles).

data.shape

Returns the number of rows and columns in the DataFrame as a tuple.

data.columns

Access the column labels of the DataFrame.

Example:

import pandas as pd
import numpy as np
 
# Create the DataFrame
data = pd.DataFrame({'A': np.arange(0, 100),
                     'B': np.arange(1000, 900, -1)})

# Display basic DataFrame information
print("Displaying DataFrame Information:")
display(data)

# Get DataFrame information using data.info()
print("\nDataFrame Info:")
data.info()

# Display the first 10 rows
n = 10
print(f"\nDisplaying First {n} Rows:")
display(data.head(n))

# Display the last 10 rows
print(f"\nDisplaying Last {n} Rows:")
display(data.tail(n))

# Display summary statistics
print("\nSummary Statistics:")
display(data.describe())

# Get the shape of the DataFrame
print("\nDataFrame Shape:")
print("Number of rows and columns:", data.shape)

# Get column labels
print("\nColumn Labels:")
print("Column labels:", data.columns)
Displaying DataFrame Information:
A B
0 0 1000
1 1 999
2 2 998
3 3 997
4 4 996
... ... ...
95 95 905
96 96 904
97 97 903
98 98 902
99 99 901

100 rows × 2 columns

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       100 non-null    int32
 1   B       100 non-null    int32
dtypes: int32(2)
memory usage: 932.0 bytes

Displaying First 10 Rows:
A B
0 0 1000
1 1 999
2 2 998
3 3 997
4 4 996
5 5 995
6 6 994
7 7 993
8 8 992
9 9 991
Displaying Last 10 Rows:
A B
90 90 910
91 91 909
92 92 908
93 93 907
94 94 906
95 95 905
96 96 904
97 97 903
98 98 902
99 99 901
Summary Statistics:
A B
count 100.000000 100.000000
mean 49.500000 950.500000
std 29.011492 29.011492
min 0.000000 901.000000
25% 24.750000 925.750000
50% 49.500000 950.500000
75% 74.250000 975.250000
max 99.000000 1000.000000
DataFrame Shape:
Number of rows and columns: (100, 2)

Column Labels:
Column labels: Index(['A', 'B'], dtype='object')

Note

By default, the head() and tail() functions display the first or last 5 rows of the DataFrame. However, you can specify a different number of rows to display by passing an argument to these functions. You can change the number of rows displayed by passing an integer argument to the head() and tail() functions. For example, data.head(10) would display the first 10 rows of the DataFrame.

These are just a few basic commands in Pandas. The library offers a wide range of functions for data manipulation, exploration, and analysis. You can refer to the official Pandas documentation for more details and examples: Pandas Documentation.

6.1.3. Read html with Pandas#

pandas.read_html is a handy function that comes with the Pandas library in Python. It makes it easy to pull tables out of web pages in HTML format and turn them into Pandas DataFrames. This is super useful for people who work with data because it lets them grab structured information from websites without a lot of hassle. If you want to dig into all the details and options, you can check out the full syntax here.

Examples:

# Import the Pandas library and alias it as 'pd'
import pandas as pd

# Use pd.read_html to extract tables from the specified URL
# [0] selects the first table on the webpage since read_html returns a list of DataFrames
Player_Stats = pd.read_html('https://www.premierleague.com/stats/top/players/goals?se=274')[0]

# Display the DataFrame containing player statistics
display(Player_Stats)
Rank Player Club Nationality Stat Unnamed: 5
0 1.0 Erling Haaland Manchester City Norway 8 NaN
1 2.0 Son Heung-Min Tottenham Hotspur South Korea 6 NaN
2 3.0 Jarrod Bowen West Ham United England 5 NaN
3 4.0 Odsonne Édouard Crystal Palace France 4 NaN
4 4.0 Evan Ferguson Brighton & Hove Albion Ireland 4 NaN
5 4.0 Hwang Hee-Chan Wolverhampton Wanderers South Korea 4 NaN
6 4.0 Alexander Isak Newcastle United Sweden 4 NaN
7 4.0 Bryan Mbeumo Brentford Cameroon 4 NaN
8 4.0 Bukayo Saka Arsenal England 4 NaN
9 4.0 Ollie Watkins Aston Villa England 4 NaN
# Import the Pandas library and alias it as 'pd'
import pandas as pd

# Use pd.read_html to extract tables from the specified URL
# [0] selects the first table on the webpage since read_html returns a list of DataFrames
Grades_UofC = pd.read_html('https://conted.ucalgary.ca/info/grades.jsp')[0]

# Display the DataFrame containing grades information from the University of Calgary
display(Grades_UofC)
0 1 2
0 A+ 95 – 100% Outstanding
1 A 90 – 94% Excellent Superior performance, showing compre...
2 A- 85 – 89% Approaching Excellent
3 B+ 80 – 84% Exceeding Good
4 B 75 – 79% Good Clearly above average performance with kn...
5 B- 70 – 74% Approaching Good
6 C+ 67 – 69% Exceeding Satisfactory
7 C 64 – 66% Satisfactory (minimal pass) Basic understandin...
8 C- 60 – 63% Approaching Satisfactory Receipt of a C- or le...
9 D+ 55 – 59% Marginal Performance
10 D 50 – 54% Minimal Performance
11 F 0 – 49% Fail
12 AU NaN Course Audit No course credit. Permission to A...
13 CR NaN Completed Requirements
14 NC NaN Not Complete The NC grade is assigned to stude...
15 AT NaN Attended
16 NS NaN Not Subject to Grading

Example - Daily Weather Data Report for October 2023 (Calgary Int’l Airport, Alberta):

import pandas as pd

# Read the HTML data from the specified URL and store it as a list of DataFrames.
df_list = pd.read_html('https://climate.weather.gc.ca/climate_data/daily_data_e.html?StationID=50430')

# Select the first DataFrame from the list, which contains the climate data.
df = df_list[0]

# Display the DataFrame containing the climate data for analysis.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Snow on Grnd Definition cm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 NaN 18 38
1 02 18.0 1.5 9.8 8.2 0.0 LegendTT 0.0 LegendTT NaN 31 33
2 03 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 NaN LegendMM LegendMM
3 04 13.1 3.6 8.4 9.6 0.0 LegendTT 0.0 LegendTT NaN 33 57
4 05 10.2 1.2 5.7 12.3 0.0 LegendTT 0.0 LegendTT NaN 2 37
5 Sum NaN NaN NaN 50.0LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ NaN NaN NaN
6 Avg 13.8LegendCarer^ 2.2LegendCarer^ 8.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN NaN
7 Xtrm 18.0LegendCarer^ 1.2LegendCarer^ NaN NaN NaN 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ NaN 33LegendCarer^ 57LegendCarer^
8 Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ...
  1. Dropping Columns:

    Dropping columns in Pandas is accomplished using the drop method, specifying the columns to be removed. The primary parameters for this operation are as follows:

    - `labels`: A list or single label representing the column(s) to be dropped.
    - `axis`: Specifies the axis along which the drop operation will occur. Use `axis=1` for columns.
    - `inplace`: An optional parameter, when set to `True`, modifies the original DataFrame; otherwise, it returns a new DataFrame with the specified columns removed.
    
    Here is a code example to drop a column named 'ColumnName':
    
    ```python
    df.drop(columns='ColumnName', inplace=True)
    ```
    

    Please see this link for more details.

# Remove the column 'Snow on Grnd Definition cm' from the DataFrame.
df.drop(columns='Snow on Grnd Definition cm', inplace=True)

# Display the DataFrame after removing the specified column.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 18 38
1 02 18.0 1.5 9.8 8.2 0.0 LegendTT 0.0 LegendTT 31 33
2 03 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 LegendMM LegendMM
3 04 13.1 3.6 8.4 9.6 0.0 LegendTT 0.0 LegendTT 33 57
4 05 10.2 1.2 5.7 12.3 0.0 LegendTT 0.0 LegendTT 2 37
5 Sum NaN NaN NaN 50.0LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ NaN NaN
6 Avg 13.8LegendCarer^ 2.2LegendCarer^ 8.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN
7 Xtrm 18.0LegendCarer^ 1.2LegendCarer^ NaN NaN NaN 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ 33LegendCarer^ 57LegendCarer^
8 Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ... Summary, average and extreme values are based ...
  1. Dropping Rows:

    Similarly, dropping rows in Pandas is achieved using the drop method, with the primary parameters as follows:

    • labels: A list or single label representing the row(s) to be dropped.

    • axis: Use axis=0 (which is the default) to drop rows. You don’t need to specify axis explicitly for dropping rows.

    • inplace: As before, setting this to True will modify the original DataFrame.

    To drop rows by index, you can use the following code as an example:

    df.drop(index=[0, 1, 2], inplace=True)
    

    This code will remove the rows with index 0, 1, and 2 from the DataFrame ‘df’.

    Please see this link for more details.

# Remove the row with index 7 from the DataFrame.
df.drop(df.index[-1], inplace=True)

# Display the DataFrame after removing the specified row.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 18 38
1 02 18.0 1.5 9.8 8.2 0.0 LegendTT 0.0 LegendTT 31 33
2 03 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 LegendMM LegendMM
3 04 13.1 3.6 8.4 9.6 0.0 LegendTT 0.0 LegendTT 33 57
4 05 10.2 1.2 5.7 12.3 0.0 LegendTT 0.0 LegendTT 2 37
5 Sum NaN NaN NaN 50.0LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ NaN NaN
6 Avg 13.8LegendCarer^ 2.2LegendCarer^ 8.0LegendCarer^ NaN NaN NaN NaN NaN NaN NaN
7 Xtrm 18.0LegendCarer^ 1.2LegendCarer^ NaN NaN NaN 0.4LegendCarer^ 0.0LegendCarer^ 0.4LegendCarer^ 33LegendCarer^ 57LegendCarer^

In this code:

  • df is your DataFrame.

  • df.index[-1] retrieves the index label of the last row in the DataFrame.

  • drop is used to remove the row with the specified index label.

  • inplace=True is used to modify the DataFrame in place.

  1. Column Names: To access the column names in a Pandas DataFrame, you can use the .columns attribute. Here’s how to do it:

    # Assuming df is your DataFrame
    column_names = df.columns
    

    The df.columns attribute returns an Index object containing the names of all the columns in the DataFrame. You can then convert this Index object to a list if needed:

    column_names_list = list(column_names)
    

    Please see this link for more details.

# Retrieve and display the column names of the DataFrame.
df.columns
Index(['DAY', 'Max Temp Definition °C', 'Min Temp Definition °C',
       'Mean Temp Definition °C', 'Heat Deg Days Definition',
       'Cool Deg Days Definition', 'Total Rain Definition mm',
       'Total Snow Definition cm', 'Total Precip Definition mm',
       'Dir of Max Gust Definition 10's deg',
       'Spd of Max Gust Definition km/h'],
      dtype='object')
  1. Replace each occurrence of pattern/regex (Optional Content):

    The .str.replace method in Pandas is used to perform string replacement within each element of a Pandas Series or DataFrame column containing string values. It allows for the search and substitution of specific substrings or patterns in strings.

    Here are the primary parameters of the .str.replace method:

    • pat: The pattern or substring to search for.

    • repl: The replacement string or value.

    • regex: An optional boolean parameter (default is False) that specifies whether the pattern provided is a regular expression.

    • n: An optional parameter to control the maximum number of replacements. If not specified, it replaces all occurrences by default.

    Please see this link for more details.

# Iterate through each column in the DataFrame.
for Col in df.columns:
    # Replace 'LegendCarer^' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendCarer^', '')
    
    # Replace 'LegendT' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendT', '')
    
    # Replace 'LegendM' with an empty string in the current column.
    df[Col] = df[Col].str.replace('LegendM', '')

# Display the DataFrame after the specified replacements.
display(df)
DAY Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
0 01 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 18 38
1 02 18.0 1.5 9.8 8.2 0.0 T 0.0 T 31 33
2 03 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 M M
3 04 13.1 3.6 8.4 9.6 0.0 T 0.0 T 33 57
4 05 10.2 1.2 5.7 12.3 0.0 T 0.0 T 2 37
5 Sum NaN NaN NaN 50.0 0.0 0.4 0.0 0.4 NaN NaN
6 Avg 13.8 2.2 8.0 NaN NaN NaN NaN NaN NaN NaN
7 Xtrm 18.0 1.2 NaN NaN NaN 0.4 0.0 0.4 33 57
  1. Pandas set_index Method:

    The set_index method in Pandas is used to set one or more columns as the index of a DataFrame. When you set an index, it means you are designating one or more columns to serve as the row labels, providing a way to uniquely identify and access rows in the DataFrame.

    Here’s the primary parameter of the set_index method:

    • keys: This parameter specifies the column or columns you want to use as the new index.

    df.set_index(keys, inplace=False)
    
    • df: Refers to the Pandas DataFrame.

    • keys: The column or columns that you want to set as the new index. This can be a single column name or a list of column names.

# Set the 'DAY' column as the index for the DataFrame.
df.set_index('DAY', inplace=True)

# Display the DataFrame with 'DAY' as the new index.
display(df)
Max Temp Definition °C Min Temp Definition °C Mean Temp Definition °C Heat Deg Days Definition Cool Deg Days Definition Total Rain Definition mm Total Snow Definition cm Total Precip Definition mm Dir of Max Gust Definition 10's deg Spd of Max Gust Definition km/h
DAY
01 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 18 38
02 18.0 1.5 9.8 8.2 0.0 T 0.0 T 31 33
03 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 M M
04 13.1 3.6 8.4 9.6 0.0 T 0.0 T 33 57
05 10.2 1.2 5.7 12.3 0.0 T 0.0 T 2 37
Sum NaN NaN NaN 50.0 0.0 0.4 0.0 0.4 NaN NaN
Avg 13.8 2.2 8.0 NaN NaN NaN NaN NaN NaN NaN
Xtrm 18.0 1.2 NaN NaN NaN 0.4 0.0 0.4 33 57
  1. DataFrame.values: To access the underlying data as a NumPy array, you can do the following:

# Access the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C']
DAY
01      12.5
02      18.0
03      15.1
04      13.1
05      10.2
Sum      NaN
Avg     13.8
Xtrm    18.0
Name: Max Temp Definition °C, dtype: object
# Retrieve the values from the 'Max Temp Definition °C' column in the DataFrame.
df['Max Temp Definition °C'].values
array(['12.5', '18.0', '15.1', '13.1', '10.2', nan, '13.8', '18.0'],
      dtype=object)
  1. Renaming Columns and Indices: To rename columns and indices, you can use the rename method. Here’s an example:

# Import the pprint function for pretty printing.
from pprint import pprint

# Create a dictionary for renaming columns by removing 'Definition' from their names.
column_rename_dict = dict(zip(df.columns, [x.replace('Definition', '') for x in df.columns]))

# Print the title for the column renaming dictionary.
print("Column Renaming Dictionary:")
pprint(column_rename_dict)

# Rename the columns in the DataFrame using the created dictionary.
df.rename(columns=column_rename_dict, inplace=True)

# Rename specific index labels for clarity.
df.rename(index={'01': '1st', '02': '2nd', '03': '3rd', '04': '4th', '05': '5th'}, inplace=True)

# Print the title for the DataFrame with renamed columns and index labels.
print("\nDataFrame with Renamed Columns and Index Labels:")
display(df)
Column Renaming Dictionary:
{'Cool Deg Days Definition': 'Cool Deg Days ',
 "Dir of Max Gust Definition 10's deg": "Dir of Max Gust  10's deg",
 'Heat Deg Days Definition': 'Heat Deg Days ',
 'Max Temp Definition °C': 'Max Temp  °C',
 'Mean Temp Definition °C': 'Mean Temp  °C',
 'Min Temp Definition °C': 'Min Temp  °C',
 'Spd of Max Gust Definition km/h': 'Spd of Max Gust  km/h',
 'Total Precip Definition mm': 'Total Precip  mm',
 'Total Rain Definition mm': 'Total Rain  mm',
 'Total Snow Definition cm': 'Total Snow  cm'}

DataFrame with Renamed Columns and Index Labels:
Max Temp °C Min Temp °C Mean Temp °C Heat Deg Days Cool Deg Days Total Rain mm Total Snow cm Total Precip mm Dir of Max Gust 10's deg Spd of Max Gust km/h
DAY
1st 12.5 3.3 7.9 10.1 0.0 0.4 0.0 0.4 18 38
2nd 18.0 1.5 9.8 8.2 0.0 T 0.0 T 31 33
3rd 15.1 1.2 8.2 9.8 0.0 0.0 0.0 0.0 M M
4th 13.1 3.6 8.4 9.6 0.0 T 0.0 T 33 57
5th 10.2 1.2 5.7 12.3 0.0 T 0.0 T 2 37
Sum NaN NaN NaN 50.0 0.0 0.4 0.0 0.4 NaN NaN
Avg 13.8 2.2 8.0 NaN NaN NaN NaN NaN NaN NaN
Xtrm 18.0 1.2 NaN NaN NaN 0.4 0.0 0.4 33 57