In [1]:
import pandas as pd 
import sqlite3

Dataset

In addition, here, we use FIFA 20 Complete Player Datasetet and European Soccer Database from Kaggle.com.

The datasets that we use from the above likes are:

  • Country.csv
  • League.csv
  • players_20.csv
  • Team.csv
  • teams_and_leagues.csv

Acknowledgements

The datasets are also publicly available website sofifa.com.

SQL Aggregate Functions

In this article, we provide a summary of basic SQL Server commands and syntaxes. For sake of demonstrations SQL Server syntaxes, we use SQLite3. Moreover, the following function can help us to mimic SQL Server Environment in this Jupyter notebook.

In [2]:
connection = sqlite3.connect('FIFA.db')
cursor = connection.cursor()

def mySQL(Syntax): return pd.read_sql_query(Syntax, connection)

Here we provide a summary of popular SQL Server command with examples. For full notes, please see the official SQL Server documents at docs.microsoft.com.

Ave

AVG here stands for Average value. For example, to get the average Age of each club, we can try:

In [3]:
mySQL("""
Select
    Avg(A.age) as Ave_Age,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[3]:
Ave_Age team_long_name league_name
0 23.592593 Club Brugge KV Belgian Jupiler Pro League
1 25.071429 KAA Gent Belgian Jupiler Pro League
2 24.555556 KAS Eupen Belgian Jupiler Pro League
3 22.571429 KRC Genk Belgian Jupiler Pro League
4 23.037037 KSV Cercle Brugge Belgian Jupiler Pro League
... ... ... ...
194 23.074074 FC St. Gallen Swiss Super League
195 24.148148 FC Thun Swiss Super League
196 24.115385 FC Zürich Swiss Super League
197 25.192308 Neuchâtel Xamax Swiss Super League
198 25.461538 Servette FC Swiss Super League

199 rows × 3 columns

Count

We demonstrate this using an example, let's count the number of players in each Club.

In [4]:
mySQL("""
Select
    Count(short_name) as Number_of_Players,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[4]:
Number_of_Players team_long_name league_name
0 27 Club Brugge KV Belgian Jupiler Pro League
1 28 KAA Gent Belgian Jupiler Pro League
2 27 KAS Eupen Belgian Jupiler Pro League
3 28 KRC Genk Belgian Jupiler Pro League
4 27 KSV Cercle Brugge Belgian Jupiler Pro League
... ... ... ...
194 27 FC St. Gallen Swiss Super League
195 27 FC Thun Swiss Super League
196 26 FC Zürich Swiss Super League
197 26 Neuchâtel Xamax Swiss Super League
198 26 Servette FC Swiss Super League

199 rows × 3 columns

Sum

This returns a summation of values.

Min

This returns the minimum value.

In [5]:
mySQL("""
Select
    Min(age) as Min_Age,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[5]:
Min_Age team_long_name league_name
0 18 Club Brugge KV Belgian Jupiler Pro League
1 19 KAA Gent Belgian Jupiler Pro League
2 18 KAS Eupen Belgian Jupiler Pro League
3 17 KRC Genk Belgian Jupiler Pro League
4 18 KSV Cercle Brugge Belgian Jupiler Pro League
... ... ... ...
194 17 FC St. Gallen Swiss Super League
195 17 FC Thun Swiss Super League
196 17 FC Zürich Swiss Super League
197 18 Neuchâtel Xamax Swiss Super League
198 18 Servette FC Swiss Super League

199 rows × 3 columns

Max

This returns the maximum value.

In [6]:
mySQL("""
Select
    Max(age) as Max_Age,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[6]:
Max_Age team_long_name league_name
0 31 Club Brugge KV Belgian Jupiler Pro League
1 32 KAA Gent Belgian Jupiler Pro League
2 33 KAS Eupen Belgian Jupiler Pro League
3 34 KRC Genk Belgian Jupiler Pro League
4 32 KSV Cercle Brugge Belgian Jupiler Pro League
... ... ... ...
194 32 FC St. Gallen Swiss Super League
195 32 FC Thun Swiss Super League
196 39 FC Zürich Swiss Super League
197 35 Neuchâtel Xamax Swiss Super League
198 32 Servette FC Swiss Super League

199 rows × 3 columns

Cast

This is useful for converting values. Consider the following example.

In [7]:
mySQL("""
Select
    Sum(age)/Count(age) as Average_Age,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
where 
     B.team_long_name = 'Manchester United'
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[7]:
Average_Age team_long_name league_name
0 24 Manchester United English Premier League

Note that we have calculated a summation of some values and divided by some number. Since Age is an int value, the result after dividing the summation is also an int value. However, if we use cast and covert the summation first to float and then the result will be also a float value. Thus,

In [8]:
mySQL("""
Select
    cast(Sum(age) as float)/Count(age) as Average_Age,
    B.team_long_name,
    C.league_name
from players_20 A
inner join Team B
    on A.club = B.team_long_name
inner join teams_and_leagues C
    on B.team_fifa_api_id = C.url
where 
     B.team_long_name = 'Manchester United'
Group by
    B.team_long_name,
    C.league_name
Order by
    C.league_name,
    B.team_long_name
""")
Out[8]:
Average_Age team_long_name league_name
0 24.818182 Manchester United English Premier League