Pandas Basics

October 5, 2022
Data Science

Pandas DataFrames

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is the DataFrame. DataFrames let you store and manipulate tabular data in rows of observations and columns of variables.

There are several ways to create a DataFrame. One way is to use a dictionary. For example:

import pandas as pd

# Dictionary keys become column names; each list holds one column's values
data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}
brics = pd.DataFrame(data)
# Recent pandas versions keep the dictionary's key order for the columns;
# the output below shows the alphabetical ordering used by older versions
print(brics)
Output:
     area    capital       country  population
0   8.516   Brasilia        Brazil      200.40
1  17.100     Moscow        Russia      143.50
2   3.286  New Delhi         India     1252.00
3   9.597    Beijing         China     1357.00
4   1.221   Pretoria  South Africa       52.98
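
As one more illustration of the several ways to create a DataFrame, it can also be built from a list of row dictionaries, with row labels supplied through the index argument. The following is a minimal sketch: the two-letter labels and the name brics_labeled are assumptions chosen for illustration, and only a subset of the data is used.

```python
import pandas as pd

# Each dictionary is one row (observation); keys become column names.
rows = [
    {"country": "Brazil", "capital": "Brasilia", "area": 8.516},
    {"country": "Russia", "capital": "Moscow", "area": 17.10},
    {"country": "India", "capital": "New Delhi", "area": 3.286},
]

# The index argument replaces the default 0, 1, 2 row labels.
# These two-letter labels are hypothetical, chosen for illustration.
brics_labeled = pd.DataFrame(rows, index=["BR", "RU", "IN"])

print(brics_labeled)
print(brics_labeled.shape)  # (number of rows, number of columns)
```

Which form is more convenient usually depends on how the data arrives: column by column suits a dictionary of lists, row by row suits a list of dictionaries.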

Another way to create a DataFrame is to import a CSV file with Pandas. Assuming a file named cars.csv is stored in the working directory, it can be imported using pd.read_csv:

# Import pandas as pd
import pandas as pd
# Import the cars.csv data: cars
cars = pd.read_csv('cars.csv')
# Print out cars
print(cars)
<script.py> output:
     Unnamed: 0  cars_per_cap        country drives_right
   0         US           809  United States         True
   1        AUS           731      Australia        False
   2        JAP           588          Japan        False
   3         IN            18          India        False
   4         RU           200         Russia         True
   5        MOR            70        Morocco         True
   6         EG            45          Egypt         True
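
The Unnamed: 0 header in the output above appears because the first column of the file has no name in its header row, so Pandas keeps it as an ordinary column and adds a default integer index. Passing index_col=0 to pd.read_csv uses that column as the row labels instead. The sketch below reproduces this with an in-memory stand-in for the file, since cars.csv itself is not shown. The CSV text is an assumption reconstructed from the first three rows of the printed output.

```python
import io

import pandas as pd

# Hypothetical stand-in for cars.csv (first three rows only),
# reconstructed from the printed output; the header's first field is empty.
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JAP,588,Japan,False
"""

# Default: the unnamed first column is kept as data, labeled "Unnamed: 0".
cars_default = pd.read_csv(io.StringIO(csv_text))
print(cars_default.columns.tolist())

# index_col=0: the first column becomes the row labels.
cars_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars_indexed.index.tolist())
```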

Indexing DataFrames

There are several ways to index a Pandas DataFrame. One of the easiest ways to do this is by using square bracket notation.

In the example below, square brackets select columns of the cars DataFrame. You can use either single or double brackets: single brackets return a Pandas Series, while double brackets return a Pandas DataFrame.

# Import pandas and cars.csv, using the first column as row labels
import pandas as pd
cars = pd.read_csv('cars.csv', index_col=0)
# Print out cars_per_cap column as Pandas Series
print(cars['cars_per_cap'])
# Print out cars_per_cap column as Pandas DataFrame
print(cars[['cars_per_cap']])
# Print out DataFrame with cars_per_cap and country columns
print(cars[['cars_per_cap', 'country']])
<script.py> output:
   US     809
   AUS    731
   JAP    588
   IN      18
   RU     200
   MOR     70
   EG      45
   Name: cars_per_cap, dtype: int64
        cars_per_cap
   US            809
   AUS           731
   JAP           588
   IN             18
   RU            200
   MOR            70
   EG             45
        cars_per_cap        country
   US            809  United States
   AUS           731      Australia
   JAP           588          Japan
   IN             18          India
   RU            200         Russia
   MOR            70        Morocco
   EG             45          Egypt
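
To confirm the difference between the two bracket styles, the returned types can be checked directly. This is a minimal sketch that builds a small DataFrame inline so it runs without cars.csv. The inline data is an assumption that copies the first three rows shown above.

```python
import pandas as pd

# Small inline stand-in for the cars data (assumption: first three rows only).
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588],
        "country": ["United States", "Australia", "Japan"],
    },
    index=["US", "AUS", "JAP"],
)

as_series = cars["cars_per_cap"]    # single brackets: one-dimensional Series
as_frame = cars[["cars_per_cap"]]   # double brackets: one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape, as_frame.shape)  # (3,) (3, 1)
```

The inner list in the double-bracket form is what allows several columns to be selected at once, which is why the result stays a DataFrame even when the list holds a single name.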

VIkas Donta

My name is VIkas Donta and I first discovered web design in 2018. Since then, it has had an impact on my web design and development career and has improved my understanding of HTML/CSS tremendously!
