A key part of a data analyst's job is querying data from files, databases and other sources in order to manipulate or visualise it, and it is often much more convenient to do this directly in code than to inspect the database tables over and over again. By Akash Mishra.
Pandas is a Python library that stores data, such as query results, in tabular objects called DataFrames. It lets us manipulate and visualise that data and write the results back to databases or files.
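A minimal sketch of that round trip, assuming an in-memory SQLite database (via Python's built-in `sqlite3` module) and a hypothetical `student_details` table with made-up rows: `pd.read_sql_query` loads a query result into a DataFrame, and `DataFrame.to_sql` writes a DataFrame back to a table.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (name TEXT, roll_no INTEGER, phone TEXT)")
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?, ?)",
    [("Asha", 1, "555-0101"), ("Ravi", 2, "555-0102")],
)
conn.commit()

# Read: run a SQL query and store the result in a DataFrame
students = pd.read_sql_query("SELECT name, roll_no, phone FROM student_details", conn)

# Manipulate the data in pandas
students["name"] = students["name"].str.upper()

# Write: store the modified DataFrame as a new table in the same database
students.to_sql("student_details_upper", conn, index=False, if_exists="replace")
```

Writing to a file works the same way with methods such as `DataFrame.to_csv`.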
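In SQL:

```sql
SELECT name, roll_no, phone FROM student_details;
```

In pandas:

```python
import pandas as pd

# Load the CSV file into a DataFrame
student_details = pd.read_csv('students.csv')

# Select three columns; a list inside [] returns a DataFrame
result = student_details[['name', 'roll_no', 'phone']]
result
```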
This blog post describes how to perform 10 basic SQL operations with the pandas library, including:
- Selecting the data
- Using aggregate functions
- Order By clause
- Group By clause
- IN and NOT IN
- Joins
- Creating a new column from existing ones
- Selecting data conditionally
...and more. The post also provides two datasets to practise with. It is only an introduction to running basic SQL operations in pandas; many more SQL operations can be done just as easily with pandas. Nice one!
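For aggregate functions, here is a sketch using a small hypothetical `marks` DataFrame (not one of the post's datasets): SQL aggregates such as `COUNT`, `AVG` and `MAX` map to Series methods, or to `Series.agg` when several are needed at once.

```python
import pandas as pd

# Hypothetical sample data
marks = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "subject": ["Maths", "Maths", "Science", "Science"],
    "score": [82, 67, 91, 74],
})

# SELECT COUNT(*), AVG(score), MAX(score) FROM marks;
row_count = len(marks)
average_score = marks["score"].mean()
max_score = marks["score"].max()

# Several aggregates at once, returned as a Series indexed by function name
summary = marks["score"].agg(["count", "mean", "max"])
```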
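The `ORDER BY` clause corresponds to `DataFrame.sort_values`. A minimal sketch, again on hypothetical sample data, including a multi-column sort with a different direction per column:

```python
import pandas as pd

# Hypothetical sample data
marks = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "subject": ["Maths", "Maths", "Science", "Science"],
    "score": [82, 67, 91, 74],
})

# SELECT * FROM marks ORDER BY score DESC;
by_score = marks.sort_values("score", ascending=False)

# SELECT * FROM marks ORDER BY subject ASC, score DESC;
by_subject_then_score = marks.sort_values(
    ["subject", "score"], ascending=[True, False]
)
```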
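The `GROUP BY` clause maps to `DataFrame.groupby` followed by an aggregation. This sketch assumes hypothetical sample data and uses pandas' named aggregation so the output columns get SQL-style aliases; filtering the grouped result afterwards plays the role of `HAVING`.

```python
import pandas as pd

# Hypothetical sample data
marks = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "subject": ["Maths", "Maths", "Science", "Science"],
    "score": [82, 67, 91, 74],
})

# SELECT subject, COUNT(name) AS students, AVG(score) AS avg_score
# FROM marks GROUP BY subject;
per_subject = marks.groupby("subject", as_index=False).agg(
    students=("name", "count"),
    avg_score=("score", "mean"),
)

# ... HAVING AVG(score) > 80
high = per_subject[per_subject["avg_score"] > 80]
```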
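`IN` and `NOT IN` are usually expressed with `Series.isin`, negated with `~` for `NOT IN`. A short sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
marks = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "score": [82, 67, 91, 74],
})

wanted = ["Asha", "John"]

# SELECT * FROM marks WHERE name IN ('Asha', 'John');
in_list = marks[marks["name"].isin(wanted)]

# SELECT * FROM marks WHERE name NOT IN ('Asha', 'John');
not_in_list = marks[~marks["name"].isin(wanted)]
```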
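Joins correspond to `DataFrame.merge`, where the `how` parameter selects the join type. The two tables below are hypothetical; `roll_no` 2 has no phone and `roll_no` 4 has no student, so the join types give different results.

```python
import pandas as pd

# Hypothetical tables sharing a roll_no key
students = pd.DataFrame({"roll_no": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
phones = pd.DataFrame({"roll_no": [1, 3, 4], "phone": ["555-0101", "555-0103", "555-0104"]})

# SELECT * FROM students INNER JOIN phones ON students.roll_no = phones.roll_no;
inner = students.merge(phones, on="roll_no", how="inner")

# LEFT JOIN: every student is kept; a missing phone becomes NaN
left = students.merge(phones, on="roll_no", how="left")

# FULL OUTER JOIN: rows from both tables are kept
outer = students.merge(phones, on="roll_no", how="outer")
```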
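Creating a new column from existing ones is a vectorised expression on whole columns. This sketch assumes hypothetical per-subject scores out of 100 each (so a maximum total of 200) and shows both direct assignment and `DataFrame.assign`, which returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

# Hypothetical scores, each out of 100
scores = pd.DataFrame({
    "name": ["Asha", "Ravi"],
    "maths": [82, 67],
    "science": [91, 74],
})

# SELECT name, maths, science, maths + science AS total FROM scores;
scores["total"] = scores["maths"] + scores["science"]

# assign() leaves `scores` unchanged and returns a copy with the extra column
with_pct = scores.assign(percentage=lambda d: d["total"] * 100 / 200)
```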
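Selecting data conditionally (SQL's `WHERE`) is done with boolean masks, combined with `&` for `AND` and `|` for `OR`; each condition needs its own parentheses. `DataFrame.query` accepts the same filter as a string. A sketch on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data
marks = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "John"],
    "subject": ["Maths", "Maths", "Science", "Science"],
    "score": [82, 67, 91, 74],
})

# SELECT name, score FROM marks WHERE subject = 'Maths' AND score > 70;
maths_high = marks.loc[
    (marks["subject"] == "Maths") & (marks["score"] > 70), ["name", "score"]
]

# The same filter written with DataFrame.query
maths_high_q = marks.query("subject == 'Maths' and score > 70")[["name", "score"]]

# SELECT * FROM marks WHERE subject = 'Science' OR score < 70;
either = marks[(marks["subject"] == "Science") | (marks["score"] < 70)]
```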