Data analysis and processing using the Pandas library in Python

Publication date: 18.09.2023

How can you easily import, process and analyse data in Python? This is where one of Python's free libraries comes in: Pandas. Pandas is one of the most important Python libraries, and it expands the possibilities of data analysis and processing. After reading this article, you will see how easy it is to work with Pandas. This skill is valued in a wide range of fields, from finance to more specialised projects.

When is it worth using the Pandas library?

Here are some examples of what the Pandas library can be used for:

- Importing different types of files (Excel, CSV, SQL etc.)

- Sorting and filtering data

- Data cleaning

- Data aggregations

- Data visualisations

How to install Pandas?

Installing Pandas is straightforward. After installing Python, open a terminal and type the following command:

pip install pandas

How to import Pandas into our project?

After installing Pandas, we can import it into our project.

You just need the following line, which uses the conventional alias pd:

import pandas as pd

How to import the data from files?

With Pandas you can import many different types of files:

- Excel

- CSV

- SQL

- TXT, etc.

Data from a CSV file can be imported using pd.read_csv().
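As a minimal sketch of the CSV import, the example below reads a small table. The column names and values are hypothetical, and the CSV content is held in a string so the snippet runs without a file on disk; in practice you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; with a real file you would write
# pd.read_csv("clients.csv") (file name is an assumption)
csv_text = """client,orders,city
Anna,25,Warsaw
Ben,12,Krakow
Celia,31,Gdansk
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # the first rows of the table
print(df.dtypes)   # the column types Pandas inferred
```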

Similarly, you can import an Excel file using the function pd.read_excel().
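A possible way to wrap the Excel import is sketched below. The helper name load_first_sheet and the file name in the usage comment are hypothetical. Note that reading .xlsx files requires an Excel engine such as openpyxl to be installed alongside Pandas.

```python
import pandas as pd


def load_first_sheet(path, sheet_name=0):
    """Read one worksheet of an Excel workbook into a DataFrame."""
    # sheet_name=0 selects the first sheet; a sheet's name (a string)
    # can be passed instead.
    return pd.read_excel(path, sheet_name=sheet_name)


# Usage (hypothetical file name):
# df = load_first_sheet("clients.xlsx")
```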

Now that you know how to import files, it's time to move on to the more interesting stuff.

Sorting and filtering data

One of the most common operations in Pandas is selecting the data we are interested in from a DataFrame. Here are some ways to do it:

- Selecting a specific column: df['name_of_the_column']

- Selecting rows by integer position: df.iloc[position]

- Selecting rows by index label: df.loc[label]

Below is an example of using the selection methods above:
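This is a minimal sketch of the three selection methods, assuming a small hypothetical table of clients indexed by name.

```python
import pandas as pd

# Hypothetical data, indexed by client name
df = pd.DataFrame(
    {"orders": [25, 12, 31], "city": ["Warsaw", "Krakow", "Gdansk"]},
    index=["Anna", "Ben", "Celia"],
)

orders = df["orders"]     # a single column, returned as a Series
first_row = df.iloc[0]    # the row at integer position 0
ben_row = df.loc["Ben"]   # the row whose index label is "Ben"

print(orders)
print(first_row)
print(ben_row)
```

The difference between iloc and loc matters most when the index is not a simple 0, 1, 2 sequence: iloc always counts positions, while loc looks up labels.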

Filtering is the process of selecting data that meets certain criteria.

For instance, we might want to find the clients who placed more than 20 orders.
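One common way to express such a filter is a boolean mask. The sketch below assumes a hypothetical table with client and orders columns.

```python
import pandas as pd

# Hypothetical client data
df = pd.DataFrame(
    {"client": ["Anna", "Ben", "Celia"], "orders": [25, 12, 31]}
)

# Comparing a column with a value yields a boolean Series (True/False per row)
mask = df["orders"] > 20

# Indexing with the mask keeps only the rows where it is True
frequent_clients = df[mask]

print(frequent_clients)
```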

To sort values, use the sort_values() method.
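A short sketch of sorting, again on hypothetical client data. By default sort_values() sorts in ascending order and returns a new DataFrame, leaving the original untouched.

```python
import pandas as pd

# Hypothetical client data
df = pd.DataFrame(
    {"client": ["Anna", "Ben", "Celia"], "orders": [25, 12, 31]}
)

# Sort by the number of orders, largest first
sorted_df = df.sort_values("orders", ascending=False)

print(sorted_df)
```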

Data processing – adding, removing and renaming columns

To add a new column, assign values to a new column name.
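For illustration, the sketch below derives a new column from two existing ones. The column names (avg_order_value, total_value) and the figures are hypothetical.

```python
import pandas as pd

# Hypothetical client data
df = pd.DataFrame(
    {
        "client": ["Anna", "Ben", "Celia"],
        "orders": [25, 12, 31],
        "avg_order_value": [40.0, 55.5, 20.0],
    }
)

# Assigning to a column name that does not exist yet creates the column;
# the multiplication is applied row by row
df["total_value"] = df["orders"] * df["avg_order_value"]

print(df)
```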

To remove a column, use the drop() method.
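A minimal sketch of removing a column from hypothetical data. Passing columns= makes it explicit that a column, not a row, is being dropped; drop() returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical client data
df = pd.DataFrame(
    {
        "client": ["Anna", "Ben", "Celia"],
        "orders": [25, 12, 31],
        "city": ["Warsaw", "Krakow", "Gdansk"],
    }
)

# Drop the "city" column; the result is a new DataFrame
df_without_city = df.drop(columns=["city"])

print(df_without_city)
```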

To rename a column, use the rename() method.
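Renaming can be sketched as follows, with a dictionary that maps the old name to the new one. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical client data
df = pd.DataFrame(
    {"client": ["Anna", "Ben", "Celia"], "orders": [25, 12, 31]}
)

# Map old column names to new ones; columns not listed keep their names
renamed = df.rename(columns={"orders": "order_count"})

print(renamed)
```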

Basic statistical aggregations

Once your data has been modified and prepared for further analysis, we can move on to some basic aggregations.

Pandas offers several methods for statistical aggregation:

- Mean – df.mean(),

- Median – df.median(),

- Mode – df.mode(),

- Standard deviation – df.std()
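The four aggregations listed above can be tried on a small numeric table, as in the sketch below. The data is hypothetical and contains only numeric columns; on a DataFrame that also holds text columns, recent Pandas versions expect numeric_only=True for mean(), median() and std().

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame(
    {
        "orders": [25, 12, 31, 12],
        "avg_order_value": [40.0, 55.5, 20.0, 30.0],
    }
)

means = df.mean()      # one mean per column
medians = df.median()  # one median per column
modes = df.mode()      # most frequent value(s) per column, as a DataFrame
stds = df.std()        # sample standard deviation per column

print(means)
print(medians)
print(modes)
print(stds)
```

Note that mode() returns a DataFrame rather than a single row, because a column can have several equally frequent values.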

These are only a few of the functions Pandas offers. If you want to learn more, I recommend reading the official Pandas documentation, which is available online.

Data analysis with the Python library Pandas

All in all, Pandas is an essential library for working with data in Python. The topics above cover only some basic functions; they certainly don't exhaust the subject, but I hope they encourage you to learn more. I also encourage you to get to know other libraries such as NumPy, Seaborn and Matplotlib.

Good luck!