How to Read Data From Excel Sheet Using C#

How to Work with Excel files in Pandas

A few useful things to know when you have the data in Excel format

Dorian Lazar

Background image by bongkarn thanyakij from Pexels

From what I have seen so far, CSV seems to be the most popular format to store data among data scientists. And that's understandable: it gets the job done and it's quite a simple format; in Python, even without any library, one can build a simple CSV parser in under 10 lines of code.

But you may not always find the data that you need in CSV format. Sometimes the only available format may be an Excel file. Like, for example, this dataset on ons.gov.uk about crime in England and Wales, which is only in xlsx format; this is the dataset that I will use in the examples below.

Reading Excel files

The simplest way to read Excel files into pandas data frames is by using the following function (assuming you did import pandas as pd):

df = pd.read_excel('path_to_excel_file', sheet_name='…')

Here, sheet_name can be the name of the sheet we want to read, its index, or a list with all the sheets we want to read; the elements of the list can be mixed: sheet names or indices. If we want all the sheets, we can use sheet_name=None. When we ask for more than one sheet, they are returned as a dictionary of data frames. The keys of such a dictionary will be either the index or the name of a sheet, depending on how we specified it in sheet_name; in the case of sheet_name=None, the keys will be sheet names.

Now, if we use it to read our Excel file we get:

That's right, an error! It turns out that pandas cannot read Excel files on its own, so we need to install another Python package to do that.

There are two options that we have: xlrd and openpyxl. When this article was written, the package xlrd could open both Excel 2003 (.xls) and Excel 2007+ (.xlsx) files, whereas openpyxl can open only Excel 2007+ (.xlsx) files. Since version 2.0, however, xlrd reads only .xls files, so openpyxl is needed for .xlsx files. To cover both formats, we install both packages:

pip install xlrd openpyxl

Now, if we try to read the same data again:

It works!
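The different forms of sheet_name described earlier can be sketched as follows. This is only an illustration: it builds a tiny two-sheet workbook in a temporary folder instead of using the ONS dataset, and the file name, sheet names and values are made up. It assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook, created here so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="second", index=False)

by_name = pd.read_excel(path, sheet_name="second")     # a single sheet, by name
by_index = pd.read_excel(path, sheet_name=0)           # a single sheet, by 0-based index
mixed = pd.read_excel(path, sheet_name=[0, "second"])  # dict, keys as given in the list
all_sheets = pd.read_excel(path, sheet_name=None)      # dict, keys are sheet names

print(list(mixed.keys()))
print(list(all_sheets.keys()))
```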

But Excel files can be a little bit messier. Besides data, they may have other comments/explanations in the first and/or last couple of rows.

To tell pandas to start reading an Excel sheet from a specific row, use the argument header = 0-indexed row where reading should start. By default, header=0, and the first such row is used to give the names of the data frame columns.

To skip rows at the end of a sheet, use skipfooter = number of rows to skip.

For example:
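A minimal sketch of header and skipfooter, using a small made-up sheet (not the ONS file) with two comment rows on top and one footnote row at the bottom. The row contents and file name are assumptions chosen for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical messy sheet: 2 comment rows, a header row, 2 data rows, 1 footnote.
rows = [
    ["Made-up title row", None],
    ["Some explanatory note", None],
    ["area", "offences"],
    ["North", 10],
    ["South", 20],
    ["Footnote: illustrative data only", None],
]
path = os.path.join(tempfile.mkdtemp(), "messy.xlsx")
pd.DataFrame(rows).to_excel(path, header=False, index=False)

# Row 2 (0-indexed) holds the column names; the last row is dropped.
df = pd.read_excel(path, header=2, skipfooter=1)
print(df)
```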

This is a little better. There are still some issues that are specific to this data. Depending on what we want to achieve, we may also need to rearrange the data values in another way. But in this article, we will focus only on reading and writing to and from data frames.

Another way to read Excel files besides the one above is by using a pd.ExcelFile object. Such an object can be constructed by using the pd.ExcelFile('excel_file_path') constructor. An ExcelFile object can be used in a couple of ways. Firstly, it has a .sheet_names attribute which is a list of all the sheet names inside the opened Excel file.

Then, this ExcelFile object also has a .parse() method that can be used to parse a sheet from the file and return a data frame. The first parameter of this method can be the index of the sheet we want to parse or its name. The rest of the parameters are the same as in the pd.read_excel() function.

An example of parsing the second sheet (index 1):

… and here we parse the same sheet using its name instead of an index:
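As a rough sketch of .sheet_names and .parse(), assuming a small hypothetical workbook (the sheet names "first sheet" and "second sheet" are invented for this example):

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook, created here so the example can run on its own.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="first sheet", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="second sheet", index=False)

xlsx = pd.ExcelFile(path)
print(xlsx.sheet_names)                       # names of all sheets in the file

second_by_index = xlsx.parse(1)               # second sheet, by 0-based index
second_by_name = xlsx.parse("second sheet")   # same sheet, by name
xlsx.close()
```

Both calls return the same data frame, since they point at the same sheet.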

ExcelFiles can also be used inside with … as … statements, and if you want to do something a little more elaborate, like parsing only sheets with 2 words in their name, you can do something like:
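One possible way to write this, sketched on a made-up workbook (the sheet names are hypothetical, and "two words" is taken here to mean two whitespace-separated tokens):

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with one-, two- and three-word sheet names.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["notes", "crime data", "summary by area"]:
        pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name=name, index=False)

# The with-statement closes the file when the block ends.
with pd.ExcelFile(path) as xlsx:
    two_word_frames = {
        name: xlsx.parse(name)
        for name in xlsx.sheet_names
        if len(name.split()) == 2
    }

print(list(two_word_frames))
```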

You can do the same thing by using pd.read_excel() instead of the .parse() method, like this:

… or, if you just want all the sheets, you can do:
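Both pd.read_excel() variants mentioned above could look roughly like this. The sketch relies on pd.read_excel() accepting an already opened ExcelFile object as its first argument; the workbook and its sheet names are made up.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook, created only for this example.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    for name in ["notes", "crime data", "summary by area"]:
        pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name=name, index=False)

with pd.ExcelFile(path) as xlsx:
    # Only the sheets whose name has two words, read via pd.read_excel().
    selected = {
        name: pd.read_excel(xlsx, sheet_name=name)
        for name in xlsx.sheet_names
        if len(name.split()) == 2
    }
    # Every sheet at once, as a dictionary keyed by sheet name.
    everything = pd.read_excel(xlsx, sheet_name=None)

print(list(selected), list(everything))
```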

Writing Excel Files

Now that we know how to read Excel files, the next step for us is to be able to also write a data frame to an Excel file. We can do that by using the data frame method .to_excel('path_to_excel_file', sheet_name='…').

Let's first create a simple data frame for writing to an Excel file:

Now we want to write it to an excel file:

… and we got an error.

Again, pandas can't write to Excel files on its own; we need another package for that. The main options that we have are:

  • xlwt — works only with Excel 2003 (.xls) files; append mode not supported
  • xlsxwriter — works only with Excel 2007+ (.xlsx) files; append mode not supported
  • openpyxl — works only with Excel 2007+ (.xlsx) files; supports append mode

If we want to be able to write to the old .xls format, we should install xlwt, as it is the only one that handles those files (note that pandas 2.0 and later no longer support the xlwt engine). For .xlsx files, we will choose openpyxl as it also supports the append mode.

pip install xlwt openpyxl

Now, if we run the above code again, it works; an Excel file was created:
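A minimal sketch of these two steps, creating a data frame and writing it out. The column names, values and file name are invented for illustration, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data frame; any small table would do.
df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
df.to_excel(path, sheet_name="sheet1")

# Reading the file back shows that the index was written as an extra first column.
written = pd.read_excel(path, sheet_name="sheet1")
print(list(written.columns))
```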

By default, pandas also writes the index column along with our columns. To get rid of it, use index=False like in the code below:
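For instance, with a made-up data frame and file name (both are assumptions for this sketch):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})  # hypothetical data

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
df.to_excel(path, sheet_name="sheet1", index=False)  # do not write the index column

no_index = pd.read_excel(path, sheet_name="sheet1")
print(list(no_index.columns))
```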

The index column isn't there now:

What if we want to write more sheets? If we want to add a second sheet to the previous file, do you think that the code below will work?

The answer is no. It will just overwrite the file with only one sheet: sheet2.
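The overwrite can be checked with a short experiment like the one below. The data frames and file name are hypothetical; the point is only that the second .to_excel() call on the same path replaces the whole file.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})  # hypothetical data for sheet1
df2 = pd.DataFrame({"b": [3, 4]})  # hypothetical data for sheet2

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
df1.to_excel(path, sheet_name="sheet1", index=False)
df2.to_excel(path, sheet_name="sheet2", index=False)  # replaces the file written above

with pd.ExcelFile(path) as xlsx:
    remaining = xlsx.sheet_names

print(remaining)
```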

To write more sheets to an Excel file we need to use a pd.ExcelWriter object as shown below. First, we create another data frame for sheet2, then we open an Excel file as an ExcelWriter object in which we write the 2 data frames:
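A sketch of that approach, with invented data frames and file name, and openpyxl chosen explicitly as the engine:

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})  # hypothetical data for sheet1
df2 = pd.DataFrame({"b": [3, 4]})  # hypothetical data for sheet2

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

# Everything written through the same writer ends up in the same file.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="sheet1", index=False)
    df2.to_excel(writer, sheet_name="sheet2", index=False)

with pd.ExcelFile(path) as xlsx:
    sheets = xlsx.sheet_names

print(sheets)
```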

Now our Excel file should have 2 sheets. If we then want to add another sheet to it, we need to open the file in append mode and run code similar to the previous one. For example:
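Append mode might look like this. The sketch first recreates a two-sheet file so that it can run on its own; the data and names are hypothetical, and mode="a" is used with the openpyxl engine, which is the one that supports appending.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

# Recreate a file with two sheets (hypothetical data).
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="sheet2", index=False)

# Open the existing file in append mode and add a third sheet.
df3 = pd.DataFrame({"c": [5, 6]})
with pd.ExcelWriter(path, mode="a", engine="openpyxl") as writer:
    df3.to_excel(writer, sheet_name="sheet3", index=False)

with pd.ExcelFile(path) as xlsx:
    sheets_after_append = xlsx.sheet_names

print(sheets_after_append)
```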

Our Excel file, now, has 3 sheets and looks like this:

Working with Excel Formulas

You are probably wondering, at this point, about Excel formulas. What about them? How do we read from files that have formulas? How do we write them to Excel files?

Well… good news. It is quite easy. Writing formulas to Excel files is as simple as just writing the string of the formula, and these strings will be automatically interpreted by Excel as formulas.

As an example:
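A small sketch of writing formulas as plain strings. The column names, sheet name and formulas are made up; with index=False the header lands in row 1, so the data rows are 2 to 4. The check at the end uses openpyxl directly (an addition for illustration) to confirm that the cells were stored as formulas.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Hypothetical data: column "total" holds formula strings referring to columns A and B.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "total": ["=A2+B2", "=A3+B3", "=A4+B4"],
})

path = os.path.join(tempfile.mkdtemp(), "formulas.xlsx")
df.to_excel(path, sheet_name="formulas", index=False, engine="openpyxl")

# Inspect the first formula cell as it was stored in the file.
cell = load_workbook(path)["formulas"]["C2"]
print(cell.value, cell.data_type)
```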

The Excel file produced by the code above is:

Now, if we want to read an Excel file with formulas in it, pandas will read into data frames the result of those formulas.

For example, let's read our previously created file:

Sometimes you need to save the Excel file manually for this to work and not get zeros instead of the result of formulas (hit CTRL+S before executing the above code).

Below is the code as a Jupyter notebook:

I hope you found this information useful and thanks for reading!

This article is also posted on my own website here. Feel free to have a look!


Source: https://towardsdatascience.com/how-to-work-with-excel-files-in-pandas-c584abb67bfb
