How to convert xlsx to csv
Hi ommunity,
I am downloading an Excel xlsx from the web with the below method and saving it as xlsx, how do I convert and save xlsx as csv?
ClassMethod DownloadFileUrl()
{
Set httprequest = ##class(%Net.HttpRequest).%New()
Set httprequest.Server = "www.health.gov.za"Set httprequest.SSLConfiguration = "SSL_OAUTH_Client"Do httprequest.Get("/wp-content/uploads/2021/04/ICD-10_MIT_2021_Excel_16-March_2021.xlsx")
write httprequest.HttpResponse.StatusLine
Set stream=##class(%FileBinaryStream).%New()
Set stream.Filename="/home/Data/March_2021.xlsx"Write stream.CopyFrom(httprequest.HttpResponse.Data)
Write stream.%Save()
Write stream.%Close()
}Comments
There are some open source tools that can convert from XLSX to CSV. You could call out to these from your ObjectScript code:
https://csvkit.readthedocs.io/en/latest/tutorial/1_getting_started.html…
https://stackoverflow.com/questions/10557360/convert-xlsx-to-csv-in-lin…
Use LibreOffice from IRIS or (if you only need the data) Apache POI.
This is a good free site ive used, hope it helps.
Now with embedded Python, you can have a quite simple code using pandas :
/// Convert an Excel file to a CSV file
ClassMethod XLStoCSV(source As %String = "/data/sample.xlsx") As %Status [ Language = python ]
{
import pandas as pd
read_file = pd.read_excel(source)
read_file.to_csv(source+'.csv', index = None, header=True)
}
FYI, the reason this is so difficult is because an XLSX file is actually a ZIP archive with a bunch of XML files in it. You'd actually have to save the file as a .ZIP, extract it, navigate to the XML files for the worksheets (on the current version, that's inside the archive in \xl\worksheets, but that's changed between versions if I remember correctly) then parse the data out of the XML file and write it to your CSV file. That's why there are all these third-party tools people are recommending to handle this issue.
Good to know.
I had no idea of this quite combersome workflow with the information in this thread of discussion.
Maybe some other Python libraries could be of help here.