NOTE: This Task helps me track an upstream bug in one of the MiSE's Open Data datasets. Further details below.
== Preamble (how I found this issue) ==
In 2015 I developed yet another Italian fuel pump comparator during a 24-hour hackathon promoted by a notable Italian insurance company. My team won, but that's not our story. Here is the project: https://fuel.reyboz.it/
That tool is based on a dataset provided by the [[ https://en.wikipedia.org/wiki/Ministry_of_Economic_Development_(Italy) | MiSE ]]. Well, that dataset has been borked since 2015, but now (2021) it's somehow even //more// borked, and I don't have the time to hotfix it every day. In a better world, the MiSE would fix its own dataset.
FAQ:
* No, I have not violated any security system
* No, that was not a protected memory sector
* No, I have not destroyed or damaged any system
* Yes, it's just a borked CSV that anyone can see is borked, and that's not nice, but the MiSE can fix it
== Steps to reproduce ==
1. Download the [[ https://www.mise.gov.it/images/exportCSV/anagrafica_impianti_attivi.csv | anagrafica_impianti_attivi.csv ]] from [[ https://www.mise.gov.it/index.php/it/open-data/elenco-dataset/2032336-carburanti-prezzi-praticati-e-anagrafica-degli-impianti | Italian fuel pump dataset - MiSE ]]
2. Open it with LibreOffice, setting the separator to semicolon. (Yes, I know, CSV means //comma//-separated values, but they use a semicolon. Please don't fight about this: it is not my core problem.)
3. Note that the dataset is borked at least at ID `46593` (line 29), in the row for `MEGA SERVICE S.A.S.`
Quick overview (scroll to line 29):
{P17, highlight=29}
IMPORTANT: Note that unclosed `"`. Spotting it doesn't take hours of CSV inspection: it's right there on line 29. Nobody can honestly say they haven't noticed it.
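Such rows can also be located without opening a spreadsheet. A minimal Python sketch, assuming (as this Task argues) that the file never legitimately quotes values across several lines: any physical line with an odd number of `"` characters then points at a stray or unclosed quote. The helper name `find_unbalanced_quotes`, the script name and the UTF-8 encoding are assumptions, not something the MiSE documents.

```python
import sys


def find_unbalanced_quotes(lines):
    """Return (line_number, line) pairs whose count of '"' is odd.

    Heuristic: if no field legitimately spans several lines, every quoted
    value opens and closes on the same line, so an odd count points at a
    stray or unclosed quote.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        if line.count('"') % 2 == 1:
            suspicious.append((number, line.rstrip("\r\n")))
    return suspicious


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_quotes.py anagrafica_impianti_attivi.csv
    # The encoding is an assumption; errors="replace" keeps the scan going.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for number, line in find_unbalanced_quotes(f):
            print(f"line {number}: {line}")
```

Line numbers here are physical lines of the file, header included, so they should match what a text editor shows.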
== Problem n. 1 ==
The MiSE is clearly not using a proper library to generate the CSV, and this produces a malformed dataset.
Solution:
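Please follow a //real// CSV standard, i.e. the usual quoting rules (as in RFC 4180): enclose in double quotes any value containing the separator, a `"` or a newline, and escape an embedded `"` by doubling it. Do not just use a semicolon as glue between raw values.
References: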
* https://en.wikipedia.org/wiki/Comma-separated_values on Wikipedia
* https://www.php.net/manual/en/function.fputcsv.php for PHP
* https://pythonspot.com/files-spreadsheets-csv/ for Python
* http://commons.apache.org/proper/commons-csv/ for Java
* ...
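As an illustration of what such a library does, here is a minimal sketch using Python's standard `csv` module. The column names and rows are made up for the example (they are not the real MiSE schema). It keeps the semicolon separator and shows that a `"` or a `;` inside a value gets quoted and escaped instead of breaking the row:

```python
import csv
import io

# Hypothetical rows: column names and values are illustrative only.
rows = [
    ["id", "name", "town"],
    ["1", 'ACME "FUEL" S.R.L.', "LICATA"],
    ["2", "Name; with a semicolon", "ROMA"],
]

buffer = io.StringIO()
# With the default QUOTE_MINIMAL, csv.writer quotes only the fields that
# contain the delimiter, the quote character or a newline, and doubles any
# embedded quote (doublequote=True is the default).
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows(rows)
output = buffer.getvalue()
print(output)
# id;name;town
# 1;"ACME ""FUEL"" S.R.L.";LICATA
# 2;"Name; with a semicolon";ROMA

# Reading it back with the same separator restores the original values.
parsed = list(csv.reader(io.StringIO(output), delimiter=";"))
assert parsed == rows
```

PHP's `fputcsv` and Apache Commons CSV apply the same kind of quoting, so the choice of library matters much less than actually using one.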
== Problem n. 2 ==
The MiSE is not properly sanitizing user input.
Please `trim` your values and collapse repeated spaces. Remove tabs. Remove newlines. Remove all the shit.
Example: avoid storing someone as "` Mario  Rossi `" (note the spaces at the beginning and at the end, and the double space in the middle).
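A minimal Python sketch of that cleanup, assuming every run of whitespace (spaces, tabs, newlines) should become a single space. The name `clean_value` is hypothetical. Note that this only normalizes whitespace: it does not deal with `"` or `;`, which is the CSV library's job.

```python
def clean_value(value: str) -> str:
    """Trim a value and collapse any run of whitespace into one space.

    str.split() with no arguments splits on runs of whitespace (spaces,
    tabs, newlines) and drops leading/trailing whitespace, so joining the
    pieces with a single space does trim + collapse in one step.
    """
    return " ".join(value.split())


print(repr(clean_value(" Mario  Rossi ")))  # 'Mario Rossi'
print(repr(clean_value("Via\tRoma\n1")))    # 'Via Roma 1'
```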
== Problem n. 3 ==
"CSV" means "//comma//-separated values" but MiSE generates //semicolon//-separated values.
Proposed solution: wontfix. It's too late to change that now.
Even if it's too late to fix this, it is not too late to inform people: please remember that the `C` in `CSV` means `COMMA` and not `SEMICOLON`.
== Proposed solution for all three problems ==
Adopt any proper CSV library instead of just using `;` as glue between raw values.
This would fix a very bad programming approach that has been corrupting the MiSE's dataset every day since at least 2015.
== In the meantime ==
Someone should contact the fuel station `MEGA SERVICE S.A.S. DI TERMINI MICHELE &C 123 di Licata, Km. 15+500, LICATA 92023 AG` and kindly ask them to remove the `"` from their name, since the MiSE cannot handle that case without breaking the whole dataset.
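On the consumer side, a parser that follows the usual quoting rules (for example Python's `csv.reader` with default settings) treats a `"` at the start of a field as an opening quote and keeps reading, separators and newlines included, until it finds a closing one, swallowing the following rows. A possible stopgap, sketched in Python under the assumption that the file never uses quoting on purpose (it just glues raw values with `;`), is to read it with `csv.QUOTE_NONE`, so quotes are treated as ordinary characters. The helper name and the sample data are hypothetical, and this does not help when a value itself contains a `;`.

```python
import csv
import io


def read_rows_ignoring_quotes(text: str) -> list[list[str]]:
    """Split semicolon-separated text, treating '"' as a plain character.

    Assumption: the producer never quotes values, so every ';' is a real
    separator and every '"' is just part of a value.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical sample: the first row has a stray quote at the start of a field.
sample = '1;"ACME FUEL S.R.L.;LICATA\n2;OTHER STATION;ROMA\n'
rows = read_rows_ignoring_quotes(sample)
print(rows)
# [['1', '"ACME FUEL S.R.L.', 'LICATA'], ['2', 'OTHER STATION', 'ROMA']]
```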