Wednesday, April 29, 2020

fopen() - Text or Binary?

It's often convenient to control an embedded system remotely via some sort of communications protocol. In the good ole days, I'd devise some sort of homebrew communications protocol that was specific to my application's needs and I'd bash some data in and out of the serial port. Having grown up on 8-bit micros with limited RAM, every byte was valuable so there was a strong inclination to pack things tightly into binary streams, but the problem with that is how to test it... and before you know it I'm being dragged into an endless rabbit hole of development on the PC side just so I can conveniently communicate with the embedded app.
Embedded processors have come a long way and right now, I'm getting up close and personal with some pretty nifty STM32 processor boards - this one in particular that I sourced from Banggood: https://sea.banggood.com/STM32F407VET6-Development-Board-Cortex-M4-STM32-Small-System-ARM-Learning-Core-Module-p-1460490.html

Compared with an 8051, these ARM devices are a beast; and the development boards are so cheap! It's time to cast off the constraints of decades past and update my comms routines... enter JSON.
JSON is sooo simple even I can wrap my head around it; it's text based which means testing it with my embedded app is simply a case of sending text files through a PuTTY terminal. Furthermore, because JSON is used so widely across the winternet, I'm pretty confident I'll be able to keep using it even if I upgrade my comms to Etherweb.
I dug around and quickly settled on a relatively compact JSON interpreter (called JSMN) from Serge Zaitsev (kudos @zsergo): https://zserge.com/jsmn/. It comes packaged as a single C header file which feels a little hacky but it's nonetheless effective and I was able to get a basic command interpreter up and running on my target board pretty quickly.
At that point, I started thinking more about what sort of commands I might want to exchange and it became pretty clear that I was going to need some sort of generic command interpreter that was easy to abstract and extend.  That's going to be a work in progress but the relevance to today's discussion is how it drove me back to establishing a parallel development platform that would allow me to develop and test code on my PC rather than needing to run everything from my target (ARM) platform, and how I bumped into a nasty little PITA on how Windows processes text files.
I'm not going to post my original code because some doofus will blindly cut and paste it into their application and wonder why it doesn't work... instead I'll post the working code:

/*
 ============================================================================
 Name        : main.c
 Author      : Marty Hauff
 Version     :
 Copyright   : Use at own risk
 Description : Test scaffold for reading a file and processing...
 ============================================================================
 */

#include <stdio.h>
#include <stdlib.h>
#include <limits.h> //PATH_MAX
#include <unistd.h> //getcwd()
#include <sys/stat.h> //struct stat
#include <string.h> //strlen()
#include "jsmn.h" //jsmn_parser, jsmntok_t, jsmn_init(), jsmn_parse()

#define MAX_TOKENS 256

int main(void) {
    FILE *fp;
    const char *Filename;
    struct stat st;
    char cwd[PATH_MAX];
    char *json_string = NULL;

    printf("\nJSON Test\n");
    //getcwd() returns NULL on failure; the cwd array itself can never be NULL
    if (getcwd(cwd, sizeof(cwd)) != NULL) {
        printf("Current working dir: %s", cwd);
    } else {
        perror("getcwd() error");
        return EXIT_FAILURE;
    }

    Filename = "JSON Examples/JSON_2.json";
    fp = fopen(Filename, "rb"); //MUST use binary mode otherwise \r\n sequences get messed up!!
    if (fp == NULL) {
        printf("\nFailed to open %s", Filename);
        return EXIT_FAILURE;
    }

    printf("\n\"%s\" opened successfully", Filename);
    if (stat(Filename, &st) != 0) {
        perror("stat() error");
        fclose(fp);
        return EXIT_FAILURE;
    }
    printf("\nStat Size: %ld", (long)st.st_size);
    fseek(fp, 0, SEEK_END);
    printf("\nfseek Size: %ld", ftell(fp));
    fseek(fp, 0, SEEK_SET);

    //Room for the whole file plus a terminating '\0'
    json_string = malloc(st.st_size + 1);
    if (json_string == NULL) {
        perror("malloc() error");
        fclose(fp);
        return EXIT_FAILURE;
    }
    fread(json_string, st.st_size, 1, fp);
    fclose(fp);

    json_string[st.st_size] = '\0';
    printf("\nstrlen Size: %lu", (unsigned long)strlen(json_string));
    printf("\n%s", json_string);

    jsmn_parser p;
    jsmntok_t tkns[MAX_TOKENS];
    int nodes = 0;

    jsmn_init(&p);
    //A negative return value from jsmn_parse() is an error code
    nodes = jsmn_parse(&p, json_string, strlen(json_string), tkns, MAX_TOKENS);
    printf("\nFound %d nodes", nodes);

    free(json_string);
    return EXIT_SUCCESS;
}

The thing that got me stuck for a day was that pesky little 'b' character in the fopen command. Here's what the output looked like without it (I'm using some test JSON from: https://json.org/example.html):

JSON Test
Current working dir: C:\Users\user\Documents\Projects\WinTest
"JSON Examples/JSON_2.json" opened successfully
Stat Size: 253
fseek Size: 253
strlen Size: 253
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
âãäåæçèéêëì
Found -2 nodes

I couldn't work out what was causing the garbage at the end of the stream (second last line) and why jsmn was subsequently failing to parse the file. The clue is that there are the same number of garbage characters as there are lines in the JSON string. Take a look at a hex dump of the JSON file:
Take note of the 0D 0A sequence. Also take a look at these settings in Notepad++
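If you don't have a hex editor handy, a few lines of C will show the same thing. This is only a rough sketch: hex_line() and count_crlf() are names I've made up for illustration, they are not part of jsmn or the listing above. The first formats a run of bytes as hex so the 0D 0A pairs stand out, and the second counts those pairs, which is the number of bytes that go missing when the file is read in text mode on Windows.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: format `len` bytes of `data` into `out` as two-digit
 * uppercase hex values separated by single spaces.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if `out` is too small. */
int hex_line(const unsigned char *data, size_t len, char *out, size_t out_size)
{
    size_t pos = 0;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* First byte needs "XX" + NUL, later bytes need " XX" + NUL. */
        size_t need = (i == 0) ? 3 : 4;
        if (out_size - pos < need) {
            return -1;
        }
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                (i == 0) ? "%02X" : " %02X", data[i]);
    }
    return (int)pos;
}

/* Hypothetical helper: count the CR LF (0x0D 0x0A) pairs in a buffer. */
size_t count_crlf(const unsigned char *data, size_t len)
{
    size_t pairs = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (data[i] == '\r' && data[i + 1] == '\n') {
            pairs++;
        }
    }
    return pairs;
}
```

Feed either function a buffer that was read in binary mode; a buffer read in text mode on Windows has already lost its CR bytes, so there would be nothing left to see.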

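Without the 'b' in the fopen() call, the file is opened in text mode, and on Windows the C runtime strips out the CR (0x0D '\r') character from every CR LF pair when it's reading the contents into a buffer. As a result, the string stored in memory is actually shorter than the file on disk, by one byte per line. The code still places the terminating '\0' at the size reported by stat() though, so the uninitialised bytes between the end of the real data and the terminator get printed as garbage and then handed to jsmn, which rejects them (-2 is JSMN_ERROR_INVAL, i.e. an invalid character in the JSON string).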
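A way to make the read robust in either mode is to terminate the string after the bytes fread() actually delivered rather than after the size on disk. The following is a minimal sketch of that idea, not the code I originally wrote; read_stream() is a hypothetical helper name, and the caller is assumed to pass in a sensible upper bound such as the size from stat(). Note the element size of 1 in the fread() call, which makes the return value a byte count (with a single element of st.st_size bytes, fread() can only ever return 0 or 1).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to `max_len` bytes from an open stream into a
 * freshly allocated, NUL-terminated buffer. The terminator is placed after
 * the bytes fread() actually returned, so a short read (for example, a text
 * mode read on Windows) never leaves uninitialised bytes inside the string.
 * Returns NULL if the allocation fails; the caller frees the buffer. */
char *read_stream(FILE *fp, size_t max_len, size_t *out_len)
{
    char *buf = malloc(max_len + 1);

    if (buf == NULL) {
        return NULL;
    }
    /* Element size 1, count max_len: the return value is a byte count. */
    size_t got = fread(buf, 1, max_len, fp);
    buf[got] = '\0';
    if (out_len != NULL) {
        *out_len = got;
    }
    return buf;
}
```

With this in place the length passed to jsmn_parse() can be the returned byte count instead of strlen(), which also copes with the case where the file is shorter than expected.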
Now look at the output when the 'b' is included in the fopen() call (i.e. as per the code listing above):
JSON Test
Current working dir: C:\Users\user\Documents\Projects\WinTest
"JSON Examples/JSON_2.json" opened successfully
Stat Size: 253
fseek Size: 253
strlen Size: 253
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
Found 26 nodes

The CR character remains in the text stream so the console output (on Eclipse) double spaces each line, but take note that we no longer get the garbage at the end of the stream; AND jsmn has managed to parse the file correctly.
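If the leftover CR characters are a nuisance (the double spacing, for instance), they can be removed after the binary read. Here's a small sketch under my own assumptions: crlf_to_lf() is a made-up name, and it only touches a CR that is immediately followed by an LF, leaving any lone CR alone.

```c
#include <stddef.h>

/* Hypothetical helper: collapse every "\r\n" pair in a NUL-terminated string
 * to a single "\n", in place. Returns the new string length. */
size_t crlf_to_lf(char *s)
{
    size_t r = 0; /* read index */
    size_t w = 0; /* write index */

    while (s[r] != '\0') {
        /* s[r + 1] is safe to read: at worst it is the terminating NUL. */
        if (s[r] == '\r' && s[r + 1] == '\n') {
            r++; /* drop the CR; the LF is copied on the next pass */
            continue;
        }
        s[w++] = s[r++];
    }
    s[w] = '\0';
    return w;
}
```

If you do this, do it before calling jsmn_parse(), because the tokens jsmn returns hold start and end offsets into the buffer and those would no longer line up if the buffer changed afterwards.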

Take note of the moral of the story - consider reading a file in binary mode even if you know the contents are text.

Now to work out how to build a generic JSON command interpreter.
