:: krowemoh

Benchmarking Reads in Universe

2022-04-22

I have plans to use my node pick-universe library, but one thing that weighs it down is that reading entire files is a pretty expensive operation. Reading a single record is slow, but I can deal with it. However, selecting a file of 100k items and reading in everything is slow.

I think one solution would be to do all the reads in C rather than in javascript but before I start trying to optimize it’s probably a good idea to validate things.

So the first step is to see how fast BASIC is. This would be the fastest option most likely.

Testing BASIC

The test I’ll be running will be selecting a file with about 95k records with 200 fields each. Only 150 of them are populated consistently though.

      OPEN '','INVENTORY-FILE' TO INVENTORY.FILE ELSE
         PRINT 'Unable to open file: INVENTORY-FILE - Press RETURN':
         INPUT ANYTHING
         STOP
      END
*
      BUFFER = ''
*
      SELECT INVENTORY.FILE
*
      LOOP
         READNEXT ITEM.ID ELSE ITEM.ID = ''
      UNTIL ITEM.ID = '' DO
         READ INVENTORY.ITEM FROM INVENTORY.FILE, ITEM.ID ELSE INVENTORY.ITEM = ''
         BUFFER<-1> = LOWER(INVENTORY.ITEM)
      REPEAT
*
      PRINT 'Items: ' : DCOUNT(BUFFER, @AM)

This is a pretty simple program. It opens the inventory file, selects it, and then reads every record into a buffer.

To see how long it takes, I run it under time from the linux command line a few times and take a rough estimate from the results.

> time uv "RUN BP TEST.READS"

This gives a general result of:

bash-4.2$ time uv "RUN BP TEST.READS"
Items: 94872

real    0m0.522s
user    0m0.285s
sys     0m0.241s
bash-4.2$ time uv "RUN BP TEST.READS"
Items: 94872

real    0m0.510s
user    0m0.284s
sys     0m0.230s

A surprising note here is that changing the READ statement to a MATREAD makes the program run longer. I thought dimensioning an array would be faster, but it actually makes it slower.

This is probably because dimensioning an array is really declaring 200 variables, so reading a record involves assigning each field to one of those variables. READ, on the other hand, I assume uses 1 big chunk of indexed memory for the fields.

MATREAD runs in about 1.2 seconds whereas the READ runs in 0.52. Very interesting, and I’m already glad to have run this performance test.

Now the assumption I’m going with is that the best we can do is this BASIC program. The node version definitely takes longer.

Testing Node

The node version has some glaring issues. The first is that I cross from javascript to C for every read. This has to be expensive. The next issue is that each Read requires going over the RPC port. On localhost, it’s probably fine, but on a faraway server the network time would be a killer.
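
To get a feel for why the per-read round trip matters, here is a rough back-of-the-envelope sketch. It assumes every record costs two sequential round trips (one ReadNext and one Read) and that nothing overlaps. The function name and the latency figures are made up for illustration; none of them are measurements from these tests.

```javascript
// Rough model: total wait = records x round trips per record x latency.
// All latency values below are hypothetical, chosen only for illustration.
function estimateSeconds(records, tripsPerRecord, latencyMs) {
    return (records * tripsPerRecord * latencyMs) / 1000;
}

const RECORDS = 95000; // roughly the size of the test file

for (const latencyMs of [0.05, 1, 10]) {
    const seconds = estimateSeconds(RECORDS, 2, latencyMs);
    console.log(`${latencyMs} ms per trip -> ${seconds.toFixed(1)} s`);
}
```

The point of the model is only that the total grows linearly with both the record count and the latency, so a per-record call pattern that is tolerable on localhost can become unusable over a slower link.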

const mv = require("pick-mv");
const Universe = require('pick-universe');

const uv = new Universe("localhost", "user", "password", "/path/to/account");

uv.StartSession();

const INV = uv.Open("INVENTORY-FILE");
uv.Select(INV);

let buffer = [];

while (true) {
    let id = uv.ReadNext();
    if (id === null) break;
    let record = uv.Read(id, INV);
    buffer.push(record);
}

uv.EndAllSessions();

console.log(`Items: ${buffer.length}`);

I do like that the BASIC and node versions are almost identical and the line counts are in the same range.

Here is the performance test, run on localhost:

bash-4.2$ time node test.js
Items: 94873

real    0m7.528s
user    0m1.610s
sys     0m2.391s
bash-4.2$

It is definitely longer! About 15x longer. This also goes up drastically over the network. I waited almost 15 minutes and it still hadn’t finished when I killed my test. This basically means that using the node library probably makes 0 sense over the network, and it would be better to simply call a subroutine on the server to do the work and return the data.

A change we can make is to use ReadList to read in all the ids in one shot. This should speed things up, as now we only need to go back to C for the record reads.

const mv = require("pick-mv");
const Universe = require('./index');

const uv = new Universe("localhost", "user", "password", "/path/to/account");

uv.StartSession();

const INV = uv.Open("INVENTORY-FILE");
uv.Select(INV);

let buffer = [];

let ids = mv.MVToArray(uv.ReadList());

for (let id of ids) {
    let record = uv.Read(id, INV);
    buffer.push(record);
}

uv.EndAllSessions();

console.log(`Items: ${buffer.length}`);

This takes:

bash-4.2$ time node  test.js
Items: 94873

real    0m4.818s
user    0m1.267s
sys     0m1.331s
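
To put the wall-clock numbers so far on a common scale, the sketch below divides each `real` time by the BASIC baseline and by the item count. The timings and counts are copied from the runs above; the `runs` array and the `summarize` helper are names invented here for illustration. Each figure includes process startup, so the per-item values are upper bounds rather than pure read costs.

```javascript
// Wall-clock ("real") times reported by `time` in the runs above, in seconds.
const runs = [
    { name: "BASIC READ loop", seconds: 0.522, items: 94872 },
    { name: "node ReadNext + Read", seconds: 7.528, items: 94873 },
    { name: "node ReadList + Read", seconds: 4.818, items: 94873 },
];

// Compare every run against the first entry (the BASIC baseline).
function summarize(runs) {
    const baseline = runs[0].seconds;
    return runs.map((run) => ({
        name: run.name,
        timesSlower: run.seconds / baseline,
        microsPerItem: (run.seconds / run.items) * 1e6,
    }));
}

for (const row of summarize(runs)) {
    console.log(
        `${row.name}: ${row.timesSlower.toFixed(1)}x baseline, ` +
        `${row.microsPerItem.toFixed(1)} us per item`
    );
}
```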

This is a bit better than the 7.5 seconds we had from doing the readnexts in javascript, but it’s still quite slow.

Now that we have proof, I’m going to take a stab at writing a ReadAll function that stays in C, reads a list of records into an array, and then returns that array to node. This still makes the network calls, so I don’t think it’ll solve the deeper issue of needing the Universe server to be running on localhost.

I’ve written up a readall function and, interestingly, it performs about the same as doing the reads from javascript. It could be that the cost of going to C and back to javascript is relatively cheap compared to the actual work of doing the Read statement.

const mv = require("pick-mv");
const Universe = require('./index');

const uv = new Universe("localhost", "user", "password", "/path/to/account");

uv.StartSession();

const INV = uv.Open("INVENTORY-FILE");
uv.Select(INV);
let records = uv.ReadAll(INV);

console.log(`Items: ${records.length}`);

uv.EndAllSessions();

The performance of this version is:

bash-4.2$ time node  test.js
Items: 94873

real    0m5.476s
user    0m1.843s
sys     0m1.391s

I wasn’t expecting this. There might be some way to speed things up in C but I’ll need to take a deeper look.

Testing Calling a Subroutine

Now that I’ve looked at all the ways I can interact with Universe from node, it’s time to test calling a subroutine for the records. I’ll basically have node call the subroutine I wrote at the very top and see how long that takes. I’m expecting that this should be the fastest as the only time that I need to spend is on transforming the data into an array.
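
The transformation step could look something like the sketch below. The BASIC program builds its buffer with LOWER, so records end up separated by attribute marks, fields by value marks, and values by subvalue marks. This assumes the marks reach javascript as the single characters with codes 254, 253 and 252; the actual encoding depends on how the client library hands the string back. The `bufferToRecords` helper and the sample data are hypothetical, and the pick-mv library used earlier may already cover this.

```javascript
// Delimiters after LOWER() has been applied to each record in BASIC:
// records are separated by @AM, fields by @VM, values by @SVM.
// Assumption: the marks arrive as the single characters U+00FE, U+00FD
// and U+00FC; the real encoding depends on the client library.
const AM = "\u00fe";
const VM = "\u00fd";
const SVM = "\u00fc";

// Turn the lowered buffer into an array of records, each an array of
// fields, each field an array of values.
function bufferToRecords(buffer) {
    if (buffer === "") return [];
    return buffer.split(AM).map((record) =>
        record.split(VM).map((field) => field.split(SVM))
    );
}

// Hypothetical two-record buffer, built by hand for illustration.
const sample =
    ["WIDGET", "12" + SVM + "15", "BLUE"].join(VM) +
    AM +
    ["GADGET", "7", ""].join(VM);

console.log(JSON.stringify(bufferToRecords(sample)));
```

Splitting on the three marks in order is all the work left on the node side in this sketch, which is why this approach looks attractive: one call to the server, then plain string handling.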