Saturday, January 14, 2023

Software Engineering with ChatGPT


Generated by DALL-E 2


Like many others, I am impressed by ChatGPT's ability to understand and respond to my queries. I am looking forward to trying it out on my software engineering projects and seeing how it can benefit my work.

As an experiment, I decided to use ChatGPT to create a command line MP3 audio recorder on Linux in C++, expecting it to be a difficult task for a language model. However, I was impressed by the tool's ability to understand the requirements and deliver efficient and well-structured code.

I began by creating a main.cpp file and adding an empty int main(...) function. To create an audio recorder, I would need:

  • A function to capture audio from the microphone in a specified audio format
  • A function or class with a method to convert and save audio frames as an mp3 file.

I added the following query into ChatGPT:

Write a function to capture microphone audio on Linux. The function needs to have a callback void audiodata(int16_t* data, int samples). The audio is PCM one channel 48KHz, int16_t.

It promptly wrote the function asked for. This is part of the answer; the generated source code is below.

Here is an example of a function in C++ that captures microphone audio on Linux and calls the callback function "audiodata" with a pointer to the int16_t data and number of samples:

I pasted the generated C++ code into my main.cpp file but encountered a couple of errors during compilation and linking. The errors were minor, such as an int parameter that needed to be a pointer, and I fixed them quickly. I also resolved a linker error simply by pasting the error message into ChatGPT, which provided the solution.

I sidetracked a bit here and asked for a hexdump function and a class to write raw data; all went just fine.

The next query was the following:

Write a linux class in c++ to convert several buffers of binary PCM, int16_t, mono data to a MP3 file. The class needs an open, write and close method.

The result was again spot on.

Here is an example of a C++ class that can be used to convert multiple buffers of binary PCM, int16_t, mono data to an MP3 file using the LAME library on Linux:


After adding the latest C++ code generated by ChatGPT to my main.cpp file, I noticed one mistake. Specifically, the method for writing buffers divided the samples by 2, which was unnecessary. I was able to easily fix this by removing the division at two locations in the code. Additionally, I did not have the LAME library installed on my system, but a few queries to ChatGPT provided me with the commands to install it and the necessary options to specify to the linker.


In conclusion, ChatGPT performed exceptionally well in my project to create a command line MP3 audio recorder on Linux in C++. Its ability to understand the requirements and produce efficient and well-structured code was impressive. If this trend continues, ChatGPT has the potential to greatly benefit the software engineering field, as well as other areas. By the way, the majority of this text was also written or rewritten by ChatGPT.


source code:


g++ chatgpt.cpp -o chatgpt_audio -lasound -lmp3lame

download code here...


#include "alsa/asoundlib.h"


snd_pcm_t *handle;
snd_pcm_hw_params_t *params;

void hexdump(void* data, size_t len) {
    unsigned char *p = (unsigned char*)data;
    for (size_t i = 0; i < len; i++) {
        printf("%02x ", p[i]);
        if ((i + 1) % 16 == 0) printf("\n");
    }
    printf("\n");
}

#include "fstream"
#include "lame/lame.h"

class MP3Encoder {
public:
    MP3Encoder(const std::string& fileName) : fileName(fileName) {
        lame = lame_init();
        lame_set_in_samplerate(lame, 48000);
        lame_set_num_channels(lame, 1);
        lame_set_out_samplerate(lame, 48000);
        lame_set_brate(lame, 128);
        lame_init_params(lame);
    }

    bool open() {
        file.open(fileName, std::ios::out | std::ios::binary);
        return file.is_open();
    }

    bool write(int16_t* data, size_t len) {
        if (!file.is_open()) return false;
        // LAME recommends an output buffer of at least 1.25 * samples + 7200 bytes
        size_t bufferSize = len * 5 / 4 + 7200;
        unsigned char buffer[bufferSize];
        int encodedSize = lame_encode_buffer(lame, (short int*)data, (short int*)data, len, buffer, bufferSize);
        file.write((char*)buffer, encodedSize);
        return true;
    }

    void close() {
        if (file.is_open()) {
            size_t bufferSize = 7200;
            unsigned char buffer[bufferSize];
            int encodedSize = lame_encode_flush(lame, buffer, bufferSize);
            file.write((char*)buffer, encodedSize);
            file.close();
        }
        lame_close(lame);
    }

private:
    std::string fileName;
    std::ofstream file;
    lame_global_flags* lame;
};
MP3Encoder *mp3;

class BinaryFileWriter {
public:
    BinaryFileWriter(const std::string& fileName) : fileName(fileName) {}

    bool open() {
        file.open(fileName, std::ios::out | std::ios::binary);
        return file.is_open();
    }

    bool write(void* data, size_t len) {
        if (!file.is_open()) return false;
        file.write((char*)data, len);
        return true;
    }

    void close() {
        if (file.is_open()) file.close();
    }

private:
    std::string fileName;
    std::ofstream file;
};

BinaryFileWriter *writer;

void audiodata(int16_t* data, int samples) {
    // do something with the audio data
    hexdump((void*)data, 20);
    writer->write((char*)data, samples * 2);
    mp3->write(data, samples);
}



void audio_capture() {
    int rc;
    int size;
    int dir;
    snd_pcm_uframes_t frames;
    int16_t buffer[48000];

    rc = snd_pcm_open(&handle, "default", SND_PCM_STREAM_CAPTURE, 0);
    if (rc < 0) {
        // handle error
    }

    snd_pcm_hw_params_alloca(&params);
    snd_pcm_hw_params_any(handle, params);
    snd_pcm_hw_params_set_access(handle, params, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(handle, params, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(handle, params, 1);
    unsigned int samplerate = 48000;
    snd_pcm_hw_params_set_rate_near(handle, params, &samplerate, &dir);

    rc = snd_pcm_hw_params(handle, params);
    if (rc < 0) {
        // handle error
    }

    snd_pcm_hw_params_get_period_size(params, &frames, &dir);
    size = frames * 2; /* 2 bytes/sample, 1 channel */

    while (1) {
        rc = snd_pcm_readi(handle, buffer, frames);
        if (rc == -EPIPE) {
            // handle overrun
        } else if (rc < 0) {
            // handle error
        } else if (rc != (int)frames) {
            // handle short read
        }
        if (rc > 0) audiodata(buffer, rc); // only forward successfully captured frames
    }

    snd_pcm_drain(handle);
    snd_pcm_close(handle);
}


int main(int argc, char *argv[]) {
    mp3 = new MP3Encoder("/tmp/audio.mp3");
    writer = new BinaryFileWriter("/tmp/audio.i16.48.1.pcm");
    writer->open();
    mp3->open();
    audio_capture();
    writer->close();
    mp3->close();
    delete(writer);
    delete(mp3);
}

Saturday, February 5, 2022

Commodore 64 Emulator

Terminal UI



Some years ago, out of pure nostalgia and with 2 hours of free time a day during the train commute to and from the office, I decided to write a Commodore 64 Emulator. I knew good emulators were available, and it was not my goal to compete with them, but I wanted to get into the details of doing such a task myself.


So I ended up with an emulator that runs the original Kernal and Basic; the UI works in ASCII mode in a Linux Terminal. This allows one to key in and execute Basic code. Please note that many things are not implemented, and may never be. I consider it a fun hack, and I enjoyed writing it and still enjoy popping up a CBM64 screen from time to time in my Terminal.


The basics of writing an emulator are not that hard, although it is a fair amount of work. Since it could be helpful to others, a few components are briefly outlined in this document.


Sources


Core Loop

At the core of the C64, and computers in general, there is a CPU. Simplified, this "Central Processing Unit" does the following steps in a loop.
  • Read an instruction from a memory location
    • the location is held in a two-byte register called the Program Counter (PC)
  • increment the PC
  • Execute that instruction (in an emulator, each opcode is basically a unique value that maps to a function); this often involves the following steps
    • read data (arguments) from memory or from the registers
    • do a computation on that data
    • write the result to registers or memory
  • start all over again

The full list of instructions, called Opcodes, can be found in the file MOS6510.cpp.
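
To make that concrete, here is a minimal sketch of such a loop (illustrative only; the names are mine, not the ones used in the emulator):

#include <cstdint>

// Minimal sketch of the fetch / execute loop of a 6510-style CPU.
struct CoreLoopSketch {
    uint16_t pc = 0;                  // Program Counter (two-byte register)
    uint8_t  memory[65536] = {0};     // flat 64 KB address space, enough for a sketch

    void execute(uint8_t opcode) {
        // In the real emulator every opcode value maps to a function that reads
        // its arguments, does the computation and writes the result back.
    }

    void run() {
        while (true) {
            uint8_t opcode = memory[pc]; // read the instruction at the PC
            pc++;                        // increment the PC
            execute(opcode);             // execute it
        }                                // ...and start all over again
    }
};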


Memory

Below is an overview of the CBM64 memory map.


The memory range is from 0x0000 to 0xFFFF; not surprisingly, this fits into 2 bytes.
Note that this is not all just RAM. The Kernal ROM, for example, is (by default) mapped at 0xE000 - 0xFFFF and the Basic ROM at 0xA000 - 0xBFFF. The screen, which is pretty useful to communicate with the computer's user, is mapped at 0x0400 - 0x07FF; that means that if we instruct the CPU to write the output of an instruction to 0x0400, something will likely be shown on screen (for example, try: POKE 1024, 49). The screen output is handled by a chip called the VIC, and my implementation does just the basics, being ASCII mode.


Design 

These are two nice schemas of the Commodore 64 Internals:




http://www.zimmers.net/anonftp/pub/cbm/schematics/computers/c64/326106-1of2.gif

http://www.zimmers.net/anonftp/pub/cbm/schematics/computers/c64/326106-2of2.gif


You will notice the following components

  • 6510 CPU
  • 6567 VIC (NTSC Video Interface Chip; can be 6569 for PAL) 
  • 6581 SID (Sound Interface Chip)
  • 8 x 4164-2 RAM (each 65.536 x 1 bit, so 8 of them make up the 65.536 bytes of RAM)
  • Color RAM
  • Kernal ROM
  • Basic ROM
  • Character ROM
  • Bus
  • 6526 CIA's for IO Stuff

To get a CBM 64 running, you need to implement the 6510 CPU, at least the text mode of the VIC, RAM, and some IO for keyboard input and basic IRQ handling. Once that is done, you only need to load the ROMs, like the Kernal (yup, it was a Kernal at that time, the 'a' is not a typo), at the right place in memory, set the program counter to a start location, and start the CPU.

If all goes right, the emulator starts the Kernal and the Basic, and the well-known "38911 Basic Bytes Free" should appear.


Implementation overview

Memory - Bus

Memory reads and writes are done over a BUS.

Implementing Memory could be as easy as defining an array of bytes. However, because devices and ROM are mapped into the address space, it is useful to make a class that allows registering memory mapped devices and an interface to read and write memory.

I implemented it in a class called Bus.cpp. All memory reads and writes are done via the Bus, which knows about all devices that are mapped in the address space and makes sure reads and writes go to and from whatever is mapped at a given address.

The following devices are registered in the Bus; note that I mapped only one CIA where there are actually two. One was enough to get the ASCII mode running.
  mBus = CBus::GetInstance();
  mVic = new CMOS6569();
  mRam = new CRam();
  mBasicRom = new CBasicRom();
  mKernalRom = new CKernalRom();
  mProcessor = new CMOS6510(mMutex);
  mCia1 = new CMOS6526A(mMutex);
  mCharRom = new CCharRom();
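
To illustrate the idea (a simplified sketch, not the actual Bus.cpp interface), such a bus keeps a list of devices, each claiming an address range, and dispatches every read (Peek) and write (Poke) to whichever device owns the address:

#include <cstdint>
#include <vector>

// Sketch of a memory-mapped device and a bus that dispatches reads/writes to it.
struct DeviceSketch {
    virtual ~DeviceSketch() = default;
    virtual uint16_t Start() const = 0;                    // first mapped address
    virtual uint16_t End() const = 0;                      // last mapped address
    virtual uint8_t  Peek(uint16_t addr) = 0;              // read one byte
    virtual void     Poke(uint16_t addr, uint8_t val) = 0; // write one byte
};

class BusSketch {
public:
    void Register(DeviceSketch* dev) { mDevices.push_back(dev); }

    uint8_t Peek(uint16_t addr) {
        if (DeviceSketch* d = Find(addr)) return d->Peek(addr);
        return mRam[addr];                                 // fall back to plain RAM
    }

    void Poke(uint16_t addr, uint8_t val) {
        if (DeviceSketch* d = Find(addr)) d->Poke(addr, val);
        else mRam[addr] = val;
    }

private:
    DeviceSketch* Find(uint16_t addr) {
        for (DeviceSketch* d : mDevices)
            if (addr >= d->Start() && addr <= d->End()) return d;
        return nullptr;                                    // nothing mapped here
    }
    std::vector<DeviceSketch*> mDevices;
    uint8_t mRam[65536] = {0};
};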

RAM

Implemented in: Ram.cpp
RAM is just a buffer of bytes, registered to the Bus from 0x0000 - 0xFFFF. If nothing else is mapped on a given address, reads and writes (peek and poke) will happen on this buffer.

ROM

There are 3 ROMs in the implementation: the Kernal, the Basic, and the Character ROM.
The idea here is, since it is just data and the ROMs are mapped as memory, to only provide reads from them; in CBM64 terms: Peek. And that is what the implementations do.



CPU, the 6510

Here most of the magic happens; this is the code that reads instructions (opcodes) and executes the necessary steps for each instruction.

There are two matrices in this class: one with all opcodes organized per address mode.
The other holds the CPU cycles spent by the 6510 to execute each instruction. Some instructions take extra cycles for certain address modes; these are counted in the instruction's implementation. For example, the branch instructions take an extra cycle if the branch is taken, and another extra cycle if the address to branch to crosses a page boundary (this is partially done in this implementation).
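
Conceptually, an entry in such a table ties an opcode value to a handler and a base cycle count. A small sketch (with made-up names, not the actual tables from MOS6510.cpp):

#include <cstdint>
#include <map>

struct CpuSketch;                               // stand-in for the CPU state
using InstructionFn = void (*)(CpuSketch&);     // one handler per opcode

// One opcode table entry: mnemonic, handler and base cycle count.
struct OpcodeEntry {
    const char*   mnemonic;
    InstructionFn execute;
    int           cycles;   // base cycles; branches and page crossings add extra
};

void lda_immediate(CpuSketch&) { /* load the accumulator with the next byte */ }
void nop(CpuSketch&)           { /* do nothing */ }

// 0xA9 is LDA immediate and 0xEA is NOP on the 6510, both 2 base cycles.
std::map<uint8_t, OpcodeEntry> opcode_table = {
    {0xA9, {"LDA", lda_immediate, 2}},
    {0xEA, {"NOP", nop,           2}},
};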

The main function in this class is 'Cycle()'. This reads and executes the next instruction and runs the IRQ if needed.

Boot

At start, the PC (Program Counter) is set to the address stored at Kernal address 0xFFFC.

r_pc  = mMemory->Peek16(0xFFFC); 

The CPU starts reading and executing instructions from that address.
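
Peek16 simply reads two consecutive bytes and combines them little-endian (the 6510 stores the low byte first); roughly like this sketch, not the exact implementation:

#include <cstdint>

// Sketch: read a 16-bit little-endian value from memory,
// e.g. the reset vector stored at 0xFFFC/0xFFFD.
uint16_t Peek16Sketch(const uint8_t* memory, uint16_t addr) {
    uint16_t lo = memory[addr];                    // low byte first
    uint16_t hi = memory[(uint16_t)(addr + 1)];    // then the high byte
    return (uint16_t)((hi << 8) | lo);
}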


CPU Cycles

The PAL version of the CBM64 CPU runs at 985 KHz and the NTSC at 1023 KHz. 

The MOS6510.cpp implementation has a Cycle() method; it fetches and executes one instruction (and also runs the IRQs if needed). Just running Cycle() in an endless loop would run the processor, but because our modern CPUs are fast, and even taking into account the overhead of the not-so-optimal implementation of the emulated 6510 instructions, it will largely overshoot the intended 1023 kHz.

To solve this, the Cycle() method returns the number of 6510 CPU cycles used by the executed instruction. A way to implement a somewhat accurate CPU cycle rate is to count the spent CPU cycles up to 1.023.000 (or more), check the actual time spent, sleep the remainder of the second, and carry the excess cycles over to the next round. This is implemented in main.cpp; it actually does this a bit more granularly and checks intervals of 100 ms instead of 1 second. One can make it run at smaller intervals as needed.
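
A rough sketch of that idea (assuming a Cycle() method that returns the number of 6510 cycles consumed; the names and the 100 ms interval are just illustrative):

#include <chrono>
#include <thread>

// Sketch: keep the emulated CPU at roughly 1.023.000 cycles per second by running
// a budget of cycles per 100 ms interval and sleeping away the remaining real time.
void RunThrottled(/* CMOS6510& cpu */) {
    using namespace std::chrono;
    const long kCyclesPerInterval = 1023000 / 10;   // one 100 ms slice of 6510 cycles
    long spent = 0;

    while (true) {
        auto start = steady_clock::now();
        while (spent < kCyclesPerInterval) {
            // spent += cpu.Cycle();                // cycles used by one instruction
            spent += 2;                             // placeholder so the sketch runs
        }
        spent -= kCyclesPerInterval;                // carry the overshoot to the next round
        auto elapsed = steady_clock::now() - start;
        if (elapsed < milliseconds(100))
            std::this_thread::sleep_for(milliseconds(100) - elapsed);
    }
}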



Monday, April 27, 2020

YOLOv3 Inference Server for Intel Movidius


In a previous post, I went over the steps to get the "Intel Neural Compute Stick 2" working on a Raspberry Pi. That post also covers the conversion of the Darknet YOLOv3 model, trained on the COCO dataset, to the OpenVINO IR format.





As a demonstration of that, I made a small Python Flask web service that is suitable to run on the Raspberry Pi. Using a basic web UI, you can push images to it to do object detection and classification.

While processing some validation images from the COCO dataset, the observed inference time is about 400 ms; add another 150 ms to post-process the results. That makes about 550 ms for the full object detection, which sounds pretty acceptable to me, given that it runs on a Raspberry Pi 4 and that I wrote the post-processing code to be readable rather than optimally fast.

The full source code is available for download on GitHub:
https://github.com/brunokeymolen/movidius-inference-server







Sunday, April 26, 2020

Run YOLOv3 on Raspberry Pi with the Intel Neural Compute Stick 2

While working on a personal project, I decided to run YOLOv3 on a Raspberry Pi. I mean the full YOLOv3, not the tiny version. Given the availability of decent tutorials on the internet, it did not take too long to get things working. As expected, the inference results were great, but considering the 12 seconds it took, it became clear that the Pi (3B+) was never designed for such tasks.


As a solution, I added VPU (Vision Processing Unit) power in the form of an Intel Neural Compute Stick 2, for some also known as Intel Movidius, or just NCS2. This is a less than $100 USB compute stick made solely for neural network inference.


In order to use the Intel Movidius, you need to install the necessary software on the Raspberry Pi. And before running a Neural Network model it needs to be converted to an 'Intermediate Representation' (IR) format. This post handles the necessary steps to do the software installation, to convert the Darknet YOLOv3 model to IR, and to run a demonstration on the Raspberry Pi.

The document is split into the following sections:
  • Install Raspberry Pi Buster
  • Install OpenVINO
  • Prepare a Docker image for conversion to IR
  • Convert to IR
  • Run YOLOv3 Demo on the Raspberry Pi with camera input

Conventions used in this document:


pi$ : shell commands to be given at the Raspberry Pi
linux$ : shell commands to be given at a Linux machine (Ubuntu in my case)
docker$ : shell commands to be given at a docker container

Install Raspberry Pi Buster

Insert a new Raspberry Pi SD card in your Linux machine and check the device name. The following commands are useful for this:

linux$ sudo fdisk -l
linux$ lsblk


(the SD card is sde in the example, but yours may be different; make sure to change that so you do not accidentally overwrite your data)

Download Raspbian Buster Lite at:
(buster-lite is fine)

Unzip the downloaded file and put the Buster image to the Raspberry Pi SD card:

linux$ sudo dd if=2020-02-13-raspbian-buster-lite.img of=/dev/sd[e] status=progress bs=4M

SSH.
Create a file ‘ssh’ on the boot partition to enable SSH

lsblk
sde      8:64   1  29.7G  0 disk 
├─sde1   8:65   1   256M  0 part 
└─sde2   8:66   1   1.5G  0 part 

linux$ sudo mkdir /mnt/sd
linux$ sudo mount /dev/sde1 /mnt/sd
linux$ sudo touch /mnt/sd/ssh
linux$ sudo umount /mnt/sd

Put the SD card in the Pi and start it up.
Figure out the IP address and login over SSH

ssh pi@<ip address of Pi> 

username: pi
password: raspberry

One of the first things to do is to make the full capacity of the SD card available; do that as follows:

pi$ sudo raspi-config 

Choose : Advanced Options -> Expand Filesystem ….

Install the following software:
pi$ sudo apt-get update
pi$ sudo apt-get install git
pi$ sudo apt-get install cmake
pi$ sudo apt-get install libatlas-base-dev
pi$ sudo apt-get install python3-pip
pi$ sudo apt install libgtk-3-dev

pi$ pip3 install --upgrade pip
pi$ pip3 install numpy

Install OpenVINO

OpenVINO is an Intel toolkit that contains a copy of OpenCV and has the necessary drivers and tools to manage models and run them on the Intel Movidius. There is a runtime version for Raspberry Pi, follow the installation steps below. They are based on the following document: https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_raspbian.html



At the Raspberry Pi, do:
pi$ mkdir -p ~/projects/openvino
pi$ cd ~/projects/openvino
pi$ wget https://download.01.org/opencv/2020/openvinotoolkit/2020.1/l_openvino_toolkit_runtime_raspbian_p_2020.1.023.tgz

Untar the download:
pi$ tar -xzvf l_openvino_toolkit_runtime_raspbian_p_2020.1.023.tgz


Make the OpenVINO environment initialize at startup of the Pi:

pi$ echo "source /home/pi/projects/openvino/l_openvino_toolkit_runtime_raspbian_p_2020.1.023/bin/setupvars.sh" >> ~/.bashrc
pi$ source ~/.bashrc 

[setupvars.sh] OpenVINO environment initialized


Finally add the USB rules:

pi$ sudo usermod -a -G users "$(whoami)"
pi$ sh ~/projects/openvino/l_openvino_toolkit_runtime_raspbian_p_2020.1.023/install_dependencies/install_NCS_udev_rules.sh

At this point, the NCS2 should work, and I advise testing it before continuing:

Plug the Intel Movidius USB stick into the Pi, create a file "openvino_fd_myriad.py" and add the code below (as explained in “Run Inference of Face Detection Model Using OpenCV* API” in the OpenVINO installation guide linked above):


import cv2 as cv

# Load the model.
net = cv.dnn_DetectionModel('face-detection-adas-0001.xml',
                            'face-detection-adas-0001.bin')

# Specify target device.
net.setPreferableTarget(cv.dnn.DNN_TARGET_MYRIAD)
# Read an image.
frame = cv.imread('face.jpg')
if frame is None:
    raise Exception('Image not found!')
# Perform an inference.
_, confidences, boxes = net.detect(frame, confThreshold=0.5)
# Draw detected faces on the frame.
for confidence, box in zip(list(confidences), boxes):
    cv.rectangle(frame, box, color=(0, 255, 0), thickness=3)
# Save the frame to an image file.
cv.imwrite('out.jpg', frame)

Make sure to download the following pre-trained Face Detection model instead of the one mentioned in the above document:

pi$ wget https://download.01.org/opencv/2019/open_model_zoo/R3/20190905_163000_models_bin/face-detection-adas-0001/FP32/face-detection-adas-0001.xml
pi$ wget https://download.01.org/opencv/2019/open_model_zoo/R3/20190905_163000_models_bin/face-detection-adas-0001/FP32/face-detection-adas-0001.bin


  • Upload a photo to the Pi and name it face.jpg
  • Run the face detection code: pi$ python3 openvino_fd_myriad.py

If all works, the outcome (out.jpg) should be something like this:






Prepare a Docker image for conversion to IR

Converting a model to the OpenVINO Intermediate Representation (IR) needs to be done with a full OpenVINO toolkit installation, and thus not on the Raspberry Pi, which has only a runtime version. In this post I use an Ubuntu Linux machine; it should work on macOS and Windows, but the steps may differ.

To make things more manageable, and to not mess up your current Linux installation, I created a Dockerfile to do the conversion. It contains all the necessary components to convert. 


linux$ mkdir ~/projects
linux$ cd ~/projects
linux$ git clone https://github.com/brunokeymolen/devops.git
linux$ cd ~/projects/devops/docker-images/openvino-movidius


Download a full OpenVINO toolkit from the following link and put it next to the Dockerfile:


linux$:~/projects/devops/docker-images/openvino-movidius(master)$ ls -lGg
total 496316
-rw-r--r-- 1      2766 Apr 12 15:02 Dockerfile
-rw-r--r-- 1 508213676 Apr  5 11:04 l_openvino_toolkit_p_2020.1.023.tgz
-rw-r--r-- 1       431 Apr  6 20:07 README


Open the Dockerfile and, if needed, change:
ARG OPENVINO_TOOLKIT_NAME=l_openvino_toolkit_p_2020.1.023.tgz


Build the Docker image:
linux$ docker build -t openvino-movidius .

If all is fine the image should be something like this:
linux$ docker image ls | grep openvino-movidius
openvino-movidius            latest              89b2c076f373        2 weeks ago         3.86GB

Get the YOLOv3 scripts to convert the weights.

First, on the Linux host, download YOLOv3 (choose whether you want yolov3 or yolov3-tiny)

Because it is needed only temporarily, I installed it in the /tmp directory; adapt the location if you want.


linux$ mkdir /tmp/openvino
linux$ cd /tmp/openvino
linux$ git clone https://github.com/mystic123/tensorflow-yolo-v3.git
linux$ cd tensorflow-yolo-v3
linux$ git checkout ed60b90



Get the coco class names and download the model weights to the same directory:
linux$ wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names
linux$ wget https://pjreddie.com/media/files/yolov3.weights
linux$ wget https://pjreddie.com/media/files/yolov3-tiny.weights

The result should be similar to this:
linux$ :/tmp/openvino/tensorflow-yolo-v3((HEAD detached at ed60b90))$ ls -lGg
total 276868
-rw-r--r-- 1       625 Apr 26 09:32 coco.names
-rw-r--r-- 1      3219 Apr 26 09:30 CODE_OF_CONDUCT.md
-rw-r--r-- 1      1552 Apr 26 09:30 convert_weights_pb.py
-rw-r--r-- 1      1474 Apr 26 09:30 convert_weights.py
-rw-r--r-- 1      3225 Apr 26 09:30 demo.py
-rw-r--r-- 1     11357 Apr 26 09:30 LICENSE
-rw-r--r-- 1      2335 Apr 26 09:30 README.md
-rw-r--r-- 1     10595 Apr 26 09:30 utils.py
-rw-r--r-- 1      9306 Apr 26 09:30 yolo_v3.py
-rw-r--r-- 1      4030 Apr 26 09:30 yolo_v3_tiny.py
-rw-r--r-- 1  35434956 Apr 26 09:33 yolov3-tiny.weights
-rw-r--r-- 1 248007048 Apr 26 09:32 yolov3.weights




Run (and share the /tmp/openvino directory)
linux$ docker run -ti --device-cgroup-rule='c 189:* rmw' -v /dev/bus/usb:/dev/bus/usb -v /tmp/openvino:/mnt/host openvino-movidius




Convert to IR

There are two steps in the conversion to IR.
  • convert the yolo weight file to .pb file
  • convert the .pb file to IR

Convert the Yolo weight file to .pb file
This step uses the scripts from the tensorflow-yolo-v3.git repository, which is in the /tmp directory on the Linux host but needs to be executed from within the Docker container; we use the shared directory for that.

docker$ cd /mnt/host/tensorflow-yolo-v3/
docker$ python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3.weights

In case you need Yolov3-tiny:
docker$ python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3-tiny.weights --tiny

You will notice a bunch of warnings but at the end, the following message appears:
1186 ops written to frozen_darknet_yolov3_model.pb.

Check if the .pb file is created:
docker$:/mnt/host/tensorflow-yolo-v3# ls -lGg | grep frozen_darknet_yolov3_model.pb
-rw-r--r-- 1 248192514 Apr 26 07:45 frozen_darknet_yolov3_model.pb

Convert the .pb file to IR

docker$ cd /opt/intel/openvino_2020.1.023/deployment_tools/model_optimizer 
docker$ export MO_ROOT=`pwd`

YOLOv3:
docker$ python3 mo_tf.py --input_model /mnt/host/tensorflow-yolo-v3/frozen_darknet_yolov3_model.pb --tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3.json --batch 1 --generate_deprecated_IR_V7

YOLOv3-tiny:
docker$ python3 mo_tf.py --input_model /mnt/host/tensorflow-yolo-v3/frozen_darknet_yolov3_tiny_model.pb --tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3_tiny.json --batch 1 --generate_deprecated_IR_V7

The result is the YOLOv3 model in IR format.
docker$  ls -lGg
...
-rw-r--r-- 1 247691380 Apr 26 07:55 frozen_darknet_yolov3_model.bin
-rw-r--r-- 1     29847 Apr 26 07:55 frozen_darknet_yolov3_model.mapping
-rw-r--r-- 1    108182 Apr 26 07:55 frozen_darknet_yolov3_model.xml
...


Copy them to the host and scp them to the Raspberry Pi.

docker$ cp frozen_darknet_yolov3_model.bin /mnt/host/tensorflow-yolo-v3/.
docker$ cp frozen_darknet_yolov3_model.xml /mnt/host/tensorflow-yolo-v3/.

pi$ mkdir ~/models

linux$ cd /tmp/openvino/tensorflow-yolo-v3
linux$ scp frozen_darknet_yolov3_model.xml pi@<ip address>:~/models/.
linux$ scp frozen_darknet_yolov3_model.bin pi@<ip address>:~/models/.
linux$ scp coco.names  pi@<ip address>:~/models/.

Run YOLOv3 Demo on the Raspberry Pi with camera input

In this step, we use the previously converted Darknet YOLOv3 model.



OpenCV has the following object detection demo:

At the Raspberry Pi:
pi$ mkdir ~/yolov3demo
pi$ cd ~/yolov3demo
pi$ wget https://raw.githubusercontent.com/opencv/open_model_zoo/master/demos/python_demos/object_detection_demo_yolov3_async/object_detection_demo_yolov3_async.py

The code has video output, so if you connect over ssh, log in as follows:
linux$ ssh pi@<ip address> -Y

You might need to install X on the Pi for the ssh -Y option; I did that by installing xterm (which pulls in all the necessary libraries):
pi$ sudo apt install xterm


If a camera is attached to the Raspberry Pi:
pi$ cd ~/yolov3demo
pi$ python3 object_detection_demo_yolov3_async.py -m ~/models/frozen_darknet_yolov3_model.xml --labels ~/models/coco.names -d MYRIAD -i cam -pc




Inference Server
If you don't have a camera, no worries, please check my next post. I'll explain how to turn your Raspberry Pi, with Intel Movidius, into an Inference server accessible over a web interface.




Friday, May 3, 2013

Hough Transformation - C++ Implementation


fence - example 1
  
accumulator
flower stem - example 2

The Hough Transformation is a great way to detect lines in an image and it is quite useful for a number of Computer Vision tasks. 

A good definition can be found at http://en.wikipedia.org/wiki/Hough_transform

This post will guide you through (a little) theory and focusses on the C++ implementation of the linear Hough Transformation.


Theory

Let's take an image (Fig 1) with two lines, A and B. Obviously, each line is made up of its own set of pixels lying on a straight line. Now, one way or another, we need to teach our software which pixels are on a straight line and, if so, to which line they belong.


Fig 1
Fig 2

The trick of the Hough Transformation is to represent lines in a polar coordinate system (see Fig 2), or in this case, also called "Hough Space". 

You can read the wiki for more details, but what is important in Fig 2 is that we imagine a line from the center (the blue line) that is perpendicular to the line we want to find (the green line); we call the distance from the center r, and θ (theta) is the angle to the X axis (the angle between the blue line and the X axis).

The transformation from any (x, y) pixel in Fig 1 to the (r, θ) representation in Fig 2 is the following:

r = x.cos(θ) + y.sin(θ)

Now, lean back and think a bit about Fig 2. Every possible line can be represented by a unique r and θ. Furthermore, every pixel on a given line will transform to the exact same r and θ for that line. This is immediately the most important aspect of the whole Hough idea; if you get that, you understand the transformation.

So, given the above, every line in polar space is defined by its unique coordinates (r, θ). Let's play with this and say we decide to use θ ∈ { 1 ... 180 } with a resolution of 1 degree. 1 to 180 is enough to make any line if we allow r to be negative. Check Fig 2: if we use the same θ but make r negative, the line A' would be at the opposite side (left) and parallel to A.

That means that for every point (x, y) we can compute 180 r values. For two points (x1, y1) and (x2, y2):
r1_1 = x1.cos(1) + y1.sin(1)
r1_2 = x1.cos(2) + y1.sin(2)
r1_3 = x1.cos(3) + y1.sin(3)
...
r1_180 = x1.cos(180) + y1.sin(180)
r2_1 = x2.cos(1) + y2.sin(1)
r2_2 = x2.cos(2) + y2.sin(2)
r2_3 = x2.cos(3) + y2.sin(3)
...
r2_180 = x2.cos(180) + y2.sin(180)


So far, all fine, but to make it really useful we need to add one extra element, called an Accumulator. An Accumulator is nothing more than a 2-dimensional matrix with θmax columns and rmax rows. Each cell, or bin, represents a unique coordinate (r, θ) and thus one unique line in Hough Space.

The algorithm is as follows:

  • Start with an accumulator with all 0's
  • For every point in your image
    • compute r for θ = 1 to 180
    • increment the Accumulator by 1 for every found r, i.e. increment bin(r, θ)

Fig 3 - Illustration 















Once done, the accumulator bins contain a count of pixels for every possible line. Go over the bins and only keep the ones that are over a certain threshold (for example, you can decide to only consider it a real line if it contains at least 80 pixels).

The result is a set of coordinates (r, θ), one for every straight line in the image that is above the threshold. These polar coordinates can easily be computed back to two points in the original image space; just solve the equation for x and y, as shown below.
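
For completeness, solving r = x.cos(θ) + y.sin(θ) for y (when sin(θ) is not 0) or for x gives:

y = (r - x.cos(θ)) / sin(θ)
x = (r - y.sin(θ)) / cos(θ)

Pick two x values (for example x = 0 and x = image width), compute the matching y values, and you have two points of the line in image space. This is exactly what happens in the GetLines() code further down.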



         




C++ Implementation

The following code is a command line implementation that takes an image path as input (png, jpg, ...) and displays a view with the detected lines drawn over the image. It also shows a view of the accumulator, which can look pretty neat.

The procedure is as follows:
  1. Load the color image
  2. Do Canny edge detection on it so we have a BW image with lines and points representing the edges
  3. Do a Hough transformation on the edge image
  4. Search the accumulator for all lines given a certain threshold
  5. Visualize all on the screen
Since we focus on the implementation of steps 3 & 4 (Hough), we are going to use some open source for steps 1, 2 and 5. I decided to use OpenCV (http://opencv.org): it's open source, popular, awesome and pretty easy to use.

OpenCV itself has a Hough Transformation, but we are not going to use that; our goal is to get insight into the inner workings of the transformation and implement it ourselves. Also, I think OpenCV implemented a Probabilistic or Randomized Hough Transform, which is a faster but different approach: http://en.wikipedia.org/wiki/Randomized_Hough_transform.



1, 2 - Load the image and do the edge detection


Let's work with the following image:

Original Image

1:       cv::Mat img_edge;  
2:       cv::Mat img_blur;  
3:    
4:       cv::Mat img_ori = cv::imread( file_path, 1 );  
5:       cv::blur( img_ori, img_blur, cv::Size(5,5) );  
6:       cv::Canny(img_blur, img_edge, 100, 150, 3);  
7:    
8:       int w = img_edge.cols;  
9:       int h = img_edge.rows;  

This is all OpenCV stuff: line 4 loads the image from path 'file_path', line 5 blurs it a bit, and line 6 does the actual edge detection. The result is a black-and-white image 'img_edge' with the edges.

Canny Edge Detection















3 - The Hough Transformation

The following code does the actual transformation; img_data contains the edge data. The function loops over all pixels in the edge image and increments the accumulator at the computed (r, θ).

1:       int Hough::Transform(unsigned char* img_data, int w, int h)  
2:       {  
3:            _img_w = w;  
4:            _img_h = h;  
5:    
6:            //Create the accu  
7:            double hough_h = ((sqrt(2.0) * (double)(h>w?h:w)) / 2.0);  
8:            _accu_h = hough_h * 2.0; // -r -> +r  
9:            _accu_w = 180;  
10:    
11:            _accu = (unsigned int*)calloc(_accu_h * _accu_w, sizeof(unsigned int));  
12:    
13:            double center_x = w/2;  
14:            double center_y = h/2;  
15:    
16:    
17:            for(int y=0;y<h;y++)  
18:            {  
19:                 for(int x=0;x<w;x++)  
20:                 {  
21:                      if( img_data[ (y*w) + x] > 250 )  
22:                      {  
23:                           for(int t=0;t<180;t++)  
24:                           {  
25:                                double r = ( ((double)x - center_x) * cos((double)t * DEG2RAD)) + (((double)y - center_y) * sin((double)t * DEG2RAD));  
26:                                _accu[ (int)((round(r + hough_h) * 180.0)) + t]++;  
27:                           }  
28:                      }  
29:                 }  
30:            }  
31:    
32:            return 0;  
33:       }  
34:    

Line 11 creates the empty accumulator. We have chosen to compute r for θ = {1..180}, so the width contains 180 bins; the maximum height depends on the image size and is computed at line 7.

Lines 17 to 30 loop over every point in the edge image and compute r for every θ at line 25; you should recognize r = x.cos(θ) + y.sin(θ) in that. A little extra math is added to work from the center of the image. The found accumulator bin is incremented at line 26.


4. Search the accumulator for all lines given a certain threshold

A few steps are done here: we loop over the accumulator and consider every bin whose value is on or above the threshold. Then we check whether that bin is a local maximum (in a 9x9 neighbourhood). If we decide the line at coordinates (r, θ) is a valid one, we compute it back to two points in the image space.


1:       std::vector< std::pair< std::pair<int, int>, std::pair<int, int> > > Hough::GetLines(int threshold)  
2:       {  
3:            std::vector< std::pair< std::pair<int, int>, std::pair<int, int> > > lines;  
4:    
5:            if(_accu == 0)  
6:                 return lines;  
7:    
8:            for(int r=0;r<_accu_h;r++)  
9:            {  
10:                 for(int t=0;t<_accu_w;t++)  
11:                 {  
12:                      if((int)_accu[(r*_accu_w) + t] >= threshold)  
13:                      {  
14:                           //Is this point a local maxima (9x9)  
15:                           int max = _accu[(r*_accu_w) + t];  
16:                           for(int ly=-4;ly<=4;ly++)  
17:                           {  
18:                                for(int lx=-4;lx<=4;lx++)  
19:                                {  
20:                                     if( (ly+r>=0 && ly+r<_accu_h) && (lx+t>=0 && lx+t<_accu_w) )  
21:                                     {  
22:                                          if( (int)_accu[( (r+ly)*_accu_w) + (t+lx)] > max )  
23:                                          {  
24:                                               max = _accu[( (r+ly)*_accu_w) + (t+lx)];  
25:                                               ly = lx = 5;  
26:                                          }  
27:                                     }  
28:                                }  
29:                           }  
30:                           if(max > (int)_accu[(r*_accu_w) + t])  
31:                                continue;  
32:    
33:    
34:                           int x1, y1, x2, y2;  
35:                           x1 = y1 = x2 = y2 = 0;  
36:    
37:                           if(t >= 45 && t <= 135)  
38:                           {  
39:                                //y = (r - x cos(t)) / sin(t)  
40:                                x1 = 0;  
41:                                y1 = ((double)(r-(_accu_h/2)) - ((x1 - (_img_w/2) ) * cos(t * DEG2RAD))) / sin(t * DEG2RAD) + (_img_h / 2);  
42:                                x2 = _img_w - 0;  
43:                                y2 = ((double)(r-(_accu_h/2)) - ((x2 - (_img_w/2) ) * cos(t * DEG2RAD))) / sin(t * DEG2RAD) + (_img_h / 2);  
44:                           }  
45:                           else  
46:                           {  
47:                                //x = (r - y sin(t)) / cos(t);  
48:                                y1 = 0;  
49:                                x1 = ((double)(r-(_accu_h/2)) - ((y1 - (_img_h/2) ) * sin(t * DEG2RAD))) / cos(t * DEG2RAD) + (_img_w / 2);  
50:                                y2 = _img_h - 0;  
51:                                x2 = ((double)(r-(_accu_h/2)) - ((y2 - (_img_h/2) ) * sin(t * DEG2RAD))) / cos(t * DEG2RAD) + (_img_w / 2);  
52:                           }  
53:    
54:                           lines.push_back(std::pair< std::pair<int, int>, std::pair<int, int> >(std::pair<int, int>(x1,y1), std::pair<int, int>(x2,y2)));  
55:    
56:                      }  
57:                 }  
58:            }  
59:    
60:            std::cout << "lines: " << lines.size() << " " << threshold << std::endl;  
61:            return lines;  
62:       }  


We loop over the accumulator at lines 8 - 10 and check if the value of a bin is on or above our threshold at line 12; if so, we check if this point is a local maximum at lines 14 - 32. If we have a valid line, the (r, θ) coordinates are computed back to two points in image space at lines 34 - 52. The results are stored in a vector at line 54.




5. Visualize all on the screen

1:            //Search the accumulator  
2:            std::vector< std::pair< std::pair<int, int>, std::pair<int, int> > > lines = hough.GetLines(threshold);  
3:    
4:            //Draw the results  
5:            std::vector< std::pair< std::pair<int, int>, std::pair<int, int> > >::iterator it;  
6:            for(it=lines.begin();it!=lines.end();it++)  
7:            {  
8:                 cv::line(img_res, cv::Point(it->first.first, it->first.second), cv::Point(it->second.first, it->second.second), cv::Scalar( 0, 0, 255), 2, 8);  
9:            }  
10:    
11:            //Visualize all  
12:            int aw, ah, maxa;  
13:            aw = ah = maxa = 0;  
14:            const unsigned int* accu = hough.GetAccu(&aw, &ah);  
15:    
16:            for(int p=0;p<(ah*aw);p++)  
17:            {  
18:                 if(accu[p] > maxa)  
19:                      maxa = accu[p];  
20:            }  
21:            double contrast = 1.0;  
22:            double coef = 255.0 / (double)maxa * contrast;  
23:    
24:            cv::Mat img_accu(ah, aw, CV_8UC3);  
25:            for(int p=0;p<(ah*aw);p++)  
26:            {  
27:                 unsigned char c = (double)accu[p] * coef < 255.0 ? (double)accu[p] * coef : 255.0;  
28:                 img_accu.data[(p*3)+0] = 255;  
29:                 img_accu.data[(p*3)+1] = 255-c;  
30:                 img_accu.data[(p*3)+2] = 255-c;  
31:            }  
32:    
33:    
34:            cv::imshow(CW_IMG_ORIGINAL, img_res);  
35:            cv::imshow(CW_IMG_EDGE, img_edge);  
36:            cv::imshow(CW_ACCUMULATOR, img_accu);  
37:    


After we search the accumulator at line 2, an OpenCV function is used to draw lines over the original image at lines 5 - 9. Lines 12 - 31 build a visual representation of the accumulator. Finally, lines 34 - 36 display everything on screen.

The result with a threshold of 175 is the following:
accumulator
result



Sources

Full source code can be found at: https://github.com/brunokeymolen/hough

Make sure to install OpenCV (I used 2.4.4) before compiling: http://opencv.org/