Computer Vision and Point Clouds at the Edge
Hardware review and benchmarks.
by Shan Swanlow
Omnipresent is currently pursuing the application of Edge Computing to problems involving computer vision and 3D point clouds. Both workloads are challenging to run on limited hardware, so the long-term success of this edge application depends on assessing every factor that influences an edge device’s ability to perform them. This report discusses what such a device requires and what might prevent it from performing these tasks.
Running point cloud processing and computer vision at the edge requires specific features. The following features are essential for the device to support all necessary processes and to complete tasks safely end-to-end:
CUDA support: Omnipresent’s current computer vision framework of choice, detectron2, requires CUDA support to be installed and run successfully. Other computer vision frameworks that do not require CUDA could be installed and used instead; however, detectron2 supports several features that these frameworks currently do not.
BIOS watchdog: Although computers are generally robust and simply follow instructions, unexpected software events and fatal crashes can still happen. Because the device is deployed at the edge, a fatal crash would leave it offline until someone physically restarts it. A BIOS watchdog addresses this issue by automatically rebooting the device when a fatal crash is detected.
WLAN access: Internet access is required for all data transmission. Importantly, the WLAN hardware must be able to host an access point and connect to a WiFi network simultaneously. Hosting an access point allows IoT sensors to send their findings to the edge device, while the WiFi connection allows the edge device to communicate with cloud services.
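The three requirements above can be checked on a candidate board before it is deployed. The Python sketch below is one possible approach, not an existing Omnipresent tool. It assumes that PyTorch (on which detectron2 is built) provides the CUDA backend, that the watchdog is exposed through the standard Linux `/dev/watchdog` device, and that WLAN capabilities are read from the "valid interface combinations" section printed by `iw list`. All function names are hypothetical.

```python
"""Hypothetical pre-deployment check for the three required features.

Assumptions: PyTorch provides the CUDA backend (detectron2 is built on it),
the watchdog is exposed as /dev/watchdog, and WLAN capabilities come from
the output of `iw list`.
"""
import importlib.util
import os
import re

GROUP_RE = re.compile(r"#\{\s*([^}]*)\}\s*<=\s*(\d+)")  # e.g. "#{ managed } <= 1"
TOTAL_RE = re.compile(r"total\s*<=\s*(\d+)")            # e.g. "total <= 2"


def cuda_available() -> bool:
    """True if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also runs without PyTorch
    return bool(torch.cuda.is_available())


def watchdog_present(path: str = "/dev/watchdog") -> bool:
    """True if a Linux watchdog device node exists at `path`."""
    return os.path.exists(path)


def supports_ap_and_station(iw_list_output: str) -> bool:
    """True if some interface combination allows one station plus one AP."""
    parts = iw_list_output.split("valid interface combinations:", 1)
    if len(parts) < 2:
        return False  # driver advertises no concurrent interface combinations
    combos = []
    for line in parts[1].splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("*"):
            combos.append(stripped[1:])  # start of a new combination
        elif combos and "<=" in stripped:
            combos[-1] += " " + stripped  # continuation of the previous combination
        else:
            break  # reached the next capability block
    for combo in combos:
        total = TOTAL_RE.search(combo)
        if total is None or int(total.group(1)) < 2:
            continue  # only one interface can be active at a time
        groups = [({t.strip() for t in types.split(",")}, int(limit))
                  for types, limit in GROUP_RE.findall(combo)]
        for i, (types_i, limit_i) in enumerate(groups):
            if "managed" not in types_i:
                continue
            for j, (types_j, limit_j) in enumerate(groups):
                if "AP" not in types_j:
                    continue
                # Same group: it must admit two interfaces; different groups: one each.
                if (i == j and limit_i >= 2) or (i != j and limit_i >= 1 and limit_j >= 1):
                    return True
    return False


if __name__ == "__main__":
    import shutil
    import subprocess

    iw_output = ""
    if shutil.which("iw"):
        iw_output = subprocess.run(["iw", "list"], capture_output=True, text=True).stdout
    print("CUDA available:         ", cuda_available())
    print("Watchdog device present:", watchdog_present())
    print("AP + station supported: ", supports_ap_and_station(iw_output))
```

Two caveats apply. The presence of `/dev/watchdog` does not by itself prove a hardware watchdog, since the kernel's `softdog` module creates the same node. A combination reporting `#channels <= 1` still allows both interfaces, but the access point must then share the station's channel.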
Potential Risks and their Mitigation
Although the requirements outlined above provide a fair degree of certainty that the device will be able to perform the required tasks, some risks and unknown factors remain. A brief overview of these, with possible solutions, is given below:
Memory usage: Point cloud and computer vision tasks can consume large amounts of memory (elaborated on further in the Product Review section), because large amounts of data are loaded into memory and many mathematical operations are performed on that data. The edge device may run out of memory during these operations; swap space can mitigate this by ensuring that enough memory is available for the process to complete. Swap space can be thought of as using disk storage as “virtual RAM”, at the cost of speed, since disk access is much slower than RAM. This requires the edge device to have a few gigabytes of unused storage.
Data storage: These devices are likely to be left on site for extended periods. As data accumulates, storage usage will grow until capacity is reached. Establishing rules for handling data, including deciding what is important to keep, would help. Alternatively, compressing stored data can limit how much space is consumed.
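Both risks above come down to simple arithmetic and housekeeping. The sketch below is an illustration under stated assumptions rather than a deployment script. It estimates the raw in-memory size of a point cloud (assuming x, y and z stored as 32-bit floats, 12 bytes per point), computes how much swap would cover a shortfall, and applies one possible retention rule: compress files with gzip and delete the oldest files once a storage quota is exceeded. The function names and the quota rule are hypothetical.

```python
"""Hypothetical sizing and storage-housekeeping helpers for the edge device."""
import gzip
import shutil
from pathlib import Path
from typing import List


def cloud_bytes(n_points: int, bytes_per_point: int = 3 * 4) -> int:
    """Raw in-memory size of a point cloud.

    The default assumes x, y, z as 32-bit floats (12 bytes per point);
    colour, intensity or normals would add to this.
    """
    return n_points * bytes_per_point


def swap_shortfall(required_bytes: int, mem_available_bytes: int) -> int:
    """Swap needed so that `required_bytes` fits, or 0 if RAM is sufficient."""
    return max(0, required_bytes - mem_available_bytes)


def compress_file(path: Path) -> Path:
    """Gzip `path`, remove the original and return the compressed file's path."""
    gz_path = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return gz_path


def enforce_quota(data_dir: Path, max_bytes: int) -> List[Path]:
    """Delete the oldest files (by modification time) until the directory fits."""
    files = sorted((p for p in data_dir.rglob("*") if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    removed = []
    for p in files:
        if total <= max_bytes:
            break
        total -= p.stat().st_size
        p.unlink()
        removed.append(p)
    return removed
```

For example, a 10-million-point cloud holding only x, y and z occupies 120 MB before any processing, and intermediate arrays created during processing can multiply this. Gzip gives little benefit on data that is already compressed (such as JPEG images), so it is most effective on text-based data such as logs.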
Certain features are not required to perform computer vision and point cloud processing, but they would improve the efficiency and reliability of the device. Features that fall under this category are listed below:
CPU architecture: RISC (Reduced Instruction Set Computer) architectures are better suited to edge scenarios, as they cost less and consume significantly less power. Their simpler instruction sets allow lower transistor counts and fewer clock cycles per instruction. ARM Holdings, which designs RISC CPU architectures and licenses them to chip manufacturers, is currently the leading company in this space and holds a significant market share overall.
Security: Although the edge device will be deployed without any peripherals, anyone who attaches a screen and keyboard could still access it. “Locking down” the operating system immediately after deployment would address this concern. Access management should still be implemented for technicians; passwords alone are not recommended for this, and a two-factor approach would be more secure.
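One common second factor is a time-based one-time password (TOTP, defined in RFC 6238), as generated by authenticator apps. The sketch below shows the algorithm using only the Python standard library; it illustrates the mechanism and is not a recommendation to hand-roll authentication. In practice an established implementation (for example, the Google Authenticator PAM module) would normally be used, the shared secret would need to be stored securely on the device, and the device clock would need to be reasonably accurate. The function names are hypothetical.

```python
"""Minimal TOTP (RFC 6238) sketch for a technician second factor."""
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP: HOTP with the counter set to the current time step."""
    return hotp(key, int(unix_time // step), digits)


def verify(key: bytes, submitted: str, unix_time: Optional[float] = None,
           step: int = 30, drift_steps: int = 1, digits: int = 6) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if unix_time is None else unix_time
    return any(
        hmac.compare_digest(totp(key, now + k * step, step, digits), submitted)
        for k in range(-drift_steps, drift_steps + 1)
    )
```

With the RFC 6238 SHA-1 test key `12345678901234567890`, `totp(key, 59, digits=8)` returns `94287082`, matching the published test vector.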
Product Review and Comparison
To date, Omnipresent has made extensive use of the NVIDIA Jetson Nano, which has proven to be a promising device for Edge Computing and IoT. With native CUDA support and a BIOS watchdog, it meets 2 of the 3 necessary requirements. However, testing and usage have revealed several shortcomings. A non-exhaustive list of these issues is given below:
No built-in WLAN: The lack of built-in WLAN was addressed by adding a WiFi USB adapter, but this introduced other issues. The most significant is that not all WiFi modules can host an access point and connect to a WiFi network simultaneously, which has led to additional research and procurement effort.
Low memory and weaker CPU: When object detection was tested on the device, the process ran end-to-end, but it exhausted all of the Jetson Nano’s available resources. Examining resource usage with the top command revealed that, at its peak, detectron2 consumed 100% of the CPU and nearly all of the RAM, including swap space, leaving only ~90 MB free.
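The `top` observation above can also be captured non-interactively, which makes it easier to log memory pressure across a full detectron2 run. The sketch below, with hypothetical function names, parses Linux's `/proc/meminfo` and tracks the lowest combined headroom (`MemAvailable` plus `SwapFree`) seen while sampling.

```python
"""Hypothetical memory watcher: logs RAM + swap headroom during a workload (Linux)."""
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def headroom_mb(info: Dict[str, int]) -> float:
    """Available RAM plus free swap in MiB (MemAvailable and SwapFree are in kB)."""
    return (info.get("MemAvailable", 0) + info.get("SwapFree", 0)) / 1024


def watch(interval_s: float = 1.0, samples: int = 10,
          path: str = "/proc/meminfo") -> float:
    """Sample `path` repeatedly and return the lowest headroom seen, in MiB."""
    lowest = float("inf")
    for _ in range(samples):
        with open(path) as f:
            lowest = min(lowest, headroom_mb(parse_meminfo(f.read())))
        time.sleep(interval_s)
    return lowest
```

Running `watch(interval_s=0.5, samples=600)` in a separate process during inference would record roughly five minutes of samples. On Jetson boards the CPU and GPU share the same physical RAM, so GPU allocations also reduce this headroom.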
Bearing the above in mind, many devices that could potentially meet the necessary requirements were surveyed. One device meets these requirements and is not prone to the above issues, but it comes at a much higher cost: the NVIDIA Jetson TX2, priced at around $460. A comparison of its specifications with those of the Jetson Nano is provided below; most notable are the stronger GPU and the built-in WLAN module:
In the future, as our understanding of the chosen edge device improves, or if we develop key partnerships, various steps can be taken to improve the edge device’s efficiency and reliability in the long term. Some areas of improvement are listed below:
Firstly, the operating system could be modified for the business case. The Jetson series of devices currently runs a modified version of Ubuntu with a full GUI, including window managers and desktop effects. These (and many other features) are unnecessary once the device is deployed at the edge, and the desktop features in particular require large quantities of RAM. Removing unnecessary parts of the operating system frees more memory for critical processing, and fewer bugs and issues are likely to occur once whole modules are removed.
Secondly, a modifiable bootloader should be considered. This is especially useful if Omnipresent wishes to perform tasks that require deeper access to the operating system and kernel. For example, a utility called kdump-tools can generate logs describing what went wrong when the system encounters a fatal crash, and it requires kernel-level access to be installed successfully. These logs can help significantly in making the device more stable and robust. However, because NVIDIA uses its own proprietary boot software, it was not possible to install this utility successfully. It remains to be seen whether, and how, this level of access could be established on the Jetson series of devices.
In addition, libraries and dependencies could be optimized for the exact CPU used. Although many of the dependencies in use are already somewhat optimized through compiler configuration and the general installation process, further optimization is possible for mission-critical code through compiler intrinsics and auto-vectorization. Intrinsics are functions that map directly to specific CPU instructions, giving precise control over the generated assembly; carefully written code of this kind can be much faster than what the compiler produces automatically. Auto-vectorization requires some compiler configuration, but once it is enabled, the compiler identifies places in the code where it can use special (SIMD) CPU instructions to gain speed. The speed increases can be significant: when testing another SoC for computer vision (the NanoPC-T4, an ARM-based device), we were able to speed up object detection by approximately 15 seconds using auto-vectorization.
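Auto-vectorization relies on the SIMD extensions a CPU provides (NEON, reported as `asimd` on 64-bit ARM, or the SSE/AVX families on x86). As a first step before tuning compiler flags, the hypothetical helper below reads `/proc/cpuinfo` text and reports which of a few common SIMD extensions are present. The flag list is illustrative, not exhaustive.

```python
"""Hypothetical helper: report SIMD extensions an auto-vectorizing compiler could target."""
from typing import Set

# Illustrative subset: NEON/ASIMD/SVE on ARM, SSE/AVX families on x86.
SIMD_FLAGS = {"neon", "asimd", "sve", "sse4_2", "avx", "avx2", "avx512f"}


def simd_features(cpuinfo_text: str) -> Set[str]:
    """Return the flags from SIMD_FLAGS that appear in /proc/cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # ARM kernels list CPU capabilities under "Features", x86 kernels under "flags".
        if key.strip() in ("Features", "flags"):
            found |= SIMD_FLAGS.intersection(value.split())
    return found
```

On a 64-bit ARM board such as the Jetson Nano, passing the contents of `/proc/cpuinfo` should yield `asimd`. GCC then needs flags that enable its vectorizer, such as `-O3` combined with `-march=native` (or `-mcpu=cortex-a57` for the Nano's CPU); adding `-fopt-info-vec-optimized` makes GCC report which loops it vectorized.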
Finally, adoption of the E57 standard could be considered. E57 is an international standard for storing and exchanging point cloud data, published as ASTM E2807-11(2019). By understanding how the data is stored, Omnipresent could develop algorithms that process point clouds more efficiently at the edge, taking advantage of this structure to load only specific properties as they are needed rather than loading all point cloud features into memory at once. This approach would also help address the memory concerns raised earlier, as less memory would be consumed.
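The idea of loading only the properties that are needed can be illustrated with a simplified, column-oriented file. In the sketch below, the file layout and function names are hypothetical and deliberately much simpler than E57: each field (x, y, z, intensity) is stored contiguously, so reading only z means seeking past the other columns instead of loading every field.

```python
"""Illustration of selective, per-field loading from a column-oriented file.

The file layout is HYPOTHETICAL and is not the E57 format: a header holding the
point count, followed by each field stored contiguously as float32 values in
native byte order.
"""
import struct
from array import array
from typing import BinaryIO, Dict, Sequence

FIELDS = ("x", "y", "z", "intensity")  # hypothetical field order
HEADER = struct.Struct("=Q")           # point count: native byte order, 8 bytes
ITEMSIZE = array("f").itemsize         # bytes per float32 value (4 on common platforms)


def write_columns(f: BinaryIO, columns: Dict[str, Sequence[float]]) -> None:
    """Write the header, then each field as one contiguous column."""
    n = len(columns[FIELDS[0]])
    if any(len(columns[name]) != n for name in FIELDS):
        raise ValueError("all fields must have the same number of points")
    f.write(HEADER.pack(n))
    for name in FIELDS:
        array("f", columns[name]).tofile(f)


def read_field(f: BinaryIO, name: str) -> array:
    """Seek straight to one column and read only that field's bytes."""
    f.seek(0)
    (n,) = HEADER.unpack(f.read(HEADER.size))
    f.seek(HEADER.size + FIELDS.index(name) * n * ITEMSIZE)
    values = array("f")
    values.fromfile(f, n)
    return values
```

Here, reading one of four equally sized fields loads a quarter of the point data. E57 files are more complex (an XML section describing the data plus binary sections in which fields are encoded in separate bytestreams), so a real implementation would build on an existing library such as libE57Format rather than a custom parser.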
Achieving computer vision and point cloud analysis at the edge does not require niche or complex features, but it is heavily reliant on computing power. Although the Jetson TX2 costs substantially more than the Jetson Nano, the peace of mind provided by its extra RAM, CPU and GPU capacity may justify the cost. Additional features, such as advanced security, will still have to be researched and deployed on the board. As Omnipresent continues to develop on the Jetson platform, it may be possible to apply the future considerations mentioned above and, by implementing them, meet all necessary requirements.