Developing a High-Availability Software Solution for Onboard Rail Computers

Building a reliable software solution for onboard rail computers requires a well-thought-out approach, considering hardware, operating systems, and the need for flexibility. This article outlines the critical choices we made to create a high-availability solution for rail systems, focusing on the selection of the operating system.

Key Challenges in Designing Rail Software

Onboard rail hardware resembles embedded systems that must last well over 10 years, which complicates both hardware and software upgrades. We had to balance adopting newer technologies against the need for short- and long-term stability. Embedded Linux proved the most suitable option because of its flexibility and long-term support, and this led us to choose Yocto for its customization capabilities.

Installation Strategy: Feature-Rich vs. Minimal Distributions

When designing embedded systems, the choice between a feature-rich and a minimal distribution affects scalability and control:

  • Feature-Rich (e.g., Debian): Offers many packages but requires removing unnecessary components, which can be cumbersome
  • Minimal Distributions: Leaner but often too restrictive for long-term, complex projects

Given our need for control and scalability, neither option alone was ideal. We required a unified system image tailored specifically for embedded environments, leading us to a customized Yocto build.

High-Availability System Design

To ensure stability in challenging conditions, we adopted a dual-partition setup combined with a failsafe bootloader (GRUB) and a monolithic over-the-air (OTA) update system (RAUC). If an update or a boot fails, this design lets us roll back to the last stable system and avoid disruptions while trains are in service.

Benefits of this Setup:

  • Prevent Boot Loops: With two system partitions, the bootloader can fall back to the known-good one when the other fails to start
  • OTA Updates: Replacing the whole system image in one step simplifies the process and gives tighter control over system integrity
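The fallback behaviour described above can be sketched in a few lines of Python. This is an illustrative model only, not code taken from GRUB, RAUC, or OxOS: the `Slot` class, the `choose_slot` and `mark_good` functions, and the limit of three boot attempts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """State a bootloader might keep for one system partition (hypothetical)."""
    name: str
    healthy: bool = True   # cleared once the slot has been given up on
    tries_left: int = 3    # boot attempts remaining (assumed limit)


def choose_slot(order):
    """Return the first healthy slot with attempts left, charging one attempt."""
    for slot in order:
        if slot.healthy and slot.tries_left > 0:
            slot.tries_left -= 1
            return slot
        slot.healthy = False   # unusable: skip this slot on later boots
    return None                # nothing bootable is left


def mark_good(slot, max_tries=3):
    """Called by the running system once it has started successfully."""
    slot.healthy = True
    slot.tries_left = max_tries


if __name__ == "__main__":
    updated = Slot("A")        # freshly written image, not yet confirmed
    previous = Slot("B")       # last known-good image
    order = [updated, previous]
    # Slot A never calls mark_good, so it uses up its attempts and the
    # fourth boot falls back to slot B instead of looping forever.
    print([choose_slot(order).name for _ in range(4)])
    # prints ['A', 'A', 'A', 'B']
```

The point of the attempt counter is that a broken update cannot trap the device: the new image has to confirm itself after booting, otherwise the previous image takes over.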

In-Memory System Architecture

Our system primarily runs in memory (RAM), boosting performance and reliability. The core system image remains intact during operation, reducing the risk of corruption. Separate storage handles read/write operations, so the core system stays stable while dynamic data is processed independently.
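One common way to keep a core image intact is to mount it read-only and send all writes to a separate data mount. The article does not say how OxOS enforces the separation, so the sketch below is an assumption: it parses text in the Linux `/proc/mounts` format and checks which mount points are read-only. The sample layout (`/dev/root`, `/dev/sda4`, `/data`, squashfs) is invented for illustration.

```python
# Invented example of /proc/mounts content; not the actual OxOS layout.
SAMPLE_MOUNTS = """\
/dev/root / squashfs ro,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
/dev/sda4 /data ext4 rw,relatime 0 0
"""


def mount_options(mounts_text, mount_point):
    """Return the option set of the last entry at mount_point, or None."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            options = set(fields[3].split(","))  # later mounts shadow earlier ones
    return options


def is_read_only(mounts_text, mount_point):
    """True if mount_point is present and carries the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


if __name__ == "__main__":
    print(is_read_only(SAMPLE_MOUNTS, "/"))      # prints True
    print(is_read_only(SAMPLE_MOUNTS, "/data"))  # prints False
```

On a running Linux system the same functions can be given the contents of `/proc/mounts` instead of the sample string, for example as part of a health check after boot.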

Why We Chose Yocto

Selecting the right Linux base was key to supporting our two hardware platforms (NXP and x86). While lightweight distributions like OpenWrt could handle smaller systems, Yocto offered the flexibility to build a containerized platform that scales across different hardware configurations. Yocto also integrates with key tools like RAUC, GRUB, and U-Boot, allowing us to fully customize the system from the bootloader to the operating system.

Building OxOS: A Tailored Solution

We developed OxOS, a Yocto-based operating system with our Oxyfi application layer. It provides flexibility, scalability, and portability for embedded rail systems. Tight integration with essential components ensures that updates and maintenance are streamlined while providing a high level of control.

Conclusion

Managing software for IoT fleets through OTA updates is complex, especially when factoring in hardware and application layers. Our choice of a unified, custom-built operating system based on Yocto allows for stable and scalable rail solutions. By focusing on system integrity and long-term reliability, we ensure operational continuity, even in challenging rail environments.