Learn CLER
CLER is designed around simplicity: Blocks are just structs with methods, Channels are type-safe queues, and the framework compiles away to almost nothing.
Understanding CLER
Real Examples, Real Code
The best way to learn a DSP framework is to see working code in action. The examples below come directly from our desktop_examples directory — they're not toy code, but real applications demonstrating CLER's core capabilities.
Platform Support
CLER supports both desktop and embedded platforms. Desktop examples use standard threads, while embedded targets include bare-metal, Zephyr, FreeRTOS, and ThreadX implementations.
Core Concepts
Blocks Are Just Structs
Every CLER block is a C++ struct with a procedure() method. It inherits from BlockBase, which keeps virtual functions to the bare minimum needed for the interface:
struct GainBlock : public cler::BlockBase {
float gain;
GainBlock(const char* name, float gain_value)
: BlockBase(name), gain(gain_value) {}
cler::Result<cler::Empty, cler::Error> procedure(
cler::ChannelBase<float>* in,
cler::ChannelBase<float>* out) {
size_t transferable = std::min(in->size(), out->space());
for (size_t i = 0; i < transferable; ++i) {
float value;
in->pop(value);
out->push(value * gain);
}
return cler::Empty{};
}
};
Design Rationale: Minimal virtual functions (only for interface), very little runtime type erasure, minimal hidden costs. What you write is mostly what gets compiled.
Multiple Input/Output Blocks
Blocks can have any number of inputs and outputs. Here's an adder that takes 2 inputs:
struct AdderBlock : public cler::BlockBase {
cler::Channel<float> in0; // First input channel (owned by block)
cler::Channel<double> in1; // Second input (different type!)
AdderBlock(const char* name) : BlockBase(name), in0(512), in1(512) {}
cler::Result<cler::Empty, cler::Error> procedure(
cler::ChannelBase<float>* out) {
size_t transferable = std::min({in0.size(), in1.size(), out->space()});
for (size_t i = 0; i < transferable; ++i) {
float value0;
double value1;
in0.pop(value0);
in1.pop(value1);
out->push(value0 + static_cast<float>(value1));
}
return cler::Empty{};
}
};
Key Design: Input channels are owned by the block, output channels are passed as parameters. This prevents one-to-many connection issues.
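For example, here is a minimal wiring sketch (the source and sink blocks are hypothetical): each upstream block writes into a channel the adder owns, while the adder's own output is whatever channel the BlockRunner hands to its procedure().
FloatSourceBlock source_a("SourceA");   // hypothetical block producing floats
DoubleSourceBlock source_b("SourceB");  // hypothetical block producing doubles
AdderBlock adder("Adder");
SinkBlock sink("Sink");                 // hypothetical block owning cler::Channel<float> in
auto flowgraph = cler::make_desktop_flowgraph(
    cler::BlockRunner(&source_a, &adder.in0), // source_a → adder.in0 (owned by adder)
    cler::BlockRunner(&source_b, &adder.in1), // source_b → adder.in1 (owned by adder)
    cler::BlockRunner(&adder, &sink.in),      // adder's output parameter is the sink's input
    cler::BlockRunner(&sink)
);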
Type-Agnostic Channels
Channels work with any type — not just numbers:
// Custom packet type
struct Packet {
uint32_t id;
std::vector<uint8_t> data;
};
// Channel that carries packets
cler::Channel<Packet> packet_channel(64);
// Block that processes packets
struct PacketProcessor : public cler::BlockBase {
cler::Channel<Packet> in; // Input channel owned by block
PacketProcessor(const char* name) : BlockBase(name), in(64) {}
template<typename... OChannels>
cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
Packet packet;
while (in.pop(packet)) {
// Process packet...
packet.id += 1000; // Example processing
// Push to all outputs using fold expression
((outs->push(packet)), ...);
}
return cler::Empty{};
}
};
Flexibility: You're not limited to primitive types. Pass complex data structures, protocol packets, or custom objects between blocks.
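As a quick streamlined-mode sketch (the downstream channel here is illustrative), custom types move through channels exactly like samples do:
cler::Channel<Packet> downstream(64);        // illustrative channel for the processed packets
PacketProcessor processor("Packets");
processor.in.push(Packet{42, {0x01, 0x02}}); // enqueue a custom object
processor.procedure(&downstream);            // the packet arrives downstream with id 1042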
Two Execution Models
Different applications need different trade-offs. Flowgraph mode is convenient and handles threading for you. Streamlined mode gives you maximum control for embedded systems or when you need every microsecond.
Flowgraph Mode
Each block runs in its own thread; the framework manages the threading for you:
// Desktop version
auto flowgraph = cler::make_desktop_flowgraph(
cler::BlockRunner(&source, &adder.in[0], &adder.in[1]), // multiple outputs
cler::BlockRunner(&adder, &gain.in), // single output
cler::BlockRunner(&gain, &sink.in),
cler::BlockRunner(&sink) // no outputs (sink)
);
// Configure scheduler and performance options
cler::FlowGraphConfig config;
config.scheduler = cler::SchedulerType::ThreadPerBlock; // Or FixedThreadPool (recommended)
// config.adaptive_sleep = true; // Optional: for sparse/intermittent data only
flowgraph.run(config);
// Embedded version (FreeRTOS example)
#include "task_policies/cler_freertos_tpolicy.hpp"
auto flowgraph = cler::FlowGraph<cler::FreeRTOSTaskPolicy,
decltype(cler::BlockRunner(&source, &adder.in[0], &adder.in[1])),
decltype(cler::BlockRunner(&adder, &gain.in)),
decltype(cler::BlockRunner(&gain, &sink.in)),
decltype(cler::BlockRunner(&sink))
>(
cler::BlockRunner(&source, &adder.in[0], &adder.in[1]),
cler::BlockRunner(&adder, &gain.in),
cler::BlockRunner(&gain, &sink.in),
cler::BlockRunner(&sink)
);
flowgraph.run(config); // Same config works on all platforms
BlockRunner Syntax: First argument is the block, remaining arguments are its outputs (passed to procedure() as parameters). Same syntax works across all platforms.
Streamlined Mode
You control the loop and schedule blocks manually (from streamlined.cpp):
while (true) {
source.procedure(&adder.in[0], &adder.in[1]); // multiple outputs
adder.procedure(&gain.in); // single output
gain.procedure(&sink.in);
sink.procedure(); // no outputs
}
Use Cases: When blocks are simple, you avoid thread overhead. Perfect for embedded systems or when you need deterministic timing.
Platform Support & Task Policies
CLER supports multiple platforms through its task policy abstraction. Different platforms require different threading models, but the same block code works everywhere.
Desktop Development
#include "cler.hpp"
#include "task_policies/cler_desktop_tpolicy.hpp"
int main() {
// Create blocks...
SourceCWBlock<float> source("Source", 1.0f, 10.0f, 1000);
GainBlock<float> gain("Gain", 2.0f);
// Desktop flowgraph uses std::thread
auto flowgraph = cler::make_desktop_flowgraph(
cler::BlockRunner(&source, &gain.in),
cler::BlockRunner(&gain, &sink.in),
cler::BlockRunner(&sink)
);
// Configure scheduler for optimal performance
cler::FlowGraphConfig config;
config.scheduler = cler::SchedulerType::ThreadPerBlock; // Default scheduler
// config.adaptive_sleep = true; // Optional: reduce CPU when idle
flowgraph.run(config);
return 0;
}
Embedded Systems
FreeRTOS
#include "cler.hpp"
#include "task_policies/cler_freertos_tpolicy.hpp"
int main() {
// Same blocks work on any platform
SourceCWBlock<float> source("Source", 1.0f, 10.0f, 1000);
GainBlock<float> gain("Gain", 2.0f);
// FreeRTOS flowgraph uses xTaskCreate
auto flowgraph = cler::FlowGraph<cler::FreeRTOSTaskPolicy,
decltype(cler::BlockRunner(&source, &gain.in)),
decltype(cler::BlockRunner(&gain, &sink.in)),
decltype(cler::BlockRunner(&sink))
>(
cler::BlockRunner(&source, &gain.in),
cler::BlockRunner(&gain, &sink.in),
cler::BlockRunner(&sink)
);
flowgraph.run();
vTaskStartScheduler(); // FreeRTOS scheduler
return 0;
}
Bare Metal (No Threading)
#include "cler.hpp"
// No task policy needed for bare metal
int main() {
// Stack-allocated channels for deterministic memory
struct SimpleBlock : public cler::BlockBase {
cler::Channel<float, 64> in; // Fixed size, no heap allocation
SimpleBlock(const char* name) : BlockBase(name) {}
template<typename... OChannels>
cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
// Minimal processing for embedded constraints
float sample;
while (in.pop(sample)) {
((outs->push(sample * 2.0f)), ...);
}
return cler::Empty{};
}
};
SimpleBlock block1("Block1"), block2("Block2");
// Streamlined mode only for bare metal
while (true) {
block1.procedure(&block2.in);
block2.procedure(); // No outputs
// Hardware-specific timing control
}
}
Cyclic Graphs - Control System Example
This demonstrates CLER's support for cyclic graphs — the controller's output feeds the plant, and the plant's output feeds back to the controller, creating a closed feedback loop. This isn't just signal processing — it's a complete interactive simulation with real-time GUI controls.
The Plant Block
struct PlantBlock : public cler::BlockBase {
cler::Channel<float> force_in;
PlantBlock(const char* name) : BlockBase(name), force_in(1024) {
force_in.push(0.0f); // CRITICAL: Initial force for cyclic graph
}
cler::Result<cler::Empty, cler::Error> procedure(
cler::ChannelBase<float>* measured_position_out) {
float force;
force_in.pop(force);
// Mass-spring-damper physics simulation
float acceleration = (force - K * _x - C * _v) / M;
_v += acceleration * DT;
_x += _v * DT + 0.5f * acceleration * DT * DT;
measured_position_out->push(_x);
return cler::Empty{};
}
private:
float _x = 0.0f, _v = 0.0f; // Position and velocity state
};
Critical Detail: The plant block maintains internal state and connects to the controller in a feedback loop. The force_in.push(0.0f)
is crucial — cyclic graphs need initial conditions or they'll deadlock on startup.
Interactive PID Controller with Thread-Safe GUI
struct ControllerBlock : public cler::BlockBase {
cler::Channel<float> measured_position_in;
// Thread-safe parameters modified by GUI
std::atomic<float> _kp{2.0f}, _ki{1.0f}, _kd{1.0f};
std::atomic<float> _target{10.0f};
std::atomic<bool> _feed_forward{false};
cler::Result<cler::Empty, cler::Error> procedure(
cler::ChannelBase<float>* force_out) {
float measured_position;
measured_position_in.pop(measured_position);
// Read parameters atomically (modified by GUI thread)
float target = _target.load(std::memory_order_relaxed);
float kp = _kp.load(std::memory_order_relaxed);
float error = target - measured_position;
float derivative = (error - _error_prev) / DT;
_integral += error * DT;
float force = kp * error + _ki.load() * _integral + _kd.load() * derivative;
// Optional feed-forward control
if (_feed_forward.load()) {
force += K * target; // Compensation for spring constant
}
force_out->push(force);
_error_prev = error;
return cler::Empty{};
}
private:
float _error_prev = 0.0f, _integral = 0.0f;
};
Thread Safety: The GUI thread modifies PID parameters while the control loop runs at 100Hz. Atomic variables ensure thread-safe updates without locks, maintaining real-time performance. The controller remains responsive even while parameters change.
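A sketch of the GUI side (the callback name and its caller are illustrative); because the parameters are std::atomic, the GUI thread can store new values while procedure() keeps reading them:
// Runs on the GUI thread while the flowgraph keeps calling procedure()
void on_pid_sliders_changed(ControllerBlock& controller,
                            float kp, float ki, float kd, float target) {
    controller._kp.store(kp, std::memory_order_relaxed);
    controller._ki.store(ki, std::memory_order_relaxed);
    controller._kd.store(kd, std::memory_order_relaxed);
    controller._target.store(target, std::memory_order_relaxed);
}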
The Cyclic Flowgraph
auto flowgraph = cler::make_desktop_flowgraph(
cler::BlockRunner(&controller, &throttle.in),
cler::BlockRunner(&throttle, &plant.force_in), // Controller → Plant
cler::BlockRunner(&plant, &fanout.in),
cler::BlockRunner(&fanout, &plot.in[0], &controller.measured_position_in) // Plant → Controller (feedback)
);
BlockRunner Syntax: First argument is the block to run, remaining arguments are the outputs (channels passed to the block's procedure() method).
Feedback Loop: Notice how the data flows Controller → Plant → Fanout → back to Controller, creating a complete feedback loop. The initial force value in the plant prevents deadlock at startup.
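As a rule of thumb, a cycle needs enough queued samples to cover whatever the loop consumes before its first output comes back around. If the feedback path contained extra buffering or delay blocks, the plant would pre-fill correspondingly more samples; a sketch (the count is illustrative):
PlantBlock(const char* name) : BlockBase(name), force_in(1024) {
    // One initial sample per unit of delay around the feedback loop
    for (int i = 0; i < LOOP_DELAY_SAMPLES; ++i) { // LOOP_DELAY_SAMPLES is illustrative
        force_in.push(0.0f);
    }
}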
Performance Considerations
Choosing between flowgraph and streamlined modes depends on the computational weight of your blocks versus the overhead of orchestration.
When Streamlined Mode Wins
If your blocks do minimal work per sample, the thread synchronization overhead in flowgraph mode can dominate. Consider this anti-pattern:
// Anti-pattern: Simple gain block in flowgraph mode
struct GainBlock : public cler::BlockBase {
cler::Channel<float> in; // Input channel owned by block
float gain;
GainBlock(const char* name, float gain_value)
: BlockBase(name), in(512), gain(gain_value) {}
template<typename... OChannels>
cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
// Just multiplying by a constant - very little work!
float value;
while (in.pop(value)) {
// Push to all outputs using fold expression
((outs->push(value * gain)), ...);
}
return cler::Empty{};
}
};
The Problem: Each sample requires thread synchronization (pushing/popping from channels). The overhead of synchronization far exceeds the actual computation (one multiplication).
Better Approaches
Option 1: Combine lightweight operations into heavier blocks
// Better: Combine multiple operations
struct ProcessingBlock : public cler::BlockBase {
cler::Channel<float> in; // Input channel owned by block
float gain;
float offset;
ProcessingBlock(const char* name, float gain_value, float offset_value)
: BlockBase(name), in(512), gain(gain_value), offset(offset_value) {}
template<typename... OChannels>
cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
float buffer[256];
size_t count = in.readN(buffer, 256);
// Process in batches - amortizes synchronization cost
for (size_t i = 0; i < count; ++i) {
buffer[i] = buffer[i] * gain + offset;
// Could add filtering, decimation, etc. here
}
// Write to all outputs using fold expression
((outs->writeN(buffer, count)), ...);
return cler::Empty{};
}
};
Option 2: Use streamlined mode for simple chains
// Streamlined mode - no thread overhead
while (running) {
source.procedure(&gain.in);
gain.procedure(&offset.in);
offset.procedure(&sink.in);
sink.procedure();
}
Stack-Allocated Channels
For embedded systems or when you know the exact buffer size needed:
struct FilterBlock : public cler::BlockBase {
// Stack-allocated channel - no heap allocation!
cler::Channel<float, 1024> in; // Template parameter sets size
FilterBlock(const char* name) : BlockBase(name) {}
template<typename... OChannels>
cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
// Process data...
// Push to all outputs using fold expression when ready
return cler::Empty{};
}
};
Benefits: No dynamic allocation, better cache locality, deterministic memory usage — critical for real-time embedded systems.
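Side by side, the two channel forms used throughout this guide (sizes are illustrative):
cler::Channel<float> heap_channel(4096);  // size chosen at runtime, heap-backed (desktop)
cler::Channel<float, 1024> stack_channel; // size fixed at compile time, no heap allocation (embedded)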
Scheduler Types & Performance Optimization
CLER provides two highly optimized schedulers designed for different workload characteristics. Choosing the right scheduler can significantly improve throughput and reduce CPU usage.
ThreadPerBlock (Default)
cler::FlowGraphConfig config;
config.scheduler = cler::SchedulerType::ThreadPerBlock; // Default scheduler
// config.adaptive_sleep = true; // Only for sparse/intermittent data
flowgraph.run(config);
Best for: Default choice, debugging, simple flowgraphs, guaranteed per-block isolation
Characteristics: One dedicated thread per block (default, simple, debuggable)
Trade-offs: Simple and predictable, but thread overhead grows with block count. Excellent starting point for most applications.
FixedThreadPool (Cache-Optimized)
cler::FlowGraphConfig config;
config.scheduler = cler::SchedulerType::FixedThreadPool; // Recommended
config.num_workers = 4; // Number of worker threads (minimum 2, adjust for your CPU)
// config.adaptive_sleep = true; // Optional: only for sparse data
// config.collect_detailed_stats = false; // Ultra-high performance mode
flowgraph.run(config);
Best for: Constrained systems, embedded applications, memory-limited environments
Characteristics: Cache-optimized worker threads with platform-aware memory layout
Benefits: Better for constrained systems, automatic cache line detection for x86/ARM/RISC-V, lower thread overhead than ThreadPerBlock.
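One way to size the worker pool on desktop (a sketch; hardware_concurrency() can return 0, hence the floor of two, and the best count ultimately depends on your blocks):
// Needs <algorithm> and <thread>
config.num_workers = std::max<size_t>(2, std::thread::hardware_concurrency());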
Adaptive Sleep Configuration
⚠️ Warning: Adaptive sleep reduces CPU usage but can significantly impact throughput. Only use for sparse/intermittent data sources like sensors or network packets with gaps.
// Adaptive sleep works with both scheduler types - USE CAREFULLY
// Can significantly reduce throughput, only use for sparse/intermittent data
config.adaptive_sleep = true; // CAUTION: reduces CPU usage but may impact throughput
config.adaptive_sleep_multiplier = 1.5; // How aggressively to increase sleep
config.adaptive_sleep_max_us = 5000.0; // Maximum sleep time (microseconds)
config.adaptive_sleep_fail_threshold = 10; // Start sleeping after N failures
// For very sparse data (sensors, network packets)
config.adaptive_sleep_multiplier = 2.0; // Aggressive backoff
config.adaptive_sleep_max_us = 10000.0; // Up to 10ms sleep
config.adaptive_sleep_fail_threshold = 5; // Sleep after 5 failures
Use Cases: Sensor data, network packets, user input processing. Can achieve >90% CPU reduction during idle periods while maintaining low latency when data arrives.
Buffer Access Patterns
CLER provides three main ways to access buffer data. Here's what we recommend:
Simple Recommendation
1. read_dbf/write_dbf (TRY THIS FIRST)
// Fastest and cleanest - zero-copy direct buffer access
auto [read_ptr, read_size] = in->read_dbf();
auto [write_ptr, write_size] = out->write_dbf();
size_t to_process = std::min(read_size, write_size);
for (size_t i = 0; i < to_process; ++i) {
write_ptr[i] = read_ptr[i] * gain; // Direct processing
}
in->commit_read(to_process);
out->commit_write(to_process);
✅ RECOMMENDED - Best performance, clean code.
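Put together, a gain block built around the DBF pattern might look like this sketch (it reuses the GainBlock shape from earlier and assumes the channels in use expose read_dbf/write_dbf):
struct DbfGainBlock : public cler::BlockBase {
    cler::Channel<float> in; // heap-allocated so DBF is available
    float gain;
    DbfGainBlock(const char* name, float gain_value)
        : BlockBase(name), in(8192), gain(gain_value) {}
    cler::Result<cler::Empty, cler::Error> procedure(cler::ChannelBase<float>* out) {
        auto [read_ptr, read_size] = in.read_dbf();      // zero-copy view of readable samples
        auto [write_ptr, write_size] = out->write_dbf(); // zero-copy view of writable space
        size_t n = std::min(read_size, write_size);
        for (size_t i = 0; i < n; ++i) {
            write_ptr[i] = read_ptr[i] * gain;
        }
        in.commit_read(n);
        out->commit_write(n);
        return cler::Empty{};
    }
};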
2. readN/writeN (SOLID FALLBACK)
// Simple and reliable - use when DBF not available
float buffer[256];
size_t count = in->readN(buffer, 256);
for (size_t i = 0; i < count; ++i) {
buffer[i] *= gain;
}
out->writeN(buffer, count);
✅ GOOD FALLBACK - Still very fast, works everywhere, simple API
3. peek/commit (COMPLEX - NOT RECOMMENDED)
const float *ptr1, *ptr2;
size_t size1, size2;
size_t available = in.peek_read(ptr1, size1, ptr2, size2);
// Process directly from input buffer
for (size_t i = 0; i < size1; ++i) buffer[i] = ptr1[i] * gain;
for (size_t i = 0; i < size2; ++i) buffer[size1+i] = ptr2[i] * gain;
in.commit_read(size1 + size2);
out->writeN(buffer, size1 + size2);
⚠️ Complex implementation - Only ~5% faster than readN/writeN, not worth the complexity
4. push/pop (AVOID)
float sample;
in->pop(sample);
out->push(sample * gain);
⚠️ Orders of magnitude slower - Function call overhead per sample. Use only for control data
Quick Summary
- Try DBF first - It's usually the fastest and cleanest option
- Use readN/writeN as fallback - Simple and still very fast
- DBF requirements - Buffer must be heap-allocated and page-aligned (≥4KB)
- Peek/commit is rarely needed - Only for specialized buffer management
- Never use push/pop for DSP data - It's extremely slow
- Run benchmarks:
cd build/performance && ./perf_read_write_techniques
Real-World Applications
CLER shines in embedded systems where clean architecture meets resource constraints. Here are practical applications showing when flowgraphs excel and when manual approaches work better.
Microcontroller Applications
IoT Sensor Hub (STM32)
An environmental monitoring node collects data from multiple sensors and forwards it to a gateway. RAM requirement: 64-128KB for comfortable operation.
// Manual callback hell approach - hard to maintain
struct SensorManager {
void (*temp_callback)(float);
void (*humidity_callback)(float);
void (*pressure_callback)(float);
void (*network_callback)(SensorPacket);
void process_sensors() {
float temp = read_temperature();
if (temp_callback) temp_callback(temp);
float humidity = read_humidity();
if (humidity_callback) humidity_callback(humidity);
// ... multiply by 6 sensors, nested callbacks, error handling
}
};
// CLER flowgraph approach - clean and testable
struct TemperatureSensor : public cler::BlockBase {
    cler::Result<cler::Empty, cler::Error> procedure(cler::ChannelBase<float>* out) {
        if (out->space() > 0) {
            float temp = read_temperature_sensor();
            out->push(temp);
        }
        return cler::Empty{};
    }
};
struct SensorAggregator : public cler::BlockBase {
    cler::Channel<float> temp_in, humidity_in, pressure_in;
    template<typename... OChannels>
    cler::Result<cler::Empty, cler::Error> procedure(OChannels*... network_out) {
        if (temp_in.size() && humidity_in.size() && pressure_in.size()) {
            SensorPacket packet;
            temp_in.pop(packet.temperature);
            humidity_in.pop(packet.humidity);
            pressure_in.pop(packet.pressure);
            packet.timestamp = get_system_time();
            ((network_out->push(packet)), ...);
        }
        return cler::Empty{};
    }
};
// Clean, testable flowgraph
auto flowgraph = cler::FlowGraph(
cler::BlockRunner(&temp_sensor, &aggregator.temp_in),
cler::BlockRunner(&humidity_sensor, &aggregator.humidity_in),
cler::BlockRunner(&pressure_sensor, &aggregator.pressure_in),
cler::BlockRunner(&aggregator, &network_transmitter.in),
cler::BlockRunner(&network_transmitter)
);
Why Flowgraphs Win: Complex data flows with multiple sensors become manageable. Each sensor can be tested independently, and adding new sensors doesn't require rewiring callbacks.
Real-Time Audio Processing
struct AudioFilterBlock : public cler::BlockBase {
    cler::Channel<int16_t> in; // 44.1kHz audio samples
    BiquadFilter filter;       // State preserved between calls
    template<typename... OChannels>
    cler::Result<cler::Empty, cler::Error> procedure(OChannels*... outs) {
        int16_t samples[64];
        size_t count = in.readN(samples, 64);
        // Process audio block with preserved filter state
        for (size_t i = 0; i < count; ++i) {
            samples[i] = filter.process(samples[i]);
        }
        ((outs->writeN(samples, count)), ...);
        return cler::Empty{};
    }
};
// Real-time audio chain: ADC → Filter → Compressor → DAC
auto audio_flowgraph = cler::FlowGraph(
cler::BlockRunner(&adc_source, &filter.in),
cler::BlockRunner(&filter, &compressor.in),
cler::BlockRunner(&compressor, &dac_sink.in),
cler::BlockRunner(&dac_sink)
);
Real-Time Constraints: Audio processing needs deterministic timing. CLER's static allocation and predictable execution makes this possible on microcontrollers.
Motor Control System
struct EncoderBlock : public cler::BlockBase {
    volatile uint32_t* encoder_register;
    cler::Result<cler::Empty, cler::Error> procedure(cler::ChannelBase<float>* position_out) {
        if (position_out->space() > 0) {
            uint32_t raw_count = *encoder_register;
            float position = (raw_count * 2.0f * M_PI) / 4096.0f; // Convert to radians
            position_out->push(position);
        }
        return cler::Empty{};
    }
};
struct PIDController : public cler::BlockBase {
    cler::Channel<float> setpoint_in, feedback_in;
    std::atomic<float> kp{1.0f}, ki{0.1f}, kd{0.01f};
    float integral = 0.0f, previous_error = 0.0f;
    template<typename... OChannels>
    cler::Result<cler::Empty, cler::Error> procedure(OChannels*... control_out) {
        float setpoint, feedback;
        if (setpoint_in.pop(setpoint) && feedback_in.pop(feedback)) {
            float error = setpoint - feedback;
            integral += error * DT;
            float derivative = (error - previous_error) / DT;
            float output = kp.load() * error + ki.load() * integral + kd.load() * derivative;
            previous_error = error;
            ((control_out->push(output)), ...);
        }
        return cler::Empty{};
    }
};
// Control loop: Setpoint → PID Controller → Motor Driver
// ↑ Encoder ← Motor ←
Control Systems: Flowgraphs naturally express feedback loops. Thread-safe atomic parameters allow runtime tuning without stopping the control loop.
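A wiring sketch for this loop (setpoint_source and motor_driver are hypothetical blocks, and the simplified FlowGraph form matches the surrounding examples). Note that the loop closes through the physical motor and encoder hardware rather than through channels, so no initial channel value is needed here:
auto control_flowgraph = cler::FlowGraph(
    cler::BlockRunner(&setpoint_source, &pid.setpoint_in), // commanded position
    cler::BlockRunner(&encoder, &pid.feedback_in),         // measured position from the encoder
    cler::BlockRunner(&pid, &motor_driver.in),             // control effort to the driver
    cler::BlockRunner(&motor_driver)
);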
FPGA Hybrid Designs
CLER excels in FPGA systems where hardware acceleration combines with software flexibility.
SDR with FPGA Preprocessing
// FPGA handles high-rate operations, ARM processes protocols
struct FPGAInterface : public cler::BlockBase {
    volatile uint32_t* fpga_fifo;    // Memory-mapped FPGA interface
    volatile uint32_t* fpga_control;
    cler::Result<cler::Empty, cler::Error> procedure(cler::ChannelBase<std::complex<float>>* iq_out) {
        // Check FPGA FIFO status
        if ((*fpga_control & FIFO_NOT_EMPTY) && (iq_out->space() >= 256)) {
            std::complex<float> samples[256];
            // Burst read from FPGA (preprocessed samples)
            for (int i = 0; i < 256; ++i) {
                uint32_t iq_data = *fpga_fifo;
                float i_val = (int16_t)(iq_data >> 16) / 32768.0f;
                float q_val = (int16_t)(iq_data & 0xFFFF) / 32768.0f;
                samples[i] = std::complex<float>(i_val, q_val);
            }
            iq_out->writeN(samples, 256);
        }
        return cler::Empty{};
    }
};
struct ProtocolDecoder : public cler::BlockBase {
    cler::Channel<std::complex<float>, 512> iq_in;
    DemodulatorState demod_state; // Complex software processing
    template<typename... OChannels>
    cler::Result<cler::Empty, cler::Error> procedure(OChannels*... packet_out) {
        std::complex<float> samples[128];
        size_t count = iq_in.readN(samples, 128);
        if (count > 0) {
            // Complex demodulation that benefits from ARM cores
            for (size_t i = 0; i < count; ++i) {
                if (process_sample(&demod_state, samples[i])) {
                    // Packet detected and decoded
                    ((packet_out->push(demod_state.decoded_packet)), ...);
                }
            }
        }
        return cler::Empty{};
    }
};
// Hybrid architecture: FPGA → ARM Processing → Network
auto sdr_flowgraph = cler::FlowGraph(
cler::BlockRunner(&fpga_interface, &channelizer.in), // High-rate data from FPGA
cler::BlockRunner(&channelizer, &decoder1.in, &decoder2.in), // Split to protocols
cler::BlockRunner(&decoder1, &network1.in), // Process on ARM cores
cler::BlockRunner(&decoder2, &network2.in),
cler::BlockRunner(&network1),
cler::BlockRunner(&network2)
);
Hybrid Benefits: FPGA handles computationally intensive tasks (filtering, channelization) while ARM cores handle complex protocol processing. CLER coordinates the data flow between hardware and software domains.
Real-Time Image Processing
struct ImageAcquisition : public cler::BlockBase {
    CameraInterface* camera;
    cler::Result<cler::Empty, cler::Error> procedure(cler::ChannelBase<ImageFrame>* frame_out) {
        // Downstream frame channel is kept small for real-time behavior
        if (frame_out->space() > 0 && camera->frame_ready()) {
            ImageFrame frame;
            camera->capture_frame(&frame);
            frame_out->push(std::move(frame));
        }
        return cler::Empty{};
    }
};
struct FPGAEdgeDetection : public cler::BlockBase {
    cler::Channel<ImageFrame> frame_in;
    volatile uint32_t* fpga_processor; // Hardware accelerated
    template<typename... OChannels>
    cler::Result<cler::Empty, cler::Error> procedure(OChannels*... processed_out) {
        ImageFrame frame;
        if (frame_in.pop(frame)) {
            // DMA transfer to FPGA, hardware processes edges
            setup_fpga_dma(frame.data, frame.width, frame.height);
            start_fpga_processing();
            // Wait for completion (or use interrupt-driven approach)
            while (!fpga_processing_complete()) { /* yield or sleep */ }
            // Processed frame ready
            ((processed_out->push(std::move(frame))), ...);
        }
        return cler::Empty{};
    }
};
// Computer vision pipeline: Camera → FPGA Edge Detection → ARM Analysis → Display
auto vision_flowgraph = cler::FlowGraph(
cler::BlockRunner(&camera, &fpga_edge.frame_in),
cler::BlockRunner(&fpga_edge, &arm_analysis.in),
cler::BlockRunner(&arm_analysis, &display.in),
cler::BlockRunner(&display)
);
Real-Time Vision: Hardware accelerates compute-intensive operations while software handles higher-level analysis. Natural pipeline structure matches hardware capabilities.
When to Use Flowgraphs vs Manual Wiring
✅ Flowgraphs Excel When:
// Multi-sensor fusion with feedback - naturally expressed as flowgraph
auto sensor_fusion = cler::FlowGraph(
cler::BlockRunner(&gps, &kalman_filter.gps_in),
cler::BlockRunner(&imu, &kalman_filter.imu_in),
cler::BlockRunner(&mag, &kalman_filter.mag_in),
cler::BlockRunner(&kalman_filter, &navigation.in, &control_feedback.in),
cler::BlockRunner(&navigation, &display.in),
cler::BlockRunner(&control_feedback, &actuator.in),
cler::BlockRunner(&actuator)
);
// Easy testing - replace hardware blocks with test blocks
#ifdef UNIT_TEST
MockGPSBlock mock_gps("GPS", test_positions);
auto test_flowgraph = cler::FlowGraph(
cler::BlockRunner(&mock_gps, &kalman_filter.gps_in), // Mock instead of real GPS
// ... rest of flowgraph unchanged
);
#endif
❌ Manual Wiring Better When:
// Simple processing chain - manual is more efficient
while (running) {
sensor.read_raw(&raw_data); // Direct hardware access
filter.process(raw_data, &filtered); // Inline processing
transmit.send(filtered); // Direct transmission
delay_ms(10); // Simple timing
}
// Ultra-constrained system (8KB RAM)
void main_loop() {
uint16_t adc_value = read_adc();
uint16_t filtered = simple_filter(adc_value); // Stack-based, no allocation
if (filtered > threshold) {
trigger_output();
}
} // Total overhead: ~100 bytes
STM32 Practical Guidelines
Memory Requirements
- Minimum for flowgraphs: 64KB RAM (simple chains, 2-4 blocks)
- Comfortable operation: 128KB+ RAM (complex flowgraphs, 6+ blocks)
- Per-block overhead: ~2-4KB (threads, channels, state)
- Manual alternative: <32KB systems, use streamlined mode
- Optimization: Stack-allocated channels reduce heap pressure
• <3 blocks + simple chain → Manual wiring
• 3-6 blocks + moderate complexity → Consider both approaches
• 6+ blocks OR complex data flow → Flowgraphs excel
• Need testing/modularity → Flowgraphs
• <64KB RAM OR hard real-time → Manual wiring
Architecture Deep Dive
CLER uses C++17 templates to eliminate all runtime overhead while maintaining type safety and flexibility.
Compile-Time Magic
template<typename Block, typename... Channels>
struct BlockRunner {
Block* block;
std::tuple<Channels*...> outputs;
// Constructor template - the compiler deduces the channel types automatically
template<typename... InputChannels>
BlockRunner(Block* blk, InputChannels*... outs)
: block(blk), outputs(static_cast<Channels*>(outs)...) {}
};
Zero Overhead: Channel types are deduced at compile time. Minimal runtime type checking, very few virtual function calls, minimal dynamic allocations. The flowgraph code compiles close to what you'd write by hand — often with template optimizations.
AI-Friendly Core (<1000 LOC)
CLER's core is intentionally small and focused:
- <1000 lines: Perfect context window for AI assistance
- Clear abstractions: BlockBase, Channel, FlowGraph, Result types
- Minimal dependencies: Only standard library, easy to understand
- Template-heavy: AI tools excel at understanding and extending template code
This means when you get stuck with complex error messages (CLER's one weakness), any LLM can quickly understand your code and suggest fixes.
Lock-Free Channel Implementation
Based on Dmitry Drogalis's SPSC queue, optimized for DSP workloads:
- Cache-aligned: Prevents false sharing between producer/consumer
- Memory ordering: Carefully tuned acquire/release semantics for x86/ARM (see the sketch after this list)
- Configurable sizing: Stack allocation for embedded, heap for desktop
- Multiple access patterns: From simple push/pop to zero-copy peek/commit
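To make the acquire/release point concrete, here is a deliberately simplified single-producer/single-consumer ring buffer (illustrative only, not CLER's actual implementation):
#include <atomic>
#include <cstddef>

template <typename T, size_t N>
struct TinySpscQueue {
    alignas(64) std::atomic<size_t> head{0}; // consumer index, cache-line aligned
    alignas(64) std::atomic<size_t> tail{0}; // producer index, cache-line aligned
    T buf[N];

    bool push(const T& v) { // producer thread only
        size_t t = tail.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false; // full (one slot kept free)
        buf[t] = v;
        tail.store(next, std::memory_order_release); // publish: the data write happens-before this store
        return true;
    }
    bool pop(T& v) { // consumer thread only
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false; // empty
        v = buf[h];
        head.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};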
Embedded-First Design
CLER's design choices address real embedded constraints:
- Adaptive performance: Uses advanced techniques like doubly-mapped buffers when available, falls back gracefully on constrained systems
- Predictable timing: Manual memory management, no unexpected allocations
- Small memory footprint: Perfect for microcontrollers
- Type safety: Catch connection errors at compile time, not runtime
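For instance (the blocks here are hypothetical), wiring a float-typed block into a double-typed channel is rejected by the compiler rather than failing at runtime:
GainBlock gain("Gain", 2.0f); // procedure() expects cler::ChannelBase<float>*
DoubleSinkBlock sink("Sink"); // hypothetical block owning cler::Channel<double> in
// cler::BlockRunner(&gain, &sink.in); // does not compile: Channel<double> is not a ChannelBase<float>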