MADL!AR
Code is cheap, show me the PPT!
Deploying MicroSpeech to an MCU
Category: Hardware
Published: 2025-04-16
## MicroSpeech TFLite

MicroSpeech is an example project in TensorFlow Lite (TFLite) designed for keyword spotting (KWS) on resource-constrained microcontrollers (MCUs) and embedded devices. It detects simple spoken commands ("yes", "no", "on", "off", and so on) in real time, making it a fit for low-power scenarios such as smart homes and wearables. This post records the process from code generation to edge deployment, with a quick performance evaluation at the end.

### Code generation

First, clone the official [tflite-micro repository](https://github.com/tensorflow/tflite-micro). The generation script lives at `tensorflow/lite/micro/tools/project_generation/create_tflm_tree.py`; run it with no arguments to see the help text.

TFLite supports accelerated inference via Arm's CMSIS-NN library, typically 5-10x faster than generic-instruction kernels. CMSIS-NN requires porting work, however, and since this is only a feasibility check, we generate the generic code to keep deployment simple:

```
python tensorflow/lite/micro/tools/project_generation/create_tflm_tree.py --example="micro_speech" ../gen_micro_speech
```

This generates the project, named `gen_micro_speech`, in the directory alongside tflite-micro. The generic code can be compiled and run on x86: add a `CMakeLists.txt` under `gen_micro_speech` with the following content:

```
cmake_minimum_required(VERSION 3.28)
project(gen_micro_speech)

set(CMAKE_CXX_STANDARD 17)
add_compile_options(-fno-rtti -fno-exceptions)

file(GLOB_RECURSE SRC ${CMAKE_CURRENT_SOURCE_DIR}/*.cc)
add_executable(gen_micro_speech ${SRC})

target_include_directories(gen_micro_speech PUBLIC
    ${CMAKE_CURRENT_SOURCE_DIR}/
    ${CMAKE_CURRENT_SOURCE_DIR}/examples
    ${CMAKE_CURRENT_SOURCE_DIR}/examples/micro_speech
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/flatbuffers/include
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/gemmlowp
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/kissfft
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/ruy
)

target_compile_definitions(gen_micro_speech PUBLIC
    TF_LITE_STATIC_MEMORY
)
```

Besides the include paths for the TFLite kernels and the third-party libraries, this adds a few extra settings:

* `add_compile_options(-fno-rtti -fno-exceptions)` disables run-time type information and C++ exceptions
* the `TF_LITE_STATIC_MEMORY` macro is defined

These settings shrink the binary and improve performance, and they also match the constraints of embedded targets. The macro in particular matters: without it, the code tries to include `tensorflow/lite/array.h`, which appears to belong to a facility that dynamically allocates the model arena, but tflite-micro does not ship that code, so it would have to be ported separately. Defining static memory is the simpler route.
Build and run:

```
mkdir build && cd build && cmake .. && make -j20
```

As of April 2025 the build fails with:

```
/usr/bin/ld: CMakeFiles/gen_micro_speech.dir/tensorflow/lite/micro/hexdump_test.cc.o:(.bss+0x0): multiple definition of `micro_test::tests_passed'; CMakeFiles/gen_micro_speech.dir/examples/micro_speech/micro_speech_test.cc.o:(.bss+0x0): first defined here
...
```

This happens because `tensorflow/lite/micro/hexdump_test.cc` also contains tests built on macros such as `TF_LITE_MICRO_TESTS_BEGIN`, which expand into a `main` function that runs the test logic, so the two test files collide. Delete `tensorflow/lite/micro/hexdump_test.cc` and rebuild; the build now succeeds and produces the executable `gen_micro_speech` in the build directory. Running it prints:

```
Testing NoFeatureTest
AudioPreprocessor model arena size = 9944
Testing YesFeatureTest
AudioPreprocessor model arena size = 9944
Testing NoTest
AudioPreprocessor model arena size = 9944
MicroSpeech model arena size = 7304
MicroSpeech category predictions for
0.0000 silence
0.0547 unknown
0.0000 yes
0.9453 no
Testing YesTest
...
6/6 tests passed
~~~ALL TESTS PASSED~~~
```
### Deploying to an embedded device

#### 1. Creating the project

The target here is an STM32H723VGT6 (1 MB flash). Open CubeIDE and create a project named H723_MicroSpeech. Things to note:

* TFLite is a C++ project, so the project type must also be C++
* choose "copy only the necessary library files", which generates the CMSIS core files under Drivers (CMSIS core, not CMSIS-NN; they are different things)
* configure the maximum clock and a UART, then generate code
* release mode makes it easy to disable debugging and enable high optimization levels to squeeze out performance, so **select release mode**

Create an `External` folder in the project root to hold the TFLite files and user code. Add `inner_main.h` and `inner_main.c` under `External/custom` and define `myMain()` there to run our own logic. The call chain is:

`main.c -> inner_main.c -> tensorflow code`

Now register the `External` folder with the project configuration: right-click the project -> Properties -> C/C++ General -> Paths and Symbols. In the "Includes" tab, add the `External` path for both GNU C and GNU C++; in the "Source Location" tab, add the `External` folder, otherwise the build will not find the source files and hence the function implementations.

#### 2. Porting the TFLite files

This time pass two `makefile_options` parameters so the script generates Cortex-M code:

```
python3 tensorflow/lite/micro/tools/project_generation/create_tflm_tree.py --makefile_options="TARGET=cortex_m_generic TARGET_ARCH=cortex-m7" --example="micro_speech" ../gen_micro_speech_cortex
```

Place the generated `gen_micro_speech_cortex` folder under `External`, **delete `tensorflow/lite/micro/hexdump_test.cc`**, and again under Paths and Symbols, in the "Includes" tab, add the following directories for both GNU C and GNU C++:

* External/gen_micro_speech_cortex
* External/gen_micro_speech_cortex/third_party
* External/gen_micro_speech_cortex/third_party/kissfft
* External/gen_micro_speech_cortex/third_party/ruy
* External/gen_micro_speech_cortex/third_party/gemmlowp
* External/gen_micro_speech_cortex/third_party/flatbuffers/include
* External/gen_micro_speech_cortex/examples/micro_speech
#### 3. Fixing the DCB error

After adding the files, the build fails with:

```
../External/gen_micro_speech_cortex/tensorflow/lite/micro/cortex_m_generic/micro_time.cc:55:5: error: 'DCB' was not declared in this scope
   55 |     DCB->DEMCR |= DCB_DEMCR_TRCENA_Msk;
      |     ^~~
```

The compiler cannot find the definition of the Arm Cortex-M DCB (Debug Control Block) registers. Right-click the project -> Properties -> C/C++ General -> Paths and Symbols, and in the "Symbols" tab define `CMSIS_DEVICE_ARM_CORTEX_M_XX_HEADER_FILE` with the value `"stm32h7xx.h"`. This macro is rendered into an `#include ...` directive in the source, so the double quotes must not be removed. Even so, the build still fails: `DCB->DEMCR` is an older spelling of the Debug Control Block access, while on Arm Cortex-M parts the debug registers are normally reached through the DCB or CoreDebug structs provided by CMSIS.

The fix is to replace `DCB->DEMCR |= DCB_DEMCR_TRCENA_Msk;` in `tensorflow/lite/micro/cortex_m_generic/micro_time.cc` with `CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;`. Rebuild and this error is gone.
#### 4. std::byte not recognized

The build may report another error:

```
../External/gen_micro_speech_cortex/tensorflow/lite/micro/hexdump.h:26:25: error: ISO C++ forbids declaration of 'type name' with no type [-fpermissive]
   26 | void hexdump(Span<const std::byte> region);
      |                         ^~~
```

This happens because the project does not enable the C++17 standard. Fix: right-click the project -> Properties -> C/C++ Build -> Settings -> MCU/MPU G++ Compiler -> Miscellaneous, click Add, and add the flag `-std=c++17`.

#### 5. Fixing the "array.h not found" error

```
../External/gen_micro_speech_cortex/tensorflow/lite/kernels/kernel_util.cc:28:10: fatal error: tensorflow/lite/array.h: No such file or directory
   28 | #include "tensorflow/lite/array.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
```

As on x86: right-click the project -> Properties -> C/C++ General -> Paths and Symbols, and in the "Symbols" tab define `TF_LITE_STATIC_MEMORY`.

#### 6. Fixing the duplicate definition of main

With all of the above resolved, the build fails once more:

```
micro_speech_test.cc:(.text.startup.main+0x0): multiple definition of `main'; ./Core/Src/main.o:main.c:(.text.startup.main+0x0): first defined here
collect2.exe: error: ld returned 1 exit status
make: *** [makefile:110: H723_MicroSpeech.elf] Error 1
```

The example code uses the `TF_LITE_MICRO_TESTS_BEGIN` macro from the test framework, which expands into a `main` function that collides with the `main` already defined in the project.

Rewrite this logic: create `micro_speech_test.h` as a test interface. Part of the file:

```
#ifndef __MICRO_SPEECH_TEST_H
#define __MICRO_SPEECH_TEST_H

#ifdef __cplusplus
extern "C" {
#endif

void Invoke();

#ifdef __cplusplus
}
#endif

#endif
```

In `micro_speech_test.cc`, implement `Invoke`. First the feature-generation half, abridged:

```
...
#include "..."
#include "custom/debug.h"

namespace {

constexpr size_t kArenaSize = 1024 * 50;  // 28584; // xtensa p6
alignas(16) uint8_t g_arena[kArenaSize];
...

using MicroSpeechOpResolver = tflite::MicroMutableOpResolver<4>;
using AudioPreprocessorOpResolver = tflite::MicroMutableOpResolver<18>;

TfLiteStatus RegisterOps(MicroSpeechOpResolver& op_resolver) { ... }
TfLiteStatus RegisterOps(AudioPreprocessorOpResolver& op_resolver) { ... }

}  // namespace

extern "C" {

TfLiteStatus GenerateSingleFeature(const int16_t* audio_data,
                                   const int audio_data_size,
                                   int8_t* feature_output,
                                   tflite::MicroInterpreter* interpreter) { ... }

TfLiteStatus GenerateFeatures(const int16_t* audio_data,
                              const size_t audio_data_size,
                              Features* features_output) { ... }
```
Then the inference half:

```
TfLiteStatus LoadMicroSpeechModelAndPerformInference(
    const Features& features, const char* expected_label) {
  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  const tflite::Model* model =
      tflite::GetModel(g_micro_speech_quantized_model_data);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    print("schema version error");
    return kTfLiteError;
  }
  ...
  std::copy_n(&features[0][0], kFeatureElementCount,
              tflite::GetTensorData<int8_t>(input));

  uint32_t tick = HAL_GetTick();
  if (interpreter.Invoke() != kTfLiteOk) {
    return kTfLiteError;
  }
  tick = HAL_GetTick() - tick;
  print("micro_speech audio performance cost: %d ms.", tick);
  ...
  return kTfLiteOk;
}

void Invoke() {
  ...
  TfLiteStatus ts = GenerateFeatures(audio_data, audio_data_size, &g_features);
  if (ts != kTfLiteOk) {
    print("gen feature error when test audio");
  }
  LoadMicroSpeechModelAndPerformInference(g_features, label);
  ...
}
```

Flash it to the device; the output is:

```
...
AudioPreprocessor model arena size = 8448
MicroSpeech model arena size = 6788
start invoke!!!
micro_speech audio performance cost: 303 ms.
```

This figure excludes the preprocessing time for extracting MFCC features. In the example, one window is 30 ms with a 20 ms stride, so adjacent windows overlap by 10 ms. To process audio in real time, a single inference must therefore finish within 20 ms, and since the CPU also has other work to do, under 10 ms would be better.

#### 7. Quick optimizations

There are many ways to attack this latency, for example:

* enable -O3 optimization
* place the tensor_arena buffer in tightly coupled SRAM, such as DTCM or AXI SRAM, to reduce latency
* enable the L1 cache
* accelerate with the CMSIS-NN and DSP libraries

Here we only verify the effect of turning on the L1 cache, enabling both the I-Cache and the D-Cache:

```
micro_speech audio performance cost: 51 ms.
```

Nearly a 6x speedup! Next up: porting CMSIS-NN to enable hardware acceleration and try to push inference time down to the 10 ms range.