1 Commit

Author SHA1 Message Date
Augustin
b61c8e31a8 Fix: Replace tokenizers crate with custom SimpleTokenizer
Resolve the Windows linker C runtime mismatch by implementing a custom
tokenizer that does not depend on esaxx-rs (which links the static C runtime).

Changes:
- Remove tokenizers crate dependency (caused the /MT vs /MD runtime conflict)
- Add custom SimpleTokenizer in src/ai/tokenizer.rs
  - Loads vocab.txt files directly
  - Implements WordPiece-style subword tokenization
  - Pure Rust, no C++ dependencies
  - Handles [CLS], [SEP], [PAD], [UNK] special tokens

- Update OnnxClassifier to use SimpleTokenizer
- Update ModelConfig to use vocab.txt instead of tokenizer.json
- Rename distilbert_tokenizer() to distilbert_vocab()
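
The tokenizer behavior listed above can be sketched roughly as follows. This is
an illustrative sketch, not the code in src/ai/tokenizer.rs: the method names
(from_vocab_text, encode), the lowercasing, the whitespace-only word splitting,
and the truncate-then-pad policy are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical WordPiece-style tokenizer; names and behavior are assumptions.
pub struct SimpleTokenizer {
    vocab: HashMap<String, u32>,
}

impl SimpleTokenizer {
    /// Build from the contents of a vocab.txt file:
    /// one token per line, token id = zero-based line index.
    pub fn from_vocab_text(text: &str) -> Self {
        let vocab: HashMap<String, u32> = text
            .lines()
            .enumerate()
            .map(|(i, tok)| (tok.trim().to_string(), i as u32))
            .collect();
        SimpleTokenizer { vocab }
    }

    /// Look up a token id; unknown tokens map to [UNK] (or 0 if [UNK] is absent).
    fn id(&self, token: &str) -> u32 {
        self.vocab
            .get(token)
            .or_else(|| self.vocab.get("[UNK]"))
            .copied()
            .unwrap_or(0)
    }

    /// Greedy longest-match-first split of one word into subword ids.
    /// Non-initial pieces carry the "##" continuation prefix.
    fn wordpiece(&self, word: &str) -> Vec<u32> {
        let chars: Vec<char> = word.chars().collect();
        let mut ids = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let mut end = chars.len();
            let mut found = None;
            while end > start {
                let mut piece: String = chars[start..end].iter().collect();
                if start > 0 {
                    piece = format!("##{}", piece);
                }
                if let Some(&id) = self.vocab.get(&piece) {
                    found = Some(id);
                    break;
                }
                end -= 1;
            }
            match found {
                Some(id) => {
                    ids.push(id);
                    start = end;
                }
                // No piece matched: the whole word becomes a single [UNK].
                None => return vec![self.id("[UNK]")],
            }
        }
        ids
    }

    /// Encode text as [CLS] tokens... [SEP], truncated and [PAD]-padded to max_len.
    pub fn encode(&self, text: &str, max_len: usize) -> Vec<u32> {
        let lowered = text.to_lowercase();
        let mut ids = vec![self.id("[CLS]")];
        for word in lowered.split_whitespace() {
            ids.extend(self.wordpiece(word));
        }
        // Leave room for the trailing [SEP] before padding.
        ids.truncate(max_len.saturating_sub(1));
        ids.push(self.id("[SEP]"));
        ids.resize(max_len, self.id("[PAD]"));
        ids
    }
}
```

A caller would read vocab.txt with std::fs::read_to_string, pass the contents to
from_vocab_text, and feed the ids from encode to the ONNX session as input_ids.
A production tokenizer would typically also split punctuation and emit an
attention mask, which this sketch omits.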

Build status:
- Compiles successfully
- Links without C runtime conflicts
- Executable works correctly
- All previous functionality preserved

This resolves the LNK2038 error completely while maintaining full
ONNX inference capability with NPU acceleration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-16 19:38:44 +02:00