
generate coreml model ValueError: basic_string #3012

Closed
@MushR00m

Description

$ ./generate-coreml-model.sh large-v3-turbo
Torch version 2.6.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.5.0 is the most recent version that has been tested.
ModelDimensions(n_mels=128, n_audio_ctx=1500, n_audio_state=1280, n_audio_head=20, n_audio_layer=32, n_vocab=51866, n_text_ctx=448, n_text_state=1280, n_text_head=20, n_text_layer=4)
/whisper.cpp/models/convert-whisper-to-coreml.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/whisper.cpp/models/.venv/lib/python3.11/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
/whisper.cpp/models/convert-whisper-to-coreml.py:88: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|████████▉| 6051/6052 [00:00<00:00, 8919.64 ops/s]
Running MIL frontend_pytorch pipeline: 100%|████████████████| 5/5 [00:00<00:00, 21.06 passes/s]
Running MIL default pipeline: 100%|███████████████████████| 87/87 [00:05<00:00, 16.46 passes/s]
Running MIL backend_neuralnetwork pipeline: 100%|███████████| 9/9 [00:00<00:00, 29.21 passes/s]
Translating MIL ==> NeuralNetwork Ops: 100%|█████████████| 5641/5641 [04:14<00:00, 22.18 ops/s]
Traceback (most recent call last):
  File "/whisper.cpp/models/convert-whisper-to-coreml.py", line 320, in <module>
    encoder = convert_encoder(hparams, encoder, quantize=args.quantize)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whisper.cpp/models/convert-whisper-to-coreml.py", line 255, in convert_encoder
    model = ct.convert(
            ^^^^^^^^^^^
  File "/whisper.cpp/models/.venv/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 635, in convert
    mlmodel = mil_convert(
              ^^^^^^^^^^^^
  File "/whisper.cpp/models/.venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 186, in mil_convert
    return _mil_convert(
           ^^^^^^^^^^^^^
  File "/whisper.cpp/models/.venv/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 245, in _mil_convert
    return modelClass(
           ^^^^^^^^^^^
  File "/whisper.cpp/models/.venv/lib/python3.11/site-packages/coremltools/models/model.py", line 489, in __init__
    self.__proxy__, self._spec, self._framework_error = self._get_proxy_and_spec(
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whisper.cpp/models/.venv/lib/python3.11/site-packages/coremltools/models/model.py", line 550, in _get_proxy_and_spec
    _MLModelProxy(
ValueError: basic_string
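The first line of the log is coremltools warning that Torch 2.6.0 is newer than the last version it was tested against (2.5.0). It is probably not the cause of this particular failure, because the later run in this thread prints the same warning and still reaches "done converting". It is still cheap to check before a multi-minute conversion. The following is a minimal sketch. It assumes the 2.5 cutoff taken from this warning and compares only major.minor. Both are assumptions, since other coremltools releases test against other Torch versions.

```python
"""Warn before conversion if the installed torch is newer than the last
version coremltools reports as tested (2.5.0 in the warning above)."""
from importlib.metadata import PackageNotFoundError, version

# Assumption: cutoff copied from the coremltools warning in this log;
# other coremltools releases may have been tested with newer torch versions.
MAX_TESTED_TORCH = (2, 5)


def major_minor(ver: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as '2.6.0' or '2.5.1+cpu'."""
    parts = ver.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def torch_is_tested(ver: str, max_tested: tuple[int, int] = MAX_TESTED_TORCH) -> bool:
    """True if the major.minor version is at or below the tested cutoff."""
    return major_minor(ver) <= max_tested


def main() -> None:
    try:
        installed = version("torch")
    except PackageNotFoundError:
        print("torch is not installed in this environment")
        return
    if torch_is_tested(installed):
        print(f"torch {installed}: within the tested range")
    else:
        print(f"torch {installed}: newer than the last tested version "
              f"{MAX_TESTED_TORCH[0]}.{MAX_TESTED_TORCH[1]}")


if __name__ == "__main__":
    main()
```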

Running ./generate-coreml-model.sh base.en works fine.

Checking out v1.7.4 also works fine.
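Because base.en converts and large-v3-turbo does not, a small harness that runs the script once per model makes the comparison repeatable, for example before and after switching commits. This is a hypothetical sketch. The script path is an assumption, and so is the premise that the script exits with a non-zero status on failure. The thread does not confirm that every failing step propagates.

```python
"""Run the CoreML generation script for several models and record which
ones succeed. Hypothetical harness; the script path is an assumption."""
import subprocess
from typing import Callable, Dict, Iterable

# Assumption: run from the directory that contains generate-coreml-model.sh.
SCRIPT = "./generate-coreml-model.sh"


def run_script(model: str) -> int:
    """Run the generation script for one model and return its exit status."""
    return subprocess.run([SCRIPT, model]).returncode


def check_models(models: Iterable[str],
                 run: Callable[[str], int] = run_script) -> Dict[str, bool]:
    """Map each model name to True if the runner returned status 0."""
    return {model: run(model) == 0 for model in models}


if __name__ == "__main__":
    results = check_models(["base.en", "large-v3-turbo"])
    for model, ok in results.items():
        print(f"{model}: {'ok' if ok else 'FAILED'}")
```

The runner is injectable so the bookkeeping can be checked without running a real conversion.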

Activity

cssyncing commented on Apr 18, 2025

Same error here. Did you figure out a workaround?

MushR00m commented on Apr 18, 2025
Author

> Same error here. Did you figure out how to cater?

Check out v1.7.4 (git checkout v1.7.4).

self-assigned this on Apr 18, 2025

danbev commented on Apr 19, 2025
Collaborator

@MushR00m @cssyncing Would one of you be able to see if #3060 works for you?

MushR00m commented on Apr 20, 2025
Author

> @MushR00m @cssyncing Would one of you be able to see if #3060 works for you?

Thanks for the fix. It looks like it works.
Here is the output log. However, macOS produces a crash report after the script finishes. I didn't look at it closely, but it appears to be a crash report from the coremlcompiler tool in the Xcode toolchain.

$ ./generate-coreml-model.sh large-v3-turbo
Torch version 2.6.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.5.0 is the most recent version that has been tested.
100%|█████████████████████████████████████| 1.51G/1.51G [01:11<00:00, 22.6MiB/s]
ModelDimensions(n_mels=128, n_audio_ctx=1500, n_audio_state=1280, n_audio_head=20, n_audio_layer=32, n_vocab=51866, n_text_ctx=448, n_text_state=1280, n_text_head=20, n_text_layer=4)
/private/tmp/whisper.cpp/models/convert-whisper-to-coreml.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape[::-1], "incorrect audio shape"
/*/whisper.cpp/models/.venv/lib/python3.11/site-packages/ane_transformers/reference/layer_norm.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert inputs.size(1) == self.num_channels
/private/tmp/whisper.cpp/models/convert-whisper-to-coreml.py:88: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  scale = float(dim_per_head)**-0.5
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████▉| 6051/6052 [00:00<00:00, 8903.45 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████████████████████████████████████████████████| 5/5 [00:00<00:00, 23.27 passes/s]
Running MIL default pipeline: 100%|█████████████████████████████████████████████████████████████| 87/87 [00:06<00:00, 13.50 passes/s]
Running MIL backend_neuralnetwork pipeline: 100%|█████████████████████████████████████████████████| 9/9 [00:00<00:00, 26.16 passes/s]
Translating MIL ==> NeuralNetwork Ops: 100%|███████████████████████████████████████████████████| 5641/5641 [03:55<00:00, 23.97 ops/s]
done converting
libc++abi: terminating due to uncaught exception of type std::length_error: basic_string
./generate-coreml-model.sh: line 31: 26698 Abort trap: 6           xcrun coremlc compile models/coreml-encoder-"${mname}".mlpackage models/
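In this run the Python conversion finishes ("done converting") and the failure moves to the separate compile step. There, xcrun coremlc compile aborts with an uncaught std::length_error whose message is basic_string, the same text as the ValueError in the original report. Bash reports this as "Abort trap: 6", which is SIGABRT. The following sketch reruns just that step and decodes its exit status. It assumes Python on macOS with Xcode's xcrun on PATH. The file name compile_check.py and the helper names are hypothetical.

```python
"""Rerun the compile step from the generation script on its own and report
whether the compiler was killed by a signal (e.g. SIGABRT, 'Abort trap: 6')."""
import signal
import subprocess
import sys


def _signal_name(num: int) -> str:
    """Return e.g. 'SIGABRT' for 6, or a generic label for unknown numbers."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def describe_returncode(rc: int) -> str:
    """Interpret a POSIX exit status.

    subprocess reports a child killed by signal N as -N; a shell reports
    the same event as exit status 128 + N (so SIGABRT shows up as 134).
    """
    if rc == 0:
        return "ok"
    if rc < 0:
        return f"killed by {_signal_name(-rc)}"
    if rc > 128:
        return f"exited with status {rc} (shell convention for {_signal_name(rc - 128)})"
    return f"exited with status {rc}"


def compile_mlpackage(mlpackage: str, out_dir: str) -> str:
    """Run the compile command from line 31 of the script, without bash in between."""
    proc = subprocess.run(["xcrun", "coremlc", "compile", mlpackage, out_dir])
    return describe_returncode(proc.returncode)


if __name__ == "__main__":
    # Hypothetical usage:
    #   python compile_check.py models/coreml-encoder-large-v3-turbo.mlpackage models/
    print(compile_mlpackage(sys.argv[1], sys.argv[2]))
```

Calling the compiler directly rather than through bash means the abort surfaces as -6 instead of 134. It also separates a compiler crash from a failure earlier in the script.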

Here are some of the coremlcompiler crash logs.
Of course, this could be a problem with my Xcode environment rather than with whisper.cpp, because I also get two crash reports when compiling whisper.cpp. I'm posting this simply to find out whether it's a problem with my environment 😄

Process:               coremlcompiler [26698]
Path:                  /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/coremlcompiler
Identifier:            coremlcompiler
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        Exited process [26430]
Responsible:           iTerm2 [705]
User ID:               501

Date/Time:             2025-04-20 19:13:14.9319 +0800
OS Version:            macOS 15.4 (24E248)
Report Version:        12
Anonymous UUID:        x

Sleep/Wake UUID:      x

Time Awake Since Boot: 290000 seconds
Time Since Wake:       13966 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 6 Abort trap: 6
Terminating Process:   coremlcompiler [26698]

Application Specific Information:
abort() called


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x18b71c388 __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x18b75588c pthread_kill + 296
2   libsystem_c.dylib             	       0x18b65ec60 abort + 124
3   libc++abi.dylib               	       0x18b70b39c abort_message + 132
4   libc++abi.dylib               	       0x18b6f9cf0 demangling_terminate_handler() + 316
5   libobjc.A.dylib               	       0x18b380d84 _objc_terminate() + 172
6   libc++abi.dylib               	       0x18b70a6b0 std::__terminate(void (*)()) + 16
7   libc++abi.dylib               	       0x18b70dc48 __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 88
8   libc++abi.dylib               	       0x18b70dbf0 __cxa_throw + 92
9   libc++.1.dylib                	       0x18b6772cc std::__1::__throw_length_error[abi:ne190102](char const*) + 72
10  libc++.1.dylib                	       0x18b677284 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__throw_length_error[abi:ne190102]() const + 24
11  libc++.1.dylib                	       0x18b67d6c4 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__grow_by_and_replace(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, char const*) + 324
12  libc++.1.dylib                	       0x18b67e214 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::append(char const*, unsigned long) + 108
13  coremlc                       	       0x104cf7e4c 0x104518000 + 8257100
14  coremlc                       	       0x104befe08 0x104518000 + 7175688
15  coremlc                       	       0x104beba30 0x104518000 + 7158320
16  coremlc                       	       0x104cfb958 0x104518000 + 8272216
17  coremlc                       	       0x104bbf624 0x104518000 + 6977060
18  coremlc                       	       0x104bd21b8 0x104518000 + 7053752
19  coremlc                       	       0x104bd20ac 0x104518000 + 7053484
20  coremlc                       	       0x104bd3880 0x104518000 + 7059584
21  coremlc                       	       0x104bd42f4 0x104518000 + 7062260
22  coremlc                       	       0x104bd07a8 0x104518000 + 7047080
23  coremlc                       	       0x104569094 0x104518000 + 331924
24  coremlc                       	       0x104639770 0x104518000 + 1185648
25  coremlc                       	       0x1046375d8 0x104518000 + 1177048
26  dyld                          	       0x18b3b6b4c start + 6000
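Frames 9 to 12 show std::basic_string::append growing the string and throwing std::length_error. The C++ standard requires this when the resulting size would exceed max_size(). Frames 13 to 25 are inside coremlc itself and are not symbolicated. Each of these frames shows only a return address, the image load address, and a decimal offset, where the offset is the address minus the load address. The small parser below checks this relationship. It is a sketch, and its regex is an assumption that covers only the unsymbolicated "image 0xADDR 0xBASE + N" form seen here.

```python
"""Parse unsymbolicated frames from a macOS crash report and check that the
decimal offset equals return address minus image load address."""
import re
from typing import NamedTuple, Optional

# Matches lines like "13  coremlc   0x104cf7e4c 0x104518000 + 8257100".
# Symbolicated frames ("... start + 6000") intentionally do not match.
FRAME_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<image>\S+)\s+"
    r"(?P<addr>0x[0-9a-fA-F]+)\s+(?P<base>0x[0-9a-fA-F]+)\s+\+\s+(?P<offset>\d+)\s*$"
)


class Frame(NamedTuple):
    index: int
    image: str
    addr: int
    base: int
    offset: int


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for an unsymbolicated line, or None otherwise."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(int(m["index"]), m["image"], int(m["addr"], 16),
                 int(m["base"], 16), int(m["offset"]))


if __name__ == "__main__":
    f = parse_frame("13  coremlc   0x104cf7e4c 0x104518000 + 8257100")
    assert f is not None and f.addr - f.base == f.offset
    print(f"{f.image} frame {f.index}: offset {hex(f.offset)} into the image")
```

For frame 13 this prints coremlc frame 13: offset 0x7dfe4c into the image. That offset locates the code that called append within the coremlc binary.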

Compile crash logs:

Process:               cmTC_6df9f [25480]
Path:                  /private/tmp/*/cmTC_6df9f
Identifier:            cmTC_6df9f
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        cmake [24632]
Responsible:           iTerm2 [705]
User ID:               501

Date/Time:             2025-04-20 18:55:03.9125 +0800
OS Version:            macOS 15.4 (24E248)
Report Version:        12

Time Awake Since Boot: 290000 seconds
Time Since Wake:       12875 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x00000000d503437f

Termination Reason:    Namespace SIGNAL, Code 4 Illegal instruction: 4
Terminating Process:   exc handler [25480]

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   cmTC_6df9f                    	       0x100bfff7c main + 20
1   dyld                          	       0x18b3b6b4c start + 6000
Process:               cmTC_d7c7b [25346]
Path:                  /private/tmp/*/cmTC_d7c7b
Identifier:            cmTC_d7c7b
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        cmake [24632]
Responsible:           iTerm2 [705]
User ID:               501

Date/Time:             2025-04-20 18:55:03.2644 +0800
OS Version:            macOS 15.4 (24E248)
Report Version:        12

Time Awake Since Boot: 290000 seconds
Time Since Wake:       12874 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x00000000043f57bf

Termination Reason:    Namespace SIGNAL, Code 4 Illegal instruction: 4
Terminating Process:   exc handler [25346]

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   cmTC_d7c7b                    	       0x1006d7f14 main + 4
1   dyld                          	       0x18b3b6b4c start + 6000
danbev commented on Apr 20, 2025
Collaborator

Sorry about this; I'm getting this error as well. While looking into it I had been running the Python script directly and never ran the conversion bash script, so I did not notice this. I'll take another look to see what the issue is and whether the current fix is just hiding this error.

Update: I've updated the linked PR with a commit which tries to address this.

added a commit that references this issue on Apr 23, 2025
danbev commented on Apr 24, 2025
Collaborator

Closing this as #3060 has been merged. Please let us know if this did not fix this issue for you.


        generate coreml model ValueError: basic_string · Issue #3012 · ggml-org/whisper.cpp