Description
Describe the bug
SPI transfer16() produces two 8 byte transfers rather than one 16 bit transfer.
To Reproduce
Here is a code that is used for testing SPI transfers. In this instance, the CS pin tells an ADC to convert its input to digital, then 730 nanoseconds later the data is ready for a 16 bit SPI transfer.
With an SPI clock of 50kHz, this should be able to run at close to 1MSPS.
for (int n = 0; n < NKNTS; n++ ) {
// For the ADC
DIGITALWRITE(CSPin,HIGH);
DELAYNANOSECONDS(ADC_WAIT_NANOSECONDS);
DIGITALWRITE(CSPin,LOW);
// For the oscilloscope
DIGITALWRITE(SPAREPIN,HIGH);
// 16 bit transfer
data[n] = SPI.transfer16(0xFFFF);
// For the oscilloscope
DIGITALWRITE(SPAREPIN,LOW);
}
`
And here is the oscilloscope trace. Notice that rather than one 16 bit transfer, we have two 8 bit transfers. And note that they are separated by 400 nsecs, but there is 1200 nsecs before and after.
Here is the implementation, for transfer16() it calls spi_transfer( .... length )
Arduino_Core_STM32/libraries/SPI/src/SPI.cpp
Lines 179 to 195 in 89a2951
And, in utility/spi.c, we find that spi_transfer() simply loops over LL_SPI_TransmitData8( ).
Arduino_Core_STM32/libraries/SPI/src/utility/spi_com.c
Lines 482 to 550 in 89a2951
You already have a LL_SPI_TransmitData16( ), under hardware/stm32/2.8.1/system/Drivers/
What is missing in this implementation is a spi_transfer16() and to change transfer16() to call it.
Thank you
Activity
[-]spi transfer16 is not 16 bit, and excessive overhead[/-][+]spi transfer16 is not 16 bit[/+]fpistm commentedon Sep 13, 2024
Hi @drmcnelson
This is usual implementation in several Arduino core:
https://github.com/arduino/ArduinoCore-renesas/blob/fedf7896ed6c2e66d42ed5642198f5617a2daa55/libraries/SPI/SPI.cpp#L206-L219
https://github.com/arduino/ArduinoCore-sam/blob/790ff2c852bf159787a9966bddee4d9f55352d15/libraries/SPI/src/SPI.cpp#L199-L214
https://github.com/arduino/ArduinoCore-megaavr/blob/5e639ee40afa693354d3d056ba7fb795a8948c11/libraries/SPI/src/SPI.cpp#L238-L249
https://github.com/arduino/ArduinoCore-samd/blob/993398cb7a23a4e0f821a73501ae98053773165b/libraries/SPI/SPI.cpp#L242-L253
It could probably be enhanced.
Feel free to provide a PR.
drmcnelson commentedon Sep 13, 2024
Yes, notice that they are derived from the same earlier version, The earliest version that I found, ran on the uno, an 8bit 16MHZ legacy platform marketed to hobbyists.
Adapting it in form to the much faster 16 and 32 bit platforms, leaving the internal kluges and inefficiencies in place was amateur and hurried, and the result is simply not correct in many places. (I am referring to the history of this code, not a specific person).
Two errors that require urgent attention are: (a) 16 bit transfers implemented as 8 bit transfers, and (b) failure to also provide a loop friendly interface (i.e., separate calls for setup and transfer, rather than forcing setup and transfer into every call)
The erroneous implementation of 16 bit transfers as two 8 bit transfers, alone, reduces performance by something like a factor of 2 or more. Forcing setups into very transfer and not providing a loop friendly API, reduces performance by a solid factor of 8.
The result is that the API is therefore incompatible with a large class of important peripherals; c.f., any device that requires a pin assertion between transfers and better than 250KSPS transfers. An ADC reading a linear CCD would be a good example.
Sometimes there has been push back on fixing this, it is usually about the transaction interface, and it usually comes from that same hobbyist crowd. The logic does not hold up. Reserving the device, even pushing the settings onto a stack, if you think you must despite it being an on-the-metal platform, should not be effected by what you do with it once it is reserved.
Concluding this introduction, lets note that limiting a 200MHz MCU with a 50MHZ SPI clock to 250ksps transfers, only because the code was inherited from a slow 8 bit legacy platform that catered to hobbyists, does not make a lot of sense and it is not good for the platform.
My solution for that for the UNO R4, is here . There is also a solution for the Teensy 4.x, the discussion is here
In both of those, the patch (a) implements a true 16 bit transfer, and (b) provides two new calls, one for the setup and another for the actual transfer.
A loop in a user program can look like this:
Note that with a 50MHZ SPI clock, this could run at close to 1MSPS.
The STM32 could be a serious platform. It deserves a little more attention to the API for the SPI, one of its most important peripheral interfaces.
(BTW The original title referred also to overheads. Note that is indeed part of the problem. But feel free to choose a title that makes sense.)
drmcnelson commentedon Sep 13, 2024
P/S, i forgot, in the loop-friendly API, you may want a "restore" as well. Looking at the transfer times. I see there is only 400 nsecs between the two 8 bit transfers, but 1200nsecs before and 1100nsecs after. So that means a correct loop-friendly call, besides the 16bits/50MHZ, only needs an additional 400nsecs.
fpistm commentedon Sep 13, 2024
Main issue with your setup/transfer api's is it is not Arduino api compatible.
drmcnelson commentedon Sep 13, 2024
What makes it incompatible?
a) Those calls that I suggest, are what is missing from the Arduino API to make it actually useful.
b) You already added new calls and functionality to the API, setSSEL() etc.
drmcnelson commentedon Sep 13, 2024
Looking at the code again, if you could save me a little time and explain what the conditional compiles are about in this, I might try a patch and you can see if you like it enough for a PR.
Arduino_Core_STM32/libraries/SPI/src/utility/spi_com.c
Lines 482 to 550 in 89a2951
fpistm commentedon Sep 16, 2024
Well several examples already exists, if new one are created using those new API they will make it incompatible with other core.
the
setxxx
are dedicated but only for setup purpose and it is easy to use it and having a sketch compatible with other core:While new API simply change the way you implement the SPI transaction.
About the "conditional compiles", it is simply to have the code working for each STM32 series which can have different SPI IP and so required different configuration or API.
drmcnelson commentedon Sep 16, 2024
I am sorry, that is not correct at alll.
Incompatibility means that something you add or change, causes something in the existing API to no longer meet it's specification.
Think of it in terms of code that uses the library: As long as existing code that uses the library continues to work to spec, then an added function cannot be said to be incompatible.
You have undoubtedly seen the term "backward compatibility".
In this case, transfer16() was one such example, I remember when it was added. Teensy did it correctly, Arduino did not. But both are lacking a way to run an efficient loop.
As long as you insist on this very incorrect position, your SPI implementation will be limited to taking 5 usecs for transfers that should take 320 nsecs.
Aside, your SPI.setxxx() would of course be backward compatible. And I am very interestred in setSSEL(), but unfortunately when I add that call to my setup(), it crashes.
P/S Adding SPI.transfe16_setup() and SPI.transfer16_transfer(), will of course help sell the board and your API.
The 50MHz SPI is a big deal, it is a terrible shame that is squandered by stopping short when you already did so much work and you are already so close.
fpistm commentedon Sep 16, 2024
My concern is about the example (sketch) which "should" as much as possible be compatible between core. This is the way Arduino works. That's all 😉
I don't insist 😉 we discuss. I never told it will not be implemented. This is a community project, any contribution are welcome.
See my answer on the forum -> https://www.stm32duino.com/viewtopic.php?p=14825#p14825
As usual wrong usage.
Why? First goal was to provide support for STM32 series in Arduino with Arduino API references. I know it provides limitations and it could be improved but as always time is the missing ingredient. Several other features required to be implemented or enhanced., do my best. As stated any contribution are welcome.
drmcnelson commentedon Sep 16, 2024
Correcting transfer16() to do 16bit transfers, and adding loop friendly calls should not change the way the existing example works.
Yes I apologize for sounding argumentative. There is a lot of resistance and appeal to dogma in the community. The transaction api is a good example.
Thank you, I am in the lab at the moment and dont have that log-in. I see that it was the wrong pin., no doubt a result of failing to digest the datasheet.
Are D13, D12, D11 then not the pins that it is using for the SPI? My intent was to use the next pin, as is customary,
Also I wasnt sure which SPI it is using, it seems like SP1,
Because that it is what enables 1MSPS reads and that ties into a large swath of applications that are important in using arduino class boards for doing science.
My philosophy is that while I might not implement everything, I try to at least, not introduce any limitation not present in the hardware. So there is a difference between fully supporting a rich set of features provided by the hardware, and for that I agree, who has time. Unnecessarily choking off performance is something else, and that is what the historic SPI library does. Fortunately, it can be fixed with a patch or two plus a few extra calls that do not disturb the API already in place.
drmcnelson commentedon Sep 16, 2024
Are you using CLA for PR's?
If not, then I might take a look at patching the library, perhaps even today.
It looks like you did a good job with factoring, so it seems like it should be easy.
fpistm commentedon Sep 16, 2024
D11 to D13 are usually the default SPI pins (for boards with Arduino UNO connector) and D10 used a CS. But in a general way CS is managed by the application (software). When you used
setSSEL()
, you ask to use the Hardware CS and so no need to drive the pin at application level.For the Nucleo F722, unfortunately the D10 pin does not have this capability.
To go deeper, when a board have a Arduino connector, default SPI pins are D10 to D13:
Arduino_Core_STM32/cores/arduino/pins_arduino.h
Lines 43 to 65 in 89a2951
In our case for Nucleo F722:
Arduino_Core_STM32/variants/STM32F7xx/F722Z(C-E)T_F732ZET/variant_NUCLEO_F722ZE.h
Lines 29 to 31 in 89a2951
Thanks to the
PeripheralPins.c
:Arduino_Core_STM32/variants/STM32F7xx/F722Z(C-E)T_F732ZET/PeripheralPins.c
Line 288 in 89a2951
Arduino_Core_STM32/variants/STM32F7xx/F722Z(C-E)T_F732ZET/PeripheralPins.c
Line 307 in 89a2951
Arduino_Core_STM32/variants/STM32F7xx/F722Z(C-E)T_F732ZET/PeripheralPins.c
Line 322 in 89a2951
So to answer your question SPI uses
SPI1
.Finally, you can check in the below table which SPI SSEL to use:
Arduino_Core_STM32/variants/STM32F7xx/F722Z(C-E)T_F732ZET/PeripheralPins.c
Lines 338 to 349 in 89a2951
No. Only follow the CONTRIBUTING.md
drmcnelson commentedon Sep 16, 2024
Thank you, quite elaborate and seems like several layers of naming. So it seems like SPI1 uses PA_5,PA_6,PA_7 and PA_4 for CS.
What is confusing then, is this from the user manual (UM1974), page 58, where it says the arduino SPI uses PA5, PA6, PA7, as D13,D12,D11, and for D10 it is not PA4, but rather PD14
So, bottom line, what goes in the call to setSSEL(), and what pin does it appear on?
Thank you
drmcnelson commentedon Sep 16, 2024
I found it, page 32 in the user manual. If one uses an Arduino shield adapter, the pin is covered. So that means it needs a custom adapter board. Not a big deal, the first thing though is to see the timing and whether it works with the ADC.
7 remaining items