I am doing some embedded systems development work and I have an FTDI USB-to-serial bridge chip on the board I am working on. It randomly drops received bytes. This only happens under Arch Linux; I recently switched from Ubuntu, where I had no problems. I have the same problem on another computer that also runs Arch, though the problem there is much worse. I'm assuming it's a kernel interrupt priority issue. I don't really know where to start debugging this, so any input would be much appreciated. Unfortunately, it's quite random. Sometimes it's rock solid and sometimes it drops a byte every few hundred or so, causing some rather annoying issues. Case in point: it seems to be working fine at the moment, but earlier it was wreaking havoc while I was trying to carry out some calibration operations. The other computer I tried it with is consistently bad, though, getting out of sync almost immediately. The first computer (usually good performance) has a high-end 2nd-gen Core i7 (Sandy Bridge), while the second (not so good performance) has an Intel Atom processor.
Last edited by alex.forencich (2011-09-01 07:07:02)
I did some testing on my Eee PC (the Atom machine from the previous post). In Ubuntu, the connection is rock solid: it didn't drop a single byte in 10 minutes of heavy data transfer. In Arch, however, it consistently drops at least one byte within 2 minutes and breaks the synchronization between the desktop application and the board. I am going to add checks to reset the connection if it comes out of sync, but I really want to figure out what the underlying cause is here, since it's definitely not a problem with the USB-serial chip or my firmware.
Do you know if the external device honors hardware handshaking (CTS/RTS/DSR/DTR/CD)? Software handshaking (XON/XOFF)?
Is it possible the default values for the port differ between the distributions?
Try running stty -a -F /dev/yourportname under both Ubuntu and Arch. Check for differences in the flag sections (input, output, control and local modes) and in the line discipline. (I know you know, but check man stty.)
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
The device on the other end is an Atmel XMEGA microcontroller. I did not use any flow control when I wrote the firmware, because the communication routines are very fast: the chip runs at 32 MHz and the USART is interrupt driven. Only one byte at a time gets dropped, so the chip is definitely receiving and responding to the bytes sent. Most commands are 3 bytes and the responses are 3 bytes, so if a byte were lost on the way to the board, a whole response (more than one byte) would go missing.
As for stty, here are the outputs:
Arch:
$ stty -a -F /dev/ttyUSB0
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt
-echoctl -echoke
Ubuntu:
$ stty -a -F /dev/ttyUSB0
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^A; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
-iuclc -ixany -imaxbel -iutf8
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop -echoprt
-echoctl -echoke
Differences: eof (^D vs ^A), hupcl, clocal, icrnl, and ixon
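Spotting those differences by eye is error-prone, so here is a minimal sketch of how the comparison could be automated. It is only an illustration: the helper names stty_flags and differing_flags are made up for this example, and the sample strings are the control and input flag lines copied from the two outputs above. It only compares on/off flags, so the eof difference (a "name = value" setting) is deliberately skipped.

```python
def stty_flags(text):
    """Collect the on/off flags from `stty -a` output into a dict."""
    flags = {}
    for line in text.splitlines():
        if "=" in line or ";" in line:
            continue  # skip 'speed ...;' and 'intr = ^C;' style lines
        for token in line.split():
            # A leading '-' means the flag is switched off.
            flags[token.lstrip("-")] = not token.startswith("-")
    return flags


def differing_flags(a, b):
    """Return the flag names whose state differs between two outputs."""
    fa, fb = stty_flags(a), stty_flags(b)
    return sorted(name for name in fa.keys() & fb.keys() if fa[name] != fb[name])


arch = """-parenb -parodd cs8 -hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff"""
ubuntu = """-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff"""

print(differing_flags(arch, ubuntu))  # ['clocal', 'hupcl', 'icrnl', 'ixon']
```

Of the four, icrnl and ixon are input flags, which makes them the likeliest suspects for mangled or missing received bytes.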
Here is the output of my PC side software when a byte is lost:
Write: 52 01 00
Write: 52 01 04
Write: 52 01 01
Write: 52 01 02
Write: 52 01 22
Write: 52 01 20
Write: 52 01 26
Write: 52 01 27
Write: 52 01 29
Write: 52 01 2a
Read: 00 01 06 00 00 06 00 01 06 00 01
Read from 0100 complete (data: 0001, response 06)
Read from 0104 complete (data: 0000, response 06)
Read from 0101 complete (data: 0001, response 06)
Read: 06 07 fa 06 07 d0 06 ff 74 06 ff 83
Read from 0102 complete (data: 0001, response 06)
Read from 0122 complete (data: 07fa, response 06)
Read from 0120 complete (data: 07d0, response 06)
Read from 0126 complete (data: ff74, response 06)
Read: 06 12 f0 06 26 bd 06
Read from 0127 complete (data: ff83, response 06)
Read from 0129 complete (data: 12f0, response 06)
Read from 012a complete (data: 26bd, response 06)
Write: 52 01 00
Write: 52 01 04
Write: 52 01 01
Write: 52 01 02
Write: 52 01 22
Write: 52 01 20
Write: 52 01 26
Write: 52 01 27
Write: 52 01 29
Write: 52 01 2a
Read: 00 01 06
Read from 0100 complete (data: 0001, response 06)
Read: 00 00 06 00 01 06 00 01 06 07 f7
Read from 0104 complete (data: 0000, response 06)
Read from 0101 complete (data: 0001, response 06)
Read from 0102 complete (data: 0001, response 06)
Read: 06 07 d0 06 ff 72 06 ff 7e 06 04 06 26 bd 06
Read from 0122 complete (data: 07f7, response 06)
Read from 0120 complete (data: 07d0, response 06)
Read from 0126 complete (data: ff72, response 06)
Read from 0127 complete (data: ff7e, response 06)
Read from 0129 complete (data: 0406, response 26)
There are two blocks of register reads in the above section. The first is correct, the second has a missing byte. The numbers are all in hex. The response to 52 01 29 comes back the first time (correctly) as 12 f0 06 but the second time as 04 06. This is a readback value of a power rail voltage, so it fluctuates a bit. As a result, I don't know what was sent to produce the invalid readback of 04 06. I will try to get some more debugging information out of the board. Also, after some further testing, it seems to fail in exactly the same manner every time: same bad response to the same command. I was getting failures on a different command earlier, but I'm having trouble reproducing that now. I wonder if two bytes are getting replaced by 04. According to the ASCII table, 04 (^D) is the control character for End of Transmission.
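For anyone wanting to detect this condition in software, here is a rough sketch of a reply parser that notices the loss of sync. It assumes, based only on the log above, that every reply is two data bytes (high byte first) followed by a status byte of 06. The names ACK and parse_replies are hypothetical and not taken from the actual application.

```python
ACK = 0x06  # assumed status byte, inferred from the log above


def parse_replies(stream):
    """Split a byte stream into (data, status) pairs of 3 bytes each.

    Raises ValueError at the first reply whose status byte is not ACK,
    which is how a dropped byte shows up with this framing.
    """
    replies = []
    # Only walk over complete 3-byte replies; a trailing partial one is ignored.
    for i in range(0, len(stream) - len(stream) % 3, 3):
        data = int.from_bytes(stream[i:i + 2], "big")
        status = stream[i + 2]
        if status != ACK:
            raise ValueError(f"out of sync at offset {i}: status {status:02x}")
        replies.append((data, status))
    return replies


good = bytes.fromhex("12 f0 06 26 bd 06")
print(parse_replies(good))  # two replies, both with status 06
```

Feeding it the bad tail from the second block (04 06 26 bd 06) raises ValueError on the first reply, because 26 lands in the status position. A check like this only detects the problem; it says nothing about which byte went missing.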
Looks like the bytes sent were 13 04 06, so the 13 got eaten somewhere. Hex 13 is XOFF, and with ixon set the tty layer treats it as the stop character and consumes it instead of passing it to the application, so I tried running stty -F /dev/ttyUSB0 -ixon. This seems to have solved the issue. As I had no idea where to start when I wrote the serial interface originally, I looked at gtkterm for how to set up a serial port. It turns out that I forgot to initialize the c_iflag field of the termios struct for the port. After initializing that field to sensible defaults, the problem seems to be solved.
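As an illustration of what "sensible defaults" for a binary link might look like, here is a sketch using Python's termios module, whose constants carry the same names as the flags in C's termios.h. This is not the code from the application or from gtkterm. The helper names raw_attrs and configure_port are made up, and the baud rate is assumed to be set elsewhere. Besides IXON, it also clears ICRNL, which was set in the Arch output above and would rewrite a received 0x0d as 0x0a.

```python
import termios


def raw_attrs(attrs):
    """Return a copy of a tcgetattr() result adjusted for raw binary I/O."""
    iflag, oflag, cflag, lflag, ispeed, ospeed, cc = attrs

    # Input: no software flow control, no CR/NL translation, no stripping.
    iflag &= ~(termios.IXON | termios.IXOFF | termios.IXANY |
               termios.ICRNL | termios.INLCR | termios.IGNCR |
               termios.ISTRIP | termios.INPCK | termios.PARMRK |
               termios.BRKINT | termios.IGNBRK)
    # Output: no post-processing of bytes written to the port.
    oflag &= ~termios.OPOST
    # Local: no line editing, echo, or signal-generating characters.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ECHOE |
               termios.ECHONL | termios.ISIG | termios.IEXTEN)
    # Control: 8 data bits, no parity, 1 stop bit, receiver on, ignore modem lines.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL

    cc = list(cc)
    cc[termios.VMIN] = 1   # read() blocks until at least one byte arrives
    cc[termios.VTIME] = 0  # no inter-byte timeout
    return [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]


def configure_port(fd):
    """Apply the raw settings to an already opened serial port."""
    termios.tcsetattr(fd, termios.TCSANOW, raw_attrs(termios.tcgetattr(fd)))
```

The point of starting from tcgetattr() and clearing flags explicitly is that the result no longer depends on whatever state the distribution, or a previous program, left the port in.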
Although I suppose I now have a nice technique for simulating a lossy connection: just turn on software flow control with stty after the port is open and wait for it to blow up!
Yeah, that is part of the fun when you send binary data through a link that tries to interpret control codes. I've been bit by it scores of times.
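To make the failure mode concrete, here is a toy model of what the reading side sees when ixon is set on the port. It is a simplification that only covers the byte filtering: the real line discipline also pauses and resumes output when it sees these characters. The function name through_ixon is made up for this sketch, and the sample bytes are the ones from the thread.

```python
XON, XOFF = 0x11, 0x13  # default START (^Q) and STOP (^S) characters


def through_ixon(data):
    """Model the reader's view with ixon set: START and STOP bytes are
    consumed by the line discipline and never delivered."""
    return bytes(b for b in data if b not in (XON, XOFF))


sent = bytes.fromhex("13 04 06 26 bd 06")
print(through_ixon(sent).hex(" "))  # 04 06 26 bd 06
```

Any binary payload that happens to contain 0x11 or 0x13 loses that byte, which is why the drops looked random: they depended on the data values, not on timing.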
If you think it is solved, go ahead and edit your first post and add [SOLVED] to the thread title.
Yeah, really. I would have caught it much sooner (and I'm sure I would have realized it was my bad as opposed to a kernel issue or something) if the Ubuntu defaults were the same as Arch's in terms of flow control. The annoying thing about serial ports in Linux, though, is that if you don't know what you're looking for, you can't find good documentation for it. Case in point: I had no knowledge of the stty command until you mentioned it. Additionally, I had to basically borrow gtkterm's code instead of attempting to roll my own. It's interesting the kinds of stuff that you run into doing embedded systems. This was a deterministic problem that showed up randomly. I had a metastability issue in my code quite a while ago that I didn't realize was a metastability issue because it was completely deterministic, always crashing after the same length of time after startup. It was caused by a complex interaction between a PWM module, a timer, and some buggy interrupt handlers.