Understanding adm_test program

Added by Renato Sampaio over 11 years ago

Hi,
I'm studying DS_DMA to use in my own project (delivering network packets to the host through PCIe, like a NIC does), but I can't get past adm_test.
Could you help me, please?

I'm using a SP605 Evaluation Kit from Xilinx.
I successfully implemented the sp605_lx45t_core and programmed its bitstream.
I don't have the Aldec Active-HDL software, so I'm doing everything in Xilinx ISE.

At first I had some trouble with the synthesis/implementation settings, but by studying the Active-HDL configuration file you use, I was able to copy the settings and generate the bitstream.

Now I'm trying to run the adm_test program. My test environment is Ubuntu 12.04 32-bit with GCC 4.6.
I ran into the following situations:
- Running with the provided test_main.cfg as the configuration file, the program's output looks fine at the beginning, but when it gets to the performance statistics there's no information (everything is zero). Attachment: test_main.log
- Running with no config file, it shows performance statistics, but I don't understand the numbers. What do BLOCK_RD and BLOCK_WR stand for? Attachment: test_with_no_cfg.log
- If I don't set the parameter "isSystem" to 1 (i.e. I use user-mode memory), the program crashes with "Alloc memory error".

I don't know if it helps, but I also ran the pex_test program. I'm still not familiar with most of the core's configuration, so I don't understand its output, but maybe it shows something that explains these behaviors. Attachment: pex_test.log

Could you guide me in extending adm_test so I can receive my own data? And if the run with no config file produces the correct output, why are there negative numbers?

I attached the log files to this post. Thank you very much for any help.

pex_test.log - pex_test output (1.5 kB)

test_main.log - Running with the provided test_main.cfg as configuration file (2.2 kB)

test_with_no_cfg.log - Running with no config file (1.7 kB)


Replies (43)

RE: Understanding adm_test program - Added by Dmitry Smekhov over 11 years ago

Hello. I understand the problem. BLOCK_VER=102 works only with chipsets before P55; above P55 only BLOCK_VER=103 works. There is a problem with the CRC calculation for the block of descriptors.
I resolved this problem for Virtex 5 and Virtex 6, but the project for the SP605 must be updated.

A payload size of 256 bytes is normal.

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi,

By BLOCK_VER, you mean the FIFO, right? Well, I used BLOCK_VER 103 with Ubuntu 12.04 32-bit and, although the interrupt count was increasing, no data was getting through. I didn't really test the core with BLOCK_VER 103 on 64-bit Ubuntu. Do you think that might be the answer? I'll try that now.

Thanks!

RE: Understanding adm_test program - Added by Dmitry Smekhov over 11 years ago

Hi
BLOCK_VER is the version of block_ext_fifo. 32-bit Linux has a problem with interrupts; I suppose it is a software problem, and the hardware is OK. Is 64-bit Linux running on another computer? If so, that computer requires the latest version of block_ext_fifo. Your project with 103 is not good; maybe you did not update all the files. I am in Greece now; I return to Moscow on 15.07 and will check the project. Try updating all the files and simulating the project in Active-HDL or ModelSim. The test "test_adm_read_16kb" should show that the DMA channel works correctly.

> Hi, I'm studying DS_DMA to use in my own project (delivering network packets to the host through PCIe, like a NIC does), but I can't get past adm_test. Could you help me, please?
Yes, I will help.

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

I understand. I'll update everything and try simulating it. I'll report results as soon as I have them.

I ran both 64-bit and 32-bit Ubuntu on both computers. On 32-bit, interrupts are raised but no data arrives; on 64-bit, no interrupts are raised at all.
Both computers share the same chipset family (Intel 7 Series, Z77 and B75 models). I'll try to find an older computer to confirm this as well.

Thanks!

RE: Understanding adm_test program - Added by Dmitry Smekhov over 11 years ago

Hi
I made v1.2 build_0x01: http://src.ds-dev.ru/files/sp605_lx45t_core_2013_07_17_v1_2_build_0x01.zip
I have not tested it on the SP605, but it works in the simulator.
Please see the waveforms (src/testbench/ rx.awf, tx.awf, descriptor.awf) and the simulation result (src/log/test.log), though test.log contains some Russian words :-(
Please test the project on the SP605. I hope it will work.

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi, I tested it on 64-bit Ubuntu, with no change.
I'm studying for an exam tomorrow, so I'll take a closer look at the simulation files and the solution by Friday.

I'll also try the 32-bit version later and report here.

Thanks for the help!

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi, good news!

I tested v1.2 build_0x01 on Ubuntu 12.04 LTS 32-bit and it works!
All received blocks are marked as correct blocks, the average speed is 203 MB/s (which makes sense, since the raw link throughput is 250 MB/s), and BRD STATUS behaves just like in your examples (logs attached).

FIFO_STATUS begins with a1, as expected:

ubuntu@ubuntu:~/dev/pcie_ds_dma_linux/application/adm_test/bin$ cat /proc/AMBPEX50 

  Device capabilities

  CAP_ID = 0x5
  Can't find PCI Express capabilities

  Device information
  m_TotalIRQ = 136230

  PE_EXT_FIFO 4

  BLOCK_ID = 18
  BLOCK_VER = 103
  FIFO_ID = 3400
  FIFO_NUMBER = 0
  RESOURCE = 2
  DMA_MODE = 0
  DMA_CTRL = 0
  FIFO_STATUS = a000
  FLAG_CLR = 0
  PCI_ADRL = 0
  PCI_ADRH = 0
  LOCAL_ADR = 0

  PE_EXT_FIFO 5

  BLOCK_ID = 18
  BLOCK_VER = 103
  FIFO_ID = 3400
  FIFO_NUMBER = 1
  RESOURCE = 2
  DMA_MODE = 27
  DMA_CTRL = 1
  FIFO_STATUS = a101
  FLAG_CLR = 0
  PCI_ADRL = 2000
  PCI_ADRH = 0
  LOCAL_ADR = 1000

I'll test again with 64-bit and report here, but right from the start FIFO_STATUS is a0, so I guess something is wrong there.

Even though 64-bit isn't working, I now have somewhere to start from.
My next steps are:
- Generate the v1.2 build_0x01 bitstream myself and test it successfully.
- Insert Ethernet logic into the example design (BRDs 2-5 are available).
- Modify the test application to write incoming Ethernet packets to a file.

There's a lot more after that, like creating my own design that uses the DMA core, writing my own user application, supporting Ethernet register configuration from the app, and writing my packet-handling logic, but I'm glad I'm able to start.

Thank you very much!

out_dio.log (3.1 kB)

test_dio.log (2.8 kB)

test_main.log (2.6 kB)

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi, I tried again and confirmed that it's not working for the 64-bit version.

Could this be the FIFO again? FIFO_STATUS is a0.

One question, though. You said the FIFO version (BLOCK_VER 102) didn't work with the new chipsets because of the CRC calculation for the block of descriptors.
What do you mean by that? I couldn't see many changes in the svn diff to follow in the code. Do the new motherboards send different data? Could you tell me where I can find this difference?

Thanks

RE: Understanding adm_test program - Added by Dmitry Smekhov over 11 years ago

I am glad that the project is running.
203 MB/s is a normal speed, but it is possible to increase it up to 215 MB/s.
The project will work on a 64-bit system if the memory is below 4 GB. This is a bug; my project for Virtex 5 and Virtex 6 works with memory above 4 GB. I will try to correct it, and I hope that you will help me test it.

> One question, though. You said the FIFO version (BLOCK_VER 102) didn't work with the new chipsets because of the CRC calculation for the block of descriptors. What do you mean by that? I couldn't see many changes in the svn diff to follow in the code. Do the new motherboards send different data? Could you tell me where I can find this difference?

I send 4 requests to read the 512 bytes of descriptors. The answers (completions) can arrive in any order, and I write each answer into the DPRAM. But version 1.2 calculates the CRC before writing to the DPRAM, so if the order of the answers changes, the CRC is incorrect. Chipsets below P55 send the answers without changing their order, so there are no errors. Chipsets above P61 change the order. I created a new version (1.3) in which the CRC is calculated after all 512 bytes have been received: a special cycle reads the 512 bytes back from the DPRAM and calculates the CRC. It works on the new chipsets.

RE: Understanding adm_test program - Added by Igor Kazinov over 11 years ago

Information regarding sp605_lx45t_core_2013_07_17_v1_2_build_0x01.zip and a DELL Studio XPS 8100 (Core i7-860, DDR3-1333 2G, DH57MO1, etc.)
Design tested with kubuntu-12.04.2-desktop-amd64.iso
Results:
1) 8GB - failed
2) 4GB - works fine (Opencores SVN rev37 results near 202 MB/sec)

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi,

I tested it with 64-bit Kubuntu and 4 GB of memory, and it worked as you said! I'll help you test everything I'm able to.

> I send 4 requests to read the 512 bytes of descriptors. The answers (completions) can arrive in any order, and I write each answer into the DPRAM. But version 1.2 calculates the CRC before writing to the DPRAM, so if the order of the answers changes, the CRC is incorrect. Chipsets below P55 send the answers without changing their order, so there are no errors. Chipsets above P61 change the order. I created a new version (1.3) in which the CRC is calculated after all 512 bytes have been received: a special cycle reads the 512 bytes back from the DPRAM and calculates the CRC. It works on the new chipsets.

Is this behavior described in the chipset datasheet, or was it only noticed during tests? I downloaded the Intel 5 Series Chipset Datasheet (P55 and below) and the Intel 6 Series Chipset Datasheet (P61 and later) and didn't see any changes in the PCIe functional description. Actually, I didn't find anything similar to what you described above.

My production environment will be a server motherboard, probably with a C600 chipset and more than 4 GB of memory, so I'm really interested in studying the differences between chipsets and how they affect the DMA engine.

RE: Understanding adm_test program - Added by Renato Sampaio over 11 years ago

Hi,

Besides my other question about where I can find this chipset behavior described (the order of PCIe answers), today I was studying the project testbench and got results different from yours.

The difference is only in the beginning of the log file.

Here's the test.log from the repository:

TEST_ADM_READ_16KB
STATUS: A101 - The descriptor is correct.
TRD_STATUS: 006F    STATUS: A191 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A091 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A191 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A1F1 - Finished reading Block. STATUS: A121 - DMA completed. 

Here's mine (I translated the messages in test_pkg.vhd so it's easier for me to read the results):

TEST_ADM_READ_16KB
STATUS: A101 - The descriptor is correct.
TRD_STATUS: 006F    STATUS: A191 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A091 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A191 - Finished reading Block. 
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A101
TRD_STATUS: 004F    STATUS: A1B1 - Finished reading Block. STATUS: A121 - DMA completed. 

Is there something wrong with mine, or is the repository's test log out of date?

Thanks!

RE: Understanding adm_test program - Added by Dmitry Smekhov over 11 years ago

Hi
The file sim\test.log is old; I forgot about it. It is the file for ModelSim.
I use Active-HDL, and I set its output to src\log\test.log.
The last revision of test_pkg is correct.

RE: Understanding adm_test program - Added by Renato Sampaio about 11 years ago

Thanks! I think all is working well.

Reading through the driver code, I noticed that before reading or writing FPGA registers, the address is divided by 4. Could you explain why? It's probably more of a driver-related question, but I didn't find the answer in books, and the address passed to LC_BUS in the VHDL core seems to match the value before dividing by 4.

regards

RE: Understanding adm_test program - Added by Dmitry Smekhov about 11 years ago

Hi
Function for reading a register from a TRD:

    u32 pex_board::core_reg_peek_dir( u32 trd, u32 reg )
    {
        if( (trd>15) || (reg>3) )
            return -1;
        u32 offset = trd*0x4000 + reg*0x1000;
        u32 ret = *(bar1 + offset/4);
        return ret;
    }

offset is the byte address of the register;
bar1 is a pointer to 32-bit words;
so I must convert the byte address into a count of 32-bit words: offset/4;
bar1+offset/4 is then a pointer to the 32-bit register.

http://src.ds-dev.ru/doc/adm/admtest.htm - description, but only in Russian

The first table gives the base address of each TRD.
The second table gives the address of each register within a TRD.
Each register occupies 4096 bytes in the address map.
Each TRD has 4 direct registers.
Formula for the byte address: u32 offset = trd*0x4000 + reg*0x1000;

RE: Understanding adm_test program - Added by Renato Sampaio about 11 years ago

Hi,

Now I think I understand! Thanks!
Please correct me if I'm wrong:

So, first we have a byte address, given by the formula "trd*0x4000 + reg*0x1000", which I followed through the code until ctrl_adsp_v2_decode_data_cs.vhd decodes it to the right register module.
But the mmap'ed bar1 address mapping is 4-byte aligned, so the division is necessary to turn the byte address into a 4-byte address.

Is that why the PCIe TLP address field reserves the 2 LSBs? So that the address is again byte aligned?

RE: Understanding adm_test program - Added by Dmitry Smekhov about 11 years ago

Hi

> So, first we have a byte address, given by the formula "trd*0x4000 + reg*0x1000", which I followed through the code until ctrl_adsp_v2_decode_data_cs.vhd decodes it to the right register module.

Yes.

> But the mmap'ed bar1 address mapping is 4-byte aligned, so the division is necessary to turn the byte address into a 4-byte address.

No. bar1+offset/4 is a rule of the C/C++ language: bar1 is a pointer to 32-bit data, so bar1+1 points to the next 32-bit word (+4 bytes), bar1+2 adds 8 bytes, and so on.

> Is that why the PCIe TLP address field reserves the 2 LSBs? So that the address is again byte aligned?

Yes. The address is always aligned to 4 bytes.
