Solving recurrent error codes

Also, failure modes for NAND flash and NOR flash, and test techniques that can find problems ahead of time

 

Q: I’m having trouble booting from my S25FL256S device with my i.MX28 board. The error code that keeps recurring is 0x80501003, indicating that the file signature or file version is incorrect. Here’s my setup information:

 

  • SPI NOR Flash: S25FL256S_64K
  • Linux BSP: 2.6.35.3-1.1.0
  • U-Boot version: u-boot-2009.08
  • Processor: i.MX281


Figure 1: i.MX28 PCB circuit with modifications

 

Here are some additional details and steps I’ve taken:

 

  • I added the ss1 and ss2 pins to the SPI pin description struct (in the C file).
  • After that, sf probe detected the Spansion flash device. These are the commands I ran:

    • sf probe 2:0
    • sf set_config_reg 0x04
    • sf erase 0x0 0x300000
    • sf write 0x42000000 0x0 0x300000
  • The Linux boot stream image is written to flash. I then set the boot mode to 0010, but I get ROM error code 0x80501003.

A: Thanks for your question. We’ve seen a number of similar queries, so this deserves some discussion. Error code 0x80501003 does indeed indicate that the file signature or file version is incorrect. Below is a series of steps you can take to resolve the problem.

 

1. For starters, install the U-Boot and kernel patches available on the Spansion website. There’s also a useful post called “How To Enable SPI NOR boot for i.MX28” on the Freescale forum.

 

2. Download and review “i.MX28 Building Blocks,” which describes how to boot fully from Spansion SPI flash. This slide deck shows the hardware modifications required on the i.MX28 EVK board before you attempt to boot from the FL129P in quad I/O mode.

 

3. You’re likely already familiar with the board details, programming, boot modes, etc., but it’s worth downloading and reviewing the i.MX28 Applications Processor Reference Manual, especially if you run into other issues after following these steps.

 

4. From the details you provided, I see that you're relocating the 4-kB parameter blocks to the top of the array (high addresses), which is a good choice for boot images. For anyone else following these steps: to make this work, you need to set the configuration register to 0x4 with the command sf set_config_reg 0x04, which you’ve already done.
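To make the value 0x04 concrete, here is a small C sketch that decodes a Configuration Register 1 (CR1) value. The bit positions are our reading of the S25FL-S family data sheet (TBPARM in bit 2, QUAD in bit 1) and should be treated as assumptions; confirm them against the data sheet for your exact part before relying on them. The function names are our own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed S25FL-S Configuration Register 1 (CR1) bit positions;
   confirm against the data sheet for your exact part. */
#define CR1_QUAD   (1u << 1)  /* 1 = quad I/O enabled */
#define CR1_TBPARM (1u << 2)  /* 1 = 4-kB parameter sectors at the top (high addresses) */

/* True if this CR1 value places the 4-kB parameter sectors at the top. */
bool params_at_top(uint8_t cr1) { return (cr1 & CR1_TBPARM) != 0; }

/* True if this CR1 value enables quad I/O mode. */
bool quad_enabled(uint8_t cr1) { return (cr1 & CR1_QUAD) != 0; }
```

Under these assumed positions, 0x04 sets only TBPARM: the parameter sectors move to the top of the array and quad I/O stays disabled.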

 

5. I suspect one reason your system won’t boot is that an image in the wrong format was programmed to the SPI flash (SPI config block, U-Boot…). Here is an example of how your flash content should look at offset 0:

 


Figure 2: Flash content at offset 0

 

For comparison, here is the layout on our system, which also uses the i.MX28 EVK:

 

  • At offset 0, we have the SPI config block.

    • Note: I definitely recommend placing the SPI config block at the very beginning of your flash. It’s the first area the ROM code reads in order to initialize your SPI interface, and it’s needed for code shadowing from SPI flash to RAM. If this block is missing, that might explain your issue.
  • At offset 0x1000, you should find U-Boot.
  • At offset 0x50000, you should find your Linux kernel.

Please dump your flash content after you program the boot images and make sure that you have the right image formats in there.
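One quick way to act on this is to check the dump programmatically. The C sketch below assumes you have already read the flash into a byte buffer, and it only checks that the first bytes at each expected offset are not erased (0xFF); it cannot confirm that the image format itself is correct. The offsets come from the layout above, while the 16-byte probe length and the function names are our own choices.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets from the example layout above; adjust for your own layout. */
struct region { const char *name; size_t offset; };

static const struct region expected[] = {
    { "SPI config block", 0x0 },
    { "U-Boot",           0x1000 },
    { "Linux kernel",     0x50000 },
};

#define PROBE_LEN 16  /* bytes inspected at each offset (arbitrary choice) */

/* True if the bytes at `offset` look erased (all 0xFF) or lie outside the dump. */
bool region_is_erased(const uint8_t *dump, size_t dump_len, size_t offset) {
    if (offset >= dump_len) return true;
    size_t n = dump_len - offset < PROBE_LEN ? dump_len - offset : PROBE_LEN;
    for (size_t i = 0; i < n; i++)
        if (dump[offset + i] != 0xFF) return false;
    return true;
}

/* Prints each expected region that looks empty; returns how many were found. */
size_t count_missing_regions(const uint8_t *dump, size_t dump_len) {
    size_t missing = 0;
    for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++) {
        if (region_is_erased(dump, dump_len, expected[i].offset)) {
            printf("missing or erased: %s at 0x%zx\n", expected[i].name, expected[i].offset);
            missing++;
        }
    }
    return missing;
}
```

A nonzero count, especially for the SPI config block at offset 0, points straight at the programming step rather than at the ROM or the board.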

 

If you want to boot everything from SPI flash, 3 MB is definitely not enough for U-Boot + kernel + rootfs. Booting completely from our FL-S SPI flash is definitely possible, however, and partitioning the flash for kernel + rootfs should work fine.
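To see why 3 MB falls short, it helps to write the partition map down and check it against the device size. The S25FL256S holds 256 Mbit, or 32 MB. In the C sketch below, the first three offsets follow the layout described above, while the 4-MB kernel partition and the rootfs partition are hypothetical placeholders, not recommendations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_SIZE 0x2000000u  /* S25FL256S: 256 Mbit = 32 MB */

struct part { const char *name; uint32_t offset; uint32_t size; };

/* First three offsets follow the example layout; the kernel size and the
   rootfs partition are hypothetical placeholders. */
static const struct part layout[] = {
    { "spi-config", 0x000000, 0x001000 },
    { "u-boot",     0x001000, 0x04F000 },
    { "kernel",     0x050000, 0x400000 },
    { "rootfs",     0x450000, FLASH_SIZE - 0x450000 },
};

/* True if partitions are in order, do not overlap, and fit in flash_size bytes. */
bool layout_fits(const struct part *p, size_t n, uint32_t flash_size) {
    uint32_t end = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].offset < end) return false;                    /* overlaps the previous one */
        if (p[i].offset > flash_size ||
            p[i].size > flash_size - p[i].offset) return false; /* runs off the device */
        end = p[i].offset + p[i].size;
    }
    return true;
}
```

Checked against a 0x300000 (3 MB) limit, the size used in the erase and write commands above, the same map fails as soon as the kernel partition is reached, before any room is left for a root file system.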

 

 

Q: Can you tell me what testing is done on your flash devices and how I can test my own devices? Please help me understand how failures occur and how to prevent them.

 

A: This response may be a bit lengthy, but it should give you a good background and answer your questions. Let’s start with error sources. NAND flash memory is sometimes shipped with bad blocks. This is an unavoidable artifact of even the most tightly controlled fabrication processes and is common among all manufacturers. Also, a small percentage of blocks may fail to program or erase within the rated usage of the device. Bit errors may also occur, so NAND flash requires error correction code (ECC) capable of correcting at least single-bit errors. NOR flash memory is shipped with no bad blocks, and program/erase failures are not expected within the rated usage of the device. Single-bit errors are also unlikely, so ECC is not required as long as the NOR flash memory is used according to its specification.

 

Flash memory, both NOR and NAND, can have mechanical or electrical failures. For example, one or more bits can fail to program, which the device reports as a failure. An erase block can fail an erase operation, which the device also reports as a failure. A bit can also be read incorrectly (flipped), but the device does not report this as a failure. This is known as a soft error, and it can be caused by write disturb (programming nearby bits) or read disturb (repeatedly reading the same location without reprogramming it).

 

NOR flash really only shows program or erase errors when some area of the device is worn out. Program and erase operations typically slow down over the lifetime of the device. Eventually, a program or erase operation exceeds the allowed time and the device reports a failure. At that point, the flash memory device will need to be erased, but this is a comparatively rare occurrence: unless you put a memory device through hundreds of thousands of program/erase cycles over the lifetime of your product, you might never encounter it. To be safe, however, it's good practice to check the device status after every program and erase operation.
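A minimal C sketch of that status check follows. It assumes an SPI NOR part whose status register 1 reports write-in-progress (WIP) in bit 0, erase failure (E_ERR) in bit 5, and program failure (P_ERR) in bit 6, modeled on the S25FL-S family; verify these positions for your device. `read_status_fn` is a hypothetical hook standing in for your driver's Read Status Register command.

```c
#include <stdint.h>

/* Assumed status register 1 bits, modeled on the S25FL-S family;
   verify these positions in your device's data sheet. */
#define SR1_WIP   (1u << 0)  /* program or erase in progress */
#define SR1_E_ERR (1u << 5)  /* erase error */
#define SR1_P_ERR (1u << 6)  /* program error */

typedef enum {
    FLASH_BUSY, FLASH_OK, FLASH_PROGRAM_FAIL, FLASH_ERASE_FAIL, FLASH_TIMEOUT
} flash_result;

/* Hypothetical driver hook that issues Read Status Register and returns SR1. */
typedef uint8_t (*read_status_fn)(void *ctx);

/* Interprets one status read. Error bits are checked before WIP so a failed
   operation is reported as a failure instead of looking merely busy. */
flash_result classify_status(uint8_t sr) {
    if (sr & SR1_P_ERR) return FLASH_PROGRAM_FAIL;
    if (sr & SR1_E_ERR) return FLASH_ERASE_FAIL;
    return (sr & SR1_WIP) ? FLASH_BUSY : FLASH_OK;
}

/* Polls after a program or erase command until it completes, fails,
   or max_polls reads have been made. */
flash_result wait_and_check(read_status_fn read_status, void *ctx, uint32_t max_polls) {
    for (uint32_t i = 0; i < max_polls; i++) {
        flash_result r = classify_status(read_status(ctx));
        if (r != FLASH_BUSY) return r;
    }
    return FLASH_TIMEOUT;
}
```

Bounding the loop with `max_polls` means a hung device surfaces as `FLASH_TIMEOUT` rather than an infinite wait.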

 

NAND flash tends to be somewhat less reliable than NOR flash. As a result, NAND flash data sheets specify the use of ECC to meet the rated program/erase endurance and data retention. Some data sheets require a higher level of ECC, but for most devices a 1-bit Hamming code is sufficient. A cyclic redundancy check (CRC) is an error-detecting code, not an error-correcting code, so it is not an acceptable substitute for ECC.
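To illustrate what a 1-bit Hamming code does, here is a compact C sketch of a single-error-correcting Hamming code over a page buffer. It follows the textbook construction (data bits occupy the non-power-of-two positions, check bits occupy the power-of-two positions) and is not the exact ECC layout that any particular NAND controller or data sheet uses; `hamming_ecc` and `hamming_correct` are our own names.

```c
#include <stddef.h>
#include <stdint.h>

/* True if x is a power of two (these positions hold the check bits). */
static int is_pow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* XOR of the Hamming positions of all set data bits. Data bits are assigned,
   in order, to positions 3, 5, 6, 7, 9, ... (skipping powers of two). */
static uint32_t data_syndrome(const uint8_t *buf, size_t len) {
    uint32_t syn = 0, pos = 1;
    for (size_t i = 0; i < len; i++) {
        for (unsigned b = 0; b < 8; b++) {
            do { pos++; } while (is_pow2(pos));
            if ((buf[i] >> b) & 1u) syn ^= pos;
        }
    }
    return syn;
}

/* Check bits to store alongside the page when it is programmed;
   bit k of the result is the check bit at position 2^k. */
uint32_t hamming_ecc(const uint8_t *buf, size_t len) {
    return data_syndrome(buf, len);
}

/* Returns 0 if no error, 1 if a single-bit error was corrected (in the data
   or in the stored check bits), -1 if the syndrome is out of range. */
int hamming_correct(uint8_t *buf, size_t len, uint32_t stored_ecc) {
    uint32_t syn = data_syndrome(buf, len) ^ stored_ecc;
    if (syn == 0) return 0;
    if (is_pow2(syn)) return 1;  /* a check bit flipped; data is intact */
    uint32_t pos = 1;
    for (size_t i = 0; i < len; i++) {
        for (unsigned b = 0; b < 8; b++) {
            do { pos++; } while (is_pow2(pos));
            if (pos == syn) {
                buf[i] ^= (uint8_t)(1u << b);  /* flip the bad bit back */
                return 1;
            }
        }
    }
    return -1;
}
```

For a 512-byte page, the check bits fit in 13 bits. A plain Hamming code can miscorrect a double-bit error; adding an overall parity bit (SEC-DED) lets you detect that case.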

 

Like NOR flash devices, NAND flash memory slows down and wears out over time. Single-bit errors are more likely to appear later in the device's lifetime than earlier, as read/write cycles accumulate. Because ECC is written when the page is programmed, a single-bit error during a page read is expected to be corrected by software or firmware. In that case, no failure is reported and no further action is needed.

 

Some NAND devices make special allowances for the first erase block. Typically, the first erase block is guaranteed to be good (i.e., not marked as a bad block when shipped from the factory). There may also be an allowance for using the first erase block without ECC as long as the number of erase cycles stays under some limit (e.g., 100 cycles). Please consult your NAND flash data sheet for specific details.

 

If you upgrade your product in the field using NAND flash and the new code is large enough to extend past the first erase block into the next one, this could cause a problem. That next block could be marked bad from the factory, in which case it must not be programmed or erased, so your software/firmware should detect factory bad blocks and avoid them. The software layer that skips or otherwise manages bad blocks in NAND flash memory is often called bad block management. The simplest approach is to skip bad blocks; to make this easier, some flash file systems keep a list of bad blocks and map them out to hide them from higher layers of software.

 

Even if the next erase block is not marked bad from the factory, it could still fail at any time. Does your upgrade software have a mechanism to detect program or erase failure and skip to the next erase block? Can your product handle code in non-contiguous locations? If you are using a NAND flash file system with bad block management, it should be handled for you.
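The C sketch below shows the idea behind both questions: an upgrade writer that skips blocks already marked bad, retires any block whose erase or program fails, and records where each logical block actually landed so that code stored in non-contiguous blocks can still be found. Everything here is hypothetical and simulated: `struct nand_sim` stands in for a real device and bad-block table, and `erase_and_program` stands in for your driver's erase and program sequence with status checks.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 16  /* hypothetical, tiny device for illustration */

/* Simulated device (hypothetical): `bad` is the bad-block table, seeded from
   the factory bad-block markers; `will_fail` marks blocks that fail at run time. */
struct nand_sim {
    bool bad[NUM_BLOCKS];
    bool will_fail[NUM_BLOCKS];
};

/* Stand-in for the real erase + program sequence with status checks.
   Returns false if the device reported an erase or program failure. */
static bool erase_and_program(struct nand_sim *n, size_t blk) {
    return !n->will_fail[blk];
}

/* Writes `count` logical blocks, searching from physical block `start`.
   Skips blocks already marked bad and retires blocks that fail.
   Fills map[logical] = physical; returns how many blocks were written. */
size_t write_image(struct nand_sim *n, size_t start, size_t count, size_t *map) {
    size_t logical = 0;
    for (size_t phys = start; phys < NUM_BLOCKS && logical < count; phys++) {
        if (n->bad[phys]) continue;   /* never erase or program a bad block */
        if (!erase_and_program(n, phys)) {
            n->bad[phys] = true;      /* run-time failure: retire the block */
            continue;
        }
        map[logical++] = phys;
    }
    return logical;
}
```

If `write_image` returns fewer blocks than requested, the region ran out of good blocks and the upgrade should be treated as failed. A real implementation would also persist the bad-block table and the logical-to-physical map.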

 

Typically, NAND flash devices are specified for no more than 2% of the erase blocks to go bad (counting both factory and run-time failures). As I mentioned, this does not include single-bit errors, which should be corrected by ECC.
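As a worked example, consider a hypothetical NAND device with 1024 erase blocks. A 2% limit allows at most

$$N_{\text{bad}} \le \lfloor 0.02 \times 1024 \rfloor = \lfloor 20.48 \rfloor = 20$$

bad blocks over the life of the device. Because you cannot know in advance where those blocks will fall, a conservative plan is to give an image that needs K good blocks a region of K + 20 blocks.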

 

Continuous reads can disturb the stored data and cause the wrong information to be read from the flash until it is reprogrammed. As with a single-bit read error in NAND flash, no error is reported. This is not one of the more common failure modes, so it does not appear in the reliability report. It can affect both NAND and NOR flash memory, but it depends heavily on the device and technology. In real-world use of Cypress flash devices, read disturb should not be a problem.

 

Now let’s return to your question about testing. There are several ways to test the connectivity between the controller and the flash memory. Cypress uses these techniques, and you can too:

 

  • For a TSOP package, you could do a visual inspection of the pin soldering.
  • For a BGA package, you could use x-ray techniques to inspect the connection of the chip to the board.
  • You could use a multimeter to check connectivity between the pin at the chip and the pad, or any applicable test pin or pad, according to your board schematic. To check pin connectivity from software, you don't need to access every location, but make sure to exercise each address and data line.
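For the software check, a common approach is a walking-ones test on the data bus plus a power-of-two address test on the address bus, which touches only a few locations yet exercises every line. The C sketch below assumes a 16-bit-wide memory region that can be written and read back directly (for example, RAM sharing the same bus); on NOR or NAND flash, the plain writes would have to be replaced by the device's erase and program sequences. The function names are our own.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones data bus test at one address: each data line is driven high
   on its own. Returns 0 on success, otherwise the pattern that failed. */
uint16_t data_bus_test(volatile uint16_t *addr) {
    for (uint16_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern) return pattern;
    }
    return 0;
}

/* Address bus test: writes only at power-of-two word offsets so each
   address line is toggled on its own. nwords must be a power of two.
   Returns NULL on success, otherwise a pointer to a suspect location. */
volatile uint16_t *address_bus_test(volatile uint16_t *base, size_t nwords) {
    const uint16_t pattern = 0xAAAA, antipattern = 0x5555;
    const size_t mask = nwords - 1;
    size_t off, test;

    for (off = 1; (off & mask) != 0; off <<= 1) base[off] = pattern;

    /* Address lines stuck high: writing offset 0 must not disturb the others. */
    base[0] = antipattern;
    for (off = 1; (off & mask) != 0; off <<= 1)
        if (base[off] != pattern) return &base[off];
    base[0] = pattern;

    /* Address lines stuck low or shorted together. */
    for (test = 1; (test & mask) != 0; test <<= 1) {
        base[test] = antipattern;
        if (base[0] != pattern) return &base[test];
        for (off = 1; (off & mask) != 0; off <<= 1)
            if (off != test && base[off] != pattern) return &base[test];
        base[test] = pattern;
    }
    return NULL;
}
```

On a healthy bus both functions report success (0 and NULL); a failing pattern or returned offset points to the data or address line to probe with the multimeter.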

 

You can find more information by accessing the following references:

Flash Memory: An Overview

Spansion Quarterly Reliability Report

 

I hope this has been helpful!

 

Also in this issue:

Leverage MCUs to meet the wearables challenge, Whistle canine activity monitor tracks doggie doings, Energy harvesting starter kit makes it easy to go battery free, Quality and reliability–they're not the same, Tetra SDK speeds wearables to market

Get More from Core & Code Subscribe