The most exciting thing about this world is its ever changing quality.

Thursday, July 30, 2009

Distributed switch architecture



Distributed switch architecture, known as DSA, has been on the market since Marvell first introduced it in 2004. DSA was targeted at providing a solution for distributed cascading or stacking broadband switching/routing topologies. This brings a lot of flexibility to distributed network systems, at low cost and with high switching performance. On the left is a simple picture which explains one of the in-memory switch structures.

I am not going to talk too much about the technology here, but rather introduce how to bring this type of chip up from an external processor.

Normally, what you would have is an external processor connected to the DSA chip via the MII (Media Independent Interface) bus. If you have worked at the device driver level on an Ethernet controller card or a Phy device, you should be quite familiar with this bus. Within this bus, a couple of lines are dedicated to the MDIO (Management Data Input/Output) bus, which carries most of the Phy device control data.
  • MDIO is a bus structure defined for the Ethernet protocol. It is used to connect MAC devices with PHY devices, providing a standardized access method to internal registers of PHY devices.
  • MII is a standard interface used to connect an Ethernet MAC block to a PHY, which means any type of PHY device can/should be supported and accessed via a defined standard register set. Equivalents of MII for other speeds are AUI (for 10 megabit Ethernet), GMII (for gigabit Ethernet), and XAUI (for 10 gigabit Ethernet).
    REG 0 - Basic Mode Configuration
    REG 1 - Status Word (You can use this to detect whether an Ethernet NIC is connected to a network)
    REG 2,3 - PHY Identification
    REG 4 - Ability Advertisement
    REG 5 - Link Partner Ability
    REG 6 - Auto Negotiation Expansion
  • Phy is the actual device which connects to the physical medium (i.e. a cable or wireless channel). It provides a set of registers which forms an interface for (Phy device) drivers to manipulate. Although these devices are distinct from the network devices and conform to a standard register layout, it has been common practice to integrate the Phy driver with the network driver. BUT I DO NOT RECOMMEND THIS!
The DSA chips are usually embedded in access points and routers, and a typical setup with a DSA switch looks something like this:
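To make the register list above a bit more concrete, here is a minimal Python sketch of decoding REG 1 and REG 2,3 once the raw 16-bit values have been read over MDIO. The bit positions (link status in bit 2, auto-negotiation complete in bit 5) follow the standard MII register definitions; the function names and the sample values are mine, purely for illustration.

```python
# Bit positions in the standard MII status register (REG 1).
BMSR_LINK_STATUS = 1 << 2        # 1 = link is up
BMSR_AUTONEG_COMPLETE = 1 << 5   # 1 = auto-negotiation has finished


def decode_status(bmsr: int) -> dict:
    """Decode the two status bits a driver usually looks at first."""
    return {
        "link_up": bool(bmsr & BMSR_LINK_STATUS),
        "autoneg_complete": bool(bmsr & BMSR_AUTONEG_COMPLETE),
    }


def phy_identifier(reg2: int, reg3: int) -> int:
    """Combine REG 2 (high word) and REG 3 (low word) into one 32-bit ID."""
    return ((reg2 & 0xFFFF) << 16) | (reg3 & 0xFFFF)


if __name__ == "__main__":
    # Illustrative raw values only, not taken from any real device.
    print(decode_status(0x786D))
    print(hex(phy_identifier(0x1234, 0xABCD)))
```

A non-Phy switch may reuse these register numbers for something else entirely, so this decoding only applies where the datasheet says the standard layout is honoured.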


From the CPU's point of view, this external (DSA) device needs to be driven to accept switch configuration and commands. So you immediately get the first clue: we need some sort of device driver to initialise and drive the DSA chip. Take Linux as an example. Supposing you have the Linux kernel running on the left-side CPU, you could develop your own device driver to talk to the MII bus (actually the party you are really interested in talking to is the device on the other end, via the MDIO bus), which is well defined in the Linux kernel as the mdio_bus module.

The problem is that the switch is a child node on the MDIO bus, but it is NOT a standard Phy device, although it normally has a similar set of Phy registers for you to poke around in. Well, you have got to know what you are poking here!

There are, in short, two ways to drive the switch. One is the quick but dirty way: you still use the support for Phy devices and define your own private module data structure that maps to the specific register layout of the non-Phy switch device. You then use the lower-level MDIO bus read and write calls to control the registers directly and issue commands. (The switch will normally have a defined list of register bit values for its switching or configuration commands.) This approach is suitable if you are working on an early version of the 2.6 kernel which does not have the DSA kernel driver extension.
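The shape of the quick but dirty approach can be sketched in a few lines. This is Python rather than kernel C, the bus is a fake one backed by a dictionary, and every address, register number and bit value below is hypothetical; the real ones come from the datasheet. The only point is the pattern: a private register map plus read-modify-write over raw MDIO reads and writes.

```python
class FakeMdioBus:
    """Stand-in for the real MDIO bus: 16-bit registers kept in a dict."""

    def __init__(self):
        self._regs = {}

    def read(self, addr: int, reg: int) -> int:
        return self._regs.get((addr, reg), 0)

    def write(self, addr: int, reg: int, value: int) -> None:
        self._regs[(addr, reg)] = value & 0xFFFF


# Hypothetical private register layout for a non-Phy switch device.
SWITCH_ADDR = 0x10
REG_PORT_CONTROL = 0x04
PORT_STATE_MASK = 0x0003
PORT_STATE_FORWARDING = 0x0003


class SwitchRegs:
    """Private module data: knows the switch layout, talks raw MDIO."""

    def __init__(self, bus, addr: int = SWITCH_ADDR):
        self.bus = bus
        self.addr = addr

    def update_bits(self, reg: int, mask: int, value: int) -> int:
        """Read-modify-write so bits outside the mask are preserved."""
        old = self.bus.read(self.addr, reg)
        new = (old & ~mask & 0xFFFF) | (value & mask)
        self.bus.write(self.addr, reg, new)
        return new

    def set_port_forwarding(self) -> int:
        """Issue one 'command' by setting the (hypothetical) state bits."""
        return self.update_bits(
            REG_PORT_CONTROL, PORT_STATE_MASK, PORT_STATE_FORWARDING
        )
```

Keeping the register map in one place like this at least means that when the datasheet turns out to differ from what you assumed, there is only one spot to fix.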

If you are working on the latest version of the Linux kernel, then you are lucky enough to use the DSA framework, where a nice structure called dsa_switch is introduced. By using this, you can literally treat the DSA device as an external device driven by a standard kernel device driver. The DSA module takes care of all the lower-level bus access for you. All that is left is for you to find out from the datasheet what value has to be written to what address (normally you have to obtain the datasheet from the chip manufacturer under NDA, though). Your driver can iterate through all the ports on the switch and present each port as a separate network interface to Linux, poll the switch to maintain the software link state of those ports, forward MII management interface accesses on those network interfaces (e.g. as done by ethtool) to the switch, and expose the switch's hardware statistics counters via the appropriate Linux kernel interfaces (ioctl is what I used).

Just be aware that neither approach goes through the generic Phy state machine, so you will have to define the handlers yourself (either in phy_device and phy_driver for the first approach, or in the dsa_switch_driver for the second, such as the poll_link handler).
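What such a hand-written link handler has to do is small but easy to get wrong: remember the last known state of every port and report only the changes. Here is a minimal Python sketch of that bookkeeping. It is not the kernel's poll_link interface; `LinkPoller` and the `read_link` callback are names I made up, and the callback stands in for whatever register read tells you a port's link is up.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class LinkPoller:
    """Tracks software link state per port; all ports start as 'down'."""

    def __init__(self, ports: Iterable[int], read_link: Callable[[int], bool]):
        self.read_link = read_link
        self.state: Dict[int, bool] = {port: False for port in ports}

    def poll(self) -> List[Tuple[int, bool]]:
        """Read every port once and return (port, is_up) for each change."""
        changes = []
        for port in self.state:
            up = bool(self.read_link(port))
            if up != self.state[port]:
                self.state[port] = up
                changes.append((port, up))
        return changes


if __name__ == "__main__":
    hardware = {0: True, 1: False}           # pretend register contents
    poller = LinkPoller([0, 1], hardware.__getitem__)
    print(poller.poll())                     # port 0 came up
    print(poller.poll())                     # nothing changed
```

In a real driver the changes returned by each poll would be turned into carrier on/off notifications for the matching network interfaces.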

Sunday, July 26, 2009

Mastering time or being mastered

Paul Graham just 'stole' my story :-), though he presented it in a much better way.
I was thinking about the same topic while reading Paul's latest essay, Maker's Schedule. It is interesting to see how people, especially managers, go about their schedules and fill in their empty calendars. Some have held a lifelong belief that we just 'meet it up' and everything is sorted. When there are multiple topics to talk about, easy enough: just hold multiple meetings! The truth of the matter is that, not long after, everyone is satisfied that they have done their jobs, and all that is left is for time to do its bit. Whatever happens in the end, people are quite comfortable shrugging their shoulders: "hey, we have done our best - we had a meeting - it is just bad luck, that is all."

This is no exaggeration, not even a tiny bit. I was lucky enough to work with one guy who really believed that driving the business via meetings is the ultimate solution. The fundamental problem here is quite obvious, provided you do not choose to ignore it. Some smarter managers know when and how to take the engineers' viewpoint and understand that it takes time, concentrated time, to make things happen. So watch out: if you find yourself, or the people who work for you, getting into the habit of filling the calendar with all sorts of meetings, something could be going very wrong. Either people are trying to hide from the reality that they do not really know what is going on and what to aim for, or you are on the edge of dragging productivity down because you cannot find a better way of communicating.

This guy I worked with was a devotee of working by timesheet. We soon found him buried in all sorts of meetings, and he was never on time for anything. People started to find that he took too much of his own and others' time to satisfy his timesheet. In the end people unavoidably started to wonder the opposite: does he know what he is supposed to be doing, or does he just have to cover a lack of confidence and plans with a stuffed calendar - poor meeting boy?

So, being considerate is what I consider one of the most important criteria for a good manager. (Being considerate does not mean being nice, though :-).) Understanding what it takes for engineers to produce something meaningful at a certain quality standard is the basis of management. It has always worked better for me to make sure I was not managing people's time FOR them. Knowing when to stay out of people's way easily slips from managers' minds, especially when fires start breaking out everywhere and everyone on the food chain gets beaten up on a daily basis. But hey, what differentiates successful managers from failed ones? So every time I try to schedule a meeting, I ask myself:

  • Do I really need this meeting? How much will it cost me and the team to spend an hour in it, and is it worth that, considering how much we could probably get out of it?
  • Is this meeting necessary, or is it really just to make some people happy? If making those people happy is critical, is there a more productive way I can make that happen? If it isn't critical, I am sorry, we are all grown-ups...
  • Am I driven by the calendar? Have I started to rely on the calendar to think for me? When was the last time I took a deep breath and a realistic look at the cost and quality of the product, and the return rate of installed products?
A meeting means nothing, really, if its purpose, a group of people communicating important information in a condensed manner, gets skewed. It replaces nothing, as far as I am concerned, certainly not the work itself.

Sunday, July 19, 2009

Wolfram Alpha and knowing your market

It was kind of a mixed feeling reading the latest post on Joel on Software about Wolfram Alpha's failure. It struck me that it has been weeks since my first trial of Wolfram Alpha, lured by its overwhelming PR. Gee..., I can't believe I fell behind again, that was my first reaction.

First things first: WA really got it wrong about who their target customers and target market are. In other words, they do not know who will be using this. I certainly was not able to get much out of it with ten of my search keywords. Yeah, this makes me look very bad, and that's why I stopped using it. Who wants to be made to look like an idiot in front of a machine which is supposed to be considerate and understanding? Most of my inputs were answered with "Wolfram|Alpha isn't sure what to do with your input". Okay, after a deep breath, I thought we needed some scientific spirit here. After all, it is still a young child. These are the claims made on the page:

Wolfram|Alpha answers specific questions rather than explaining general topics. - I guess this is where WA wanted to penetrate the overcrowded search market, originally at least. However, I did not see any evidence that this has been emphasized, nor that it has been circumvented in any form. This may be what happens when you are over-ambitious and do not know where you stand when it comes to product introduction.

You can only get answers about objective facts. - To whom? Is this an online version of an engineering handbook? We all live in a flood of information and knowledge that is updated all the time. In a short period of 50 years, look how many changes have already been made to our existing science and engineering 'facts'. To me, everything has a life span. Within it, yes, many assumptions we have made about nature, or explanations of problems, might stand, but I do not believe in eternal answers. Instead, people who create solutions, who explain mysteries with their own eyes and hands, are there to tell their findings, present their opinions, and answer or challenge each other's theories and questions. If WA was meant to replace our high school science book, then fair enough. However, even a first-year university student knows that other than some fundamental formulae (which have not yet been proved otherwise), most of the study is about critical thinking and finding unique solutions to problems.

Only what is known is known to Wolfram|Alpha. - No other comments, but WTF!

Only public information is available. - Okay, what else are we supposed to be expecting?

And the suggestions offered by WA to its users are:

"If Wolfram|Alpha is still not sure what to do, try the following:
  • Don't use long complete sentences; just enter the minimum number of words needed to communicate
  • Try different words or notations
  • Use whole words instead of abbreviations
  • Check your spelling"
We all know that sometimes we use workarounds or tips sections as a vehicle to achieve a good balance between development cost, time and quality. I really cannot see how this group is ever going to achieve that trade-off, other than by driving users away after their first trial. Patience as a beauty is gradually fading from our society, like the paint on the close button in an elevator. People really cannot afford to be nice and offer a search portal many chances when they are desperate to get the information and move on. I am not saying Google has figured it all out. But it has surely understood that to lure end users to your product, you do as much of the job for them as possible, subtly protecting their pride while simplifying what they have to type down to the last drop. I already have tips for WA:
  1. Do not make assumptions about user input, and it is certainly not a good idea to assume a superior position over your end user in the product design. The customer is always right!
  2. Clearly identify your market (maybe the market with potential is where Mathematica has already been sold, or more specific search products for institutes, educational bodies etc.). After all, it needs a profitable business model to sustain itself. Replicating Google's ads might not be so easy here.
  3. Get more data! Before that, make the product boundary clear. It cannot be claimed to be a globally distributed product when I could not even get house price data for Cambridge or London.
Just a happy ending to this story: finally, after 10 minutes with a piece of paper, I got my search results from WA for "inverse Fourier transform sin(x)"......


Saturday, July 18, 2009

On target automation test engine

While I was on the train the other day to see a crappy doctor about my eyes (I am sorry, but he certainly proved that the visit was not worth my fifty pounds!), I thought I would see whether I could implement a small idea with one eye open, since, you know, the other one was in pain...

Anyway, my team has been suffering from a lack of on-target testing for a while now. Most of the existing testing is a hardware exercise at best. Verification of functionality and integration testing are left as last-minute surprises. There have been some standard approaches, such as building drivers and applications into the binary image, downloading it to the target device and relying on the start-up scripts to do the job. Again, most of these tests are automated hardware exercises rather than simulation. I guess one of the subtleties here is that simulating run-time device performance in a realistic working environment is quite involved, since the various components, with their timing constraints, differences in system load, hardware limitations and failure mode recovery, are non-deterministic by nature. Some would argue that the best we can do is our deterministic part and leave nature to do its part. Well, we all know nature likes to surprise us every single time, just to show up our ignorance I guess, for fun :-). We thought about adopting an off-the-shelf testing framework to write test programs that check the components while they are running on the target device, with integration test programs simulating as many scenarios as we can think of. Of course, this is only possible when the device's running modes are well captured and defined. These simulation tests are neither trivial to develop nor easy to maintain. The best part is that when something changes, or the tests are not updated along with the production code, things start to fall apart and no one has a clue what the heck is going on, other than running around to set up a JTAG debugger or gdbserver and diving into our ancestors' beloved world of registers and assembly.

My idea is simple; in fact, the engine I wrote on the train is only 200 lines of code. It defines a light structure: each time a new component is dropped onto the target, some standard test scripts or test programs are put in a certain location. They are picked up automatically when a high-level unit test mode command is issued. For traditional integration tests, we can run multiple scripts simultaneously, following the pattern the application demands. And finally, the best part is the panic test, where the high-level command is something like run_as_wish. This mode will certainly cause quite a lot of failures, but I guess it is better to know your bottom line early than to keep looking for it all the time.
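The unit test mode described above boils down to "find whatever scripts were dropped next to each component and run them one by one without letting a crash stop the run". The sketch below shows that idea in Python; it is not the real engine, and the directory layout (one sub-directory per component under a test root, holding `.py` scripts) and all function names are assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path


def discover(root):
    """Group every test script under root by its component directory."""
    found = {}
    for script in sorted(Path(root).glob("*/*.py")):
        found.setdefault(script.parent.name, []).append(script)
    return found


def run_protected(script, timeout=30):
    """Run one script in its own process; a crash or hang is just a failure."""
    try:
        done = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            timeout=timeout,
        )
        return done.returncode == 0
    except subprocess.TimeoutExpired:
        return False


def run_unit_mode(root):
    """Run every discovered script once and map 'component/script' to pass/fail."""
    results = {}
    for scripts in discover(root).values():
        for script in scripts:
            key = script.relative_to(root).as_posix()
            results[key] = run_protected(script)
    return results
```

Dropping a new component onto the target then only means adding a directory with its scripts; nothing in the engine needs to change.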

The engine is written in Lua, so we need to get Lua onto the target first. It defines three modes for invoking existing test scripts/programs in protected mode. If you want to know how to kick off multiple scripts in parallel, this post is worth reading, and of course luathread. The beauty of Lua is that you can easily integrate it with all sorts of different languages, even .NET, via a suitable interface. Since Lua has been widely used as an embedded script engine in games, this should not surprise you too much. My engine, again, is simple and easy: a global script kicks off the whole process and iterates through all the scripts that need to run concurrently in the scenarios you defined. The obvious question would be: why don't we just write multiple C programs? Okay, first of all, since Lua scripts do not depend tightly on the low-level OS APIs, they are much easier to port. Secondly, with RemDebug, you can actually debug Lua test scripts remotely from a PC. Thirdly, when your tests get bigger and heavier to maintain, it is unavoidable that production logic sneaks into your test code, and if they are written in the same language, that is even more convenient for us lazy people to do. Writing the tests in Lua forces you to look at your production code independently and poke it in a protected environment. And by doing this, you do not have to rebuild your image every time, which is a real time killer!
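For readers who do not have Lua to hand, the "global script that kicks off everything in a scenario concurrently" can be approximated in Python. To be clear, this swaps Lua and luathread for Python's standard thread pool and child processes, so it shows the shape of the integration mode rather than the original code; `run_script` and `run_scenario` are made-up names.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_script(path, timeout=30):
    """Run one script as a child process; return (path, exit code or None)."""
    try:
        done = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout,
        )
        return path, done.returncode
    except subprocess.TimeoutExpired:
        return path, None   # None marks a script that had to be killed


def run_scenario(paths, timeout=30):
    """Start every script of a scenario at once and wait for all of them."""
    paths = list(paths)
    if not paths:
        return {}
    # One worker per script, so nothing waits in a queue behind another.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return dict(pool.map(lambda p: run_script(p, timeout), paths))
```

Because each script runs in its own process, one misbehaving test can fail or time out without taking the others, or the engine, down with it, which is the same property the protected mode gives you in Lua.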

Monday, July 06, 2009

Test coverage and online collaboration

A quick note, just in case my brain is stuffed with rubbish some day.

All free.

  • Online collaboration and whiteboard tools:

To share code snippet quickly: dpaste

To share code within Visual Studio 2008, 2010: Dashboard Extension

Browser based whiteboard: Imaginationcubed (You cannot get better than this! :-))

Client app based: mikogo, Dimdim or just do twittering

  • Test coverage tools:

NCover or PartCover for C# (work nicely with NUnit)

COVTOOL for C++

gcov and lcov (come with Linux distributions) for gcc

xCover for both C/C++


Not free: BullseyeCoverage, Testwell CTC++, CoverageMeter, Rational PureCoverage or Test RealTime Coverage