The most exciting thing about this world is its ever-changing quality.

Saturday, August 29, 2009

Get those words right

We all lie every single day. Harsh as it may sound, most of us know this is true, damn true. Personally, I have no problem believing that a lie in itself does not necessarily do evil or good. It is only a choice of communication, a matter of intention, to get a message across.

Understanding the meaning behind the line would be extremely helpful for achieving better communication. Let's admit it, we could not really survive if everybody just told the truth all the time; well, at least politicians, lawyers and state agencies couldn't :-). Sometimes we say these lines more as a professionally trained reaction than as the result of a complicated thinking process.

From engineers, these are some typical lines I have heard, and my interpretations of them after many years of learning :p

We have a quick workaround for this problem, which will not affect the rest of the system at all.
Maybe this time I can get away without sufficient testing, especially regression testing. With a bit of luck no one will find the caveat. The worst that could happen is that I have to explain to the customer how it was a new feature rather than a 'side effect'.

I have thoroughly tested it on my machine.
Well, I forgot to do a proper clean test. Let's start over again.

Source control messed up my codeline.
I forgot/felt too lazy to follow the update->test->checkin->test routine.

I will write test code later.
I will try not to worry about test code until someone important puts a gun to my head.

The code is a mess; we would be much better off starting over.
I can't find a better way to refactor the code, or I don't understand the use cases of this product, or it isn't my fault there are always so many bugs, it is the code's fault...

The product manager never tells me what he wants, and always changes his mind.
I can't work out an intuitive approach and cool functionality; maybe I can shift the blame and responsibility to someone else, an easy target.

It will take me 5 minutes to fix this bug.
I really don't mean 5 minutes! I only say so to justify why we should not do this. So do not put this down as a task estimate in your project plan.

For managers, here we go,

We are almost there.
We have not achieved our target. This is not good enough.

Next release will be on time.
Whenever I say it, don't take my word for it; pay attention to what I am about to say next: "If..."

All tests passed for this release.
We have cut out some failed tests. With limited time, we can only assume some test code has not been updated or has bugs in it.

Let's work together on this.
I will let you know my decision out of courtesy.

It's not my call.
I am not taking responsibility for the decision.

We have no resources to do this.
I have a different understanding of the task priority.

You are the most important customer.
We have more than one customer to support.

Wednesday, August 26, 2009

Real time signal driven between Kernel and User space

I have written a blog post before about the standard usage of the /proc and /dev interfaces as IPC between kernel space and user space applications. Of course you can do clever things such as asynchronous I/O (AIO) and non-blocking system calls, but they do not really solve the problem if what we need is something truly event driven rather than thread polling.

For standard non-real-time Linux, popular ways include sockets (on which netlink is built), signals and memory mapping. (These are all I know, please tell me if there are other tricks I don't!) There are other tricks like upcalls (using call_usermodehelper in the kernel module to invoke a user space program), but they are rather hacks and not well supported when porting to different hardware platforms. Of course you can also use named pipes or FIFOs (mknod and mkfifo), which are essentially system-level, device-node-based communications (similar to the /dev interface).

(To be extreme, here is what I have always believed - the only reason we have the distinction between user space and kernel space is that you can do whatever you want in user space without screwing up the whole OS, which is protected and so allows others like you to screw up their own part of the system independently. After 2.6, the whole Linux kernel can be considered a single process with multiple concurrent, schedulable threads.)

Basically, signals can be sent from the kernel, and some can be queued if you choose to. The Linux signal queue is interrupt-safe. I won't go through the whole list of available signals and APIs, as you can find them copied and pasted all over the net. What I would like to note here is POSIX.4 Real Time Signals, also known as RT signals. They are a group of signals (between SIGRTMIN and SIGRTMAX) supported by the Linux kernel which overcome some of the limitations of traditional UNIX signals. First of all, RT signals can be queued by the kernel, instead of just setting bits in a signal mask as for traditional UNIX signals. This allows multiple signals of the same type to be delivered to a process. In addition, each signal carries a siginfo_t payload which provides the process with the context in which the signal was raised. To say 'process' is a little confusing; in fact, you can signal a specific thread or a group.

The catch here is that you need to specify carefully which type of signal it is when you generate it in the kernel. (Unfortunately the type has to be matched manually to the send_sigxxx API you use to trigger the signal, i.e. if you want it to look like a sigqueue signal, the si_code has to be SI_QUEUE.) Unluckily, some Linux ports don't support sigqueue, e.g. Blackfin and PPC. There are workarounds. You can still use send_sig_info to generate a signal with a siginfo_t payload and so queue the RT signal, but be aware that you can't use _sifields._rt._sigval.sival_ptr to pass a 32-bit pointer to a struct and hope to use it the same way as the value you pass with sigqueue; you can only pass a 32-bit value in the union. I learnt it the hard way...
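To make the queueing behaviour concrete, here is a minimal user space sketch (it is not the kernel side described above). It blocks one RT signal, queues it several times with sigqueue, then pulls the entries back one by one with sigwaitinfo. The function name queue_and_collect and the choice of SIGRTMIN + 1 are assumptions made purely for illustration, and the sketch assumes a single-threaded process on a platform where sigqueue is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Queue `count` RT signals to this process while the signal is blocked,
 * then collect them synchronously. Each payload is stored in out[].
 * Returns the number of signals collected, or -1 on error.
 * (Hypothetical helper, written only for this illustration.) */
int queue_and_collect(int count, int *out)
{
    const int signo = SIGRTMIN + 1;   /* arbitrary RT signal for the sketch */
    sigset_t set, old;
    int received = 0;

    assert(count >= 0 && out != NULL);

    sigemptyset(&set);
    sigaddset(&set, signo);
    /* Block the signal so every instance stays in the queue. */
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    for (int i = 0; i < count; i++) {
        union sigval v;
        v.sival_int = 100 + i;        /* payload carried with the signal */
        if (sigqueue(getpid(), signo, v) != 0)
            return -1;
    }

    /* A traditional signal would collapse into one pending bit here;
     * an RT signal hands back one entry per sigqueue() call. */
    for (int i = 0; i < count; i++) {
        siginfo_t info;
        if (sigwaitinfo(&set, &info) != signo)
            return -1;
        if (info.si_code != SI_QUEUE)  /* came from sigqueue() */
            return -1;
        out[received++] = info.si_value.sival_int;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return received;
}
```

Calling queue_and_collect(3, out) from main should hand back three separate entries, in the order they were queued, each with its own payload.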

One problem with RT signals is that the signal queue is finite, and hence a server using RT signals has to have some fallback for when the signal queue overflows. The good thing about RT signals is that they have a very low overhead. They also provide a very software-interrupt-driven approach, which to my mind is quite intuitive when you think about it. All the interesting events originally come from a hardware interface and pass through device drivers sitting in the kernel. What is more efficient than building your higher logic on these events?
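The finite queue can be seen from user space as well. The sketch below is an illustration under assumptions, not something from my driver: it lowers the Linux-specific RLIMIT_SIGPENDING soft limit, keeps queueing a blocked RT signal, and reports whether sigqueue eventually failed with EAGAIN, which is the point where a fallback has to take over. The name queue_until_full, the signal number and the safety cap of limit + 16 are all made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <time.h>
#include <unistd.h>

/* Queue a blocked RT signal until the kernel refuses with EAGAIN.
 * Returns how many signals were accepted (or -1 on an unexpected error)
 * and sets *overflowed to 1 if the queue limit was hit.
 * (Hypothetical helper; RLIMIT_SIGPENDING is Linux-specific.) */
int queue_until_full(int limit, int *overflowed)
{
    const int signo = SIGRTMIN + 2;     /* arbitrary RT signal */
    const int cap = limit + 16;         /* safety stop if no limit applies */
    const struct timespec zero = {0, 0};
    struct rlimit saved, lowered;
    sigset_t set, old;
    int queued = 0;

    assert(limit > 0 && overflowed != NULL);
    *overflowed = 0;

    if (getrlimit(RLIMIT_SIGPENDING, &saved) != 0)
        return -1;
    lowered = saved;
    if (lowered.rlim_cur > (rlim_t)limit)
        lowered.rlim_cur = (rlim_t)limit;   /* lowering the soft limit is always allowed */
    if (setrlimit(RLIMIT_SIGPENDING, &lowered) != 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    while (queued < cap) {
        union sigval v;
        v.sival_int = queued;
        if (sigqueue(getpid(), signo, v) == 0) {
            queued++;
        } else if (errno == EAGAIN) {
            *overflowed = 1;            /* queue full: time for the fallback */
            break;
        } else {
            queued = -1;                /* unexpected failure */
            break;
        }
    }

    /* Drain whatever was queued, then restore the mask and the limit. */
    while (sigtimedwait(&set, NULL, &zero) == signo)
        ;
    sigprocmask(SIG_SETMASK, &old, NULL);
    setrlimit(RLIMIT_SIGPENDING, &saved);
    return queued;
}
```

The pending count is shared with other processes of the same user, so the number accepted before EAGAIN can be lower than the limit; the sketch only relies on it never being higher.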

Also, I have wrapped up these signal handlers in an I/O lib which sits in user space, elegantly creating events and posting them to those who are interested - a post-office-like publish-subscribe mechanism. This way, you don't have to worry about errant signals. In the kernel, I have added a linked list to maintain all the threads (task_struct) which have asked to be signalled. They are signalled in a simple round-robin way. The code is dead simple:

// in the kernel driver
// walk the list of registered user tasks and send the signal to each in turn
list_for_each(ptr, &user_tasks.list)
{
    entry = list_entry(ptr, struct user_task_struct, list);
    memset(&info, 0, sizeof(struct siginfo));
    info.si_int = pdev->data;           // payload, read back as si_int in user space
    info.si_signo = SIGGPIBUTTON;
    info.si_errno = 0;
    info.si_code = SI_QUEUE;            // make it look like a sigqueue() signal
    info.si_uid = pdev->minor_node_id;  // stole this field for additional info
    err = send_sig_info(SIGGPIBUTTON, &info, entry->thread);
}

// in the I/O lib
// register a three-argument handler so the siginfo_t payload is delivered
memset(&m_actBt, 0, sizeof(m_actBt));
m_actBt.sa_sigaction = &CGPI::ButtonSignalHandler;
m_actBt.sa_flags = SA_SIGINFO;
sigemptyset(&m_actBt.sa_mask);
err = sigaction(SIGGPIBUTTON, &m_actBt, NULL);

// note: printf and new are not async-signal-safe; fine for a demo, risky in production
void ButtonSignalHandler(int signum, siginfo_t *info, void *ptr)
{
    int data;
    printf("Received signal %d\n", signum);
    if (signum != SIGGPIBUTTON) return;
    data = (int)(info->si_int);          // payload set by the driver
    CEvent * evt = new CEvent(data);
    evt->id = info->si_uid;              // the borrowed field carries the minor node id
    evt->Post(m_evtMgr);
}
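One caveat with doing real work inside a handler is that only async-signal-safe calls are allowed there. A common alternative, which is not what my I/O lib does, is the self-pipe trick: the handler only writes the payload into a pipe, and the normal event loop reads it back and posts the event from ordinary context. Below is a minimal sketch of that idea; event_pipe, install_button_handler and next_event are hypothetical names, and the signal number is whatever you pass in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Pipe used to hand payloads from the handler to normal code. */
static int event_pipe[2] = {-1, -1};

/* Handler: forward the payload using only async-signal-safe calls. */
static void button_handler(int signum, siginfo_t *info, void *ctx)
{
    int saved_errno = errno;            /* write() may clobber errno */
    int data = info->si_value.sival_int;
    ssize_t n;

    (void)signum;
    (void)ctx;
    /* A single int is far below PIPE_BUF, so the write is atomic. */
    n = write(event_pipe[1], &data, sizeof data);
    (void)n;
    errno = saved_errno;
}

/* Create the pipe and register the handler for `signo`. Returns 0 on success. */
int install_button_handler(int signo)
{
    struct sigaction act;

    if (pipe(event_pipe) != 0)
        return -1;
    memset(&act, 0, sizeof act);
    act.sa_sigaction = button_handler;
    act.sa_flags = SA_SIGINFO | SA_RESTART;  /* payload wanted, restart read() */
    sigemptyset(&act.sa_mask);
    return sigaction(signo, &act, NULL);
}

/* Block until the next payload arrives. Returns 0 on success. */
int next_event(int *data)
{
    ssize_t n;

    assert(data != NULL);
    n = read(event_pipe[0], data, sizeof *data);
    return n == (ssize_t)sizeof *data ? 0 : -1;
}
```

The event loop can then call next_event (or put the read end of the pipe into select or poll) and create the event object there, where memory allocation and printing are safe.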

In Xenomai (a real-time patch for Linux), you have the option of letting a kernel rt_task communicate with user space ones using a real-time message queue (rt_queue_create). I have also tried to get real-time signals working with Xenomai 2.4.91. The Xenomai patch only supports RT signals via the POSIX skin (officially). However, pthread_sigqueue_np only takes a pse51_thread instead of a standard POSIX thread, which means the signal can only be sent to a Xenomai POSIX skin thread, created by the overloaded pthread_create. This is a little messy, I know. In non-real-time Linux, you have GNU thread and POSIX thread implementations to choose from, or both, depending on how you choose to link your libraries. Xenomai has its own implementation of a finer-grained kernel mode thread (xnthread_t, if you like to call it that). These Xeno threads exist only in the Xeno real-time domain. When you choose to use the native skin, you will be dealing with the rt_task_xxx interfaces. However, all the signal handling will just happen in the Linux domain. Within the POSIX skin, you can queue signals and set up signal handlers for Xenomai POSIX skin threads. Effectively, instead of using rt_task_xxx, you can use the familiar pthread_create APIs. Just bear in mind that you need to link to the Xenomai libraries.

Monday, August 24, 2009

Fiddling, hacking and engineering

We have people who do not really have experience in designing a system, have no idea where to start from a blank sheet of paper, and are clueless when it comes down to the pros and cons of different weapons of choice for a problem - I call them fiddlers. Fiddling with things will sometimes turn out to produce some results, which unfortunately encourages further messing... If this happens in your codebase, good luck with the quality.

The second type is hackers. (When you call a programmer a hacker, it is either an insult or a genuine compliment, depending on the context. Here I will take on both of these interpretations.) Hackers are normally capable of breaking assumptions, finding out the hidden rules and enjoying planting tricks in the code. Some have a strange tendency towards designs which look interwoven and complicated on the surface but are inherently extremely elegant and efficient. Making the code understandable or maintainable for others will never be on a hacker's priority list. Good hackers always try to understand the underlying logic first before any action is taken. On the other hand, in the insulting sense, it means people who tend to just get on with the change: Mr. Fix-one-break-many. Most hackers are very keen to keep pace with the latest and coolest gadgets, especially where they can make their mark. Making money is kinda irrelevant at this point.

Engineers, a concept which many used to be proud of, but which now sounds a little tedious and boring, or abused... In my mind, a good engineer is the key asset to a team, and to an organisation. Being a good engineer is not only a challenge at skill level. It is much more than knowing what tools to use, what technique to adopt, how to fix bugs and how to use APIs. Good engineers understand the difference between playing with technology and building a profitable product. They also appreciate real requirements as well as feedback. They are by nature good marketers, contrary to what many people think, able to advocate their efforts on a bigger scale. Those who do what they are told and only translate human language into code are not exactly in the category I am talking about. Love for the code is much more profound, I would say.

Get rid of fiddlers, hire and direct hackers, rely on your engineers.

Monday, August 17, 2009

Work means nothing if you have a shitty life

Well, it's true. For me, work is part of my life, a very important part.

Three things I'd usually like to claim, true or not.
1. I don't work for money, but the pay has to be good enough :)
2. I don't do the clock, so I won't do a job on the clock either.
3. Work is an important part of my life, so don't call me a workaholic; it makes me happy. It's the same kinda feeling as expecting a child, decorating a house, planning a nice trip, playing chess with friends. In other words, I love what I do and I won't do what I hate.

It seems quite obvious to me: if you get to spend 8 hours of your day enjoying that part of your life, what's wrong with that? On the contrary, if a job to you is only something you trade for the remaining 4 hours of peace and fun after working hours, I can't help thinking that is very sad.

At a certain stage of everyone's career, well maybe not everyone's, in my career, I thought a lot about how important it is to be somebody in my life, to make marks on the paths I have walked. It is only in the last two years that I realised you don't make your mark on something in the future, you do it at this very moment. There is no such thing as time out of life, no such thing as what I would be able to do in the future if I just do this now. Every step made is part of it; every decision made is a brick which will eventually complete the building. All I hope is that one day I can look back and peacefully say that I lived my life.

Do not be evil

Since Google and Paul Buchheit have not trademarked this one, I have borrowed it for this blog.

It is a little ironic but true that it is always easier to be evil, to be bad, than to do good deeds. It's also normal that a company fights its way through numerous obstacles to develop a really good product before its debut. Ambition at that time is about making a great product, making big money, and being extremely successful. To achieve this, we soon realise that the only way is to find a problem, then provide a solution which people love, talk about and are willing to pay for. We go out of our way to improve the quality, to make users happy and fall in love with us. Those who made it understand the importance of marrying passion with a down-to-earth approach, since at that moment they have nothing but a chance to make a good product. In fact, that would be the only thing they are evaluated on. I remember a line from a podcast somewhere about the criterion to use for making decisions - "if you are not sure how and what to choose from thousands of things to do, do those which will make your customers happy". This is extremely important for budding organisations, which often operate on the edge of living or dying. Some more tenacious species of us will make it, and eventually be able to build up their own tribes, raise their influence, attract investment, be successful.

However, what happens next is what I would like to talk about here. It is very difficult to make our way from scratch to the first success - first successful product launch, first major investment; it is even more so when we have the chance to get to the next rung of the ladder. When we are successful, many start to bully or ignore users. Not because we like that, but simply because it is the easier way out, or at least it seems so in the beginning. We start to get lazy and evil, start to release software without testing and rely on luck not to hit the rocks, start to be arrogant about users' feedback as we know better (otherwise how could we have succeeded?). We start to focus on getting more and more investment to build up a giant empire, because that seems to give the success new and real meaning. We start to stare at the stock price rather than test reports. We have gone to the dark side, unfortunately, as the majority of big organisations do, seemingly unavoidably.

From this point on, we start to offer users fewer and fewer real choices and give them thousands of configuration options instead, which gives us a professionally pretentious appearance. What we do not realise is that we are starting to lose people's love and trust; we start to see the baseline of the game changing, and we no longer have the support of those who were once loyal to the products and marketed them for us by word of mouth. We become extremely thick and reluctant to accept criticism. The only possible outcome is that products slide away from making people happy towards annoying and belittling users, from which no good is going to come.

Google is able to recognise why so many once successful organisations failed to continue their glory. Being able to maintain this type of startup spirit could not be more important. It is so easy and convenient to just do evil; admittedly we all have a tendency to do that. No matter how accountants and bankers mess up the economy, at the end of the day it is the makers who keep the ball spinning, and it is the product which actually provides the drive. So, be good.

Saturday, August 15, 2009

Relentlessly resourceful

It started to feel like a language advantage when I was reading one of Paul's latest essays. Relentlessly resourceful: a precise description of the quality which differentiates certain individuals from the rest.

This applies to both makers and managers. Being relentless is not much of a stand-out quality in itself, as you might spend days doing stuff which does not generate any valuable deliverable. Many would refer to this as a 'political gesture', especially useful when your behavior has a direct impact on external stakeholders' expectations. Being relentless also sometimes goes with a subtle truth - we do not really know what we are doing, and do not really care; just doing stuff and looking busy may be enough to secure the tiny roles we have.

Having said all this, being relentless would be an outstanding quality if the person is resourceful at the same time, whether it is an engineering problem which is new to everyone or a balance point in resource allocation which is difficult to negotiate. Being resourceful does not come from total reliance on gut feeling or instinct, although they are very important when people actually know what they are talking about. The instinct, or a sudden flash of old memory from somewhere around the corner, might just be enough to give away the clue to the solution. Being resourceful normally also does not imply that the individual has to be old enough to have tons of lifetime stories to tell. Sometimes it is about sensitivity. It always amazes me how much we can appreciate of every little thing going on around us, if we really open up the sensors in our system.

A relentlessly resourceful individual will deliver, and will always be the one people fall back on when there is an urgent situation. If they have a calm personality, or are willing to make a conscious effort to remain cool in a panic situation, they will be the natural leader you need to look out for in the organisation. In fact, most of the time, what a bunch of skillful people need is not the answer to the question, but a path or a plan of attack. In reality, there is every opportunity for us to become headless chickens and pass the hard nut onto someone else's shoulders, if we can find one.

Thursday, August 13, 2009

Wonderful comic

Jon said, "I found xkcd philosophy dangerously appealing", when I shared one comic with him today. What I love about good comics is that they not only make me laugh, but also tell a deep-down truth or something that has been taunting you inside for God knows how long. Anyway, this post is about 137, from a guy who also worked on robots previously.

Sometimes we say that life is too short, so we have to follow our hearts and do what we like, enjoy what we do. It talks about dreams, something too fuzzy for us nowadays, and seemingly remote. In Connor's second thesis, he talked about "there is no fault but what we make for ourselves". Randall challenged it more in his comic, "does the routine destroy our creativity or do we lose creativity and fall into the routine?"

How many times have you wanted to shout out loud, FTS!

Friday, August 07, 2009

The ultimate answer to everything in software - upgrade?!

In the last few days, I have been working on a very, very nasty problem. To put this into context, I planned to apply one of the existing real-time patches to Linux to give us hard real-time scheduling performance as opposed to soft real time, a difference I briefly explained in a previous post. I adopted Xenomai 2 for the uClinux distribution for the Blackfin BF5xx processor, using the Adeos patch supplied with uClinux2008R1.5, which runs kernel 2.6.22. Anyway, the problem was that one of our existing GPIO (on I2C bus) drivers stopped working.

So, after a bit of digging in the code, it appeared that the kernel was continuously trapped within one of the interrupt handlers registered by this driver. I should say that in this case the interrupt is in the Linux domain. That means there is no handling done in the Xenomai primary domain, and the interrupt is passed by ipipe all the way to the non-real-time Linux kernel to handle. The problem is, it worked fine when we did not have the real-time patch in between. In this instance, Google did not help either - no obvious answer. I decided to ask for help in the Blackfin-uClinux forum and also on Xenomai help, as people there are usually quite helpful if you are asking about something which has been asked before or a known issue. Since this seemed to be an outlier, I did not get much out of it, until the author of the patch responded. In short, his answer was that the version I used is a legacy version (2.4.0), and unless I upgrade to the latest and greatest Xenomai version and uClinux distribution, which do not use threaded IRQ anymore, I am pretty much on my own. Right, so the first thought that came to my mind was that I was busted. It is not a trivial task to port a heavily modified kernel distribution to another version, let alone that Linux does not really maintain backward compatibility that well. Anyway, I was stuck between a rock and a hard place.

So I started to dig into ipipe to see what the difference was. Unfortunately, the only thing I could see was a positive sign: the interrupt was being triggered much more reliably and with lower latency. Bearing this in mind, I had to bet my money on the Blackfin implementation at this point. So I went back to check the hardware reference for the type of processor I am using and found the following:

“When using either rising or falling edge-triggered interrupts, the interrupt condition must be cleared each time a corresponding interrupt is serviced by writing 0x01 to the appropriate bit in the GPIO clear register.”

Now I had all the pieces of the puzzle. The problem was that the original driver code did not explicitly clear the GPI pin we had configured for edge-triggered interrupts, relying instead on the kernel peripheral code to clear the resources it had allocated after the interrupt was served. With the Xenomai patch, the interrupt arrives sooner, to the point that the next interrupt kicks in (passed by ipipe) before the work is finished (before the previous interrupt status has been cleared). Hence the kernel gets stuck in this particular ISR. The fix itself is easy enough: a one-line change to clear the port interrupt register.
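The failure mode is easier to see in a toy model. The sketch below is plain C simulating a latched, edge-triggered interrupt flag with write-1-to-clear semantics; it is not Blackfin register code, and gpio_sim, gpio_edge, gpio_clear and service_until_idle are hypothetical names invented for the illustration. It counts how many times the "ISR" is entered for a single edge, with and without the explicit clear.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a GPIO interrupt latch: a set bit means "interrupt pending". */
struct gpio_sim {
    uint16_t latch;
};

/* An edge on `pin` latches its interrupt flag. */
void gpio_edge(struct gpio_sim *g, unsigned pin)
{
    assert(pin < 16);
    g->latch = (uint16_t)(g->latch | (1u << pin));
}

/* Write-1-to-clear: every bit set in `mask` clears the matching latch bit. */
void gpio_clear(struct gpio_sim *g, uint16_t mask)
{
    g->latch = (uint16_t)(g->latch & ~mask);
}

/* Re-enter the ISR for as long as the latch stays set, the way the
 * interrupt controller would. Returns the number of ISR entries;
 * max_entries stops the simulation from spinning forever. */
int service_until_idle(struct gpio_sim *g, unsigned pin,
                       int clear_in_isr, int max_entries)
{
    int entries = 0;

    assert(pin < 16);
    while ((g->latch & (1u << pin)) != 0 && entries < max_entries) {
        entries++;                               /* the ISR body runs here */
        if (clear_in_isr)
            gpio_clear(g, (uint16_t)(1u << pin)); /* the one-line fix */
    }
    return entries;
}
```

With the clear in place, one edge gives one ISR entry; without it, the handler is re-entered until the simulation gives up, which is the "stuck in the ISR" behaviour described above.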

What makes me think, however, is how we normally deal with unknown issues in our systems when it comes to software releases. Of course there are times when finding the root cause of a problem would be expensive, and could also become a major distraction from the current development undertaking. Unfortunately, from my own experience, many organisations choose to take the attitude of offering a system upgrade as a silver bullet when a customer reports problems on a legacy system, and then pray for those problems to go away on the newer release. Obviously, there is so much we do not know about this world; this problem might just be one of those we cannot explain, or one completely out of our league, or one not worth spending our efforts on. To a certain degree I can try to justify it if it was for the last reason, as we all know that sometimes we need to make a balanced decision about where precious resources (as always) should be spent.

As you can clearly see, the suggestion I was offered about threaded IRQ was a complete miss. Unfortunately, we take blind shots a lot. Question is, have you done this before?

Monday, August 03, 2009

Cul-de-sac

A tough economic situation leads people, organisations and governments to react. In my opinion, the danger is not only in the lack of expenditure or confidence in spending, which causes lower sales turnover with less gross profit. As far as a business is concerned, the much more hidden yet fatal impact is that this is the time when a death spiral will normally be triggered.

Death spiral is originally a finance term where convertible financing used to fund primarily small companies is used against them in the marketplace to cause the company’s stock to fall dramatically, which can lead to the company’s ultimate downfall. In simpler terms, it is a vicious loop which eventually leads to the termination of a business due to initial actions taken to rectify visible negative results. What is contradictory here is that often these initial actions seem to be the correct and logical (re-)actions (if not the only choice) to take to improve the immediate circumstances.

Scenario A. The director of company Random decides to cut costs in production, R&D, customer support and marketing to balance the poor order book and lower income. The initiative is to sustain the business by running it at a lower cost level to match its profit level. It seems to be the right and intuitive thing to do. However, if we take it to another level, it is not too difficult to see the problem. Shrinking engineering, production and customer support directly reduces the output of product delivery, which will in turn immediately affect future sales and profits, especially when existing products have already failed to attract orders in a difficult economic situation. Cutting the marketing budget will also lower the baseline of potential sales. Hence the sales will come down again to another level. The only hope here is to hit the balance point where income meets outgoings, which is only possible when a significant portion of the budget was built in for expansion in the first place. Otherwise, such a balance is highly unlikely to be achieved. On the other hand, reducing operational cost is absolutely necessary for any organisation to survive in difficult market situations. To achieve this, what we should really focus on, instead of budget or redundancy strategy, is the 'mis-spent' cost. Cost is not equivalent to budget. In other words, this type of cost, such as waste, overhead and duplicated effort, should not exist if everything involved is running in a healthy condition.

Scenario B. To speed up software project development, engineers are told to defer the test code to a later stage, until project momentum has caught up and the tight deadline is met. It looks as if engineers now spend all their working hours writing production code, which builds up confidence among stakeholders. However, this reassuring false confidence can easily be broken, or be used as a false baseline for the team's capability to deliver in terms of time-quality-cost. This will come back to drive the development team into another circle, where shortcuts are almost a necessity to keep any promises. The solution here is simple - write your tests from the start, formally known as test driven development.

In both scenarios, the intuitive choices of reaction are doomed to lead organisations to their end, a cul-de-sac, regardless of the good intentions in the beginning. Conscious effort is required in such decision making, rather than following 'gut feeling'. Sometimes, choosing the opposite direction to the rest of the crowd could be the easier way.

Sunday, August 02, 2009

Permission of communication

An excellent presentation from Seth on why we cannot afford to leave marketing to the marketing department alone. Building marketing into the product is the essence of how to succeed in a cluttered commodity environment. The end customers not only can and will make their own decisions on which products to spend their money on, but also represent a more and more important means of marketing, or influence. This concept itself is not new. We have old sayings such as 'to have one's ear'.

What is interesting to me is that Seth has revealed a fact which is difficult to argue with and impossible to ignore. A successful product and organisation these days starts from being remarkable. That is where the smart marketing strategy starts - to connect people, or to give people something to talk about. It has been a remarkable experience for me to witness big boys like Apple, Google and Amazon win over their users again and again. Where did they start from? Being remarkable, telling a story, creating tribes, establishing customer loyalty, helping to build up a lifestyle, utilising word of mouth. Have you ever seen a 30-second, million-dollar advertisement on TV during your Prison Break session asking you to search on Google? No, and you won't, because that is not a choice of yours. Interruptive marketing does not gain permission to get to your ears and eyes; instead it forces you to listen, to see. That's why many companies spend astronomical amounts of money on developing good ads for traditional media channels which can reach their end customers, hopefully in a not too intrusive way. 20 years back, we did not really have many choices, while nowadays we have much to enjoy and much to be distracted by. All this suggests that they started from the wrong end - to pay off the debt created by choosing an intrusive marketing channel, a significant amount of extra effort is needed to counterbalance it and hopefully win some customers over. The other problem is that the intrusive marketing approach is really a shot in the dark. Even with careful statistical sampling and measurement analysis, there is little to be learnt in terms of your ROI on an expensive marketing budget, not to mention suggestions for the next round of marketing investment.

Now, a 180 degree U-turn. What happens if we make a different kind of start - building in story points for people to talk about, to be proud of, to be 'sneezing' in Seth's terms?

If you earn trust from someone, you have his/her/its permission to feed information through this channel to and from the other end. This type of relationship is now often seen in the form of following someone on Twitter, subscribing to someone's RSS feed or podcast, reading blogs etc. This level of trust assures the information receiver of quality, relevance and standard. More importantly, it opens up a mutually trusted channel to discuss, debate and tell stories. It does not have the problem of having to turn negativity around from the start. There are many companies in Seth's example list who have managed to establish this connection with their customers after the first purchase or use. They have built something remarkable and different into their products or services, empowering their customers to take away something other than the pure functionality, and giving them a way to express their interests or identities through using the product and to influence people around them via trusted channels.

I have seen some companies pick up the first part of the hint but fail to recognise the starting point. Marketing departments create an organisational Facebook account, a company Twitter, a LinkedIn group, hoping to attract people and build up effective marketing channels. However, as Seth said, and I could not agree more, you have to start from being remarkable first - building something which is unique and significant, something you can win your first date with. Then you kick off the circle, rather than the other way round.