The most exciting thing about this world is its ever changing quality.

Wednesday, November 23, 2005

Spam Filtering

I have recently become interested in the spam filtering problem. First, the problems:
  • It is a specific text recognition (TR) problem, and it inherits TR's most prominent property: a high-dimensional input space.
  • It is a cost-sensitive application in which the misclassification costs are both unbalanced and uncertain.
  • It is a dynamic, skewed and drifting problem: the distributions of solicited and unsolicited email drift over time, and the class priors are time-dependent as well.
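To make the cost and prior issues concrete, here is a minimal sketch of a cost-sensitive decision rule. It assumes the filter outputs a calibrated probability that a message is spam, and that only the class prior has changed since training (not the per-class word distributions). The function names, costs and priors are my own illustrative choices, not taken from any particular filter.

```python
def spam_threshold(cost_fp, cost_fn):
    """Posterior above which flagging a message minimizes expected cost.

    cost_fp: cost of flagging a legitimate message as spam.
    cost_fn: cost of letting a spam message through.
    """
    return cost_fp / (cost_fp + cost_fn)


def adjust_for_prior(p_spam, train_prior, new_prior):
    """Re-weight a posterior learned under train_prior for a new spam prior.

    Assumes only the class prior changed between training and deployment.
    """
    spam_term = p_spam * new_prior / train_prior
    ham_term = (1.0 - p_spam) * (1.0 - new_prior) / (1.0 - train_prior)
    return spam_term / (spam_term + ham_term)


def is_spam(p_spam, cost_fp, cost_fn, train_prior=0.5, new_prior=0.5):
    """Flag a message if its prior-adjusted posterior exceeds the cost threshold."""
    p = adjust_for_prior(p_spam, train_prior, new_prior)
    return p > spam_threshold(cost_fp, cost_fn)


# Hypothetical setting: losing a legitimate mail is 9 times worse than missing a spam.
print(is_spam(0.85, cost_fp=9, cost_fn=1))                  # same prior as in training
print(is_spam(0.85, cost_fp=9, cost_fn=1, new_prior=0.8))   # spam has become more common
```

The same score of 0.85 leads to different decisions in the two calls, which is exactly why uncertain costs and drifting priors make the problem awkward: the "right" threshold is a moving target.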

The representation method used in TR
Because the cost space is unbalanced and the priors keep changing, a fair and reasonable comparison between different approaches can be extremely hard, as Tom Fawcett has argued. Various measures have been designed (see Yang et al.) to gauge the statistical differences among these approaches.
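A small sketch of why the comparison is hard: the expected cost of a filter depends on the spam prior and on the two misclassification costs, so the ranking of two filters can flip when those change. The two filters and all of their error rates below are made-up numbers for illustration only, not measurements.

```python
def expected_cost(fpr, fnr, spam_prior, cost_fp, cost_fn):
    """Expected cost per message for a filter with the given error rates.

    fpr: fraction of legitimate mail wrongly flagged as spam.
    fnr: fraction of spam wrongly let through.
    """
    return spam_prior * fnr * cost_fn + (1.0 - spam_prior) * fpr * cost_fp


# Hypothetical filters: A is conservative, B is aggressive.
filter_a = {"fpr": 0.01, "fnr": 0.20}
filter_b = {"fpr": 0.05, "fnr": 0.05}

# Two hypothetical operating conditions that differ only in the cost of a false positive.
for cost_fp in (1, 10):
    cost_a = expected_cost(**filter_a, spam_prior=0.5, cost_fp=cost_fp, cost_fn=1)
    cost_b = expected_cost(**filter_b, spam_prior=0.5, cost_fp=cost_fp, cost_fn=1)
    better = "A" if cost_a < cost_b else "B"
    print(f"cost_fp={cost_fp}: A={cost_a:.3f}, B={cost_b:.3f}, better={better}")
```

With equal costs the aggressive filter B comes out ahead, but once a false positive costs ten times as much, the conservative filter A wins. Any single-number comparison therefore bakes in an assumption about costs and priors.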
