Saturday, December 05, 2009

The Blade Itself

A flashy name for a novel I read a few months ago for Joe Abercrombie.

I am not a critic (and god send that I never become one), and I hate being harsh on writers, but the novel is mediocre at best. Weak plot, flat characters that are anything BUT original, no premise I could detect, and peculiarly, I couldn't (by reading the book) discover what blade the title was referring to. The book has all the typical elements and characters of the fantasy genre without any effort to dress them up or customize them.

But for some reason I went for the second part of the book!!!! ... And then the third!! No not some mysterious indiscernible reason, I know what it was. One character "Inquisitor Glokta" was so original, so brilliantly depicted, his dialogues and thoughts so intense and intriguing that all the flaws of the book can be set aside (if not forgiven). Whenever this character appears, the writing gets brilliant, and the scenes, the dialogue, and even the secondary characters get rich and enjoyable.

Inquisitor Glokta - a broken and deformed torturer - works under a loathsome head of the inquisition, drags confessions and information from prisoners. You have every cause to detest him, but you don't. Maybe because he is truthful, maybe because he has depth and he made the book worth reading, I don't know but for some reason Glokta gets your sympathy and intrigues you to follow him though his own quest. Up until the end of the third book of the trilogy. Indeed, even following him getting up a flight of stairs with his deformed and broken legs and listening to his internal ramblings is rewarding on its own.

Inquisitor Glokta alone and his thread in the plot are not enough to make the book a good book, but is enough to make it worth reading and to make you forgive the author and suspect some latent potential to write brilliant books in the future.

Persistent Dictionaries (a.k.a. disk backed key-value store)

I recently faced the situation where I needed to employ an on-disk dictionary that supports O(n) or O(log n) lookup complexity. The need for a dictionary (aka map, associated array, etc.) is clear enough and very common, I can hardly remember any piece of software I wrote that was completely free of one, but the requirement of this data structure to be on disk is the less frequent one. So I set out to look for solutions.

[for those who absolutely have to know why I needed it to be on disk, it was because my program was a web service that will get loaded in memory to satisfy a request, then dies, and I didn’t want to load the whole set of elements [stop word list for those that love details and yes, what I really really needed was a set data structure not a dictionary] every time. But really, there are many different scenarios where you would need such a thing. Imagine a really huge dictionary, … I mean REALLY huge. Imagine a dictionary that you want to share between different processes.]

I knew vaguely that the python community had several solution, and they do. They mainly have an interface that can do this backed up with various database options, usually, simple, small and embedded DBMSs like dbm or Berkley DB, and the DBMS is responsible for indexing and fast retrieval.

But I needed that in C#, or at least .net.

And I thought that I can do the same, I can use some embedded database (e.g. sqllite, BDB, or Microsoft’s SQL Server compact) and wrap it in a dictionary interface:

Dict. Add(key, value);

But I still thought that employing a DBMS is too much, like killing a bee with a bazooka, and introduces external dependencies.

So I looked some more …

I searched for things like: “persistent hashtable” , “persistent map” , “persistent data structures” , “file indexing” , etc.

… and I found this:

A java code sample that uses the filesystem itself as the indexing structure:

So Dict.Add(key, Value) will actually create a file (in a specified folder) with the name=key and the value written inside.
And Dict.Get(key) will simply ask the filesystem to open the file(key) and deserialize its contents.

Quite devious!!! I loved the idea. But it was a bit shabby, and it assumed that the filesystem is an efficient one that handles folders with thousands of files efficiently and guarantee fast lookup, which is probably the case with some filesystems (reiserFS perhaps? EXT3? ZFS?) but not with NTFS, at least I wasn’t confident enough.

What I needed was a disk-backed balanced tree! And surprisingly, when I searched for that I found exactly what I wanted:
A B-Plus tree implementation that support exactly the interface that I want, well, actually it supports the more elegant interface:

Dict[key] = value;
Value = Dict[key];

Turns out that the “disk-backed” is the right term to describe what I wanted, not “persistent”, for “persistent data structures” has a completely different meaning that has to do with mutability, as opposed to “data persistence” which means disk-backed 

Other notable libraries found in the way:

C5 collection library from the University of Copenhagen
Which has a persistent red-black tree implementation that almost fooled me.

Wintellect’s PowerCollections for .Net

Carl Sagan - Pale Blue Dot

Richard Feynman on doubt,uncertainty and religion

I am a tomb raider ...

I am a tomb raider, a grave robber, and I do it with great pride.

For hundreds upon hundreds of years our kings have devised techniques, architectural, mechanical and magical to prevent their tombs from being robbed. I derive my pride and satisfaction from unravelling these puzzles and smashing these barriers one after the other. Of course the loot is a welcome bonus, I don't just work for the fun of it. Anyone that might come upon these diaries (if anyone ever did) would think me crazy at best, or damned and sacrilegious at worst. But I stand proud and tall, for I killed no living soul, I robbed no food from the mouth of hungry children, and I didn't force thousands of poor peasants to toil from dawn till dusk for my personal glory. I live free, ... free of their dirty governments, free of their contrived religion and their nasty - always angry - gods.

But my start was not tall nor proud, I started as conscript in the glorious army of the great Pharaoh, and before that I was even lower, a street urchin that was shunned by the lowest and and pitied by the poorest. I lived most of my childhood on the streets and of the streets. And it was in those days that I learned how harsh the gods can be, and in my days as a soldier that I learned how vain our kings are.

Friday, December 04, 2009


الجزء الثاني من الملل

لم يكن يدري على ما أظن أني لم أكن أقصد السلطان محمد الظاهر، فلم يكن خبر وفاته إنتشر بعد حيث ابحرنا. كان هذا قبل أيام قلائل من وصولنا إلى ميناء الزرقاء في عاصمة السلطنة الألفية. ودعت البحارة على عجل وحزمت حقائبي وكتبي وأخذت أول مركبة متجهة ناحية القصر السلطاني. لم أنزل بالقصر ذاته وانما تركت حاجياتي ببيت لي على مقربةٍ من القصر وسرت راجلاً دون أنا أغير ملابسي أو اغسل عني أثار السفر حتى وقفت أمام بوابةٍ صغيرة بالجانب الخلفي من القصر، لم يلبث حارس البوابة أن رأني حتى فتح فاه أثر الدهشة. أمهلته إلى أن انقضت دهشته ثم فتح لي الباب مسرعاً كأنه أفاق من حلم مررت به مسرعاً ورباط على كتفه ولكنه لم ينطق بكلمة واحدة.
مررت من عدة ممرات شجرية حتى وصلت إلى الباب الخارجي لمطبخ القصر. دخلت ومررت بالمطبخ المزدحم بالناس والروائح دون أن يعيرني أحداً أي انتباه. قلما ينتبه الناس إلى دخيل إلا إذا كان يمشي كدخيل، أما أنا فقد كنت أمشي في المطبخ كأحد عماله كما كنت أمشي في الحديقة كى أحد أهم الضيوف. أظن بعضهم كان ليتعرف وجهي إن حاول.

وصلت إلى القاعة المركزية بالمستوى الأرضي ووجدتها خالية على غير العادة. سلكت أحد الدهاليز ومنه إلى دهليز فرعي أخر وطرقت خمس طرقات في توال سريع. فتح لي شيخ قصير إختبأ نصف وجهه خلف لحية بيضاء. وقفت مبتسماً كالابله لا أعرف ماذا أقول. "هأنذا قد جئت" قلت أخيراً بعد فترة صمت لا بأس بها. أخذ الشيخ شهيقا وزفره ثم تحركت يده في لمح البصر لتلطمني على وجهي لطمة لم اتوقعها و لم اتوقع قوتها.

"سحقاً لك، أين كنت كل هذه السنين؟ فيم غيابك وفيم رجوعك؟"

Suffix Tree Clustering

I recently stumbled upon this very interesting paper:

Web Document Clustering: A Feasibility Demonstration , by Zamir and Etzioni. PDF here

The paper presents an attractive technique for clustering documents using Suffix Trees, the merits of which can be summarized as:

  • The algorithm is incremental, meaning that documents can be added at any time and clusters can be re-evaluated accordingly
  • Does this in linear complexity O(n)
  • Deals with documents as strings, not as Bags of Words, which allows for accounting for context
  • The algorithm actually produces sequences of words that represent each cluster (with no extra cost)
  • Can produce clusters that are not mutually exclusive (fuzzy clusters)
  • and finally the authors claim it can work efficiently on search engine snippets

Suffix Trees are data structures that are very suitable for indexing text documents (for search purposes), and I think it very devious to exploit the same structure for clustering.

Two Technical Lectures

I try to watch a technical lecture or two every weekend, this week I actually stumbled upon two good lectures.

About succinct data structure, … how to index stuff without using much memory, … in other words, … compressing trees.

Spatial data structures for fast and exact nearest neighbor queries, in high dimensional spaces. I was originally interested in special data structures men ayam el physics engine wel collision detection in games, but they can also be used to speed-up some data clustering algorithms.

Dangerous Beans

“I thought, … there would be an island”, said Dangerous Beans (a rat) in a crestfallen and resigned tone, and I started to cry, weep and howl openly. I was driving to the CMIC Maadi office. Listening to an audio-book, written by Terry Pratchett. In less than ten minutes the book (The Amazing Maurice and His Educated Rodents) made me laugh again, an effect only Sir. Charles Spencer Chaplin achieved before. And in my opinion, one of the marks of true craft and art.

“I thought there would be an island” reverberated in my heart as the stood for e very forlorn dream, every dying hope, and every fallen idealist. I wept, not only for Dangerous Beans, I wept for the whole world (and obviously myself before all).

Steven Spielberg usually gets such an effect with his climax scenes at or near the end of the movie, (e.g. Schindler’s List, E.T., and Empire of the Sun) but there is as well another way to touch the audience, another form of high art, the kind that makes you on the brink of tears from the beginning to the end of the work, without actually having to have a climax, or needing to make you cry. Examples of this latter are “The Way we Were”, “Slumdog Millionair” and the book “Doublestar” for Robert. A. Heinlein. In “Slumdog Millionair” I literally had tears in my eyes for the whole movie. And in “The Way we Were” I felt a state of elation/heartache the whole length of the 2 hours+ movie. The state was not bound to a certain scene, but to the theme of the movie. Robert Heinlein is a totally different story, .. better saved for later.