Apache Hadoop

An Apache Hadoop presentation that I gave at my workplace. It's a beginner's guide for developers.


Theory of God

The idea for this blog post has been running around in my mind for quite some time. In this post I try to give a logical explanation of “God”. I will try my best not to sound like a jogi baba (spiritual guru) giving a lecture. Note that I am not an atheist and strongly believe in the existence of a Supreme Power. Here goes; do let me know your comments…

First, let us try to answer this simple question: Who/Where/What is God?

In my humble opinion, God is someone who can solve “ANY” problem. I would like to stress the word “ANY” again.

Let's keep this thought aside for a while. We all face problems in our day-to-day lives. Some are domestic, some professional, and so on. That brings up the next question: what do we need to solve a problem?

The most important thing required to solve a problem is INFORMATION. With the proper information we can solve “ANY” problem. A person who knows maths can easily keep accounts. A person who knows all the languages on earth can talk to anyone.

From the discussion so far, we can conclude the following:

“God is someone who has all the Information and hence can solve any problem”

With this thought, let's move forward. Mere mortals spend their lives engrossed in their day-to-day problems. In the process they gather information and become adept at solving their problems. For example, the accountant becomes an expert in statistics and the musician becomes a Mozart. The point to note here is that one can become an expert in only one or two subjects. An expert in one subject might not be adept at solving problems related to other subjects. This gives the following classification:

God has all the information

An expert (guru) has lots of information on one or more subjects.

General public has some information about many subjects. :-)

The experts help us solve our day-to-day problems. The maths teacher teaches us accounts. The gym trainer helps us keep fit, and so on. All these experts give us more and more information. In a way, they are guiding us towards God. Every great religion has a guru, a central figure who shapes the group of people. This central figure teaches the common man how to lead a fulfilling life, giving people more and more information, which brings them closer to God.

Reinterpreting the statements made so far, in my humble opinion no one can actually reach God. We can, however, treat the teacher who shows us the way to God as God; in other words, whoever gives us information.

As the ancient Hindu scriptures point out: matha pitha guru deivam, which means Mother, Father and Guru are Gods. Mother and Father are our first teachers. Guru represents all teachers, living and non-living: books, the Internet, TV. Everything around us can give us some information or other, and hence the whole world is God.

Now comes the catch. You can call it a Catch-22. Here it goes.

One also has information within oneself that can help one solve one's own problems. In other words, one is also one's own God. In fact, one is one's most important source of information, as one can guide oneself to the right teacher, who will give one the proper information to get out of the current trouble. If one chooses the wrong teacher, one might end up in more trouble. So choose your teachers wisely, and believe in them. They will take good care of you.

I hope you are convinced by the explanation. I know that some of the points I made in this blog post are inspired by Hindu religious teachings. The most probable reason I came up with this line of thinking is that I am a Hindu myself. There can be alternate explanations too, and I am totally open to them.

An interesting point to note is that this blog post validates itself. My teachers (the Hindu religious scriptures) gave me the information to solve the problem we originally set out to answer. :-)

Challenges of scaling Social Networks

I read this nice article on High Scalability about scaling social networks.

Let’s say you have 200 friends. When you hit your Facebook account it has to go gather the status of all 200 of your friends at the same time so you can see what’s new for them. That means 200 requests need to go out simultaneously, the replies need to be merged together, other services need to be contacted to get more details, and all this needs to be munged together and sent through PHP and a web server so you see your Facebook page in a reasonable amount of time. Oh my.

Being a developer at eBay, I can comment only on its architecture. Clearly, eBay's current scale can be attributed to sharding, wherein a table is distributed across many databases. Data belonging to a particular user resides in only one of the databases.

Now, if eBay decides to implement social features on its site, it will have to re-architect its whole data model.

Let's say eBay develops a Twitter-style live status page for a user's friends. With the current architecture, it would have to query the status of each friend from a different database.
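To make the cost concrete, here is a minimal sketch assuming a simple hash-based scheme in which a user's data lives on shard (userId mod N). That routing rule, the shard count and the class name are assumptions for illustration only, not eBay's actual design. Grouping friend IDs by shard at least bounds the work to one query per database, but a user whose friends are spread everywhere still touches every shard, and the per-shard results still have to be merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FriendStatusFanOut {
    // Hypothetical shard count, chosen only for the example.
    static final int NUM_SHARDS = 4;

    // Assumed routing rule: all data for a user lives on shard (userId mod NUM_SHARDS).
    static int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) NUM_SHARDS);
    }

    // Groups friend IDs by the shard that owns them, so each shard can be
    // queried once (e.g. with an IN-list) instead of once per friend.
    static Map<Integer, List<Long>> groupByShard(List<Long> friendIds) {
        Map<Integer, List<Long>> byShard = new TreeMap<>();
        for (long id : friendIds) {
            byShard.computeIfAbsent(shardFor(id), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        List<Long> friends = Arrays.asList(1L, 2L, 5L, 6L, 9L, 12L);
        // Each entry is one query to one database; the replies would then be merged.
        groupByShard(friends).forEach(
            (shard, ids) -> System.out.println("shard " + shard + " -> " + ids));
    }
}
```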

Someday, when I am free, I will write a post about how eBay can implement social features.

Keywords meta tag not used by Google

This came as a big surprise to me. All the SEO schools have been teaching the importance of the “keywords” HTML meta tag.

I think this is a nice move by Google. It will curb the menace of keyword stuffing. From now on I will not have to break my head coming up with wise keywords for my blog posts. I will concentrate on writing good stuff.

At least for Google’s web search results currently (September 2009), the answer is no. Google doesn’t use the “keywords” meta tag in our web search ranking.

Read more about this at Google’s Webmasters Blog.

New Age Ramayana.

Brilliant stuff by krishashok.

Making Ajax Searchable

Google has come up with a set of guidelines to make Ajax applications searchable. This has been a major limitation of Ajax applications.

While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines.

Read more about this at Google's Webmasters Blog.

They say the best design is the simplest design.

Google’s proposal seems rather complicated. If it seems complicated to a technical person like me, I am not sure how it is going to fare with other folks. Plus, this approach has to get the blessing of other search engines.

What's missing is a reference implementation. Google should definitely make one so that we can understand this better.
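For a rough idea of what the proposal asks of developers, here is a minimal sketch of its core URL mapping, based on my understanding of the convention Google documented: a page state marked with "#!" in the URL is requested by the crawler with the fragment moved into an "_escaped_fragment_" query parameter, and the server is expected to answer that request with an HTML snapshot of the page. The class and method names are hypothetical and the escaping rules are simplified, so treat this as an illustration, not a reference implementation.

```java
import java.nio.charset.StandardCharsets;

public class AjaxCrawlUrl {
    // Simplified escaping: percent-encode control characters, space, '#', '%',
    // '&', '+', and any non-ASCII byte; everything else (including '=') is kept.
    static String escapeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fragment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c <= 0x20 || c == '#' || c == '%' || c == '&' || c == '+' || c >= 0x7F) {
                sb.append(String.format("%%%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Maps a "pretty" #! URL to the URL a crawler would request.
    static String toCrawlerUrl(String prettyUrl) {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) {
            return prettyUrl; // no #! marker: not an Ajax-crawlable state
        }
        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escapeFragment(fragment);
    }

    public static void main(String[] args) {
        System.out.println(toCrawlerUrl("http://www.example.com/ajax.html#!key=value"));
    }
}
```

On the server side, a request carrying `_escaped_fragment_` would be served the pre-rendered HTML for that state, which is the part that makes the scheme feel heavy: every Ajax state needs a crawlable snapshot.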

Send email to App Engine

Now we can send email to App Engine. This is a really cool feature, and it greatly expands what developers can do.

Incoming Email – Your App Engine app has been able to send email for some time … but now, with 1.2.6, your app can also receive email. After enabling mail as an inbound service (just like XMPP), users can email your application at whatever@yourappid.appspotmail.com. Inbound messages are converted to HTTP requests (again, just like XMPP) which you can receive via webhook handler. Docs for Python, Java.
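As I understand the App Engine docs, each inbound message arrives as an HTTP POST to a path of the form /_ah/mail/<recipient address>. The sketch below only shows how a webhook handler might route on that path, so that support@... and bugs@... end up in different code. The class name, handler names and routing rule are hypothetical, and parsing the MIME body of the request (for example with JavaMail) is deliberately left out.

```java
public class InboundMailRouter {
    // Path prefix under which App Engine posts inbound mail.
    static final String MAIL_PREFIX = "/_ah/mail/";

    // Returns the local part ("whatever" in whatever@yourappid.appspotmail.com),
    // or null if the path is not an inbound-mail request.
    static String localPart(String requestPath) {
        if (requestPath == null || !requestPath.startsWith(MAIL_PREFIX)) {
            return null;
        }
        String address = requestPath.substring(MAIL_PREFIX.length());
        int at = address.indexOf('@');
        return at < 0 ? null : address.substring(0, at);
    }

    // Hypothetical dispatch: choose a handler based on the recipient's local part.
    static String route(String requestPath) {
        String local = localPart(requestPath);
        if (local == null) {
            return "not-mail";
        }
        switch (local) {
            case "support": return "supportTicketHandler";
            case "bugs":    return "bugReportHandler";
            default:        return "defaultMailHandler";
        }
    }

    public static void main(String[] args) {
        System.out.println(route("/_ah/mail/support@yourappid.appspotmail.com"));
    }
}
```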

Can’t wait to try this new feature.

Read more details at the Google App Engine Blog.

MapReduce

The computer was originally designed as a sequential processor. This notion has become ingrained in our minds. We are often bogged down by this constraint while developing algorithms.

Advancements in computers have brought more and more parallelism, but our algorithms have not yet embraced it. Map Reduce is a design paradigm that forces us to think in parallel. Algorithms developed using this method are well suited for running on parallel computers.

Map Reduce algorithms, as the name suggests, work in two steps. The first step, called “Map”, consists of tasks that can be done in parallel. This step generates intermediate results, which are then passed to the Reduce step. In the Reduce step these results are collated to generate the final result. In a simple case like the one below, the Reduce step is the sequential part.

Conventional algorithms usually perform the Map steps sequentially inside a loop and keep collating the results, which are presented at the end of the loop.

As a simple example, take the case of counting the number of words in a file.

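        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;
        import java.util.List;

        public class WordCount
        {
            // Helper: counts the whitespace-separated words in a single line
            public int noOfWordsInLine(String line)
            {
                String trimmed = line.trim();
                return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            }

            // Normal algorithm: read the file line by line, keeping a running total
            public int noOfWordsInFile(String fileName) throws IOException
            {
                int fileCount = 0;
                try (BufferedReader reader = new BufferedReader(new FileReader(fileName)))
                {
                    String line;
                    while ((line = reader.readLine()) != null)
                    {
                        int lineCount = noOfWordsInLine(line);
                        fileCount = lineCount + fileCount;
                    }
                }
                return fileCount;
            }

            // Map Reduce algorithm
            // Map: called independently (possibly in parallel) for each line;
            // emits that line's word count as an intermediate result
            public int map(String line)
            {
                return noOfWordsInLine(line);
            }

            // Reduce: collates all the intermediate counts into the file total
            public int reduce(List<Integer> lineCounts)
            {
                int fileCount = 0;
                for (int lineCount : lineCounts)
                {
                    fileCount = fileCount + lineCount;
                }
                return fileCount;
            }
        }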
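The split into Map and Reduce only pays off when the map calls actually run in parallel. Here is a minimal single-machine sketch of the same word count, using Java 8 parallel streams as a stand-in for a cluster. This is not Hadoop's API; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

public class ParallelWordCount {
    // Map step: count the words in one line; each call is independent of the others.
    static int map(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Reduce step: combine two intermediate counts.
    static int reduce(int a, int b) {
        return a + b;
    }

    static int countWords(List<String> lines) {
        return lines.parallelStream()                     // lines may be processed on several threads
                    .map(ParallelWordCount::map)          // Map: one intermediate count per line
                    .reduce(0, ParallelWordCount::reduce); // Reduce: collate into a single total
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "", "  jumps over   the lazy dog ");
        System.out.println(countWords(lines)); // prints 9
    }
}
```

Because addition is associative and 0 is its identity, the stream is free to add the per-line counts in any grouping across threads and still get the same total; that property is what allows the reduce work to be split up as well.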

Update: Some folks thought that this was my idea. I have just rephrased what has already been talked about elsewhere.

Find more details about it here: MapReduce

GWT vs Java vs C

They say history repeats itself. Here is an instance where it happened twice.

The “C” Days

In the days of the 8086 and MIPS, there were different instruction sets for different processors. Because of this, programs written in assembly language for one processor could not run on another.

The “C” approach came in as a gospel: the programmer could write code in one language and compile it for any instruction set. This revolutionized the computer industry in the 1970s and 80s, and we saw millions of lines of code written during this time.

The “Java” Days

Then, during the 1990s, the same problem manifested in a different form. This time there was a variety of operating systems. C code could not keep up with the different flavors of underlying OS operations, and writing portable code quickly became a herculean task.

This is when smart folks at Sun rewrote history and invented Java. The concept was the same: write code in one language, then compile it so that it can run on different platforms (OSes). There is a slight variation, though: this time the compiled code runs on a virtual machine that wraps the underlying platform. The virtual machine in turn takes care of platform variations.

The “GWT” Days

As we said earlier, history has a knack of repeating itself. Now, in the 2000s, our industry is moving towards the client-server architecture. JavaScript is the leading technology for writing rich clients these days. Many frameworks have sprung up that try to make JavaScript development fast and easy. But again, all these frameworks face the same issue that Java and C did. JavaScript runs on different browsers: IE, Firefox, Google Chrome and many more. All these browsers have different implementations of JavaScript, which is a nightmare for developers.

Enter GWT (Google Web Toolkit). The concept is the same again. The code is written in Java and compiled to JavaScript. All the browser-related nitty-gritty is taken care of by the GWT compiler. This technology was released by Google around mid-2006, and it has come a long way since then. It is now much more stable and production ready. GWT is on its way to becoming a force similar to C and Java.

Conclusion

There are a few designs so powerful that they can alter the course of history. The compiler is one such design. We have seen it revolutionize the IT industry time and again. This time it has taken the form of GWT. As developers, it is high time we looked at this technology closely and prepared ourselves for the new future.

http://code.google.com/webtoolkit/

Rules While Resigning

Rule 1) Don’t leave the company; try moving to a different team within the same company.

It takes time to ramp up and build contacts in a company. Leaving a company will undo all the hard work. And the same process has to be repeated again.

Rule 2) If you decide to leave, then leave as early as possible.

There is no point in waiting; it only adds to the stress. Go for the first offer you get. Just make sure you will have sufficient time at the new company to keep looking for new opportunities. If you like it, stay; otherwise, move on. Companies nowadays accept a three-month gap between jobs.

Rule 3) When resigning, leave as quietly as possible.

The more fuss you make, the more problems it will create in the long term. Always try to retain your old contacts; you never know when you will cross paths again and in what capacity.

Rule 4) Referrals are the best way to get a new job.

A friend inside the company is always of great help. They have the inside knowledge, and can help you get a good deal.

Rule 5) Try getting into a company with similar work culture.

In other words, if you are in a product-based MNC, try getting into another product-based MNC. Changing work culture can be very stressful, and it takes a long time to adjust.

Rule 6) If possible take a break.

A nice little vacation can be really rejuvenating before taking up the new challenge.