
Apache Hadoop

This is an Apache Hadoop presentation that I gave at my workplace. It’s a beginner’s guide for developers.


Challenges of scaling Social Networks

I read this nice article on High Scalability about scaling social networks.

Let’s say you have 200 friends. When you hit your Facebook account it has to go gather the status of all 200 of your friends at the same time so you can see what’s new for them. That means 200 requests need to go out simultaneously, the replies need to be merged together, other services need to be contacted to get more details, and all this needs to be munged together and sent through PHP and a web server so you see your Facebook page in a reasonable amount of time. Oh my.

As a developer at eBay, I can comment only on its architecture. eBay’s current scale can clearly be attributed to sharding, wherein a table is distributed across many databases. Data belonging to a particular user resides in only one of those databases.

Now, if eBay decides to implement social features on its site, it will have to rearchitect its whole data model.

Let’s say eBay develops a Twitter-style live status page for a user’s friends. With the current architecture, it would have to query each friend’s status from whichever database holds that friend’s data.

Some day, when I am free, I will write a post about how eBay can implement social features.
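To make the fan-out concrete, here is a minimal sketch in Java. Everything in it is hypothetical: the class name `ShardRouter`, the modulo placement rule and the shard count of 4 are assumptions for illustration, not eBay’s actual design. It only shows that one user maps to a single shard, while a list of friends can map to several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from eBay's real code.
final class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    // All data for one user lives on exactly one shard
    // (assumed placement rule: user id modulo shard count).
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    // Groups friend ids by the shard that holds their data.
    // Each key in the result means one separate database query.
    Map<Integer, List<Long>> planStatusQueries(List<Long> friendIds) {
        Map<Integer, List<Long>> plan = new TreeMap<>();
        for (long friendId : friendIds) {
            plan.computeIfAbsent(shardFor(friendId), k -> new ArrayList<>()).add(friendId);
        }
        return plan;
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        // The user's own data sits on a single shard...
        System.out.println("user 10 -> shard " + router.shardFor(10L));
        // ...but the friends' statuses are spread over several shards.
        System.out.println(router.planStatusQueries(Arrays.asList(1L, 2L, 3L, 5L, 6L)));
    }
}
```

With five friends and four shards, the plan already touches three databases, and the results still have to be merged afterwards.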

Keyword meta tag not used by Google

This came as a big surprise to me. All the SEO schools have been teaching the importance of the “keywords” HTML meta tag.

I think this is a nice move by Google. It will curb the menace of keyword stuffing. From now on I will not have to rack my brain to come up with clever keywords for my blog posts, and I can concentrate on writing good stuff.

At least for Google’s web search results currently (September 2009), the answer is no. Google doesn’t use the “keywords” meta tag in our web search ranking. This video explains more, or see the questions below.

Read more about this at Google’s Webmasters Blog.

Making Ajax Searchable

Google has come up with a set of guidelines to make Ajax applications searchable. This has been a major limitation of Ajax applications.

While AJAX-based websites are popular with users, search engines traditionally are not able to access any of the content on them. The last time we checked, almost 70% of the websites we know about use JavaScript in some form or another. Of course, most of that JavaScript is not AJAX, but the better that search engines could crawl and index AJAX, the more that developers could add richer features to their websites and still show up in search engines.

Read more about this at the Google Webmasters blog.

They say the best design is the simplest design.

Google’s proposal seems rather complicated. If that is the case for a technical person like me, I am not sure how it is going to fare with other folks. Plus, this approach has to get the blessing of the other search engines.

What’s missing is a reference implementation. Google should definitely provide one so that we can understand this better.

Send email to App Engine

Now we can send email to App Engine apps. This is a really cool feature, and it greatly expands what developers can build.

Incoming Email – Your App Engine app has been able to send email for some time … but now, with 1.2.6, your app can also receive email. After enabling mail as an inbound service (just like XMPP), users can email your application at whatever@yourappid.appspotmail.com. Inbound messages are converted to HTTP requests (again, just like XMPP) which you can receive via webhook handler. Docs for Python, Java.

Can’t wait to try this new feature.

Read more details at the Google App Engine blog.

MapReduce

The computer was originally designed as a sequential processor. This notion has become ingrained in our minds. We are often bogged down by this constraint while developing algorithms.

Advancements in computers have brought in more and more parallelism, but our algorithms have not started to embrace it. Map Reduce is a design paradigm which forces us to think in parallel. Algorithms developed using this method are well suited for running on parallel computers.

Map Reduce algorithms, as the name suggests, work in two steps. The first step, called “Map”, consists of tasks that can be done in parallel. This step generates intermediate results, which are then passed to the Reduce step. In the Reduce step these results are collated to generate the final result. The second step is the sequential part.

The algorithms we have today usually do the Map steps sequentially inside a loop and keep collating the results as they go. The results are presented at the end of the loop.

As a simple example, take the case of counting the number of words in a file.

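        import java.io.BufferedReader;
        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;

        public class WordCount
        {
               // Normal algorithm: read the lines one by one and keep a running total
               public static int noOfWordsInFile(String fileName) throws IOException
               {
                    int fileCount = 0;
                    try (BufferedReader reader = Files.newBufferedReader(Paths.get(fileName)))
                    {
                         String line;
                         while ((line = reader.readLine()) != null)
                         {
                              fileCount = fileCount + noOfWordsInLine(line);
                         }
                    }
                    return fileCount;
               }

               // Map Reduce algorithm
               // Map: count the words of one line, independently of all other lines
               public static int map(String line)
               {
                    return noOfWordsInLine(line);
               }

               // Reduce: fold one line count into the file count collected so far
               public static int reduce(int fileCount, int lineCount)
               {
                    return fileCount + lineCount;
               }

               // Helper: a word is a run of non-whitespace characters
               static int noOfWordsInLine(String line)
               {
                    String trimmed = line.trim();
                    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
               }
        }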
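The map and reduce steps above can be wired together in a small, self-contained sketch. It uses Java’s parallel streams as a stand-in for a real Map Reduce framework; that choice, the class name `ParallelWordCount` and the sample lines are my assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// A sketch only: parallel streams stand in for a real Map Reduce framework.
final class ParallelWordCount {
    // Map step: count the words in a single line. Each call is independent,
    // so the calls can run in parallel.
    static int map(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Reduce step: combine two partial counts into one.
    static int reduce(int left, int right) {
        return left + right;
    }

    static int noOfWords(List<String> lines) {
        return lines.parallelStream()
                .mapToInt(ParallelWordCount::map)
                .reduce(0, ParallelWordCount::reduce);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the quick brown fox", "", "jumps over", "  the lazy dog ");
        System.out.println(noOfWords(lines));
    }
}
```

Because addition is associative and 0 is its identity, the partial counts can be combined in any grouping without changing the total, which is what makes the reduce step safe to hand to the library.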

Update: Some folks thought that this is my idea. I have just rephrased what has already been talked about.

Find more details at: MapReduce

GWT vs Java vs C

They say history repeats itself. Here is an instance where it happened twice.

The “C” Days

In the days of the 8086 and MIPS, different processors had different instruction sets. Because of this, programs written in assembly language for one processor could not run on another.

The “C” approach came as a godsend: the programmer could write code in one language and compile it to any instruction set. This revolutionized the computer industry in the 1970s and 80s, and we saw millions of lines of code written during this time.

The “Java” Days

Then, during the 1990s, the same problem manifested in a different form. This time there was a variety of operating systems. “C” code could not keep up with the different flavors of underlying OS operations, and writing code that runs everywhere quickly became a herculean task.

This is when the smart folks at Sun repeated history and invented Java. The concept was the same: write code in one language, then compile it so that it can run on different platforms (operating systems). There is a slight variation, though: this time the compiled code runs on a virtual machine which wraps the underlying platform. The virtual machine in turn takes care of platform variations.

The “GWT” Days

As we said earlier, history has a knack of repeating itself. Now, in the 2000s, our industry is moving towards the client-server architecture. JavaScript is the leading technology for writing rich clients these days. Many frameworks have sprung up which try to make JavaScript development fast and easy. But again, all these frameworks face the same issue that C and Java faced. JavaScript runs in different browsers: IE, Firefox, Google Chrome and many more. All these browsers have different implementations of JavaScript, which is a nightmare for developers.

Enter GWT (Google Web Toolkit). The concept is the same again. The code is written in Java and is compiled to JavaScript. All the browser-related nitty-gritty is taken care of by the GWT compiler. This technology was released by Google around mid 2006 and it has come a long way since then. It is now much more stable and production ready. GWT is on its way to becoming a force similar to C and Java.

Conclusion

A few designs are so powerful that they can alter the course of history. The compiler is one such thing. We have seen it revolutionize the IT industry time and again. This time it has taken the form of GWT. As developers, it’s high time that we look at this technology closely and prepare ourselves for the new future.

http://code.google.com/webtoolkit/

Favicon – Favorites Icon

Favicon (pronounced fav-eye-con) is short for ‘Favorites Icon’ or ‘Website Icon’. Favicons are the small images we see near the address bar of the browser.


Almost all professionally developed websites have a favicon. They are the web counterpart of the desktop icon.

A webmaster can install a favicon on a website by placing a favicon.ico file in its root directory, e.g. www.google.com/favicon.ico or www.kwelbytes.com/favicon.ico.

When we hit a URL, the browser automatically tries to fetch the favicon from the site’s root, i.e. url/favicon.ico. If it finds one, it displays it in the address bar; otherwise it displays the default image.

There are lots of websites which provide free favicons, e.g. freefavicon. Rather than downloading a prebuilt favicon, one can also build a custom one. You do that by providing a custom image, which is converted to the “ico” format. Many websites provide this as a free service, also called a favicon generator, e.g. favicon.co.uk. There are also some sites which help create a favicon from scratch, e.g. favicon.cc.
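The lookup rule described above can be sketched in a few lines of Java. The class name `FaviconLocator` is hypothetical, and the sketch assumes the page URL is absolute (it includes a scheme such as http://). It only computes where the icon would be expected and does not fetch anything.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: derives the conventional favicon location for a page.
final class FaviconLocator {
    // Keeps the scheme and host of the page, and replaces the rest
    // with the fixed path /favicon.ico at the site's root.
    static String faviconUrl(String pageUrl) {
        URI page = URI.create(pageUrl);
        try {
            return new URI(page.getScheme(), page.getAuthority(), "/favicon.ico", null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable page URL: " + pageUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(faviconUrl("http://www.google.com/search?q=favicon"));
    }
}
```

Whatever page of the site we are on, the result points at the same file in the root directory.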

Claiming my blog

MyBlogLog is a service provided by Yahoo to maintain one’s identity. Nowadays we have so many internet applications and so many profiles. This site provides one unified place to manage all of them.

MyBlogLog was recently acquired by Yahoo, and it started allowing users to link their Yahoo accounts to their MyBlogLog accounts. I made a mistake and ended up deleting my original MyBlogLog account.

I created a new one and tried to register my blog, but it said I have to claim it, as it is owned by another user. This is bad. (Deleting my earlier account should have deleted my blog info too.)

Anyway, MyBlogLog allows two ways to claim the blog: one via a link in a post and another via a meta tag. I am using the link, and here it is ..

I will delete it once the verification is done.

Buying a domain name

Buying a domain name can be tricky. It took me a lot of searching and a bit of luck to get this domain, and I love it.

entrepreneur.com has this nice article to help find a new domain name. It’s worth the read.