Storage Array Vendors: Consolidation Time?

I could have titled this article "All Quiet on the Storage Vendor Front." It has indeed been very quiet the past few months. The main reason, according to me, is that a lot of consolidation is happening on the products front. The battle lines have been clearly drawn. Each of the major vendors is preparing for the battle ahead, sharpening their weapons and adding more potent weapons to their armory. This metaphor doesn't hold in the strict sense of the word because in the market the battle never stops. So I should actually be saying that while the foot soldiers are fighting it out in the field, the headquarters back home is developing those bazookas which will blow away the opposition and break down customer resistance.

What are the companies working on? The trends of a year or two back are now necessities of life. Snapshots, Thin Provisioning and Deduplication are taken for granted. I don't think there is any secondary storage device which does not offer compression, and no major array vendor is without Thin Provisioning in its arrays. Storage efficiency in the form of dedupe / compression appears in primary storage as well. SSD usage has percolated through the market, and all arrays have started providing an SSD option, either as a top storage tier or as a high-performing cache.

The preparation for the future, according to me, is in areas like Scale Out NAS, integration with VMware and the cloud play. This is what most companies are doing. Given that the cloud will need large amounts of storage and virtualization, it is easy to see why better storage performance with respect to VMware is needed. As the cloud grows, the storage has to scale, and scaling horizontally through scale out solutions is preferred to vertical scaling. All major storage vendors have a scale out solution in place. The recent news was Hitachi acquiring BlueArc, a company specializing in Scale Out NAS. Hitachi and BlueArc used to work together earlier. EMC has Isilon, NetApp has its own scale out solution, HP has IBRIX, IBM has SONAS and now Hitachi has BlueArc. (The news today was that Red Hat has bought Gluster for $136 million. As more news seeps in, we will know what Red Hat is planning to do with Gluster.)

Trying to join this group of senior storage vendors is Dell. The acquisition of EqualLogic has given them leadership in the iSCSI space. They have Exanet, which is a scalable NAS. They also bought Compellent (storage array) and Ocarina (data deduplication). Everyone is watching Dell's strategy with interest as they try to make inroads into the Enterprise. In short, the big players now have their NAS, SAN, Unified Storage and Scale Out solutions in place.

Integration with VMware is another area on which every vendor is concentrating. Storage performance is a major issue in server virtualization. The CPUs do a good job of running VMs, but when all these VMs are accessing the same array, performance gets impacted. This is because the hypervisor does a lot of storage-related activities. The hypervisor doing storage work is not an optimal solution, since many of the arrays have the intelligence to perform these activities themselves, like, say, zeroing out free blocks. VMware came up with a set of APIs (VAAI: vStorage APIs for Array Integration) which allow some of the storage activities to be offloaded onto the array. From what I understand, this is achieved by the arrays supporting a set of SCSI-3 commands like block copy etc. While many arrays claim integration with VMware, you need to check if they are supporting these APIs, because VMware integration is claimed even if the array just supports vMotion. Here is an article from Dot Hill which tries to cut through the FUD with respect to VMware integration.
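To make the offload idea concrete, here is a toy sketch in Python. It is only a mental model with invented class and method names; the real mechanism is SCSI primitives like EXTENDED COPY (XCOPY) and WRITE SAME sent to the array, not any such API.

```python
# Toy model of why offload helps (invented names; the real mechanism is
# SCSI primitives like XCOPY / WRITE SAME sent to the array).

class Array:
    def __init__(self, nblocks):
        self.blocks = {i: b"\x00" for i in range(nblocks)}

    def read(self, lba):
        return self.blocks[lba]          # one trip across the SAN

    def write(self, lba, data):
        self.blocks[lba] = data          # another trip across the SAN

    def xcopy(self, src, dst, count):
        # One command from the host; the array's own controllers move
        # the data internally, with no SAN traffic and no hypervisor CPU.
        for i in range(count):
            self.blocks[dst + i] = self.blocks[src + i]

def host_side_copy(array, src, dst, count):
    # Without offload: every block crosses the wire twice.
    for i in range(count):
        array.write(dst + i, array.read(src + i))

def offloaded_copy(array, src, dst, count):
    # With offload: a single command, however large the copy.
    array.xcopy(src, dst, count)

array = Array(1000)
array.write(0, b"VM template block")
offloaded_copy(array, src=0, dst=500, count=100)
print(array.read(500))
```

A VM clone that used to mean millions of reads and writes funnelled through the hypervisor becomes a handful of commands to the array.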

As server virtualization makes inroads into the Enterprise, the performance of the storage array vis-a-vis VMware will become very important. (I keep mentioning VMware here because they are the dominant vendor in this space. This applies to other hypervisors like Hyper-V, Xen etc. as well.) Similarly, the performance of the array in a virtualized server environment and the ability of arrays to scale out will be important considerations for the cloud. That's why you see a lot of effort from array vendors and server virtualization vendors in ensuring that storage arrays are closely integrated with server virtualization.

As we enter an era of server virtualization and cloud, all the major players have the products they need to build good solutions for the Enterprise. One thing I notice is that almost all vendors have a lot of different products in their portfolios, and there is an ongoing effort to consolidate them. It will be interesting to observe how the vendors use their products to build the best solution for the customer.

On a different note: If you are a Bangalore-based storage and/or Linux kernel expert / developer, I have some exciting startup opportunities for you. If interested, contact me at yagnavalky at gmail dot com.

Dealing with enormous data

I wasn't aware of the company called 'Greenplum' until EMC bought it!! I became interested in it when analysts were mentioning that 'Netezza' would be bought by IBM to counter this move. I was interested because I had a friend who worked at 'Netezza'. So I wanted to find out what this whole thing was about. I checked with a friend who knows stuff in this area, and this is what he replied: "The key thing is Netezza, Teradata, Greenplum, Vertica are all designed from the ground up for data warehousing kind of workloads. Oracle and DB2 started as OLTP (Online Transaction Processing) systems and then they tried to do data warehousing also using the same server code. That does not work. Data warehousing has a very different kind of characteristic. Loads are bulk loads. Inserts / Updates / Deletes are few and it is very Select heavy. All you do is analytics. The selects usually involve very complex queries, often running into GBs in size, generated automatically by front end analytics tools. It touches massive amounts of data in the range of terabytes to petabytes. OLTP on the other hand has all of Select / Insert / Update / Delete. A typical example is airline reservation. The volume of data is not that big at all." That made sense. Later IBM bought Netezza and HP bought Vertica, another similar company.
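To make my friend's contrast concrete, here is a toy illustration (all table and column names are invented; only the shape of the statements matters):

```python
# Toy illustration of the workload contrast (invented schema; only the
# shape of the statements matters).

# OLTP: short, surgical statements, each touching a handful of rows.
oltp_statements = [
    "SELECT seat FROM bookings WHERE pnr = 'AB123'",
    "UPDATE bookings SET seat = '12A' WHERE pnr = 'AB123'",
    "INSERT INTO bookings (pnr, flight, seat) VALUES ('CD456', 'AI101', '14C')",
]

# Warehousing: bulk loads, then select-heavy analytics that scan huge
# fact tables; tool-generated queries can be far larger than this one.
warehouse_statements = [
    "COPY sales FROM '/staging/sales_2010.csv'",
    """SELECT region, product, SUM(amount) AS revenue
       FROM sales JOIN products USING (product_id)
       WHERE sale_date >= '2009-01-01' AND sale_date < '2010-01-01'
       GROUP BY region, product
       ORDER BY revenue DESC""",
]

for stmt in oltp_statements + warehouse_statements:
    print(stmt.splitlines()[0].strip())
```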

So the whole thing was about how you search for patterns and such in massive amounts of data. Unlike OLTP data, where some data is current and important, in the analytics scenario all data is important. There is no irrelevant data, as Jim McDonald says in his very nice blog post at Xiotech, which gives a good perspective on the challenges faced when you have to access huge amounts of data. He talks about Big Data. I am not sure if there is common agreement on what 'Big Data' means, but this Wikibon article can be your starting point in understanding what Enterprise Big Data is all about.

As data grows at amazing speed, neither processor nor disk technology can keep up with the pace. So scaling up a product to meet the needs of data growth can only go so far. It is inevitable that data access happens in parallel if you want to deal with larger and larger data sets. The current product trends as well as acquisition trends show that all companies understand this problem and are responding to it. NetApp has come up with clustered NAS in Data Ontap 8.0. This allows for aggregation of multiple nodes and uses a global namespace. (Looks like there is some confusion regarding the term global namespace, since Isilon and SONAS have interpretations that differ from NetApp's. You may want to read Martin Glassborow's (Storagebod) post which talks about this.) The data sheet for Cluster-Mode Data Ontap is available here. (pdf file)
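Here is a minimal sketch of the global namespace idea, assuming a naive hash-based placement (no vendor implements it this simply; real systems stripe files and handle node changes gracefully): clients see one tree of paths while the cluster decides which node holds each file, so adding a node adds capacity and bandwidth without changing any path.

```python
import hashlib

# Minimal sketch of a global namespace over a scale-out cluster.
# Naive modulo placement re-maps files if the node list changes; real
# systems use smarter schemes, but the client-facing idea is the same.

class ScaleOutNamespace:
    def __init__(self, nodes):
        self.nodes = nodes        # e.g. ["node1", "node2", "node3"]
        self.files = {}           # path -> (owning node, data)

    def _place(self, path):
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def write(self, path, data):
        self.files[path] = (self._place(path), data)

    def read(self, path):
        node, data = self.files[path]
        return data               # the client never needs to know 'node'

ns = ScaleOutNamespace(["node1", "node2", "node3"])
ns.write("/projects/design/spec.doc", b"draft contents")
print(ns.read("/projects/design/spec.doc"))
```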

While NetApp developed their clustered-mode Scale Out NAS based on their earlier Spinnaker acquisition, and not the Bycast buy of last year (thanks to Dustin for pointing out my error), EMC went and bought Isilon, which again was a company dealing with Scale Out NAS. In fact EMC paid $2.25b to get this company, so you can understand what EMC feels about the potential of Scale Out NAS. HP in 2009 had acquired IBRIX, another company dealing with Scale Out NAS. IBM has its own Scale Out NAS, which is appropriately labeled SONAS!!

All of these use a global namespace. What exactly is a global namespace and more importantly, what exactly is Scale Out NAS and how does it work? According to the SONAS datasheet:

- Access your data in a single global namespace allowing all users a single, logical view of files through a single drive letter such as a Z drive.
- Offers internal (SAS, Nearline SAS) and external (Tape) storage pools. Automated file placement and file migration based on policies. It can store and retrieve any file data in/out of any pool transparently and quickly without any administrator involvement.
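To picture the 'automated file placement and file migration based on policies' bullet, here is a toy policy engine. The pool names follow the datasheet; the rules themselves are invented for illustration:

```python
from datetime import timedelta

# Toy policy engine: hot files stay on SAS, colder files age to Nearline
# SAS, and anything untouched for a year migrates out to tape. The rules
# here are invented; only the pool names come from the datasheet.

POLICIES = [
    (lambda f: f["idle"] > timedelta(days=365), "Tape (external pool)"),
    (lambda f: f["idle"] > timedelta(days=30),  "Nearline SAS (internal pool)"),
    (lambda f: True,                            "SAS (internal pool)"),
]

def place(file_info):
    # First matching rule wins; the migration itself would be transparent
    # to users, with no administrator involvement.
    for rule, pool in POLICIES:
        if rule(file_info):
            return pool

files = [
    {"name": "report.xls",    "idle": timedelta(days=1)},
    {"name": "q1_backup.tar", "idle": timedelta(days=90)},
    {"name": "old_logs.gz",   "idle": timedelta(days=400)},
]
for f in files:
    print(f["name"], "->", place(f))
```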

Scale Out NAS technical details require an extensive writeup, which I will do in a future post. What is important is that all the main storage vendors have a Scale Out NAS solution in their portfolio.

An acquisition that was unexpected for many was that of LSI's Engenio by NetApp. The reason for the surprise was that NetApp's message all along has been that of Unified Storage, and everyone thought that NetApp would always go the Unified Storage way. (In fact there have been blogs critical of NetApp, calling it a one-product company.) Now everyone was surprised and started asking, "Why are you getting more products? Your messaging will be lost." LSI's Engenio is a pure block play, and people were interested in knowing why NetApp acquired Engenio and how it would affect their message. Dave Hitz, in his characteristically clear style, replied to these concerns / accusations in his blog. In his blog post he says, "The observation is that, while many customers and workloads do require advanced data management, some need "big bandwidth" without the fancy features. For them, the best solution is a very fast RAID array with great price/performance. Perfect for Engenio! Two immediate opportunities are Full Motion Video (FMV) and Digital Video Surveillance (DVS), and over time we believe there will be more." Here we see NetApp targeting a different type of workload and understanding that no fancy features like Snapshots etc. are required here. All that is required is bandwidth. In other words, all companies are now trying to get solutions which deal with different types of workloads. Hence you see pure block plays, data warehousing solutions and Scale Out NAS.

So what is the moral of all this rambling? Well, the moral is clear. You had better start understanding how big data is being dealt with. That is the future if you are into Storage Infrastructure. Your concepts of RAID will not suffice, as data will not be distributed across disks in one single array but may be striped across multiple arrays. Clustered storage solutions may become the de facto way of installing storage. And it may happen faster than you think. So go read up more about these technologies. It will help you in the long run.

Talk to me intelligently

When you teach, you clarify the doubts of the participants. While you are clarifying their doubts, you start having your own doubts, which make you go deeper into the subject. And the best way to clear deep doubts is to read a good book on the related subject. The Internet is good for some quick and dirty research, but if you want to do some serious reading you had better get hold of a good book.

My aim was to know more about the interaction between the code you write and the processor. Essentially I needed a book which explained to a software engineering student some of the electronics and computer organization stuff. With this in mind I was browsing the bookshelves in Landmark, Chennai when I chanced upon a book titled "Write Great Code, Volume 1: Understanding the Machine", written by Randall Hyde. I quickly flipped through it, and since it had most of what I was looking for, I bought it.

Generally you can divide technical books into two categories. One is what is written for the student of that subject. This could be a textbook or something close to one. Here it is taken for granted that the person reading has an idea of what he / she is getting into. The other category is technical books written for people who are not students of that subject but would like to know more about it. It is with the second category that I have problems. Basically, when a technical subject is being explained to a person who is not involved in that subject, the author assumes that the person reading is absolutely dumb!! It is almost as if saying that if you are not a student of this subject, you ought to be dumb!! I am OK with books like the "...for Dummies" series. At least they categorically state who their audience is. Many of the other books don't state this assumption and can get on your nerves when you read them, for they start by explaining to you that 2+2 is equal to 4. Well, not exactly that, but you get the drift, right?

So it was a pleasure to discover this book by Randall Hyde. As I said, this book is focused on the software engineer who wants to write high performance code. According to my friends, this is a breed which is slowly dying. One, because of project pressures, people end up coding the fastest possible way and not the most efficient way. Two, with more and more languages giving you objects and highly abstracted entities, your efficiency lies only in selecting the right templates / objects / whatever. The book focuses on explaining to the reader the underlying architecture of the machine and how your code can take advantage of it and become highly performant.

Starting from binary numbers, through bit operations, character representation, how memory is organized, CPU architecture, instruction set architecture and input / output, Randall gives you a very nice view of the internals of a computer. A few things about this book impressed me. First, it talks to you intelligently. It assumes that you are a fairly intelligent person and not someone with the IQ of a caterpillar. Second, the writing style is very fluid. Third is the economy of words. It is possible to pack so much into a 400-page book because Randall doesn't waste words. It reminds me of an Inorganic Chemistry text we had, authored by J.D. Lee, which had a similar economy of words.

If you read the full book, you will end up understanding quite a bit of jargon which you have heard and probably used as well. Stuff like, say, pipelining. You probably have a vague idea of what it means, but this book makes it very clear. Similarly, you will get a good idea about how memory is accessed, what the instruction sets are, which registers do what, etc. It also has detailed chapters on I/O, filesystems and device drivers, and it tells you how compilers work. You can always say that these details are available in various textbooks, and you would be right. But you would need to read a lot of textbooks to get all this knowledge. This is not a book which replaces the textbook, but rather a high-level electronics view keeping the software programmer in mind. At the end of each chapter, references to the relevant standard texts are given.
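To give a flavour of the kind of machine-level insight I am talking about (this example is mine, not lifted from the book): two loops doing identical arithmetic can run at different speeds purely because of the order in which they walk memory. In C the gap is dramatic; in Python the interpreter overhead masks much of the cache effect, but the principle is exactly what the book teaches.

```python
import time

# Two loops, identical arithmetic, different memory access patterns.
N = 2000
matrix = [[1] * N for _ in range(N)]

def row_major():
    total = 0
    for i in range(N):
        for j in range(N):
            total += matrix[i][j]   # walks each inner list in order
    return total

def column_major():
    total = 0
    for j in range(N):
        for i in range(N):
            total += matrix[i][j]   # hops to a different inner list each time
    return total

for fn in (row_major, column_major):
    start = time.perf_counter()
    fn()
    print(fn.__name__, f"{time.perf_counter() - start:.3f}s")
```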

If you are someone interested in knowing the internals of a computer system to the extent of using that knowledge to your advantage while coding, this is the book for you. I would definitely recommend it to all computer science students. This is a very comprehensive book which talks to you intelligently, and you will definitely benefit from it.

Recent interesting acquisitions in Storage Space

When there is great growth in an industry, you would expect the demand to spur competition, with the customer getting more choices and more vendors to procure from. I guess this works only up to a certain scale, beyond which the opposite, consolidation of vendors, happens. That is what I see happening in the Storage Industry now. The demand for storage is on the rise. Every company is showing wonderful results. Demand for newer technologies is also on the rise. In such a scenario, we are seeing a lot of consolidation happening. So market growth leads to a shrinking vendor base? I am sure there is some management theory explaining this phenomenon and when consolidation happens in an industry.

These thoughts came to me when I looked at the recent happenings in the Storage space. We saw Data Domain being bought by EMC last year. This year there were two very major acquisitions. One was HP fighting off Dell in order to acquire 3Par Technologies. HP wanted an array like that of 3Par in their portfolio and went for it aggressively against Dell. It was a $2b+ acquisition. 3Par has some nice technology and was quite well known for techniques like Thin Provisioning, Micro RAID, Wide Striping etc. They were getting noticed in the market and had a decent customer base. Everyone feels that this acquisition will help HP immensely in the Storage market.

The second acquisition which has a lot of people talking is that of EMC planning to acquire the Scale Out NAS vendor, Isilon. This will also be a $2b+ deal. From the comments I see, like HP with 3Par, this is also a buy to fill a gap in EMC's portfolio. The general opinion is that the current NAS product of EMC, Celerra, doesn't scale well, hence the need to buy a scale out NAS product. EMC was lacking a scale out NAS while the competition had their products. HP has both PolyServe and IBRIX. (PolyServe, btw, had a lot of people from the erstwhile Sequent Computers, some of whom I know, and is based in Beaverton, Oregon.) IBM has its Scale Out NAS (SONAS), and NetApp has its own scale out product. So this acquisition ensures EMC is also playing in this space.

The other interesting acquisition was IBM acquiring Storwize, a company involved in primary data compression. Storwize had a compression appliance for NAS, which would compress data before it was stored on disk. After acquiring Storwize, IBM released a product called the IBM Storwize V7000 Storage Array. The funny part was that this array had no Storwize technology in it!! It seems that IBM wants to brand its arrays as Storwize arrays, and so only the name was used.

Other interesting acquisitions happened in the database area. EMC acquired Greenplum, which has a "massively parallel processing database platform", and IBM acquired the database company Netezza. Both these database companies were involved in building databases for high performance business analytics.

Most of these acquisitions happened keeping the cloud in mind. Also at the back of the mind of all traditional storage companies is Oracle. Oracle now has Sun, Sun StorageTek, Virtual Iron and Exadata. And of course, they have their database. They do pose a serious threat in the storage space. There was, for a brief while, talk about whether they would acquire someone like NetApp to grow in the storage space. You never know what will happen!!!

As I said in the beginning, while the storage market is expanding, the vendor base is getting consolidated. Innovative startups and small companies with good track records are being gobbled up by the big players. So you will eventually end up with only the big guys in the fray.

Storage Books: An Indian Perspective

One question I regularly get asked by my students after I do my Storage 101 sessions is about the books they should be reading to get more details on Storage Technologies. I thought it would be a good idea to write about the Storage books that are currently available in India and my impressions of them. This will help in two ways: people get a detailed list of the books available, and they also get to know how each of these books could be helpful to them.

I would divide the people who attend my sessions into these categories: a) Engineers who will be involved in development or maintenance activities b) Engineers who will be involved in testing activities c) Storage / Systems / Network Administrators d) Those involved in system integration. Since each book caters to all these categories in some way or the other, I will try and make clear which book is better suited for whom. The listing of the books is in no particular order.

Let’s start with the central premise on which learning is based. You can learn only if you know that you do not know!!! Hence the first book we will take up is:

1. "Storage Area Networks For Dummies" by Christopher Poelker and Alex Nikitin. Publisher: Wiley India.

As the name indicates, this book is about Storage Area Networks (SAN). I would recommend this book to all those who are just out of college and want to know what SAN is all about. This book has a lot of implementation details, which is very useful for Storage Administrators and System Integrators. A lot of hardware stuff, including types of fibre cables, FC switches, arrays, HBAs etc., is covered. There are nice chapters on how to set up a SAN, including concepts like arbitrated loop, zoning, LUN masking etc. You can clearly see that the authors are people who have actually implemented SANs, and they give tips about troubleshooting a SAN and how to manage one. Concepts like dedupe and replication are also explained. The language is simple and there are lots of diagrams to explain the concepts. In short, they know you are a dummy and model the teaching accordingly!!
Availability: You should be able to get this book in almost any technical book shop
Cost: Rs. 399/-

2. Information Storage and Management. Edited by G. Somasundaram and Alok Srivastava, EMC Education Services. Published by Wiley India

The scope of the book is vast. It covers various aspects of Storage Technologies. As you can expect, this being a book by EMC Education Services, the examples given to highlight any technology are based on EMC products. This is a good thing, since it gives people a glimpse of how a particular technology has been implemented and productized. The book starts with the very basic unit of disk drives, proceeds to RAID, then to arrays and on to DAS, NAS, SAN and CAS. It then introduces the concepts of Storage Virtualization, Data Protection, Disaster Recovery, Security etc. Most of the chapters are divided into two parts. In the first part, the technology is introduced and its various components are discussed. The second part gives an idea of the EMC product(s) that use the technology being discussed. Given the scope, I think the concepts are covered to a decent depth. This is a book for everyone in terms of understanding the complete Storage landscape, though I would say it is slightly tilted towards development and testing engineers.

Availability: Available in most of the technical book stores

Cost: Rs. 599/-

(I heard that this book is used as the textbook in colleges which have a tie-up with EMC. It also seems to help students pass one of the basic EMC certifications.)

3. Storage Area Network Essentials by Richard Barker and Paul Massiglia. Publisher: Originally Veritas; Wiley India in India

If the first book I mentioned was written with a 'nuts and bolts' approach, showing how to set up a SAN, this book is aimed at development, maintenance and testing engineers. Both the authors are from Veritas (now Symantec), but there is no Veritas-specific stuff in the book. A lot of material is discussed which is needed by any development engineer, designer or architect. Things like lock managers, fault tolerance, I/O balancing and performance are dealt with in detail. The book divides itself into three parts: Understanding Storage Networking, What's in a SAN, and SAN Implementation Strategies. If you are an engineer getting into storage development or testing, grab this book. It will also help those who are already working on storage, since the authors bring a lot of experience to the table.

Availability: Available in all technical book shops

Cost: Rs. 449/-

4. Storage Networks Explained by Ulf Troppens, Rainer Erkens and Wolfgang Müller (all from IBM Germany). Published by Wiley India

This is a translation from German, and it is definitely written keeping engineers in mind. Everything is explained at the conceptual level; if you are looking for 'nuts and bolts' descriptions, you will not find them here. The good thing about this book is that it explains the various protocols involved in storage networks. The one limitation is that it was published in 2004. Given that 5 years is like a lifetime in the storage world, some of the technologies covered have not caught on as expected. (Examples: InfiniBand and VIA.) Having said that, if you are an engineer who wants to get a good grip on the various protocols and also understand some internal details, this book will be very helpful. There are also chapters dealing with the SNIA Shared Storage Model and SMI-S. It would be great if an updated version of this book were published soon.

Availability: Available in most technical book shops

Cost: Rs.299/-

5. Backup & Recovery by W. Curtis Preston. Publisher: O'Reilly

You can't get a better person to write about Backup & Recovery than Curtis. Known in the industry as 'Mr. Backup', Curtis brings his considerable experience to the book. As the title indicates, this book is about backup and recovery, and if you are involved in this area, buy this book. In the book, Curtis first talks about the open source backup utilities that are available and explains how backups and restores are done using them. In fact, that seems to be the main aim of the book, as its subtitle is "Inexpensive Backup Solutions for Open Systems". In the next segment he talks about the features expected / our requirements vis-a-vis the commercial backup utilities that are available. He discusses features like Snapshots, Dedupe, CDP etc. in this section. Backup hardware is discussed in another chapter. The next section of the book is devoted to Bare Metal Recovery, which covers Solaris, Linux, Windows, AIX and MacOS. Backing up databases forms the next section, and the final section is called Potpourri, which, as the name suggests, discusses miscellaneous stuff. Curtis Preston has a web site: www.backupcentral.com. You can go to his blog from this site. Curtis is someone who doesn't mince words and speaks his mind clearly. You will find his blogs interesting even if you are not into backup. Given the challenges and new techniques that are cropping up for VMware backup, I am hoping that a revised edition of this book will appear in the future covering this topic in detail as well.

Availability: You need to look out for this in the technical book stores. Sometimes they push this book onto the Database shelf.

Cost: Rs.600/-

6. Storage Networks by Robert Spalding. Publisher: Tata McGraw Hill

Earlier, this used to be the only book on storage networks that was available. Nowadays I don't see it on the bookshelves as much as the 'Dummies' and Paul Massiglia books. This is a 2003 edition, and hence a lot of newer developments (like deduplication, for example) are not present in the book. It does give a good internal view of many components (like HBAs) which will help engineers understand the basic building blocks involved in creating a SAN. It has nice diagrams and the explanations are good. So if you are an engineer, you can definitely check out this book. It will be quite useful for you.

Availability: Used to be widely available. Still available in many technical book shops

Cost: Rs. 500/-

7. Storage Networking Protocol Fundamentals by James Long. Publisher: Cisco Press (Pearson Education in India)

This book has details about various protocols, including FC, iSCSI and Parallel SCSI. The approach taken is a layered one, in the sense that every networking layer as per the OSI model is taken up and the storage protocols applicable to that layer are discussed. For example, SCSI Parallel Interface, Ethernet and FC are the protocols discussed at the physical layer. Similarly, the Network, Transport, Session, Presentation and Application layers are dealt with. Appropriate mapping of the respective protocols to these layers is done and some details of each protocol are given. If you are working on protocols, this will definitely be a good first book to read before you actually go and read the standard. (A prospect not many relish, I would say!!!) Though published by Cisco Press, this is not a Cisco-specific book.

Availability: Seeing fewer copies of this now. Check in the Cisco section of any technical book shop

Cost: Rs.435/-

There was a book titled "Building SANs with Brocade", which was a Brocade switch-specific book. I don't think it is available nowadays; I haven't seen it in any bookshops. Also, given that products keep evolving fast, how applicable this book would be for the latest Brocade products needs to be verified. Some IBM Redbooks may be available, and these are useful if you are working on IBM products. I have seen books on IBM SVC and IBM Data Protection Strategies. Of course, you can always check out IBM Redbooks on the IBM Redbooks site.

The books I am going to list are not available in India but may be worth procuring for your company library if you are working in that particular area.

1. Fibre Channel Switched Fabric by Robert Kembel

2. Shared Data Clusters by Dilip M Ranade

3. Highly Available Storage for Windows Servers by Paul Massiglia

4. Storage Security by John Chirillo

If you have read any other book which is a good reference for any Storage technology, please do leave a comment with the book name. That will benefit everyone.

One final tip before I sign off. While I have given the cost of each book, you should be able to get at least a 10% discount on it in most of the stores. So don't go buying from some big-name book shop which does not give a discount. I buy mostly from Book Paradise or Sapna Book House in Jayanagar, Bangalore, and I get discounts ranging from 10% all the way to 22%. So if you save some money based on this tip and want to share a part of that savings, let me know and I will mail you my address :)

Why ‘Inception’ is a no-brainer for Storage folks


If you haven't heard about Christopher Nolan's 'Inception', you must be living in your own dream land. This was probably the most hyped up movie after 'Avatar' and definitely much more discussed than 'Avatar'. Just in case you are one of those who hasn't seen or heard about 'Inception', here is the gist of the movie. The movie plays out in a dream-in-a-dream-in-a-dream format. There is a debate about whether the dream level is 5 or 6, but we will not go there. The premise of the movie is that there are certain 'dream thieves' who can penetrate anyone's dream and plant an idea. On waking up, the person thinks it is his own idea. This is 'Inception'.

Fans of Christopher Nolan may take issue with the title of this blog post. How is it that I am so casually saying that the ideas presented in 'Inception' are a no-brainer for Storage folks (and Virtualization folks as well)? Wasn't Nolan's movie one of the more complicated ones in recent times? Can I offer some proof to support my assertion? Well, here you go guys, and pardon any bad puns along the way.

First, let's take the Virtualization angle, and let us take a simple case. We have a hypervisor on which an operating system runs. This is like the first level dream state. As I was trying to practice something on VMware ESX but did not have the hardware, I ran across an article which says that ESX can be loaded inside VMware Workstation!!! Now check this out: You first load the operating system, say some Windows OS; on top of it you load VMware Workstation. In this Workstation, you load VMware ESX. On top of this hypervisor you can load your operating system. This could again be Windows!!! Unlike Nolan's linear dream, we are in a circular dream four levels deep!! And if you want to add to the complexity, you can run a virtual appliance inside this Windows OS, which runs on the hypervisor, which is loaded on top of VMware Workstation, which is loaded on Windows OS!!!! There you go. And according to the article I read, this is supposed to be a valid use case for testing and training!!! (Of course no one would want to use it in production.)

Let's now shift to the Storage world, where this dream-in-a-dream becomes very commonplace. First, let's take the most basic unit, the LUN. This itself is a virtual entity which is carved out of multiple disks (configured as some RAID). Sometimes you may combine these LUNs to form a Meta LUN, or you can present multiple LUNs to the server, which can combine them using a Volume Manager. These combined LUNs can then be split up again and assigned as single disks!! If you have virtualization equipment in the network, that adds to the fun. You can have an HDS USP or IBM SVC or NetApp N series, for example, and these will take LUNs from heterogeneous storage, combine them, split them and present the modified LUNs to the server. Again a case of dream within a dream within a dream. These are not theoretical use cases but daily ones.
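In fact, the dream-within-a-dream can literally be written down as a stack of address translations. Here is a toy model (invented layer names and plain offsets; real mappings involve striping, RAID geometry and much more):

```python
# Each virtualization layer translates its own block addresses into the
# addresses of the layer below. Invented names and plain offsets only.

class Layer:
    def __init__(self, name, offset, below=None):
        self.name = name
        self.offset = offset   # where this layer's block 0 sits in the layer below
        self.below = below

    def resolve(self, block):
        """Follow one block number all the way down to the physical disk."""
        path = [(self.name, block)]
        layer, blk = self, block
        while layer.below is not None:
            blk += layer.offset
            layer = layer.below
            path.append((layer.name, blk))
        return path

# physical disk -> array LUN -> virtualizer (USP/SVC style) -> datastore -> guest disk
disk      = Layer("physical disk", 0)
array_lun = Layer("array LUN", 10_000, below=disk)
virt_lun  = Layer("virtualizer LUN", 5_000, below=array_lun)
datastore = Layer("hypervisor datastore", 2_000, below=virt_lun)
guest     = Layer("guest disk", 100, below=datastore)

for name, blk in guest.resolve(0):
    print(f"{name}: block {blk}")
```

Block 0 of the guest disk turns out to live at block 17,100 of some physical disk, four dreams down.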

When you combine Virtualization with Storage, the whole thing goes haywire. You can have your basic LUNs on your arrays; these can be presented to a virtualization device like the USP, which can combine multiple LUNs and present them to the hypervisor for its datastore. The hypervisor in turn takes these datastore LUNs, carves smaller disks out of them and presents those to the operating systems running on the hypervisor. Assume one of these virtual machines is running a virtual appliance; then the disks assigned to the operating system are further divided and a small portion is presented as a disk to the virtual appliance!! If you try explaining this to Nolan, I am sure he will stop making movies with flashbacks or dreambacks and stick to linear storytelling!!

You can also argue that while the dream is on, the underlying reality may change, and you may wake up in a different place than where the dream started. That can happen here as well. Many of the virtualizing equipments, like the USP or EMC's Rainfinity, will move your LUN from one physical storage to another without the user realizing that the underlying reality has changed. In short, Virtualization and Storage work most efficiently when the user is in a dream state and has no clue about the actual reality!!!

But... you say, Nolan's movie was about sowing an idea. What idea is being sown by Server and Storage Virtualization? According to me, it is the idea called 'Cloud'. As in 'Inception', someone has managed to sow the idea so deep that every vendor thinks it is his / her own idea!!! Will 'Cloud' work like a dream, or is it just a dream? I cannot answer yet, as I am not sure if I am in a dream or not!! The protagonist Cobb explains in the movie, "You never know the beginning of a dream". Since I am yet to find out the beginning of the 'Cloud' trend, I am unsure of myself!!! So, as far as the 'Cloud' is concerned, the totem is still spinning. In a few years we will know if the spinning stops. Till then....

Virtualization: Lessons Learnt

If you were thinking I am going to give you some great tips about virtualization, let me make it clear that the title of this post must be taken literally and not figuratively. I attended a 4-day VMware course at GT Enterprises, Bangalore, and this post is about what I really learnt in the course. The course was conducted well and the trainer was good.

I did know a bit about VMware, having installed the Workstation and Server versions earlier and worked with them. I did not have an idea about VMware ESX / ESXi, and the course was a good place to start. Though the course is aimed more at the server / storage admin (it is called 'Install, Configure and Manage') and I was looking at it from an engineering perspective, it was nevertheless a good course to attend. Once you attend the course, you get a much clearer understanding of what VMware is all about, how useful server virtualization is to the enterprise, and the various innovations being made by VMware to improve the product.

One of the things I liked a lot was the focus on management. vCenter is a nice piece of work which allows you to manage things from a single place, and this is definitely something a large enterprise would require. Otherwise it would be a nightmare managing so many virtual machines individually. The same goes for the distributed vSwitch available in VMware. Another nice concept. We worked only on the Distributed vSwitch of VMware and not on the Cisco virtual switch. It would have been nice had I got an idea of how that is configured, but I guess you need to attend a Cisco training for that. Asking for it in a VMware training would be too much.

vMotion was another feature which impressed me. The ease with which you can move a VM from one physical server to another is incredible. Of course, certain prerequisites need to be met, and vCenter is smart enough to tell you which VMs can move to which servers. Similarly, Storage vMotion was also quite easy to use. I also got a better idea about storage for VMware, High Availability, clustering etc.

Some of the features have their own limitations, and going by the new release, it is clear that VMware is fully aware of the limitations and is working on them. vSphere 4.1, released very recently, has some nice features which are primarily aimed at improving Storage I/O efficiency. Two important storage-related aspects were:

vStorage API for Array Integration (VAAI)

Storage I/O control (SIOC)

VAAI provides APIs for storage array makers so that some storage operations, like Full Copy, can be performed at the array level rather than consuming server resources. SIOC is a way to ensure the right job gets the right priority as far as storage is concerned. Here are links to two articles which cover these two aspects:

VAAI article by Mark Farley (aka 3parfarley)

SIOC article at technodrone (You will see this link in Mark's article as well.) In order to understand SIOC, you need to know something about what shares are and how they are allocated in VMware. Even if you are not aware, you will still get a general idea of how the concept works by reading the article.
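The shares idea itself can be sketched as simple proportional allocation. The presets below (Low = 500, Normal = 1000, High = 2000) follow VMware's usual disk-share values; the allocation logic is a toy, since the real SIOC mechanism watches datastore latency and throttles per-host device queues:

```python
# Toy proportional-share allocator. Real SIOC kicks in only under
# contention and works by adjusting per-host device queue depths.

def allocate_iops(total_iops, vm_shares):
    total_shares = sum(vm_shares.values())
    return {vm: total_iops * s // total_shares for vm, s in vm_shares.items()}

# VMware-style disk share presets: Low=500, Normal=1000, High=2000
vms = {"prod-db": 2000, "web-frontend": 1000, "test-vm": 500}

# Under contention, a 10,000-IOPS datastore gets carved up in share ratio
for vm, iops in allocate_iops(10_000, vms).items():
    print(f"{vm}: {iops} IOPS")
```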

The vSphere 4.1 release will certainly help in faster and fairer storage access.

The training I took is mandatory if you are planning to take up VCP certification. If you don't have such plans and want to know more about VMware vSphere, you can check out Scott Lowe's book, "Mastering VMware vSphere 4". (Scott is a well respected blogger in the virtualization world. He was an independent blogger who later joined EMC. His blog at http://blog.scottlowe.org/ still provides a lot of information which is neutral, that is, not plugging for EMC.) This is a well written book and covers a lot of ground. I would suggest this book even to those who have attended the course, since Scott provides a lot of insight into many of the features and gives a good background on each topic. I am sure it will be of immense help to those taking up the VCP exam. (I have recently seen another book in the book stores which deals with vSphere 4, but I haven't been able to read it yet.) Scott's book is published by Wiley India and costs Rs. 599/-. (The actual cost would depend on how much discount you get. I generally get anywhere between 17 and 25% discount at Book Paradise, Jayanagar, Bangalore. The quantum of discount depends on the number of books you buy.)

Before I end, let me also note that I recently gave a talk titled, “Server and Storage Virtualization: Their relevance to the Cloud” at Mindtree Consulting. This was an invited talk as a part of their internal initiative. The talk was well attended, the participants were interactive and the feedback was positive. The arrangements were well done. I enjoyed the session and the interaction.  Thanks to Rama Narayanaswamy, VP at Mindtree Consulting, who made this happen.

The shape of the cloud: Public or Private

Cloud over Kinner Kailash

"That wall has to be bigger if we have to use this pattern," said my painter, "it will not be very effective on your wall." This is something which happens to many of us all the time. With our painters, our carpenters, our architects and various other service providers. You want to do something, but the person providing you the service has a comfort zone in which he / she wants to work. So rarely do you get exactly what you want. It will always be a compromise between your ideas and the ideas of the vendor. This happens not only to individuals like us but also to large enterprises. You cannot execute everything on your own. You need to depend on the vendors. What vendors will say depends on their comfort zone, what products they have, what they want to sell, and how much commission the sales guy is getting for selling a particular product. I guess anyone who has been in the industry long enough knows this. And these are exactly the factors which will influence the way the cloud eventually shapes up.

Last year there was a lot of talk about the cloud, and it is still continuing. Initially the talk seemed to be about one type of cloud, owned by service providers like Amazon, Google etc., whose services all enterprises would avail of. Slowly it emerged that there are two types of clouds. One, the public one, which you see when you step out on the road. And the private one, which you see on your own ceiling. In other words, the Public Cloud is the one which is run by a service provider, and the Private Cloud is the one which is run by your own IT department.

People may ask what Private Cloud means. After all, hasn't the IT department been running the data center all along? How does the Private Cloud change things? I am not sure if the definition of Private Cloud has taken a concrete shape yet, but here is my take. Private Cloud represents a huge change in thinking on the part of the IT department. You no longer need to buy things for each department separately, and you don't need to reserve resources for any one department / division or whatever the unit of classification is. The whole idea is to give resources when necessary, provision just the right amount, and give more when asked for. In essence, the IT department owns all the resources as one huge pool, or cloud, and provisions them as required internally. No more buying or provisioning resources for a particular division, to be locked down for a long time. This would apply to all kinds of resources, the key among them being compute power and storage. The 'give only as much as needed' philosophy and the sharing of resources across the organization will definitely bring in optimal usage and definite cost savings.

Changing from one model to another is not an easy task. So what are the likely challenges for such a movement? Since I haven't handled big data centers, I cannot possibly give you all the scenarios, but these are the most likely challenges an organization would face when trying to build its Private Cloud:

– Change in the thinking pattern of the IT staff. They must forget how they procured and provisioned earlier. They need to think in terms of how to provision using the cloud paradigm

– Training the IT staff in newer technology areas like Virtualization and in newer ways of provisioning, which will be important in order to build a Private Cloud. They will need to design newer chargeback mechanisms as well

– The bigger challenge I see is how to move the current infrastructure into the Private Cloud. Big companies have tons of equipment and they are already provisioned and being used. How will you get all these into a common pool? No organization would want to build a Private Cloud by purely buying new equipment

– Another important aspect which will dictate whether the Private Cloud is accepted within an organization is how the power structure would change if the Private Cloud were implemented. We have seen more than once that many good ideas have been compromised due to this issue.

How will this Public / Private Cloud distinction help the vendors? As I said earlier, what eventually comes up depends a lot on your vendor. As of now, many vendors have started talking Private Clouds, and this is understandable. Look at it this way. You spend a lot of time, energy and money building up relationships with your client. You are now in a position where the client trusts you, and you know the client well enough even to influence their buying decisions. At this juncture, if you were to propose the Public Cloud idea, you would be shooting yourself in the foot. The decision on which equipment to buy would then pass on to the service provider, with whom you may or may not have a great equation. If it is a Private Cloud, you can always show the benefit of the cloud to the customer without losing your influence or your orders!!

The way the Public / Private Cloud is being proposed is: big enterprises need a Private Cloud, smaller enterprises can use the Public Cloud. This again makes sense from the vendors' point of view, especially the big ones, because running after small orders or small players is never something big vendors want to do. In such cases, if the buying decision shifts away from the small players to a service provider who will buy in large quantities, then it is easy for the big guys to target this service provider.

As I said in the beginning of this post, what color I paint my walls depends a lot on my painter's aesthetics as well. In the same way, how the cloud evolves will depend on what the vendors feel would be beneficial to them in the long run. Based on this, we will be seeing a lot more talk on Private Clouds. When the cloud becomes a reality, Public and Private Clouds will coexist.

You can read EMC's Chuck Hollis's take on private clouds here. Here is a different take on the cloud and what it should or should not mean. As usual, Steve Duplessie does not mince words in this article on why the cloud will vapourise.

The Nagging Bug Fix

Nothing makes you feel you have conquered the whole world like solving a problem. When you finally understand what has gone wrong, provide a fix and it works like a dream, you are in seventh heaven (wherever that is). Recently I had a session on troubleshooting and could see that all the participants had had this type of 'wow' moment in their lives. All of them were storage administrators and had many troubleshooting stories to tell. I will relate those at a later date, after obtaining the required permissions to put them on this blog. In the meanwhile, I want to relate a minor debugging exercise that I have been involved in over the last couple of days. I can't say I solved the problem, though the problem seems to have disappeared. This is what I call the nagging bug fix. You know the problem is solved, but you don't know why!!

First, let me explain my setup. There is nothing much as far as the hardware goes. I use a Lenovo Thinkpad with Windows XP loaded on it. I use VMplayer with an Ubuntu Linux virtual appliance running in it for all my Linux needs. I use three different email accounts: one for my personal mail, one for my technical subscriptions and reading technical blogs (thru Google Reader), and one 'official' mail for all official transactions. Now, my 'official' mail id is on my own domain, while the personal and technical subscription mail ids are gmail based. I could have gone to two different service providers, but I was so happy with the experience of gmail and Google Reader that I decided to have both mail ids based on gmail. Since the browsers use cookies and know that you are logged in, it is not possible to see the mails from two ids simultaneously in the same browser. So I use two browsers, one for each mail id. I generally use Firefox for my technical stuff and Explorer for my personal stuff. Nothing logical in this selection, just my quirk.

Sometime back I downloaded a trial version of VMware Workstation in order to work with the Celerra VSA (virtual appliance). I had downloaded TweetDeck as well. I also downloaded the Celerra appliance, but before I could start testing it, some assignments came my way, and the last couple of weeks were spent on them. I started working on my system in earnest this week and noticed that Firefox was hanging once in a while. This happens sometimes, so I killed it, restarted it and kept going. Then I noticed that it had stalled again after some time. Once again the same process was repeated and I continued my work. When it happened again the next day, I got frustrated and decided not to use Firefox but to try Chrome instead. I thought things were working fine, but suddenly Explorer stopped. I was now feeling like a butcher, having to kill these browsers at regular intervals.

One of the first things you learn in Computer Debugging 101 is that many problems get solved if you reboot!! I am sure we would try a reboot even if we were administering a supercomputer!! This now runs in our blood, and I had to do it. So the reboot happened. Again I started Chrome and Explorer. After some time Chrome hung!! The reboot had not solved the problem but was instead confusing me!! Kill-start happened, but the problem kept on appearing, and I could no longer pretend that this was not a problem.

The steps in Debugging go thus:

1. The problem will solve itself. Just look the other way
2. A reboot will solve the problem. Time to switch off
3. Reload the software and things will be fine.

Naturally I had to try the third step, and Internet Explorer was anyway tempting me to load the latest shiny version with Silverlight and all. How can you refuse such an offer? That too when it comes with Silverlight. (I have no clue what Silverlight is, but c'mon, it sounds so sexy.) So there I was, downloading the latest browser with all the plug-ins and what not. Once done, I started the browser, and here is where competition happens nowadays. As soon as you start your browser it says something like, "Hey, are you a moron? I am not your default browser. Make me your default browser. Click OK." Scared, you click OK, when the other browser wakes up and says, "Hey, someone is trying to make you a moron and wants to be your default browser. Don't let that happen." Too scared by now, you are not sure what to do. You click some button and things quieten down. It used to end here in the earlier days. Nowadays you hear a screeching sound and a screaming text saying, "How come I am not your default search engine? Why is someone else your default search engine?" Next someone crops up and says, "I want to be your phishing filter", to which someone else pops up saying, "No way. I am your phishing filter." You realize that your desktop / laptop is now a battleground!! After you have pacified all these guys, the browser starts up with some 10 rows of toolbars. Google, Yahoo, MSN, Ask, Don't Ask, Copernicus, Galileo, Newton.. oops, the last couple are not toolbars, not yet at least. After all this trouble, sometime later, one of the browsers hangs!!

Things are getting serious now. I start checking if my DNS is the problem. Doesn't seem to be. The next step in debugging is to do the reverse of step 3, i.e. uninstall as much software as you can. You don't know what clashes with what. So I start this process and discover that a lot of stuff keeps getting updated without your knowledge. Norton Antivirus goes about downloading the latest patches, Firefox goes about downloading the latest version and fixes, Windows keeps downloading the latest security patches and rebooting your system on its own. These are just a few of them. Of course, each asks you something before it downloads, but these messages happen so often that you generally click OK for everything. "This web site wants to use your bank account and draw some money." Click OK. Done.

So I uninstalled TweetDeck. Problem exists. Uninstall VMware Workstation. Problem exists. Uninstall some mp3 player. Problem exists. Reboot. Problem exists. Use Firefox instead of Chrome. Problem exists. Use Internet Explorer instead of Firefox. Problem exists. Tear your hair. Problem exists.

The major problem with problem solving is that you do not notice details in the beginning. Now that this was getting on my nerves, I started observing closely what the characteristics of the problem could be. I immediately noticed that whichever browser had opened my technical subscription gmail was the one hanging. I opened it in a different browser, and now this browser was the one slowing down. Finally I had got hold of the problem!! Gmail was the culprit!! The Gmail help forum asked me to check if I had some plug-ins which were not compatible. Nothing of that sort was on the system, and there was no further help. So, the next step in debugging in current times? Yes, you guessed it right!! The Internet. So off I went to find out if someone else had this problem. Looks like a lot of people have been having this problem lately. Check out this link. Everyone was confused as to why this was happening. Someone suggested we turn off the https option in gmail. Someone suggested we use an earlier version of gmail. Someone asked people to turn off the phishing filter if they were using Norton. As anyone involved in debugging knows, the worst case scenario is when you do too many things at a time and the problem gets solved; you then have to debug again to find what the problem was. So I decided to do this one at a time. First I switched off the https option and reloaded gmail manually using the http option. Sometimes you get lucky: gmail started running normally and the browser did not hang now!! First try and you have succeeded.

I call these types of bug fixes nagging bug fixes. Firstly, a good bug fix gives you a better understanding of the system. Nothing like that happened here. Second, the solution was quite trivial. Some minor setting change and things work, without you knowing why it was a problem, and you have no means of probing further. Added to it is the frustration that all the effort you have put into debugging deserves a more complicated bug fix!! Third, there is a nagging feeling that this may not be the right solution. The reason is that when I use gmail for my personal mail, the setting has https on and it works like a dream!! So why should the setting affect one mail id and not the other? This has been consistent across all browsers. So what the heck is the problem? Sometimes, as in real life, you just need to accept the solution, move on, and not probe too much. You need to get rid of that nagging feeling by drinking a cup of strong coffee or any other beverage of your preference. After all, if the problem happens again, we have the necessary tools: Reboot, Reload, Reinstall, Uninstall and the World Wide Web!!!

Setting up Sun Unified Storage 7000 Simulator

Nothing stimulates you like a simulator!! I know it sounds corny, but what the heck. The Sun Unified Storage simulator did stimulate my interest, and I found the going good. So here is my story of how to set up the Sun Unified Storage simulator and work with it.

I have been thinking of installing some simulator on my system and working with it. As generally happens, you keep postponing it in small steps and before you know it, the idea has vanished from your mind. Luckily for me, I had registered myself on the Sun site for information, and they sent me a mail asking me to download the simulator. I had some time on my hands, and it was too tempting an offer to resist.

First things first. In order to download this simulator, you need to register yourself at the Sun site. Then you get access to download the Sun Unified Storage simulator. The simulator zip file is around 370MB, and it expands to close to 2.5GB. You had better have enough space on your hard disk for this. You can get the simulator at this site. Scroll down to find it.

What do you need to run this simulator? I installed it on my laptop, which is a Core 2 Duo system with 2GB RAM running Windows XP SP3. So I guess if you have this or something better, it should work. For the simulator to work, you also need VMware Player on your system. If you don't have it, you can download it free of cost from the VMware site. In essence, to make the simulator work on Windows XP, you need to download the VMware Player and the Sun Unified Storage Simulator.

The Sun Unified Storage Simulator is a virtual appliance, which means you don’t need anything else with it. The steps to follow to install the Storage Simulator are simple:

  1. Download the simulator zip file from the Sun site
  2. Unzip this file. In the extracted files, there will be a uni.vmx file
  3. Now start your VMplayer and select the uni.vmx file
  4. The installation starts now. Have patience

The next steps are from the Sun site:

“When the simulator initially boots you will be prompted for some basic network settings (this is exactly the same as if you were using an actual 7110, 7210 or 7410). Many of these should be filled in for you. Here are some tips if you’re unsure how to fill in any of the required fields:

  • Host Name: Any name you want.
  • DNS Domain: “localdomain”
  • Default Router: The same as the IP address, but put 1 as the final octet.
  • DNS Server: The same as the IP address, but put 1 as the final octet.
  • Password: Whatever you want.

After you enter this information, wait until the screen provides you with a URL to use for subsequent configuration and administration. Use the version of the URL with the IP address (for example, https://192.168.56.3:215/) rather than the host name in your web browser to complete appliance configuration”

What the above steps do is to setup the virtual simulator on your system. This also provides an IP address to the Storage simulator. Once that is done, you see a login prompt on your VMware player. This would probably be the same if you are using the actual hardware. At this point in time you have two options:

  • Login with ‘root’ as the user name and the password you have entered during the setup time and start using the CLI  (or)
  • Use the Web and the GUI provided there to manage the simulator

Though I love Unix and the CLIs generally, I decided to go ahead and try the web. You can access the web gui by typing in the link given during the setup phase. It will be something like <some ip address>:215/ (I got 192.168.22.128:215 as my address. It can be different for you.) Once you type this in your browser you will get the login screen.

The Sun Unified Storage system has a lot of features, and you can test them using the simulator. There are features like replication, compression, snapshots, analytics etc. My initial idea was to do the simplest possible thing: create a LUN, create a filesystem and export it, then use that LUN or filesystem. So I have not yet checked the other features.

The Sun Unified Storage allows you to use NFS, CIFS and iSCSI. In the GUI, on the top you have a tab called ‘Shares’. This allows you to create shares of the type you want. Shares can be grouped together as projects, making it easy to administer shares of the same kind. Under ‘Shares’ you have the Filesystem and the LUN tabs. If you want to use NFS or CIFS, you need to create that filesystem using the Filesystem tab. If you want to use iSCSI, you can just create a LUN using the LUN tab.

I first created a filesystem and exported it. It was easy to see it from Windows. I just gave the path and it immediately saw the share. I then wanted to see the same share via Linux. I started another VMplayer with an Ubuntu virtual machine running in it. Initially I had a few hiccups, since the portmapper package is not installed by default on my system. My friend Sagar sent me a link on the packages required on Ubuntu to make NFS work. (The Ubuntu link here.) Once I installed the required packages and configured the system, I could immediately mount the share and copy some files into it.

The next step was to try accessing some LUNs via iSCSI. I don't have an iSCSI HBA, so I had to use the software initiator. I downloaded the software initiator from the Microsoft site, and I also downloaded the documentation related to it. (I downloaded the initiator which ends with -x86fre.exe.) The funny part is that the software and the document are almost the same size!! The download and installation happen fast. No reboot is required. Once installed, you can see the iSCSI software initiator under 'Programs'. The iSCSI initiator works as a GUI, and a CLI is also provided. In case you are just testing, the GUI should do fine.

Once you have downloaded the software initiator, you need to go to the simulator and create a LUN. (Since you will expose this as an iSCSI target, you should not create a filesystem.) You should go into the Protocols tab in the simulator to specify that the iSCSI protocol is to be used and to allow access for all initiators. Once this is done, get back to Windows and open the iSCSI initiator GUI. In this GUI:

  • Provide the IP address of simulator under the ‘Discovery’ tab.
  • The exposed LUNs will be automatically discovered and shown to you in the ‘Targets’ tab
  • Select each of the targets and press the ‘Login’ button. This will ensure you are now connected to the LUN

Once these steps are done, the disks will be visible in ‘Disk Management’ (under ‘Computer Management’) These are raw disks, which you can initialize and partition. I created two LUNs of 0.5GB each. I was able to see them using iSCSI and was able to initialize and partition them.

Thus ended my two-day tryst with the Sun Unified Storage Simulator. I must say I am impressed with this simulator. Very easy to install and very easy to configure and use. I will probably try out the other features soon and will write them up if I do. I am now raring to go and try other storage simulators. I know the Celerra simulator exists, but I am not sure if it is open. NetApp has a simulator, but I think it is for NetApp customers only.

If you have the time, do try out the Sun Simulator. You can get the installation and configuration documents at this site. My thanks are due to Chris M Evans, who provided me with the link to the documents when I asked him. (The document also comes as a part of the simulator. You can press the 'Help' tab in the simulator to get the complete document.) Chris Evans (@chrismevans), who blogs as The Storage Architect, has written a series of posts on Sun Unified Storage. You can check out those articles at The Storage Architect blog.

Hope this was useful, and hope it makes at least a few of you wake up from your slumber and try something :)