Perl Windows - Grab all the system tray icons & get their co-ordinates

posted Jun 26, 2012, 5:34 PM by Aditya Ivaturi

While automating applications on Windows, every now & then I run into a situation where I have to find a tray icon & do some mouse operations on it. Mostly I overlooked it, since it was just a small part of my automation needs & I never really bothered with it. Well, it was time to scratch that small itch as it was bothering me. So with the help of Sinan on StackOverflow, I ended up writing this script. It should work on both 32 & 64-bit platforms.

use strict;
use warnings;

use Data::Dumper;
use Win32::API;
use Win32::OLE qw(in);
use Win32::OLE::Variant;
use Win32::GuiTest qw(
    FindWindowLike
    SendMessage
    AllocateVirtualBuffer
    ReadFromVirtualBuffer
    FreeVirtualBuffer
);

# Used for WMI
use constant wbemFlagReturnImmediately => 0x10;
use constant wbemFlagForwardOnly       => 0x20;

# SendMessage commands
use constant TB_BUTTONCOUNT   => 0x0418;
use constant TB_GETBUTTONTEXT => 0x041B;
use constant TB_GETBUTTONINFO => 0x0441;
use constant TB_GETITEMRECT   => 0x041D;
use constant TB_GETBUTTON     => 0x0417;

sub get_windows_os_details {
    my $ret;

    my $objWMIService =
      Win32::OLE->GetObject("winmgmts:\\\\.\\root\\CIMV2")
      or die "WMI connection failed.\n";
    my $colItems =
      $objWMIService->ExecQuery("SELECT * FROM Win32_OperatingSystem",
                               wbemFlagReturnImmediately | wbemFlagForwardOnly);

    my $objItem;
    foreach $objItem (in $colItems) {
        $ret->{'osname'} = $objItem->{Caption};
    }

    $colItems =
      $objWMIService->ExecQuery("SELECT * FROM Win32_Processor",
                               wbemFlagReturnImmediately | wbemFlagForwardOnly);

    foreach $objItem (in $colItems) {
        $ret->{'osbit'} = $objItem->{AddressWidth};
    }

    return $ret;
}

sub get_process_list {
    my $ret;
    my $class = "winmgmts:{impersonationLevel=impersonate}\\\\.\\Root\\cimv2";
    my $wmi = Win32::OLE->GetObject($class) || die;
    my $plist = $wmi->InstancesOf( "Win32_Process" );
    foreach my $proc (in($plist)) {
        $ret->{$proc->{'ProcessID'}} = $proc->{'Name'};
    }
    return $ret;
}

sub get_tray_handle {
    my ($tray_hwnd) = FindWindowLike(undef, undef, 'TrayNotifyWnd');
    my ($toolbar_hwnd) = FindWindowLike($tray_hwnd, undef, 'ToolbarWindow');
    return $toolbar_hwnd;
}

sub get_tray_icon_count {
    my $hWnd = get_tray_handle();
    return SendMessage($hWnd, TB_BUTTONCOUNT, 0, 0);
}

my $os = get_windows_os_details();

my $tb_button;
if ($os->{'osbit'} == 64) {
    $tb_button->{'pack'} = 'i i C C A6 L L';
    $tb_button->{'size'} = 0x20; # 32 bytes
} else {
    $tb_button->{'pack'} = 'i i C C A2 L L';
    $tb_button->{'size'} = 0x1C; # 28 bytes
}

# Get tray handle 
my $tray_hwnd = get_tray_handle();
my $icon_count = get_tray_icon_count();

# All the processes & their PIDs
my $proc_list = get_process_list();

# Allocate a virtual buffer for the TBBUTTON structure (sized for 32-bit or
# 64-bit) in the tray process
my $buffer = AllocateVirtualBuffer($tray_hwnd, $tb_button->{'size'});
my $icons;

for (my $i = 0; $i < $icon_count; $i++) {
    # Get the button structure represented by index by using TB_GETBUTTON
    # message & then copy it to the local process space.
    my $status = SendMessage($tray_hwnd, TB_GETBUTTON, $i, $buffer->{ptr});
    my $result = ReadFromVirtualBuffer($buffer, $tb_button->{'size'});
    my ($iBitmap, $idCommand, $fsState, $fsStyle, $bReserved, $dwData, $iString) = unpack $tb_button->{'pack'}, $result;
    # Find the owner handle for the button
    my $read_process = Win32::API->new('kernel32', 'ReadProcessMemory', 'NNPIN','I');
    my $extra_data = pack('L2', 0, 0);
    $read_process->Call($buffer->{process}, $dwData, $extra_data, (length $extra_data), 0);
    my ($owner_hwnd, $id) = unpack('L2', $extra_data);
    # Find the PID of the owner process
    my $window_thread_proc_id = Win32::API->new('user32', "GetWindowThreadProcessId", 'LP', 'N');
    my $lpdwPID = pack 'L', 0;
    my $pid = $window_thread_proc_id->Call($owner_hwnd, $lpdwPID);
    my $dwPID = unpack 'L', $lpdwPID;
    # Find the icon coordinates; TB_GETITEMRECT writes the RECT into the
    # remote buffer, so copy it back before unpacking.
    SendMessage($tray_hwnd, TB_GETITEMRECT, $i, $buffer->{ptr});
    my $rect = ReadFromVirtualBuffer($buffer, 16); # RECT is 4 LONGs
    my $map_window_points = Win32::API->new('user32', 'MapWindowPoints', 'NIPI', 'I');
    my $ret = $map_window_points->Call($tray_hwnd, 0, $rect, 2);
    my ($left, $top, $right, $bottom) = unpack('IIII', $rect);
    # Fill our icons hash list
    if (defined $proc_list->{$dwPID}) {
        $icons->{$proc_list->{$dwPID}}->{'hwnd'} = $owner_hwnd;
        $icons->{$proc_list->{$dwPID}}->{'pid'}  = $dwPID;
        $icons->{$proc_list->{$dwPID}}->{'x'}    = $left;
        $icons->{$proc_list->{$dwPID}}->{'y'}    = $top;
    }
}

# We don't need the virtual buffer any more
FreeVirtualBuffer($buffer);

print Dumper($icons);
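
With the coordinates in hand, driving the mouse to an icon is straightforward with Win32::GuiTest. A sketch - the process name "MyApp.exe" is a made-up example, & this of course only runs on Windows:

```perl
use Win32::GuiTest qw(MouseMoveAbsPix SendMouse);

# Hypothetical usage: left-click the tray icon owned by "MyApp.exe".
if (my $icon = $icons->{'MyApp.exe'}) {
    MouseMoveAbsPix($icon->{'x'}, $icon->{'y'});
    SendMouse('{LEFTCLICK}');
}
```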

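As a sanity check on the pack templates used above, here is a small self-contained round trip of a TBBUTTON-like record through pack/unpack. No Windows APIs are involved, so it runs anywhere Perl does; the field values are made up purely for illustration:

```perl
use strict;
use warnings;

# The 32-bit TBBUTTON template from the script above.
my $tmpl = 'i i C C A2 L L';

my $packed = pack $tmpl, 3, 42, 4, 0, '', 12345, 0;
my ($iBitmap, $idCommand, $fsState, $fsStyle, $bReserved, $dwData, $iString)
    = unpack $tmpl, $packed;

print "idCommand=$idCommand, dwData=$dwData\n";  # idCommand=42, dwData=12345
```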
iperf v2.0.4 pre-compiled binary for win32

posted Jul 23, 2010, 5:29 PM by Aditya Ivaturi   [ updated Sep 24, 2010, 2:32 PM ]

The last compiled binary for Windows was v1.7.0. But I like the newer -y option which dumps out the report in CSV format. With the help of this patch, I managed to create a binary for Windows and it is attached to this post. Unfortunately I couldn't find any compiled binary of this version online, so here it is...

Standard disclaimers: Use it at your own risk. I'm uploading this so that it might be of some use to someone. Please don't email me for any support issues; I did not create this software, I just compiled it. It works for me, if it doesn't work for you - tough luck.

Replace QTP with Selenium + AutoIt + Applescript - Part I

posted Jun 13, 2010, 4:20 PM by Aditya Ivaturi   [ updated Jun 13, 2010, 4:23 PM ]

I hate QTP. Why? Because it sucks. Of course, QTP fanboys will immediately jump up & down stating that I don't know how to use QTP. On the contrary, I know exactly what I am talking about. As of this writing, QTP still does not support Windows 7, heck it still doesn't support Firefox 3.6! Are you kidding me? Well, apart from HP's snail pace development process, I have other problems with the tool itself. Like its really retarded scripting engine (which uses vbscript), which does not provide you any real mechanism to maintain frameworks. Another example - CreateObject("WScript.Shell") - what do you think will happen if you used that in QTP? Any programmer who knows vbscript, will say that it creates a wscript object but she'd be so wrong. It rather creates a native windows shell automation object. WScript CreateObject() is simply not supported in QTP as QTP scripting engine overrides WScript. 

Let us get one thing straight - automation is programming; now let that sink in for a second... again - automation is programming. If your automation tool does not provide a really good programming interface, it is not fit for automation. Obviously in my books, QTP falls way short of that goal. One of the statements I consistently hear is - "oh we don't have programmers in our automation team". If you cannot see the fallacy in that statement, no one can help your team - not even QTP. And of course, support from HP is bad too. Case in point - recently our team encountered a bug in QTP 10 where it had memory allocation issues & the workaround offered to us was - "restart QTP after every 4 test case runs". I am not joking.

QTP does a few things really well, namely record & playback (and they make it real simple for non-technical users). That also includes support for various enterprise applications, both web based & win32. That means they have to cover a lot of territory before they can release something, which explains why Windows 7 support is still lacking. But in your case, do you need Sharepoint support on Windows 7? If all you're testing is your own web app, why do you have to wait for HP to finish support for, say, Oracle enterprise apps? At this juncture, the only reason your team is still sticking to QTP is either because you have no real developers in your QA team and/or you have a lot of test cases automated in QTP. The latter is a pain initially to convert to something else, but if you plan it out correctly you will save tons of headache in the future.

I could go on & on about all that is wrong with QTP, but this article is not about that. This article is about getting rid of QTP & using alternatives in its place to achieve a truly cross-platform solution. After joining my current company, one of my first goals was to do exactly that. This article describes what we did & how we did it.

What are we automating?

I work for the SSL-VPN group at Juniper. Not many people (outside of my group) know all the capabilities of this box, even if they use it. The reason is simple - you don't need to use all the features, and you can't fit them all in a simple data sheet or a marketing slide. But it's my group's responsibility to make sure everything is tested and, for that matter, automated. Right from the start our group was a Mercury shop & that means the QTP infestation was really thick. Well, you can't really blame anyone, as our product, like many others in the market, is incremental in nature and the tools that I am championing right now had not even been created. Back in the early 2000s, this was the first SSL VPN product, designed primarily to provide remote access to web-enabled enterprise applications. But now it provides remote access solutions from L3-L7, with support for a really dizzying array of backend auth servers, authentication mechanisms & enterprise applications like Sharepoint & even VMware solutions. This box allows you to access these applications either strictly through the web (if they are web enabled) or through traditional vpn solutions a la a vpn client, supported on Windows, Mac & Linux. Admin configuration is strictly through the web, which can get really hairy depending on how complex your network is, what kind of (or combinations of) auth servers you use, what kind of policies you want to enforce, etc. Users can access the box (or in essence the network) via the web or through the client directly.

We not only have to test the admin & user accessible web pages of our box, we even have to test enterprise applications that are rewritten through our box. E.g., when you access OWA through our box using the rewriter, it rewrites links, flash etc. so that the app is forced to pass through the box. And that means not only are we testing our web application, we also have to test 3rd party web applications that are accessed through the box. So yes, at the end of the day we end up writing test cases to test the OWA application itself (through our box, of course). And just to emphasize: if we get enough complaints that a site like Youtube is not working properly through our box, we will create test cases to test Youtube through our box. And we have to test all these on all versions of IE & the latest versions of Firefox on Windows XP, Vista & 7; Safari & Firefox on Mac; and Firefox on Linux (Ubuntu). You can easily see how complex our testing infrastructure & framework becomes, & so far we have just covered one part of our product - web. We also have 3 different windows clients, 2 Mac clients, one linux client & a java client. And recently, we announced mobile clients for all the major smartphone platforms (you heard Steve Jobs mention it - didn't you? :). You can launch them through the web portal or through the system itself. In the former case, the client automatically gets pushed to the system (not the mobile) & installed if it is not already, & supports two delivery mechanisms - activex & java.

So how do we approach this automation?

Essentially, the whole automation can be broken down into two segments - web & system. And we needed tools that had good APIs and were as cross-platform as possible, most importantly for web. We simply don't have the time or the man power to maintain separate testsuites for each platform. That means a lot of abstraction has to take place to facilitate an environment where we write one test & can run it on all platforms unmodified (or with a very minimal set of if-then-else statements). Juniper uses Perl for all of its testing & since our group ended up at Juniper through acquisition, we had no choice but to move to Perl. But I don't hate Perl; in fact it used to be my go-to language during my sysadmin days. And the community support for Perl is second to none. So we (or at least I) didn't really have any qualms about moving to it from Python. Now that we had decided on Perl, it was time to go look for tools.

In the past couple of years, if you search for web testing tools you simply cannot miss Selenium. If your team does web testing exclusively, this tool should address almost all of your needs, though there are caveats. And the best part: it's open source, which means you can modify it for your team's unique needs. It was definitely invaluable in our case. The two most important things that made Selenium a slam dunk were its support for all the mainstream browsers on all 3 platforms & a server-client model with support for a multitude of languages - specifically Perl. The server-client model is very useful since, generally, your driver & the DUT are not the same machine - and in our case they certainly weren't.

Now that Selenium was going to address the web part (which was actually a rather easy decision), we needed to look at tools for automating system applications - in our case, vpn clients & to some extent 3rd party applications like Outlook, Word etc. Most of our test cases are for the Windows platform, followed by Mac. Linux definitely needs some love, but with a minuscule % of customers using it, we couldn't justify spending time on it. So we needed a tool that would give us a unified API (preferably with a server-client model) & work on Windows (all flavors, both 32-bit & 64-bit) and Mac OS X. We couldn't find any single tool that met all those criteria. So we decided to use native OS capabilities to address our goal. To tie them all together, we created a RESTish automation server that uses native OS tools to provide automation for native apps. That is where AutoIt steps in. AutoIt is more or less a convenient, user friendly (well, actually developer friendly) wrapper around the Win32 API & it works across all flavors of Windows. And the best part is that their API is also available through a COM dll, which makes it very easy for us to consume it through Perl. But there are still some things that you can't do through AutoIt, so you have to use the Win32 API directly through Perl. In the case of OS X, Applescript is practically a no-brainer.

In Part-II of this article, I will discuss how our implementation of REST server provides automation for native system events & apps & goes hand-in-hand with Selenium to provide a very feature rich, faster & more importantly cross-platform alternative to QTP.

Hacking Selenium to improve its performance on IE

posted Feb 3, 2010, 4:14 PM by Aditya Ivaturi   [ updated Jun 26, 2010, 1:41 PM ]

TL;DR of the post below: you have to perform all these steps, & in some instances you can squeeze out better performance from other browsers too:

1) Switch to "javascript-xpath". The default ajaxslt is unbearably slow when it comes to IE. I don't even know why it is still distributed with Selenium.

2) Inject javascript-xpath into the DOM of your app under test - each time it changes. In our case, processing speed jumped at least 10 times by doing this. You might have reservations about modifying your app under test, but this step is for all practical purposes harmless & javascript-xpath remains dormant in browsers where xpath support is already built in (read - any browser other than IE).

3) Offload repetitive processing to javascript through user extensions & use "context nodes" where possible - specifically where you have to read table listings. More details in the description below.

4) This is highly optional, but convert your ids to xpath. Yes, you heard that right - xpath lookup (with the first 2 steps above) is at least 3 times faster than a plain id lookup in IE.

All the above will practically bring your IE performance really close to that of Firefox. Now for the original post with details:

Here is some info on what we did to improve the IE performance with regards to Selenium. Before we get started, you have to familiarize yourself with a few key related technologies:


  1. XPath -
  2. CSS Selectors -
  3. DOM -
  4. JSON -


Of course, you also have to know what Selenium is, how it works & have a basic idea of its architecture to understand the problem & how to solve it. All major browsers in the market (FF, Chrome, Safari) except IE include XPath engines. This makes it extremely simple to query a web element on the page directly through JavaScript. For an intro to the API, please read this document. Selenium (apart from other locators) uses XPath to help you locate elements on a web page. But since IE does not have a native XPath engine, it is actually implemented as a 3rd-party JavaScript library. Google has its own implementation known as ajaxslt, which is Selenium's default for XPath processing when native support is unavailable. And that is where the problem starts.


The Problem

Selenium launches the app-under-test browser window in two modes - single window & multi window. The default is multi window, wherein the first window is the driver window which processes all the commands & the second window is the actual app under test. In single window mode, the app under test is loaded into the lower frame, which means the app under test cannot be frame busting - it should not have any frames or popup windows. Since our app is frame busting, we have to use the default mode - multi window. When you send a command to evaluate an XPath query, the driver window uses the DOM handle of the app under test & runs that query by traversing the DOM. In FF this is amazingly fast as it has native support for XPath, but on IE the driver window uses the external XPath JavaScript library to do the same. The evaluation gets progressively slower on large sets of elements for 2 main reasons: 1) the extremely slow JavaScript engine of IE & 2) constant chatter between the driver window & the app-under-test window, which was addressed recently by the Selenium devs (check out their latest code). The driver window ends up doing a DOM traversal on the app under test after processing the XPath query. And on top of that, there was another problem - ajaxslt was itself very slow. Fortunately, there is another library called javascript-xpath, and in Selenium you can switch to that library to improve the performance of your XPath processing. Inside Selenium it didn't really help much, but outside of Selenium the processing was blazingly fast. A lot of discussion ensued with the developers to convince them that this was in no way a usable solution. You can read all about it here - and subsequently a bug was filed with the relevant info.


CSS Selectors to rescue?

Our automation framework depends on Selenium to interact with the web pages (Juniper SSL VPN admin pages). It has various "get" & "set" methods to read & write values on those web pages. We implemented a convenient method in our framework called "table-list", which reads the list of items in a specific table format and is widely used across all the admin pages. Reads are always costly, as individual pages can be huge; e.g. the Auth Servers page (linked in the bug above) was taking about 25 minutes to finish listing the auth servers with the ajaxslt library. For comparison, on FF it was taking close to 4 seconds. So after digging in the forums for a while, CSS selectors were offered as a faster alternative. But CSS selectors themselves weren't that blazingly fast. When we re-implemented that "table-list" method using CSS selectors, the execution time dropped down to 6 minutes - but still, it was really slow. So we started looking for other alternatives, and in one of those experiments we realized that all this chatter between the two browser windows was slowing things down & we could probably improve performance a bit if we had the javascript-xpath library on the app-under-test window. We did exactly that - once the app-under-test window is loaded, we modify the DOM & inject the javascript-xpath library. This actually sped up overall performance on the IVE admin pages from login onwards, and our "table-list" dropped down to just over 4 minutes. This still was not acceptable.


"Context nodes" & injecting javascript-xpath

If you have ever worked with the libxml2 XML processing library, you'll know about a little nugget known as "context" nodes. In a structured document like XML & XHTML, you can traverse the document as a tree, & if you want to do repetitive processing on a set of nodes, you choose the parent of those nodes & make it a temporary root node; this node is known as a "context" node. That means all your further processing works based on this particular context & you don't have to traverse the whole document for each query. Well, there should be something similar in XPath too, right? Yes, of course there is. If you look at the test document of the Auth Servers page, attached to the bug filed with Selenium, there are over 800 tables in that page (don't ask me why - I still don't know). So whenever we ran something like //table[@class='tblList'], it'd go through the whole document (with all those 800 tables) & do a search for that table and then drill down to the eventual table cell where our content resides. With context nodes (outside of Selenium), the whole "table-list" operation dropped down to 900 milliseconds, since you're not searching through 800 tables anymore. There was a hurdle though: Selenium does not support returning objects back to the client driver - only data. That means we could never get the context node back from Selenium. So we decided to completely offload "table-list" processing to JavaScript using the user-extensions feature of Selenium. Using user-extensions, you can extend Selenium & create your own commands. And thus the getList command was born, which does exactly the same as "table-list", only implemented in JavaScript & not Perl. We used JSON to build our data structure, serialize it & send it over the wire to the Perl client driver. The execution time dropped down to 2 seconds for that command on IE - mission accomplished! On a side note - in FF this same method finishes execution in milliseconds.
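
On the Perl side, a custom command added via user-extensions is reachable through the driver's generic accessors. A hedged sketch of how getList might be consumed - the locator & the JSON shape (an array of row arrays) are illustrative assumptions, not our exact implementation:

```perl
use JSON qw(decode_json);

# Call the custom getList command (added via user-extensions) & decode
# the JSON structure it serializes over the wire.
my $json = $sel->get_string("getList", q{//table[@class='tblList']});
my $rows = decode_json($json);
foreach my $row (@$rows) {
    print join("\t", @$row), "\n";
}
```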


Now that we had solved the issue of listing the content, we feared that general "reads" on IE for input elements with no ID would be slow too - and indeed they were. Our IVE web page templates didn't have IDs for elements - only names & values. So when you try to read input elements by name/value attribute, the DOM traversal is horrendously slow in IE. The solution was to inject javascript-xpath at runtime on every single page, convert the element names to XPath on the fly & process them. This brought the processing time of each lookup down from over 15 seconds to milliseconds on IE. All these changes dramatically improved the performance & brought it very close to the performance of Selenium on FF. At this point, if you've been paying attention, you might ask whether it is wise to modify the app under test. Well, it is a judgement call, & in our case we're modifying the app under test only to inject javascript-xpath to improve the performance of XPath locators on IE. javascript-xpath does not interfere with or modify anything else in your document's DOM and is in fact dormant in browsers which have native support for XPath. As I see it, it was really not an issue.
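
The runtime injection itself can be driven from the Perl client with get_eval, which executes JavaScript with access to the app-under-test window. A sketch - the URL serving javascript-xpath.js is an assumption; host it wherever your lab serves static files:

```perl
# Append a <script> tag for javascript-xpath to the app under test's
# DOM after each page load. The library URL below is hypothetical.
my $js = <<'JS';
var doc = selenium.browserbot.getDocument();
var s = doc.createElement('script');
s.type = 'text/javascript';
s.src = 'http://lab.example.com/javascript-xpath.js';
doc.getElementsByTagName('head')[0].appendChild(s);
JS
$sel->get_eval($js);
```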

Add getCSSCount command to Selenium

posted Dec 30, 2009, 3:35 PM by Aditya Ivaturi

getXpathCount is a very useful command that the Selenium devs provided. And once you start enjoying coding with it, you realize IE is throwing roadblocks, since XPath is dog slow on it (what else is new?). So you look around and you're told to switch to CSS Selectors (or locators). Well, the very first thing you look for is the count equivalent for CSS, and it is conveniently missing. So I ended up hacking the Selenium core to add the missing command - "getCSSCount".
BTW, I hacked the core that comes with selenium-server.jar, and I believe you can achieve the same result using user-extensions.js. Incidentally, I understood the core better just by looking at the unjarred files than by reading the user-extensions doc - go figure. So just to be clear - my way is not the only way to do it, and it works for me. I wanted to post this on the Selenium forums, but since they moved to Google groups I couldn't post the files, and hence this hack resides here.
Either hack the following scripts yourself or download the hacked scripts and replace them in their respective locations. Here are the steps:
1) Unjar the selenium-server.jar.
2) Inside the core/scripts directory you'll have to hack 2 files - selenium-api.js & selenium-browserbot.js
3) In Selenium-api.js add this function just below getXpathCount function (or download the attached file & replace it).
Selenium.prototype.getCSSCount = function(locator) {
    * Returns the number of nodes that match the specified xpath, eg. "table" would give
    * the number of tables.
    * @param xpath the xpath expression to evaluate. do NOT wrap this expression in a 'count()' function; we will do that for you.
    * @return number the number of nodes that match the specified xpath
    var result = this.browserbot.evaluateCSSCount(locator, this.browserbot.getDocument());
    return result;
4) In selenium-browserbot.js add this function just below the evaluateXpathCount function (or download the attached file & replace it).
/**
 * Returns the number of CSS results.
 */
BrowserBot.prototype.evaluateCSSCount = function(css, inDocument) {
    var results = eval_css(css, inDocument);
    return results.length;
};
5) Now in iedoc-core.xml & iedoc.xml (in the core directory) add these lines after the getXpathCount block (or download the attached files & replace them).
<function name="getCSSCount">
<return type="number">the number of nodes that match the specified CSS Locator</return>
<param name="locator">the CSS Locator expression to evaluate. do NOT wrap this expression in a 'count()' function; we will do that for you.</param>
<comment>Returns the number of nodes that match the specified CSS Locator, eg. "table" would give
the number of tables.</comment>
</function>
6) Now jar everything back up using this command in the root directory - "jar cmf META-INF\MANIFEST.MF selenium-server.jar ." (note the dot at the end).
7) The next step is obviously to expose this functionality through your driver - I use Perl & I'll provide the example here. Take a look at it & modify your respective drivers; it is very easy. So get to your driver file & add this function (or download the attached file & just replace your local copy):
=item $sel-E<gt>get_css_count($selector)

Returns the number of nodes that match the specified selector, eg. "table" would give the number of tables.

$selector is the selector expression to evaluate. do NOT wrap this expression in a 'count()' function; we will do that for you.

Returns the number of nodes that match the specified selector.

=cut

sub get_css_count {
    my $self = shift;
    return $self->get_number("getCSSCount", @_);
}
That is all that is required. Now you have getCSSCount available for your testing needs. For the Perl driver, here is how you'd use it:
my $val = $sel->get_css_count("table[class='tblList']>tbody>tr");
print "val: $val\n";
From the above example, it'll give me the number of rows in a table of class tblList. Note that you don't have to use the "css=" prefix.

Create your own VMware Connection Broker

posted Dec 15, 2009, 11:45 PM by Aditya Ivaturi

First of all, "Connection Broker" is a highly glorified marketing term meaning that this piece of software will allow you to connect to your Virtual Machine based on some "rules" (access, authorization, availability etc. - and the more "rules" you add, the more complex it becomes), hence the "broker" part. In this post I'll be concentrating on VMware since I am more familiar with it, but you can pretty much implement this connection broker for the Sun & Citrix solutions as well. It looks complicated, but it isn't - mostly because of the excellent SDK that they provide.

I first heard of a connection broker when our sales guy was raving about a company called Leostream. At the time I thought it was really cool, but by the time I got hold of their evaluation VM (yes, they give out - or at least used to - their connection broker as a VM), I had already developed a prototype using the VI Perl SDK internally to work with our VPN box. And guess what - they had just bundled the Perl SDK & provided a CGI front end to deliver a polished product. Now, if that was all they provided, they wouldn't still be in business. They actually provide "policy" & "access" mechanisms for those backend VMs, which add value to the overall product. And their product was the first (I think) to tunnel PCOIP. But regardless, underneath it all the core engine (at least for VMware) is production ready & they used it as-is. That is why I keep saying - roll your own connection broker if you don't care about all the bells & whistles. E.g., why does a university lab require a commercial connection broker? Get some students to intern for a summer & give them this project. It is not only fun, but there are a lot of internet technologies that they'll come across & learn by doing.

One of the most amazing things that VMware did with ESX (and for that matter most of their product line) was to provide an amazing set of APIs to do pretty much any task (down to bare metal operations) through SOAP. In fact, they themselves use this API internally to manage all the ESX servers (through Virtual Center). I knew this SDK (and still call it) the VI SDK, but for some reason they keep changing its name. I think the latest moniker is vSphere, but don't get too hung up on it; soon it may become "vUniverse". Lucky for me, the functionality & direction of the API doesn't change - it just evolves. Back to the actual topic at hand. You should be familiar with the various infrastructure products that VMware sells - ESX, Virtual Center etc. - and what service they provide; if not, some of the discussion here might not make any sense. At the heart of these products is ESX, which hosts the actual VMs themselves. And what the VI SDK does is expose all the operations that you can do on a VM (and on ESX itself), like deploying a new VM, undeploying, powering on & off etc.

So let us say you have set up a farm of ESX servers & you have a VC managing all of them. You can use the VI SDK to talk to the VC server & ask it to manage various operations for you. E.g., you can tell the VC to deploy a new VM, and based on its rules & configuration, VC will deploy the VM on an ESX server & give you the details relevant to that deployed VM. In a way, it does the load balancing (load as in number of VMs on a particular ESX) for you. You can of course specify exactly where to deploy the VM, but why complicate your life? VC is pretty efficient at what it does. Anyhoo, this farm of virtual infrastructure is pretty much useless unless you can automate these operations, which you can using the VI SDK - or purchase a connection broker. Unless you are deploying this for day-to-day production use, i.e. your workforce is going all virtual, create your own connection broker, save the money &, on the way, earn some job security ;).

To get you started, let me give you some snippets of code. Oh, BTW, they release a Perl SDK too, which is really cool and really well designed. And since it is written in Perl, hacking it is just that much more fun. So the code snippets I'm about to show you are in Perl. If you want Java or powershell *shudders* scripts, there are tons of examples on their developer site.

Regardless, you should read the VI Perl documentation. In fact, most of these snippets will make sense & seem familiar if you actually read it first. The VI Perl SDK uses certain environment variables - you don't have to use them, but for these examples it is easiest to explain using them.

Once you have VIPerl SDK installed, just use it...

use VMware::VIRuntime;

Now set up your environment variables.

$ENV{'VI_SERVER'} = ''; # This is your VC server or it can be your ESX server too. 
$ENV{'VI_USERNAME'} = "user";
$ENV{'VI_PASSWORD'} = :password"";
$ENV{'VI_PROTOCOL'} = 'https';
$ENV{'VI_PORTNUMBER'} = '443';
$ENV{'VI_DATACENTER'} = 'MY LAB'; # Look this up in your admin panel of either the VC or ESX server. 

my $datacenter = $ENV{'VI_DATACENTER'};

These three statements do all the magic of reading and validating the options and connecting to the VC server:

Opts::parse();
Opts::validate();
Util::connect();

Now let us do some simple stuff like finding your datacenter.

my $datacenter_view = Vim::find_entity_view(view_type => 'Datacenter',
                                            filter => { name => $datacenter });

And maybe get all the hosts under this datacenter:

my $host_views = Vim::find_entity_views(view_type => 'HostSystem',
                                        begin_entity => $datacenter_view);
print "<p>Hosts found:</p>";
foreach (@$host_views) {
   print $_->name, "\n";
}

Now, find the list of VMs visible to VI_USERNAME.

print "<p>VM's found for $ENV{'VI_USERNAME'}:</p>";
# get all VM's under this datacenter
my $vm_views = Vim::find_entity_views(view_type => 'VirtualMachine',
                                      begin_entity => $datacenter_view);
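As a side note, find_entity_views also accepts a filter argument, so you can narrow the query on the server side instead of looping over every VM yourself. Here is a hedged sketch; the property path follows the VI Perl SDK documentation, but treat the power-state example as an assumption about what you'd want to filter on:

```perl
# Sketch: fetch only the powered-on VMs instead of the whole list.
# 'runtime.powerState' is a standard VirtualMachine property path;
# filter values can also be qr// regexes, e.g. name => qr/^qa-/i.
my $running_vms = Vim::find_entity_views(
    view_type    => 'VirtualMachine',
    begin_entity => $datacenter_view,
    filter       => { 'runtime.powerState' => 'poweredOn' },
);
print scalar(@$running_vms), " powered-on VMs\n";
```

This pushes the filtering to the server, which matters once your datacenter has more than a handful of VMs.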

Now at this point, you have the list of VMs for a particular user and you might want to connect to one. There are two options for that (that I know of): 1) connect using the VMware View Client (there is a Linux version too!), or 2) connect directly through the browser using the VMware MKS client. The MKS client is installed automatically the first time you try to access the console of a VM in the browser (though support across browsers is spotty). I'll show you how to connect via MKS.

So from our previous command, we found all the VMs for the user, let us create a "View Console" link for it. 

my $counter = 0;
foreach (@$vm_views) {
   my $mks = VirtualMachineOperations::AcquireMksTicket($_);
   $counter++;
   print "$counter: <a href=viewConsole.cgi?vwmks=1&cfgFile=".$mks->cfgFile.
         "&port=".$mks->port."&ticket=".$mks->ticket.">View Console</a>\n";
}

OK, let's start with AcquireMksTicket(). This method gives you a ticket (along with other relevant details) to access the console of a particular VM, and it is time-sensitive - after a while of inactivity, it expires. The viewConsole.cgi page, which you have to create btw (and is attached to this page), takes the relevant MKS parameters & lets you connect to the VM console through its browser plugin, which happens to be:

<object id="mks" classid="CLSID:338095E4-1806-4ba3-AB51-38A3179200E9"  codebase="https://$host/ui/plugin/msie/,1,0,0" width="100%" height="100%"></object>

Note that the classid might change depending on the version of the MKS plugin, and you should change it accordingly. Try this example first though. As you can see, this plugin is shipped with every ESX server. Oh, one more thing: if you paid close attention to the codebase attribute of the object tag, you'll notice that we're connecting directly to the ESX server on which this VM is hosted, not going through the VC.
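For reference, here is a minimal sketch of what such a viewConsole.cgi might look like. This is not the attached script: the parameter names, the host, and the commented-out plugin call are all assumptions for illustration (the plugin's scripting interface varies by version, so check the MKS plugin docs for the real method):

```perl
#!/usr/bin/perl
# Minimal CGI sketch: read the MKS parameters from the query string
# and emit a page that embeds the MKS plugin. Illustrative only.
use strict;
use warnings;

# Parse QUERY_STRING by hand to stay dependency-free.
my %param;
for my $pair (split /&/, $ENV{QUERY_STRING} // '') {
    my ($k, $v) = split /=/, $pair, 2;
    $v =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge if defined $v;
    $param{$k} = $v;
}

my $cfg    = $param{cfgFile} // '';
my $port   = $param{port}    // '';
my $ticket = $param{ticket}  // '';
my $host   = 'esx.example.com';   # assumption: the ESX host serving the plugin

print "Content-type: text/html\n\n";
print <<"HTML";
<html><body>
<object id="mks" classid="CLSID:338095E4-1806-4ba3-AB51-38A3179200E9"
        codebase="https://$host/ui/plugin/msie/,1,0,0"
        width="100%" height="100%"></object>
<script>
  // Hand the ticket parameters to the plugin here; the exact method
  // name and signature depend on your MKS plugin version.
  // e.g. something like: mks.connect("$port", "$cfg", "$ticket");
</script>
</body></html>
HTML
```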

Well, there you go - you have connected to your VC, accessed all the VMs for the logged-in user & allowed your user to connect to a VM. Now you might ask: how do I deploy these VMs for the user? Can I create groups and allow authorized access to them? Well, if you have read this far & understood what is involved, then those things shouldn't be too hard - for one, all the necessary methods are provided through the SDK. Just RTFM.

VMware Lab Manager & Perl

posted Dec 8, 2009, 9:46 PM by Aditya Ivaturi

VMware's Lab Manager is an amazing tool for testing organizations (it was originally designed by a company called Akimbi). But this post is not about what it is & how it can be useful - go to the linked site for further details on the product. This post gives you an example of how you can consume its SOAP API from Perl. We use Lab Manager extensively for all our feature testing & automation. For automation, you can deploy, undeploy and change some of the VM properties using the SOAP API. Lab Manager also has a more "internal" API, which is not officially supported but can be used for more control over what kind of automation you wanna do.

Anyhoo, for my group I ended up creating a simple Perl module to deploy & undeploy (and delete) library configurations. For SOAP, I used SOAP::Lite, which is a very widely used & well-supported module. The attached module is a working example. Just remember that each request to the LM server requires that you send auth headers (read the LM API documentation for further info); get_auth_header accomplishes that in my module.
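To give a flavor of what get_auth_header does, here is a hedged sketch of building the Lab Manager authentication header with SOAP::Lite. The endpoint path, namespace and method names follow the LM API documentation, but treat the host, credentials and configuration ID as placeholder assumptions for your own setup:

```perl
use strict;
use warnings;
use SOAP::Lite;

# Placeholders: substitute your LM server and credentials.
my ($lm_host, $user, $pass) = ('labmanager.example.com', 'user', 'secret');

# Every LM SOAP request must carry an AuthenticationHeader.
sub get_auth_header {
    return SOAP::Header->name('AuthenticationHeader')
        ->attr({ xmlns => 'http://vmware.com/labmanager' })
        ->value(\SOAP::Data->value(
            SOAP::Data->name(username => $user),
            SOAP::Data->name(password => $pass),
        ));
}

# LM is a .NET service, so the SOAPAction header must be namespace/Method.
my $soap = SOAP::Lite
    ->uri('http://vmware.com/labmanager')
    ->on_action(sub { "http://vmware.com/labmanager/$_[1]" })
    ->proxy("https://$lm_host/LabManager/SOAP/LabManager.asmx");

# Example call: deploy a library configuration by ID (see the LM docs
# for the fence-mode constants; the ID here is a made-up example).
# my $result = $soap->ConfigurationDeploy(
#     get_auth_header(),
#     SOAP::Data->name(configurationId => 42),
#     SOAP::Data->name(isCached        => 'false'),
#     SOAP::Data->name(fenceMode       => 4),
# );
```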

The module has its own POD to help you out. If you optimize it & add more methods to it, please let me know.

Doxygen & Perl POD

posted Dec 5, 2009, 4:27 PM by Aditya Ivaturi   [ updated Dec 6, 2009, 2:37 AM ]

If you have ever dealt with Perl modules on CPAN, you will immediately notice the widespread use of POD to document everything. Most POD processors then do a simple conversion to HTML (or many other formats) for proper presentation. pod2html is a very good tool for what it does, but when you are maintaining frameworks written entirely in Perl, simple POD won't be the only documentation you'll need - especially if you want a co-worker to just read it & pick up where you left off. You need to present a little more "structure" in your documentation than a description & synopsis for each module. When someone refers to framework documentation, they are typically not looking for just your method descriptions but for a lot more, e.g. an object model, an inheritance diagram or the whole framework's package structure. Sure, you can tell them to open up the whole package tree & look at it, but that is just rude - especially with Perl.

This is where documentation generation tools like Doxygen come in very handy. If you don't know about this tool, I recommend you go to that site & read up a little about it. Although, if you are a C/C++ shop, it'd be really hard not to have come across it, or at least considered it. Of course there are other documentation generation tools, but I am a little familiar with Doxygen & it was a natural choice. Unfortunately it doesn't support Perl or POD, but fortunately you can feed it a "filtered" version of your source to get the magical output you want. There are two projects that I know of that filter Perl source code to be processed by Doxygen - DoxygenFilter and DoxyFilt. The latter is an older project & somehow I couldn't figure out how to download the package from its site. Regardless, they both process Perl scripts, & at least from their documentation it seemed like they can process POD too. I couldn't try DoxyFilt, but I did try DoxygenFilter & it definitely doesn't process POD (I looked at its modules to confirm). Still, the filtered Perl sources were processed by Doxygen & it spat out a nice site full of all kinds of wonderful documentation - without any POD info in it.

So I did what any lazy hacker would do - hack DoxygenFilter to address just my issues. In our group we actually use very basic POD syntax & we enforce simple rules on documentation, like =cut-ing your =head1. These rules meant that I could hack DoxygenFilter really fast to generate the necessary input for Doxygen. My changes are attached to this post; remember that this hack is strictly for Perl POD with some arbitrary rules enforced (which work in my group). And it is not optimized - for example, I read each Perl script 3 times - but then again my focus was just to get this thing to spit out what I want. I wasn't shooting for the programmers' hall of fame.
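To illustrate the general filtering idea (this is a toy sketch, not DoxygenFilter's actual code), here is what a minimal POD-to-Doxygen translator looks like: it turns a simple =head1 ... =cut block into a Doxygen comment block, so that Doxygen, with FILTER_PATTERNS pointing at a filter like this, can pick the text up:

```perl
use strict;
use warnings;

# Toy sketch: translate a "=head1 TITLE ... =cut" POD block into
# Doxygen-style "##" comments, passing non-POD lines through untouched.
sub pod_to_doxygen {
    my ($src) = @_;
    my @out;
    my $in_pod = 0;
    for my $line (split /\n/, $src) {
        if ($line =~ /^=head1\s+(.*)/) {
            $in_pod = 1;
            push @out, "## \@brief $1";
        } elsif ($line =~ /^=cut/) {
            $in_pod = 0;          # end of the POD block; drop the marker
        } elsif ($in_pod) {
            push @out, "## $line";
        } else {
            push @out, $line;     # source code passes through unchanged
        }
    }
    return join("\n", @out) . "\n";
}
```

A real filter has to handle =head2, =over/=item lists, inline formatting codes and so on, which is exactly the part I hacked into DoxygenFilter.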

Anyhoo, the two modules I modified are part of the DoxygenFilter package, and they are attached to this post. Although I am pretty sure I made it crystal clear that this hack is for my use & works for my specific situation, I am not guaranteeing that it'll work for you too. Try it at your own risk.
