by Marko Riedel, with an idea by Alexander Malmberg
The goal is to implement a screen grab utility for X11 using GNUstep. This application, or tool rather, should collect data for all windows that are currently visible, whether partially or completely, into a tree that represents the parent-child relationship between them. The most important item per window is the image that it displays. Trees map naturally onto browsers, so we will let the user navigate the window tree with an NSBrowser. The application has one window, whose upper portion displays the browser. The lower portion contains a scrollview, whose document view is a special imageview that shows the image in the window at the time the snapshot was taken (just after launch). We read image data only for the onscreen rectangle of the window. The user can save images to TIFF files. There are two phases to this application. During the first phase it queries the X server for the contents of the onscreen windows and builds a window tree; during the second phase the user views and perhaps saves the contents of the retrieved windows.
We’ll be working with three classes. The class XWinData encapsulates the attributes for an X window and an NSImage, which holds the current image obtained for the window. The class ImageView is a simple replacement for NSImageView that provides fast display of an image represented by a bitmap; it will be the document view of the scrollview. The third class is the class Controller, which collects the data from the X server, builds the tree of XWinData objects starting with the root window and acts as a passive delegate for the browser. It also implements a method to save images.
The program starts with the necessary headers, notably the one for Xlib.
The class XWinData encapsulates data that describe an X window as well as the image contained in the window. An X window is on a certain display and has a parent. We record these for easy reference. We also record the attributes of the window. We’ll be using the location and the size attributes, as well as the map state of the window (whether it is viewable or not). The instance variable children makes XWinData into a tree structure, with the individual objects being the nodes. We need the tree data to make the set of windows easy to navigate and provide data for the browser.
One of the first things that we’ll do is compute the portion of the window that is actually on the screen, in screen coordinates. We only process windows of which at least some part is on the screen. An XWinData object also holds a description of itself, which is the name of the window if it has one, and the geometry otherwise. Finally we have the image, which is an NSImage object that holds the image that we retrieved for the window.
The initializer is straightforward. Note that it doesn’t require the array of children or the image, which are computed later in the program, and only if necessary.
We need access to the attributes of the window. The methods for this return instance variables. The method onScreen indicates whether any portion at all of the window is on the screen.
The next two methods let us set and retrieve an array of children for the receiver.
The entries of the browser’s columns should be in alphabetical order so we need a comparison method between two XWinData instances that we’ll use to sort them.
If it turns out that some part of a window is on the screen, possibly obscured, then we compute an image from the contents of the window. There is an accessor for this instance variable.
The last method sends appropriate release messages to the array of children and the description (a string) when the object is being deallocated.
We now discuss these methods in detail. The initializer stores its arguments in the appropriate instance variables. It then tries to obtain the window’s attributes from the server and raises an exception if it fails.
The next step is to translate the coordinates of the window’s origin into screen coordinates (the values in the attribute structure are in the parent’s coordinate system).
We may compute the portion of the window that is on the screen once the coordinates have been translated. We intersect the screen rectangle with the window’s rectangle for this purpose. We’ll be using the result when we retrieve the image and decide whether the window should be included in the tree.
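The intersection step can be sketched in plain C. This is only an illustration: the `Rect` type and the function names `clip_to_screen` and `rect_is_empty` are hypothetical stand-ins, not the types or methods used in the recipe itself.

```c
#include <assert.h>

/* Hypothetical integer rectangle in screen coordinates (not the recipe's own type). */
typedef struct { int x, y, width, height; } Rect;

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Intersect a window rectangle with the screen rectangle.
   A result with zero width and height means nothing is on screen. */
Rect clip_to_screen(Rect win, Rect screen)
{
    assert(win.width >= 0 && win.height >= 0);
    int left   = imax(win.x, screen.x);
    int top    = imax(win.y, screen.y);
    int right  = imin(win.x + win.width,  screen.x + screen.width);
    int bottom = imin(win.y + win.height, screen.y + screen.height);
    Rect r = { left, top, imax(0, right - left), imax(0, bottom - top) };
    if (r.width == 0 || r.height == 0) {
        r.width = 0;      /* normalize every empty result */
        r.height = 0;
    }
    return r;
}

/* A window is on the screen exactly when its clipped rectangle is not empty. */
int rect_is_empty(Rect r) { return r.width == 0 || r.height == 0; }
```

A window at (-100, 50) of size 300 by 200 on a 1024 by 768 screen would be clipped to a 200 by 200 rectangle at (0, 50), and only that portion would be read later.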
Next the initializer builds the description and then it exits. If the name property of the window is set, then we use it as the description. Otherwise we record that there was no name, namely by giving the root window the name ROOT WINDOW and other unnamed windows the name {no title}. The description of unnamed windows includes the geometry of the window.
The newly initialized object is put into an autorelease pool.
The methods description, map_state and xclass return the appropriate instance variables.
A window is on the screen if its onscreen-rectangle is not empty.
We need to set and retrieve the array of children. Setting the array will occur at most once, so we don’t have to worry about releasing a prior content of the instance variable.
XWinData objects are ordered according to the description string for sorting purposes, i.e. to get the right order in the browser’s columns.
The next method is perhaps the most important of the entire application. It retrieves the window’s image from the window server and stores it in an NSImage. We’ll be using a bitmap as the image’s representation, so we can write data directly into the bitmap’s data plane. (We’ll use a non-planar bitmap with one plane.)
The first step is to translate the onscreen rectangle from root window (screen) coordinates into window coordinates, so that we can reference the portion of the image that we wish to read. The clipped bounds rectangle supplies the dimensions of the displayed rectangle.
We are now ready to read the image data and invoke XGetImage for this purpose. We raise an exception if we couldn’t read the image.
We must convert the data in the XImage into a bitmap image representation, which we now declare. It has the correct dimensions and uses eight bits per sample (red, green, blue). There is no alpha channel. We retrieve the bitmap’s data plane for easy reference. There is only one plane because the bitmap is not planar.
We briefly digress to explain how we obtain the colors of the pixels of the image. There is a function XGetPixel, which we can use to read the pixels of the image, which are unsigned long integers. If we are not on a TrueColor visual, then we obtain the color components for this pixel using the window’s color map with a call to XQueryColor, otherwise we extract the bits for each color from the pixel’s value. Once we have these, it is easy to write them into the bitmap data plane.
We do TrueColor visuals first. The visual contains a bit mask for each color component. We must determine where in the pixel value the component starts, and how many bits it takes up. Therefore we declare a “shift” variable (offset, i.e. position) and a “bits” variable (number of bits) for each color component.
We apply the following procedure to each mask. First shift the mask to the right as long as the lowest bit is zero. The number of shifts indicates the position of the mask in the pixel. Next shift the mask to the right while the lowest bit is one. The number of these shifts yields the number of bits of the color component. We use at most eight bits. If there are more than eight bits, then we increment the position counter to skip over the least significant bits so that only the eight most significant bits remain.
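The mask procedure can be sketched as a small C function. The names `Component` and `analyze_mask` are assumptions made for this illustration; in the actual program the mask would come from the visual's red, green and blue masks.

```c
#include <assert.h>

/* Position (shift) and width (bits) of one color component in a pixel value. */
typedef struct { int shift; int bits; } Component;

Component analyze_mask(unsigned long mask)
{
    Component c = { 0, 0 };
    assert(mask != 0);               /* an empty mask has no position */

    /* Count the zero bits below the mask: this is its position. */
    while ((mask & 1UL) == 0) { mask >>= 1; c.shift++; }

    /* Count the consecutive one bits: this is the component's width. */
    while ((mask & 1UL) == 1) { mask >>= 1; c.bits++; }

    /* Keep only the eight most significant bits of a wider component. */
    if (c.bits > 8) { c.shift += c.bits - 8; c.bits = 8; }
    return c;
}
```

For the common 24-bit red mask `0xFF0000` this gives a shift of 16 and 8 bits; for the 16-bit red mask `0xF800` it gives a shift of 11 and 5 bits.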
We iterate over the entire image and extract the pixel values for each position (x,y).
We write the color components directly into the data plane. There are three steps for each component. Shift the pixel value to the right so that the component’s bit sequence starts at bit zero. Compute a mask that is as long as the number of bits of the component and consists entirely of ones, then extract the value of the component by computing the bitwise AND of the shifted value and the mask. Finally, if the component has fewer than eight bits, pad it with zeros at the right so that it is eight bits long. We output debugging information once the entire image has been processed.
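The three steps for one component might look as follows in C. The function name `extract_component` is hypothetical, and the `shift` and `bits` arguments are assumed to have been derived from the visual's mask beforehand, with `bits` between one and eight.

```c
#include <assert.h>

/* Extract one color component from a TrueColor pixel as an 8-bit sample. */
unsigned char extract_component(unsigned long pixel, int shift, int bits)
{
    assert(shift >= 0 && bits >= 1 && bits <= 8);

    /* A mask of 'bits' ones, e.g. 0x1F for five bits. */
    unsigned long ones = (1UL << bits) - 1UL;

    /* Move the component down to bit zero and isolate it. */
    unsigned long value = (pixel >> shift) & ones;

    /* Pad with zeros on the right so the sample is eight bits long. */
    return (unsigned char)(value << (8 - bits));
}
```

Note that padding with zeros maps a full five-bit component (`0x1F`) to `0xF8` rather than `0xFF`, so the brightest value is slightly darker than pure white. That matches the procedure described above; replicating the high bits into the padding would be an alternative.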
We now treat the case when we must use XQueryColor to obtain the color. There is only one problem: it is very slow to invoke XQueryColor for every pixel. Hence we cache colors locally. The cache consists of an array of pixels and an array of colors. The entries at the same index of these two arrays yield the pixel and the corresponding color. The index for a pixel is the pixel value modulo the cache size. (This is a kind of hash.) We also have an array that indicates whether a particular entry has already been written to. The latter array is initialized so that all entries are marked as empty. You may want to experiment with different sizes of the cache and observe how the performance of the application changes.
We may now iterate over the pixels and write into the bitmap’s data plane. Every pixel of the image at some position (x,y) is retrieved.
Next we compute where the color would be in the cache if indeed it has been recorded. Only if we know that the corresponding entry is not empty and its pixel value matches the current pixel may we use the color from the cache. We use XQueryColor in all other cases and retrieve the color from the server. The new color is recorded in the cache.
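The cache logic can be sketched independently of Xlib. In the following C fragment, `Color` stands in for `XColor`, the function pointer `QueryFn` stands in for the round trip made by `XQueryColor`, and `CACHE_SIZE`, `ColorCache`, `cache_lookup` and `fake_query` are all names invented for this illustration. The `misses` counter is an addition that makes it easy to observe the effect of different cache sizes.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 256   /* hypothetical size, meant to be experimented with */

typedef struct { unsigned short red, green, blue; } Color;  /* stand-in for XColor */
typedef Color (*QueryFn)(unsigned long pixel);              /* stand-in for XQueryColor */

typedef struct {
    unsigned long pixels[CACHE_SIZE];  /* pixel value stored in each slot */
    Color         colors[CACHE_SIZE];  /* color belonging to that pixel */
    unsigned char used[CACHE_SIZE];    /* nonzero once a slot has been written */
    unsigned long misses;              /* number of server queries made */
} ColorCache;

/* Mark all entries as empty. */
void cache_init(ColorCache *c) { memset(c, 0, sizeof *c); }

Color cache_lookup(ColorCache *c, unsigned long pixel, QueryFn query)
{
    assert(c != NULL && query != NULL);
    unsigned long i = pixel % CACHE_SIZE;        /* slot index: a simple hash */

    /* Use the cached color only if the slot is filled with this very pixel. */
    if (!(c->used[i] && c->pixels[i] == pixel)) {
        c->colors[i] = query(pixel);             /* slow path: ask the server */
        c->pixels[i] = pixel;                    /* record the new color */
        c->used[i] = 1;
        c->misses++;
    }
    return c->colors[i];
}

/* Fake colormap for demonstration only: a grey ramp over the low eight bits. */
Color fake_query(unsigned long pixel)
{
    unsigned short v = (unsigned short)((pixel & 0xFFUL) * 257UL);
    Color col = { v, v, v };
    return col;
}
```

Two pixels whose values differ by a multiple of the cache size share a slot and evict each other, which is why the pixel value must be compared before a cached color is trusted.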
We now have the color components at the location (x,y) and compute the offset into the data plane. The components are actually sixteen bits, but we work with eight bits per sample and use only the upper eight bits.
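Writing one pixel into the data plane could be sketched like this. The function name `store_pixel` is hypothetical, and the sketch assumes that rows are tightly packed with three bytes per pixel (no padding at the end of a row), which may not hold for every bitmap representation.

```c
#include <assert.h>
#include <stddef.h>

/* Store 16-bit color components as three 8-bit samples at (x, y).
   Assumes a non-planar RGB plane with exactly 3 * width bytes per row. */
void store_pixel(unsigned char *plane, int width, int x, int y,
                 unsigned short red, unsigned short green, unsigned short blue)
{
    assert(plane != NULL && x >= 0 && x < width && y >= 0);

    /* Offset of the first sample of the pixel at (x, y). */
    size_t offset = ((size_t)y * (size_t)width + (size_t)x) * 3;

    /* Keep only the upper eight bits of each sixteen-bit component. */
    plane[offset]     = (unsigned char)(red   >> 8);
    plane[offset + 1] = (unsigned char)(green >> 8);
    plane[offset + 2] = (unsigned char)(blue  >> 8);
}
```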
The bitmap image representation contains the image at the end of the loop. We are done with the XImage and it may be freed. We allocate an image of the right size and make the bitmap its (only) representation. The image is retained and there is a method to access the image.
The complete functionality of XWinData has now been implemented. The method dealloc releases the array of children and the image should they have been allocated.
The class ImageView is quite simple. Like NSImageView, it stores an image for display. The initializer is the same as in an NSView. The image may be recorded and retrieved, and the method drawRect does the actual drawing as is standard with views.
The initializer clears the sole instance variable and invokes the initializer for NSView.
Getting and setting the image is easy, but the appropriate retain/release messages must be sent.
The actual draw code clips to the rectangle that is being asked for and commands the first representation (a bitmap in our case) to draw itself.
The controller object stores the browser and the image view, which is inside a scrollview that does not need to be stored. It also stores the root node of the window tree.
There is an initializer that invokes the method collectVisibles... to collect visible on-screen windows before the application object is created and the application launched. This is so that the application’s windows do not obscure parts of the screen, whose contents we want to grab.
The controller acts as a passive delegate of the browser and implements the appropriate methods. The key idea is to store the nodes, which are XWinData objects, as “represented objects” in the browser’s cell. This is the hook that connects the browser to the window tree. There is a method that responds to the browser’s cells being selected. It must display the corresponding image.
The controller is also the application’s delegate and assembles the main window after the application has been launched.
We need a method that will save the currently selected image to a file when the button in the menu is pressed. That button should only be clickable after a window has been selected in the browser, hence the method to validate menu items.
The initializer is straightforward: obtain the display and raise an exception if there is a problem. Next obtain the screen and its root window. Then build the window tree.
Building the window tree is actually fairly simple. We rely on the procedure XQueryTree to walk the tree of X windows and build our tree of XWinData objects. The method collectVisibles... returns the node that it has created. We start at the root and recursively process children. The first step is to ask for the children of the current window.
The next step is to create the current node. We need the attributes of the window that the initializer of XWinData retrieves.
What follows is very important. We identify the windows that go into our tree (not all X windows do). The window must be viewable, it must not be an input-only window, and some part of it must be on screen. If all these conditions are fulfilled, then we record the success in the flag process (which tells us whether we should recurse later on) and we compute the image of the window. A window that does not fulfill these conditions causes the corresponding XWinData object to be released (recall that we put it into an autorelease pool, so it will be deallocated at some point in the run loop).
The actual recursion follows. There is work to do if the current window was processed and has children. We allocate an array for these children and declare an index that we’ll use to iterate over the array children_return that we obtained from XQueryTree.
The loop iterates over the children and recursively creates a node for each. This node is recorded in the array object if it is not empty.
We sort the children by their names after all have been obtained. The sorted array is recorded in the instance variable children of the current node.
We must free the list of X windows if indeed there was one. The method returns the newly created node if the window fulfilled our three conditions and nil otherwise.
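The shape of the recursion can be shown without an X server. The C sketch below walks a mock window tree; `MockWindow` stands in for what `XGetWindowAttributes` and `XQueryTree` would report, and `Node`, `collect_visibles` and `free_tree` are hypothetical names. It simplifies the recipe in one respect: the three conditions are tested before a node is allocated, whereas the recipe first creates the XWinData object in order to obtain the attributes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the per-window data that the X server would supply. */
typedef struct MockWindow {
    const char *name;
    int viewable, input_only, on_screen;
    struct MockWindow *children;   /* array of nchildren windows */
    int nchildren;
} MockWindow;

/* Node of the resulting tree (stand-in for XWinData). */
typedef struct Node {
    const char *name;
    struct Node **children;
    int nchildren;
} Node;

/* Order nodes by name, as needed for the browser's columns. */
static int by_name(const void *a, const void *b)
{
    const Node *na = *(Node * const *)a;
    const Node *nb = *(Node * const *)b;
    return strcmp(na->name, nb->name);
}

/* Returns a new node, or NULL if the window fails one of the three conditions. */
Node *collect_visibles(const MockWindow *w)
{
    assert(w != NULL);
    if (!w->viewable || w->input_only || !w->on_screen)
        return NULL;                        /* rejected: do not recurse */

    Node *node = calloc(1, sizeof *node);
    if (node == NULL) return NULL;
    node->name = w->name;

    if (w->nchildren > 0) {
        node->children = calloc((size_t)w->nchildren, sizeof *node->children);
        if (node->children == NULL) { free(node); return NULL; }
        for (int i = 0; i < w->nchildren; i++) {
            Node *child = collect_visibles(&w->children[i]);
            if (child != NULL)              /* record only accepted children */
                node->children[node->nchildren++] = child;
        }
        qsort(node->children, (size_t)node->nchildren,
              sizeof *node->children, by_name);
    }
    return node;
}

/* Release a tree built by collect_visibles. */
void free_tree(Node *n)
{
    if (n == NULL) return;
    for (int i = 0; i < n->nchildren; i++) free_tree(n->children[i]);
    free(n->children);
    free(n);
}
```

Because a rejected window returns before the loop, none of its descendants are visited, which mirrors the role of the flag process described above.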
The controller is the passive delegate of the browser, and the required methods are straightforward. First, we must be able to tell the browser how many rows there are in a given column. This is easy. There is one row that contains the root window in column zero. Otherwise there is a column to the left of the column being loaded, one of its cells is selected, and the new column should display the children of the object represented by the selected cell; hence we ask the selected cell for the represented object and return the number of children of the object.
Second, we must initialize cells that are about to be displayed. A cell should be initialized with data from the represented object, which is the root window in column zero. For non-zero columns the object is the child at position row in the array of children of the selected cell, which is located in the previous column.
We read the properties of the represented object and record them in the cell. We must know if there are any children; this determines whether the cell is a leaf or not. The string value of the cell is the description string of the object, and the object itself must be recorded in the cell.
What happens when the user selects a cell in the browser? The method selectItem is the action that is invoked in this case. It retrieves the represented object and the image that it contains.
We place the image in the image view (which is inside a scrollview) and resize the image view to be the same size as the image. This completes our interface with the browser.
The remaining code is concerned with the assembly of the main window and the response to save-as-requests. The controller builds the main window after the application has finished launching. Recall that the window consists of an upper part (the browser) and a lower part (the scrollview with the image). Both parts have the same width and different heights. These are given by constants.
First compute the content rectangle to hold both the upper and the lower part. Then allocate and initialize a window for this size of content view, and set its minimum size and its title.
The upper part is next. Compute the frame rectangle of the browser and allocate and initialize it.
The browser should be width-sizable but its height should not change. It is not titled and does not allow multiple selections. There is a minimum value for the width of the individual columns.
Finally we connect the browser to the controller, which is its delegate and implements a passive delegate’s methods, as we have seen; the browser’s target is the controller and its action the method selectItem. We must place it in the view hierarchy.
The lower part contains the scrollview. We compute the frame rectangle and allocate and initialize it.
The scrollview should be width-sizable and height-sizable and it should have vertical and horizontal scrollers.
The document view of the scrollview is an image view of the fast variety that we discussed above. It is initially empty and we choose a frame that will not cause the scrollview to draw scrollers. We set the image view to be the document view of the scrollview.
The window is fully assembled once we place the scrollview in the view hierarchy and order the window (now centered) to the front.
The penultimate method of the controller lets the user save images to files. The structure is very simple: obtain the selected cell and its image, try to write the bitmap image representation to a TIFF file and signal an error if this fails. We declare errno to have access to the system’s error messages. We ask for the savepanel and restrict it to TIFF file names. Then we run the panel.
We proceed if the user clicked the OK button and ask the panel for the file name, the browser for the currently selected cell, the cell for the object it represents, the object for the image, and the image for its bitmap representation.
We read the bitmap’s TIFF representation into an NSData object and try to write this object to a file with the name that the user selected.
The remaining code in this method tries to construct a meaningful error message if we couldn’t write the file. The message contains the name of the file and the error string for the current value of errno, if the latter was set. The message is displayed with an alert panel.
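The construction of the message can be sketched in C with `strerror`. The function name `format_save_error` and the exact wording of the message are assumptions made for this illustration; the recipe builds its own string and shows it in an alert panel.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build an error message for a failed write. The system's error string
   is appended only if err (a saved errno value) is nonzero. */
int format_save_error(char *buf, size_t size, const char *filename, int err)
{
    assert(buf != NULL && filename != NULL && size > 0);
    if (err != 0)
        return snprintf(buf, size, "Couldn't write %s: %s",
                        filename, strerror(err));
    return snprintf(buf, size, "Couldn't write %s", filename);
}
```

It is usually wise to copy `errno` into a local variable immediately after the failed call, since later library calls may overwrite it before the message is built.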
We reset the cursor to an arrow should it still be the ibeam cursor (this used to happen sometimes, not sure if it still does).
The last method of the controller keeps the save-as menu item, whose tag is one, disabled if no window has yet been selected in the browser.
This very nearly concludes the discussion of our screen grab application. The routine main creates the autorelease pool, the controller and the application. The controller is created before the application so that the screen data do not include the tool’s windows, as explained earlier.
The application’s menu only contains two entries; one that quits the application and another that lets the user save images.
It remains to connect the controller to the application and run it.
This recipe was written with the online Xlib manual kindly provided by Christophe Tronche. Exercise: adapt this recipe to take window borders into account.