Start Small: January 2011

Sunday 30 January 2011

Talking to gnuplot by pipes

Drawing a graph shouldn't be difficult and Gnuplot indeed does make it simple to draw excellent graphs of all sorts of data. This little Python snippet shows how simple it is to talk to a Gnuplot subprocess from a Python program to draw a graph.

If you have an elaborate dataset that you want to explore in various ways it is probably easiest to run Gnuplot and refer to this file in the plot command. However in may other situations it is often more convenient to let Python talk to gnuplot via a pipe, writing commands and data directly to a gnuplot subprocess, with the possible added benefit that running Gnuplot as a complete separate process may utilize a multi-core processor better than Python can on its own.

Gnuplot is capable of interacting with another process over a pipe although on Windows you'll need not the main Gnuplot program but the executable pgnuplot.exe which is part of the distriubution.

Of course the wish to talk to Gnuplot this way is not a unique one. A fairly elaborate module exists but it doesn't look that actively maintained as it does not support Python 3.x. Moreover, it depends on the numpy package which is excellent but a bit heavy handed in many situations.

So what we are looking for is the simplest way to communicate with Gnuplot to draw a graph from an array of x,y values. The snippet below shows how this can be accomplished:

from subprocess import Popen,PIPE

gnuplot = r'C:\gp45-winbin\gp45-winbin\gnuplot\binary\pgnuplot'

data = [(x,x*x) for x in range(10)]

plot=Popen([gnuplot,'-persist'],stdin=PIPE,stdout=PIPE,stderr=PIPE)

plot.stdin.write(b"plot '-' with lines\n")
plot.stdin.write("\n".join("%f %f"%d for d in data).encode())
plot.stdin.write(b"\ne\n")
plot.stdin.flush()

The gnuplot variable points to the pipe capable gnuplot executable. On windows this is pgnuplot, on Unix-like operating systems the main executable will work just fine as well. The data variable hold a list of (x,y) tuples.

The important bit is in the Popen() call. The first argument is a list consisting of the program name and any options we wish to pass to this program. In this case we add the -persist option to prevent closing the window containing the graph. We also indicate that all input and output streams should be pipes.

The final statements show how we pass content to the input stream of the subprocess. Note that the write() method of such a file like object expects bytes, not a string, so we either provide bytes literals like b'' or use the encode() method of a string. The plot statement we send to Gnuplot is given '-' as the file name which will cause any subsequent lines sent to it to be interpreted as data, up to a line containing a single e.

Sunday 23 January 2011

CherryPy Arguments vs. Sqlite's lazy typing

Some bugs prove extremely hard to squash as this little tale shows.

Look a this bit of code:

import sqlite3 as sqlite

with sqlite.connect("/tmp/testdb.db") as conn:
 conn.execute("create table if not exists test ( a default 1)")
 conn.execute("insert into test default values")
 
 c =  conn.cursor()
 c.execute("select count(*) from test")
 print(c.fetchone()[0])
 
 c.execute("select * from test where a = ?",(1,))
 print(c.fetchone()[0])
 
 c.execute("select * from test where a = ?",('1',))
 print(c.fetchone()[0])

All it does is creating a table with a single column called a which has a numerical default of 1.

As expected the first print call produces a positive integer (incremented each time you run this snippet unless you delete /tmp/testdb.db in the mean time.) and the second call to print will select all records where a = 1, and printing the first column of the first record therefore will yield the number 1. Nothing special so far.

However, the final print call will raise an exception, stating that NoneType is not subscriptable because the select statement doesn't match any record. This seems logical as the a column contains numbers whereas in the final print call we select records matching the string '1' and a number is never equal to a string.

But what if we would change the table definition to

conn.execute("create table if not exists test ( a integer default 1)")

Now the final print call will succeed without a problem because sqlite will acknowledge the affinity of the a column and convert the string to an integer before comparing them.

This caused me some severe headaches because some of the queries in a CherryPy application I was working on returned different results although the queries where identical except from the location in the code, with identical arguments even.

But what might look identical to the eye proved to be different on close inspection: the column I checked was declared with an integer affinity and in the piece of code that did a successful query, the argument to compare against was an integer. The faulty code however was in a CherryPy method that received its arguments from the outside (a GET request from a browser) and all incoming arguments in CherryPy are strings!

Moral of the story: even though Sqlite allows you to assign any type of value to any column just like Python variables, be careful when comparing values!

Sunday 16 January 2011

Moon phases with PyEphem

Calculating all sorts of astronomical data including the phases of the moon is a piece of cake if you use the right tools. Combining the fantastic PyEphem package with the powers of Gnuplot and a suitable font will provide you with an actractive illustration of the path of the moon through the sky.

Let us first have a look at what we may achieve. The illustration shows the path of the moon through the sky as calculated for the Netherlands on January 11, 2011. The path of the sun is displayed as well for comparison. The horizontal axis shows the compass direction while the vertical axis shows the elevation above the horizon. The path of the moon is plotted with a resolution of 15 minutes and only for those parts that are above the horizon. On top of the path we have plotted an image indicating its phase for every hour together with the time. Now how did we put this together?

The essential ingredients are the PyEphem package,Gnuplot and a suitable font. The first thing we have to do is calculate the position of the moon as seen from our location and its phase. This is actually quite straightforward:

from datetime import date
from math import radians as rad,degrees as deg

import ephem

g = ephem.Observer()
g.name='Somewhere'
g.lat=rad(52.0)  # lat/long in decimal degrees
g.long=rad(5.2)

m = ephem.Moon()

g.date = date.today()# local time zone, I'm in UTC+1
g.date -= ephem.hour # always everything in UTC

for i in range(24*4): # compute position for every 15 minutes
    m.compute(g)

    nnm = ephem.next_new_moon(g.date)
    pnm = ephem.previous_new_moon(g.date)
    # for use w. moon_phases.ttf A -> just past  newmoon,
    # Z just before newmoon
    # '0' is full, '1' is new
    # note that we cannot use m.phase as this is the percentage of the moon
    # that is illuminated which is not the same as the phase!
    lunation=(g.date-pnm)/(nnm-pnm)
    symbol=lunation*26
    if symbol < 0.2 or symbol > 25.8 :
        symbol = '1'  # new moon
    else:
        symbol = chr(ord('A')+int(symbol+0.5)-1)

    print(ephem.localtime(g.date).time(), deg(m.alt),deg(m.az),
      ephem.localtime(g.date).time().strftime("%H%M"),
      m.phase,symbol)
    g.date += ephem.minute*15

If you run the script a list of numbers is produced that may be processed by Gnuplot. The list looks something like:

00:00:00.000002 7.91511093193 276.570014287 0000 45.1640434265 G
00:15:00.000002 5.7418040753 279.412684378 0015 45.262512207 G
00:29:59.000002 3.61571815838 282.256638545 0029 45.3610076904 G
00:44:59.000002 1.57919683222 285.110182297 0044 45.4595336914 G
...

The first column is the time, the second the elevation above the horizon, the third the azimuth (the compass direction), the forth the time again in a compact HHMM format and the final column holds a letter that will be used to display the phase of the moon with the help of the moon_phases.ttf font.

Assuming you saved the data in a file m.dat the following bit of gnuplot will create a picture called moon.png similar to the one at the top of the page (if you saved it as moon.plot you can run it with gnuplot moon.plot)

set terminal png truecolor nocrop font "/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf" 12 size 640,480
set output 'moon.png'
set grid xtics nomxtics ytics nomytics noztics nomztics \
 nox2tics nomx2tics noy2tics nomy2tics nocbtics nomcbtics
set grid layerdefault   linetype 0 linewidth 1.000,  linetype 0 linewidth 1.000
set xtics border in scale 1,0.5 mirror norotate  offset character 0, 0, 0
set xtics 45 norangelimit
set ytics border in scale 1,0.5 mirror norotate  offset character 0, 0, 0
set ytics autofreq  norangelimit
set xlabel "Azimuth (compass direction, in degrees)"
set xlabel  offset character 0, 0, 0 font "" textcolor lt -1 norotate
set xrange [ 45.0000 : 315.000 ] noreverse nowriteback
set ylabel "Elevation (degrees above horizon)"
set ylabel  offset character 0, 0, 0 font "" textcolor lt -1 rotate by -270
set yrange [ 0.000000 : 90.0000 ] noreverse nowriteback
plot '/tmp/m.dat'u 3:2 w l lc 4 lw 2 t "moon path", '/tmp/m.dat' u 3:2:(10) every 4 w circles t "" lc rgb "#ddddff", '/tmp/m.dat' u 3:2:6 every 4 w labels offset character 0,character -1 font "/home/michel/sitescripts/sunrise/moon_phases.ttf,30"   t "", '/tmp/m.dat'u 3:2:4 every 4 w labels font "/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf,8" tc rgb "#dd0000" t ""
#    EOF

The crucial bit is the final (and very long) plot. It draws several subplots:

a line representing the path of the moon
a series of circles with a radius of 10 in pale blue to be used as a backdrop for the moon phase characters
a series of labels that represent the phases of the moon (you must explicitely point to the location of the font)
and a final set of labels that use the values from the column with the short time format

Saturday 8 January 2011

Forward references

Forward references are sometimes necessary and in interpreted languages like Python almost never a problem. In class definitions however, it is. In this article we examine a solution for this problem with the use of the built in globals() function.

Consider the following use case: we have two classes A and B and we want the __init__() method of both classes to check that any arguments passed to it are instances of the other class, i.e. the A() constructor will only accept B instances as argument while the B() constructor only accepts A instances.

This isn't a problem per se because Python is an interpreted language. Is this context that means that a reference to any class name in for example the __init__() method will be resolved at run time. The __init__() method of the first class to be defined can therefor use an expression like issubclass(argument.__class__,B) without a problem, even if B is not yet defined.

But what if we want to parameterize this approach? Say we want to store the class that the arguments have to check against in a class variable. Such an assignment is executed while the class is being defined and therefor a forward reference (a reference to a class that is defined later in the file) will raise an exception. The following snippet of code illustrates the problem.

class A:
    argumentclass = B

    def __init__(self,*args):
        for a in args:
            if not issubclass(a.__class__,self.argumentclass):
                raise TypeError('argument not a subclass of' ,
                   self.argumentclass,a)
        self.refs = args

The code in line 2 will raise a NameError exception. There is a solution however: if we store the name of the class we can easily check if an argument is a sub class of a class with that stored name if we have a means of retrieving a class object by name. Fortunately that is quite simple by way of the globals() function. This built in function will return a dictionary of all objects in the current globals scope. By the time the __init__() method is executed this will hold both the A and the B class. The adapted code may look something like this:

class A:
    argumentclass = 'B' 
    def __init__(self,*args):
        for a in args:
            if not issubclass(a.__class__,globals()[self.argumentclass]):
                raise TypeError('argument not a subclass of' ,self.argumentclass,a)
        self.refs = args

This approach even allows for easy subclassing:

class X:
    argumentclass = None 
    def __init__(self,*args):
        for a in args:
            if not issubclass(a.__class__,globals()[self.argumentclass]):
                raise TypeError('argument not a subclass of' ,self.argumentclass,a)
        self.refs = args

class A(X):
    argumentclass = 'B'

class B(X):
    argumentclass = 'A'

Saturday 1 January 2011

Python Geospatial Development, full review

Erik Westra did an excellent job here: delivering a book that is both comprehensive and easy to read. Of course you already need to know a bit of Python to start, but all topics are explained very well and the book is very hands-on and task oriented.

The integration of Google maps in almost everything from telephone directories to pizza ordering services and on-line news services shows that there is a huge appeal to providing relevant geographical information in all sorts of contexts. But where do you start if you want to develop such functionality in Python? This book is certainly an excellent starting point.

Erik covers basically any subject from necessary geometrical concepts like units, datums and projections, just to get you started, to where to get the needed Python libraries and more importantly where to source good basic data like maps, positions of cities, shapes of countries and shorelines, etc, preferably from free sources. As geographical information used to be hard to get and often expensive, this attention to data sources is a very valuable part of the book. I particularly liked his extensive coverage of the collaborative openstreetmap project, an extensive source of geographical data collected by a host of people around the world.

With these resources at hand the next steps are directed to manipulating geospatial data and generating all sorts of maps. The storage of data in databases with specific geospatial extensions to handle large datasets is covered in depth with specific examples for PostGIS, MySQL ans Sqlite. Also the rendering of maps is explained in a detailed manner focusing on the Mapnik library. The final part of the book is dedicated to developing a web application to work interactively with map data. As performance is a key issue when working with large maps, even the implementation of a caching tile server (a server that generates small parts of a map on demand as you browse and zoom over the map) is implemented and explained in detail.

All in all a near perfect book for everybody who wants to start developing map based applications. Its solid coverage, excellent reference material and detailed explanations certainly make it a five out five for me.