jang: (Default)
[personal profile] jang

Okay, so I'm at a local Higher Education institution, doing something completely different. In this case, it's software development. The platform is Python; specifically, Django - because the other members of the development team tend to use this (or have a "strategy" to use this for future projects).

Why Python isn't my favourite language

Apart, that is, from the old chestnut of else being used "consistently" throughout the language, as defined by someone for whom English isn't their first language. Yes, when squinted at, you might claim that every else has the same semantics. Kind of. But they aren't implied by what I understand else to mean. I've never met anyone (who doesn't know the answer) who correctly predicts what

for i in some_generator_function(): ...
else: ...

actually does. But the real problem with Python, as far as I'm concerned, is also arguably one of its strengths, especially as a teaching language and a tool for experimentation. I would, admittedly emotively, describe this as follows: Python permits you to type without thinking. Or at least, without thinking about typing.

This is a consequence or side-effect of duck-typing, or perhaps it just naturally goes hand-in-hand with it (although Go does it better, and with type safety verifiable at compile time). One of the reasons you need Python to support an experimental approach, I think, is that you can't tell in advance what types you're working with.
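To make that concrete, here's a minimal sketch; Duck, Robot and describe are names I've made up for illustration. Nothing in describe says what it accepts or what speak() hands back, so the only way to find out is to run it.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return 42  # same method name, quite different return type


def describe(thing):
    # Nothing here says what `thing` is, or what speak() returns.
    return thing.speak().upper()


print(describe(Duck()))
try:
    describe(Robot())
except AttributeError as exc:
    # The mismatch only shows up when the line is actually executed.
    print("found out at run time:", exc)
```

Both objects quack well enough to get through the call; the difference only surfaces one step later.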

Here's a concrete example of Python's softness, to illustrate exactly what I find frustrating about it. This is the Pydoc documentation, automatically generated, for getPeerCertificate from the Twisted library. What does it do? It "returns an object with the peer's certificate info."

Well, that's great, but how do I inspect that object to find out more about it - like, say, the CA who signed the client's certificate, if any? The answer is, you can't tell, without reading the source. And because Twisted uses Zope's interface construct (this is not a weakness; this is programming in terms of types, of which I wholeheartedly approve), you have to have a more cunning source browser than simply clicking on the (source).
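Short of reading the source, about the best you can do is poke at the object at run time. A rough sketch of that, using a stand-in class: FakeCertificate and its method names are entirely hypothetical, and are not what Twisted actually hands back.

```python
import inspect


class FakeCertificate:
    """Hypothetical stand-in; NOT the object Twisted actually returns."""

    def get_issuer(self):
        return "CN=Example CA"

    def get_subject(self):
        return "CN=client"


def public_methods(obj):
    """List the names of an object's public callables, as a last resort."""
    return sorted(
        name
        for name, member in inspect.getmembers(obj, callable)
        if not name.startswith("_")
    )


cert = FakeCertificate()
print(type(cert).__name__, public_methods(cert))
```

This tells you what the methods are called, but still nothing about what they take or return.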

Thinking about types is not hard

Contrast this with the Java equivalent - which I had to hunt around for a while to find; Oracle've shifted all the documentation away from their now empty and glossy front page. What does getPeerCertificates() return? An array of Certificates, which, if you click on the type, you will be able to find out more about rapidly (you'll still need to drill down into the implementing subclasses, but the documentation is all there).

This support means that IDEs, documentation generators, and type-checking tools have more information to work with before you have to resort to unit testing. The cost is a little up-front thinking about your types; but writing the API (the interfaces) first and then looking at the implementation details is - to me, at least - a very natural way of working.
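Nothing stops you working API-first in Python, for what it's worth. A minimal sketch using abstract base classes and annotations; Certificate, Connection and SelfSigned are names invented for this example rather than anything from a real library.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Certificate(ABC):
    """Hypothetical interface, written before any implementation exists."""

    @abstractmethod
    def issuer(self) -> Optional[str]:
        """Name of the signing CA, or None if there isn't one."""

    @abstractmethod
    def fingerprint(self) -> str:
        """A digest identifying this certificate."""


class Connection(ABC):
    @abstractmethod
    def peer_certificates(self) -> List[Certificate]:
        """The peer's certificate chain; the return type says what you get."""


class SelfSigned(Certificate):
    """Toy implementation, only here to show the interface being satisfied."""

    def issuer(self) -> Optional[str]:
        return None

    def fingerprint(self) -> str:
        return "00:11:22"


cert = SelfSigned()
print(cert.issuer(), cert.fingerprint())
```

The annotations aren't enforced at run time, but they give documentation generators and IDEs something to follow from peer_certificates() to Certificate.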

Pythonistas, of course, will decry this as heresy. "Never typecheck!" they'll say, "just use the object as you expect it to be used." I am not putting words into people's mouths here; my esteemed colleague Doctor Google will find you many instances of this received wisdom.

What they mean is, never manually typecheck, because the language doesn't have declarative type assertions, so don't write

from numbers import Number

if not isinstance(x, Number):
  raise SomeException() # or some shit like that

- which is something I can agree with; it's ugly as hell. What I disagree with is the use of the adverb in their imperative. Just use the object as you expect it to be used? What happens if it's something complicated (like an X509 certificate, as opposed to a callable function (that returns what, exactly?), or a str, or a Number)? Or some other domain-specific object where the methods are not immediately apparent to me? What happens, in other words, if the programmer has no expectations because they are learning to deal with something new?

What happens is that this language, which is great for teaching, fails the learner.

It's not just inexperienced programmers who suffer this. I like to think that I know what I'm doing*; typically, a little bit of thought will make you familiar with how a piece of code is going to work, without you ever having seen (or used) it. "Oh yeah," you'll say, "that's a certificate. It probably has some principal information, a way to extract the embedded fields; perhaps something to give me a fingerprint; and there'll probably be a way of checking that there's a working chain of trust. If I'm really lucky there'll be some kind of revocation support using an OLRS, although that might wind up in a support class, I suppose." That's great; we've correctly anticipated a bunch of patterns (in the small) we expect to show up in the code. Unfortunately, we still don't magically know what those methods are called. That's where I lean on automatically-generated documentation and my IDE - only, in this case, neither can help me.

* I have had a vendor point-blank ask me if and how I got hold of their source-code after I raised a bug, correctly diagnosed what was causing it and even made a - as it turned out, 100% accurate - stab at sketching out the broken C that caused it, with a fix. I'd not seen it, of course - just asked myself, "if I was going to do this wrong, what's the most likely way I'd do it?" I describe myself as a really good guesser; what I mean is, I've been around the block a few times.

Programming is an exercise in creating and describing types

Python being a dynamic language, there are, obviously, typechecking libraries that you can use. Prior to Python 3000, these typically use the decorator syntax, which is fine (and we'll probably be adopting one of these if I have anything to say about it), although others overload the documentation string mechanism to embed type information in the docstring. Ironically, because Python doesn't supply a uniform way to do this, there's more than one way to do it; this is famously not the Python motto.
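As a rough idea of what such a library looks like from the outside, here's a minimal sketch of a decorator-based checker. The name accepts is made up, and real libraries do a good deal more (keyword arguments, return values and so on).

```python
import functools


def accepts(*types):
    """Hypothetical decorator: check positional argument types at call time."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            for arg, expected in zip(args, types):
                if not isinstance(arg, expected):
                    raise TypeError(
                        "%s() expected %s, got %s"
                        % (func.__name__, expected.__name__, type(arg).__name__)
                    )
            return func(*args)

        return wrapper

    return decorate


@accepts(int, int)
def add(a, b):
    return a + b


print(add(1, 2))
try:
    add(1, "2")
except TypeError as exc:
    print(exc)
```

The check is declared once, next to the signature, rather than scattered through the body as isinstance tests.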

For all of that, it's still a nice little language - almost on a par with JavaScript, in my estimation - and there are lots of natty frameworks out there for it.

Looking downcast

Anyway, we're using Django. It's one of those natty frameworks. Ironically, Django is very pluggable; you can strip out its ORM layer (read on), strip out its templating layer (which we might have to do), and be left with a complicated way to run unit tests and a nagging feeling that there ought to be a simpler way to do it. However, there's a lot to be said for adopting a set of conventions: rather than inventing our own, having someone take a view for us will save us some time.

That brings us onto the meat of today. Django supports inheritance in its models. That is very fortunate, I think, because there are some natural constructs that play very nicely in a language with rich object-orientation features like Python's. The Composite, in particular, fits a whole slew of problems with various web-based document editing conundrums.
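For anyone who hasn't met it, the Composite looks roughly like this. It's a plain-Python sketch rather than Django models, and Node, Text and Section are names picked for the example.

```python
class Node:
    """Common interface for everything in the document tree."""

    def render(self):
        raise NotImplementedError


class Text(Node):
    """A leaf: it has content but no children."""

    def __init__(self, content):
        self.content = content

    def render(self):
        return self.content


class Section(Node):
    """A composite: it is a Node, and it contains Nodes."""

    def __init__(self, title, children=()):
        self.title = title
        self.children = list(children)

    def render(self):
        body = "\n".join(child.render() for child in self.children)
        return "%s\n%s" % (self.title, body)


doc = Section("Intro", [Text("Hello."), Section("Detail", [Text("More.")])])
print(doc.render())
```

Callers only ever deal with Node; whether a given child is a leaf or a whole subtree is the child's business.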

The first problem, which is pretty typical of many ORMs, is that One-To-Many is often reflected in the relational world by having Many-To-One. Yeah, JPA and Hibernate have multiple ways of doing this, but basically if something is going to be contained, it needs to know about it. That is, potentially putting something into a container has an invasive effect upon the definition of the thing contained. Crikey; I haven't done that since I was hacking C! It's a shame, really, because lots of relational databases - including the one we're kicking off with - have the ability to do things the other way around. Yes, it's denormalised, but it's a lot faster; we're looking at requiring rather fewer than 1,200 queries to render a page.
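The usual shape, sketched with the standard library's sqlite3 so that it runs anywhere. The table and column names are assumptions for illustration, and this is not our actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE container (id INTEGER PRIMARY KEY, name TEXT);
    -- The contained row carries the reference to its container,
    -- so "being contained" changes the definition of the item.
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        container_id INTEGER REFERENCES container(id),
        body TEXT
    );
    INSERT INTO container (id, name) VALUES (1, 'page');
    INSERT INTO item (container_id, body) VALUES (1, 'first');
    INSERT INTO item (container_id, body) VALUES (1, 'second');
    """
)

# Finding a container's contents means querying the contained table.
rows = conn.execute(
    "SELECT body FROM item WHERE container_id = ? ORDER BY id", (1,)
).fetchall()
bodies = [body for (body,) in rows]
print(bodies)
```

Nest containers a few levels deep and fetch each level separately, and it's easy to see where a four-figure query count comes from.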

Anyway. The second problem is that, on the face of it, inheritance is a bit of a problem, because whilst polymorphism is incredibly useful, Django doesn't downcast when you deserialise objects from persistent storage. Rather, it gives you instances of whichever type you asked for in the query, even if the object was stored as one of its subclasses. That's so much of a pain in the arse that we might be throwing Django's ORM away in favour of one that groks downcasting.

Alternatively, I'm going to spend tomorrow morning mired in the Django docs (and, probably, the source code) looking at the cheapest and speediest way to get a models.Manager to do type refinement properly on instantiation. I think that this should be easy; but then, I also think that this should be an FAQ, and it doesn't appear to be.
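What I mean by type refinement, as a plain-Python sketch. This is not Django's API: the kind column, the registry and from_row are all assumptions made up for illustration, with dicts standing in for database rows.

```python
class Document:
    registry = {}  # discriminant value -> subclass

    def __init__(self, title):
        self.title = title

    def __init_subclass__(cls, **kwargs):
        # Record every subclass under a discriminant derived from its name.
        super().__init_subclass__(**kwargs)
        Document.registry[cls.__name__.lower()] = cls

    @classmethod
    def from_row(cls, row):
        """Instantiate the subclass named by the row's 'kind' column."""
        target = cls.registry.get(row["kind"], cls)
        return target(row["title"])


class Article(Document):
    pass


class Gallery(Document):
    pass


rows = [
    {"kind": "article", "title": "Why Python"},
    {"kind": "gallery", "title": "Holiday snaps"},
]
docs = [Document.from_row(row) for row in rows]
print([type(d).__name__ for d in docs])
```

The query is made against Document, but what comes back is the most specific type each row was stored as, which is the behaviour I'd want a Manager to provide.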

What I also learned today

As a client of a particular local Higher Education institute's IT Services, I have to say the feedback isn't great. "When is your account getting turned on?" I was asked, promptly, by a good friend of mine in response to a ticket I raised to get access to some things sharpish - the institution has, after all, known that I've been coming to work for them for a couple of weeks, and yet it's still not clear what state my personnel records - and the concomitant accounts in AD - are in. "I don't know, perhaps you can tell me," I had to respond. I was able to give him a ticket number, but when I called to check, the phones weren't being answered.

My suspicion is that my account will be enabled magically and silently and I'll get an obstreperous and sarky response if I ask if it's been done yet. "Psychically knowing about all changes to AD object status" still isn't in my job description.

Having said that, my desktop was in place and working perfectly when I popped in last week to pre-install it, and everything that depends on the local sysadmin and/or the local IT support girl (who is a total fox, incidentally, and who invited me out to lunch on my first day) has been done in plenty of time with nary a hitch.

Oh, for what it's worth, I'm pretty much convinced that the right thing to do is to throw away Django's ORM in favour of SQLAlchemy, because it does all this - and so much more - out of the box. Yes, type discriminants are ugly; but they are a necessary evil if you want your ORM to go fast (the alternative is a bunch of speculative joins; an approach that SQLAlchemy supports for a single database round-trip). What we lose, principally, is the admin interface (for the bits that don't use Django's ORM), and perhaps some of the manage.py capability. But it still looks worth it..!



October 2011
