Tarantula and TextMate

  • Posted By Stuart Halloway on February 28, 2008

Tarantula is under active development as we use it internally to police our apps. If you grab the bits from this morning (R243 or later in the repository), you will see that stack traces in the log report now link back into TextMate.

We have a ton of features we would like to add, and I bet the community can think of plenty more. Please add comments to this post, or post into Trac, letting us know what features you would like to see next. Here are some possible choices to get you started:

  • a "Johnny Droptables" fuzzer that tries specific SQL injection attacks
  • docs detailing the kinds of errors we have been finding and how to fix them
  • an XSS fuzzer that tries to inject script tags (this is challenging because it isn't obvious how to automatically detect the symptom)
  • CSS validation
  • JS validation
  • UI features to make the reports more navigable and usable (be specific!)
  • integration with RSpec
  • blacklist of files your server should never return
  • Ajax crawling (Tarantula currently simulates plain old web requests)
  • integration with other IDEs (you'll probably have to send us a tested patch because we're happy with TextMate)

Get your votes in today and we can look at them during open source Friday.

Tarantula vs. your Rails app

  • Posted By Stuart Halloway on February 26, 2008

The Tarantula is a fuzzy spider. It crawls your Rails app, fuzzing inputs and analyzing what comes back. We have pointed Tarantula at about 20 Rails applications, both commercial and open source, and have never failed to uncover flaws.

How does your Rails app stand up? It's easy to find out. Install the plugin, and create a Tarantula integration test: (Update: Note that Tarantula integration tests live in test/tarantula so that you can treat them separately in your cruise builds. For a substantial app or fixture set Tarantula can take a while to run!)

 
# somewhere in your test
require 'relevance/tarantula'            

# customize to match your security setup  
def test_with_login
  post '/sessions/create', :password => 'your-pass'
  assert_response :redirect
  assert_redirected_to '/'
  follow_redirect!
  t = tarantula_crawler(self)
  t.crawl '/'
end

Then run rake tarantula:test, and start looking through the Failures section of the HTML report.

Tarantula is just a baby now, but we plan to feed it until it is a lot bigger and meaner. Suggestions and contributions are welcome via the Relevance Open Source Trac.

Hat tip to Courtenay, whose SpiderTest plugin inspired me to go down this road. Also congrats to Mephisto, which is the best behaved app under Tarantula to date (only three problems, all minor broken windows).

Ruby puzzler: gsub, blocks, and procs

  • Posted By Stuart Halloway on February 13, 2008

See if you can guess what this code will do before you run it in Ruby.

upc = Proc.new {|m| $1.upcase}

puts "hello world".gsub(/([aeiou])/, &upc)
puts "hello world".gsub(/(\w)/, &upc)

def doit(str, re, blk)
  puts str.gsub(re, &blk)
end

doit "hello world", /([aeiou])/, upc
doit "hello world", /(\w)/, upc

Now try running it in JRuby. Whoa.
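
Whatever the two runtimes print, one way to avoid the question is to stop reading $1 inside the proc. $1 is derived from the last match, which Ruby tracks per method frame rather than as a true global, so a proc that reads it depends on where it was created and where it is called. The sketch below is a variation on the puzzler, not a fix endorsed by either runtime: it uses the block argument instead, which works here because each pattern wraps the whole match in group 1.

```ruby
# Use the block argument instead of the frame-sensitive $1.
# In both patterns group 1 spans the entire match, so the
# argument carries the same text that $1 was meant to supply.
upc = Proc.new { |match| match.upcase }

# Same shape as the puzzler's helper, but returning the string
# instead of printing it so the result is easy to inspect.
def doit(str, re, blk)
  str.gsub(re, &blk)
end

direct   = "hello world".gsub(/([aeiou])/, &upc)
indirect = doit("hello world", /(\w)/, upc)

puts direct    # hEllO wOrld
puts indirect  # HELLO WORLD
```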

Have you killed a design pattern today?

  • Posted By Stuart Halloway on February 13, 2008

Design patterns are the enemy of agility. They introduce repetition and accidental variation to your codebase. Design patterns encourage you to create "point solutions" throughout your application, instead of cleanly isolating concerns. And they will make your code refactor-proof, no matter how cool your IDE is. But there is hope: Catch your design patterns while they are young, and teach them to be library calls instead. Here's one example:

In Ruby, we often re-open existing classes and add instance methods. One approach is simply to open the class:

class NilClass
  def blank? 
    true
  end
end

Or, you could create a new module and mix it in:

module MyNilExtensions
  def blank?
    true
  end
end
class NilClass
  include MyNilExtensions
end

There are other approaches that are similar but not quite the same. In other words, this is a design pattern. From the Wikipedia entry:

A design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations.

The problem with design patterns is the "not a finished design" part of the definition. Rather than a DRY solution, design patterns give you repetition throughout the code. Worse yet, the repetition is not exact. It is repetition with variation, and there is often no evidence whether the variation is intentional or accidental.

I like Ruby because I can eliminate design patterns when they start to annoy me. This "Open Class Add Method" pattern annoyed me for the last time earlier today, when two different libraries defined incompatible versions of Object#metaclass. Enough is enough. Let's make a library call for reopening classes.

Here are my design goals:

  1. The syntax should be terse. I hate defining a module and including it in two separate steps; I just want to open a block and go.
  2. I should have an audit trail for where new methods came from. (This is one reason to define a named module, because I can then reflect against it.)
  3. I should be protected from method collisions. This protection should be configurable. Collisions can be explicitly approved, or they can generate a warning, or an error.

These three goals are in conflict (and we could easily come up with more). This illustrates another problem with design patterns. Each time a design pattern is used, a programmer favors some design goals over others. Over time, this leads to a codebase at odds with itself. If the same pattern were captured in a reusable module, then changing design priorities could be handled from that module alone.

Here's a strawman proposal for cleanly adding methods to existing Ruby classes. The following code adds #jump to Object:

embrace{Object}.and_extend do
  def jump
    puts "jumping"
  end
end

The syntax is simple and involves just one block, meeting goal one. Behind the scenes, I use __FILE__ and __LINE__ to define a module, which gives us auditability (goal two):

> puts Object.ancestors
Object
Anonymous module from /Users/stuart/Desktop/temp.rb 56
Kernel  

Finally, the code that mixes in the module walks the inheritance hierarchy first, printing a warning whenever a name collision is encountered (goal three).

Warning: /Users/stuart/Desktop/temp.rb 64 is attempting to redefine jump
         Originally defined in /Users/stuart/Desktop/temp.rb 56  

The complete implementation is included at the bottom of this post. I am sure it can be improved in several ways, but even in its primitive state it beats a design pattern. As long as the API is decent, we can always make the implementation suck less later.

Should I make a gem out of this? What changes would you like to see in the API? How should the handling of method collisions be specified?

require 'set'
module Embrace
  class <<self
    def check_for_collisions(clazz, module_to_include)
      new_methods = Set.new(module_to_include.instance_methods(false))
      clazz.ancestors.each do |anc|
        anc.instance_methods(false).each do |meth|
          collision(anc, module_to_include, meth) if new_methods.member?(meth)
        end
      end
    end

    def collision(clazz, module_to_include, method)
      puts "Warning: #{module_to_include} is attempting to redefine #{method}"
      puts "         Originally defined in #{clazz}"
    end
  end
end

def embrace(&clazz_block)
  m = Module.new
  file = eval("__FILE__", clazz_block.binding)
  line = eval("__LINE__", clazz_block.binding)
  clazz = clazz_block.call
  meta = class << m; self; end
  meta.class_eval do
    def and_extend(&blk)
      self.class_eval(&blk)
      mixin_to_class
      self
    end
    define_method("mixin_to_class") do
      Embrace.check_for_collisions(clazz, m)
      clazz.class_eval do
        include m
      end
    end
    define_method("to_s") do
      "Anonymous module from #{file} #{line}"
    end
  end   
  m 
end

o = Object.new

embrace{Object}.and_extend do
  def jump
    puts "jumping"
  end
end

o.jump

embrace{Object}.and_extend do
  def jump
    puts "jumping higher"
  end
end

o.jump  
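
As for how collision handling might be specified, here is one possible shape, offered as a sketch rather than as part of the implementation above. The names colliding_methods and handle_collisions, the policy symbols :allow, :warn, and :error, and the Animal and Loud examples are all made up for illustration. The ancestor walk is the same one check_for_collisions uses.

```ruby
require 'set'

# Hypothetical helper: list [ancestor, method_name] pairs where an
# ancestor of clazz already defines a method that mod would add.
def colliding_methods(clazz, mod)
  new_methods = Set.new(mod.instance_methods(false))
  clazz.ancestors.flat_map do |anc|
    anc.instance_methods(false)
       .select { |meth| new_methods.member?(meth) }
       .map    { |meth| [anc, meth] }
  end
end

# Hypothetical policy switch:
#   :allow  collisions are explicitly approved, say nothing
#   :warn   print a warning to stderr (the default)
#   :error  raise before anything gets mixed in
def handle_collisions(clazz, mod, policy = :warn)
  collisions = colliding_methods(clazz, mod)
  return collisions if collisions.empty? || policy == :allow

  message = collisions.map do |anc, meth|
    "#{mod} is attempting to redefine #{meth}, originally defined in #{anc}"
  end.join("\n")

  raise ArgumentError, message if policy == :error
  warn message
  collisions
end

# Illustrative classes only
class Animal
  def speak
    "..."
  end
end

module Loud
  def speak
    "ROAR"
  end
end

handle_collisions(Animal, Loud, :warn)   # warns on stderr, returns the pairs
handle_collisions(Animal, Loud, :allow)  # silent
```

With a switch like this, mixin_to_class could call handle_collisions before the include and let :error abort the mixin entirely.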

How should metaclass work?

  • Posted By Stuart Halloway on February 12, 2008

Facets defines metaclass like this:

def meta_class(&block)
  if block_given?
    (class << self; self; end).class_eval(&block)
  else
    (class << self; self; end)
  end
end
alias_method :metaclass, :meta_class

RSpec defines it this way:

def metaclass
  class << self; self; end
end

I just spent an hour figuring out why some carefully-tested code went no-op after adding RSpec to a project. As a community we need to commit to a standard definition here. What should it be?
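
To make the failure mode concrete, here is a small sketch that puts the two definitions side by side. The module names FacetsStyle and RSpecStyle and the greet method are invented for the demo; the method bodies are the ones quoted above. Ruby lets you pass a block to a method that never uses it, so the second call fails silently.

```ruby
# The Facets-style definition: a block, if given, is evaluated
# in the context of the singleton class.
module FacetsStyle
  def metaclass(&block)
    if block_given?
      (class << self; self; end).class_eval(&block)
    else
      (class << self; self; end)
    end
  end
end

# The RSpec-style definition: no block handling at all.
module RSpecStyle
  def metaclass
    class << self; self; end
  end
end

facets_obj = Object.new.extend(FacetsStyle)
facets_obj.metaclass { def greet; "hi"; end }
puts facets_obj.respond_to?(:greet)  # true

rspec_obj = Object.new.extend(RSpecStyle)
rspec_obj.metaclass { def greet; "hi"; end }  # block is silently ignored
puts rspec_obj.respond_to?(:greet)   # false
```

Code written against the first definition goes no-op under the second without raising anything, which matches the symptom described above.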

Why they fear the meta

  • Posted By Stuart Halloway on February 11, 2008

Giles says It's Not Meta, It's Just Programming. Darn straight! The specific example he gives is the ability to add methods to specific instances, instead of to an entire class. As he demonstrates, this is invaluable for isolation when testing.

# written this way to demonstrate eigenclass syntax
class << @response
  def body
    ({:foo => "bar"}.to_xml(:root => "thing"))
  end
end

Combine eigenclass methods with open classes, and almost any idiom can be automated. If you regularly add stubbed instance methods for testing purposes, why not write a helper for just that? For many common tasks, including this one, the work is already done. Here is a more literate version of the above code:

# exact syntax depends on your choice of mocking library
@response.stubs(:body).returns(some_canned_response)
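
If you wanted to roll such a helper yourself, a minimal sketch might look like the following. The name stub_method is hypothetical and is not the API of any mocking library; it leans on Object#define_singleton_method, which adds the method to one instance only.

```ruby
# Hypothetical helper: give one object a canned answer for one method,
# leaving every other instance of its class untouched.
def stub_method(object, name, canned_value)
  object.define_singleton_method(name) { |*_args| canned_value }
  object
end

response = Object.new
stub_method(response, :body, "<thing><foo>bar</foo></thing>")
puts response.body                       # <thing><foo>bar</foo></thing>

untouched = Object.new
puts untouched.respond_to?(:body)        # false
```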

If you are a n00b, this power is scary. Once you start treating code as data, the elegance of your code is dependent on your skill. You cannot hide behind the limitations of your programming language anymore, because there aren't any.

Troubleshooting LoadErrors in Rails tests

  • Posted By Stuart Halloway on February 08, 2008

I am proposing a patch to help cope with the dreaded Rails LoadError:

LoadError: Expected foo.rb to define Foo

In Ruby, it is simple to load code: just require it. In script/console:

>> require 'account_controller'
=> ["AccountController"]

Rails extends this to magically find classes just based on their name:

>> AccountController
=> AccountController

Most of the time, if a class does not exist, you get a helpful exception:

>> AccountController
MissingSourceFile: no such file to load -- hpricot_scan

Ah, so my AccountController depends on hpricot, which isn't available for some reason. Solution: go find hpricot.

But once in a while this problem presents a different symptom:

>> AccountController
LoadError: Expected account_controller.rb to define AccountController

This is confusing, since account_controller.rb does define AccountController! Experienced Rails developers know that this cryptic message actually means "Something went wrong in application_controller.rb, but Rails swallowed the real exception." After being bitten by this on three different projects in the last two weeks, I decided to track the issue down. Turns out the problem is in how fixtures get loaded:

begin
  require_dependency file_name
rescue LoadError
  # Let's hope the developer has included it himself
end

After fixtures swallow the real MissingSourceFile for a subdependency such as hpricot, ActiveSupport raises a misleading LoadError for the original dependency (account_controller.rb) that references hpricot.

In a perfect world, I would simply have fixtures stop swallowing LoadErrors. But the comment strongly suggests that some code depends on this behavior. So weaker sauce is to at least log the problem:

def try_to_load_dependency(file_name)
  require_dependency file_name
rescue LoadError => e
  ActiveRecord::Base.logger.warn("Unable to load #{file_name}, underlying cause #{e.message} \n\n #{e.backtrace.join("\n")}")
end

If anybody knows how to distinguish the confusing LoadErrors from the expected ones, please go and improve the patch.
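
To see the log-instead-of-swallow idea outside of Rails, here is a standalone sketch. It is not the patch itself: plain require and the standard library Logger stand in for require_dependency and ActiveRecord::Base.logger, and try_to_load and the library name are made up for the demo.

```ruby
require 'logger'
require 'stringio'

# Capture log output in memory so it can be inspected afterwards.
log_output = StringIO.new
logger = Logger.new(log_output)

# Hypothetical stand-in for the patched loader: still rescues the
# LoadError, but records the underlying cause instead of hiding it.
def try_to_load(file_name, logger)
  require file_name
  true
rescue LoadError => e
  logger.warn("Unable to load #{file_name}, underlying cause #{e.message}")
  false
end

loaded = try_to_load('no_such_library_for_demo', logger)
puts loaded             # false
puts log_output.string  # a WARN line naming the missing file
```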

Layering and platform choice

  • Posted By Stuart Halloway on February 04, 2008

Over the last few weeks I have repeatedly linked to Ola's post about the stable layer. I didn't take the time to go into detail, and I trusted that people (if they wanted to) could follow the link and understand what Ola was talking about.

Well, that didn't work so well. Most responders clearly did not understand Ola. A few informed me that I didn't understand Ola. :-) So I am going to make a clean break, and lay out my own argument in more detail. What follows are my views about how layered architecture affects language and platform choice. First, some ideas that are hopefully uncontroversial:

  • Good design is layered.
  • Leakage between layers should be minimal.
  • Features within a layer should be orthogonal, and should not have to be re-implemented in higher layers.
  • All kinds of programs benefit from this kind of layering, including languages, libraries, frameworks, and application code.

A small leap:

  • The lowest layers are the most important.

This might seem obvious. All other layers depend on the lower layers, so a problem at the bottom affects a lot of code. But if you are working several layers higher, problems at the bottom are part of the air you breathe. The air may smell terrible, but you are acclimated and don't notice.

A big but uncontroversial leap:

  • Java, the VM, is a good VM for the bottom layer.

This is uncontroversial because the majority has chosen. And they are right to do so: the Java VM is well-specified, widely implemented, carefully optimized, and supported by a huge array of tools.

A mistake:

  • Java, the language, is a good language for the bottom layer.

Noooo! Java is a high-ceremony language. At every turn, Java enforces a high busy-work/real-work ratio. Specifically:

  1. Java's checked exceptions bloat code, make components harder to use and maintain, and lead to tons of boilerplate code, each line of which is a bug-in-waiting.
  2. Java's new operator/constructors cannot pick a return type. The amount of code that exists only to work around this is staggering. Two entire cottage industries have sprung up to deal with this single issue: factory patterns and dependency injection.
  3. Java has no metaprogramming features to automate common tasks such as field accessors, standard constructors, and simple delegation.
  4. Primitives, functions, and classes are not first-class objects, leading to huge code bloat to deal with these types specially.
  5. Java's core reflection and interception capabilities are clunky, requiring tons of bolt-on technologies to make them workable, including AOP, annotations, and code generators.

That's a pretty big stink, but if you are used to it you probably can't smell it anymore.
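
For contrast, here is roughly what point 3 looks like in Ruby, which runs on the Java VM as JRuby. This is an illustrative sketch: Playlist and Point are made-up classes, while attr_accessor, Struct, and Forwardable are standard Ruby.

```ruby
require 'forwardable'

class Playlist
  extend Forwardable

  # Field accessors: one line instead of a getter/setter pair.
  attr_accessor :name

  # Simple delegation: forward these calls to the @tracks array.
  def_delegators :@tracks, :size, :<<, :first

  def initialize(name)
    @name = name
    @tracks = []
  end
end

# Standard constructor, accessors, and equality, generated for you.
Point = Struct.new(:x, :y)

playlist = Playlist.new("road trip")
playlist << "track one"
puts playlist.size    # 1
puts Point.new(1, 2) == Point.new(1, 2)  # true
```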

The net result of these problems is that bottom layer code written in the Java language will be bloated and difficult to maintain. These problems multiply if we use the Java language for higher layers as well. What should we do?

Keeping the VM, avoiding the language

For better or worse, Java is already the bottom layer for many businesses. A complete rewrite is impossible, so we need an approach that lets us continue to use our existing Java code. There are two obvious choices:

  • Use a framework to hide Java's most glaring flaws, and continue to use the Java language for development. The most popular option here is Spring. Spring provides framework-level fixes for several problems in Java: dependency injection, unchecked exception wrappers, and a powerful AOP capability, to name a few.
  • Use a better VM language. There are lots of choices, including Clojure, JRuby, Rhino, and Groovy. All of them can interoperate nicely with existing Java code.

My opinion, based on extensive experience with both options, is that the "better VM language" approach is better than the "fix Java with a framework" approach. Smaller is better.

Some advice

In the past few weeks, I have been approached by several organizations to advise them on platform decisions. Every organization is different, but here are some guidelines to consider.

  • Your team matters far more than your language. Pick a platform at random, then take your platform analysis budget and spend it finding good team members and helping them get better.
  • Your process matters far more than your language. If your team is not delivering real business value on a regular, repeating timeframe, stop worrying about your platform and start worrying about things like estimating, agility, testing, and continuous integration.
  • The static/dynamic languages debate is a red herring. The Java language's problem is ceremony, not static typing. Use whatever combination of static and dynamic typing works for you. [1]
  • There can be more than one! The Java VM simplifies the interop and deployment story. So quit trying to decide, and try a few different JVM languages.
  • The hot new JVM languages have different syntaxes, but similar features. They all solve the problems with Java that I enumerated above. Throw a dart at the wall, pick one, and get started coding.
  • Beware "Use the right tool for the job." This is true, but useless without context, and it is becoming the weapon of choice for pundits who write no code. Be a polyglot, but also be articulate about why tool X is the right fit for job Y.
  • Stop writing plain old Java code. Groovy obsoletes plain old Java. We ought to just say "Java 7 = Groovy" and move on.

Keep an open mind. Try several approaches. Judge your choices by how easy they would be to unmake or adapt. Have fun!

Notes

[1] In the past I have had a lot to say about static/dynamic typing. I realize now that I was trying to talk about ceremony. I am still worried about the same problems, but I think I now know them by more accurate names.