property-based testing approach.
I also mentioned the John Hughes talk, which is great.
But there’s a catch.
Until now, we’ve been considering property-based
testing in a functional way, where properties for functions depend only on the function input, assuming no state between function invocations.
But that’s not always the case: sometimes our function inserts data into some database, sends an email, or sends a message to the car’s anti-lock braking system.
The examples John mentions in his talk are not straightforward to solve without Erlang’s QuickCheck, since they verify state machine behavior.
Since I’m a Clojurist, I was a little confused about why I couldn’t find a way to do that state machine magic using test.check, so bugging Reid and not reading the fine manual was the obvious answer.
Thing is, test.check has this strategy of composing generators; in particular, using bind you can generate a value based on a value previously produced by another generator.
For instance, this example shows how to generate a vector and then select an element from it:
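The original snippet is missing from this extract, but a sketch of that kind of bind composition (the names are mine) could look like:

```clojure
(require '[clojure.test.check.generators :as gen])

;; generate a non-empty vector of ints, then pick one of its elements,
;; returning a [vector element] pair
(def vector-and-element
  (gen/bind (gen/not-empty (gen/vector gen/int))
            (fn [v]
              (gen/tuple (gen/return v) (gen/elements v)))))

(gen/sample vector-and-element 3)
;; each sampled pair contains an element taken from its own vector
```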
But it doesn’t have a declarative or simple way to model expected system state.
The first thing we should think about is: when do we have state, as opposed to a situation where we’re testing a referentially transparent function?
The sort function from the previous post is referentially transparent, since for the same input vector it always returns the same sorted output.
But what happens when we have a situation like the one described in the talk about this circular buffer?
If you want to test the behavior of the put and remove API calls, it depends on the state of the system, meaning which elements you already have in your buffer. The properties put must comply with depend on the system state.
So you have this slide from John’s presentation:
With the strategy proposed by QuickCheck to model this testing problem:
So we need to generate a sequence of commands and validate system state, instead of generating input for a single function.
This situation is more common than you may think; even if you’re doing functional programming, state is everywhere: you have state in your databases and you also have state in your UI.
You would never think about deleting an email and after that sending it; that’s an invalid generated sequence of commands (relative to the “email composing” state).
This last example is exactly the one described by Ashton Kemerling from Pivotal; they’re using test.check to randomly generate test scenarios for the UI. And because test.check doesn’t have the finite state machine modeling thing, they ended up generating impossible or invalid action sequences, and having to discard them as NO-OPs when run.
The problem with Ashton’s approach, for my situation, was that I had a possibly long sequence of commands or transactions, where each transaction modifies the state of the system, so the last one probably makes no sense at all without some of the transactions occurring in between.
The problem is not only discarding invalid sequences, but how to generate something valid at all.
Say you have 3 possible actions:

- add [id, name]
- edit [id, name]
- delete [id]
If the command sequence you generate is:
The delete action depends on two things: something must have been added before, and we must pick an id for deletion from the ones already added. It looks like something you would do using bind, but there’s something more: there’s a state that changes each time a transaction is applied, and it affects every potential command that may be generated afterwards.
Searching around, I found a document titled Verifying Finite State Machine Behavior Using QuickCheck Eqc_fsm, by Ida Lindgren and Robin Malmros; it’s an evaluation from Uppsala Universitet to understand whether QuickCheck is suitable for testing mobile telephone communications with base transceiver stations.
Besides the evaluation itself, which is worth reading, there’s a chapter on Finite State Machines I used as a guide to implement something similar with test.check.
There’s a nice diagram in the paper representing Erlang’s finite state machine flow:
We observe a few things:
Translating those ideas into Clojure, we can model a command protocol:
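The protocol listing didn’t survive in this extract; a minimal sketch of what it likely looked like (the function names beyond generate and exec are my reconstruction from the surrounding narrative):

```clojure
(defprotocol Command
  (precondition [this state]
    "Can this command be generated for the current state?")
  (generate [this state]
    "Return a test.check generator for this command.")
  (exec [this state cmd]
    "Apply the generated command to state, returning the new state."))
```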
So far, we’ve assumed nothing about:
Since we’re using test.check, we need a particular protocol function, generate, returning the command generator.
Having a protocol, let’s define our add, edit and delete transactions.
Questions to answer are:
Our expected state will be something like this:
So our add transaction will be:
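Since the original listing is gone, here’s a hedged sketch of an add command. I’m using a plain map of functions instead of the post’s protocol so the snippet stands on its own; the key and field names mirror the surrounding narrative:

```clojure
(require '[clojure.test.check.generators :as gen])

(def add-cmd
  {;; add is always valid once the state has a :people vector
   :precondition (fn [state] (vector? (:people state)))
   ;; a command is a tagged map carrying an id and a name
   :generate (fn [state]
               (gen/fmap (fn [[id name]] {:type :add :id id :name name})
                         (gen/tuple gen/int gen/string-alphanumeric)))
   ;; conj the new person into the expected state
   :exec (fn [state cmd]
           (update-in state [:people] conj (select-keys cmd [:id :name])))})
```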
The highlights:

- The expected state holds a people vector we conj the transaction into.
- The generate function returns a standard test.check generator for the command.
- The exec function applies the generated command to the current system state and returns a new state.

Now, what’s interesting is the delete transaction:
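Again the listing is lost, so this is a map-of-functions sketch of a delete command consistent with the description below (names are mine):

```clojure
(require '[clojure.test.check.generators :as gen])

(def delete-cmd
  {;; delete only makes sense if someone is in the people list
   :precondition (fn [state] (seq (:people state)))
   ;; pick an id among the people currently in the state
   :generate (fn [state]
               (gen/fmap (fn [id] {:type :delete :id id})
                         (gen/elements (mapv :id (:people state)))))
   ;; remove the person with the selected id from the state
   :exec (fn [state cmd]
           (update-in state [:people]
                      (fn [people]
                        (vec (remove #(= (:id %) (:id cmd)) people)))))})
```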
Note the differences:

- delete can only be executed if the people list actually has someone inside.
- It selects an id from the people in the current state (using the gen/elements selector).

So how do we generate a sequence of commands given a command list?
This is a recursive approach that receives the available commands and the size of the sequence to generate:
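The generator itself is missing here, so this is my reconstruction of the recursive approach, assuming commands maps a command type keyword to a map with :precondition, :generate and :exec functions (all names hypothetical):

```clojure
(require '[clojure.test.check.generators :as gen])

(defn command-seq
  "Generator for a sequence of `size` valid commands, threading `state`."
  [commands state size]
  (if (zero? size)
    (gen/return [])
    ;; pick one generator among the commands whose precondition holds
    (gen/bind (gen/one-of (->> (vals commands)
                               (filter (fn [c] ((:precondition c) state)))
                               (mapv (fn [c] ((:generate c) state)))))
              (fn [cmd]
                ;; apply the command to obtain the next state, then recurse
                (let [state' ((:exec (get commands (:type cmd))) state cmd)]
                  (gen/fmap (fn [more] (cons cmd more))
                            (command-seq commands state' (dec size))))))))
```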
The important parts being:

- The next command generator is selected with one-of, after filtering commands by their preconditions.
- When the remaining size is 0 we just finish; otherwise we recursively concat the rest of the sequence.
- The next state is computed with (exec (get commands (:type cmd)) state cmd), where we need to retrieve the original command object.

If you would like to generate random sequence sizes, just bind it with gen/choose.
Note the initial state is set to {:people []} so the add command precondition succeeds.
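To illustrate the gen/choose binding (the original snippet is missing), here’s a stand-in where a sized vector generator plays the role of the command-sequence generator:

```clojure
(require '[clojure.test.check.generators :as gen])

;; bind a random size between 1 and 10, then build a generator of
;; exactly that many elements
(def sized-gen
  (gen/bind (gen/choose 1 10)
            (fn [size] (gen/vector gen/boolean size))))

(map count (gen/sample sized-gen 5))  ;; all counts fall in [1, 10]
```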
If we generate 3 samples now, it looks good, but there’s still a problem….
Each add-cmd is repeating the id, since it’s generated without checking the current state. Let’s change our add transaction generator:
Now the id field generator checks that the generated int doesn’t belong to the current ids in the state (we could have returned a uuid or something else, but that wouldn’t make the case for state-dependent generation).
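A hedged sketch of such a state-aware id generator, using gen/such-that (names mine):

```clojure
(require '[clojure.test.check.generators :as gen])

;; generate ints that are not already used as ids in the current state
(defn new-id-gen [state]
  (let [taken (set (map :id (:people state)))]
    (gen/such-that (complement taken) gen/int)))

(gen/sample (new-id-gen {:people [{:id 5}]}) 10)
;; 5 never shows up in the sampled ids, by construction
```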
To complete the example we need:
Which is pretty straightforward, so we’ll talk about shrinking first.
If you were to find a failing command sequence using the code above, you would quickly realize it doesn’t shrink properly.
Since we’re generating the sequence by composing bind, fmap and concat, and not using the built-in gen/vector or gen/list generators, the generated sequence doesn’t know how to shrink itself.
If you read Reid’s account of writing test.check, there’s a glimpse of the problem we face: shrinking depends on the generated data type. So a generated int knows how to shrink itself, which is different from how a vector shrinks itself.
If you combine existing generators you get shrinking for free, but since we’re generating our sequence recursively with concat, we’ve lost the vector type’s shrinking capability.
And there’s a good reason it is so, but let’s first see how shrinking works in test.check.
test.check complects the data generation step with the shrinking of that data. So when you generate some value, behind the scenes all the alternative shrunk scenarios are also generated.
Let’s sample an int vector generator:
The sample function hides the alternatives from you, showing only the actual generated value.
But if we call the generator using call-gen, we get a completely different structure:
What we have is a rose tree, which is an n-ary tree where each node may have any number of children.
test.check uses a very simple modeling approach for the tree, in the form of [parent children].
So this tree
Is represented as
Every time you get a generated value, what you’re looking at is the root of the tree, obtained with rose/root, which is exactly what gen/sample is doing.
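Since the snippet is lost, here’s a tiny hand-rolled version of the idea (rose/root in test.check does essentially this):

```clojure
;; a rose tree in [parent children] form: root [0 1], two shrunk children
(def tree [[0 1] [[[0] []]
                  [[1] []]]])

;; taking the generated value is just taking the root of the tree
(defn root [[parent _children]] parent)

(root tree)  ;; => [0 1]
```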
The shrinking tree you would expect for a generated vector is:
The deeper inside the tree, the more shrunk the value is. So for instance integers shrink towards zero, and vectors randomly remove elements until nothing is left.
If we were to actually look inside the shrunk vector tree, it would also include the shrunk integers, but you get the idea.
I said before that our sequence doesn’t shrink since it’s generated recursively, so this is what our sequence tree looks like so far.
But even if we were using the vector shrinker we would end up with something like this:
Since the vector shrinker doesn’t really know what a valid command sequence looks like, it will just produce random permutations of commands, ending up with many invalid sequences (like [{:add 1} {:delete 2}]).
We will need a custom shrinker, that shrinks only valid command sequences, with a resulting tree like this one:
To do that, we will modify our protocol to add a new function, postcondition.
postcondition will be called while shrinking, in order to validate whether a shrunk sequence is valid for the hypothetical state generated by the previous commands.
Another important function is gen/pure, which allows us to return our custom rose tree as the generator result.
So this is how our command generator looks like now:
We see two different things here, notably a call to a shrink-sequence function that generates the rose tree given the command sequence and intermediate states.
The shrink-sequence function being:
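The listing didn’t survive, but the remove-seq helper described below is simple enough to sketch confidently:

```clojure
;; all subsequences of s with exactly one element removed; these are the
;; smaller candidates proposed while shrinking
(defn remove-seq [s]
  (map-indexed (fn [i _]
                 (vec (concat (take i s) (drop (inc i) s))))
               s))

(remove-seq [:a :b :c])
;; => ([:b :c] [:a :c] [:a :b])
```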
Highlights:

- The tree is built in the [parent children] form.
- remove-seq generates a sequence of subsequences, each with only one element removed.
- valid-sequence? uses postcondition to validate the shrunk sequence.

I’ve put together a running sample for you to check out here.
There’s only one property defined: applying all the generated transactions should return true, but it fails when there are two delete commands present.
If you look closely, the failing test case has three add commands, but when shrunk, only the two needed in order to fail appear.
Have fun!
I’m guilespi on Twitter.
I had been previously exposed to some of the concepts of generative testing, particularly Haskell’s own QuickCheck, but never took the time to do something with it. This talk by John Hughes really struck a chord with me on the usefulness of generative -or property-based- testing, and how much effort you can save by knowing when and how to use it.
I’ve been using Clojure’s test.check for a while, and since I’m preparing a conference talk on the subject, I decided to write something about it.
So bear with me: in this two-entry blog post I’ll try to convince you why, down the road, it may save your ass too.
Probably the reason I’ve always looked down upon generative testing was thinking it was just about random/junk data generation, for the too-lazy-to-think-your-own-test-cases kind of attitude.
Well, that’s not what generative testing is about.
You will have data generators for some input domain values, but trying to generate random noise to make the program fail is just fuzz testing, and generative testing is more than that.
How?
I’ve written before about the difficulty of using types to prove your program is correct. Some people will always say you can do it with type systems (and types even more complex than the program under proof), and you can always use Coq.
But for everyday programming languages and type systems it’s not that easy. Take for instance this Java function (assuming such a thing exists):
You can say just by looking at the function, that any integer except zero will succeed.
In this other case:
The function will succeed except when x is null.
So assuming that’s expected behavior, you can write some tests to check on those special failure cases.
But for the sake of making an argument, assume you’re testing openssl and this is the function you have…
Unless you’ve been living under a rock, you’ve heard about the heartbleed openssl bug, and it’s just what you think: the bug was in the heartbeat processing function above, and this is the patch with the fix.
Who was the motherfucker that missed that unit test, huh?
When the function logic is more complex, it’s exponentially more difficult to define both types and tests that make us feel more confident about pushing our nasty bits of code to a production environment.
And that’s because the possible states our system or function can be in expand like hell when new variables and conditional branches are added (more on this later).
Looking at the function above, you can see the problem is not in some untested code path, but in some values used in the function invocation.
Some people aim for 100% code coverage. According to Wikipedia:
In computer science, code coverage is a measure used to describe the degree to which the source code of a program is tested by a particular test suite. A program with high code coverage has been more thoroughly tested and has a lower chance of containing software bugs than a program with low code coverage.
Which is great, but you can have 100% code coverage of the 1/x function and still know nothing about domain coverage (for which values of x the function works as expected).
Code coverage without domain coverage is just half the picture.
Even unit tests prove almost nothing.
There’s a great quote by Edsger Dijkstra from Notes on Structured Programming that says
Program testing can be used to show the presence of bugs, but never to show their absence!
Which is to say, no matter how many unit tests you write, you’re only proving that your program works (or fails) for the set of inputs you have selected when writing your tests.
It doesn’t say a thing about generalities, or about any general property of the system or function under test.
So what is generative testing?
In generative testing you describe some properties your system or function must comply with, and the test runner provides randomized data to check whether the property holds for that data; that’s why it’s also known as property-based testing.
A property is a high-level specification of behavior that should hold for a range of data points.
So a property works somewhat like a domain iterator, bringing types and tests a little bit closer, since you’re defining how the system should behave for a particular domain of values, not when the program is compiled, but when it’s run.
In the StrangeLoop 2014 conference, Joe Armstrong gave a talk called The mess we’re in, where he discussed system’s complexity, go watch it since it’s real fun.
He says that a C program with only six 32-bit integers has the same number of states as there are atoms on the planet, so testing your program by computing all combinations is going to take a really long time.
And if it’s almost impossible to find the number of states computationally, imagine trying to find the number of possible failing states manually.
I’ve been in the position of having to hunt a bug that occurs only once a year in a system processing millions of transactions daily, and it’s not fun at all. Pray to the logging gods the proper piece of information revealing the culprit is logged, so you don’t have to wait another year for the bug to show up.
If your software runs inside a car, would you wait for the next deadly crash to analyze that dead-driver log file? Maybe that’s why Volvo uses QuickCheck to test embedded systems.
Generative testing helps you put and test your system in so many different states it would be impossible to do manually.
So, should we throw away all of our type systems and unit tests?
Not so fast, property based testing is not a replacement for types nor for unit tests.
Haskell and Scala both have their frameworks for property-based testing (QuickCheck and ScalaCheck) and are strongly typed languages.
Property based testing helps us define considerations for our programs where type systems do not reach, and where dynamically typed languages have a void.
So what does a property look like?
All concepts so far hold true for any language with a generative testing framework; many re-implementations of the original QuickCheck exist, for C, C++, Ruby, Clojure, Javascript, Java, Scala, etc. So now I will show you a couple of examples in different languages, just for you to grasp the basic property definition semantics, which are quite similar across implementations.
These examples are not meant to show how powerful generative testing can be, yet.
Let’s say you want to test a sort function of yours, and instead of specifying individual test cases for particular arrays of integers, you define a property which says that after sorting the array, the last element should always be greater than the first one.
This is what the property looks like in Javascript’s JSCheck
You don’t say which particular arrays, just any array of integers must comply with the property, the framework will generate values for you (in this case 10 repetitions will be run).
This is the result:
Did you spot when the property doesn’t hold?
This is what the same property looks like in Clojure’s test.check.
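The property snippet is missing from this extract; it presumably looked close to this sketch (strictly greater, so it fails for single-element vectors):

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.properties :as prop]
         '[clojure.test.check.generators :as gen])

;; after sorting, the last element should be greater than the first
(def prop-sorted
  (prop/for-all [v (gen/not-empty (gen/vector gen/int))]
    (let [s (sort v)]
      (> (last s) (first s)))))

(tc/quick-check 100 prop-sorted)
;; fails, shrinking towards a single-element vector such as [0]
```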
With the following result:
As you see, both fail, since the property doesn’t hold for single-element arrays.
The basic semantics for both languages are the same; you need:
This encourages a higher level approach to testing in the form of abstract invariant functions should satisfy universally.
One of the best features of QuickCheck is the ability to shrink your failure cases to the minimal failing case (not all implementations have it, by the way).
When generating random data, you may end up with a failing case too big to rationalize (for instance a thousand-element vector), but that doesn’t necessarily mean all 1000 elements are needed for the function under test to fail.
When QuickCheck finds a failing case, it tries to shrink the input data to the smallest failing case.
This is a powerful feature if you don’t want to repeat many unnecessary steps in order to reproduce a problem.
A simple example to illustrate the feature comes from the test.check samples.
Here a property must hold for all integer vectors: no vector should have the element 42 in it.
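That sample property, as I recall it from the test.check repository (modulo naming):

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.properties :as prop]
         '[clojure.test.check.generators :as gen])

;; no generated vector should contain the number 42
(def prop-no-42
  (prop/for-all [v (gen/vector gen/int)]
    (not (some #{42} v))))

(tc/quick-check 1000 prop-no-42)
;; eventually fails and shrinks the counterexample towards [42]
```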
When the tests are run, test.check finds a failing case, the vector [10 1 28 40 11 -33 42 -42 39 -13 13 -44 -36 11 27 -42 4 21 -39], which is not the minimal failing case.
So it starts shrinking the failing case until it reaches the smallest vector for which the property doesn’t hold, which is [42].
Unfortunately JSCheck doesn’t shrink the failure cases, but jsverify does, so if you want some shrinking in Javascript give it a try.
Since QuickCheck depends on generators to cover the domain, we need to consider that those domains may be infinite or very large, so it may be impossible to find the offending failure cases. Nonetheless, we know that by running long enough, or a large enough number of tests, we have better odds of finding a problem.
Regarding the name, property-based testing is a much better name than generative testing, since the latter gives the idea that it’s about generating data, when it’s truly about function and system properties.
The higher-level approach of property definition, coupled with the data generation and shrinking features provided by QuickCheck, really helps the case of having something closer to proofs about how your system behaves.
In the next post I’ll write about finite state machine testing using test.check and show more complex examples, stay tuned.
I’m guilespi on Twitter, reach out!
In the first post I showed what Clojure looks like in bytecode, and in the second post I did a quick review of the Clojure compiler and its code generation strategies.
In this post I’ll go deeper in the decompiling process.
Decompilers do not usually reconstruct the original source code, since much information meant to be read by humans (for instance comments) is lost in the compilation process, where stuff is meant to be read only by machines: JVM bytecode in our case.
So by decompiling I mean going from lower-level bytecode to some higher-level code; it doesn’t even need to be Clojure. We already know from the last post that the Clojure compiler loses all macro information, so special heuristics will be needed when trying to reconstruct macros.
For instance, it’s possible for the let and if special forms to be re-created using code signatures (think pattern matching applied to code graphs).
As I’ve said in my previous post:
What use do I have for a line based debugger with Clojure?
My goal when decompiling Clojure is not to re-create the original source code, but to re-create the AST that gave origin to the JVM bytecode I’m observing.
Since I was creating a debugger that knows about s-expressions, I needed a tree representation from the bytecode that can be properly synchronized with the Clojure source code I’m debugging.
So my decompiling goal was just getting a higher-level semantic tree from the JVM bytecode.
Much of the work I’ve done used as a guide the many publications by Cristina Cifuentes, and the great book Flow Analysis of Computer Programs, of which I got a used copy from Amazon. So all the smart ideas belong to them; all mistakes and nonsense are mine.
I already said I want a reasonable AST from the bytecode, so the decompilation process will be split into phases.
If you want a decompiler that re-creates source code, you would add a fifth step called Emit source code.
As you smart readers have probably noticed by now, it has some things in common with the compiling process, only that we need to get to the AST from compiled bytecode instead of from a source file; once you have the AST you can emit whatever you want, even Basic.
Maybe I should have used a different name for this step, since stack unwinding is usually related with C++ exception handling, and refers to how objects allocated are destroyed when exiting the function and destroying the frame.
But in the general case, stack unwinding refers to what happens to the stack when a function finishes and the current frame needs to be cleared.
And if we go to the dictionary - pun intended -
to become uncoiled or disentangled
I’m happy to say we will coil the uncoiled stack into proper statements.
As we saw in the second post of our series, the JVM uses a stack-based approach to parameter passing:
So our first step is about getting the stack back together.
Some statements are going to be quite similar to what the Clojure compiler already recognizes, such as IfExpr representing an if conditional, but many statements at this stage won’t have a direct mapping in Clojure; for instance the AssignStatement, representing an assignment to a variable, does not exist in the Clojure compiler, and higher-level constructs such as LetFnExpr or MapExpr won’t be mapped at this stage of low-level bytecode.
So a reduced list would look like:
So we’re dealing with less typed expressions/statements, just a small set of generic control structures.
One important thing when winding the stack: in many cases statements compose; for instance an InvokeStatement result may be used directly from the stack by a subsequent IfStatement.
Let me show you.
Getting back to our previous example
Decompiled as
Lines 0 and 1 are responsible for the (inc 1) part of the code, decompiling to clojure.lang.Numbers.inc(1), whose result is directly used on line 7, which compares it with the long value 2 pushed on line 4.
So our first decompiled statement, on line 0, is an IfStatement, which contains the InvokeStatement inside.
When this step is finished, all the low-level concepts have been removed, and high-level concepts, such as parameter passing, have been re-introduced.
But we’re still stuck with our damn Basic!
A codeblock is a sequence of statements with no branch statements in the middle, meaning execution starts at the first statement and ends with the last one.
The following is a control flow graph, with code blocks numbered from B1 to B15.
Note we’re building a graph here, not a tree.
A tree is a minimally connected graph, having only one path between any two vertices; when modeling control flow you can have many paths between two vertices, for instance in our example B1->B2->B4->B5 and B1->B5.
This is the first step of the control flow analysis phase, having identified the basic branching statements from the previous step, building the graph is straightforward.
Loop detection is one of the most difficult tasks when writing a decompiler.
The main reason is that when you’re reading bytecode or assembly, you’re not entirely sure about the compiler used to generate it; you may even be trying to decompile bytecode written by hand, which may never map to a known higher-level construct.
For instance, there are a few higher-level constructs identified with loops, which usually take the following form:
But then you may have a graph with the following improper looping structures:
Improper loops range from multi-entry or multi-exit loops, something you can find in goto-enabled languages, to parallel loops with a common header node, or entwined loops.
In our case we can assume all the bytecode we’re going to find is created by a reasonable Clojure compiler, and we can safely guess goto support won’t be approved by Rich Hickey any time soon.
So, a loop needs to be defined in terms of the graph representation, which not only determines the extent of the loop but also the nesting of the different loops in the function being decompiled.
The loop detection algorithms I used were taken directly from Cifuentes’ papers on decompiling, which in turn took from James F. Allen and John Cocke the idea of using graph interval theory for flow analysis, since it satisfies the necessary conditions for loops:
Even if we don’t have improper loops, we need to know which IfStatements correspond to a loop header before assuming one is indeed an If.
So, what does a Clojure loop look like?
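The source snippet is gone; any simple loop/recur will do to make the point (this one is my own):

```clojure
;; counts up to n using loop/recur -- compiles to a conditional
;; branch plus a backwards GOTO in bytecode
(defn count-up [n]
  (loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))

(count-up 5)  ;; => 5
```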
Our first phase decompiler would see it as:
Well, we had a GOTO after all… as you see, a loop is just an If statement followed by a backlink, in this case solved with a GOTO branching statement.
Now if I leave the debug comments from my decompiler, you’ll see a couple extra things:
Conditionals refer to if, when, case and the other conditionals that may be found in code, which are usually 1-way or 2-way conditional branches; all of them have a common end node reached by all paths.
Since Clojure’s when is a macro expanding to an if, it’s just the 1-way conditional branch; if the if clause has an else part we’re in the 2-way conditional branch, where the else part is taken before reaching the common follow node.
The more difficult situation arises when trying to structure compound boolean conditions, as you see in the following picture:
You should expect different IfStatements one behind the other, all being part of the same higher-level compound conditional, which is compiled in a short-circuit fashion with two chained if statements.
With Clojure we have an additional problem, for instance the following example:
Decompiles to the following Basic:
Wait a minute!!
We should be seeing only two IfStatements there, one for each part of the compound conditional, but there are three. What’s going on?
As you see on line 21, the same condition from line 8 is being tested again, which we already know is false. Why would someone do that?
It turns out it has to do with and being implemented as a macro, so if we look at the actual Clojure code being emitted, the bytecode makes sense:
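We can see the shape of that emitted code by macroexpanding and ourselves (the gensym number will differ between runs):

```clojure
(macroexpand '(and a b))
;; => (let* [and__NNNN__auto__ a]
;;      (if and__NNNN__auto__ (clojure.core/and b) and__NNNN__auto__))
;; the temporary holding the first condition is tested by the `if`
;; and returned again in the else branch
```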
The and__3941__auto__ variable, which holds the result of the first condition, is being checked twice. I guess this is the reason the temporary variable exists in the first place: to avoid computing the boolean expression twice, checking its result again instead.
If the compiler analyzed the and as part of the if, it could have emitted the result directly, instead of using a temporary variable and that nasty double check.
Many of the different strategies explored previously apply if you want to decompile just about anything from bytecode (or machine language).
Since in our case we already know we’re decompiling Clojure, there are a lot of special cases we know we will never encounter.
Targeting our decompiler at only one language makes things easier, since not only are we supporting a single compiler, but we know we’ll never encounter manually generated bytecode, unless you’re using an agent or custom loader that has patched the bytecode, of course.
In the next post I will show you two things, how to synchronize the decompiled bytecode tree with Clojure source code, and how to patch the debuggee on runtime to use our s-expression references using BCEL.
Much of the code to accomplish this was developed while understanding the problem, so it’s not open sourced yet. I’m planning to move stuff around and make it public, but if you want to look at the current mess just ping me and I’ll send it to you (you’ll need to un-rust your graph-mangling skills though).
Meanwhile, I’m guilespi on Twitter.
For this entry I’ll do a compiler overview; the idea is to understand why and how Clojure ends up looking like that.
In other decompilation scenarios you don’t usually have the advantage of looking at the compiler internals to guide your decompiling algorithms, so we’ll take our chance to peek at the compiler now.
We will visit some compiler source code, so be warned, there’s Java ahead.
Well, yes, the Clojure compiler targeting the JVM is written in Java; there is an ongoing effort to have a Clojure-in-Clojure compiler, but the original compiler is nowhere near being replaced.
The source code is hosted on GitHub, but the development process is a little bit more convoluted, which means you don’t just send pull requests for it. This has been asked for many times and I don’t think it’s about to change, so if you wanna contribute, just sign the contributors agreement and follow the rules.
The Clojure-in-Clojure alternative is not only different because it’s written in Clojure, but because it’s built with extensibility and modularization in mind.
In the original Clojure compiler you don’t have a chance to extend, modify or use much of the data produced by the compilation process.
For instance the Typed Clojure project, which adds gradual typing to Clojure, needed a friendlier interface to the compiler’s analyzer phase. It was first developed by Ambrose Bonnaire-Sergeant as an interface to the compiler analyzer and then moved to be part of the CinC analyzer.
The CinC alternative is modularized into -at least- three different parts.
There’s a great talk from Timothy Baldridge showing some examples using the CinC analyzer, watch it.
Note: CinC developer Nicola Mometto pointed out that the analyzer written by Ambrose and CinC are indeed different projects. Which I should’ve noticed myself, since the analyzer by Ambrose uses the analyzer from the original Clojure compiler, which is exposed as a function. Part of my mistake surely derives from the fact that one is called tools.analyzer.jvm and the other one is called jvm.tools.analyzer.
One of the supposed advantages of Lisp-like languages is that the concrete syntax is already the abstract syntax. If you’ve read some of the fogus writings about Clojure compilation though, he has some opinions on that statement:
This is junk. Actual ASTs are adorned with a boatload of additional information like local binding information, accessible bindings, arity information, and many other useful tidbits.
And he’s right, but there’s one more thing: Clojure and Lisp syntax are just serialization formats, mapping to the underlying data structure of the program.
That’s why Lisp-like languages are easier to parse and unparse, and to build tools for: the program’s data structure is accessible to the user, not only to the compiler.
That’s also the reason macros in Lisp or Clojure are so different from macros in Scala, where the pre-processor hands you an AST that has nothing to do with the Scala language itself.
That’s the proper definition of homoiconicity, by the way: the syntax is isomorphic to the AST.
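A quick REPL sketch of that idea, using nothing but core reading and evaluation:

```clojure
;; The reader turns text into a plain Clojure data structure; that same
;; structure can be inspected as data or handed straight to the evaluator.
(def form (read-string "(+ 1 2)"))

(list? form)  ;; => true -- a program is just a list
(first form)  ;; => +    -- the symbol +, an ordinary datum
(eval form)   ;; => 3    -- the very same structure, evaluated
```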
In general, compilers can be broken up into three pieces: a reader, an analyzer, and an emitter.
Clojure kind of follows this pattern, so if we’re compiling a Clojure program the very high level approach to the compilation pipeline would be:
The first three steps are the Reading phase from the fogus article.
There is one important thing about these steps:
Bytecode has no information about macros whatsoever; the emitted bytecode corresponds to what you see with macroexpand calls. Since macros are expanded before analyzing, you shouldn’t expect to find anything about your macros in the compiled bytecode. Nada, niet, gone.
Meaning, we shouldn’t expect to be able to properly decompile macro’ed stuff either.
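You can check what the analyzer actually sees by expanding a macro yourself; for example, with the core when macro:

```clojure
;; when is a macro: by the time the analyzer runs only the expansion exists,
;; so the emitted bytecode knows about if and do, but nothing about when.
(macroexpand '(when true :yes))
;; => (if true (do :yes))
```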
As said in the first post, the class file doesn’t need to be on disk, and that’s better understood if we think about eval. When you type a command in the REPL it needs to be translated to bytecode before the JVM is able to execute it, but that doesn’t mean the compiler will save a class file, then load it, and only then execute it. It is all done on the fly.
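A minimal way to observe this from the REPL: evaluating a function literal generates and loads a brand-new class in memory, with no class file involved.

```clojure
;; eval compiles the fn to JVM bytecode and loads it on the fly;
;; the generated class name betrays its origin.
(def f (eval '(fn [x] (* x x))))

(f 7)                ;; => 49
(.getName (class f)) ;; something like "user$eval1$fn__2"
```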
We will consider three entry points for the compiler: compile, load and eval.
The LispReader is responsible for reading forms from an input stream.
compile is a static function found in the Compiler.java file, a member of the Compiler class, and it generates a class file on disk for each function in the compiled namespace.
For instance, it will get called if you do the following in your REPL:
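As a sketch, assuming a target/classes directory exists and is bound as *compile-path*, ahead-of-time compilation from the REPL looks like this:

```clojure
;; AOT-compile a namespace; compile writes a .class file under *compile-path*
;; (an existing directory) for each function in the namespace.
(binding [*compile-path* "target/classes"]
  (compile 'clojure.core.reducers))
```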
The Clojure function just wraps the Java function doing the actual work, which has the signature:
Besides all the preamble, the core of the function is just a loop which reads from the file and calls the compile1 function for each form found.
As we expect, the compile1 function does macro expansion before analyzing or emitting anything; if form turns out to be a list, it recursively calls itself, which is the then branch of the if test:
The analyze function we see on the else branch does the proper s-expr analysis, emitting and evaling afterwards; more on analyzing ahead.
The load function gets called any time we do a require for a namespace that is not pre-compiled.
For instance, say we require the clojure.core.reducers namespace:
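That is, simply:

```clojure
;; Triggers the compiler's load entry point when the namespace has no
;; pre-compiled classes on the classpath: the clj source is read and evaluated.
(require 'clojure.core.reducers)

(find-ns 'clojure.core.reducers) ;; the namespace now exists
```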
The clj file will be read as a stream in the loadResourceScript function and passed as the first parameter, rdr, of the load function.
You see the load function has a read-and-eval loop pretty similar to the one we saw in the compile function.
Instead of calling compile1, it calls eval, which is our next entry point.
eval is the E in REPL; anything to be dynamically evaluated goes through the eval function.
For instance, if you type (+ 1 1) in your REPL, that expression will be parsed, analyzed and evaluated starting from the eval function.
As you see, eval receives a form as a parameter, since it knows nothing about files or namespaces.
eval is just straightforward analysis of the form, and there’s no emit here. This is the simplified version of the function:
Languages with more complicated syntaxes separate the lexer and the parser into two different pieces; like most Lisps, Clojure combines these two into just a Reader.
The reader is pretty much self-contained in LispReader.java, and its main responsibility is, given a stream, to return the properly tokenized s-expressions.
The reader dispatches reading to specialized functions and classes when a particular token is found; for instance ( dispatches to the ListReader class, digits dispatch to the readNumber function, and so on.
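The effect of that dispatch is easy to see from the REPL, since each kind of token produces a different data type:

```clojure
;; ( dispatches to ListReader, [ to VectorReader, digits to readNumber...
(class (read-string "(1 2 3)")) ;; => clojure.lang.PersistentList
(class (read-string "[1 2 3]")) ;; => clojure.lang.PersistentVector
(class (read-string "42"))      ;; => java.lang.Long
```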
Much of the list and vector reading classes (VectorReader, MapReader, ListReader, etc.) rely on the more generic readDelimitedList function, which receives the particular list separator as a parameter.
This is important because the reader is responsible for reading line and column number information, and establishing a relationship between tokens read and locations in the file.
One of the main drawbacks of the reader used by the compiler is that much of the line and column number information is lost. That’s one of the reasons we saw in the earlier post that for a 7-line function only one line was properly mapped; interestingly, the line corresponding to the outer s-expression.
We will have to modify this reader if we want proper debugging information for our debugger.
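The raw material is there: when reading through a LineNumberingPushbackReader, the reader attaches :line and :column metadata to the lists it reads. A small sketch:

```clojure
(import '(java.io StringReader)
        '(clojure.lang LineNumberingPushbackReader))

;; read a form through the line-numbering reader used by the compiler
(def form
  (read (LineNumberingPushbackReader.
          (StringReader. "(defn f [x]\n  (inc x))"))))

(:line (meta form)) ;; => 1
```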
The analyzer is the part of the compiler that translates your s-expressions into proper things to be emitted.
We’re already familiar with the REPL; in the eval function, analyze and emit are combined in a single step, but internally there’s a two-step process.
First, our parsed but meaningless code needs to be translated into meaningful expressions.
In the case of the Clojure compiler, all expressions implement the Expr interface:
Many of the Clojure special forms are handled here: IfExpr, LetExpr, LetFnExpr, RecurExpr, FnExpr, DefExpr, CaseExpr, you get the idea.
Those are nested classes inside the Compiler class, and so you can visualize how many of those special cases exist inside the compiler, I took this picture for you:
As you would expect for a properly modularized piece of software, each expression knows how to parse itself, eval itself, and emit itself.
The analyze function is a switch on the type of the form to be analyzed; just so you get a taste:
And there’s special handling for the special forms, which are keyed by Symbol in the same file:
Analyze will return a parsed Expr, which is now your program represented in the internal data structures of the compiler.
As said before, it uses ASM, so we find the standard code stacking up visitors, annotations, methods, fields, etc.
I won’t enter into specific details about the ASM API here, since it’s properly documented elsewhere.
Just notice that whether or not the code is eval’ed, JVM bytecode will be generated.
One of the reasons I ended up here when I started working on the debugger was to see if by any means, I could add better line number references to the current Clojure compiler.
As said before, and as we saw here, the Java Clojure compiler is not exactly built for extensibility.
The option I had left was to modify the line numbers and other debugging information at runtime, and that’s what I will show you in the next post.
I will properly synchronize Clojure source code with JVM bytecode, meaning I will synchronize code trees; that way I will not only add proper line references, but I will know which bytecode corresponds to which s-expression in your source.
Doing Clojure I usually end up with lines of code looking like this:
What use do I have for a line-based debugger with that code?
I want an s-expression based debugger, don’t you?
One more reason we have to envy Dr Racket, whose debugger already knows about them.
Stay tuned to see it working on the JVM.
Meanwhile, I’m guilespi on Twitter.
This article was written in the scope of a larger project, building a better Clojure debugger, which I’ll probably blog about in the future.
These articles are going to build from the ground up, so you may skip ahead if you find some of the stuff obvious.
To be more precise, there is a Clojure compiler targeting the JVM, there’s also one targeting JavaScript, one for the CLR, and there are some less known projects targeting Lua or even C.
But the official Clojure core efforts are mainly on the JVM, which stands for Java Virtual Machine.
That means when you write some Clojure code:
You won’t get a native binary, for instance a x86 PE or ELF file, although it’s entirely possible to write a compiler to do it.
When you target a particular runtime, though, you usually get a different set of functions to interact with the host; there are a lot of language primitives just to deal with Java interoperation which do not migrate easily to other runtimes or virtual machines.
This doesn’t mean that the JVM can only run programs written in Java.
In fact, Clojure doesn’t use Java as an intermediate language before compiling, the Clojure compiler for the JVM generates JVM bytecode directly using the ASM library.
So, what does it mean that the JVM is about Java, if you can compile directly to bytecode without a mandatory visit to the kingdom of nouns?
Besides its name, the JVM was designed by James Gosling in 1992 to support the Oak programming language, before evolving into its current form.
Its main responsibility is to achieve independence from hardware and operating system, and like a real machine, it has an instruction set and manipulates memory at runtime. The truth is the JVM knows nothing about the Java programming language; it only knows of a particular binary format, the class file format, which contains bytecode and other information.
So, any programming language with features that can be expressed in terms of a valid class file can be hosted on the JVM.
But the truth is that the class file format maintains a lot of resemblance to concepts appearing in Java, or any other OO programming language for that matter. To name a few:

- a class file corresponds to a class
- a class file has members
- a class file has methods
- methods in a class file can be static or instance methods
- exceptions are subclasses of Throwable
So, we can say the JVM is not agnostic regarding the concepts supported by the language, as the LISP machines were not agnostic either.
So we have a language like Clojure, with many concepts not easily mapped to the JVM spec, but it was mapped nonetheless. How?
Maybe you think Clojure namespaces correspond to a class, and each function in the namespace is mapped to a method in the class.
Well, that is not the case.
Namespaces have been criticized before for being tough, and the truth is they’re used for proper modularity but do not map to an entity in the JVM; they’re equivalent to Java packages or modules in other languages.
Each function in your namespace will get compiled to a completely different class. That’s something you can easily confirm by listing the files under target/classes in a Leiningen project directory.
You will find a .class file for each function you have defined, namespace$function.class being the standard syntax.
As you saw in the previous listing, there are many functions with numbers, like config$fn__292.class.
Those correspond to anonymous functions that get their own class when compiled, so if you have this code:
You should expect a .class file for the anonymous function #(+ 34 %).
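You can check this class-per-function mapping without leaving the REPL:

```clojure
;; every function is an instance of its own generated class
(.getName (class inc))        ;; => "clojure.core$inc"
(.getName (class (fn [x] x))) ;; something like "user$eval1$fn__2"
```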
Many times you’ll find the class files on disk, but it doesn’t have to be that way.
In many circumstances we’re going to be modifying the class structure at runtime, or creating new class structures to be run, entirely in memory. The compiler can even eval some code, compiling to memory without creating a class file on disk.
For the first example, I selected a really simple Clojure function:
To explore the bytecode we will use javap; simple, but it does the job:
I’ve removed some extra information such as variable tables, we’re going to be visiting those later.
What you see here are JVM assembly instructions, just a subset of the JVM instruction set, generated by the Clojure compiler when fed the sample function above.
Before we get into more details, let me show you how that code looks after a basic decompiler pass:
Prettier uh?
That is until you decompile this:
And get this:
Who was the moron that put a BASIC in my Clojure!
Ain’t it?
Keep reading… there’s more to be seen ahead.
I won’t dwell on the details of each JVM instruction and how it translates to something resembling Clojure, or Basic for that matter, but there’s one thing worthy of mention, and that is the operand stack.
A new frame is created each time a method is invoked and destroyed when the method completes, whether that completion is normal or abrupt (throws an uncaught exception). Frames are allocated on the JVM stack, and each has its own array of local variables and its own operand stack.
If you set a breakpoint in your code, each entry in your thread’s call stack is a frame.
The operand stack is a last-in-first-out (LIFO) stack, and it’s empty when the frame that contains it is created; the JVM provides instructions for loading constants or variables onto the operand stack, and for putting values from the operand stack into variables.
The operand stack is usually used to prepare parameters to be passed to methods and to receive method results, as opposed to using registers to do it.
So you should expect something along these lines:
So looking again at the bytecode of the previous function:
Here:
We can make ourselves an interpretation about what’s going on…
lconst_1 pushes the constant value 1 onto the stack, and then invokestatic calls a static method; as you’ve already guessed, that’s the clojure.lang.Numbers.inc(1) we saw in the basic decompiler output earlier. Then ldc2_w loads the value 2 onto the stack, and lcmp compares it against the function result; ifne tests for inequality and jumps to line 18 if the values differ.
One thing to consider here is that each entry on the operand stack can hold a value of any JVM type, and those must be operated on in ways appropriate to their types, so many operations have a different opcode according to the type they handle.
So looking at this example from the JVM specification, we see the operations are prefixed with a d, since they operate on double values.
Which, as you may have guessed, is adding the double values 1 and 3.
The JVM class format has support for some extra information that can be used for debugging purposes, some of which you can strip from your files if you want.
Among those we find the LineNumberTable attribute and the LocalVariableTable attribute, which may be used by debuggers to determine the value of a given local variable during the execution of a method.
According to the JVM spec, the table has the following structure inside the class file format:
Basically it says which variable starts at which instruction (start_pc) and lasts for how long (length).
If we look at that table for our let example:
We see how each variable is referenced against program counter (pc) numbers (not to be confused with source file line numbers).
One interesting thing, though, is the LineNumberTable:
It has only one line number reference, even though our function is 7 lines long; obviously that cannot be good for a debugger expecting to step over each line!
Next post I’ll blog about the Clojure compiler and how it ends up creating that bytecode, before visiting the decompiling process again.
I’m guilespi on Twitter, get in touch!
So if you’re interested in performance, don’t leave just yet.
One of the primary motivators for the reducers library is Guy Steele’s ICFP ’09 talk. Since I assume you don’t have one hour to spend verifying that it’s worth watching, I’ll do my best to summarize it here, in a post you will probably scan in less than 15 seconds.
One of the main points of the talk is that the way we’ve been thinking about programming for the last 50 years isn’t serving us anymore.
Why?
Because good sequential code is different from good parallel code.
How would you sum all the elements of an array?
That’s right, the accumulator loop, you initialize the accumulator and update the thingy in each iteration step.
But you’re complecting how you update your sum with how you iterate your collection, ain’t it?
There’s a difference between what you do and how you do it. If you say SUM(X) it doesn’t make promises about the strategy; it’s when you actually implement that SUM that the sequential promise is made.
The problem is the computation tree for the sequential strategy: if we remove the looping machinery and leave only the sums, there’s a one-million-step delay to get to the final result.
So what’s the tree you would like to see?
And what code would you need to write in order to get that tree? Functional code?
Think again.
Functional code is not the complete answer, since you can write functional code and still have the same problem.
Since linear linked lists are inherently sequential, you may be using a reducer and still be in the same spot.
We need multiway decomposition.
The rationale behind multiway decomposition is that we need a list representation that allows for binary decomposition of the list.
You can obviously have many redundant trees representing the same conceptual list, and there’s value in redundancy since different trees have different properties.
So what are Clojure reducers?
In short, it’s a library providing a new function, fold, which is a parallel reduce+combine that shares the same shape as the old sequence-based code, the main difference being that you get to provide a combiner function.
Go and read this and this, two great posts by Rich Hickey.
Back? Ok…
As Rich says in his article, the accumulator style is not absent, but the single initial value and the serial execution promises of foldl/r have been abandoned.
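A minimal example of the new shape (the input must be a vector or a map for actual parallelism to kick in):

```clojure
(require '[clojure.core.reducers :as r])

;; reduce-like shape plus a combiner: with a single function argument,
;; + serves as both reducer and combiner, and (+) => 0 seeds each
;; parallel partition instead of one single initial value.
(r/fold + (vec (range 1000)))
;; => 499500
```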
For what it’s worth, I’ve written in Clojure the “split a string into words” parallel algorithm suggested by Steele; performance sucks compared to clojure.string/split, but it’s a nice algorithm nonetheless.
There are a couple of interesting things in the code.
- combine-states is the new combiner function; it decides how to combine different splits
- 100 is the size at which to stop splitting and do sequential processing (reduce is called afterwards); it defaults to 512
- fn is the standard reducing function
- the string is converted into a vector before processing

The last step is just for the sake of experimentation, and has everything to do with the underlying structure of vectors.
Both vectors and maps in Clojure are implemented as trees, which as we saw above, is one of the requirements for multiway decomposition.
There’s a great article here about Clojure vectors; the key point of interest is that they provide practically O(1) runtime for subvec, which is how the vector folder foldvec successively splits the input vector before reaching the sequential processing size.
So if you look at the source code, actual fork/join parallelism happens only for vectors and maps, and standard reduce is called for linear lists.
What I like the most about reducers is that reducer functions are curried, so you can compose them together, as in:
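For instance, a small sketch of that curried style:

```clojure
(require '[clojure.core.reducers :as r])

;; called with just a function, r/filter and r/map return curried
;; transformations waiting for a collection; comp chains them together
;; (the rightmost one runs first).
(def process (comp (r/map inc) (r/filter even?)))

(into [] (process [1 2 3 4]))
;; => [3 5]
```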
It’s the utmost example of Hickey’s “Simple Made Easy” talk, where decomplecting the system results in a design that is simpler and more powerful at the same time.
I’m guilespi at Twitter
Whether you’re using Memcache, Redis, RabbitMQ or a custom distributed service, if you’re trying to scale your shit up, you probably have many pieces or boxes involved.
At least that’s what happens at Twitter, so they’ve come up with a solution called Zipkin to trace distributed operations, that is, an operation that is potentially solved using many different nodes.
Twitter Architecture
Having dealt with distributed logging in the past, reconstructing a distributed operation from logs is like trying to build a giant jigsaw puzzle in the middle of a tornado.
The standard strategy is to propagate some operation id and use it anywhere you want to track what happened; that is the essence of what Zipkin does, but in a structured kind of way.
Zipkin was modelled after Google’s Dapper paper on distributed tracing and basically gives you two things:
Zipkin Architecture
The architecture looks complex, but it isn’t that much, since you can avoid using Scribe, Cassandra, Zookeeper and pretty much everything related to scaling the tracing platform itself.
There are a couple of entities involved in Zipkin tracing which you should know before moving forward:
Trace
A trace is a particular operation which may occur in many different nodes and be composed on many different Spans.
Span
A span represents a sub-operation for the Trace, it can be a different service or a different stage in the operation process. Also, spans have a hierarchy, so a span can be a child of another span.
Annotation
The annotation is how you tag your spans to actually know what happened; there are two types of annotations:
Timestamp annotations are used for tracing time-related stuff, and binary annotations are used to tag your operation with a particular context, which is useful for filtering later.
For instance you can have a new Trace for each home page request, which decomposes into the Memcache span, the Postgres span and the computation span, each of those with their particular start and finish annotations.
Zipkin is programmed in Scala and uses Thrift; since it’s assumed you’re going to have distributed operations, the official client is Finagle, which is kind of an RPC system for the JVM, but at least for me, it’s quite ugly.
The main reason is that it makes you feel that if you want to use Zipkin you must use a distributed framework, which is not at all necessary. For a moment I almost felt like CORBA and DCOM were coming back from the grave trying to lure me into the abyss.
There are also libraries for Ruby and Python, but none of them felt quite right to me; for Ruby you either use Finagle or you use Thrift, but there’s no actual Zipkin library, and for Python you have Tryfer, which is good, and Restkin, which is a REST API on top of it.
In the process of understanding what Zipkin can do for you (that means me) I hacked a client for Clojure using clj-scribe and clj-thrift which made the process almost painless.
It comes with a ring handler so you can trace your incoming requests out of the box.
Zipkin Web Analyzer
It’s far from perfect, undocumented and incomplete, but at least it’s free :)
Give it a try and let me know what you think.
I’m guilespi at Twitter
There are even gender studies in mathematics and other sciences.
But even if the issue has been on the table for a while, I’ve attended a few conferences where I live and the usual attendance looks like this:
I don’t think you’ll find more than 5 women in the picture.
And I can tell you for sure, that picture does not represent at all the women in tech in the city. There may be an imbalance, but women are not, by any means, 0.5% of computer science graduates here.
So women are not participating, but why?
Since StackExchange has open data you can derive some insights from, I decided to take a look at the problem from a different perspective, and address the question of the underrepresentation using StackOverflow data.
I started with a few questions in mind:
Since SO has no gender data, gender needs to be inferred from user names, which is obviously not 100% accurate; many names are unisex or depend on the country of origin. I decided to use this database and program, which has names from all over the world curated by local people. In many cases you will get a statistical result, mostly male or mostly female, and that makes sense.
Be warned this is not a scientific study nor tries to be, just trying to see some patterns here.
First I wanted to get a glimpse of the general trends, so I did a random draw of 50k users, more than enough for what I need.
StackExchange limits the number of rows returned by the online database browser to 50k, so that’s it.
As you see there’s 27% of confirmed males and only 4% of confirmed females. Anonymous users are the usual numerical users like user395607, and name not found refers to things like ppkt, HeTsH, holden321 and ITGronk; you get the idea.
Then I wanted to see how reputation was distributed among those users, and how that compared against how long the user was using the site.
There you go, an image is worth a thousand words. The reputation difference between genders is huge, and it doesn’t seem to be related to how long you’ve been around, either.
To confirm that, I randomly drew 50k fresh users who joined the site after 2012-10-10, just to see if the trends were any different considering only last year’s data.
Here women seem to be a little bit closer, but still a great difference.
Then I drew the 50k users with the highest reputation score.
Now we’re seeing some changes:
As you’d expect, here there are almost no anonymous users; in an online community, charity has a name attached to it, ain’t it?
And the reputation trend is still there, something you can readily confirm if you scroll the first pages of the all time user’s reputation page.
But then I charted the reputation distribution against gender and something interesting arises:
As you see, there’s a great deal of outliers there, with 75% of the top 50k users below 4200 points.
So what happens when we look at the distribution considering the 75% of the users, that are in fact below 4215 points?
Well that’s something! Now distribution looks pretty much alike.
It seems to me those outliers, who are mostly men, are well beyond everyone else, men or women.
How do you read that data?
At 4% females, SO seems to suffer the same phenomenon occurring in the real world, that is, women being underrepresented. But since SO is a strongly moderated virtual community, the problem can’t be related to harassment. So, there’s something else going on; is it that SO is designed for male competitiveness, the site having been designed exclusively by males (there were no women on the team, AFAIK)?
Isn’t it the reason you want diversity on your teams to start with? To provide a different perspective on the world, that enables you to reach everyone.
In my opinion, that’s why women should be part of the creation process, don’t you think?
Nonetheless, a large group of men are acting as if they already know what women need and, as patriarchy mandates, providing it: creating programs, making conferences and burning witches. But not a single soul has asked women what they think, want, or need.
For a change, I’ve put up an anonymous survey online with the purpose of understanding how women who are already in tech, feel, if you have a female coworker, friend or colleague please share it, we may even find some interesting insights if we start listening.
I’m guilespi at Twitter
Share the link above or take the survey here:
Today he gave me his “farewell letter” to read, his goodbye to co-workers. Far from commonplaces and standard templates, it was a really heartfelt goodbye, and what struck a chord with me is that it was a thank-you letter.
Thank you to the ones who gave me the opportunity, thank you to the ones who helped me, thank you to the ones who have shared your time with me, and thank you to the ones who made me better.
And I kept thinking, not about being thankful, which is by the way a great thing, but about trusting people and making opportunities for others.
We’re now living in a mincer, a people eater, with a permanent desire to crash each other, fiercely competitive startups, hiring and poaching, making sure every single hire is at the top of her game, because failure is not an option, and hey, we need that extra million.
Don’t we?
What is it that we have at the end of the game?
At least for me, the thank-yous I’ve received have done more for me than any money I can make. Knowing that you’ve helped someone be better, or trusted someone when no one else would, will stay with you when money is long gone, and it will probably stay when you are gone too.
Hiring only the very best is a safe bet anyone can take; having the talent to see potential takes skill, and having the guts to make opportunities and trust people not only takes courage, it’s the only human thing to do.
Next time you’re about to put a cog on the machine, think human, and take the risk.
Rewards are worth it.
I’m guilespi at Twitter
That is what really builds community, having a common language, besides the language.
Clojure’s been going through this phenomenon for the last years, as you see in this question from 2008 and this question still being answered in 2012.
This year I made a real-life full application in Clojure, so I’ve spent some time deciding on many strategies: where to put which code, how to test, the best libraries to use, and what not.
So I decided to put the source code online, not only because I think it may help someone, but because hopefully somebody will come up and help me improve what’s done in one way or another; it’s certainly far from perfect.
In case you wonder, it’s an application for automatic call and sms dispatching.
Among other things, you’ll find inside:
If you like it say hi! I’m guilespi
The idea of someone telling us that we’re not good, we’re not a fit, we’re not loved anymore goes against the human need to be liked.
Have you ever been paralyzed before asking out that girl you like? Even knowing you have nothing to lose by doing it?
Interviewing for a job is no different; when someone applies for a position you have opened, they’re dealing with the fear of rejection, and your responsibility is to acknowledge that fact and treat people who want to work with you as human beings.
Does it mean you need to personally thank every applicant sending an email with his resume?
Not exactly.
Respect
You need to show respect, If you will remember only one thing, remember that.
Being in the position of the interviewer doesn’t make you better, and if the people you’re hiring are any good, you’d better be showing your brightest side during the process; no one likes to work with assholes.
Having failed myself many times (always with a reasonable excuse, of course) I made some easy rules to follow when interviewing people.
It’s not about the stage
The interviewing process in your company may have many stages, technical interviews and whatnot.
But in my book about respect, there are only two things that matters when interviewing someone:
That’s all, your respect rules must derive from that.
Dealing with time
What to do when you’re rejecting someone, according to the time he has spent?
Easy ain’t it?
It’s about other people’s time, and it’s about human relationships.
If you don’t have the balls to reject someone like a human being, you shouldn’t be interviewing at all.
Don’t reject me, and follow me on Twitter :)
Up until now, I never got the chance to make something with it, but I always wondered how good or bad the applications you could create were.
Last week I had an opportunity to take a look under the hood, and made the source code available on GitHub.
The big question was, can you make native apps using Phonegap?
What do you mean by native?
According to this guy in order to make native IOS applications you need to program in Objective-C.
I say that’s misleading.
Native applications are applications that run on the phone and provide a native experience, by which I understand at least the following traits:
The wrong reason to use Phonegap
At first sight you may think that the main reason to use Phonegap is program once, run everywhere.
Wrong.
In order to provide a native experience you will need to design the UX of your app for every platform you’re targeting, so at least the UX/UI code will be different.
Obviously you can use the same UI for all the platforms, but unless the purpose of your app is to alienate your users I wouldn’t try it.
Software should behave as the user expects it to behave. You wouldn’t create new affordances for the sake of creativity; don’t do it for the sake of saving money either, because it ain’t cheaper in the long run.
So, no matter what you’re thinking about doing, save some time to read the UX/UI guidelines for each mobile platform you’re targeting.
The great Mike Lee would tell you that you even need a different team for each of those platforms.
WTF is Phonegap?
You know the tagline “easily create apps using web technology you love”, does it mean the only thing you need to know is HTML and Javascript?
Of course not.
Phonegap is an extensible framework for creating web applications, with the following properties:
You can think of it as a native host that lets you write your application in Javascript, abstracting the native layer components behind a uniform API. That’s all.
So you’ll end up creating your app inside XCode, dealing with code signing nightmares and taking a lot of walks inside the Apple walled gardens.
And you will need to learn the Phonegap API.
It doesn’t have to be 100% HTML
The first reaction is to think that since Phonegap uses a webview you will have to create your application using only HTML, but it’s not the case.
Phonegap supports plugins, a mechanism for exposing native features not already covered by the Phonegap API. So you can create more native plugins and expose them to JavaScript, where JavaScript works as the glue that binds the native controls together, but isn’t necessarily used to create all the UI.
The most common examples are the TabBar and NavigationBar on iOS; plugins for those already exist, and they let you design a more native experience than the one you would get using only HTML.
Notice the Back button in there; it’s just a custom button with the back text. If you want to create an arrow-like back button, you’ll need to go down the same path as if you were doing Objective-C iOS development.
Mobile Frameworks
There are many mobile frameworks available out there to be compared.
Among the most well known are jQuery Mobile and Sencha Touch. JQM development is more web-like, something to consider if your team is already comfortable with HTML; Sencha generates its own DOM from JavaScript objects, 100% programmatically.
I haven’t dug deep enough to write a complete evaluation; you may find some interesting ones here, here and here.
Almost everybody agrees in one important point:
JQM is sluggish and its transitions don’t feel native enough, something I easily verified testing the app on my iPad 1, where even the slider was sluggish.
Using Phonegap Plugins
Plugins are usually composed of two parts:
Usually you’ll need to copy the plugin’s .m and .h files to the Plugins directory of your project, and you will also need to declare the plugins being used in the config.xml project file.
(code listing omitted)
And then include the Javascript files in your index.html
.
(code listing omitted)
Using the native plugins from your applications is straightforward, you initialize, create, setup and show the buttons you want.
(code listing omitted)
Using this strategy lets you extend your app for a more native experience, exposing to JavaScript even custom controls you may design. This way some members of your team can focus on the native code of the app, exposing the building blocks to the web developers who assemble the pieces.
As you see in the last image, the TabBar is shown in the native version and the HTML version side-by-side. The HTML version was created using jQuery Mobile.
Debugging is hell
Well maybe it’s not hell, but it’s not a pleasant experience either.
If you include the following line in your html using your own id:
(code listing omitted)
You’ll have easy access to debug your app using weinre without the need to set it up, at least it’s good for HTML inspection.
If you want to debug JavaScript, you’ll certainly end up using alert and console.log; even the guys at Phonegap recommend the poor man’s debugger.
Be ready to waste some of the time you gained by choosing Javascript doing print based debugging.
Update: Raymond Camden pointed out in the comments that a better approach to debugging exists, especially with Safari and iOS 6.
Conclusion
Tools are picked for the team, and that’s what you should think about when deciding whether or not to pursue the Phonegap path. If you already have team members who are great at web development, Phonegap may be an option; it’s certainly fast for prototyping and seems to be a great asset for product discovery and validation.
If charging for your app is among your goals, I wouldn’t pick Phonegap or any other framework that uses the webview renderer as the main application. Also, for most tasks the JavaScript VM will be alright, but if you have CPU-intensive inner loops, such as in game development, Phonegap is not really an option.
Reviewing the main points considered to categorize a mobile application as native, web frameworks will give you a sub-par experience regarding feedback, latency and usability. Using Phonegap plugins to avoid that will only go so far before the cost gets so high you’d be better off programming in Java or Objective-C anyway.
If you still have doubts, fork the code and give it a try yourself.
And don’t forget to follow me on twitter
But the truth is sometimes you don’t want to rely on the cloud
for your latency-sensitive communications, you already have communications infrastructure
you want to reuse, or you have such a volume of calls to make that it’s cheaper for you to roll your own solution.
So I will show you a DIY guide to roll your own dialer using Clojure and Asterisk, the self proclaimed PBX & Telephony Toolkit.
What is a Dialer
If you ever received a spam call from someone trying to sell you something, it was probably made by an automated dialer. The purpose is to reach as many people as possible in the least time, optimizing resources.
Sometimes it’s someone selling Viagra, but hopefully it’s used for higher purposes such as massive notification of upcoming emergencies.
Integrating with Asterisk
Asterisk has a lot of integration alternatives: custom dial-plans, AGI scripting, outgoing call spooling, or you can write your own low-level C module; each strategy serves its purpose.
For this scenario I’ve decided to show you an integration with Asterisk using the Asterisk Manager API, which allows for remote command execution and event-handling.
I’ve written a binding for Clojure called clj-asterisk to sit on top of the low-level text based protocol.
Making a Call
The clj-asterisk binding maps the Asterisk API in a straightforward way, so let’s check the Originate Action, which is the one we need to create an outgoing call.
(code listing omitted)
The corresponding clj-asterisk invocation is:
(code listing omitted)
The ActionID attribute is not specified, since it’s internally handled by the clj-asterisk library in order to track async responses from Asterisk.
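If you’re curious what actually travels over the wire, the Manager API is a plain-text protocol of key/value frames terminated by a blank line. Here’s a rough Python sketch of how an Originate frame could be assembled; the channel, context and ids below are made up, and this illustrates the protocol shape, not the clj-asterisk code:

```python
def originate_action(channel, context, exten, variables=None, timeout=30000):
    """Build the text frame for an AMI Originate action."""
    lines = [
        "Action: Originate",
        f"Channel: {channel}",
        f"Context: {context}",
        f"Exten: {exten}",
        "Priority: 1",
        f"Timeout: {timeout}",
    ]
    # call variables travel as repeated "Variable: name=value" headers
    for name, value in (variables or {}).items():
        lines.append(f"Variable: {name}={value}")
    return "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the frame

frame = originate_action("SIP/trunk/5551234", "outgoing", "s",
                         {"CALLID": "8f14e45f"})
```

The binding hides this assembly and the response tracking behind a map-like interface.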
Receiving events
For most telephony-related actions blocking is not desirable, since most of the time the PBX is handling a conversation and waiting for something to happen; a blocking scheme is far from the best. You need a strategy to wait for the events that tell you when something you may be interested in happens.
In this case we will be interested in the Hangup event, in order to know when a call has ended and the dialing port is free, so we can issue a new call. If you’re interested in the complete list of events, it’s available on the Asterisk Wiki.
To receive an event using clj-asterisk you only need to declare the method with the event name you need to handle:
(code listing omitted)
The method passes as parameter the received event and the connection context where the event happened.
The Main Loop
In order to have a proper dialer you will need a main loop, whose life-fulfillment purpose is:
I’m assuming you have some data storage to retrieve the contacts to be dialed; I will share those details in a later post and focus now only on the dialing strategy.
(code listing omitted)
Let’s go piece by piece…
You wanna know how many ports are available to dial, for instance you may have only 10 outgoing lines to be used.
(code listing omitted)
You wanna know the recipients to be reached.
(code listing omitted)
Then you wanna know the status of the contacts you’re already dialing and waiting for an answer or for the call to finish.
(code listing omitted)
Here pending-contacts is a list of futures, the contacts currently being dialed. Since we don’t wanna block waiting for the answer, the realized? function is used to count how many of them are finished and filter them. If the finish status is not CONNECTED or CANCELLED, we assume the contact has failed and we need to issue a retry, typically for the BUSY and NO ANSWER statuses.
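That bookkeeping can be sketched as a pure function; this is a Python illustration of the same arithmetic with hypothetical names, not the actual Clojure code:

```python
# statuses considered a final, non-retryable outcome (per the post)
FINAL_STATUSES = {"CONNECTED", "CANCELLED"}

def plan_batch(total_ports, in_flight, finished_statuses):
    """Return (retries, free_ports) for the next dialing round."""
    # calls that ended any other way (BUSY, NO ANSWER) go back to the queue
    retries = [s for s in finished_statuses if s not in FINAL_STATUSES]
    free_ports = max(total_ports - in_flight, 0)
    return retries, free_ports

retries, free = plan_batch(10, in_flight=4,
                           finished_statuses=["CONNECTED", "BUSY", "NO ANSWER"])
```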
Then, given the total available ports minus the contacts already being dialed, a new batch of contacts is dialed:
(code listing omitted)
The dispatch-calls function is pretty straightforward; it just async-calls each contact on the list.
(code listing omitted)
Finally the call function issues the request against the Asterisk PBX and saves the result for further tracking or analytics.
(code listing omitted)
The tricky part here is that it’s impossible to know beforehand the call-id Asterisk is going to use for our newly created call, so we need a way to mark our call and relate to it later when an event is received. We do that using the call variable CALLID, which is a GUID created for each new call.
Our call-creating function will wait on a promise until the call ends, something we will deliver in the Hangup event, as shown here:
(code listing omitted)
It looks more convoluted than it actually is: when the CALLID variable is set, we receive an event that lets us map our call-id to the Asterisk UniqueId. Then, when the Hangup occurs, we can find the promise to be delivered and let the call function happily end.
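A toy model of that correlation trick, in Python rather than Clojure, with made-up event shapes and a Future standing in for the promise:

```python
from concurrent.futures import Future

calls = {}       # our CALLID -> Future resolved on hangup
unique_ids = {}  # Asterisk UniqueId -> our CALLID

def on_var_set(event):
    # fired when the CALLID channel variable is set; pairs Asterisk's
    # UniqueId with the id we generated for the call
    if event["Variable"] == "CALLID":
        unique_ids[event["UniqueId"]] = event["Value"]

def on_hangup(event):
    # find which of our calls ended and resolve its waiting future
    call_id = unique_ids.pop(event["UniqueId"], None)
    if call_id in calls:
        calls[call_id].set_result(event["Cause"])

# simulate one call's lifecycle
calls["call-1"] = Future()
on_var_set({"Variable": "CALLID", "UniqueId": "ast-42", "Value": "call-1"})
on_hangup({"UniqueId": "ast-42", "Cause": "16"})
```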
Stay tuned for part II, where I will publish the data model and the complete running dialer.
Here is the gist with the code of the current post.
While you wait, you can follow me on Twitter!
As you see, it serves the same purpose as a traditional barchart, but displays the information in a coxcomb flower pattern.
I couldn’t find something already done that suited my needs, so I made one myself.
It’s slightly modified from the original design, since it doesn’t display the bars stacked but side by side; I think it’s better for displaying superposed labels that way.
Death and Mortality
Feeding data into the chart is straightforward:
(code listing omitted)
I’ve used it to show some skills in a resume-of-sorts, in case you wanna see a color strategy by category and not by series.
Lie Factor Warning
The received values are normalized, and the maximum value takes the complete radius of the coxcomb. Be warned: each value is normalized and only the radius is affected, not the complete area of the disc sector. This may introduce visualization problems like the ones pointed out by Edward Tufte, with lie factors of 10x or more, as in the following known case with a 9.4 lie factor.
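For reference, Tufte’s lie factor is the size of the effect shown in the graphic divided by the size of the effect in the data. A quick sketch with made-up numbers shows why encoding values in the radius alone inflates it: doubling the radius quadruples the area of the sector.

```python
def lie_factor(graphic_effect, data_effect):
    # effects measured as relative change (e.g. 1.0 for a 100% increase)
    return graphic_effect / data_effect

# a value that doubles, drawn as a sector whose radius doubles:
value_ratio = 2.0
area_ratio = value_ratio ** 2         # area grows with the square of the radius
factor = lie_factor(area_ratio - 1,   # 300% apparent growth in ink
                    value_ratio - 1)  # 100% actual growth in the data
```

So even an honest-looking radius encoding carries a built-in lie factor of 3 for a doubling, and it only gets worse for larger ratios.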
I may fix it if someone finds this useful; the formulas for the areas are on this website. The source code is on github.
Follow me on Twitter
For me, types and tests are two sides of the same coin, because neither strategy for proving correctness is computable.
Bear with me.
The purpose of having types or tests is to prove your program correct before your bugs reach your customers; one strategy tries to prove it at compile time, and the other after compile time. But what does it really mean for your program to be correct?
Let’s go for a ride on Computability Theory.
Functions and programs
According to the definition, a function f is computable if a program P exists which can calculate the function, given unlimited amounts of time and storage.
f:N→N is computable ↔ ∃ a program P which computes the function.
We must also define when a program P converges on input n.
Program P with input n converges if ∃ m ∈ N / <P,n> = m , written <P,n>↓
In computability theory there are functions well known to be non-computable; two of them are:
Θ(n) = 1 if <Ix(n), n>↓
0 if <Ix(n), n>↑
Which says that the function Θ is equal to 1 if the program of index n converges on input n, and 0 if the program of index n diverges on input n.
There’s another very famous function which has been proved to be non computable.
stop(p, n) = 1 if <Ix(p), n>↓
0 if <Ix(p), n>↑
Which pretty much says: given a description of an arbitrary computer program, decide whether the program finishes running or continues to run forever. As you may have guessed, it’s the well-known Halting Problem, and it’s not computable.
How the Halting Problem relates to types and tests
Let’s assume the program P we’re trying to prove correct computes the function f:N→N. We can define our program that proves correctness, a program T that computes the function
ft:N→{0,1}
Which is to say, for every input of the domain, our program T decides whether the program converges or not. It’s starting to sound familiar, ain’t it?
Let’s assume such a program T exists to prove correctness, and we have a macro MT to find such a program given P. We could write the following program Q:
(code listing omitted)
What Q does, having received a program x0 and an input x1, is first find the T decider program for x0, and then evaluate that program with the input x1.
So what do we have here?
Q(p, n) = 1 ↔ ft(n) = 1 ↔ <P, n> ↓
So we have written a program which computes the stop function, which is absurd. It means we cannot have a program that decides whether an arbitrary program halts.
Show me the code
In practice, it means that if you have this program
(code listing omitted)
This program doesn’t stop for x < 0, and according to the theory, there’s no program you can write to find out about it.
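For illustration, a program with that behavior could look like this (my own reconstruction, not necessarily the original):

```python
def countdown(x):
    # for x < 0 the loop never terminates: x decreases past zero forever
    while x != 0:
        x -= 1
    return "done"
```

It terminates for every non-negative input, and no general analyzer can decide termination for arbitrary programs of this kind; that’s the halting problem above.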
There are also a few other funny cases regarding the domain of your functions, such as
(code listing omitted)
This function fails miserably if x = 3. Just think about it when your functions have a more complex domain.
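A hypothetical reconstruction of the kind of function meant here, with a single hole in its domain:

```python
def f(x):
    # perfectly fine everywhere... except x = 3, where it divides by zero
    return 1 / (x - 3)
```

A single call at x = 4 gives you 100% code coverage, yet the program still blows up at x = 3; coverage of the code says nothing about coverage of the domain.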
How to improve your tests for correctness
Most people I see are worried about having 100% code coverage, but it’s not that usual to see people worried about data coverage.
As seen in the previous example, if you forget to test for x = 3 you may have 100% code coverage, but your program will blow up anyway.
Regarding types, I know Dependent Types exist, but they’re the other side of the same coin: you have to provide a constructive proof that the type is inhabited. So if you don’t define your type considering the special cases of your function’s domain, no one is coming to save your ass.
But when thinking about correctness you should be thinking about your function domain.
Conclusion
Both tests and types are useful ways to validate that your program is correct, but neither is perfect. Even the discussion is meaningless, because it’s just a matter of taste whether you like to specify your correctness rules in types or in tests; either way, it’s something you will keep doing, as far as I can tell.
As Rich Hickey said, both tests and types are like guard rails, and you must know the cliff is there in order to decide building them.
Update:
Many people wrote to me as if I were saying you can’t prove a program to be correct; that’s not what I tried to say. It was that you can’t have a system that proves programs correct without specifying the rules yourself.
That is, it was a case against Q, not against T.
And hey! Follow me on twitter!
I’ll save you the thinking: it’s costing you customers.
See the following chart I’ve crafted for you (emphasis on crafted); please hit the play button.
There’s an obvious relationship between the cost of fixing a bug and how many customers your company can effectively take.
It has an easy explanation, if you have only one customer and your solution has a bug, what do you do? You call her, you explain the bug, you go to her office, you hack a fix, you drink some coffee, and you move on. Maybe if you have one big fat customer based on a personal relationship, you can live with that.
I hope it’s clear that this delivery process does not scale.
When you have hundreds or thousands of customers you can’t clone yourself and explain to everyone why your product is failing, you won’t drink a hundred coffees to build rapport and talk your way out of the mess.
I think there are still two big misconceptions about this relationship between your bugs and your customers, and they may affect how you decide on your development and delivery process.
Bug fixing cost and quality are not the same thing
It’s widely known, I hope, that the earlier you find a bug, the cheaper it is to fix. This guy even fixes his bugs in the hammock, before writing any code. Take a look at the chart of the relative cost of fixing defects; this is the source.
Obviously you should be investing in good engineering, peer reviewing even your documents and designs, and testing your components early and often. (Quality is not about testing either, but that’s material for another rant.) What’s not so clear is: given that some bugs will always reach your customers, how do you reduce the cost of fixing your in-the-wild bugs?
You should do everything in your reach to produce quality products, because it’s cheaper in the long run. But what will make or break your ability to grow your customer base is how fast and cheap you move when a bug is found. Your maintenance cost, if you will.
Bug fixing cost is like performance in the browser
You should watch this talk from the last Strangeloop. Besides being great, Lars Bak makes a great point about performance in the browser: when a new level of performance was reached in the JavaScript VM, all new kinds of applications started to pop up taking advantage of that performance.
It’s not the other way around.
Correlation does not imply causation, until it does, just be sure to understand what causes what.
Speed in the browser did not improve because Gmail was running too slow, first speed improved, then we have Gmail.
It’s the same with your customers.
If you wait until you have lots of customers to start thinking about improving your maintenance costs, you will never have them. Having low support and maintenance costs will let you find a way to acquire more customers, just because you can.
What to do?
This is not by any means a complete nor bulletproof list, but some strategies I’ve found from personal experience that help.
You do continuous integration and deployment
Have you ever been involved in a delivery process having to test thousands of test cases, run dozens of performance and stress tests, do it in multiple platforms, all of it, just because you patched 3 lines of code, and you must be absolutely sure everything is still working as intended?
I have, and it’s not fun.
It’s not fun for your customer either, because you end up batching even your hot-fixes, and they’re not so hot anymore. And your customer has to wait, and you will eventually lose your customer.
Continuous integration is not about some geeks with shiny tools, it’s about customers.
You develop with operations in mind
There’s a great talk by Theo Schlossnagle about what it means to have a career in web operations, walking the path, and becoming a craftsman. You must watch it, seriously, because it’s that good.
One of the remarkable points is that you must build systems that are observable. Developers cannot separate themselves from the fact that software has to operate, actually run. And developers shouldn’t be trying to reproduce a bug in a controlled environment in order to understand whether there’s really a bug; you should be able to diagnose the problem in the running system, so it must be observable. How many elements are in that queue? Is it stalled? You must know, now.
And you don’t build observable systems if you start thinking about them after you’ve shipped, using an entirely different team (hello DevOps).
Software with operations in mind is like software with security in mind, or quality in mind: it’s a state of being, and it’s about your development process.
You use the right tools
How long does it take you to see that a function is returning the wrong value? How long does it take you to find the 3 lines of log that point you to the exact spot the problem is? How long does it take you to analyze a crash dump and get to the cause of the crash?
Being able to debug and diagnose a problem fast is almost as important as being able to fix it fast and deploy the fix fast.
This is an area where I personally think there’s a lot of room for improvement in the tools we use daily, but you should know DTrace exists and how to use it, ideally.
Conclusion
If you’re hacking your brains out and life’s good, all the power to you. I like that too.
But if you’re really thinking about scaling your business, you should be taking a look at your bug fixing and maintenance costs, now.
There’s also a great book about scaling companies, you should read that one too.
And don’t forget to follow me on twitter
But there’s some news you will never be ready to break: the day you must say you’re stepping down.
I know I wasn’t.
It was one of the most difficult and saddest moments I’ve ever had to go through; I still find it hard to even write about it.
Beyond all reasons, there’s only one thing I really want to say to Ewe, Niko, Burguer, Seba, Fernickk, Nacho, Juan, Cany, Paola, El Ruso Alexis, Fede, Canario, Lolo & Diego.
Thank you, it was an amazing ride.
You’re gonna be missed
El Cabeza
At least you should follow me on twitter!
I think it’s amazing how fast an answer is given to any asked question, like freaking fast. If you are using Google Reader to peek at new questions filtered by tag, by the time you see a question, it’s almost surely already answered.
Fortunately all StackExchange data is open, so we can see exactly how fast that is. I used the online data browser, more than enough for the task.
I decided to consider only the questions having an accepted answer, since questions with many bogus answers should not be treated as having an answer at all.
tl;dr
The average answer time seems to depend on a mix of the maturity of the language and how many people are using it.
Hey, Haskell has pretty good answer times, at least considering its 33rd position in the TIOBE Index.
Not all questions are the same
Of course not all questions are the same, this is from the first query I ran.
This is an unfiltered query using all the questions from 2012; you see the average answer time is much higher than in the previous chart, around 1000 minutes. Looking at the data:
Language Ans. Time Stdev
c 934 7630.98957971267
c++ 1036 7258.13498426685
clojure 1078 7485.94721484444
haskell 1199 9059.91937846459
php 1210 8588.58929278208
lua 1386 6569.08356022594
c# 1452 8875.00837073432
scala 1472 10707.9191188056
javascript 1490 9756.64151519177
java 1755 10541.6111024572
ruby 2124 11850.4353701107
The standard deviation is huge; we have a lot of questions that took ages to get answered, making the average answer time meaningless.
So I decided to take out questions with an answer time greater than 24 hours, since 92% of the questions have an approved answer in less than 5 hours. (Here you can see the query used to get this table.)
DifficultyGroup Total Average StandardDev
Easy 47099 27 44.7263057563449
Medium 344 691 339.312936469053
Hard 1926 3769 2004.75027395979
Hell 1623 66865 96822.8840748525
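The same cut can be reproduced outside the data browser; here’s a small Python sketch over made-up numbers, filtering accepted answers under 24 hours and averaging per tag:

```python
from collections import defaultdict

# (tag, minutes to accepted answer) -- illustrative rows, not real SO data
rows = [("clojure", 12), ("clojure", 3000), ("php", 40), ("php", 96)]

by_tag = defaultdict(list)
for tag, minutes in rows:
    if minutes < 24 * 60:  # drop the long tail, as in the post
        by_tag[tag].append(minutes)

avg = {tag: sum(v) / len(v) for tag, v in by_tag.items()}
```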
It started to look like something:
This is the query.
You see there PHP running out front with a 68-minute average accepted-answer time; either it’s too easy or there are too many of them.
If you wanna see how the distribution goes when considering accepted answers in less than 5 hours, it’s the first picture of the page; the trend is also there.
What about the time?
Something unexpected: average answer time is almost unaffected by the time of day the question was asked. The only thing I see here is that Ruby programmers are being killed by the lunch break, and C++ programmers slowly fade out as the day goes by, ain’t it?
This is the query.
There goes my idea of catching unanswered questions at night. It would be interesting to see how much cross-timezone answering is happening.
Conclusion
It would work better to run a regression against the complete dataset, using more features than only programming language and time of day, to automatically guess which questions have a better chance of living a long life unanswered. Maybe next time.
Follow me on Twitter
What caught my attention are the libraries used for portfolio construction and management: QSTK, an open-source Python framework based on numpy, scipy, matplotlib, pandas, etc.
Looking at the first tutorial’s source code, I saw it as an opportunity to migrate the tutorials and libraries to Clojure and play a little with Incanter.
I’m going to highlight what I found interesting when migrating the tutorials. I’m assuming you have QSTK installed and the QS environment variable set, since the code depends on that for data reading.
(code listing omitted)
NYSE operation dates
As part of the initialization process the tutorial calls a function getNYSEDays, which retrieves all the days there was trading at the NYSE. The migration is straightforward using Incanter’s read-dataset to read the file into memory and then filter the required range.
Pay attention to the time-of-day set at 16 hours, the time the NYSE closes; we’ll see it again in unexpected places.
Data Access
QSTK provides a helper class called DataAccess used for reading and caching stock prices.
As you see here, there’s some data reading happening; we’re gonna take a look at these functions since we’ll need to write them from scratch.
(code listing omitted)
We’re going to separate this into two functions: first reading symbol data from disk, again using read-dataset, and then creating a hash-map indexed by symbol name.
(code listing omitted)
Then, if you take a look at voldata in a python repl, you can see pretty much what it’s doing:
AAPL GLD GOOG $SPX XOM
2012-05-01 16:00:00 21821400 7414800 2002300 2706893315 13816900
2012-05-02 16:00:00 15263900 5632300 1611500 2634854740 11108700
2012-05-03 16:00:00 13948200 13172000 1868000 2673299265 9998600
It’s grabbing the specified column, volume or close, from each symbol’s dataset, and creating a new table with the resulting column renamed as the symbol.
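That column-picking step is easy to sketch with plain dicts; the volumes come from the table above, while the function and field names are illustrative:

```python
def get_data(datasets, symbols, column):
    """Assemble a date-indexed table, one column per symbol."""
    table = {}
    for symbol in symbols:
        for row in datasets[symbol]:
            # each symbol contributes one column, named after the symbol
            table.setdefault(row["date"], {})[symbol] = row[column]
    return table

datasets = {
    "AAPL": [{"date": "2012-05-01", "volume": 21821400}],
    "GLD":  [{"date": "2012-05-01", "volume": 7414800}],
}
voldata = get_data(datasets, ["AAPL", "GLD"], "volume")
```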
All the get_data magic happens inside get_data_hardread; it’s an ugly piece of code making a lot of assumptions about column names, and even about market closing time. I guess you can only use this library for markets closing at 16 hours local time.
(code listing omitted)
I’ve translated that into these two functions:
In this case Clojure shines: the original function is almost 300 lines of code. I’m missing a couple of checks, but it’s not bad for a rookie, I think.
The helper function select-value is there to avoid an exception when trying to find stock data for a non-existent date. Also, the function returns :Date as a double, since it’s easier to handle later for charting.
Charting
Charting with Incanter is straightforward; there’s a subtle difference from python, since you need to add each series one by one. So where python charts multiple series at once:
(code listing omitted)
We need a little function to solve it with Incanter. Each iteration gets reduced into the next with all the series accumulated in one chart.
(code listing omitted)
Data Mangling
Incanter has a lot of built-in functions and helpers to operate on your data. Unfortunately I couldn’t use one of the many options for operating on a matrix, or even $=, since the data we’re processing has many nil values for dates the stock didn’t trade, which raises an exception when treated as a number; that’s what to-matrix does, it tries to create an array of Doubles.
There’s one more downside: we need to keep the :Date column as-is when operating on the dataset, so we need to remove it, operate, and add it back again later, which happens to be a beautiful one-liner in python:
(code listing omitted)
(code listing omitted)
I ended up writing the iteration and function-applying code from scratch. Maybe there’s an easier way, but I couldn’t think of it; if you know a better way please drop me a line!
Now normalization and daily-returns are at least manageable.
(code listing omitted)
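The two transformations themselves are tiny; a plain Python sketch, assuming the nil values have already been dealt with:

```python
def normalize(prices):
    # every price relative to the first trading day
    first = prices[0]
    return [p / first for p in prices]

def daily_returns(prices):
    # day-over-day relative change; day 0 has no previous day
    return [0.0] + [prices[i] / prices[i - 1] - 1
                    for i in range(1, len(prices))]

norm = normalize([100.0, 110.0, 99.0])
rets = daily_returns([100.0, 110.0, 99.0])
```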
Having the helper functions done, the rest of the tutorial is almost declarative.
If you wanna take a look at the whole thing together, here’s the gist; I may create a repo later.
Please remember NumPy is way faster than Clojure, since it links against BLAS/LAPACK libraries.
Follow me on twitter
The main point is that there’s a semantic misunderstanding of what distribution of wealth is, confusing a statistical frequency distribution of income with the transitive verb distribute.
As if the current distribution of wealth were the result of someone who decided to distribute it unfairly.
I pretty much agree on Paul Graham’s take on wealth creation, and think our focus should be on individuals being able to create more value for society and for themselves.
The semantic confusion may be part of the reason we’re having the wrong conversation, one about distributing instead of about affecting the distribution by creating.