Top Secret #6

Mar 18, 2025

SAML went through another round of widespread, deserved criticism this week. In particular, the protocol drew commentary on Hacker News in response to a blog post from the GitHub Security Lab. As HN user “Diggsy” wrote,

The SAML spec itself is fairly reasonable, but is built upon XML signatures (and in turn, XML canonicalization) which are truly insane standards, if they can even be called such.
Only a committee could produce such a twisted and depraved specification, no single mind would be capable of holding and combining such contradictory ideas.

If you’re interested, the blog post linked above is quite detailed. (It explains things more thoroughly than I can here.)

In essence, the blog post observes that many real-world SAML implementations relied upon a flawed Ruby library. The researchers noticed that this library used two different XML parsers to look at SAML messages. Those parsers behave slightly differently. A clever attacker could exploit the subtle differences between them, tricking certain SAML implementations into incorrectly trusting certain SAML messages. Armed with that exploit, an attacker could bypass authentication – signing in as pretty much anyone.

This is a really bad problem. The worst part? It’s pretty predictable. This just happens in SAML. We see problems with SAML implementations all the time.

SAML is weird. When properly implemented, SAML doesn’t really have any major problems. It underpins an awful lot of authentication. But SAML doesn’t make life easy. It’s really complicated and delicate. It does things that it really shouldn’t do. It presents lots of opportunities for developers to make mistakes.

And those mistakes can be costly.

Ulysse wrote a more technical discussion of this general class of problems here. An excerpt from that piece:

SAML library authors need to stop being so credulous about the spec.
When a specification is a collection of security flaws, responsible engineers disregard the specification. Responsible engineers should disregard what the SAML and XML Signatures spec authors wrote down, and instead implement the secure thing at its core.

For better or worse, SAML isn’t going anywhere – at least not in the short run. We all have to be responsible with our implementations.

What We’re Reading

How iguanas got from North America to Fiji millions of years ago: I am continually amazed by the new findings that researchers are able to surface with computational genetic analysis. The researchers described in this article have found fairly strong evidence that suggests Fijian iguanas’ ancestors traveled from southwestern North America over water. That’s more than 5,000 miles. The article speculates that a population of intrepid, seafaring iguanas (this is an entirely terrestrial species) rode floating piles of vegetation across the open ocean before happening upon Fiji by chance.
Why did a family-run California walnut company suddenly pour $144 million into a hot tech stock?: for reasons that remain unclear, Crain Walnut Shelling invested nine figures in Super Micro, a company that makes computer servers. Super Micro stock has not done well – among other things, they’ve been accused of defrauding shareholders. The nut company claims that its stake in Super Micro isn’t even its largest holding. This is all very weird – why is a nut company allocating so much cash to random equities?
Companies Might Soon Have to Tell You When Their Products Will Die: there’s some proposed legislation that seems to have some legs. It’s called the Connected Consumer Products End of Life Disclosure Act (what a mouthful). It basically wants to require companies that make consumer devices to announce their plans for supporting the products. I get the impulse, but I really wish we’d stop trying to legislate our way out of inconvenience. If you’re really worried about companies bricking your stuff, you might just not be an early adopter. And that’s okay.
‘Next-Level’ Chaos Traces the True Limit of Predictability: okay, so this article is not easy to grok. I’m still grappling with it, to be quite honest. I highly recommend reading it if you find any aspect of physics, mathematics, or epistemology compelling. From the end of the article: “‘These are very important results. They are very, very profound,’ Wolpert said, ‘But they also ultimately have no implications for humans.’”
Apple AI’s Platform Pivot Potential: Ben Thompson presents a surprisingly optimistic perspective on Apple’s opportunity in the AI era. Apple’s taken some lumps and made some mistakes. I mean, what on earth are those Genmoji It billboards? But the company remains a juggernaut. I’m inclined to agree with Ben on Apple’s path forward: “instead of seeing developers as the enemy, Apple should deputize them and equip them in a way no one else in technology can.”

Top Secret Developer Tips

It’s been a very long time – like, many years – since I wrote C/C++. I spent some time hacking on a small C++ project over the last week, so I had to dust off the ol’ fingerless gloves. I’ve been reminded of a few fun quirks. Let’s talk about array indexing.

Here’s a block of C++:

#include <iostream>

int main() {
    char myArray[3] = {'a', 'b', 'c'};

    for (int i = 0; i < 3; i++) {
        // The usual spelling: array name first, index in the brackets.
        std::cout << "myArray[" << i << "] is " << myArray[i] << '\n';
        // The swapped spelling: index first, array name in the brackets.
        std::cout << i << "[myArray] is " << i[myArray] << '\n' << '\n';
    }

    return 0;
}

If we run this code, we get the following output:

myArray[0] is a
0[myArray] is a

myArray[1] is b
1[myArray] is b

myArray[2] is c
2[myArray] is c

In the fun world of C and C++, an array name – like myArray – will implicitly convert into a pointer to its first element in most expressions. Using myArray there means I’m really using a pointer to the character ‘a’.

When we’re indexing an array, we’re basically describing an offset. That’s just how the subscript operator is defined. myArray[2] describes the thing that’s two elements past myArray, which is written *(myArray + 2).

The offset is just addition, and addition is commutative. The thing that’s two away from myArray is the same as the thing that’s myArray away from two.

We should recognize that the following three statements are true:

myArray[2] == *(myArray + 2) // by definition
*(myArray + 2) == *(2 + myArray) // by commutative property
*(2 + myArray) == 2[myArray] // by definition

By the transitive property, we can then observe that myArray[2] == 2[myArray]. Weird!

This probably isn’t news to anyone who’s worked with C/C++ much, but I had pretty much entirely forgotten about this little quirk.

Nerd Corner™

If you’ve taken statistics – or maybe even linear algebra – you’ve probably run into linear regression. In all likelihood, you were looking at a specific kind of linear model called ordinary least squares (OLS).

The general idea goes something like this: we want to estimate the relationship between two variables that seem related. We put one variable on the Y axis and another variable on the X axis, then we plot all of the individual datapoints. Then, loosely speaking, we look for the line that best cuts through the points we’ve plotted. It’s kind of a “line of best fit.”

As we get deeper into statistics, we often learn some more exotic techniques for estimating the relationship between two variables. This is particularly true in applied microeconomics, where researchers pretty much never get the data that they want. Economists just can’t run controlled experiments like physicists or chemists can.

One interesting – but still basically legible – such technique is called regression discontinuity. It exists to help researchers navigate around odd jumps in data that warp relationships between variables.

These kinds of data distortions often result from public policy. In California, for example, adults qualify for Medi-Cal (the state’s Medicaid program) so long as their income does not exceed 138% of the Federal Poverty Level (FPL). If you were trying to associate health outcomes with income, you might notice that adults earning 135% of the FPL look very different from adults earning 140% of the FPL. Even though they’re basically the same, one group qualifies for Medi-Cal – and the other doesn’t.

So we can get data in which the relationship between X and Y changes suddenly at a threshold.

Econometrician Joshua Angrist (with coauthor Victor Lavy) exploited regression discontinuity in a legendary paper to identify a pretty strong inverse relationship between class size and test scores (i.e., that small classes are good for test scores).

It turns out that Israeli schools applied Maimonides’s rule: class sizes could never exceed 40 students. The rule would be enforced pretty mechanically. If a school had 39 students, it would have one class of 39 students. If the school had 40 students, it’d still have one class of 40 students. But as soon as a school had 41 students, it’d break them into two different classes – one of 20 students and another of 21 students.

This meant that Angrist could use overall enrollment as a clever proxy for class sizes!

After some adjustments for other factors, he could observe lots of variation in test scores around schools with enrollments close to a multiple of 40. For example, schools with 79 students tended to have very different scores than schools with 81 students. The only reasonable explanation for such an observation would be changes in class size from adherence to Maimonides’s rule!

And yeah, it turns out that smaller classes seem a good bit better. Maybe not that surprising, but pretty cool stuff all the same.

Other Cool Stuff

Vigil is the world’s most moral programming language. It’s like Python, but it comes with some really powerful extra features like implore and swear. Vigil will find any functions in your code that raise exceptions or behave in unexpected ways (per implore and swear) – and delete them from your source code. No more bugs!
Hummingbot is an open source toolkit that lets anyone do high-frequency trading on crypto exchanges. I’m pretty sure engaging in that activity is a horrible idea for pretty much everyone on earth. There’s an example trading strategy in the GitHub repo called buy_low_sell_high.py – you decide whether this seems like a good idea. That being said, this is a pretty interesting piece of software.

From The Archives

(2003): Eric Conveys An Emotion
(2004): Why Writing Your Own Search Engine Is Hard
(2006): A Guided Tour of the Visible Human
(2013): Thoughts on Bitcoin
(2016): How to weigh your cat! – the IoT version
(2020): 100 Little Ideas

Thanks,
Ned