Journey to MongoDB [Mark Porter]

Why MongoDB, from the CTO of MongoDB

Listen to more on the StackOverflow Podcast: https://stackoverflow.blog/2021/08/06/podcast-364-mark-porter-mongodb-database/

Transcript

[00:00:00] swyx: This is Mark Porter, the CTO of MongoDB, on his personal journey from relational databases to MongoDB.

[00:00:06] Mark Porter: I am a relentless tech geek. I've loved tech my whole life. In fact, my Twitter handle is MarkLovesTech. I have used databases since I was 14, with some really ancient technologies. I started out on a 4K TRS-80 Model I computer.

We had to program it in assembly language because there wasn't enough memory to use the local BASIC copy. And I very quickly got into databases. I was talking to someone the other day and he pointed out something I'd never noticed, which is that I've oscillated between using databases and building databases.

So I started out at Caltech and NASA, using databases for space data and chip data. And then I built databases at Oracle, versions 5, 6, 7, and 8, for about 13 years. And then I used databases at News Corp for huge student data systems. And then I built databases at Amazon with Amazon RDS. Then I moved to GrabTaxi, which is the Uber of Southeast Asia, and used databases to deliver 15 million rides and meals a day, and then came back to MongoDB.

And here I am building databases again. I frankly can't get away from this thing.  

[00:01:20] Ben Popper: I love that story. I wonder, does that mean, you know, at each point you had some sort of frustration or saw some sort of opportunity for innovation? You know, you kind of would build something, then you'd be the user of it.

Then you'd realize that the next sort of turn of the wheel was coming. As you moved between those jobs, were new paradigms in databases emerging?

[00:01:38] Mark Porter: Yeah. I mean, it's been really interesting. Half of my career I've been the bow, and half my career I've been the target. And I've got to tell you that sometimes, as a customer, you're not really happy being the target of what has been produced.

Look, the reality is, relational databases have been the modus operandi since 1970, when Codd first did his paper. And then Oracle was the first company that released them, in 1979. They were actually known as relational technology back then and then changed their name later to Oracle. So the mission criticality of databases has never been in doubt.

What has changed is the amount of data, the way we process that data, and what's really, really important. It used to be that duplication of data was important, and things like that. And while that's still important, what's really important now is developer productivity, bar none. That is job one for any mission-critical software company: developer productivity and innovation.

[00:02:35] Ben Popper: Makes a lot of sense.

It does seem like data has become almost this, uh, overwhelming force for some companies. Ryan, I don't know if you have experience with this, but I've been getting a lot of pitches and talking with folks on the podcast and, you know, it's gone from "we're using data" to "we have data lakes" and "there's a data iceberg."

And, you know, we're only sort of scratching the surface of what we might be able to do with this endless flow of unstructured data that we're collecting. And as you mentioned, yeah, a lot of times what they're looking to do is understand it in a way that allows them to enhance productivity or automate certain processes, which right now are very time- and labor-intensive.

Ryan: Yeah, yeah. At my previous job, I worked on an article about data pipelines and, you know, ETL processes, and yeah, there's becoming a separation, I think, between your production database and the database you use to gain insights, right? The production database has to be fast. But the insight database, it can be a little more flexible in how it produces data, right?

[00:03:34] Mark Porter: Yeah. So we think about systems of record. We think about systems of insight. And yeah, I mean, definitely different people want to do different things with the databases. And so what we do is we think about personas. Are you an analyst? Are you a developer? Are you an AI/ML engineer? Are you a PhD data scientist?

We always try to come at it from the customer and what they want to accomplish.

[00:03:56] Ben Popper: Yeah, I think that's so interesting because, as you said, obviously, databases have always been part of working in the world of software and computers, but increasingly there are these specialties that are very important and which are producing these really interesting results, that themselves are devoted to data, as opposed to it being something that, you know, needs to be part of the larger process.

Um, so Mark, I wanted to touch on something, which is that you had a part of your career at AWS, which now, you know, has grown into quite a behemoth. Um, yeah. Just wondering if you can talk to us a little bit about what you learned there and maybe how some of that applies to the role you have at MongoDB.

[00:04:26] Mark Porter: Yeah. So I joined AWS as the general manager of AWS RDS, which at that time was probably the largest fleet of databases in the world. And that fleet grew just tremendously while I was there. It was amazing, you know, just showing that it's not just databases; it was managed databases that mattered.

So RDS did not build any of its own databases. RDS vended, by the time I left, over a million, significantly more than a million, Postgres, MySQL, MariaDB, Oracle, and SQL Server databases. And so the product that we produced was managing those databases, and people love it when their database stays up, when the backups and restores work, when you can change parameters, when failover works, and all those things.

However, over time, as much as I loved running those databases, I became frustrated with how they were shackles, almost, on customer innovation and customer operability. And so we developed this system called Amazon Aurora, which changed out the storage system underneath Postgres and MySQL. Obviously we couldn't do that for the commercial databases. And we made those databases so much more resilient, so much more durable, so much more available, but we kept running into the fundamental limit of a rigid architecture, of high failover times, and of a single-primary architecture, which meant that the blast radius of a system going down or a plan changing in an Oracle database, I mean, it takes down a whole company.

And I can talk more about availability. In fact, you'll have trouble stopping me from talking to you about availability if you get me started.

[00:06:09] Ben Popper: Well, I mean, that's the, uh, the big thing about NoSQL, is availability, right?

The replicability, the speed of access. Yeah, for folks who don't know, let's lay out the value prop here. Like, what is sort of the difference between the two and why would you prefer one over the other? You know, you mentioned shackles. I love that word. But yeah, you know, what are the limitations that it allows you to avoid when you move to NoSQL? And I guess, you know, to the degree that it makes sense.

Yeah. Talk a little bit about availability or, I guess, you know, what I would say is almost like how robust your system can be.

[00:06:41] Mark Porter: So I do think availability is really important, but just from a value prop point of view, the main reason that NoSQL was started was multiple things. Number one was this platform availability.

I actually think you guys had a podcast with Eliot about a year and a half ago where he talked about the founding of MongoDB. And I will give a shameless plug for one of your other podcasts, which is a great podcast that Eliot did. And, you know, in it he talked about the fact that they wanted to do 400,000 transactions per second, and there was no way they could do it. But along the way, they did something even more important, which is they developed the document model.

And the document model is just a natural way to program. When you want to add a field to a NoSQL application that you're writing, you just add it in your code, in your structure in Java or Go or Rust or whatever, and the database automatically starts having that field. So it's not just about availability.
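To make the point about adding a field concrete, here is a minimal sketch in Python. It does not use MongoDB or any driver: `Collection`, `add`, and `matching` are hypothetical names for an in-memory stand-in that stores plain dictionaries, assumed here only to illustrate the idea that a new field can appear in newly written documents without a separate schema-change step.

```python
from typing import Any

Document = dict[str, Any]


class Collection:
    """Hypothetical in-memory stand-in for a document collection."""

    def __init__(self) -> None:
        self._docs: list[Document] = []

    def add(self, doc: Document) -> None:
        # Store a copy so later changes by the caller do not leak in.
        self._docs.append(dict(doc))

    def matching(self, **criteria: Any) -> list[Document]:
        # Return documents whose fields equal every given criterion.
        # Documents that lack a field simply do not match it.
        return [
            d for d in self._docs
            if all(k in d and d[k] == v for k, v in criteria.items())
        ]


rides = Collection()

# Version 1 of the application writes two fields.
rides.add({"rider": "ana", "fare": 12.5})

# Version 2 starts writing a "tip" field. Nothing else has to change:
# older documents stay as they are, newer ones carry the extra field.
rides.add({"rider": "ben", "fare": 8.0, "tip": 1.5})

tipped = rides.matching(tip=1.5)
```

Older documents keep their original shape, so application code that reads the new field should be prepared for it to be absent.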

Now, to get to your point about availability, MongoDB uses what's called a sharding architecture or a replica set architecture, where you can't actually configure a MongoDB that doesn't have three nodes, and those nodes automatically do elections and they automatically start up. And as opposed to relational databases, where failover is measured in 30 seconds, 60 seconds, 90 minutes, 10 seconds, failover in MongoDB is measured in single-digit seconds.

Our P99.9 election time in our Atlas service is less than seven seconds. And why is that important? Because when an app is down for three to five to seven seconds, people go, "Huh? What happened? What's going on on my phone?" When it's down for 60 seconds, they've already visited another website to complete their purchase.
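The three-node point can be illustrated with a small majority calculation. This is a sketch of the general majority-vote idea behind replica set elections, not MongoDB's actual election code; the function names `majority` and `can_elect_primary` are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """A new primary can be elected only if a majority can still vote."""
    return reachable_members >= majority(voting_members)


# With three nodes, losing one still leaves a majority of two,
# so the remaining nodes can elect a new primary.
three_node_survives_one_failure = can_elect_primary(3, 2)

# With only two nodes, losing one leaves no majority,
# which is one reason a third node matters.
two_node_survives_one_failure = can_elect_primary(2, 1)
```

Under this assumption, an odd number of voting members is the smallest configuration that tolerates a given number of failures.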

And so there's a fundamental difference. So the ability to stay up and the ability to be available is thing one. The second ability is the ability to scale without limit. We have customers running a petabyte in MongoDB clusters, and with over a thousand nodes. You just can't do that with relational. Even Aurora, which I've just got to tell you I love deeply because I helped architect it.

You have one writeable master or primary and up to 15 read replicas. And if you run out of the ability of that master or primary to take writes, you're done. You've now got to split your database and do crazy stuff. So those were the fundamental premises of databases. But the thing that's really missing there is that developers love databases, but developers do so much more than just store and retrieve data. Developers want to do graphs.

Developers want to do analytics. Developers want to have a connection to their mobile device. They want to do all this. So what we're doing at MongoDB, and sorry for the brand plug, but I'm pretty passionate, is we're building an application data platform where the correspondence between what we produce and our main persona, the developer, we're trying to get to a hundred percent.

[00:09:44] swyx: I think this conversation was exceptional, not just because one of the key criticisms of MongoDB is always from the SQL folks, who just say that the MongoDB script kiddies don't really know SQL, so they picked MongoDB. This is a guy who definitely knows SQL and is still in love with MongoDB. But also, this is a CTO who's clearly in love with the technology his company represents. And I think there's just not enough of those. I see so many rest-and-vest CTOs who are basically completely checked out and not really inspirational leaders for the company. And I think there should be more CTOs that are exactly like Mark Porter.
2021 Swyx