Solving The Oracle Problem [Sergey Nazarov]

How do you bring the real world into the blockchain world when billions of dollars are at stake?

Listen to Software Engineering Daily: https://softwareengineeringdaily.com/2021/04/07/chainlink-connecting-smart-contracts-to-external-data-with-sergey-nazarov/

Transcript

JM: Tell me a little bit more about the data sources for Chainlink. Like how do those
data sources get vetted and how does the data make its way onto the chain?


SN: Right, absolutely. So there're actually two approaches here and I think they're
both important and the flexibility of how you acquire data is important. The first approach is that
you have an oracle network and that oracle network is a collection of nodes that are incentivized
just like blockchain miners and Bitcoin miners are incentivized. Those nodes are incentivized to
go out and get accurate data in order to generate the most accurate, highly reliable result
possible.

In the first version of how data is put into a smart contract, this oracle network of anywhere from
seven to over 30 nodes basically goes to an API at a data provider that is considered a
high-quality data provider. Often that's determined by users. So users will say, “Hey, we want
that data provider.” Chainlink also has a reputation system where we track how well each node,
and more and more now each data provider, is performing. And so better data providers
get to continue selling their data to Chainlink networks, whereas worse data providers are kind of
not as used by node operators because they're either not responsive or not returning the right
results. And so there's actually a reputation system baked into Chainlink, and it's quite
fascinating because the system inherently puts all of the data on chain and generates a lot of
proof about what's going on with the oracles.

In any case, in the first variant of the system you can go to any data provider, you can go to
really any API in the world and you can request from it and you can come to consensus on the
data from that source assuming you can get other sources or you can come to some model of
consensus that the user wants around that data. And that doesn't require the data provider to do
anything, right? So the benefit of this system is that you have a layer of consensus and you
have a lot of proof that the data was acquired from a data provider and the data providers don't
need to change anything about their infrastructure, right? So the data providers just continue to
provide their APIs, operate the way they have always operated and just do what they're
supposed to be doing. This is the system through which a good amount of the data is acquired
and then the data providers are more than happy to sell their data to Chainlink nodes because
it's consumed into these applications which they're all excited about.
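
As a rough illustration of this first variant, the sketch below assumes each node independently fetches a value from the provider's API and the network accepts the median once enough nodes have responded. The median rule, the `min_responses` quorum, and the node names and prices are assumptions for illustration; the conversation only says the nodes come to some model of consensus that the user wants.

```python
from statistics import median
from typing import Dict, Optional


def aggregate_reports(reports: Dict[str, Optional[float]], min_responses: int) -> float:
    """Return the median of the values reported by responsive nodes.

    `reports` maps a hypothetical node ID to the value that node fetched
    from the data provider's API, or None if the node did not respond.
    """
    values = [v for v in reports.values() if v is not None]
    if len(values) < min_responses:
        raise ValueError(f"only {len(values)} responses, need {min_responses}")
    return median(values)


# Seven hypothetical nodes: one is offline and one reports an outlier.
reports = {
    "node-1": 2000.0, "node-2": 2001.0, "node-3": 1999.5, "node-4": 2000.5,
    "node-5": None, "node-6": 2500.0, "node-7": 2000.0,
}
answer = aggregate_reports(reports, min_responses=5)
print(answer)
```

A median is used here because a single wrong or missing report barely moves the result, and the data provider itself needs no changes for this to work.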

The second version is when a data provider runs their own Chainlink node. And what that
basically means is the data provider gets a lightweight signing appliance. They basically get a
lightweight signing application that allows them to connect their APIs internally to their own
official node. And then that node publishes a contract on-chain, and that on-chain contract is a
representation of that data provider. So now there's an on-chain contract that's the
representation of that data provider's services. And that on-chain contract gets requests from
other smart contracts for data to be given to them because, once again, a blockchain cannot
talk to an API. A blockchain has to have an oracle to speak with any API in the outside real world.

And so the second variant is for data providers that are more interested in selling
their data to the blockchain ecosystem, or more convinced about that, and we have many data
providers already doing this live. We have data for sports events, weather events, market
events, all kinds of things out in the real world already live in production with data providers
running their own production nodes. This variant allows you to get data essentially directly from
an official node run by a data provider. It has the benefits of getting data directly from a data
provider running their own node. It has the limitation in that the data provider now has to be able
to make sure that they are properly connected, that their APIs stay up according to the node and
all these other kinds of nuances. The benefit that they get is they are connected to many
different chains all at once. And in reality this variant basically requires the data provider to want
to opt-in to some kind of infrastructure. It requires them to want to say that, “Hey, I want to kind
of run a function in the cloud or I want to run some kind of node myself and I want to make a
technical investment in that.”

What we found so far is that the majority of data providers just want to sell their data to
somebody and they want to provide that to an oracle network that just retrieves their data and
sells that data successfully to a smart contract. There are some data providers that want to run
their own node and we're working with a lot of those, but I think that's something that's going to
evolve more slowly.


[00:16:33] JM: You mentioned this reputation system for how data gets verified as quality. How
does that reputation system work? How do you vet and ensure quality data?


[00:16:45] SN: So once again there's two levels. There's one level of the node operators and
assuring that they're operating properly and then there's the level of the data providers
responding properly. In terms of the node operators, the way that the Chainlink system works is
that node operators are committing to certain service level commitments, right? They're
basically, in many cases, on-chain committing to a certain degree of service. And they're
committing to that because the on-chain activity that they do is immediately public to everybody
as soon as it happens.

So I think the big nuanced difference between a reputation system in the web world and a
reputation system in the blockchain world is that data is immediately available publicly. It is
immediately available for people to know that a node did not respond for a certain period of
time. And that lack of response is recorded on-chain immutably for everybody to analyze. And
we actually have multiple ecosystem teams. We have multiple kind of block explorer-like things
and marketplaces that are all able to analyze the same data about both node operators and
data providers.

So basically the way that it looks is that the node operators are expected to perform to a certain
degree on-chain. Those expectations are clear. They are then able to perform, or in some cases
if they're not able to perform, they are not able to stay on that oracle network. And then the data
providers themselves, for the ones that run their own nodes, it becomes pretty clear what their
responses are, and if their responses are often wrong, then you know once again that data
provider and their node might not be used in an aggregation. They might not be applied to that
aggregation.
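
One way to picture the reputation idea described here is a public log of whether each node responded, from which anyone can compute a response rate and decide who stays in an aggregation. The sketch below is a minimal model under that assumption; `ReputationLog`, `min_rate`, and the node names are hypothetical and do not reflect Chainlink's actual contracts or scoring.

```python
from typing import Dict, List


class ReputationLog:
    """Toy stand-in for publicly recorded node activity."""

    def __init__(self) -> None:
        self._requests: Dict[str, int] = {}   # requests sent to each node
        self._responses: Dict[str, int] = {}  # requests each node answered

    def record(self, node_id: str, responded: bool) -> None:
        """Record one request and whether the node answered it."""
        self._requests[node_id] = self._requests.get(node_id, 0) + 1
        if responded:
            self._responses[node_id] = self._responses.get(node_id, 0) + 1

    def response_rate(self, node_id: str) -> float:
        """Fraction of requests the node answered (0.0 if never asked)."""
        total = self._requests.get(node_id, 0)
        if total == 0:
            return 0.0
        return self._responses.get(node_id, 0) / total

    def eligible(self, min_rate: float) -> List[str]:
        """Nodes whose response rate meets the expected service level."""
        return sorted(n for n in self._requests if self.response_rate(n) >= min_rate)


log = ReputationLog()
for round_number in range(10):
    log.record("node-a", responded=True)
    log.record("node-b", responded=round_number < 6)  # misses 4 of 10 rounds

print(log.response_rate("node-b"))
print(log.eligible(min_rate=0.9))
```

Because every record is visible to everyone, different explorers or marketplaces could apply their own `min_rate` to the same underlying data.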


In the cases where a node operator gets data from a data source, a lot of that data is actually
more internal to the oracle network and that data is something that's in the process of getting
published on chain. So there is a certain amount of insight that node operators have about the
responsiveness of different data providers and different data sources. At this point the reputation
system extends to node operators and to the node operators that are data sources. It will
continue and is already being extended to cover data providers. And that's another kind of piece
that's coming and is already working for node operators in how they choose data providers and
is something that's going to be made more public.


[00:19:13] JM: Let's talk a little bit about the architecture of Chainlink. Can you tell me about the
different types of smart contracts that are stood up to compose what Chainlink operates as?


[00:19:26] SN: Sure. Sure. I think the simplest way to think about Chainlink is that you're
creating an on-chain interface to an off-chain service, resource, or computational
environment. So what you're really creating is you're creating an on-chain contract that can
receive transactions from other contracts that basically request specific types of data, specific
types of computations like randomness and in many cases require you to make a commitment
to provide that, right? And it actually varies in terms of the use cases.

So there're variants of Chainlink networks that create something called reference data.
Reference data is a piece of data that's used by many different contracts, and we have some of
the top DeFi protocols using our reference data to settle their protocols and transactions in
lending and derivatives and insurance and various other financial products. And what reference
data does is it creates an on-chain aggregation from multiple nodes, and that aggregation is
then provided through another interface that allows people to read that
data and to use it in their contract. So that's one way to interface with Chainlink validated data.
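
The reference data pattern can be sketched as one shared feed that nodes write to and many contracts read from. The model below is a simplification under stated assumptions: `ReferenceFeed`, `submit`, `latest_answer`, the median aggregation, and the lending consumer `max_borrow` are all hypothetical names and choices, not Chainlink's actual interfaces.

```python
from statistics import median
from typing import Dict, Optional


class ReferenceFeed:
    """Toy stand-in for an on-chain reference data contract."""

    def __init__(self, min_submissions: int) -> None:
        self.min_submissions = min_submissions
        self._round: Dict[str, float] = {}    # submissions for the open round
        self._latest: Optional[float] = None  # last aggregated answer

    def submit(self, node_id: str, value: float) -> None:
        """Record one node's value; aggregate once enough nodes have reported."""
        self._round[node_id] = value
        if len(self._round) >= self.min_submissions:
            self._latest = median(self._round.values())
            self._round = {}  # start a new round

    def latest_answer(self) -> float:
        """Read interface shared by every consuming contract."""
        if self._latest is None:
            raise RuntimeError("no aggregated answer yet")
        return self._latest


def max_borrow(feed: ReferenceFeed, collateral_units: float, ratio: float) -> float:
    """Hypothetical lending consumer: borrow limit from the shared price."""
    return feed.latest_answer() * collateral_units * ratio


feed = ReferenceFeed(min_submissions=3)
for node_id, price in [("node-1", 2000.0), ("node-2", 2002.0), ("node-3", 1998.0)]:
    feed.submit(node_id, price)

print(feed.latest_answer())         # aggregated value any consumer can read
print(max_borrow(feed, 10.0, 0.5))  # one consumer using that value
```

The point of the design is that the aggregation happens once and is paid for once, while any number of lending, derivatives, or insurance contracts read the same value.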

Another way to interface is something called the request model. The request model is when you
actively request a specific computation, a specific piece of data, a specific randomness from
something like Chainlink VRF where you basically have a designation, a job ID that you feed in
and you use to trigger a request. So I think that the nuance around understanding what oracles
do and what Chainlink is, is around both the interface, the interface that allows people to
consume all this data in different ways, and it's kind of a roundabout answer to your question
because it's as varied as the different use cases want to consume data, which is quite varied,
and also the type of data they want to consume.
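
The request model can be pictured as a consumer passing a job ID and a callback to an oracle contract, which later fulfills the request. This is only a sketch under assumptions: `OracleContract`, `request`, `fulfill_pending`, and the job IDs are hypothetical, and Python's `random` module stands in for randomness here without any of the verifiability that something like Chainlink VRF is meant to provide.

```python
import random
from typing import Callable, Dict, List, Tuple


class OracleContract:
    """Toy stand-in for an on-chain oracle contract using a request model."""

    def __init__(self, jobs: Dict[str, Callable[[], object]]) -> None:
        self._jobs = jobs  # job ID -> off-chain work to run
        self._pending: List[Tuple[int, str, Callable[[int, object], None]]] = []
        self._next_id = 0

    def request(self, job_id: str, callback: Callable[[int, object], None]) -> int:
        """Called by a consumer contract; returns a request ID."""
        if job_id not in self._jobs:
            raise KeyError(f"unknown job ID: {job_id}")
        request_id = self._next_id
        self._next_id += 1
        self._pending.append((request_id, job_id, callback))
        return request_id

    def fulfill_pending(self) -> None:
        """Run by the oracle node: do the work and call each consumer back."""
        pending, self._pending = self._pending, []
        for request_id, job_id, callback in pending:
            callback(request_id, self._jobs[job_id]())


# Hypothetical jobs: a fixed price and a non-verifiable random number.
jobs = {
    "price-job": lambda: 2000.0,
    "random-job": lambda: random.randint(1, 100),
}
oracle = OracleContract(jobs)
results: Dict[int, object] = {}


def on_fulfilled(request_id: int, value: object) -> None:
    """Consumer-side callback that stores the delivered value."""
    results[request_id] = value


price_request = oracle.request("price-job", on_fulfilled)
random_request = oracle.request("random-job", on_fulfilled)
oracle.fulfill_pending()
print(results[price_request], results[random_request])
```

Unlike reference data, nothing is computed until a consumer asks, which suits one-off needs such as a single random draw.
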

And then this interface is replicated across all the different blockchain environments, but in
many cases goes back to the same kind of core oracle network for that piece of data and then
that retrieves the data from specific sources. But I think the simplest way to think about an
oracle in an oracle network is that you're creating an on-chain contract that is acting as the
interface. Just like APIs are an interface into people's web backends, oracles are kind of another
onion layer on top of APIs that act as the interface for people to interact with those services from
a smart contract. And those interactions are very varied. They can be on a schedule where you
tell the interface that you want them to send data at a certain point in time. Or in some of our
Keepers functionality we actually can watch contracts and the oracle network chooses when to
send them data based on certain conditions that contracts have or haven't met. And so it's
more and more advanced depending on how people want to receive the data or how they want
the off-chain service to interact with them.
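
The condition-based interaction mentioned here can be sketched as a loop in which the oracle network checks each watched contract and acts only on those whose condition is met. Everything in this sketch is an assumption for illustration: `WatchedContract`, `condition_met`, `perform`, `run_watch_cycle`, and the price-threshold condition are hypothetical and are not the actual Keepers interface.

```python
from typing import List, Optional


class WatchedContract:
    """Toy contract that wants to be serviced when the price drops below a threshold."""

    def __init__(self, name: str, threshold: float) -> None:
        self.name = name
        self.threshold = threshold
        self.price: Optional[float] = None
        self.triggered = False

    def condition_met(self) -> bool:
        """True if the contract has not been serviced yet and the price is too low."""
        return (
            not self.triggered
            and self.price is not None
            and self.price < self.threshold
        )

    def perform(self) -> None:
        """Work the oracle network triggers once the condition holds."""
        self.triggered = True


def run_watch_cycle(contracts: List[WatchedContract], price: float) -> List[str]:
    """One pass by the watcher: deliver the price, service contracts that need it."""
    served = []
    for contract in contracts:
        contract.price = price
        if contract.condition_met():
            contract.perform()
            served.append(contract.name)
    return served


contracts = [WatchedContract("loan-a", 1800.0), WatchedContract("loan-b", 1200.0)]
first = run_watch_cycle(contracts, price=1500.0)
second = run_watch_cycle(contracts, price=1100.0)
print(first, second)
```

A scheduled variant would look the same, except that the check would compare the current time against a requested delivery time instead of a price against a threshold.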


2021 Swyx