[STEEMSQL.COM] A public SQL server database with all Steemit blockchain data

avatar

Did you ever wanted to easily access data contained in the Steem blockchain and perform analysis or find valuable information. But not everybody has programming skills to gather those data and compute the wanted result.

Therefore, I created a publicly available SQL database with all the blockchain data in it.

Why use a SQL database?

The main advantage having such a database is the fact data are structured and easily accessible from any application able to connect to a SQL Server database. Having a SQL Server database makes it possible to produce quick answers to queries.

Simply put, a query is a question. You ask the server form something and it sends back an answer (called the query result set).
For example, when dealing with large amounts of data as Steem blockchain data, you might want quick answers to questions (queries) such as:

  • What was the Steem power down volume during the past six weeks?
  • Which are the top 10 most rewarded post ever?
  • Did I get, me or my posts, mentioned in any post or comment?
  • How many posts are talking about ants?

Browsing the blockchain over and over to retrieve and compute such information is time and resource consuming.
If you don’t have a local copy of the blockchain, instead of downloading the whole data from some external public node to process it, you will send your query to SteemSQL server and get only the requested information, saving tons of bandwidth.

Let’s have a look at some technical details

Database diagram

The Blocks table contains bare block information (timestamp, witness …)
Each block can contains Transactions
Depending on each transaction type, the associated transaction’s data is stored in the related table.

Full text search

The database has been full text search enabled. This allow fast search of information within post and replies.

Let say I want to know if anyone mentioned me in a post or comment, the following simple query will do the trick

SELECT author, title, body, url FROM TxComments WHERE CONTAINS(body, '@arcange')

Database Connection information:

Here the information to connect and query the database:

Server: sql.steemsql.com
User: steemit
Password: steemit

Database name: DBSteem

How to retrieve query the database

  • Using Microsoft Excel

Check this tutorial to see how to create an analysis report with Excel

  • If you’re a python programmer
import pypyodbc
connection = pypyodbc.connect('Driver={SQL Server};'
                                'Server=sql.steemsql.com;'
                                'Database=DBSteem;'
                                'uid=steemit;pwd=steemit')
cursor = connection.cursor() 
SQLCommand = ("YOUR SQL QUERY”)
cursor.execute(SQLCommand)
# do whatever you want with the retrieved data
connection.close()

Support

If you need help, have any comment or request, join steemsql channel on steemit.chat

Availability and performance

The SQL server is hosted in a datacenter with 24/7/365 availability.
Available output bandwidth is up to 500Mb/s

New data from the latest blocks are injected in the database every 10 seconds.

The server is currently hosted in a shared infrastructure.
I will monitor server load and if it requires more resources, I will allocate any reward to this post to a dedicated infrastructure.

You like it, please upvote or follow me



0
0
0.000
65 comments
avatar

That's nice of you to make this available. hopefully you won't get flooded with requests that the resources requirements will require too much of an overhead :)

0
0
0.000
avatar

That could be, as for every service made public. But worth to experiment as it will be a good test for the infrastructure. This one is scaled for a normal usage.
The risk is indeed have bad response time if the load is to heavy. Then people writing poor queries might go away as they won't get correct response time.
Anyway, I will do my best to educate users on how to perform efficient queries.
Future will tell.

0
0
0.000
avatar

As soon as I have some time I will use your steemit database because I am a data fanatic. Thank you for making this available. I am following you now.

0
0
0.000
avatar

Thanks!
I will closely monitor server as I made it a bit quick and dirty. I will now work on tweaking it for best performances

0
0
0.000
avatar

Thankyou heaps for doing this!
I have a heap of ideas for queries but I've been trying to avoid the overhead of running my own full node - this is a far more efficient way to do it :)
I'll be following, great work

0
0
0.000
avatar

That's exactly the purpose of it: having an efficient alternative to local nodes.
Thanks!

0
0
0.000
avatar

This is pretty good. I've been trying a few queries, sometimes taking a while. Would you consider adding some indexes on a few 'key' fields, or do you want to keep it like that ?

0
0
0.000
avatar

This is a really great thing you have going here! Kudos for setting up a public node and service for the community. I sincerely believe this post should be upvoted like crazy and end with 3 or 4 figures. This does deserve more notice and appreciation. But from me, you've my upvote and follow. Thank you.

0
0
0.000
avatar

Thanks for your support!
The load is low at the moment but I already received lot of positive feedback of people wanting to use it.
I also wish I could get a few reward for this, not to become rich but to be able to put the server on a dedicated infrastructure with HA and DRS and ensure the best service to the community.

0
0
0.000
avatar

How did you populate the database to begin with? If you're using the RPC mechanism, after 30 days some data is no longer returned. For example, active_votes is empty for all posts after their second payout. You can even see this on steemit.com where old posts have zero upvotes.

0
0
0.000
avatar

All votes are in transactions and stored in the TxVotes table, even first votes ever :

0
0
0.000
avatar

the differences between the graph database and the monolithinc, intransient relational database takes some compilation of data, and new blocks alter records, in ways that can be difficult to adapt, such as if a poster makes a really big edit, and suddenly the datatype has been overflowed.

But I think this is a cool idea. It might make a model for moving from a memory heavy graphene database to a storage heavy sql, maybe some sort of hybrid to cache data for faster retrieval or so.

0
0
0.000
avatar

SteemSQL database is a replica of the whole blockchain, not a compilation!
This mean if a user edit a post, you will find 2 or more transactions in TxComments with the same author/permlink but with different bodies/titles.
Compilation of data is made at query time.

0
0
0.000
avatar

Ah. So it's just another implementation of the graph database protocol, layered on top of a relational database.

0
0
0.000
avatar

This isn't a graph database at all, unless I am somehow mistaken.

0
0
0.000
avatar

No, it however has to implement one to allow the diff-updates that the platform allows, which essentially is a graph database (like CVS or Git). It just gives you the ability to search using SQL queries which are very concise and neat and easy to write, and with that pull all kinds of higher level data out of it. I think that was the purpose of building it - to allow more people to tinker with analytics.

0
0
0.000
avatar

Seriously, Doesn't this defeat the purpose of decentralization?

0
0
0.000
avatar

Why its defeat decentralization?

0
0
0.000
avatar

No. The essence of data analysis is collecting (aggregating ) data and making meaningful insights from it, which requires centralization by rule. The decentralized nature of the data isn't in jeopardy here because there are still unlimited copies of the Steemit blockchain out there ensuring that there is no vulnerable, mutable, single-source of the truth. This particular copy just happens to facilitate reporting.

0
0
0.000
avatar
(Edited)

This is good. Although I like something like PhpMyAdmin more.
What hardware is your service running on? I'm curious about the scalability.

0
0
0.000
avatar

The server on a Intel Xeon E5-1650 v2 3.5Ghz 6 cores with 64GB memory, 8TB storage with MegaRAID 9271 Cache 1Go + CacheVault. It can be moved to a VMware HA cluster if required.

0
0
0.000
avatar

Argh!!!! I was just trying to build one of these!
I'm not sure whether to be upset I spent 2 days trying to make it work or ecstatic that whatever guided me, guided you there first.

Eitherway, I'm glad you did this, but super jealous you managed to pull it off faster than I could :)

Congratulations though, you earned my upvote. I'm following you now too!

0
0
0.000
avatar

I would have been super jealous too if you did it before me ;)
Everything was ready 2 weeks ago, but I wanted to fine tune and check every bit and byte of the infrastructure before releasing it. No worth to publish something if user experience is disappointing.
Many thanks for your support !

0
0
0.000
avatar

I'm late to the party but I am excited about this as I am learning SQL at work and would love to use that knowledge to understand Steemit better and contribute to the community with that knowledge.

0
0
0.000
avatar

This is way freaking cool. Man, I fall behind on #dev in steemit.chat by just a couple days and miss amazing gems like this. Well done!

0
0
0.000
avatar

Thanks ! Hopefully it will be useful to you.

0
0
0.000
avatar

I'm at RESTfest today, and I'm contemplating building a REST interface for the db as part of the hack day today... Hmmmm...

0
0
0.000
avatar

Best of luck! Waiting for any feedback.

0
0
0.000
avatar

Unfortunately, I wasn't able to get my local php connecting to mssql. :(

0
0
0.000
avatar

hey, I'm playing with it right now. how I can I see the full architecture of the database?

0
0
0.000
avatar

All tables and their relation are shown in the database diagram in this post.

0
0
0.000
avatar

http://steemsql.com/ is not working?

0
0
0.000
avatar

Haven't noticed any problem.
Please use steemsql channel on steemit.chat for help or support

0
0
0.000
avatar
(Edited)

Just clicked on the link: http://steemsql.com/ , I got this:

This site can’t be reached

steemsql.com’s server DNS address could not be found.
0
0
0.000
avatar

The link in your first reply works.
The link in your last reply is broken (redirect to "http://steemsql.com/," with a coma at the end)
It is useless to check "http://steemsql.com" website ATM as it redirect to this post

0
0
0.000
avatar

Still not working for me. No, it doesn't redirect to this post. Anyone else please check? http://steemsql.com/

By the way the comma is a known old issue of steemit.com related to link parsing. It's off-topic though.

0
0
0.000
avatar

hi @arcange,
Is this database still working ?
If yes Then I am not able to connect to server using Sql server management studio.

0
0
0.000
avatar

I am able to connect just fine.

0
0
0.000
avatar

This is awesome! Thank you for your effort.

0
0
0.000
avatar

Doesn't this defeat the purpose of decentralization? If there is a central database with all the information on the blockchain, then wouldn't that create a vulnerability?

This would be better if the database itself was decentralized... I think I'll code one like this. Thank you for your information! :)

0
0
0.000
avatar

Thank you so much for hosting this! You are my hero! Successfully connected and doing some poking around in the data for fun tonight :)

0
0
0.000
avatar

Why MS SQL server, and not something more open like MySQL or PostgreSQL?

0
0
0.000
avatar

The fact the server behind is open or not does not matter to me.

The server itself is not open, only the data in it.
What's important is to have the required skills to manage it (including its security)and have it available to users's requests.

You're free to build your own if you really want an open source server.

0
0
0.000
avatar

Here’s a sad scene for you. About 4 people have read this post I made about 3 days ago: “Can you supply some block-chain data for payment?”. Your post suggests that there must be or than 20 people in the community who might have been alerted to my post, and some would have promptly sent me to your service, which clearly needs to be well supported.

New Steemit posts disappear into ‘ether’ 20 minutes, and will turn up on peoples’ Feeds only if they are Following the author!

Anyway, I will be trying to use your service soon, and pay suitably. I am experienced programmer; but have not bothered to learn Python and only occasionally use SQL (inside SAS). I will come back when I have done the needed studying.

Cheers!

0
0
0.000
avatar

Here’s a sad scene for you. About 4 people have read this post I made about 3 days ago: “Can you supply some block-chain data for payment?”. Your post suggests that there must be or than 20 people in the community who might have been alerted to my post, and some would have promptly sent me to your service, which clearly needs to be well supported.

New Steemit posts disappear into ‘ether’ 20 minutes, and will turn up on peoples’ Feeds only if they are Following the author!

Anyway, I will be trying to use your service soon, and pay suitably. I am experienced programmer; but have not bothered to learn Python and only occasionally use SQL (inside SAS). I will come back when I have done the needed studying.

Cheers!

0
0
0.000
avatar

The database here is not responding!! I'm trying to connect to it from Java code and it says "Communication link failure". Also from comand line I run a ping and none of the packets come back!

0
0
0.000
avatar

Amazing. Glad to found this post. Thank you for providing the infrastructure. I have accessed it and it works well

0
0
0.000
avatar

the steemit account looks suspended, the service is no longer available?

0
0
0.000
avatar

it has moved to a monthly subscription model.

0
0
0.000
avatar

Hi, I can't connect using SSMS, do you use default port? (1433)

Ping to server is not responding, I'm not sure if i'm doing something wrong :(

Thanks for sharing!

0
0
0.000
avatar

SteemSQL has moved to a monthly subscription model and the default free account/password “steemit/steemit” has been disabled.

You will find more information about the subscription process here

Please visit steemsql.com for more information.

0
0
0.000