There are lots of conflicting opinions on financial database schema, but in the end I decided to follow the advice of an article from Quantstart. This was especially important because I knew I would end up with quite a lot of data: I was imagining at least 15 years of daily data for 5000 tickers, which is about 30 million rows. I wasn’t naive enough to think that I could come up with the perfect schema from nothing, so I took to google to learn from other people’s mistakes. Optimal schema design is a really huge topic that is often the subject of university courses, as well as being a typical interview task for prospective database administrators. Database schemaĪ database schema sets out how all of the information is going to be organised in your database. I like Sequel Pro (macOS only), which is open-source, elegant, and designed specifically for MySQL/MariaDB. Lastly, although it’s completely up to you, I would recommend having some other GUI database software on your computer to help visualise things. However, if you’re on a mac and already have homebrew, it’s really quite easy: Please refer to the official webpages for more. PrerequisitesĪ complete guide to setting up MariaDB/MySQL is outside of the scope of this post. The official documentation gives a very clear exposition on the topic of storage engines. They didn’t do so, hence I stuck with InnoDB. The default for anything after MariaDB 10.2 is InnoDB, and I felt that the burden of proof was on the alternatives to demonstrate superiority for my purposes. Lastly, you have to choose a storage engine. In the end though, I chose MariaDB, which is an open-source fork of MySQL that seemed to offer all of the features that MySQL did, with a few minor improvements (see an interesting discussion here). I decided to go with MySQL, because I felt that I didn’t need the application-embedding that SQLite offered, nor did I need all the advanced features of PostgreSQL. A quick look cut my options down to SQLite, MySQL, or PostgreSQL. But even after narrowing it down to a traditional RDBMS, you still have to choose the exact system. It was quite clear to me that a SQL relational database was what I wanted, after all, price data is highly structured and I need very quick read speed. I think this chart from nuodb sums up the various options quite well: Do you want a relational database or NoSQL? If you choose relational, which system are you going to go with? Within that system, what storage engine should you use? When it comes to choosing a database system, there are a somewhat distressing number of decisions that you have to make. Adding an exchange and a list of securities.Nevertheless, I’m leaving this post up for personal nostalgia. As this is my first financial database, there may be inefficiencies in the schema, but overall I believe that the solution presented here is relatively robust, and definitely sufficient for my purposes.Įdit as of Feb 2022: I probably wouldn’t recommend following the steps in this post – there are better resources elsewhere. I think creating your own securities database is an important step for anyone looking to get into algorithmic investing more seriously, so I’ve decided to share how I’ve done so. So I decided to make a hoard of this data, in the form of a stock price database. However, it remained a concern for me that one day the winds would change and Yahoo Finance would deprecate this hidden API permanently. I discovered a crude but functional way of doing this (detailed in this post), but then discovered an extremely convenient python library that does the same thing much more efficiently, with a direct pandas-datareader interface. Although they’ve deprecated their official API, they still have the same data on their website, meaning that it can be scraped if you can be bothered. My main data source has been Yahoo Finance. I’ve worked broadly with two datasets in particular: historical financial statistics (e.g P/E ratio, price/book) make up the features that my algorithms learn from, but the actual backbone of any strategy is historical price data. ![]() As part of this hobby, I’ve spent many more hours parsing and processing data than I have actually applying machine learning. One of my interests is exploring the applications of machine learning to financial markets.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |