Shell Script Loop for Yahoo-Finance Python API Fetch Error

For the past month, I've been working with a friend on an algorithm to predict fluctuations in the stock market.

We started by gathering historical data on various stocks using the yahoo-finance Python API, which you can install with pip install yahoo-finance.

Sample Input

I had a list of stock symbols and their IPO years, formatted as shown below:

AAAP 2015
AAOI 2013
AAPC 2015
AAPL 1980
ABAX 1992
ABCB 1994
ABCO 2001
ABIL 2014
ABTL 1999
ABTX 2015
...

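Each line of the list holds one symbol and one IPO year separated by whitespace. As a minimal sketch (the function name parse_stocklist is hypothetical, not part of our scripts), the lines can be split into (symbol, year) pairs like this. It skips blank lines and rejects lines that don't have exactly two columns:

```python
def parse_stocklist(lines):
    """Split "SYMBOL YEAR" lines into (symbol, year) tuples.

    str.split() with no argument splits on any run of whitespace and drops
    the trailing newline, so padded columns and raw file lines both work.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected 'SYMBOL YEAR', got %r" % line)
        symbol, year = fields
        entries.append((symbol, int(year)))
    return entries


sample = ["AAAP 2015", "AAPL 1980", "", "ABAX     1992"]
print(parse_stocklist(sample))
```

To read the real file, you could pass the open file object directly, e.g. with open("stocklist") as f: entries = parse_stocklist(f).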
Originally, I ran the following program in a loop over every entry in this list (approximately 1,400 stocks).

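#!/usr/bin/env python

from yahoo_finance import Share
from pprint import pprint
import sys
import time

# Usage: python criticaldates.py SYMBOL IPO_YEAR
symbol = sys.argv[1]
ipo_year = sys.argv[2]

# Today's date as YYYY-MM-DD (%Y is the four-digit year)
today = time.strftime("%Y-%m-%d")

stock = Share(symbol)

# Print daily prices from December 31 of the IPO year through today
pprint(stock.get_historical(ipo_year + '-12-31', today))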
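The program requests prices from December 31 of the IPO year through today, with both dates given as YYYY-MM-DD strings. One consequence is that trading days earlier in the IPO year are not requested. The date arithmetic can be pulled out into a small, testable helper using only the standard datetime module; this is a sketch, and the name history_window is hypothetical:

```python
import datetime


def history_window(ipo_year, today=None):
    """Return (start, end) date strings in YYYY-MM-DD form.

    start is December 31 of the IPO year, matching criticaldates.py;
    end defaults to today's date.
    """
    if today is None:
        today = datetime.date.today()
    start = datetime.date(int(ipo_year), 12, 31)
    return start.isoformat(), today.isoformat()


print(history_window("2015", today=datetime.date(2016, 12, 27)))
```

Passing today explicitly makes the result reproducible; in the real program it would be left as the default.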

Sample Output

If we wanted all the historical data for Advanced Accelerator Applications SA (AAAP) from its IPO in 2015, we would enter python criticaldates.py AAAP 2015, where criticaldates.py is the program above. The output looks something like this:

[{'Adj_Close': '26.290001',
  'Close': '26.290001',
  'Date': '2016-12-27',
  'High': '27.68',
  'Low': '23.77',
  'Open': '23.77',
  'Symbol': 'AAAP',
  'Volume': '372500'},
 {'Adj_Close': '24.07',
  'Close': '24.07',
  'Date': '2016-12-23',
  'High': '24.389999',
  'Low': '23.51',
  'Open': '23.790001',
  'Symbol': 'AAAP',
  'Volume': '148300'},
 {'Adj_Close': '23.709999',
  'Close': '23.709999',
  'Date': '2016-12-22',
  'High': '24.92',
  'Low': '23.50',
  'Open': '24.120001',
  'Symbol': 'AAAP',
  'Volume': '376100'},
 {'Adj_Close': '24.68',
  'Close': '24.68',
  'Date': '2016-12-21',
  'High': '25.540001',
  'Low': '23.98',
  'Open': '25.110001',
  'Symbol': 'AAAP',
  'Volume': '189900'},
 {'Adj_Close': '26.209999',
  'Close': '26.209999',
  'Date': '2016-12-20',
  'High': '26.940001',
  'Low': '25.389999',
  'Open': '26.34',
  'Symbol': 'AAAP',
  'Volume': '107700'},
 {'Adj_Close': '26.27',
  'Close': '26.27',
  'Date': '2016-12-19',
  'High': '27.82',
  'Low': '26.25',
  'Open': '27.60',
  'Symbol': 'AAAP',
  'Volume': '54800'},
 ...]

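Every value in these records is a string, and the rows come back newest first, as in the sample above. Before measuring fluctuations, it helps to convert the prices to numbers and sort by date. Here is a minimal sketch, assuming records shaped like the sample. The helper name to_closes is hypothetical, and using Adj_Close (the adjusted close) instead of Close is our choice for this illustration:

```python
import datetime


def to_closes(records):
    """Convert get_historical-style records into (date, close) pairs.

    'Date' is parsed from YYYY-MM-DD, 'Adj_Close' from its string form,
    and the result is sorted oldest first.
    """
    rows = [
        (datetime.datetime.strptime(r["Date"], "%Y-%m-%d").date(),
         float(r["Adj_Close"]))
        for r in records
    ]
    rows.sort()
    return rows


sample = [
    {"Date": "2016-12-27", "Adj_Close": "26.290001"},
    {"Date": "2016-12-23", "Adj_Close": "24.07"},
    {"Date": "2016-12-22", "Adj_Close": "23.709999"},
]
for day, close in to_closes(sample):
    print(day, close)
```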
However, when the program was looped in Python, it occasionally aborted with the following error. If I ran it again, it would abort with the same error, but at a different point in the list:

  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 502: Cannot find server.

My Solution

Because the error was intermittent and aborted the program at a different point in the list each time, I wrote the following shell script to loop over criticaldates.py externally instead of looping within the Python program itself. Each stock is fetched by a separate Python process, so a failure for one stock is recorded and the loop moves on to the next:

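#!/bin/bash

# Each line of stocklist is "SYMBOL YEAR", e.g. "AAPL 1980".
# Split the file on newlines only, so each line stays in one piece.
IFS=$'\n'

for i in `cat stocklist`
do

b=${i%% *}    # stock symbol: everything before the first space
d=${i##* }    # IPO year: everything after the last space

# If the stock's directory is missing, log it for retry and skip it
# (otherwise output would land in the parent and "cd .." would go too far)
cd "$b" || { echo "$i" >> retry_file; continue; }

echo "===== start $b =====" >> "historicaldates_${b}"

# Test python's exit status directly: success goes to outputfile,
# failure (e.g. HTTP Error 502) goes to retry_file
if python ../criticaldates.py "$b" "$d" >> "historicaldates_${b}"; then
	echo "$i" >> ../outputfile
else
	echo "$i" >> ../retry_file
fi

echo "===== end $b =====" >> "historicaldates_${b}"

cd ..

done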
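Since the 502 errors showed up intermittently, another option is to retry the request inside Python before giving up. This is a different technique from the shell script above, not what we used. The sketch below is a generic retry helper with exponential backoff (fetch_with_retry is a hypothetical name), demonstrated with a stand-in function instead of a live call to Yahoo:

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fetch() until it succeeds or `attempts` calls have failed.

    The wait doubles after each failure (exponential backoff). The last
    exception is re-raised so the caller can still log the stock for retry.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * 2 ** attempt)


# Stand-in for stock.get_historical(...): fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("HTTP Error 502")
    return [{"Symbol": "AAAP"}]


print(fetch_with_retry(flaky_fetch, attempts=3, delay=0.01))
```

With the real library, fetch would be something like lambda: stock.get_historical(start, end), and under Python 2.7 (as in the traceback above) retry_on could be narrowed to (urllib2.HTTPError,) so that unrelated bugs are not retried.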

Running The Program

To run the shell script above, the working directory must contain the criticaldates.py program, a file called stocklist that lists the stocks and their IPO years in the format described earlier, and one directory per stock named after its symbol (for example, Apple's directory is named AAPL).

For each stock, the script writes a file called historicaldates_SYMBOL into that stock's directory, where SYMBOL is the stock's symbol as it appears in stocklist. It also creates two files in the top-level directory: outputfile, which lists every entry in stocklist that was fetched successfully, and retry_file, which lists every entry that failed.

To rerun the failed entries, first rename retry_file (for example, mv retry_file retry_old), then replace the line for i in `cat stocklist` in the shell script with for i in `cat retry_old`. Renaming matters because the script appends new failures to retry_file as it runs; looping over retry_file directly would leave entries that succeed on the second pass listed as failures.
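Because outputfile records every entry that succeeded, the entries still left to fetch can also be derived by comparing stocklist with outputfile. A minimal sketch (the function name pending_entries is hypothetical) that compares whole lines, as the script writes them:

```python
def pending_entries(stocklist_lines, done_lines):
    """Return stocklist entries that do not yet appear in outputfile.

    Lines are compared after stripping surrounding whitespace; the
    original order is kept and blank lines are ignored.
    """
    done = {line.strip() for line in done_lines if line.strip()}
    return [line.strip() for line in stocklist_lines
            if line.strip() and line.strip() not in done]


stocklist = ["AAAP 2015\n", "AAOI 2013\n", "AAPC 2015\n", "AAPL 1980\n"]
outputfile = ["AAAP 2015\n", "AAPC 2015\n"]
print(pending_entries(stocklist, outputfile))
```

Writing the result to a new file and looping over that file gives a rerun list that never contains entries that have already succeeded.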

Conclusion

So far, we have gathered historical data on about 3,000 stocks and sorted them by percentage fluctuation over various time intervals. We then found news articles published during those "critical time intervals" in which a stock moved by more than about 5 percent (we don't consider smaller fluctuations significant). I'm currently working on an NLP algorithm to recognize patterns between large fluctuations in a stock and trends in the news. If you're interested in this project or would like to know more about the theory behind our algorithm, feel free to email me!
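The idea of a "critical time interval" can be illustrated with a short sketch. This is not our actual sorting code. It assumes a list of closing prices ordered oldest first, measures the percent change over a fixed window of trading days (list positions, not calendar days), and flags every window that moves by more than the 5 percent threshold mentioned above. The name critical_intervals is hypothetical:

```python
def critical_intervals(closes, window, threshold=5.0):
    """Find windows where the price moved more than `threshold` percent.

    closes is a list of closing prices ordered oldest first. For every
    start index i, the move is measured from closes[i] to closes[i + window].
    Returns (start_index, end_index, percent_change) tuples.
    """
    hits = []
    for i in range(len(closes) - window):
        start, end = closes[i], closes[i + window]
        change = (end - start) / start * 100.0
        if abs(change) > threshold:
            hits.append((i, i + window, round(change, 2)))
    return hits


# AAAP adjusted closes from the sample output, reordered oldest first
closes = [26.27, 26.209999, 24.68, 23.709999, 24.07, 26.290001]
print(critical_intervals(closes, window=1))
```

The flagged index pairs would then be mapped back to dates to find the news articles published in each interval.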