Automate Uploads to Swivel with Bash, Curl and Perl

Here's a quick bash script that I wrote to help people upload data to Swivel. (Sorry about the clumsy bash syntax; I don't write shell scripts very often.)
# Usage: sh swivel_update.sh email@address.com password 12345 data_set_data.tsv
EMAIL=$1
PASSWORD=$2
DATA_SET_ID=$3
DATA_SET_DATA=$4

echo "Updating data set ${DATA_SET_ID} from ${DATA_SET_DATA} as ${EMAIL}"

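# log in (--data-urlencode escapes characters such as & or + in the email and password)
curl -X POST -s -L -c cookies.txt --data-urlencode "email=${EMAIL}" --data-urlencode "password=${PASSWORD}" 'http://www.swivel.com/security/login' > /dev/null 2>&1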

# upload the file ("<file" tells curl to send the file's contents as the form field's value)
UPLOADED_FILE_ID=$(curl -X POST -s -L -b cookies.txt \
-F "uploaded_text_area[text_area]=<${DATA_SET_DATA}" \
"http://www.swivel.com/update/update_upload/${DATA_SET_ID}?upload_type=type_in" | perl -ne 'if (/uploaded_file_id=(\d+)/) { print $1; exit }')

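# stop here if no uploaded_file_id came back from the upload page
if [ -z "$UPLOADED_FILE_ID" ]; then
  echo "upload failed: no uploaded_file_id in the response" >&2
  exit 1
fi
echo "$UPLOADED_FILE_ID"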

# set the settings on the file
# (for comma-delimited use 'uploaded_file[column_separator]=,')
curl -X POST -s -L -b cookies.txt \
-d 'uploaded_file[column_separator]=\t' \
-d 'uploaded_file[first_line_number]=1' \
-d 'uploaded_file[first_line_titles]=false' \
-d 'continue=Continue >' \
"http://www.swivel.com/update/update_preview/${DATA_SET_ID}?upload_type=type_in&uploaded_file_id=${UPLOADED_FILE_ID}" > /dev/null 2>&1

# append the data
# (to append use -d 'append=Append +')
# (to replace use -d 'replace=Replace -/+')
curl -X POST -s -L -b cookies.txt \
-d 'append=Append +' \
"http://www.swivel.com/update/update_alter/${DATA_SET_ID}?uploaded_file_id=${UPLOADED_FILE_ID}&upload_type=type_in" > /dev/null 2>&1
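If you'd rather do the same thing from Python, here is a minimal, untested sketch of the four steps using only the standard library: urllib.request with an http.cookiejar cookie jar stands in for curl and cookies.txt, and a regular expression stands in for the Perl one-liner. The URLs and form fields are copied from the script above; the file name swivel_update.py and the function names are hypothetical. One real difference: the upload is sent URL-encoded rather than as multipart form data (curl -F), on the assumption that the server accepts both.

```python
# swivel_update.py: a sketch of the bash script using the Python 3 standard library.
# Endpoints and form fields come from the bash script; function names are hypothetical.
import http.cookiejar
import re
import sys
import urllib.parse
import urllib.request

BASE_URL = "http://www.swivel.com"


def extract_uploaded_file_id(html):
    """Same job as the Perl one-liner: return the first uploaded_file_id=NNN, or None."""
    match = re.search(r"uploaded_file_id=(\d+)", html)
    return match.group(1) if match else None


def form_body(fields):
    """URL-encode (name, value) pairs as an application/x-www-form-urlencoded body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def update_data_set(email, password, data_set_id, data_path, append=True):
    # The opener's cookie jar plays the role of curl's -c/-b cookies.txt;
    # urllib follows redirects by default, standing in for curl -L.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    def post(path, fields):
        # Passing a bytes body makes urllib send a POST request.
        with opener.open(BASE_URL + path, form_body(fields)) as response:
            return response.read().decode("utf-8", "replace")

    # 1. Log in.
    post("/security/login", [("email", email), ("password", password)])

    # 2. Upload the data as the text-area field (URL-encoded, not multipart;
    #    assumes the server accepts either encoding).
    with open(data_path, encoding="utf-8") as f:
        page = post(f"/update/update_upload/{data_set_id}?upload_type=type_in",
                    [("uploaded_text_area[text_area]", f.read())])
    file_id = extract_uploaded_file_id(page)
    if file_id is None:
        raise RuntimeError("no uploaded_file_id in the upload response")

    # 3. Column settings. "\\t" is a literal backslash-t, which is exactly what
    #    the single-quoted '\t' in the bash script sends.
    post(f"/update/update_preview/{data_set_id}?upload_type=type_in&uploaded_file_id={file_id}",
         [("uploaded_file[column_separator]", "\\t"),
          ("uploaded_file[first_line_number]", "1"),
          ("uploaded_file[first_line_titles]", "false"),
          ("continue", "Continue >")])

    # 4. Append (or replace) the data.
    action = ("append", "Append +") if append else ("replace", "Replace -/+")
    post(f"/update/update_alter/{data_set_id}?uploaded_file_id={file_id}&upload_type=type_in",
         [action])
    return file_id


if __name__ == "__main__" and len(sys.argv) == 5:
    # Usage: python swivel_update.py email@address.com password 12345 data_set_data.tsv
    print(update_data_set(*sys.argv[1:5]))
```

Run it with the same four arguments as the bash version; it prints the uploaded file id. Calling update_data_set(..., append=False) sends the replace button instead of the append button.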

Determining Identical Files with Python and Bash

I've been consolidating files from multiple computers lately. Here are a couple of quick scripts that I found useful.
1. Create an MD5 hash of every file in a directory.
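# md5.py: print "<md5 hash> <path>" for every file under a directory
# (uses hashlib instead of shelling out to the Mac-only `md5` command)
import hashlib
import os
import sys

def md5_of_file(path, chunk_size=1 << 20):
  # hash in 1 MB chunks so large files don't have to fit in memory
  digest = hashlib.md5()
  with open(path, 'rb') as f:
    for chunk in iter(lambda: f.read(chunk_size), b''):
      digest.update(chunk)
  return digest.hexdigest()

def walk(start_dir='/'):
  # os.walk visits every subdirectory, replacing the hand-rolled directory stack
  for directory, _subdirs, files in os.walk(start_dir):
    for name in files:
      fullpath = os.path.join(directory, name)
      if os.path.isfile(fullpath):  # skip broken symlinks, sockets, etc.
        print(md5_of_file(fullpath) + ' ' + fullpath)

if __name__ == "__main__":
    walk(sys.argv[1])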
2. Find the MD5 hashes that appear more than once across the listings.
> python md5.py /path/to/dir1 > dir1_md5_files.txt
> python md5.py /path/to/dir2 > dir2_md5_files.txt

# get just the md5 hashes
> cut -d' ' -f1 dir1_md5_files.txt > dir1_md5s.txt
> cut -d' ' -f1 dir2_md5_files.txt > dir2_md5s.txt

# find the duplicates
> cat dir1_md5s.txt dir2_md5s.txt | sort | uniq -d > md5_dupes.txt
3. Do whatever you need to with the duplicate files.
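Step 3 needs a way to get from a duplicate hash back to the files that produced it. Here is a minimal sketch (the script name find_dupes.py and the function group_duplicates are hypothetical) that reads the listings md5.py writes, assumes each line looks like "<hash> <path>", and prints every hash that appears more than once along with all of its paths. It does the work of the cut/sort/uniq -d pipeline and the lookup in a single pass.

```python
# find_dupes.py: group the paths in md5.py listings by hash and report duplicates
import sys
from collections import defaultdict


def group_duplicates(lines):
    """Return {hash: [paths]} for every hash that appears on more than one line.

    Each line is assumed to be md5.py output: '<hash> <path>'.
    """
    paths_by_hash = defaultdict(list)
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        digest, path = line.split(' ', 1)  # split once so paths with spaces survive
        paths_by_hash[digest].append(path)
    return {d: p for d, p in paths_by_hash.items() if len(p) > 1}


if __name__ == "__main__":
    # e.g. python find_dupes.py dir1_md5_files.txt dir2_md5_files.txt
    lines = []
    for listing in sys.argv[1:]:
        with open(listing) as f:
            lines.extend(f)
    for digest, paths in sorted(group_duplicates(lines).items()):
        print(digest)
        for path in paths:
            print('    ' + path)
```

Like uniq -d, this also reports files that are duplicated within a single directory, not just across the two.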

Managing disruptive technological change

Chapter 10 of The Innovator’s Dilemma provides a case study of how to take advantage of disruptive change. I thought the chapter was particularly valuable, so I’m summarizing it here. The case study deals with the electric car, and with how to structure a product so that it can take advantage of a disruptive technology and eventually overtake the mainstream market.
Step 1: Determine whether the technology is disruptive
  • Graph the trajectory of the performance improvement demanded by the market against the trajectory of the performance improvement the technology delivers. In the case of the electric car, the market demands only modest improvements in speed and driving range, while battery technology is improving at a much faster pace.
  • Important: “To measure market needs, I would watch carefully what customers do, not simply listen to what they say.”
  • Make sure the technology is not a sustaining technology.  That is, it does not and cannot currently meet the needs of the mainstream market.  In the case of the electric car, the cruising range and charging time make it fundamentally unmarketable to the mainstream.
Step 2:  Determine where to market the disruptive technology
  • Do not try to market it to the existing, mainstream market.  You will surely fail.
  • Do not keep the technology in the laboratory until it is ready for the existing market. Historically, there has been enormous value in getting a disruptive technology to market first and learning from having it there.
  • Often, the same attributes that make a technology a poor fit for mainstream markets are seen as positives in new markets. In the electric car example, could a limited range be seen as a positive in some new market?
  • Important: “No one can learn from market research what the early market(s) for electric vehicles will be.”
  • Plan to learn, rather than to succeed. “Plan to be wrong and to learn what is right as fast as possible.” Do not blow your nest egg on an “all-or-nothing first-time bet.”
Step 3:  Determine a strategy
  • “Without a market, there is no obvious or reliable source of customer input; without a product that addresses customers’ needs, there can be no market.”  So prepare to experiment.
  • Important: “the basis of competition will change over a product’s life cycle.” When one feature of the product is “good enough,” people will begin to care about a different feature. For example, when the acceleration of the electric car is “good enough,” people will begin to care about range. Only when range is good enough will people care about other attributes, like fuel economy.
  • Focus on attributes like simplicity, reliability, and convenience.  Focus on the low-margin, low end of the market, rather than the power user.
  • Since the market is unknown, and can change, “design a product in which feature, function, and styling changes can be made quickly and at low cost.”
  • Aim for a low price point: not necessarily a low price per feature, but a low unit price. For example, don’t worry about beating the current automobile on price per mile of range. Make your product a low-cost investment that people in the potential market can afford to try out.
  • Do not count on technological breakthroughs, or even feel the need to create your own technology.  Pull together components of proven technologies into a new product and market.  For example, in the electric car, count on laptop batteries, rather than creating a new battery technology.
  • Do not count on an existing distribution channel (especially one that involves salespeople) to sell the product. The disruptive technology will carry lower margins, and therefore lower commissions, so existing channels will give it less attention than it deserves. For example, do not try to sell your electric car through an existing auto franchise.
  • Create an organization small enough to be excited about a small market. You’re not going to get to the big wins for a long time, so the organization should be small enough to be happy about the small ones. Selling $10 million worth of product is nothing to an automobile manufacturer, but it’s a huge amount of revenue for a 100-person company.
Note that the hybrid car concept has changed the electric car from a disruptive technology to a sustaining one. Were that not the case, I would say that a city bike with an electric assist motor powered by laptop batteries would be an ideal way to break into the electric car market. I’ll explore why in another post.

Marketing technology products

I recently read Crossing the Chasm, and am currently reading The Innovator’s Dilemma. Both books drive toward the same conclusion: marketing is the most important thing in building a technology company. Since I’m in the middle of The Innovator’s Dilemma, I will focus on it for now.
The basic premise of The Innovator’s Dilemma is that there are two types of innovation: disruptive innovation and sustaining innovation. The primary difference between them is that sustaining innovations are the ones a firm’s highest-value, most profitable customers want; these innovations also address the largest market. Disruptive innovations, on the other hand, are explicitly not wanted by a firm’s existing customers. They often go after a lower-margin end of the market (such as steel minimills, which started out making low-margin rebar) or an entirely new market (such as 1.8-inch hard drives, which got their start in pacemakers). In either case, the market starts off so small and unprofitable that the large players are not interested in it and, more importantly, can’t be interested in it, because their customers actively don’t want them to be involved.
The rub is that technology advances faster than customers’ needs, so the low-margin disruptive technology can rapidly move up market, gathering a larger and higher-margin share of the business as it grows. Incumbents also move up market, garnering higher margins but continuing to cede the lower end. Eventually, the disruptive technology advances far enough to meet the needs of the mainstream, and the incumbents are left owning only a high-end niche of the market.
The conclusion, then, for those of us wanting to take advantage of disruptive technology is to focus at first on a small, lower-margin, new or untapped market, and then grow into the larger space.
The rewards for doing so are large. In the disk drive industry, “firms that sought growth by entering small, emerging markets logged 20 times the revenue of firms pursuing growth in larger markets.”

Baldwin Beach

Relaxing at Baldwin Beach. What a life. Today I had a call with someone who was interested in taking Stats 315. He has worked in OLAP for 20 years, since before it was even called OLAP, and he reaffirmed my belief in the importance of ad hoc analysis. OLAP is too expensive (at least initially) and too inflexible to be a viable solution for understanding data. It's amazing that the industry doesn't recognize this. Talking to him really rekindled my passion for data analysis. Product management is exciting, and I'm learning a lot, but I'm really happy doing data analysis, or rather, I would be if the tools were better. As Swivel veers further and further from data analysis, and my list of side projects gets longer, I wonder if I'll ever return to the world of analytics. If not, I wonder if that is a good thing. I think I'm going to post the parts of my analytic software business plan on the attemptry blog, along with a critique of each part. At least that will get me slightly engaged in analytics again.