Getting a Handle on Data Intensive Business Research Support (guest post by Alice Kalinowski)

Alice Kalinowski is a Business Research Librarian at the Stanford Graduate School of Business Library. Her research interests focus on knowledge management and user experience in academic libraries. Her idea of a perfect Saturday involves a hike, some knitting, and homemade baked goods.

First, some context

Like many of my peers, I stumbled into business librarianship, ended up really liking it, and then remained a business librarian when I switched universities. I started in a pretty typical position as the sole business liaison at the University of Pittsburgh with a focus on supporting student research and learning. While there was a steep learning curve when I started, after a couple of years I felt more comfortable in this role. However, when I transitioned to my new job as a business research librarian at the Stanford Graduate School of Business (GSB) Library I realized there was a whole other side of business librarianship I hadn’t had much exposure to.

Some facts about the GSB Library: 

  • It is a coordinate library, funded by – and administratively organized in – the business school, not the main campus library system. 
  • The library has a large business database collection (over 200 items on our LibGuides database list)
  • The library is part of a unit of the GSB called the Research Hub, which includes the Data, Analytics, and Research Computing (DARC) team who provide more technical research support for GSB faculty. The library often works with DARC to help locate sources or dig into issues when merging or linking various datasets. 
  • A significant portion of the reference/research librarians’ time is spent supporting PhD and faculty research projects.

Can you give me an example?

To provide an example of the types of things I am now learning/doing routinely, here is a typical PhD student consultation (full disclosure: my colleague Todd Hines led this consultation and I took copious notes, abbreviated here). Feel free to skip ahead if you don’t want all the details!

Initial question: A PhD student needed to link Initial Public Offering (IPO) data from SDC Platinum to financial performance data in Compustat in WRDS.  (This means that the student wanted to create a dataset that includes a company’s IPO and financial performance data, which has to be created by merging the two sources.)

Step 1

Todd suggested using a universe (dataspeak for a list of covered/included entities) created by Jay Ritter, a famous IPO researcher, which includes important company identifiers like CUSIPs and PERMNOs. 

  • Not sure what these identifiers are? That’s okay, I wasn’t either. The gist is that there are various identifiers used by different vendors, and it can take some work to link various datasets.

Step 2

Todd demonstrated how to upload this list of CUSIPs to SDC Platinum (because SDC only accepts CUSIPs and Tickers) and how to use the (ancient) SDC software to get the IPO data for those companies. 

  • Todd made sure to include helpful tips, like how to format the report so that you can easily export it to a spreadsheet regardless of how many columns there are. 
  • Turns out this is important! We have worked with researchers who did not know how to format a report and spent WEEKS downloading data when it could have taken a day….

Step 3

Now we were ready to take the SDC data and match it with Compustat. (This means we wanted to get the Compustat data for the same companies we have IPO data for.)

  • While the student could use the CUSIP, it isn’t the best option because CUSIPs can change over time (how unhelpful for a unique identifier!!). Instead, matching on PERMNOs is better, because they don’t change (yay!). 
  • Because the Jay Ritter dataset includes PERMNOs, the student didn’t have to do the conversion.
  • FYI, if you needed to, you would use the WRDS CRSP → Stock Events → Names file to get a table that lists when the CUSIPs changed and the PERMNO for each security.
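To make the changing-CUSIP problem concrete, here is a minimal Python sketch of a date-aware lookup. The rows, values, and field names (`permno`, `cusip`, `start`, `end`) are invented for illustration; they are only assumed to resemble the kind of table described above, and the real WRDS file has its own column names.

```python
from datetime import date

# Hypothetical names-table rows: one row per period in which a security
# carried a given CUSIP. All identifiers and dates are made up.
NAME_HISTORY = [
    {"permno": 10001, "cusip": "11111111", "start": date(1995, 1, 1), "end": date(2001, 6, 30)},
    {"permno": 10001, "cusip": "22222222", "start": date(2001, 7, 1), "end": date(2010, 12, 31)},
    {"permno": 10002, "cusip": "33333333", "start": date(1998, 3, 1), "end": date(2010, 12, 31)},
]

def permno_for_cusip(cusip, as_of, history=NAME_HISTORY):
    """Return the PERMNO that carried `cusip` on the date `as_of`, or None."""
    for row in history:
        if row["cusip"] == cusip and row["start"] <= as_of <= row["end"]:
            return row["permno"]
    return None

# The same security is found under its old and its new CUSIP,
# as long as the lookup date falls inside the matching period.
old = permno_for_cusip("11111111", date(1999, 5, 1))
new = permno_for_cusip("22222222", date(2005, 5, 1))
```

The point of the sketch is that a CUSIP only identifies a security for a window of time, so the lookup needs a date as well as the identifier.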

Step 4

In a bit of a plot twist, Compustat doesn’t use the PERMNO Todd recommends. (At this point, you’re probably thinking “uhh why is this so complicated??” I really don’t know.)

  • To solve this, Todd explained how to use the Quarterly Update file in CRSP/COMPUSTAT Merged to get the linking table to go from PERMNO to GVKEY (the identifier used for SEC filings and in Compustat). 
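As a rough illustration of what using a linking table involves, the Python sketch below attaches a GVKEY to each IPO record and sets aside any record it cannot link. Everything here is hypothetical: the field names (`link_start`, `link_end`, `ipo_date`), the identifiers, and the assumption that each link is valid only between two dates. GVKEYs are kept as strings on the assumption that leading zeros matter.

```python
from datetime import date

# Hypothetical linking-table rows (PERMNO -> GVKEY); values are made up.
LINKS = [
    {"permno": 10001, "gvkey": "001234", "link_start": date(1995, 1, 1), "link_end": date(2010, 12, 31)},
    {"permno": 10002, "gvkey": "005678", "link_start": date(2000, 1, 1), "link_end": date(2010, 12, 31)},
]

# Hypothetical IPO records that already carry a PERMNO.
IPOS = [
    {"permno": 10001, "ipo_date": date(1999, 5, 1)},
    {"permno": 10002, "ipo_date": date(1998, 3, 1)},  # falls before its link starts
]

def add_gvkeys(ipos, links):
    """Split IPO records into those with a valid GVKEY link and those without."""
    matched, unmatched = [], []
    for ipo in ipos:
        gvkey = next(
            (link["gvkey"] for link in links
             if link["permno"] == ipo["permno"]
             and link["link_start"] <= ipo["ipo_date"] <= link["link_end"]),
            None,
        )
        if gvkey is None:
            unmatched.append(ipo)
        else:
            matched.append({**ipo, "gvkey": gvkey})
    return matched, unmatched

matched, unmatched = add_gvkeys(IPOS, LINKS)
```

Keeping the unmatched records in their own list makes it easy to see which companies fell through the link and need a closer look.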

Step 5

This GVKEY can then be used to get the data from Compustat. 
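Once both sides share an identifier, the merge itself is a plain join. The Python sketch below shows the idea with invented records; the field names (`ipo_year`, `fiscal_year`, `revenue`) are placeholders, not actual SDC or Compustat variable names.

```python
from collections import defaultdict

# Hypothetical IPO records, already linked to a GVKEY.
ipos = [
    {"gvkey": "001234", "ipo_year": 1999},
    {"gvkey": "009999", "ipo_year": 2003},  # no financial rows below
]

# Hypothetical financial-performance rows, one per company per year.
fundamentals = [
    {"gvkey": "001234", "fiscal_year": 1999, "revenue": 50.0},
    {"gvkey": "001234", "fiscal_year": 2000, "revenue": 65.0},
    {"gvkey": "005678", "fiscal_year": 2000, "revenue": 12.0},
]

# Index the financial rows by GVKEY so each IPO can find its rows quickly.
by_gvkey = defaultdict(list)
for row in fundamentals:
    by_gvkey[row["gvkey"]].append(row)

# One merged row per (IPO, financial row) pair that shares a GVKEY.
merged = [
    {**ipo, **row}
    for ipo in ipos
    for row in by_gvkey.get(ipo["gvkey"], [])
]

# IPOs with no financial data at all, worth checking by hand.
unmatched = [ipo["gvkey"] for ipo in ipos if ipo["gvkey"] not in by_gvkey]
```

In practice a researcher would do this in whatever statistical package they already use; the sketch only shows the shape of the result, with one row per company per year and the IPO details repeated on each.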

Take a breath, you made it to the end of that long, somewhat convoluted process. If you’re left wondering how anyone is supposed to know how to do this, me too.

What do I do with this information now? 

It became clear that I was now working, often as part of a team, on more time-consuming and complex questions. I typically have little prior knowledge of the topic, so I spend hours investigating and comparing sources, contacting vendors, digging into the literature to find alternative methods and sources, etc. (If you ever want an in-depth comparison of the variables available in the historical business data from Reference USA available in WRDS, in the historical files you can purchase from InfoGroup, and in the Reference USA platform, I have you covered.)

I’ve never been great at note taking, but I quickly realized I was going to need good notes that were easy to find. There are many ways to organize notes and create a personal knowledge base, so in case it helps others think through their needs, I wanted to share the system I’ve started to use (special thanks to my DARC team coworker Mason for helping me!). I’ve been using it for about 6 months now, and while I’m still figuring out some things, I am really happy with the general structure. 

I Decided to Use Notion

I ended up picking a relatively new software called Notion*, which describes itself as an “all-in-one workspace for note-taking, project management and task management.” I chose it primarily because it allows for relatively easy linking between various notes/records in a way I haven’t figured out how to do in other systems.

Can you show me how it works? 

Notion allows you to create workspaces. Each workspace can have multiple sections. The three sections of my knowledge base workspace are: 

  • Data Sources
  • Research Topics
  • IDs & Linking

Because of the way Notion is structured, the information doesn’t lend itself to screenshots. There are some screenshots to demonstrate the basic structure of my knowledge base, but I also recorded a quick video to show you how I have it set up. 

Here is what the Data Sources page looks like. I have used the database format so I can add structured information to each source/record. 

[Screenshot: the Data Sources page in database format]

(Keep reading to learn about those private fields.)

The Research Topics section is structured in the exact same way as the Data Sources.

The IDs & Linking section is of particular interest to my colleagues in DARC, as they work on a lot of faculty projects that involve linking various datasets. This is the section that is the most work-in-progress; right now it has a main page with links to the two subsections and some additional information and resources.

Who can see this information? 

Because I thought it was possible these notes would be helpful to other business librarians here and elsewhere, I wanted to make sure it was something I could potentially share. Notion allows for some limited access permissions. However, for the knowledge base to be helpful for me and others, while not making public any sensitive information, I have come up with the following (somewhat awkward) solution.

The knowledge base I’ve described “could be public,” meaning any information included could be shared without disclosing things that shouldn’t be shared. There is a separate “always will be private” section that only I have access to. In the private workspace, I have tables for helpful guides and resources (like BRASS Guides), other good things to know about (often taken from the BUSLIB-L list) but that don’t fit into the scope of the knowledge base**, a Consultations database (which has the notes I take during consultations), a CRM (linked to the Consultations database), and a private Data Sources database (linked to the ‘public’ version, but as a place to put information about who at Stanford uses it, price, issues we’ve had with vendors, etc). 

Of course, there are still challenges…

I need to work on how to best document where information came from. Some things I’ve learned through investigating the source myself, but others have come from the BUSLIB-L Listserv, many notes come from Todd Hines or other colleagues, etc. Because things change over time, I think it is important to be able to attribute when information was added as well as where it came from. Additionally, the way I’ve currently split up the public/private workspaces seems a bit awkward, so I’m open to other options.
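One way to think about the attribution problem is as a record with a source and a date attached to every note. The Python sketch below is only an illustration of those fields, not a description of how Notion stores anything; the class, the field names, and the example notes and dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Note:
    text: str
    source: str      # e.g. "own investigation", "BUSLIB-L", "colleague"
    added_on: date   # when the information was recorded
    tags: list = field(default_factory=list)

# Invented example notes.
notes = [
    Note("CUSIPs can change over time; PERMNOs do not.", "colleague",
         date(2020, 1, 15), ["IDs & Linking"]),
    Note("Placeholder detail about a vendor platform.", "BUSLIB-L",
         date(2019, 6, 1)),
]

def stale(notes, today, max_age_days=365):
    """Return notes older than `max_age_days`, which may need re-checking."""
    return [n for n in notes if (today - n.added_on).days > max_age_days]

old_notes = stale(notes, today=date(2020, 9, 1))
```

With a date and a source on each note, it becomes possible to list what is overdue for review and to know who or what to go back to when checking it.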

I still have so much to learn, but have found that having a structured way to document my notes has really helped me to retain what I’m learning as well as have a great reference source when I forget specific details the next time I get a similar question. It does take some time to add my notes in a concise and organized way, but so far it has been paying off.

Right now I’m working on continuing to add notes and information as I have time and as I work on reference questions. If you think this would be helpful to you or have other comments or feedback, please get in touch.

*If you’re interested in signing up for a Notion account and use my referral link, I get a small credit towards my next year’s subscription.

**Note: I intentionally decided to only spend time documenting the ‘data-y’ stuff. So more typical MBA-type questions around company, industry, and market research don’t get included (with the exception that I might make some informal notes in one of the private workspaces I use to save links to helpful reference sites, etc.). The focus of the knowledge base is data-intensive business sources and topics, so anything in the bulk-data realm, typically of most interest or relevance to PhD and faculty research projects.

Header photo by Patrick Perkins on Unsplash
