Getting Started: Data Preparation for Inguo

Before you upload your data to Inguo, make sure it is formatted the way our system expects.


Number of Variables

Inguo can process up to 500 variables, though in practice you are unlikely to need that many.

In most cases, each question becomes one variable. However, some multiple choice questions (like selecting brands of peanut butter you’ve consumed recently) are converted into several binary variables in Inguo, so count those extra variables when calculating your minimum sample size.
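If you want to estimate your variable count before uploading, the Python sketch below counts each ordinary question as one variable and each multi-select question as one variable per answer option. It assumes Inguo creates exactly one binary variable per option; the question names and the `count_variables` helper are hypothetical.

```python
def count_variables(questions):
    """Count variables, expanding multi-select questions to one variable per option."""
    total = 0
    for options in questions.values():
        # options is None for a question that becomes a single variable,
        # or a list of answer options for a multi-select question (assumed
        # to become one binary variable per option).
        total += len(options) if options else 1
    return total

# Hypothetical survey: two single-variable questions and one multi-select question.
survey = {
    "age": None,
    "household_income": None,
    "peanut_butter_brands": ["Brand_A", "Brand_B", "Brand_C"],
}
print(count_variables(survey))  # 5
```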


Sample Size

We do not need “big data” sample sizes. In fact, we encourage you to prioritize quality over quantity.

If your dataset has X variables, your sample size should be at least 10 times X. For example, if you have 20 variables, you need a minimum sample size of 200.
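The 10x rule is simple arithmetic, but a quick check before upload can save a round trip. This is a minimal Python sketch; `minimum_sample_size` and `has_enough_samples` are hypothetical helpers, not part of Inguo.

```python
MIN_SAMPLES_PER_VARIABLE = 10  # the "at least 10 times" rule

def minimum_sample_size(num_variables):
    """Smallest sample size that satisfies the 10x rule."""
    return MIN_SAMPLES_PER_VARIABLE * num_variables

def has_enough_samples(num_rows, num_variables):
    """True if a dataset with num_rows responses meets the 10x rule."""
    return num_rows >= minimum_sample_size(num_variables)

print(minimum_sample_size(20))      # 200
print(has_enough_samples(150, 20))  # False
```

Remember to pass the variable count after multi-select questions have been expanded into their binary variables.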


Remove Irrelevant Factors

Upload only the data needed for your causal analysis and nothing else.

Examples of factors to be removed:

  • Private data – we don’t need names, email addresses, phone numbers, Social Security numbers, etc. In fact, our Terms of Service prohibit uploading these items due to privacy concerns.
  • Administrative details – we also don’t need the fielding start or end date, respondent IP addresses, submission numbers, submission dates and times, etc. If a field has no bearing on cause and effect, it doesn’t belong in the analysis.
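One way to make sure nothing extra slips through is to keep an explicit list of the columns you want, rather than deleting the ones you don’t. The sketch below assumes your responses are loaded as a list of dictionaries (one per respondent); the column names and the `keep_analysis_columns` helper are hypothetical.

```python
def keep_analysis_columns(rows, analysis_columns):
    """Return copies of rows containing only the listed analysis columns.

    Raises KeyError if a listed column is missing from a row.
    """
    return [{col: row[col] for col in analysis_columns} for row in rows]

# Hypothetical raw export with private and administrative fields mixed in.
raw_rows = [
    {"name": "Ana", "email": "ana@example.com", "ip_address": "203.0.113.5",
     "household_income": 2, "Region_North": 1},
]
clean_rows = keep_analysis_columns(raw_rows, ["household_income", "Region_North"])
print(clean_rows)  # [{'household_income': 2, 'Region_North': 1}]
```

With an allowlist like this, any new private or administrative field that appears in a future survey export is left out by default, and a missing analysis column fails loudly instead of silently.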


Convert Text to Numeric Values

All data must be numeric, and each variable must be either continuous or binary. Text answers need to be converted before upload.
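After converting your text answers, you can confirm each column is ready with a check like this minimal sketch. It treats a column as binary if every value is 0 or 1 and as continuous if the values are other numbers; that classification rule and the `classify_column` helper are our assumptions for illustration, not Inguo’s own validation.

```python
from numbers import Real

def classify_column(values):
    """Return "binary" or "continuous"; raise ValueError on non-numeric data."""
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly
        # and recode True/False as 1/0 before upload.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"non-numeric value {value!r}; convert text to numbers first")
    # Hypothetical rule: only 0s and 1s means binary, other numbers mean continuous.
    return "binary" if set(values) <= {0, 1} else "continuous"

print(classify_column([0, 1, 1, 0]))  # binary
print(classify_column([0, 2, 4, 3]))  # continuous
```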

Example 1: Text-based multiple choice questions whose answers have a natural order, such as ranges, should be converted to continuous values:

Household income range converted to continuous data:

  • $0-9,999 → 0
  • $10,000-24,999 → 1
  • $25,000-49,999 → 2
  • $50,000-74,999 → 3
  • $75,000+ → 4
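A lookup table is an easy way to apply the income coding above. This sketch assumes your export uses exactly these label strings; `INCOME_CODES` and `encode_income` are hypothetical names.

```python
# Ordered codes from the household income example; the label strings
# are assumed to match your survey export exactly.
INCOME_CODES = {
    "$0-9,999": 0,
    "$10,000-24,999": 1,
    "$25,000-49,999": 2,
    "$50,000-74,999": 3,
    "$75,000+": 4,
}

def encode_income(label):
    """Map an income range label to its code; unknown labels raise KeyError."""
    return INCOME_CODES[label.strip()]

answers = ["$25,000-49,999", "$0-9,999", "$75,000+"]
print([encode_income(a) for a in answers])  # [2, 0, 4]
```

Unknown labels raise a `KeyError` rather than silently receiving a code. Keep in mind that the codes 0 to 4 are evenly spaced even though the income bands are not equally wide.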

Example 2: Text-based multiple choice questions without a natural order should be converted to binary variables, one for each answer option:

Region data converted to binary:

Region: {North, East, South, …}

  • Region_North : {0,1}
  • Region_East : {0,1}
  • Region_South : {0,1}
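The region conversion above is often called one-hot encoding. Here is a minimal sketch, assuming rows stored as dictionaries and a known list of categories; the `one_hot` helper is hypothetical.

```python
def one_hot(rows, column, categories):
    """Replace a text column with one 0/1 column per category, e.g. Region_North."""
    encoded = []
    for row in rows:
        value = row[column]
        if value not in categories:
            # Fail loudly instead of silently writing all zeros.
            raise ValueError(f"unexpected {column} value {value!r}")
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if value == category else 0
        encoded.append(new_row)
    return encoded

rows = [{"Region": "North", "age": 34}, {"Region": "South", "age": 51}]
encoded = one_hot(rows, "Region", ["North", "East", "South"])
print(encoded[1])  # {'age': 51, 'Region_North': 0, 'Region_East': 0, 'Region_South': 1}
```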


Final Details

The devil is in the details:

  • Variable names cannot contain spaces. Remove the spaces or replace them with underscores; do not use hyphens.
  • Alternatively, you can choose to have spaces removed from your variable names automatically during the upload process.
  • You can also give each variable a display name, which labels its node on the graph, so you can easily tell which factor or question each node represents.
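If you prefer to clean names yourself before upload, this sketch replaces each run of spaces (and, assuming hyphens should be avoided as well, hyphens) with a single underscore. It keeps the original wording alongside each cleaned name, which you may find handy when choosing display names. The `clean_name` and `clean_names` helpers are hypothetical.

```python
import re

def clean_name(name):
    """Trim, then replace each run of whitespace or hyphens with one underscore."""
    return re.sub(r"[\s-]+", "_", name.strip())

def clean_names(original_names):
    """Return {cleaned_name: original_name}, rejecting names that collide."""
    mapping = {}
    for original in original_names:
        cleaned = clean_name(original)
        if cleaned in mapping:
            raise ValueError(f"{original!r} and {mapping[cleaned]!r} both become {cleaned!r}")
        mapping[cleaned] = original
    return mapping

names = clean_names(["Household income", "Pre-purchase intent"])
print(names)
# {'Household_income': 'Household income', 'Pre_purchase_intent': 'Pre-purchase intent'}
```

The collision check matters because two different questions, such as “a b” and “a-b”, would otherwise end up with the same variable name.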

