In this project, I’ll predict a dancer’s income in five different cities in the U.S. by statistical modeling with various data. Please find many interesting plots and findings from my source codes I attached in each analysis step here!

Introduction

Motivation

I’m looking for the best metropolitan area in the U.S. where I can be a belly dancer. The most interesting places include LA, San Francisco, NY, Seattle, and Chicago. To decide where to live, I’d like to consider the potential income of opening a dance studio or being employed.

Situation

As I started to search for such information, I quickly found out that there are no reliable data on belly dancers’ income at all. I couldn’t find any reliable source about the number of local belly dance studios and their income levels. Therefore, alternatively, I decided to explore statistics about general dancers as a reference. However, there were still difficulties.

What I eventually want to know is the expected income level depending on the city, such as the average revenue of a dance studio in each metropolitan area. However, I couldn’t find any local revenue data of dance studio or any other similar business. After digging for relevent data of local business information, here’s what I’ve found.

Available data

Local (per metropolitan area) data

  • Employee statistics provided by the U.S. government: Average wage and population of dancer, choreographer, fitness instructor, and recreational workers of each city
  • Census and post estimate population data provided by the U.S. government
  • Scaling factors: Consumer Price Index (CPI), Cost Of Living Index (COLI)

Nationwide data

  • Employee statistics provided by the U.S. government: Average wage and population of dancer, choreographer, fitness instructor, and recreational workers of the entire country
  • The total U.S. revenue/market size data of fitness and dance studio businesses
  • The number of dance studio and their employee population and wage of the entire country.

Data quality of employed dancer statistics

Originally I tried to model the local dance studio’s revenue prediction using the local employment statistics of dancers. However, dancers’ data quality was bad, with huge statistical uncertainties with a lot of missing data points. On the other hand, employed fitness workers’ data quality was much better. Therefore, I decided to utilize the local fitness employee data than those of dancers.



Modeling Idea

After explore these data, I made a revenue prediction model as follows.

Notation

  • \(R\): total revenue
  • \(E\): the number of employee
  • \(W\): average annual wage
  • \(N\): the number of businesses
  • \(var_{biz}^{place}\): \(var\) value of \(biz\) business in \(place\) area
    • \(biz\):
      • \(Studio\): dance studio businesses
      • \(Gym\): fitness businesses
      • \(All~jobs\): all businesses of employed residents
    • \(place\):
      • \(U.S.\): nationwide
      • \(City~name\): local area

Definition

  • \(V\): Employment volume, \(E \times W\)
  • \(r_{biz}\): Business ratio, the revenue ratio of dance studio business to fitness business, \(R_{Studio}/R_{Gym}\)
  • \(g_{rev}\): Revenue generation constant, the business revenue divided by its employment volume, \(R/V\)
  • \(\rho_{studio}\): Studio density, the number of dance studio per resident employment volume (of all jobs), \(N_{Studio}/V_{All~jobs}\)

Formula to get a local dance studio revenue

\[R_{Studio}^{City} = \frac{r_{biz} ~ g_{rev} ~ V_{Gym}^{City}}{\rho_{density} ~ V_{All~jobs}^{City}}\]

I explained how I constructed this formula in my source codes. The analysis step and the link to the source code is in the below.

Analysis steps

Please follow each step checking my Jupyter notebook. There are many interesting plots and findings. It is fun!

1.Prepare dataset and EDA

  • Collect various data sources and merge
  • Data cleaning and feature engineering
  • EDA to check data quality and trend

Data preperation

EDA and feature engineering of area dependent features

EDA market trend

2. Calculate constants using the nationwide statistics

Each feature in the following step, was selected by goodness-of-fit test. Details are explained in the Jupyter Notebook.

  1. Calculate \(r_{biz}\) from the nationwide dance studio revenue and fitness revenue data.

    \[R_{Studio}^{U.S.} = r_{biz} R_{Gym}^{U.S.}\]
  2. Calculate \(\rho_{studio}\) from the nationwide dance studio and all resident employment volume.

    \[N_{Studio} = \rho_{studio} V_{All~jobs}^{U.S.}\]
  3. Calculate \(g_{rev}\) from the nationwide fitness revenue and fitness employment data.

    \[R_{Gym}^{U.S.} = g_{rev} V_{Gym}^{U.S.} = g_{rev} E_{Gym}^{U.S.}W_{Gym}^{U.S.}\]

Select feature and calculate constants

3. Predict local (each city) numbers using the constants above and the local statistics

  1. Predict the local fitness revenue using the local fitness employment data and \(g_{rev}\).

    \[R_{Gym}^{City} = g_{rev} V_{Gym}^{City}\]
  2. Predict the local dance studio revenue using the predicted local fitness revenue and \(r_{biz}\)

    \[R_{Studio}^{City} = r_{biz} R_{Gym}^{City}\]
  3. Predict the number of dance studio in local area using the local resident employment volume and \(\rho_{studio}\)

    \[N_{Studio}^{City} = \rho_{studio} V_{All~jobs}^{City}\]
  4. Predict the average revenue of a local dance studio using the result from 2 and 3

    \[Final~result~= R_{Studio}^{City}/N_{Studio}^{City}\]

Result and discussion

Conclusion

Here’s summary from the project.

Revenue trend and modeling constant

  • Both of fitness and dance studio businesses are growing.
  • The market size of fitness industry is proportianal to that of dance studio, 7-8 times larger for recent decade.
  • The fitness employment volume is proportional to its market size, 60-70% of the revenue.
  • The number of dance studio is proportional to the size of the employed residents’ income.
  • Those proportionalities are assumed to be area independent in this modeling and used to predict the revenue of a dance studio of each area. The only area dependent data were the size of fitness employment volume and the size of the employed residents’ income, which can be found from the government wage statistics.

To improve

  • Definitely, any local revenue or number of businesses data will be very useful to improve this model. That will also help me a lot to decide where to live. I would choose an area with low [fitness employee volume]/[number of dance studio].
  • This analysis gave me interesting findings. I’d like to analyze more area, too!
  • Generally, all trends were changed since Covid19. Keep that in mind.

Choose area

  • Seattle has the best wage for dance and fitness industry workers, considering the cost of living index of 2010. This number need adjustment for newer value.
  • The resident volume growth is also high in Seattle. Although the revenue of dance studio might grow accordingly, there’s potential to bring more competitors as well. However, the growth in Seattle could be from tech companies’ growth, who are not competitors of dance studio industry. They could be rather potential customers, who are interested in fitness.
  • I’m interested in to do this analyis for more places.
  • Meanwhile, importantly, income is not the only factor you will consider. You might be interested in other factors considering your career goal, such as where best dancers perform, which cities are good for artists, or which cities have more female population in ages 20-60s, who will willing to spend on both practicing and watching dance.

Choose between similar jobs

  • Running a dance studio is not expected to give significantly more income compared to other employed jobs. Therefore, I wouldn’t recommend running a studio from the beginning.
  • I’d rather recommend to start to run a dance class as an employed choreographer or recreational worker.

How would I run a dance studio

  • If I were a dance studio owner, I would run the dance class program by myself and hire a few recreational workers to run fitness programs because there are more demand for fitness businesses, and recreational worker’s salary is lower than dancer’s salary.

Leave a comment