The Representation
of Women in Bollywood

I have always loved films, but as I grew older I started seeking out films with more women, about women and by women, because it felt as though mainstream Bollywood did not have enough of them. This data story is an exploration of that sentiment.

“Men act and women appear. Men look at women. Women watch themselves being looked at. This determines not only most relations between men and women but also the relation of women to themselves. The surveyor of woman in herself is male: the surveyed female. Thus she turns herself into an object -- and most particularly an object of vision: a sight”

- John Berger, Ways of Seeing

It's obviously not that there aren’t any films about women in Bollywood. There are many big hits with women, but usually as a co-lead and in a romantic setting.

And again, there's nothing wrong with romance but don't we deserve better, complex, multidimensional narratives? After all, we do make up 50% of the population.

So the idea is to put into perspective how few mainstream films have a female lead, and to explore how central women are in co-led films.

I looked at films released between 2009 and 2019 because that was the time during which I grew up watching them.

The Big Picture:

Who Leads Bollywood?

Let's take a look at the 25 top-grossing films of each year from 2009 to 2019.

Out of 250 films (top 25 of each year), 21 were female-led, which is just 8%.

While there has been an increase in female-led films, the overall percentage per year is still very low.

Zooming in: Seen
but Not Heard

I watched the top 3 films of 2009 and 2018 and timestamped the on-screen and dialogue time of the female protagonists.

[Chart: female protagonists' On-screen Time and Dialogue Time shown against each film's Total Runtime, one panel per film]

Male-led: On-screen Time 17%, Dialogue Time 6%
Co-led: On-screen Time 58%, Dialogue Time 9%
Co-led: On-screen Time 46%, Dialogue Time 8%
Male-led: On-screen Time 23%, Dialogue Time 6%
Co-led: On-screen Time 32%, Dialogue Time 5%
Male-led: On-screen Time 13%, Dialogue Time 3%

Even if a film is co-led, it doesn't necessarily translate into more screen or dialogue time for the female protagonists.

How Central are Women

to the Story?

To further visualise how central female characters are in a story, I compared the scripts of 2 films.

3 Idiots, 2009, Male-led

Queen, 2013, Female-led

Each circle represents a character in the film. Larger circles indicate characters with greater narrative importance and screen presence. The lines between characters represent interactions; thicker lines mean more frequent connections.

The contrast between the two films is stark. It shows how different a female-led film can look, and how little centrality female characters have in male-led films.

Top 10 films of the decade.

7 male-led films, 3 co-led films, 0 female-led films.

The lack of women is also evident in the posters of the top films, which play an important role in marketing.

[Posters of the top 10 films, labelled in order: Male-led, Male-led, Co-led, Male-led, Co-led, Male-led, Male-led, Co-led, Male-led, Male-led]

I want to end with a quote from one of my favourite scenes in TV:

“Our collective blindness has caused a lot of harm. We controlled so much, meddled so much, and to what end?…..


My fear, though, is that the world is as it always was, and I just didn’t see it. That a lot of us didn’t see it. Us, men.”

- Abe Weissman, a character in 'The Marvelous Mrs. Maisel'

Notes on my Process

Datasets used:
- https://www.kaggle.com/datasets/slmsshk/bollywood-release-movie-dataset

- https://www.kaggle.com/datasets/thedevastator/timdb-bollywood-films


I have always been interested in data and datavis (especially after reading 'Whole Numbers and Half Truths' and 'Invisible Women') but had only dabbled in creating one-pagers. I was intimidated to go further. I always had the resources to learn but sometimes you just need that spark, you know? Recently, I took Gurman Bhatia's course on 'The Craft of Building Stories with Data'. In one of the modules she talked about a story she did on how female voices have decreased in Bollywood and, as someone who loves films, I immediately thought of what a story would look like in that context.


I knew I also wanted to look at Bollywood films because I've had a love-hate relationship with them; I grew up loving them but felt disconnected as I got older. I started noticing how few films there were in the mainstream about women and stories from their perspectives.


I started by refining my hypothesis; 'lack of women in Bollywood' was not enough. Hence, I took the top-grossing films of each year, since a high gross attests to how mainstream/popular a film is. And since this story is personal, I chose the timeline of 2009-2019, the time during which I was watching a lot of Bollywood films.
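
For anyone wanting to reproduce the selection step, here is a minimal sketch of picking the top-grossing films per year. It is only an illustration: the field names (`title`, `year`, `gross`) and the rows are hypothetical placeholders, not the actual columns or values of the Kaggle datasets.

```python
from collections import defaultdict


def top_n_per_year(films, n=25):
    """Return {year: films sorted by gross, highest first, at most n per year}."""
    by_year = defaultdict(list)
    for film in films:
        by_year[film["year"]].append(film)
    return {
        year: sorted(rows, key=lambda f: f["gross"], reverse=True)[:n]
        for year, rows in sorted(by_year.items())
    }


# Hypothetical rows: titles and grosses are placeholders, not real data.
films = [
    {"title": "Film A", "year": 2009, "gross": 120},
    {"title": "Film B", "year": 2009, "gross": 300},
    {"title": "Film C", "year": 2009, "gross": 45},
    {"title": "Film D", "year": 2010, "gross": 80},
]

top = top_n_per_year(films, n=2)
for year, rows in top.items():
    print(year, [f["title"] for f in rows])
```

With real data, `n` would stay at its default of 25 to match the cut-off used in this story.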


My first problem was figuring out how to define if the film is centred around a woman. Fortunately, the dataset had a column for the plot summary, so I used AI to categorise them into male, female and co-led films based on whether the summary included women and if it was centred around them (plus cross checked them myself). I know this might not be enough but it seemed like the only option. Would love to know if I could have gone about it some other way and how I could have avoided using AI.
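
The cross-checking and counting described above could be sketched roughly like this. The labels and titles are made-up placeholders, and the two helper functions are hypothetical names for illustration, not the tooling actually used for this story.

```python
from collections import Counter


def disagreements(ai_labels, manual_labels):
    """Titles where the AI label differs from the manually checked label."""
    return sorted(t for t in ai_labels if ai_labels[t] != manual_labels.get(t))


def label_shares(labels):
    """Percentage of films in each category, rounded to one decimal place."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


# Hypothetical labels: placeholders, not the real categorisation.
ai_labels = {
    "Film A": "male-led",
    "Film B": "co-led",
    "Film C": "female-led",
    "Film D": "male-led",
}
manual_labels = {
    "Film A": "male-led",
    "Film B": "male-led",
    "Film C": "female-led",
    "Film D": "male-led",
}

print(disagreements(ai_labels, manual_labels))  # films to re-check by hand
print(label_shares(manual_labels))              # share of each category
```

The same arithmetic applied to the story's own figures gives 21 out of 250, which is 8.4% and is shown as 8% above.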


But it still felt like it was not enough to look at who the film was led by because it's more complicated. That's why I decided to personally watch the top 3 films of 2009 and 2018 so I could timestamp the female characters' on-screen and dialogue time. It would have been great to do this on a larger scale. Maybe I actually will.
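
If the timestamps are logged as start and end times, turning them into a share of the runtime is simple arithmetic. The sketch below assumes an 'H:MM:SS' format and merges overlapping intervals so that no second is counted twice. All the values are hypothetical placeholders, not the measurements from this story.

```python
def to_seconds(stamp):
    """Convert 'H:MM:SS' or 'MM:SS' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def share_of_runtime(intervals, runtime):
    """Percentage of the runtime covered by (start, end) intervals."""
    spans = sorted((to_seconds(s), to_seconds(e)) for s, e in intervals)
    covered = 0
    current_start, current_end = None, None
    for start, end in spans:
        if current_end is None or start > current_end:
            # A new, separate stretch begins: bank the previous one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current stretch: extend it instead of double counting.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return 100 * covered / to_seconds(runtime)


# Hypothetical log for one film (placeholder values).
on_screen = [("0:05:00", "0:10:00"), ("0:08:00", "0:12:00"), ("1:00:00", "1:05:00")]
dialogue = [("0:06:00", "0:07:30")]
runtime = "2:00:00"

print(f"On-screen: {share_of_runtime(on_screen, runtime):.0f}%")
print(f"Dialogue: {share_of_runtime(dialogue, runtime):.0f}%")
```

Keeping the raw intervals rather than only the totals would also make it easier to scale this up to more films later.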


I also wanted to visualise this lack of centrality of female characters which is what led me to make a network diagram. Also used AI here to analyse the 2 scripts (also cross checked it myself). The constraint here was the lack of scripts; I only found a handful out of 250 films. Again, would love to know how I could go about this in some other way or without AI.
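
One possible way to get the numbers behind a network diagram without AI is to list which characters appear in each scene and count from there. In this sketch, circle size is assumed to be the number of scenes a character appears in, and line thickness the number of scenes two characters share. That is a simplification of "narrative importance", and the scene list is a hypothetical placeholder rather than data from either script.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene list: each scene is the set of characters appearing in it.
scenes = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B"},
]

node_size = Counter()    # scenes per character -> circle size
edge_weight = Counter()  # shared scenes per pair -> line thickness

for cast in scenes:
    node_size.update(cast)
    # Sort the names so each pair is always stored in the same order.
    for pair in combinations(sorted(cast), 2):
        edge_weight[pair] += 1

for name, count in sorted(node_size.items()):
    print(name, count)
for (a, b), weight in sorted(edge_weight.items()):
    print(f"{a} - {b}: {weight}")
```

The two counters can then be handed to any graph-drawing tool to set the circle sizes and line widths.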


Overall, I genuinely enjoyed the entire process of making this SO much, even the seemingly boring parts. And I know this is not perfect by any means, so I would genuinely appreciate any thoughts on how it can be improved.


I want to keep learning and hopefully do larger, more complex projects!


PS: found these 2 amazing references after I published this piece. They are so much more detailed and use varied methodologies. In retrospect I should have spent more time on secondary research.

Thanks for reading!

See you soon :)) (hopefully)