




















The Representation of Women in Bollywood
I have always loved films, but as I grew older I started seeking out new films with more women, about women and by women, because it felt as though mainstream Bollywood did not have enough of them. This data story is an exploration of that sentiment.




























“Men act and women appear. Men look at women. Women watch themselves being looked at. This determines not only most relations between men and women but also the relation of women to themselves. The surveyor of woman in herself is male: the surveyed female. Thus she turns herself into an object -- and most particularly an object of vision: a sight”
- John Berger, Ways of Seeing
It's not that there aren't any films about women in Bollywood. There are many big hits with women, but usually as a co-lead and in a romantic setting.
And again, there's nothing wrong with romance but don't we deserve better, complex, multidimensional narratives? After all, we do make up 50% of the population.
So the idea is to put into perspective how few mainstream films have a female lead, and to explore how central women are in co-led films.
I looked at the films between 2009 and 2019 because that was the time during which I grew up watching them.

The Big Picture:
Who Leads Bollywood?
Let's take a look at the 25 top-grossing films of each year from 2009 to 2019.
Out of 250 films (top 25 of each year), 21 were female-led, which is just 8%.
While there has been an increase in female-led films, the overall percentage per year is still very low.



Zooming in: Seen
but Not Heard
I watched the top 3 films of 2009 and 2018 and timestamped the on-screen and dialogue time for the female protagonists.

[Chart: on-screen and dialogue time of the female protagonist as a share of total runtime, one card per film]
Male-led: on-screen time 17%, dialogue time 6%
Co-led: on-screen time 58%, dialogue time 9%
Co-led: on-screen time 46%, dialogue time 8%
Male-led: on-screen time 23%, dialogue time 6%
Co-led: on-screen time 32%, dialogue time 5%
Male-led: on-screen time 13%, dialogue time 3%

Even if a film is co-led, it doesn't necessarily translate into more screen or dialogue time for the female protagonists.




How Central are Women
to the Story?
To further visualise how central female characters are in a story, I compared the scripts of 2 films:
3 Idiots, 2009, Male-led
Queen, 2013, Female-led
Each circle represents a character in the film. Larger circles indicate characters with greater narrative importance and screen presence. The lines between characters represent interactions; thicker lines mean more frequent connections.
The contrast between the two films is stark. It shows how different a female-led film can look, and how far from the centre female characters sit in male-led films.
Top 10 films of the decade.
7 male-led films, 3 co-led films, 0 female-led films.
The lack of women is also evident in the posters of the top films, which play an important role in marketing.

[Posters of the top 10 films, as labelled: Male-led, Male-led, Co-led, Male-led, Co-led, Male-led, Male-led, Co-led, Male-led, Male-led]

I want to end with a quote from one of my favourite scenes on TV:
“Our collective blindness has caused a lot of harm. We controlled so much, meddled so much, and to what end?…..
My fear, though, is that the world is as it always was, and I just didn’t see it. That a lot of us didn’t see it. Us, men.”
- Abe Weissman, a character on 'Marvelous Mrs Maisel'
Notes on my Process
Datasets used:
- https://www.kaggle.com/datasets/slmsshk/bollywood-release-movie-dataset
- https://www.kaggle.com/datasets/thedevastator/timdb-bollywood-films
I have always been interested in data and datavis (especially after reading 'Whole Numbers and Half Truths' and 'Invisible Women') but only dabbled in creating one-pagers. I was intimidated to go further. I always had the resources to learn but sometimes you just need that spark, you know? Recently, I took Gurman Bhatia's course on 'The Craft of Building Stories with Data'. In one of the modules she talked about a story she did on how female voices have decreased in Bollywood, and as someone who loves films I immediately thought of what a story would look like in that context.
I knew I also wanted to look at Bollywood films because I've had a love-hate relationship with them; I grew up loving them but felt disconnected as I got older. I started noticing how few films there were in the mainstream about women and stories from their perspectives.
I started by refining my hypothesis; 'lack of women in Bollywood' was not specific enough. Hence, I took the top-grossing films of each year, which attests to how mainstream/popular they are. And since this story is personal, I chose the timeline of 2009-2019, the time during which I was watching a lot of Bollywood films.
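For anyone who wants to try the selection step themselves, here is a minimal Python sketch of picking the top-grossing films per year. The rows and the field names (`title`, `year`, `gross`) are made up for illustration and are not the columns of the Kaggle datasets, so treat it as one possible approach rather than the exact steps used for this story.

```python
from collections import defaultdict

def top_n_per_year(films, n=25):
    """Group films by year and keep the n highest-grossing films of each year."""
    by_year = defaultdict(list)
    for film in films:
        by_year[film["year"]].append(film)
    return {
        year: sorted(rows, key=lambda f: f["gross"], reverse=True)[:n]
        for year, rows in by_year.items()
    }

# Hypothetical rows, only to show the shape of the data.
films = [
    {"title": "Film A", "year": 2009, "gross": 120.0},
    {"title": "Film B", "year": 2009, "gross": 300.0},
    {"title": "Film C", "year": 2009, "gross": 80.0},
    {"title": "Film D", "year": 2018, "gross": 210.0},
]

top = top_n_per_year(films, n=2)
print([f["title"] for f in top[2009]])  # ['Film B', 'Film A']
```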
My first problem was figuring out how to define if the film is centred around a woman. Fortunately, the dataset had a column for the plot summary, so I used AI to categorise them into male, female and co-led films based on whether the summary included women and if it was centred around them (plus cross checked them myself). I know this might not be enough but it seemed like the only option. Would love to know if I could have gone about it some other way and how I could have avoided using AI.
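Once every film has a label, the headline number is simple arithmetic. The sketch below is a hedged illustration: the 21 and 250 are the counts reported in this piece, while the `"other"` label and the per-year rows are placeholders invented for the example (the split between male-led and co-led films is not assumed here).

```python
from collections import Counter

def category_share(labels, category):
    """Percentage of labels equal to `category` (0.0 for an empty list)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 100 * counts[category] / total if total else 0.0

def share_by_year(rows, category):
    """Map each year to the percentage of its films labelled `category`."""
    years = sorted({year for year, _ in rows})
    return {
        year: category_share([lead for y, lead in rows if y == year], category)
        for year in years
    }

# 21 female-led films out of 250, as reported in the story.
labels = ["female"] * 21 + ["other"] * 229
print(round(category_share(labels, "female"), 1))  # 8.4

# Hypothetical (year, label) rows to show the per-year breakdown.
rows = [(2009, "male"), (2009, "male"), (2009, "co"), (2009, "female"),
        (2018, "male"), (2018, "female"), (2018, "female"), (2018, "co")]
print(share_by_year(rows, "female"))  # {2009: 25.0, 2018: 50.0}
```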
But it still felt like it was not enough to look at who the film was led by because it's more complicated. That's why I decided to personally watch the top 3 films of 2009 and 2018 so I could timestamp the female characters' on-screen and dialogue time. It would have been great to do this on a larger scale. Maybe I actually will.
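Turning hand-logged timestamps into a percentage of runtime could look something like this. It is only a sketch under assumptions: the `HH:MM:SS` format, the interval lists and the two-hour runtime are all hypothetical, and overlapping intervals are merged so that no second is counted twice.

```python
def to_seconds(stamp):
    """Convert an 'HH:MM:SS' timestamp to seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def covered_seconds(intervals):
    """Total seconds covered by (start, end) timestamps, merging overlaps."""
    spans = sorted((to_seconds(a), to_seconds(b)) for a, b in intervals)
    total = 0
    current_start = current_end = None
    for start, end in spans:
        if current_end is None or start > current_end:
            # Close the previous merged span before starting a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

def share_of_runtime(intervals, runtime):
    """Percentage of the film's runtime covered by the intervals."""
    return 100 * covered_seconds(intervals) / to_seconds(runtime)

# Hypothetical timestamps for one film with a two-hour runtime.
on_screen = [("00:05:00", "00:10:00"), ("00:08:00", "00:12:00"),
             ("01:00:00", "01:03:00")]
dialogue = [("00:06:00", "00:07:00")]

print(round(share_of_runtime(on_screen, "02:00:00"), 1))  # 8.3
print(round(share_of_runtime(dialogue, "02:00:00"), 1))   # 0.8
```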
I also wanted to visualise this lack of centrality of female characters which is what led me to make a network diagram. Also used AI here to analyse the 2 scripts (also cross checked it myself). The constraint here was the lack of scripts; I only found a handful out of 250 films. Again, would love to know how I could go about this in some other way or without AI.
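One way to build such a network without AI, assuming a script has already been broken into scenes with a list of characters per scene, is to count appearances and co-appearances. This is a rough sketch and not the method used for the diagrams above: the scene lists are invented, scene counts stand in for "narrative importance", and sharing a scene stands in for "interaction".

```python
from collections import Counter
from itertools import combinations

def build_network(scenes):
    """Return (node_weight, edge_weight) counters from per-scene character lists.

    node_weight: number of scenes each character appears in (circle size).
    edge_weight: number of scenes each pair shares (line thickness).
    """
    node_weight = Counter()
    edge_weight = Counter()
    for characters in scenes:
        present = sorted(set(characters))  # ignore repeats within a scene
        node_weight.update(present)
        edge_weight.update(combinations(present, 2))
    return node_weight, edge_weight

# Hypothetical scenes; "A", "B" and "C" are placeholder character names.
scenes = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
nodes, edges = build_network(scenes)

print(nodes["A"], nodes["B"], nodes["C"])  # 3 3 2
print(edges[("A", "B")], edges[("A", "C")], edges[("B", "C")])  # 2 1 2
```

The two counters can then be handed to any graph-drawing tool to size the circles and lines.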
Overall, I genuinely enjoyed the entire process of making this SO much, even the seemingly boring parts. And I know this is not perfect by any means, so I would genuinely appreciate any thoughts on how it can be improved.
I want to keep learning and hopefully do larger, more complex projects!
PS: found these 2 amazing references after I published this piece. They are so much more detailed and use varied methodologies. In retrospect I should have spent more time on secondary research.

Thanks for reading!
See you soon :)) (hopefully)