The pitfalls of data collection
Everyone loves a freebie, from swag bags to the humble pen, but the lengths some go to are astounding.
As the resident (chartered) accountant and business consultant at F&A, I naturally inherit the job of data collation. Unfortunately, I need to “cleanse” the sample before I can analyse the data. And by “cleanse” I mean remove all those blatantly not in the profession but trying it on in the hope their name will be called in the £100 gift voucher draw. Critically, I don’t want to exclude anyone genuinely part of the Intellectual Property profession – this is a voluntary survey in a niche sector so the pool is decidedly small, and I can’t afford to lose any bona fide contributors.
So how do I know who’s legit and who’s pulling a fast one, you ask, given much of the survey is multiple choice? Well, I check for inconsistencies, unusual names (tricky, given the survey is international and names that are perfectly ordinary elsewhere in the world may seem unusual to me), and “random” email accounts. I tend to start with the last of these. The survey is completely anonymous and contributors can choose whether or not to leave their contact details. Those who do overwhelmingly prefer to use a private Gmail / Hotmail account rather than a work address, most likely because they like to keep their work and private lives separate. As such, I need to be a little more discerning than simply excluding all such accounts. I also can’t reject email addresses with numbers in them. I’m sure we’ve all experienced the frustration of discovering that the email address we’ve chosen is not quite unique enough: you’re the 29th person to request it, so you add 29 to the end of your chosen name, or in some cases your birth year. No, it’s more subtle than that. Numbers are one of my triggers: a flag goes up and I then look for further inconsistencies across that individual’s submission. The same goes for emails that are a series of random letters – who would truly choose such an email address, besides a con artist, that is?
I also like to compare the name presented against the email address to confirm that the name, or some part or combination of it, appears in the address. Another control is to check for obviously male names where the respondent has marked themselves as female, or vice versa. This one is more sensitive, as names, genders and identities can be fluid, and whilst the tide is definitely turning, the IP industry is steeped in tradition and there are disappointingly still swathes of the sector where heteronormative stereotypes persist.
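For anyone who prefers code to eyeballing a spreadsheet, the two email checks could be sketched in Python roughly as follows. This is an illustration only, not the process actually used on the survey: the function names, the three-letter minimum for a name part and the "five consonants in a row" test for random-looking addresses are all assumptions made for the sake of the example. Like the manual checks, a hit here is only a flag prompting a closer look, never grounds for exclusion on its own.

```python
import re


def name_matches_email(name: str, email: str) -> bool:
    """Return True if any part of the name (3+ letters) appears in the address.

    Hypothetical helper: digits and punctuation in the local part are ignored,
    so "Jane Smith" matches "jsmith29@..." via "smith".
    """
    local = email.split("@", 1)[0].lower()
    local_letters = re.sub(r"[^a-z]", "", local)
    # The 3-letter minimum is an assumption to avoid matching on stray initials.
    parts = [p for p in re.findall(r"[a-z]+", name.lower()) if len(p) >= 3]
    return any(p in local_letters for p in parts)


def looks_random(email: str, max_consonant_run: int = 4) -> bool:
    """Crude test for a "series of random letters" before the @ sign.

    Assumption: a run of more than `max_consonant_run` consonants is rare in
    real names. This will misjudge some genuine addresses, so treat it as a
    flag for a human to review, not a verdict.
    """
    local = email.split("@", 1)[0].lower()
    letters = re.sub(r"[^a-z]", "", local)
    runs = re.findall(r"[^aeiouy]+", letters)
    return any(len(run) > max_consonant_run for run in runs)
```

With these definitions, name_matches_email("Jane Smith", "jsmith29@gmail.com") passes despite the number on the end, while looks_random("kjhgkt@hotmail.com") raises a flag.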
Additionally, I like to confirm that the currency selected matches the location given: why would a US attorney go through the hassle of converting their salary into euros when they can merely select USD? Further, does the location of the IP address align with the location provided? Naturally, many people use VPNs to hide their location, so a genuine mismatch is always possible, although often it’s unlikely. Another sign of an imposter is when they’ve written “kjhgkt” or similar whenever text was requested (but not required, so they could have left it blank). Reviewing the salary provided for reasonableness and consistency with the other remuneration figures provided is a given.
Tell me, would you believe Frodo Deepak (email@example.com), a 25–34-year-old female from the US but with an IP address in Hong Kong, was earning €2.2 million? Seems suss to me. But if I’m wrong, then the incorrect exclusion probably still benefits the survey: which respondents truly need to have their comparative lack of achievement by the same age thrown in their face? There, I’m doing you a favour. To be clear, I’ve made up Frodo and her email address, but the profile is representative of some of the questionable data I have received.
What about the results where respondents did not leave their name? Well, I still perform a reasonableness review and sense checks per the above. However, I’m less militant in my application on the assumption that life is too short and no one has the time to complete a survey for an industry they are not involved with for no personal gain, but I could be wrong.
Given it’s multiple choice, there’s always the chance that someone may have selected the incorrect response in error, which is why I consider all the above factors collectively and no single issue or deviation is sufficient for exclusion. Occasionally, all my checks, including a few not mentioned above, still do not provide a definitive answer. In these cases, I err on the side of caution and eliminate the respondent’s data to protect the integrity of the published survey results. My sincere apologies to anyone incorrectly rejected – your attempts at stealth and subterfuge have worked.
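The “collectively, never on a single issue” principle can be illustrated with a short Python sketch. Everything in it is hypothetical: the Response fields, the tiny country-to-currency table, the salary ceiling of 1,000,000 and the two-flag threshold are assumptions chosen for the example, not the survey’s real rules, and the output is a prompt for human review rather than an automatic rejection.

```python
from dataclasses import dataclass

# Hypothetical lookup of the currency expected for each country code.
EXPECTED_CURRENCY = {"US": "USD", "GB": "GBP", "DE": "EUR", "FR": "EUR"}


@dataclass
class Response:
    country: str         # location the respondent gave
    currency: str        # currency chosen for the salary figure
    ip_country: str      # country inferred from the IP address
    salary: float        # salary in the chosen currency
    free_text: str = ""  # optional text answer, may be blank


def collect_flags(r: Response, salary_ceiling: float = 1_000_000) -> list[str]:
    """List every inconsistency found; each one is only a flag."""
    flags = []
    expected = EXPECTED_CURRENCY.get(r.country)
    if expected is not None and r.currency != expected:
        flags.append("currency does not match location")
    if r.ip_country != r.country:
        # Could be a VPN, which is why this alone proves nothing.
        flags.append("IP address does not match location")
    if r.salary > salary_ceiling:
        # The ceiling is an assumed figure, purely for illustration.
        flags.append("salary looks implausible")
    text = r.free_text.strip().lower()
    if text and not any(ch in "aeiou" for ch in text):
        flags.append("free text looks like keyboard mashing")
    return flags


def needs_review(r: Response, min_flags: int = 2) -> bool:
    """A single flag is never enough; several together warrant a closer look."""
    return len(collect_flags(r)) >= min_flags


frodo = Response("US", "EUR", "HK", 2_200_000, "kjhgkt")
vpn_user = Response("US", "USD", "NL", 150_000)
```

Here needs_review(frodo) is true because four separate flags go up, whereas vpn_user trips only the IP check and is left alone.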
If you’d like to help a girl out, and are part of the IP industry, I welcome your contribution to the survey, which can be found here: www.fellowssurvey.com. For those not in the field of Intellectual Property, please don’t take this as an invitation to see if you can slip past my defences.