- Trulia uses big data to recommend similar properties for consumers and to identify rooms by image.
By Brian Blum; reposted with permission from AIM Group's Classified Intelligence Report.

Deep Varma knows big data. As vice president of data engineering at U.S. real estate portal Trulia, the Silicon Valley veteran computer scientist oversees the management of 1.5 terabytes of data every single day.
The term "big data" is popping up everywhere these days, Varma told the AIM Group, but to understand what it means in a tangible way, one need only look at a site like Trulia.
Challenge lies in collecting huge scope of data
When Trulia recommends "similar properties," that is big data at play.
Generating a specific set of suggestions involves combining what Trulia has tagged about a property (everything from the "easy" data like number of bedrooms and bathrooms down to the nitty-gritty tech specs such as the type of marble on the kitchen island or faucet design) with user behavior on the site.
Trulia tracks what it calls consumer "intent," Varma said. In just a few minutes of engagement on trulia.com, a visitor will "generate an average of 18 to 20 events, or signals, about their intent." This includes what images they've looked at (did they poke around the closets or inspect the size of the kitchen cabinets?) as well as data external to the property itself, such as neighborhood crime scores or local school ratings.
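For readers who want to see the idea in miniature, here is a rough Python sketch of how tagged property data and visitor events could be combined into a ranking. It is not Trulia's method: the listings, tag names, event format and scoring rule are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical listings; the fields and tags are illustrative, not Trulia's schema.
LISTINGS = {
    "A": {"bedrooms": 3, "bathrooms": 2, "tags": {"marble_island", "walk_in_closet"}},
    "B": {"bedrooms": 3, "bathrooms": 2, "tags": {"marble_island", "large_cabinets"}},
    "C": {"bedrooms": 5, "bathrooms": 4, "tags": {"pool"}},
}


def intent_weights(events):
    """Count how often each tag shows up in a visitor's events (assumed format)."""
    return Counter(tag for event in events for tag in event["tags"])


def score(candidate, viewed, weights):
    """Higher means more similar: matching basics plus intent-weighted shared tags."""
    basics = sum(candidate[key] == viewed[key] for key in ("bedrooms", "bathrooms"))
    shared = candidate["tags"] & viewed["tags"]
    # Each shared tag counts once, plus once per event that signaled interest in it.
    return basics + sum(1 + weights[tag] for tag in shared)


def recommend(viewed_id, events, listings=LISTINGS):
    """Rank every other listing against the one the visitor is viewing."""
    viewed = listings[viewed_id]
    weights = intent_weights(events)
    others = [lid for lid in listings if lid != viewed_id]
    return sorted(
        others, key=lambda lid: score(listings[lid], viewed, weights), reverse=True
    )


# Two image views of the kitchen island act as intent signals.
events = [
    {"type": "image_view", "tags": ["marble_island"]},
    {"type": "image_view", "tags": ["marble_island"]},
]
print(recommend("A", events))
```

In this toy version, a listing that shares the features a visitor keeps looking at rises to the top. A production system would presumably use far richer signals and learned weights rather than simple counts.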
An in-house "personalization hub"
Varma and his team of 50 big data specialists at Trulia have built an in-house "personalization hub" to serve up the right content to visitors.
"Personalization is when we show you similar properties in your price range in the particular neighborhood you're looking at," Varma said. "Individualization takes a broader look: if you're looking for certain types of schools, we can show you similar properties but in different neighborhoods."
All that sounds simple enough, but when you have millions of monthly visitors all generating dozens of "events," coupled with the 4 million listings that Trulia processes every day plus another 10 million public records, it is easy to see why big data is big business.
External data, such as public records, are just as valuable
Data at Trulia falls into two core sets, Varma said, with separate teams processing the information (the Trulia and Zillow brands remain completely siloed within the Zillow Group corporate umbrella).
The first dataset comprises listings and public records. Varma called listings a "commodity item": most are provided to Trulia via feeds from MLSs, brokers and agents.
Public records are trickier. These are the deeds, taxes and assessment data that give visitors to Trulia the historical perspective they need to understand a property's true value.
There are more than 3,000 counties in the U.S., but no standards across counties or even between different types of public records. So data schemas, formats and accessibility can be wildly different.
Standardizing addresses is undoubtedly "the biggest problem we face," Varma said.
One misassignment "can screw up all our unique insights." Trulia built its own tools for address standardization; it's definitely not something off-the-shelf, Varma said proudly. (For techies, it involves using an open standard format called JSON, derived from JavaScript.)
Into this mix of text listings and records, Trulia adds pictures ("we've built our own image recognition technology that can tag a kitchen, bathroom or front yard as such, as well as an object recognition technology that can do the same for dishwashers and stainless steel stoves," Varma said) and "location aware data" from external sources, such as local amenities and school rankings.

Using the standardized addresses, everything is linked together before it's merged, indexed and run through Trulia's Data Service API, which makes the resulting content searchable by visitors to Trulia.com. And of course, it all has to happen in near real time.
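To make the linking step concrete, the Python sketch below shows one simple way records from different sources could be joined on a standardized address and emitted as JSON. Trulia's own tools are not public, so the abbreviation table, field names and sample records here are purely hypothetical.

```python
import json
import re

# Illustrative abbreviations only; real address data needs far more rules.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def normalize_address(raw):
    """Lowercase, drop punctuation and abbreviate common words to build a join key."""
    words = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def link_records(listings, public_records):
    """Merge listing and public-record fields that share a normalized address."""
    merged = {}
    for source in (listings, public_records):
        for record in source:
            key = normalize_address(record["address"])
            entry = merged.setdefault(key, {"address": key})
            entry.update({k: v for k, v in record.items() if k != "address"})
    return merged


# Hypothetical records: the same home written two different ways.
listings = [{"address": "123 Main Street, Apt. 4", "bedrooms": 2}]
public_records = [{"address": "123 MAIN ST APT 4", "assessed_value": 410000}]

linked = link_records(listings, public_records)
print(json.dumps(linked, indent=2))
```

Both spellings collapse to the same key, so the listing and the assessment end up on a single record. It also shows why a misassignment is so costly: if normalization maps two different homes to one key, their data is silently merged.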
On the consumer behavior side, Trulia uses "deep data science and machine learning" in order to build "a digital signature." Providing the same experience whether a visitor is registered and logged in or not is critical to Varma.
"Our goal is to help consumers make the best decisions. We don't need their names for that. In fact, the majority of our users are anonymous. We're not on a path of pushing a person into a funnel towards an agent. It's up to the consumer. They can do that when they're ready," Varma said.
The future is virtual reality and AI
What is the future for big data at Trulia? Varma expects virtual reality to catch on and become another even more valuable source of big data to be processed. If today Trulia can track user "events" generated by a click on an image of a refrigerator or the en-suite bathroom, imagine what happens when that same user straps on a pair of VR goggles at home and virtually walks around a home for sale. Every touch, every linger and glance can, and will, be added to the shopper's digital signature, Varma said.
Before virtual reality goes mainstream, though, the term big data may fall out of use.
"Today, people think, oh my God, it must be a big thing, since the name starts with big. But in the next two to three years, it will become an integral part of every consumer business," Varma said.
In the future, we may be talking more about the Internet of Things, which is closely tied to big data, or we may simply call it "artificial intelligence."
Whatever name is used, big data is here to stay, and so is Deep Varma, Trulia's big data daddy.
 © 2016 Advanced Interactive Media Group LLC / Classified Intelligence, reprinted with permission