DIY DSP, Part 1

Last updated: March 3, 2017

Overview

So now you have decided to build your own DSP. It all looks pretty do-able. But it is not as easy to do as you think. It is an involved process, and a lot of people get tripped up by the details. This blog post will introduce you to all the other things it takes to build a DSP from the ground up with RTB4FREE. Later blog posts will set up an actual DSP using RTB4FREE.

There is more to a DSP than just running an RTB bidder. The bidder is important, but there are other important components. We will run through the basic pieces you will need here. To start with, we are presuming you are going to use Amazon AWS.

Requirements

First, you need to know what your requirements are ahead of time. How many QPS are you going to need? Are you going to store bid requests for post-analysis (you should)? How much of this data are you going to store? What will your overhead costs be? Here is a laundry list of the costs involved:

  • Instance/server costs.
  • Disk storage costs.
  • Long term storage costs.
  • Post-processing costs.
  • Cost of system management.
  • SSP charges.

Basic Costs

Once you determine the QPS requirements you can predict your instance and storage costs. As a rule of thumb, you can assume that each QPS will cost you $1 USD per month in infrastructure costs. So if you plan on doing 5K QPS, plan on $5,000/month in infrastructure costs. There is a lot more to a DSP than a bidder.
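The rule of thumb above is easy to turn into a quick estimate. This is a minimal sketch, not a pricing model: the function name and the example QPS values are ours, and the only number taken from the text is the $1 USD per QPS per month figure.

```python
# Rule of thumb from the text: roughly $1 USD per QPS per month.
COST_PER_QPS_USD = 1.0


def monthly_infrastructure_cost(qps, cost_per_qps=COST_PER_QPS_USD):
    """Estimate monthly infrastructure cost in USD for a target QPS."""
    if qps < 0:
        raise ValueError("qps must be non-negative")
    return qps * cost_per_qps


if __name__ == "__main__":
    # Example QPS targets (hypothetical).
    for qps in (1_000, 5_000, 20_000):
        cost = monthly_infrastructure_cost(qps)
        print(f"{qps:>6} QPS -> ${cost:,.0f}/month")
```

Treat the result as a starting budget. Your real costs depend on how much bid request data you keep and how you process it.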

System management is critical. You need to know that your instances are healthy, and that CPU utilization, disk space, and network connections are all at manageable levels. We can't provide this software and capability for you; you need to set this up yourself. And note, you will need to move the stored bid request data off the bidder instances periodically to make sure you don't run out of disk space. We provide monitoring of DSP data with ElasticSearch.

As you save the bid request data you will be storing huge amounts of data. How relevant all of it is to you and how much to keep is something you have to determine yourself. But disk storage is easily the lion's share of the cost.

DO NOT SKIP THE STEP OF DETERMINING YOUR QPS REQUIREMENTS FIRST. AD-HOC DEVELOPMENT OF YOUR SYSTEM MEANS YOU WILL OUTGROW YOUR SYSTEM CAPABILITIES RIGHT FROM THE OUTSET.

Other Software

As mentioned, there are a lot of other software components besides the bidder needed to run a DSP. Here is a list of what we think you will need. It's a bare minimum.

  • Systems management software.

    At a bare minimum you need to closely watch CPU utilization, free disk space, and the bid response time of the bidders as they process campaigns. We use ElasticSearch to watch logs, and Grafana and Prometheus for resource utilization.

  • Campaign management software.

    RTB4FREE provides a campaign manager written in Ruby on Rails plus MySQL, and a controller that connects MySQL to the bidder software. A budgeting control system is also provided that works with ElasticSearch and the campaign manager.

  • Post processing of bid request data.

    The bid request data is valuable. You can get an excellent idea of what the SSP's publishers have to sell, but you need analysis software capable of reading JSON data and that provides a query capability on that data. You will need this data to create white lists AND WHITE LISTS ARE VERY IMPORTANT. If you don't white list, you will bid on a lot of garbage impressions.

    The bid request data is a “Big Data” problem. There is a lot of it, and it grows rapidly. But you need this data to ensure your campaigns work in the ecosystem. Are you using the best banner sizes? Where are the bid requests originating from? What devices are being used? Do your IAB categories match? All of the answers are in the data. But you need the ability to store it, load it and query it.

    We use ElasticSearch tools for looking at the log, billing and bid request data, and each has powerful query capabilities that can give you insight into the market in which you are trying to purchase ad space.

Bear in mind that each of these additional systems brings its own configuration, administrative tasks and overhead.

Data Management

How much data will you store, and how will you get at it? Are you going to store it in Glacier? In Hadoop? Cassandra? Flat files? Your analysis will depend on the proper implementation of a good data management plan.

How long are you going to keep the data?
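A back-of-the-envelope calculation helps answer these questions before you commit to a storage design. The sketch below assumes you store every bid request and that the average request is about 2 KB. That size is our assumption for illustration, so measure your own traffic before trusting the numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def storage_bytes(qps, avg_request_bytes, retention_days):
    """Bytes needed to keep every bid request for the retention period."""
    return qps * avg_request_bytes * SECONDS_PER_DAY * retention_days


if __name__ == "__main__":
    qps = 5_000               # example target from the cost discussion
    avg_request_bytes = 2_000  # ASSUMPTION: measure your own bid requests
    for days in (1, 7, 30):
        total = storage_bytes(qps, avg_request_bytes, days)
        print(f"{days:>2} day(s): {total / 1e12:.2f} TB")
```

Even at a modest QPS the total climbs quickly, which is why the retention period matters as much as the storage technology.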

Bid requests can be stored on each bidder in the XRTB/logs/request file. This file grows very quickly. You need to move this data off the bidder and into your repository frequently, at least once an hour.
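One way to do the hourly move is a small script run from cron. This is only a sketch: the function name, the archive directory and the timestamped file name are hypothetical. It also assumes the bidder starts a fresh request file after the old one is moved, which you should verify on your own installation before relying on it.

```python
import shutil
import time
from pathlib import Path


def rotate_request_log(log_path, archive_dir):
    """Move the request log into archive_dir under a timestamped name.

    Returns the new path, or None if there was nothing to move.
    ASSUMPTION: the bidder recreates the log file after it is moved.
    """
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    if not log_path.exists() or log_path.stat().st_size == 0:
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime())
    target = archive_dir / f"request-{stamp}.json"
    shutil.move(str(log_path), str(target))
    return target


if __name__ == "__main__":
    # Hypothetical archive location; point this at your own staging area.
    moved = rotate_request_log("XRTB/logs/request", "/data/bidrequests")
    print(moved if moved else "nothing to move")
```

From the archive directory you can then ship the files to long term storage or load them into your analysis tools.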

If you are using Splunk, ElasticSearch, Hadoop, etc., you then need to load this data into their repository.

IT IS VERY IMPORTANT TO DETERMINE YOUR DATA STORAGE REQUIREMENTS FIRST. IF YOU FAIL TO DO THIS, THEN WHEN YOUR DISK SPACE IS EXCEEDED THE BIDDER WILL STOP. OR WORSE, YOU LOSE YOUR ACCOUNTING DATA.

Know Your Bid Request Data

Obviously this is a critical part of RTB, but many people ignore it. Before you begin bidding, run the bidders for a while and just store the bid requests, but NO-BID on everything. Do this for several days. Then analyze the bid request data to see what is actually available out there to bid on.

It is a big mistake to not know what is out there in the data stream. What domains are most prevalent? What are the ad sizes most frequently requested? What IAB categories do the interesting domains fall under? Do the device records in the bid request data have GEO tags? Does the user object have age and gender information?

The simplest mistake is choosing the wrong ad sizes. People make this mistake and can't understand why their bidders are not bidding on anything. If your banner sizes aren't being requested by the publishers, you won't make any bids. There are standard sizes, the most common being 320x50. Make sure your impression size is relevant.

If your campaign isn't bidding, analyze the bid request data. Use Splunk or ElasticSearch to see why, using a query on the dataset that matches your campaign. Obviously, if none of the bid requests from the SSP match your campaigns, you aren't going to bid.
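If you don't have Splunk or ElasticSearch set up yet, even a short script over the stored requests answers the basic questions. The sketch below assumes the stored file holds one JSON bid request per line and uses the standard OpenRTB field names (imp, banner.w, banner.h, site.domain, device.geo). The function name and the one-request-per-line layout are our assumptions, so adjust them to match what your bidder actually writes.

```python
import json
from collections import Counter


def summarize_bid_requests(lines):
    """Count banner sizes, site domains and geo coverage in bid requests."""
    sizes, domains = Counter(), Counter()
    total = with_geo = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            req = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(req, dict):
            continue
        total += 1
        for imp in req.get("imp", []):
            banner = imp.get("banner") or {}
            if "w" in banner and "h" in banner:
                sizes[f'{banner["w"]}x{banner["h"]}'] += 1
        domain = (req.get("site") or {}).get("domain")
        if domain:
            domains[domain] += 1
        if (req.get("device") or {}).get("geo"):
            with_geo += 1
    return {"total": total, "sizes": sizes,
            "domains": domains, "with_geo": with_geo}


if __name__ == "__main__":
    with open("XRTB/logs/request") as f:
        summary = summarize_bid_requests(f)
    print("requests:", summary["total"], "with geo:", summary["with_geo"])
    print("top sizes:", summary["sizes"].most_common(5))
    print("top domains:", summary["domains"].most_common(10))
```

If the sizes your creatives use don't show up near the top of the size list, that alone can explain a campaign that never bids.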

Know your bid request data! Just because a bid request comes in, and the bid request parameters meet your constraints, that doesn't mean you should bid on it. There are a lot of bad actors out there publishing garbage web sites specifically to lure you onto a page that has hundreds of ads per page, or your impressions could be loading into BOTs. You need to take proactive action to protect your ad dollars. Here are some things to consider.

  • White listing.

    A white list is a list of publishers that you will bid on, and nobody else. These are publishers you trust. Believe it or not, the SSPs sell a ton of garbage impressions from garbage sites. You don't want to bid on a site where the page has 20 other ads on it, do you? Know your seller if you can.

  • Black listing.

    Black listing is the reverse of white listing. A black list is a list of publishers you won't buy from, for whatever reason. If you don't white list, you should at least black list. Over time, after losing money, you will know who these sites are.

    You need your post data analysis here. Look for those sites where you bought a lot of impressions and got no clicks, or you got clicks but no conversions. In the case of no clicks, these are likely click-bait sites with lots of ads on every page. Your ad gets lost on the page. No conversions likely means BOTs.

    The SSPs talk a good game about transparency and spread a lot of buzz-words spouting quality, but in the end they have no problem presenting click-bait sites to you or exposing you to BOTnets faking clicks. This problem is rampant. Probably 50% of all ad traffic on the Internet is fake.

  • BOT Protection.

    The best way to protect yourself from BOTs is to get a Forensiq account. It's not free, but it costs less than presenting impressions to click BOTs. RTB4FREE is already set up to use Forensiq.

  • High quality Web Publisher detection.

    We don't know how to do this. If we did, we would be millionaires. But you need to periodically go back over your click data and the bid requests and find those sites that are high quality. Maybe you should manually click on some sites presented in the bid requests to see exactly what impressions are being offered. If the page has more than 5 ads on it, maybe you shouldn't bid on that page. This is a value judgment.

    We can't tell you how many times we have clicked on a publisher's site and been hit with a spear phishing campaign, or landed on a page with 1000 ads on it. Clearly fraud. But the SSPs can't catch them all, and some don't care and don't even try.

    Use white lists if you can, and black list as you go if you don't. Don't blindly bid; you will lose money.

  • Ensure your SSP has the publishers that can make you money.

    The nightmare: now you are connected to your SSP and you are bidding away. But you aren't making many conversions. Now what do you do?

    You go back and look at the variety of publishers the SSP has for you – and you find 80% of the traffic is from 6 publishers. Guess what, you won't make any money there. One time, this actually happened. And those 6 sites were all gossip, soft-core porn and celebrity sites. And the advertiser was trying to protect his brand. Obviously this was a big mismatch, not to mention the SSP really had nothing to offer of value.

    But you won't know if you don't know your bid request data. The easier it is to analyze the bid request data, the better prepared you are to make the proper judgments.
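The black listing check described above (lots of impressions and no clicks, or clicks and no conversions) is simple to automate once you have per-site totals. This is a sketch under assumptions: the input is a list of per-domain totals you have already pulled from your own reporting, and the function name and both thresholds are ours. Pick thresholds that fit your volumes.

```python
def blacklist_candidates(stats, min_impressions=1000, min_clicks=50):
    """Flag domains that look like click-bait or BOT traffic.

    stats: iterable of dicts with keys
           'domain', 'impressions', 'clicks', 'conversions'.
    The thresholds are ASSUMPTIONS; tune them to your own traffic.
    """
    flagged = {}
    for row in stats:
        if row["impressions"] >= min_impressions and row["clicks"] == 0:
            flagged[row["domain"]] = "impressions but no clicks"
        elif row["clicks"] >= min_clicks and row["conversions"] == 0:
            flagged[row["domain"]] = "clicks but no conversions"
    return flagged


if __name__ == "__main__":
    # Made-up totals, for illustration only.
    report = [
        {"domain": "good.example", "impressions": 5000,
         "clicks": 40, "conversions": 3},
        {"domain": "clickbait.example", "impressions": 8000,
         "clicks": 0, "conversions": 0},
        {"domain": "botfarm.example", "impressions": 3000,
         "clicks": 400, "conversions": 0},
    ]
    for domain, reason in blacklist_candidates(report).items():
        print(f"{domain}: {reason}")
```

A flagged domain is a candidate, not a verdict. Look at the site yourself before you add it to the black list.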

Campaign Performance

Here is a list of campaign performance issues. Make sure you understand these issues so that you don't fall into the traps described below.

  1. Running Too Many Campaigns

    If all of your campaigns are bidding and winning and your QPS is good, you are doing great. But if your existing campaigns aren't bidding or winning, simply adding more campaigns will not help.

  2. Too Many SSPs

    Many DSPs make the mistake of trying to support too many SSPs. If the traffic from your SSP doesn't fit your campaigns, adding another SSP that also doesn't fit just costs you more money in the end. Also, be aware that many smaller SSPs syndicate the feed from other SSPs. Basically, they are sending you garbage traffic that nobody wants.

  3. Increasing QPS to Overcome Low Bid Rates

    This is the biggest mistake you can make. If your campaigns aren't bidding, simply upping your QPS to increase your revenue will cause you to lose money. Say your bid rate is 5% and your win rate is 1%, so you win on only 1% of 5% of the requests you process. Doubling your QPS doubles your infrastructure cost, but you still win on only that same sliver of the traffic. That is a total waste of money.

  4. Increasing QPS Is Costly

    Processing bids is expensive. It takes a lot of hardware to set up a DSP. At a very minimum you need 3 servers, and preferably 5. If you double your QPS, your server needs will increase.

    Adding more campaigns... more servers. Adding more QPS... more servers. Adding more SSPs... more servers. Infrastructure costs go up. Not down.
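To put numbers on item 3, here is the arithmetic with the 5% bid rate and 1% win rate from the example, combined with the $1 USD per QPS per month rule of thumb given earlier. The function names and the 5K starting QPS are ours, and the sketch assumes the rates stay the same as QPS grows.

```python
COST_PER_QPS_USD = 1.0  # rule of thumb: $1 USD per QPS per month


def wins_per_second(qps, bid_rate, win_rate):
    """Requests/sec -> bids/sec -> wins/sec."""
    return qps * bid_rate * win_rate


def monthly_cost(qps):
    """Monthly infrastructure cost in USD for a given QPS."""
    return qps * COST_PER_QPS_USD


if __name__ == "__main__":
    for qps in (5_000, 10_000):
        wins = wins_per_second(qps, bid_rate=0.05, win_rate=0.01)
        print(f"{qps:>6} QPS: {wins:.2f} wins/sec "
              f"for ${monthly_cost(qps):,.0f}/month")
```

Going from 5K to 10K QPS adds another $5,000 a month for only a couple more wins per second. Fixing the campaigns so they bid on more of the traffic you already pay for is the cheaper move.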

Performance Issues

You can run everything on a single bidder instance. This will limit your total QPS handling capability. Also, at high QPS on a single instance Aerospike runs into some client connection issues. Unless your requirements are static, it is not advisable to use a single bidder.

Likely you will be setting your bidders up behind a load balancer. The more QPS you need, the more bidders you start. As your QPS needs decline, shut down bidders – so you don't pay for instance charges you don't need.

Between 11PM and 5AM you probably don't need to bid that much, and guess what, the BOTs never sleep and BOT traffic is more prevalent at night as other advertisers go off-line for the night.

We reiterate the need for performance measurement of the /var/log/rtb4free.log files on the instances you are running. As bid times increase beyond 40 ms, start more instances. As the bid times decrease, consider stopping the instance(s) you don't need. But you need to measure to know what is going on.

If you can tie all of this in with auto-scale, that's all the better.
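The scaling rule above boils down to a small decision function. In this sketch the 40 ms scale-up threshold comes from the text. The 20 ms scale-down threshold, the function name, and the idea of feeding it a list of recent bid times you have already extracted from the logs are our assumptions.

```python
from statistics import mean

SCALE_UP_MS = 40.0    # threshold from the text
SCALE_DOWN_MS = 20.0  # ASSUMPTION: pick a value that suits your traffic


def scaling_decision(bid_times_ms, running_instances, min_instances=1):
    """Return 'start', 'stop' or 'hold' from recent bid times in ms."""
    if not bid_times_ms:
        return "hold"  # no measurements, so change nothing
    avg = mean(bid_times_ms)
    if avg > SCALE_UP_MS:
        return "start"
    if avg < SCALE_DOWN_MS and running_instances > min_instances:
        return "stop"
    return "hold"


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    print(scaling_decision([45.0, 52.5, 61.0], running_instances=2))
    print(scaling_decision([8.0, 11.5, 9.2], running_instances=3))
```

Keeping a gap between the two thresholds stops the system from starting and stopping instances over and over when bid times hover around a single cut-off.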

Processing Bids for Affiliates

If you are selling ads for affiliates, beware. Do you know what they are advertising? Several companies we know opened their DSP to the public to help utilize the SSP inventory, only to find that the affiliates were pumping out low quality ads, engaging in phishing scams and delivering pornography. All of these things hurt your reputation with the SSP, and can even get your account cancelled.

As a result, several of these DSPs that were once open to the public are now closed. Policing the affiliates just became too much of a headache.

In the end, know what you are offering on the network. It's your SSP account, but your affiliate's ad. Don't get your account cancelled because you didn't know an affiliate was injecting malware into their ads. It happens all the time.

Ready for Step 2

So now you are ready to move to designing your system. Here's a checklist of things you need to determine before you go to that next step:

  1. What is your QPS Requirement?
  2. How much data do you plan to store?
  3. How will you make campaigns?
  4. What System Monitoring do you plan to use?
  5. How will you bill your clients?

Now that you have some background, you are ready to go a little deeper into the development of your DSP. You can learn about setting up the infrastructure here in Part 2, DIY DSP.