Werden wir Helden für einen Tag

Week 6: Wrapping API queries with an R6 class

Posted on Jan 15, 2019 by Chung-hong Chan

As I’ve mentioned previously, I am now taking courses from DataCamp. My choice of courses is quite unconventional, because I just want to fill my knowledge gaps. One course, by Richie Cotton on Object-Oriented Programming (OOP), is very interesting. I have used RefClass in one of my projects before, so learning R6 is not a very big deal. Still, there is a lot of information, and I think it is a better idea to write down what I’ve learned as deliberate practice.

OOP is not very useful for data analysis, so if you consider yourself an analyst, you can still have a good life without any knowledge of OOP. However, if you want to develop tools, for example tools for collecting data from web APIs, it is a good idea to learn some OOP. Instead of using the usual example of “person class inherits from animal class” to illustrate OOP concepts, it is better to use a practical example: wrapping an API. In this example, I wrap the unofficial Deutsche Bahn Public Transport API. API wrapping is a natural fit for OOP, because the endpoints usually represent objects (e.g. a station, a user, a tweet, a post, a gene, a product).

It is entirely possible to query APIs without OOP. I have even taught that previously. The problem with that approach is that it exposes a lot of API implementation details. By using OOP, we can hide those details and expose a clean interface to the users.

For the DB API, the centre of all API calls is “Station”. We can create the simplest class “Station”, which represents a train station (or Bahnhof, if you prefer me to speak German).

library(httr)
library(R6)

station_generator <- R6Class(
	"Station",
	public = list(
		name = NULL
	)
)

mannheim_hbf <- station_generator$new()
mannheim_hbf <- station_generator$new('Mannheim')
## Error in station_generator$new("Mannheim"): Called new() with arguments, but there is no initialize method.

As you can see above, we can create a new instance of “Station” using the new() method. The public element of the class is a list that stores all fields and methods (methods are just functions attached to an object) open to the public. In this case, a Station has a name. However, with this class implementation we cannot set the name of the Station when creating it. The error message indicates that there is no initialize method. So, we create one.

station_generator <- R6Class(
	classname = "Station",
	public = list(
		name = NULL,
		initialize = function(name) {
			self$name <- name
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
mannheim_hbf
## <Station>
##   Public:
##     clone: function (deep = FALSE) 
##     initialize: function (name) 
##     name: Mannheim

Now, we can give our station a name. But we know that the name of the train station in Mannheim is not just ‘Mannheim’. Instead, it is called Mannheim Hauptbahnhof (or Mannheim Hbf). Also, in order to do any query, we need to know the station id as well. Now, we do our first wrapping of an API call.

station_generator <- R6Class(
	classname = "Station",
	public = list(
		name = NULL,
		sid = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			self$sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
mannheim_hbf
## <Station>
##   Public:
##     clone: function (deep = FALSE) 
##     initialize: function (name) 
##     name: Mannheim Hbf
##     sid: 8000244

So far so good. We can even query the station id.

mannheim_hbf$sid
## [1] "8000244"
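
The constructor above trusts that the request succeeds and that at least one station matches the query. Below is a minimal sketch of a more defensive lookup, assuming the parsed response is a list of stations with id and name fields, as in the output above. first_station() is a hypothetical helper, not part of httr or of the API, and the input is a hand-written stand-in, so no network access is needed.

```r
# Hypothetical helper: pick the first match from an already parsed response.
# `parsed` stands for the result of httr::content(res).
first_station <- function(parsed, query) {
  if (!is.list(parsed) || length(parsed) == 0) {
    stop("No station found for query: ", query, call. = FALSE)
  }
  first <- parsed[[1]]
  if (!is.list(first) || is.null(first$id) || is.null(first$name)) {
    stop("Unexpected response format for query: ", query, call. = FALSE)
  }
  first
}

# Hand-written stand-in for a parsed response (assumed shape).
fake_parsed <- list(list(id = "8000244", name = "Mannheim Hbf"))
station <- first_station(fake_parsed, "Mannheim")
station$name
## [1] "Mannheim Hbf"

# An empty result now fails with a readable message instead of a
# "subscript out of bounds" error.
empty_result <- tryCatch(
  first_station(list(), "Nowhere"),
  error = function(e) conditionMessage(e)
)
empty_result
## [1] "No station found for query: Nowhere"
```

Inside initialize, one could call httr::stop_for_status(res) before parsing and then pass httr::content(res) to such a helper, which also avoids parsing the response twice.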

How about modifying the station id?

mannheim_hbf$sid <- "suck"
mannheim_hbf
## <Station>
##   Public:
##     clone: function (deep = FALSE) 
##     initialize: function (name) 
##     name: Mannheim Hbf
##     sid: suck

Yes, we can! But it is not a good idea. Imagine a user who creates a station with the correct station id and then monkeys around with it. A later query using the same object, now with a wrong, tampered station id, will generate a wrong and confusing result. Therefore, it is a better idea to put the important information that you don’t want your users to modify easily into private. By convention, the names of private fields start with a double dot. Private fields are not exposed.

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
		}
	)
)


mannheim_hbf <- station_generator$new('Mannheim')
mannheim_hbf$..sid
## NULL
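
To see what “not exposed” means without calling the API, here is a small offline sketch. DemoStation is a hypothetical stand-in for the Station class: the id is passed in directly instead of being fetched. It relies on the R6 default that objects are locked, so new fields cannot be added from outside.

```r
library(R6)

# Hypothetical offline stand-in for the Station class.
demo_generator <- R6Class(
  classname = "DemoStation",
  private = list(
    ..sid = NULL
  ),
  public = list(
    initialize = function(sid) {
      private$..sid <- sid
    },
    # Methods defined in the class can see private fields.
    describe = function() {
      paste("Station id:", private$..sid)
    }
  )
)

demo <- demo_generator$new("8000244")
demo$describe()
## [1] "Station id: 8000244"

# From outside, the private field is simply not there ...
outside_value <- demo$..sid
outside_value
## NULL

# ... and trying to create it fails, because the object is locked.
write_attempt <- tryCatch(
  {
    demo$..sid <- "changed"
    "assignment succeeded"
  },
  error = function(e) "assignment failed"
)
write_attempt
## [1] "assignment failed"
```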

We can put more information into private, for example all the information returned by the API query. We don’t want our users to modify that information, but we do want them to be able to read it. The current implementation does not allow that.

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL,
		..info = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
			private$..info <- httr::content(res)[[1]]
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
mannheim_hbf$info
## NULL
mannheim_hbf$..info
## NULL

In order to rectify this, we can use active bindings to provide “read only” access. Active bindings are defined like functions, but they are accessed like fields. They are read with obj$active_binding and written with obj$active_binding <- 123. We can enforce read-only access like so:

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL,
		..info = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
			private$..info <- httr::content(res)[[1]]
		}
	),
	active = list(
		info = function(field) {
			if (missing(field)) {
				private$..info
			} else {
				stop("You can't modify the info.")
			}
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
##mannheim_hbf$info
mannheim_hbf$info$address
## $city
## [1] "Mannheim"
## 
## $zipcode
## [1] "68161"
## 
## $street
## [1] "Willy-Brandt-Platz 17"
mannheim_hbf$info <- "suck"
## Error in (function (field) : You can't modify the info.
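
An active binding does not have to reject every write. As a sketch of the other direction, the hypothetical Label class below (not part of the Station wrapper) accepts writes to text but validates them first. The class name, field name and error message are all made up for illustration.

```r
library(R6)

# Hypothetical class for illustration only.
label_generator <- R6Class(
  classname = "Label",
  private = list(
    ..text = NULL
  ),
  public = list(
    initialize = function(text) {
      # Assigning through self$ runs the same validation as any later write.
      self$text <- text
    }
  ),
  active = list(
    text = function(value) {
      # Called without a value: this is a read.
      if (missing(value)) {
        return(private$..text)
      }
      # Called with a value: this is a write, so validate before storing.
      if (!is.character(value) || length(value) != 1 || !nzchar(value)) {
        stop("text must be a single non-empty string.", call. = FALSE)
      }
      private$..text <- value
    }
  )
)

lbl <- label_generator$new("Gleis 1")
lbl$text <- "Gleis 2"
lbl$text
## [1] "Gleis 2"

# An invalid write is rejected and the stored value stays untouched.
bad_write <- tryCatch(
  {
    lbl$text <- ""
    "accepted"
  },
  error = function(e) conditionMessage(e)
)
bad_write
## [1] "text must be a single non-empty string."
```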

So now, we define a public method for querying the trains departing from the station.

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL,
		..info = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
			private$..info <- httr::content(res)[[1]]
		},
		depart_trains = function(duration = '60') {
			res <- httr::GET(paste0("https://2.db.transport.rest/stations/", private$..sid, "/departures"), query = list("next" = duration))
			httr::content(res)
		}
	),
	active = list(
		info = function(field) {
			if (missing(field)) {
				private$..info
			} else {
				stop("You can't modify the info.")
			}
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
res <- mannheim_hbf$depart_trains() ## trains departing in the next 60 minutes
res[[1]]
## $tripId
## [1] "1|1033326|0|81|15012019"
## 
## $stop
## $stop$type
## [1] "stop"
## 
## $stop$id
## [1] "518342"
## 
## $stop$name
## [1] "Hauptbahnhof, Mannheim"
## 
## $stop$location
## $stop$location$type
## [1] "location"
## 
## $stop$location$latitude
## [1] 49.48022
## 
## $stop$location$longitude
## [1] 8.469529
## 

So far so good! The result is a list of named lists, and it is too long to be shown in its entirety. It is possible to convert it into a data frame, but that is beyond the scope of this post.
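
Without going into a full conversion, here is a rough base-R sketch of the idea. It uses only fields visible in the output above (tripId, stop$id, stop$name). departures_to_df() is a hypothetical helper, and the input is a hand-written stand-in whose second element is invented.

```r
# Hand-written stand-in shaped like the output above. The first element
# copies the printed fields; the second one is invented for illustration.
fake_departures <- list(
  list(tripId = "1|1033326|0|81|15012019",
       stop = list(id = "518342", name = "Hauptbahnhof, Mannheim")),
  list(tripId = "hypothetical-trip-2",
       stop = list(id = "518342", name = "Hauptbahnhof, Mannheim"))
)

# Hypothetical helper: one data frame row per departure.
departures_to_df <- function(departures) {
  rows <- lapply(departures, function(d) {
    data.frame(
      trip_id = d$tripId,
      stop_id = d$stop$id,
      stop_name = d$stop$name,
      stringsAsFactors = FALSE
    )
  })
  # Stack the one-row data frames on top of each other.
  do.call(rbind, rows)
}

departures_df <- departures_to_df(fake_departures)
nrow(departures_df)
## [1] 2
```

This sketch assumes every element has all three fields. A departure with a missing field would need a default value (for example NA) before it can become a row.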

How about we create a public method to query trains to another station?

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL,
		..info = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
			private$..info <- httr::content(res)[[1]]
		},
		depart_trains = function(duration = '60') {
			res <- httr::GET(paste0("https://2.db.transport.rest/stations/", private$..sid, "/departures"), query = list("next" = duration))
			httr::content(res)
		},
		trains_to = function(station) {
			res <- httr::GET("https://2.db.transport.rest/journeys", query = list(from = private$..sid, to = station$..sid))
			httr::content(res)
		}
	),
	active = list(
		info = function(field) {
			if (missing(field)) {
				private$..info
			} else {
				stop("You can't modify the info.")
			}
		}
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
berlin_hbf <- station_generator$new('Berlin')
berlin_hbf
## <Station>
##   Public:
##     clone: function (deep = FALSE) 
##     depart_trains: function (duration = "60") 
##     info: active binding
##     initialize: function (name) 
##     name: Berlin Hauptbahnhof
##     trains_to: function (station) 
##   Private:
##     ..info: list
##     ..sid: 8011160

Does it work?

mannheim_hbf$trains_to(berlin_hbf) ## nothing
## $error
## [1] TRUE
## 
## $msg
## [1] "Missing destination."

No. This is because the private station id of the Berlin Station is not exposed to the Mannheim Station: station$..sid evaluates to NULL, so the request is sent without a destination. We can once again create an active binding to provide read-only access.

station_generator <- R6Class(
	classname = "Station",
	private = list(
		..sid = NULL,
		..info = NULL
	),
	public = list(
		name = NULL,
		initialize = function(name) {
			res <- httr::GET("https://2.db.transport.rest/stations", query = list(query = name))
			private$..sid <- httr::content(res)[[1]]$id
			self$name <- httr::content(res)[[1]]$name
			private$..info <- httr::content(res)[[1]]
		},
		depart_trains = function(duration = '60') {
			res <- httr::GET(paste0("https://2.db.transport.rest/stations/", private$..sid, "/departures"), query = list("next" = duration))
			httr::content(res)
		},
		trains_to = function(station) {
			res <- httr::GET("https://2.db.transport.rest/journeys", query = list(from = private$..sid, to = station$sid))
			httr::content(res)
		}
	),
	active = list(
		info = function(field) {
			if (missing(field)) {
				private$..info
			} else {
				stop("You can't modify the info.")
			}
		},
		sid = function(field) {
			if (missing(field)) {
				private$..sid
			} else {
				stop("You can't modify the sid.")
			}
		}        
	)
)

mannheim_hbf <- station_generator$new('Mannheim')
berlin_hbf <- station_generator$new('Berlin')
berlin_hbf$sid
## [1] "8011160"
berlin_hbf$sid <- "suck"
## Error in (function (field) : You can't modify the sid.
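
The difference between the two versions of trains_to can be reproduced offline. In the sketch below, OfflineStation is a hypothetical stand-in that takes the id directly instead of fetching it. The two peek methods are made up for illustration and mirror station$..sid and station$sid respectively. It shows that private fields in R6 belong to each object, so they stay hidden even from another object of the same class.

```r
library(R6)

# Hypothetical offline stand-in: ids are passed in, not fetched.
offline_generator <- R6Class(
  classname = "OfflineStation",
  private = list(
    ..sid = NULL
  ),
  public = list(
    initialize = function(sid) {
      private$..sid <- sid
    },
    # Reaches for the other object's private field: not visible from here.
    peek_private = function(station) {
      station$..sid
    },
    # Goes through the other object's active binding instead.
    peek_binding = function(station) {
      station$sid
    }
  ),
  active = list(
    sid = function(field) {
      if (missing(field)) {
        private$..sid
      } else {
        stop("You can't modify the sid.")
      }
    }
  )
)

origin <- offline_generator$new("8000244")
destination <- offline_generator$new("8011160")

via_private <- origin$peek_private(destination)
via_private
## NULL

via_binding <- origin$peek_binding(destination)
via_binding
## [1] "8011160"
```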

Does it work now?

res <- mannheim_hbf$trains_to(berlin_hbf)
res[[1]]
## $type
## [1] "journey"
## 
## $legs
## $legs[[1]]
## $legs[[1]]$origin
## $legs[[1]]$origin$type
## [1] "stop"
## 
## $legs[[1]]$origin$id
## [1] "8000244"
## 
## $legs[[1]]$origin$name
## [1] "Mannheim Hbf"
## 
## $legs[[1]]$origin$location
## $legs[[1]]$origin$location$type
## [1] "location"
## 
## $legs[[1]]$origin$location$latitude
## [1] 49.47918
## 
## $legs[[1]]$origin$location$longitude
## [1] 8.469268
## 
## 
## $legs[[1]]$origin$products
## $legs[[1]]$origin$products$nationalExp
## [1] TRUE
## 

Yes, it works! We can play around with other stations as well, for example the Munich main station and the Munich Airport station.

##berlin_hbf$trains_to(mannheim_hbf)

munich_hbf <- station_generator$new('München')
munich_flughafen <- station_generator$new('München Flug')

munich_flughafen
## <Station>
##   Public:
##     clone: function (deep = FALSE) 
##     depart_trains: function (duration = "60") 
##     info: active binding
##     initialize: function (name) 
##     name: Flughafen München
##     sid: active binding
##     trains_to: function (station) 
##   Private:
##     ..info: list
##     ..sid: 8004168

There are other concepts that I haven’t demonstrated in this example (e.g. inheritance, overriding methods, super$, etc.). But anyway, it is always fun to write some OOP code in R.
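
For a taste of those concepts, here is a minimal offline sketch. BaseStation and AirportStation are hypothetical classes, not part of the wrapper above. The child class overrides describe() and reuses the parent's version through super$.

```r
library(R6)

# Hypothetical parent class.
base_generator <- R6Class(
  classname = "BaseStation",
  public = list(
    name = NULL,
    initialize = function(name) {
      self$name <- name
    },
    describe = function() {
      paste("Station:", self$name)
    }
  )
)

# Hypothetical child class: inherits the fields and initialize from the parent.
airport_generator <- R6Class(
  classname = "AirportStation",
  inherit = base_generator,
  public = list(
    # Override: call the parent's method through super$ and extend the result.
    describe = function() {
      paste(super$describe(), "(airport)")
    }
  )
)

airport <- airport_generator$new("Flughafen München")
airport$describe()

# The object is an instance of both classes.
inherits(airport, "BaseStation")
## [1] TRUE
```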

6 down, 46 to go.

