Friday, January 12, 2018

Address Masking for RAM Devices

I found the problem of generating the correct address for a memory operation confusing, so I decided to take notes for myself (and others).

Assume we have a slave device implemented on a TileLink2 bus. It can receive either read (Get) or write (PutFullData) messages on channel A and return the corresponding response on channel D. See the TileLink spec for details.

The following covers only one part of the module's implementation: mask generation.

Assumptions. The slave device covers the address range starting at 0x2000_0000 with size 0x2000 (8KB), i.e., an address mask of 0x1FFF. The system bus carries 8 bytes per beat (a 64-bit bus), and the memory width matches the bus width. The memory therefore holds 1024 words and needs a 10-bit word address.
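The word count and address width in the assumptions follow from a little arithmetic. Here is a plain Scala sketch of it. `RamGeometry` and its methods are hypothetical names for illustration, not part of rocket-chip, and the sketch assumes a contiguous, power-of-two region like the one above.

```scala
// Hypothetical helper (not part of rocket-chip): derives the RAM geometry
// from an AddressSet-style mask and the bus width. Assumes a contiguous,
// power-of-two region and one bus beat per RAM word.
object RamGeometry {
  // True for 1, 2, 4, 8, ...
  def isPow2(x: Long): Boolean = x > 0 && (x & (x - 1)) == 0

  // Exact log2 of a power of two, e.g. log2(1024) == 10.
  def log2(x: Long): Int = {
    require(isPow2(x), s"$x is not a power of two")
    java.lang.Long.numberOfTrailingZeros(x)
  }

  // A contiguous, size-aligned region of N bytes has mask N - 1.
  def regionBytes(mask: Long): Long = mask + 1

  // Number of RAM words when each word is one beat wide.
  def words(mask: Long, beatBytes: Int): Long = {
    require(isPow2(regionBytes(mask)) && isPow2(beatBytes))
    regionBytes(mask) / beatBytes
  }

  // Width of the RAM word index in bits.
  def indexBits(mask: Long, beatBytes: Int): Int = log2(words(mask, beatBytes))
}
```

For the region above, `RamGeometry.words(0x1FFFL, 8)` gives 1024 and `RamGeometry.indexBits(0x1FFFL, 8)` gives 10.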

val address = in.a.bits.address // the full physical address of the request
val mask    = 0x1FFF            // 8KB region (size - 1); in practice, take it from the AddressSet's mask
val lgSize  = in.a.bits.size    // log2(bytes) of the total data requested, e.g., an 8-byte access has size 3

Problem.  To compute the word-aligned address, you need a mask.  Here is an example worked out by hand.

0x2000_1000 ^ (0x2000_1000 & ~mask) = 0x1000 (byte offset within the region) // where mask = 0x1FFF, i.e., size - 1 for size = 0x2000
0x1000 / 8 = 0x200 (index of the 8-byte word)
addr[9:0] = 0x200
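The same hand calculation as a small Scala function, so it can be unit tested. `WordIndex` is a hypothetical name, and the sketch assumes the region base is aligned to its size (no base bits overlap the mask), which is what makes `address ^ (address & ~mask)` equal to the simpler `address & mask`. The region mask here is 0x1FFF (region size 0x2000 minus one).

```scala
// Hypothetical helper mirroring the hand calculation above.
// Assumes the region base is size-aligned, so base & mask == 0.
object WordIndex {
  // Byte offset within the region: clear the base-address bits.
  // For an aligned region, x ^ (x & ~mask) is the same as x & mask.
  def byteOffset(address: Long, mask: Long): Long = address ^ (address & ~mask)

  // Index of the beatBytes-wide word: drop the log2(beatBytes) byte-select bits.
  def wordIndex(address: Long, mask: Long, beatBytes: Int): Long = {
    require(beatBytes > 0 && (beatBytes & (beatBytes - 1)) == 0, "beatBytes must be a power of two")
    val shift = Integer.numberOfTrailingZeros(beatBytes) // 3 for 8-byte beats
    byteOffset(address, mask) >> shift
  }
}
```

With address 0x2000_1000, mask 0x1FFF and 8-byte beats, `byteOffset` returns 0x1000 and `wordIndex` returns 0x200, matching addr[9:0] above.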

That was the easy part. What confused me was how the circuit generator expresses it. Isn't the following great? ;)

val a_address   = Cat((mask zip (in.haddr >> log2Ceil(beatBytes)).toBools).filter(_._1).map(_._2).reverse)

This is definitely one of the least appealing parts of functional programming. It would take me a few hours to reverse engineer this. Ideally, any non-trivial function like this would live in its own named function and be unit tested, with commented unit tests that serve as a description of the algorithm. But, I digress ...
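In that spirit, here is a plain Scala model of the one-liner that can be unit tested. It assumes that `mask` in the generator is an LSB-first sequence of Booleans built from the AddressSet mask shifted right by log2Ceil(beatBytes); that is my reading, not something confirmed here. Read left to right: `toBools` splits the shifted address into LSB-first bits, `zip` pairs each bit with its mask bit, `filter(_._1)` keeps the pairs whose mask bit is set, `map(_._2)` keeps just the address bit, and `reverse` plus `Cat` pack the survivors back into a number (Cat takes its most significant bit first). `AddressCompress` and its helpers are hypothetical names.

```scala
// Hypothetical software model of:
//   Cat((mask zip (addr >> log2Ceil(beatBytes)).toBools).filter(_._1).map(_._2).reverse)
object AddressCompress {
  // LSB-first bits of x, n of them (plays the role of .toBools on a UInt).
  def bits(x: Long, n: Int): Seq[Boolean] = (0 until n).map(i => ((x >>> i) & 1L) == 1L)

  // Rebuild a number from MSB-first bits (plays the role of Chisel's Cat).
  def cat(msbFirst: Seq[Boolean]): Long =
    msbFirst.foldLeft(0L)((acc, b) => (acc << 1) | (if (b) 1L else 0L))

  // Keep only the address bits selected by the region mask, packed toward bit 0.
  def compress(address: Long, regionMask: Long, beatBytes: Int): Long = {
    require(beatBytes > 0 && (beatBytes & (beatBytes - 1)) == 0)
    val shift    = Integer.numberOfTrailingZeros(beatBytes) // log2Ceil for a power of two
    val width    = 64 - shift
    val maskBits = bits(regionMask >>> shift, width)
    val addrBits = bits(address >>> shift, width)
    cat((maskBits zip addrBits).filter(_._1).map(_._2).reverse)
  }
}
```

For the contiguous 0x1FFF mask this reduces to the 0x200 result from the hand calculation. The zip and filter only matter if the mask has gaps: with a hypothetical mask of 0x100FF and 8-byte beats, the selected address bits are packed together, so address 0x10018 maps to index 0x23.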

There is a helper object in the util package. Moral of the story: use it.

// This gets used everywhere, so make the smallest circuit possible ...
// Given an address and size, create a mask of beatBytes size
// eg: (0x3, 0, 4) => 0001, (0x3, 1, 4) => 0011, (0x3, 2, 4) => 1111
// groupBy applies an interleaved OR reduction; groupBy=2 take 0010 => 01
object MaskGen {
  def apply(addr_lo: UInt, lgSize: UInt, beatBytes: Int, groupBy: Int = 1): UInt = {
[..]
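The body of MaskGen is elided above, but its comments describe the behavior. As far as I can tell, it produces the byte-lane mask for a beat (which bytes of the 8-byte word are accessed) rather than the word index. The following plain Scala model reproduces the comment's examples. It is my own sketch of the documented behavior, not the rocket-chip implementation: it does not model `groupBy`, and it assumes accesses are naturally aligned and that a transfer of a full beat or more enables every lane. Note that the comments print masks with byte lane 0 on the left, so "0001" means lane 3. `ByteMaskModel` is a hypothetical name.

```scala
// Hypothetical model of the behavior described in the MaskGen comments (groupBy = 1 only).
object ByteMaskModel {
  private def isPow2(x: Int): Boolean = x > 0 && (x & (x - 1)) == 0

  // Bit i of the result is set when byte lane i of the beat is accessed.
  def mask(addrLo: Int, lgSize: Int, beatBytes: Int): Int = {
    require(isPow2(beatBytes) && beatBytes <= 16, "model supports power-of-two beats up to 16 bytes")
    require(lgSize >= 0 && lgSize < 31)
    val full = (1 << beatBytes) - 1
    if ((1 << lgSize) >= beatBytes) full // transfer covers the whole beat
    else {
      val size  = 1 << lgSize
      val start = addrLo & (beatBytes - 1) & ~(size - 1) // naturally aligned start lane
      ((1 << size) - 1) << start
    }
  }

  // Render lane 0 first, matching the notation in the MaskGen comments.
  def lsbFirst(m: Int, beatBytes: Int): String =
    (0 until beatBytes).map(i => if (((m >> i) & 1) == 1) '1' else '0').mkString
}
```

For example, `ByteMaskModel.mask(0x3, 1, 4)` is 0xC (lanes 2 and 3), which `lsbFirst` renders as "0011", matching the comment.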




Mid-January Update

November was mostly about writing a grant proposal for DARPA.  It has been 110% technical work since then.  Beginning in February, I will shift to writing pretty PowerPoint slides for the company.   I should have a minimum viable prototype by then.

I worked on an experiment to build a continuous integration environment for the software assets of the RISC-V Foundation, learning Groovy and Gradle in the process. The prototype uses Jenkins 2, which has pleasantly surprised me; my previous experience was with Jenkins 1.x. I published source code for the prototype here. I plan to volunteer some time to help coordinate some of this activity. It is a small commitment, less than 5% of my week.

I began to move parts of my internal wiki to GitHub gists.  I plan to add more over time.

Finally, I am accumulating notes on how to bootstrap RocketChip. You can find the notes here.

Also, compared to six months ago, I am beginning to feel very comfortable writing functional code in Scala. I struggle a bit here and there (e.g., when recursion comes into play), but overall it is a positive experience. If I had time, I would love to take an advanced class. It would be interesting to re-learn data structures and algorithms through a functional lens.