These thoughts were generated in response to Kwiki:WebServiceTransclusionSolution. This idea is half-baked, just my two cents before my mind wanders:    (7LZ)

-MatthewOconnor?    (7MH)

To transclude a unit of information you need three pieces of meta-information about it:    (7M0)

    1) Where the information is located. 
    2) How to retrieve the remote information.
    3) What to do with the info when you have it.    (7M1)

One way to avoid recursive transclusion problems is to have the original requesting client perform retrieval and processing for *every* piece of transcluded information, even information transcluded in the aggregate.    (7M2)

For example:    (7M3)

    Say you have a unit of transcluded information, call it A.  Assume that
    unit A transcludes from unit B which transcludes from unit C which
    transcludes from unit A.  So we have a simple triangle loop.    (7M4)
    Classically A would ask B for its content, who'd ask C, who'd ask A, etc.
    You have a loop.  Rather than ask B for its post-processed content (i.e.
    content after transclusion) ask it for its pre-processed content and
    instead of transcluding substitute in the 3 pieces of information above,
    i.e. the Where, How, and What.    (7M5)
    In this scenario A would ask B for its pre-processed content.  Examining
    B's pre-processed content A would see that B transcludes from C.  So to
    "complete" B, A would ask for the pre-processed content of C.  When
    examining C's content A would notice the loop and be able to break it.    (7M6)

In the general case this amounts to a breadth-first descent down the transclusion tree. If the client notices a loop, it will be able to break it manually.    (7M7)
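A minimal sketch of the client-driven descent described above, under stated assumptions: the `PREPROCESSED` table and the `fetch_preprocessed` function are hypothetical stand-ins for a remote fetch, and each unit's pre-processed content is reduced to the list of units it transcludes. The client carries the path from the root with every queued unit, so a reference back to any unit on that path is recognised as a loop and not followed.

```python
from collections import deque

# Hypothetical pre-processed content: each unit lists the units it transcludes.
PREPROCESSED = {"A": ["B"], "B": ["C"], "C": ["A"]}


def fetch_preprocessed(unit):
    """Stand-in for fetching a unit's pre-processed content from its host."""
    return PREPROCESSED[unit]


def walk(root):
    """Breadth-first descent from root.

    Returns (units in the order they were first fetched, loop edges found).
    """
    order = []
    loops = []
    cache = {}
    queue = deque([(root, (root,))])  # (unit, path from root to this unit)
    while queue:
        unit, path = queue.popleft()
        if unit not in cache:
            cache[unit] = fetch_preprocessed(unit)
            order.append(unit)
        for child in cache[unit]:
            if child in path:
                # The child is already on the path: record the loop, do not descend.
                loops.append((unit, child))
            else:
                queue.append((child, path + (child,)))
    return order, loops


print(walk("A"))  # (['A', 'B', 'C'], [('C', 'A')])
```

Tracking the path rather than a global "seen" set means a unit transcluded twice through different branches is not mistaken for a loop.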

One problem with this is that, among a family of agents who can transclude from each other, there would need to be a way to express the Where, How, and What of a piece of transcludable information. Down that path lie protocols, data formats, etc. This isn't necessarily so bad.    (7M8)

Usually the Where and the How portions are tightly coupled (as they should be), e.g. http://www.foo.tld/bar/baz.wiki#nid3145. The Where is the URL; the How is HTTP GET. This is especially true if the set of transcluding agents agree to provide a unique HTTP URL for each piece of transcludable information.    (7M9)

The What is trickier. Say you have an HTTP URL for a piece of information: how do you know what to do with the content of the HTTP GET response? PurpleWiki, again, just dumps the text right into the document tree, which amounts to simple string substitution. We could do the same, but we need to know where in the text to substitute. For example, let's flesh A, B, and C out some more:    (7MA)

    A: I like [t B] cookies.
    B: Chocolate [t C] Oatmeal
    C: Chip [t A]    (7MB)

When A retrieves the pre-processed text of B, it might see something like:    (7MC)

    <information>
        <chunk id="1">Chocolate </chunk>
        <transclusion id="2">http://foo.bar.tld/baz#nidC</transclusion>
        <chunk id="3"> Oatmeal</chunk>
    </information>    (7MD)

Processing this, A knows to simply replace the <transclusion> node with whatever the textual value of C becomes. This has the limitation that B will be unable to do any post-processing based on the value of C.    (7ME)
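As a rough illustration of that substitution step, here is a sketch that parses the snippet above with Python's standard `xml.etree.ElementTree` and swaps each transclusion node for a resolved value. The `complete` function and the `resolved` table are hypothetical, and the value "Chip" for C is an assumption about what C's text becomes once its own loop has been broken.

```python
import xml.etree.ElementTree as ET

# Pre-processed representation of B, as in the snippet above.
PREPROCESSED_B = """<information>
    <chunk id="1">Chocolate </chunk>
    <transclusion id="2">http://foo.bar.tld/baz#nidC</transclusion>
    <chunk id="3"> Oatmeal</chunk>
</information>"""


def complete(xml_text, resolve):
    """Join chunks in document order, replacing each transclusion node
    with resolve(url)."""
    parts = []
    for node in ET.fromstring(xml_text):
        if node.tag == "chunk":
            parts.append(node.text or "")
        elif node.tag == "transclusion":
            parts.append(resolve((node.text or "").strip()))
    return "".join(parts)


# Assumed final text of C, keyed by the URL used in B's representation.
resolved = {"http://foo.bar.tld/baz#nidC": "Chip"}

print(complete(PREPROCESSED_B, resolved.__getitem__))  # Chocolate Chip Oatmeal
```

Because the requester does the joining, B never sees the value of C, which is exactly the post-processing limitation noted above.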

As an example of this limitation, consider a web browser. Images in HTML are transclusions, and a browser sometimes changes the rendering of the page when it loads an image. Sending the pre-processed information is like sending (or in the browser's case, displaying) a webpage before any of its transcluded content is considered (CSS, images, etc.).    (7MF)

So, with this transclusion "solution", we've avoided recursion but replaced it with coordination among the various agents who expose information for transclusion and agents who transclude. This isn't a terrible trade-off because much of the coordination is happening anyway (especially the Where and How). The What does require some extra coordination as noted above.    (7MG)

-MatthewOconnor?    (7MI)


I'm not sure I understand the solution described on the Kwiki site and why you would want to use locks that way, but I think this solution is problematic as well. If the host of the top level page does everything, aren't you limited to transcluding from sites whose transclusion syntax the top level site understands?    (7MK)

Also, I think you are blurring the distinction between how to fetch it and what to do with it. The transcluded element is not the entire page that a GET would fetch, so how includes the how to extract it from the page as well. Of course, since we are just doing GETs to a transcluded URL, how do you get the remote system not to do its transclusion processing? Use some extra CGI parameter? Use an alternate URL that doesn't do the pre-process? Maybe I missed it.    (7ML)

If you are going to need a special way to fetch the transcluded content, you might as well change it to do the cycle check as it goes by sending along the current transclusion path (i.e. the transclusion path from the current request plus the current NID).    (7MM)
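One way to read this suggestion, sketched in-process rather than over HTTP: every request for a unit's processed content carries the chain of NIDs already being rendered, and a unit that finds itself in that chain returns a marker instead of recursing. The `UNITS` table, the `[t X]` pattern (borrowed from the cookie example), and the `[loop: X]` marker are all assumptions made for illustration; how the path would actually travel with a request (query parameter, header, and so on) is left open.

```python
import re

# Hypothetical units in the [t X] notation of the cookie example.
UNITS = {
    "A": "I like [t B] cookies.",
    "B": "Chocolate [t C] Oatmeal",
    "C": "Chip [t A]",
}

TRANSCLUDE = re.compile(r"\[t (\w+)\]")


def render(nid, path=()):
    """Return the post-processed text of nid.

    path is the chain of NIDs already being rendered upstream; each
    nested request extends it with the current NID.
    """
    if nid in path:
        # This unit is already being rendered further up the chain.
        return "[loop: %s]" % nid
    here = path + (nid,)
    return TRANSCLUDE.sub(lambda m: render(m.group(1), here), UNITS[nid])


print(render("A"))  # I like Chocolate Chip [loop: A] Oatmeal cookies.
```

Here the unit that closes the loop decides how to break it, whereas in the client-driven approach the original requester makes that call.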

I sort of see how locks might break the loop, at the cost of affecting simultaneous transactions (i.e. two requests for the same page processing at the same time). Maybe you could solve that by just stalling a second top level request until the first completes and satisfying the second one from the cache entry created when the first one completes.    (7MN)

Another way might be to use the NID indexes at the time of insertion to prevent loops from being added to the database in the first place. You would need to record which NIDs contain transclusions and the NIDs transcluded; then you could read the entire transclusion tree from this new transclusion index. You can detect loops as you read the tree, but there should be no loops in the stored tree, so if the tree doesn't contain the starting transcluded NID, the transclusion is good and can be added. You would still have to do something to prevent a race condition where the update to a page in the transclusion tree creates a loop after that part of the tree is scanned. Maybe you could pre-add the transclusions from the current page before starting the scan. Then if a second request comes in, it will see the loop about to be created and not do it. If a conflict occurs you would have to back out those changes. I think this creates a conservative race condition where two conflicting updates both fail where either one would be ok. That doesn't seem like a very big problem.    (7MO)
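A small sketch of such an insertion-time check, assuming a single in-memory index and a single writer, so the race condition and the pre-add/back-out step discussed above are deliberately left out. The `TransclusionIndex` class and its method names are hypothetical.

```python
class TransclusionIndex:
    """Records which NIDs transclude which, refusing edges that close a loop."""

    def __init__(self):
        self.edges = {}  # NID -> set of NIDs it transcludes

    def reaches(self, start, target):
        """True if target is start itself or appears in start's transclusion tree."""
        stack, seen = [start], set()
        while stack:
            nid = stack.pop()
            if nid == target:
                return True
            if nid in seen:
                continue
            seen.add(nid)
            stack.extend(self.edges.get(nid, ()))
        return False

    def add(self, source, target):
        """Record that source transcludes target; return False if that would loop."""
        if self.reaches(target, source):
            return False
        self.edges.setdefault(source, set()).add(target)
        return True


index = TransclusionIndex()
print(index.add("A", "B"))  # True
print(index.add("B", "C"))  # True
print(index.add("C", "A"))  # False: C -> A would close the A, B, C triangle
```

Since the stored index never contains a loop, the scan from the transcluded NID always terminates.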

- GerryGleason    (7MP)


Gerry Said: aren't you limited to transcluding from sites whose transclusion syntax the top level site understands?    (7MR)

I'm not exactly sure what you mean, but I'm pretty sure the answer is no. I am assuming that one would want to transclude from multiple places, with distinct local syntax, and places that don't even necessarily share the same addressing scheme (i.e. different NID pools, or even between places that don't use NIDs).    (7MS)

What is required between any two sites is that the requesting site must understand the remote site's representation, which is different, I think, from what you're calling transclusion syntax.    (7MT)

However, I am assuming that what is being transcluded is essentially text. That is only a weak assumption, though; I could imagine what I described being extended to transclusions of other kinds.    (7N5)

Gerry Said: The transcluded element is not the entire page that a GET would fetch, so how includes the how to extract it from the page as well. Of course, since we are just doing GETs to a transcluded URL, how do you get the remote system not to do its transclusion processing?    (7MU)

No, the How (the 2nd item in the list at the top of the page) does not include the how of extracting the transcluded information. I failed to explain this aspect in my original post; it was implicit in my thinking, though.    (7MV)

When I say a "unit of transcludable information" what I am really thinking of is a resource. My ninja-REST training has taught me to associate a resource with a unique and persistent URI, or in this case URL. Resources have an unlimited number of potential representations. For a transcludable unit there could be a pre-processed representation, which may look something like the XML snippet above. Of course, there would be a post-processed representation too.    (7MW)

At this point, you could have one URL for each transcludable unit and then use some kind of ContentNegotiation? to decide which representation you can get. Or, as I would prefer, have a unique URL for each representation. E.g. http://www.foo.tld/wiki/ExamplePage/nid415.raw and http://www.foo.tld/wiki/ExamplePage/nid415.html.    (7MX)
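To make the one-URL-per-representation option concrete, here is a tiny sketch that derives both URLs for a unit. The `representation_urls` helper is hypothetical, and the path layout simply mirrors the two example URLs above; a real site could choose any scheme.

```python
def representation_urls(base, page, nid):
    """Return the pre-processed (.raw) and post-processed (.html) URLs of a unit."""
    unit = "%s/%s/nid%s" % (base, page, nid)
    return {"raw": unit + ".raw", "html": unit + ".html"}


urls = representation_urls("http://www.foo.tld/wiki", "ExamplePage", 415)
print(urls["raw"])   # http://www.foo.tld/wiki/ExamplePage/nid415.raw
print(urls["html"])  # http://www.foo.tld/wiki/ExamplePage/nid415.html
```

A transcluding client would fetch the `.raw` URL, while a browser would be pointed at the `.html` one.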

So, the answer is yes to your question: "Use an alternate URL that doesn't do the pre-process?" However, the thing at that URL probably isn't HTML. It'll be some shared representation. You could use a fragment identifier (i.e. #nid515) too if the representation has some way of expressing fragments.    (7MY)

Gerry Said: If you are going to need a special way to fetch the transcluded content, you might as well change it to do the cycle check as it goes by sending along the current transclusion path (i.e. the transclusion path from the current request plus the current NID).    (7MZ)

I have some trouble following this as well. The solution I outlined puts all the information in one place so that cycles can be spotted and broken easily. It seems you're talking about pushing information along the transclusion chain rather than gathering information all in one spot. However, that's just a wild guess.    (7N0)

Gerry, I also don't grok your remarks about avoiding loops by detecting them when NIDs are added to the database. It seems that such a solution is limited to a single NID pool and would fail to scale to multiple sites.    (7N1)

One thing that's key to what I described is the ability to break cycles that don't involve the original transcludable unit. Say that unit A tries to transclude from unit B, which transcludes from unit C, which transcludes from unit B (i.e. not A, as in the original example). Here the loop is between B and C. B and C may or may not even belong to the same address space as A, so NID indexing is not going to work here. Rather than A blocking, possibly forever, or getting some kind of probabilistic result (as would happen with locking), A now has the ability to notice the loop and decide its own course of action.    (7N6)

- MatthewOconnor?    (7N2)