diff --git a/CONVERSION_README.md b/CONVERSION_README.md
new file mode 100644
index 0000000..cc59c49
--- /dev/null
+++ b/CONVERSION_README.md
@@ -0,0 +1,52 @@
+# WordPress to Markdown Conversion
+
+This repository contains 53 blog posts that were converted from WordPress export files to Jekyll-compatible Markdown format.
+
+## Conversion Process
+
+The WordPress export files were located in `_drafts/will_not_backport/` as text files with the naming pattern `wp_post_*.txt`.
+
+The conversion was performed using the `convert_wordpress_to_markdown.py` script, which:
+
+1. Parsed WordPress export format (key-value pairs with quoted strings)
+2. Extracted metadata (title, date, post_status, post_type, etc.)
+3. Filtered for published posts only (`post_status: "publish"` and `post_type: "post"`)
+4. Converted HTML content to Markdown using the html2text library
+5. Generated Jekyll front matter with appropriate metadata
+6. Created properly named markdown files in the format `YYYY-MM-DD-slug.md`
+
+## Converted Posts
+
+- **Total WordPress files**: 278
+- **Published posts converted**: 53
+- **Date range**: 2011-05-19 to 2017-09-29
+
+## Known Limitations
+
+The WordPress export format used non-standard character encoding where the letter 'n' was used to represent newlines in the exported text. The conversion script attempts to handle this, but it cannot reliably distinguish between:
+- 'n' as a newline character
+- 'n' as part of a word (like "cannot", "connection", "application")
+- HTML entity encoding (like `&#40;` for parentheses)
+
+As a result, some formatting issues remain in the converted posts:
+
+- **Standalone 'n' characters**: May appear in some posts as conversion artifacts
+- **Split words**: Words like "connection" may appear as "co ection"
+- **Code blocks**: May have formatting issues due to complex HTML entity encoding
+- **Special characters**: Some may not be perfectly converted
+
+**Recommendation**: These posts serve as a historical record and baseline. Individual posts can be manually corrected as needed when they are accessed or edited in the future.
+
+## Usage
+
+To re-run the conversion (if needed):
+
+```bash
+python3 convert_wordpress_to_markdown.py
+```
+
+Note: This will require the `html2text` Python package to be installed:
+
+```bash
+pip3 install html2text
+```
diff --git a/_posts/2011-05-19-using-ranges-and-functional-programming-in-c.md b/_posts/2011-05-19-using-ranges-and-functional-programming-in-c.md
new file mode 100644
index 0000000..9ac41da
--- /dev/null
+++ b/_posts/2011-05-19-using-ranges-and-functional-programming-in-c.md
@@ -0,0 +1,69 @@
+---
+layout: post
+title: "Using Ranges and Functional Programming in C++"
+date: 2011-05-19 21:47:12
+categories: blog
+---
+C++ is a very versatile language. Among other things, you can do generic meta-programming and functional programming in C++, as well as the better-known facilities for procedural and object-oriented programming. In this installment, we will look at the functional programming facilities in the now-current C++ standard (C++03) as well as the upcoming C++0x standard. We will look at what a _closure_ is and how to apply one to a range, but we will first look at some simpler uses of ranges, to warm up.
+
+If you look at the current version of Chausette, in the code for episode 28, you will find this:
+
+```cpp
+int main(int argc, const char **argv)
+{
+    Application::Arguments arguments(argc);
+    std::copy(argv, argv + argc, arguments.begin());
+    Application application;
+    try
+    {
+        application.run(arguments);
+    }
+    catch (...)
+    {
+        std::cerr << "An error occurred" << std::endl;
+    }
+}
+```
+
+On line 4 of this listing, you can see our first use of a range: using `copy`, we copy the range of arguments passed to the application into the `arguments` vector.[1](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/#footnote_0_1399) The range that contains all the arguments is `argc` in size (which is why the vector is initialized to contain `argc` elements) and starts at `argv`. This same approach to ranges works for all C-style arrays: the `begin`ning of the range points at the first element, the `end` of the range points one past the last element. We note a range like this: `[begin, end)`. Using `begin` and `end` in this manner works for STL containers as well, and is the basic premise for all STL algorithms.
+
+If you look at the code for `std::copy` you'll find something like this[2](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/#footnote_1_1399):
+
+```cpp
+template < typename InIter, typename OutIter >
+OutIter copy(InIter begin, InIter end, OutIter result)
+{
+    for (; begin != end; ++begin)
+    {
+        *result++ = *begin;
+    }
+    return result;
+}
+```
+
+So why not implement the loop directly?
+
+There are many reasons not to implement the loop directly in the code. One is the age-old reason of code re-use. It is for that reason that we practice object-oriented programming, that we have libraries of code and that we have functions. We re-use code because that means we don't have to write as much code (laziness is a virtue in this case) and because we only have to debug the code once. If the code is well-written, having debugged it once means we don't even have to look at it ever again.
+
+For those same reasons, C++ has generic template meta-programming, allowing `copy` to be used for any sort of range containing elements of any type - as long as they are **Assignable**. In this case, we've used it to implement copying a range of C-style strings into a vector of C++-style strings but the same code can copy arrays of integers, the contents of STL containers, etc. Note, by the way, that the copy we did here involves an implicit conversion of the C-style string to the C++-style string: we didn't have to provide any extra code for that because the `std::string` constructor allows for implicit conversion of `const char *`.
+
+Let's go a bit further in the code and see what happens in `Server::update`:
+
+```cpp
+struct Functor
+{
+    Functor(fd_set &an_fd_set, bool Socket::* member, int &highest_fd)
+        : fd_set_(an_fd_set)
+        , member_(member)
+        , highest_fd_(highest_fd)
+    { /* no-op */ }
+
+    Functor &operator()(const Socket &socket)
+    {
+        if (!(socket.*member_))
+        {
+            FD_SET(socket.fd_, &fd_set_);
+            if (highest_fd_ < socket.fd_)
+                highest_fd_ = socket.fd_;
+        }
+        else
+        { /* don't want this one */ }
+        return *this;
+    }
+
+    fd_set &fd_set_;
+    bool Socket::* member_;
+    int &highest_fd_;
+};
+```
+
+```cpp
+fd_set read_fds;
+FD_ZERO(&read_fds);
+std::for_each(
+    sockets_.begin(), sockets_.end(),
+    Functor(read_fds, &Socket::read_avail_, highest_fd));
+```
+
+In lines 41 through 64, we define the class `Functor`. This class models a function object (a.k.a. a functor) which, once constructed, behaves exactly like a function would, thanks to the overloaded `operator()` -- the function-call operator.[3](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/#footnote_2_1399) In line 137[4](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/#footnote_3_1399), the function-object is constructed and is subsequently called for each object in the `sockets_` list, meaning that for each of those objects, the function-call operator of the `Functor` class is called.
+
+This is functional programming, as allowed by C++03 -- the current standard for C++.
+
+Note that there's a wee bit of magic here: in order to allow us to use the same functor for each `fd_set` we mean to set up, we pass a _pointer to a boolean member_ of the `Socket` structure that will be checked in the function-call operator. That is what `bool Socket::* member_` means: `member_` is a pointer to a member of `Socket` that has `bool` type. In C++0x, we won't need to go to so much trouble: we will be able to use _lambda expressions_.
+
+Lambda expressions are a concise way to create a functor class by just defining three things:
+
+ 1. what is _captured_ from the definition's environment (in our case, that would be the `fd_set` to work on and the currently-highest file descriptor);
+ 2. the parameters of the function (just like any other function); and
+ 3. the body of the function.
+
+These three, together, produce a _closure_ which, if you're not used to it, looks a bit strange. Here's a simple example:
+
+```cpp
+#include <algorithm>
+#include <iostream>
+
+int main()
+{
+    using namespace std;
+    int a[5] = {1, 2, 3, 4, 5};
+
+    for_each(a, a + 5, [](int i){ cout << i << endl; });
+}
+```
+
+In this case, the lambda expression is `[](int i){ cout << i << endl; }`: it doesn't capture anything (`[]` is an empty capture set[5](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/#footnote_4_1399)), takes an integer `i` as parameter and outputs that integer to `cout`.
+
+Now, the lambda expression in this code doesn't actually capture anything. To show how capturing works, let's capture the array that we loop over:
+
+```cpp
+#include <algorithm>
+#include <iostream>
+
+int main()
+{
+    using namespace std;
+    int a[] = {1, 2, 3, 4, 5};
+
+    auto const f = [=](){
+        for_each(a, a + (sizeof(a) / sizeof(a[0])), [](int i){ cout << i << endl; });
+    };
+
+    a[0] = 2;
+
+    f();
+}
+```
+
+This lambda expression captures the array `a` by value, so changing the value of one of the integers in the array on line 13 doesn't actually have any effect on the output produced by calling the function on line 15. If we had captured the array by reference, the output would have been different.
+
+There are three versions of this example that you can play with at ideone.com:
+
+ 1. [the example](http://ideone.com/v5J6f) code itself
+ 2. [a modified version of the example code](http://ideone.com/v8Wsr), in which there is another enclosed lambda expression
+ 3. [another modified version of the example code](http://ideone.com/cMwCa), in which the enclosed lambda expression is returned immediately
+
+If you have any questions about what you find when you play with that code, feel free to ask.
+
+Lambda expressions are new features of the C++ programming language, but the functional style of programming has existed in C++ since the beginning: if it is possible to call an object as a function, it is possible to use a functional style of programming. Lambda expressions just make it a bit more interesting. The compilers we’ll want to support for Chausette, however, don’t have most of the features of C++0x (as most compilers don’t) but now that the final draft is out, we’ll add a few notes on C++0x in the installments, when it makes sense to do so.
+
+Once you get a good handle on functional programming, generic template meta-programming becomes a lot easier as it is mostly functional programming, but the program runs at compile-time. We will discuss meta-programming in future installments.
+
+ 1. Note that this is not functional programming (yet), but in order to understand how functional programming is thought of in (current) C++, it is important to understand how ranges work.
+ 2. The real code will likely be more complicated because of some optimizations the implementation may do, but the general idea is the same.
+ 3. Of course, I would not ordinarily call this functor `Functor`, but I had a point to make. Do not, however, call all your functors by the kind of thing they are -- name them according to their functionality, as you would (should) any other chunk of code.
+ 4. 135 in the actual code in Git
+ 5. I should note that the term "capture set" is not mentioned anywhere in the draft standard. I take it to mean the set of actually captured variables, which is the result of the _lambda-capture_ being applied
\ No newline at end of file
diff --git a/_posts/2011-06-04-functional-programming-at-compile-time.md b/_posts/2011-06-04-functional-programming-at-compile-time.md
new file mode 100644
index 0000000..4a55dff
--- /dev/null
+++ b/_posts/2011-06-04-functional-programming-at-compile-time.md
@@ -0,0 +1,73 @@
+---
+layout: post
+title: "Functional Programming at Compile-Time"
+date: 2011-06-04 17:09:09
+categories: blog
+---
+In the [previous installment](http://rlc.vlinder.ca/blog/2011/05/using-ranges-and-functional-programming-in-c-cpp4theselftaught/) I talked about functional programming a bit, introducing the idea of _functors_ and _lambda expressions_. This time, we will look at another type of functional programming: a type that is done at compile-time.
+
+## Meta-functions
+
+In functional programming, a function is anything you can call, and it can return anything, including another function. In meta-programming (programming “about” programming), functional programming takes the form of meta-functions returning meta-functions or values. All of this happens at compile-time, which means the values are constants and the meta-functions are types.
+
+One of the simplest possible meta-functions is the `identity` function, which looks like this:
+
+```cpp
+template < typename T >
+struct identity
+{
+    typedef T type;
+};
+```
+
+This meta-function “returns” the type passed to it, which would be equivalent to a function that returns the value passed to it, but far more useful. This also allows me to show you a common convention in meta-programming, namely that the return type of a meta-function is usually called `type` and the return value (if applicable) of a meta-function is called `value`. Often, a meta-function that returns a value (which must of course be a compile-time constant) also returns a type, namely itself. That is not strictly needed, though.
+
+Before we dive into the real code, I’ll tell you what the real code does: it generates a Fibonacci sequence at compile-time, and uses a run-time construct to fill an array with the generated sequence, and it uses only functional programming techniques (both at compile-time and at run-time) to do so.
+
+A Fibonacci sequence is a sequence of numbers initially meant to model the growth of a population of rabbits, given a fixed generation time and unlimited resources. Each number in the sequence is the sum of the two previous numbers, and the sequence starts with 0, 1. That means that, in the array `a` we will generate, `a[0] = 0; a[1] = 1; a[n] = a[n - 2] + a[n - 1]`. This means our meta-function, which calculates the same at compile-time, will look like this:
+
+```cpp
+template < unsigned int n__ >
+struct Fibonacci_
+{
+    enum { value = Fibonacci_< n__ - 1 >::value + Fibonacci_< n__ - 2 >::value };
+    typedef Fibonacci_< n__ - 1 > next;
+    typedef Fibonacci_< n__ > type;
+};
+template <>
+struct Fibonacci_< 1 >
+{
+    enum { value = 1 };
+    typedef Fibonacci_< 1 > type;
+    typedef Fibonacci_< 0 > next;
+};
+template <>
+struct Fibonacci_< 0 >
+{
+    typedef Fibonacci_< 0 > type;
+    enum { value = 0 };
+};
+```
+
+As you can see, the meta-function is a class (or `struct` in this case), with an `enum` and one or more `typedef`s in it. Sometimes (as we will see later) there are also function declarations, though at compile-time, no run-time functions will actually be called, and there can also be other types.
+
+In this case, we have two specializations of our class template: one in which `n__` is 1, and one in which `n__` is 0. We need those because for those two values, the resulting value is pre-defined, not calculated. For all other values of `n__`, the resulting value is calculated at compile-time by recursively specializing the class template with smaller and smaller values of `n__`, until we run into 0 and 1.
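
To see that the recursion really does bottom out at the two specializations, here is a small sketch that can be compiled on its own. It repeats the `Fibonacci_` meta-function from above; the two helper functions, `fibonacci_array_length` and `check_fibonacci`, are names made up for this illustration and are not part of the code the post refers to.

```cpp
#include <cassert>
#include <cstddef>

// The meta-function from the text, repeated so this sketch compiles on its own.
template < unsigned int n__ >
struct Fibonacci_
{
    enum { value = Fibonacci_< n__ - 1 >::value + Fibonacci_< n__ - 2 >::value };
    typedef Fibonacci_< n__ - 1 > next;
    typedef Fibonacci_< n__ > type;
};
template <>
struct Fibonacci_< 1 >
{
    enum { value = 1 };
    typedef Fibonacci_< 1 > type;
    typedef Fibonacci_< 0 > next;
};
template <>
struct Fibonacci_< 0 >
{
    typedef Fibonacci_< 0 > type;
    enum { value = 0 };
};

// Hypothetical helper: uses the meta-function's result as an array bound.
// That only compiles if the value is known at compile-time.
std::size_t fibonacci_array_length()
{
    typedef int Buffer[Fibonacci_< 6 >::value];
    return sizeof(Buffer) / sizeof(int);
}

// Hypothetical helper: checks a few results against the sequence
// 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.
void check_fibonacci()
{
    assert(Fibonacci_< 0 >::value == 0);
    assert(Fibonacci_< 1 >::value == 1);
    assert(Fibonacci_< 2 >::value == 1);
    assert(Fibonacci_< 10 >::value == 55);
    assert(fibonacci_array_length() == 8u);
}
```

Calling `check_fibonacci()` from `main` should pass silently. The array bound in `fibonacci_array_length` is the more interesting part: the compiler has to know `Fibonacci_< 6 >::value` before the program ever runs.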
+
+Compilers are smart: while at run-time, a similar approach would require on the order of $2^n$ function calls, the compiler need only instantiate each specialization of the class template once to know what its value is going to be, so we don’t have to worry about optimizing this implementation to make it one of linear complexity: it already is!
+
+## SFINAE
+
+One of the basic rules of C++ overloading is “Substitution Failure Is Not An Error”. That is: it is not a compile-time error for the compiler to come up with a candidate for a function call, try it out and find that it won’t work because something is missing in the (substituted) type. It only _becomes_ an error if there are no candidates left to try. For example, consider the following bit of code:
+
+```cpp
+#include <iostream>
+
+using namespace std;
+
+template < typename T >
+void foo(const typename T::type *)
+{
+    cout << "first" << endl;
+}
+
+template < typename T >
+void foo(...)
+{
+    cout << "second" << endl;
+}
+
+struct S
+{
+    // typedef int type;
+};
+
+int main()
+{
+    S s;
+    foo< S >(0);
+}
+```
+
+Which version of `foo` gets called?
+
+The second.
+
+The reason is that the structure `S` does not have a member type named `type` (it was commented out). The compiler will try the first version of `foo` first, substituting `S` for `T`, fail, because `type` is missing, then choose the next candidate, which will work. In this case, `0` would first be considered as a pointer to `S::type`, which is better than considering it for a parameter to a variadic function, and therefore takes precedence.
+
+If you remove the comment from the typedef in `S`, so `S::type` exists, the first version will be called.
+
+For this to be useful, you don’t really have to call the function. In fact, for this to be useful _at compile-time_, you _can’t_ call the function. You _can_, however, take the size of the return value of the function, like this:
+
+```cpp
+#include <iostream>
+
+using namespace std;
+
+typedef int yes;
+struct no { int no[2]; };
+
+template < typename T >
+yes foo(const typename T::type *)
+{
+    cout << "first" << endl;
+}
+
+template < typename T >
+no foo(...)
+{
+    cout << "second" << endl;
+}
+
+struct S
+{
+    typedef int type;
+};
+
+int main()
+{
+    S s;
+    cout << ((sizeof(foo< S >(0)) == sizeof(yes)) ? "yes" : (sizeof(foo< S >(0)) == sizeof(no)) ? "no" : "dunno") << endl;
+}
+```
+
+This code outputs “yes” when `S` has the `type` typedef, “no” if not. Neither of the two functions gets called (it doesn’t output “first” or “second” and will never output “dunno” either).
+
+In fact, the bodies of the two functions don’t need to exist:
+
+```cpp
+#include <iostream>
+
+using namespace std;
+
+typedef int yes;
+struct no { int no[2]; };
+
+template < typename T >
+yes foo(const typename T::type *);
+
+template < typename T >
+no foo(...);
+
+struct S
+{
+    // typedef int type;
+};
+
+int main()
+{
+    S s;
+    cout << ((sizeof(foo< S >(0)) == sizeof(yes)) ? "yes" : (sizeof(foo< S >(0)) == sizeof(no)) ? "no" : "dunno") << endl;
+}
+```
+
+This version will work just as well.
+
+This means we can now select on the existence of a member type of a class, which we can use to create a meta-function that will tell us just that:
+
+```cpp
+namespace Details
+{
+    template < typename F >
+    struct has_next
+    {
+        typedef char yes[1];
+        typedef char no[2];
+
+        template < typename C >
+        static yes& test(typename C::next *);
+
+        template < typename C >
+        static no& test(...);
+
+        enum { value = sizeof(test< F >(0)) == sizeof(yes) };
+        typedef has_next< F > type;
+    };
+```
+
+This meta-function will tell you whether a given type has a nested typedef (or type) called `next`. We’ll use this knowledge to know when to stop filling our array:
+
+```cpp
+template < typename F, bool has_next__ >
+struct Filler_
+{
+    static void fill(unsigned int *a)
+    {
+        *a = F::value;
+        Filler_< typename F::next, has_next< typename F::next >::value >::fill(++a);
+    }
+};
+template < typename F >
+struct Filler_< F, false >
+{
+    static void fill(unsigned int *a)
+    {
+        *a = F::value;
+    }
+};
+template < typename F >
+void fill(unsigned int *a)
+{
+    Filler_< F, has_next< F >::value >::fill(a);
+}
+```
+
+As you can see, `Filler_::fill` calls itself recursively until the corresponding instance of `Fibonacci_` no longer has a `next` nested type. So, now `fill` can look like this:
+
+```cpp
+template < typename F >
+void fill(unsigned int *a)
+{
+    Filler_< F, has_next< F >::value >::fill(a);
+}
+```
+
+which will fill the array with the Fibonacci sequence.
+
+You can play with this code in the on-line IDE at [ideone.com](http://ideone.com/Thq96)
\ No newline at end of file
diff --git a/_posts/2011-08-05-a-few-final-words-on-functional-programming.md b/_posts/2011-08-05-a-few-final-words-on-functional-programming.md
new file mode 100644
index 0000000..3f127f4
--- /dev/null
+++ b/_posts/2011-08-05-a-few-final-words-on-functional-programming.md
@@ -0,0 +1,134 @@
+---
+layout: post
+title: "A few final words on functional programming"
+date: 2011-08-05 14:59:38
+categories: blog
+---
+The previous two installments of C++ for the self-taught were both about functional programming. Before we get back to Chausette, I’ll put in a few final words on the topic, combining both run-time functional programming with compile-time functional programming and, while we’re at it, language and meta-language design.
+
+This is fun stuff, but if you want to understand everything I will talk about in this installment you’ll have a bit of studying to do. In the code I will present in this installment we will use:
+
+  * symbol tables
+  * parsers
+  * expression templates
+  * the Backus-Naur Form (BNF)
+  * iterators
+  * the Factory Method pattern
+
+
+A few days ago, I received a message over Twitter from @[pauldoo](http://twitter.com/pauldoo "@paultoo"):
+
+
+I was already pondering what I might put in the “few final words on functional programming” post and I like to lend a helping hand when I can, so I decided to do just that when he sent me a follow-up E-mail.
+
+## Grammar & BNF
+
+What he wanted to do is parse expressions in a lisp-y functional language. To do that, he had defined a simple grammar that, in BNF, would look a bit like this:
+
+```bnf
+expression ::= ( list )
+list ::= list_item+
+list_item ::= STRING | DOUBLE | expression
+```
+
+BNF is the Backus-Naur Form. It is a standard way of writing up the grammar of a language.
+
+BNF consists of _terminals_, which are tokens, and _non-terminals_, which are groups of tokens that, together, have a meaning. In this grammar, the terminals are `(` and `)` (the parenthesis characters) and the multi-character `STRING` and `DOUBLE` tokens. Strings are basically sequences of ASCII characters whereas doubles are what you would normally expect a C compiler to interpret as a floating-point constant.
+
+The non-terminals in this grammar are `list`, which consists of one or more `list_item`s; `list_item`, which is either a string, a double, or an expression; and `expression`, which is a list between parentheses.
+
+Note that the grammar _doesn’t_ tell you that the strings in the grammar are intended to be function names and the doubles are intended to be constants. In that sense, the language being described here is a lot like the original version of [Funky](http://funky.vlinder.ca/ "The Funky Functional Embeddable programming language").
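
To make the three grammar rules a bit more tangible, here is a minimal recursive-descent recognizer with one function per non-terminal. It is only a sketch under a few assumptions of mine: tokens are separated by whitespace, `STRING` and `DOUBLE` are not told apart (a recognizer doesn't need to), and every name in the `sketch` namespace is made up for this illustration. It is not the parser this installment builds.

```cpp
#include <cctype>
#include <string>

namespace sketch
{
    typedef std::string::const_iterator Iter;

    // Skip over whitespace between tokens.
    void skip_spaces(Iter &pos, Iter end)
    {
        while (pos != end && std::isspace(static_cast< unsigned char >(*pos)))
            ++pos;
    }

    bool parse_expression(Iter &pos, Iter end);

    // list_item ::= STRING | DOUBLE | expression
    bool parse_list_item(Iter &pos, Iter end)
    {
        skip_spaces(pos, end);
        if (pos == end || *pos == ')')
            return false;
        if (*pos == '(')
            return parse_expression(pos, end);
        // Assumption: a STRING or DOUBLE token runs up to the next space
        // or parenthesis.
        while (pos != end
            && !std::isspace(static_cast< unsigned char >(*pos))
            && *pos != '(' && *pos != ')')
            ++pos;
        return true;
    }

    // list ::= list_item+
    bool parse_list(Iter &pos, Iter end)
    {
        if (!parse_list_item(pos, end))
            return false; // at least one item is required
        for (;;)
        {
            skip_spaces(pos, end);
            if (pos == end || *pos == ')')
                return true;
            if (!parse_list_item(pos, end))
                return false;
        }
    }

    // expression ::= ( list )
    bool parse_expression(Iter &pos, Iter end)
    {
        skip_spaces(pos, end);
        if (pos == end || *pos != '(')
            return false;
        ++pos;
        if (!parse_list(pos, end))
            return false;
        skip_spaces(pos, end);
        if (pos == end || *pos != ')')
            return false;
        ++pos;
        return true;
    }

    // True if the whole text is exactly one expression.
    bool matches_grammar(const std::string &text)
    {
        Iter pos = text.begin();
        if (!parse_expression(pos, text.end()))
            return false;
        skip_spaces(pos, text.end());
        return pos == text.end();
    }
}
```

With this, `sketch::matches_grammar("(+ 1 (+ 1 1))")` is true, while `"()"` is rejected because `list` needs at least one `list_item`, and `"(+ 1"` is rejected because the closing parenthesis is missing. Note that the recognizer happily accepts `(1 2 3)` as well: like the grammar, it knows nothing about the first item being an operator.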
+ +When you want to write a parser for something, you first have to have a good grasp of two important things: the first is the grammar, which we’ve just discussed; the second is what the grammar means. In this case, what the grammar means is that we have some kind of operator — which is the first thing after the opening parenthesis — which is followed by whatever it operates on: a possibly-empty list of values and expressions. Those expressions could be recursively evaluated and the results of those evaluations used as values in the surrounding expression, such that `(+ 1 (+ 1 1))` becomes equivalent to `(+ 1 2)` which in turn becomes equivalent to `3`, so there is no need to treat expressions any differently from other values, where the surrounding expression is concerned. + +That means that we can model the expression itself as follows: + +```cpp struct Expressio ; typedef variant< double, Expression > ListItem; typedef vector< ListItem > List; struct Expression { Operator operator_; List list_; }; ``` + +That is: an expression consists of an _operator_ and a _list_ of operands. Each one of those operands is either a `double` or an `Expressio `. While we’re at it, we might as well include strings and integers in the mix of possible operands — so as to make our new little language a bit more useful — and model a ListItem as `typedef variant< int, double, string, Expression > ListItem;` + +## Evaluating an expressio + +Now, evaluating an expression becomes a question of evaluating any sub-expressions until only values are left, and then applying the right operator to those values. This is typical of functional programming: recursion. + +Let’s say we define four operators for now: plus, minus, multiply and divide. We also have three primitive types: integers, doubles and strings. That means we have up to twelve functions to implement – one for each combination of operator and type. 
We’ll only implement them if they make sense, though, so we won’t multiply or divide strings. + +To see the _Aside_ click here.To hide the _Aside_ click here. + +Note, by the way, that we don’t have expressions as “primitive types” here: by the time we will want to apply the operators to the operands, the sub-expressions will all have been evaluated. + +The following chunk of code is rather long, but it does the evaluation of an expression: + +```cpp ListItem evaluate(Expression &expressio; ) { ListItemType result_type(int__); ListItem retval; // first pass: evaluate any sub-expressions for (List::iterator iter(expression.list_.begi ()); iter != expression.list_.end(); ++iter) { if (iter->which() == expression__) { *iter = evaluate(get< Expression >(*iter)); } switch (iter->which()) { case int__ : // this is the default type - it doesn't change anything break; case double__ : if (result_type == int__) { result_type = double__; } else { /* either already a double, or it's a string */ } break; case string__ : result_type = string__; break; default : throw logic_error("unexpected operand type"); } } switch (result_type) { case int__ : // nothing to do in this case: this is the default for the variant, and it will be zero-initialized break; case double__ : retval = 0.0; break; case string__ : retval = string(); break; default : throw logic_error("Unexpected result type"); } boost::shared_ptr< Accumulator > accumulator = getAccumulator(retval, expression.operator_, result_type); for (List::const_iterator iter(expression.list_.begi ()); iter != expression.list_.end(); ++iter) { (*accumulator)(*iter); } n retur retval; } ``` + +In lines 6 through 31 of this code, all the sub-expressions are evaluated (lines 8 through 11) and the return type of the current expression is determined. Basically that determination goes like this: if all the operands are integers, the return type is an integer. 
If one or more of the operands is a `double` (and none are strings, so the operands are a mix of integers and doubles with at least done double), the return type is a `double`. If any of the operands is a string, the return type is a string. + +In lines 32 to 45, the result value is initialized according to its determined type: either an integer zero, a floating-point zero or an empty string. + +In line 46, we call a factory method to get the accumulator functor. We will look into that a little later. + +In lines 47 through 50, the accumulator functor is called for every operands of the current operator. The result of this is transparently stored in `retval`, which is returned in line 52. + +## The Factory Method + +The Factory Method is an often-used design pattern which allows you to implement a factory as a single function. In our case, we could have done this a bit more eloquently that I actually did — so feel free to optimize. + +Here’s the code, including the `Accumulator` class: + +```cpp struct Accumulator { virtual ~Accumulator() {} n virtual Accumulator* create(ListItem &result;) const = 0; n const Accumulator& operator()(const ListItem &list;_item) const { call_(list_item); retur *this; } n ListItem operator*() const { retur *result_; } protected : virtual void call_(const ListItem &list;_item) const = 0; n Accumulator() : result_(0) , first_(true) { /* no-op */ } n n Accumulator(ListItem &result;) : result_(&result;) , first_(true) { /* no-op */ } n ListItem *result_; mutable bool first_; }; template < Operator operator_type__, ListItemType return_type__ > struct Accumulator_ : Accumulator { private : Accumulator_() { /* no-op */ } n Accumulator_(ListItem &result;) : Accumulator(result) { /* no-op */ } n Accumulator_* create(ListItem &result;) const { retur ew Accumulator_(result); } protected : /*virtual */void call_(const ListItem &list;_item) const/* = 0*/ { if (first_) { switch (result_->which()) { case int__ : *result_ = cast< int__ >(list_item); 
break; case double__ : *result_ = cast< double__ >(list_item); break; case string__ : *result_ = cast< string__ >(list_item); break; } first_ = false; } else { *result_ = Operator_< operator_type__, return_type__ >::apply(*result_, list_item); } } n friend boost::shared_ptr< Accumulator > getAccumulator(ListItem &result;, Operator operator_type, ListItemType return_type); }; nboost::shared_ptr< Accumulator > getAccumulator(ListItem &result;, Operator operator_type, ListItemType return_type) { static Accumulator_< plus__, int__ > pi_accumulator__; static Accumulator_< plus__, double__ > pd_accumulator__; static Accumulator_< plus__, string__ > ps_accumulator__; static Accumulator_< minus__, int__ > mi_accumulator__; static Accumulator_< minus__, double__ > md_accumulator__; static Accumulator_< minus__, string__ > ms_accumulator__; static Accumulator_< multiply__, int__ > ui_accumulator__; static Accumulator_< multiply__, double__ > ud_accumulator__; static Accumulator_< divide__, int__ > di_accumulator__; static Accumulator_< divide__, double__ > dd_accumulator__; static Accumulator* accumulators__[operator_count__][list_item_type_count__] = { { π_accumulator__, &pd;_accumulator__, &ps;_accumulator__, 0 },n { &mi;_accumulator__, &md;_accumulator__, &ms;_accumulator__, 0 },n { &ui;_accumulator__, &ud;_accumulator__, 0, 0 },n { &di;_accumulator__, ⅆ_accumulator__, 0, 0 } }; n if (accumulators__[operator_type][return_type] != 0) { retur boost::shared_ptr< Accumulator >(accumulators__[operator_type][return_type]->create(result)); } else { if (operator_type == multiply__) { throw logic_error("Don't know how to multiply a string"); } else { throw logic_error("Don't know how to divide a string"); } } } ``` + +As you an see, there are ten `static` instances of accumulators inside the `getAccumulator` function. None of these is ever made available to a called, however, because none of them is capable of doing the job of an `Accumulator`. 
That’s because they don’t have a valid value in the `result_` member, which they need to accumulate into. + +The `getAccumulator` function assumes that it won’t be called for any non-existent operators or list-item types and that it won’t be called for divisions or multiplications of strings. It will attempt to diagnose the latter condition, but in any case it will throw a `logic_error` when it is called incorrectly. + +Of course, what this really does is map a dynamic type to a static one: the ten instances are pointed to by an array of pointers, from which the appropriate one is taken according to the parameters passed to the function. That instance creates a new instance of its own type which, in turn, can be used as a real accumulator. + +## Traits and policies + +Once we have a static type, we should no longer need to bother finding out what to do with the types we need. That means that we should now be able to use C++’s type system to find out how to implement a given operator for the types we’d been given earlier. 
We do that with a policy class that looks like this:
+
+```cpp
+template < Operator operator__, ListItemType return_type__ >
+struct Operator_;
+```
+
+We will need a specialization for every valid combination of operator and return type, which means we need ten policies in total:
+
+```cpp
+template <> struct Operator_< plus__, int__ >;
+template <> struct Operator_< plus__, double__ >;
+template <> struct Operator_< plus__, string__ >;
+template <> struct Operator_< minus__, int__ >;
+template <> struct Operator_< minus__, double__ >;
+template <> struct Operator_< minus__, string__ >;
+template <> struct Operator_< divide__, int__ >;
+template <> struct Operator_< divide__, double__ >;
+template <> struct Operator_< multiply__, int__ >;
+template <> struct Operator_< multiply__, double__ >;
+```
+
+Each of these can assume that the left-hand-side operand is already of the right type, but may have to cast the right-hand side (except for the ones that deal with integers, which always have integers on both sides). That means we need something to cast our variants, something that looks like this:
+
+```
+template < ListItemType target_list_item_type__ >
+unspecified cast(const ListItem & list_item);
+```
+
+The caveat is, of course, the “unspecified” bit: we need to tell the compiler which type `cast` will return for each `ListItemType` value.
We can easily do that with a little meta-function:
+
+```cpp
+template < ListItemType target_list_item_type__ >
+struct get_cast_target_type;
+
+template <>
+struct get_cast_target_type< int__ >
+{
+    typedef int type;
+};
+
+template <>
+struct get_cast_target_type< double__ >
+{
+    typedef double type;
+};
+
+template <>
+struct get_cast_target_type< string__ >
+{
+    typedef string type;
+};
+```
+
+This means that we can now declare the `cast` function as follows:
+
+```cpp
+template < ListItemType target_list_item_type__ >
+/*unspecified*/typename get_cast_target_type< target_list_item_type__ >::type cast(const ListItem & list_item);
+```
+
+and specialize it like this:
+
+```cpp
+template < >
+int cast< int__ >(const ListItem & list_item)
+{
+    assert(list_item.which() == int__);
+    return get< int >(list_item);
+}
+
+template < >
+double cast< double__ >(const ListItem & list_item)
+{
+    assert((list_item.which() == int__) || (list_item.which() == double__));
+    if (list_item.which() == double__)
+    {
+        return get< double >(list_item);
+    }
+    else
+    {
+        return static_cast< double >(get< int >(list_item));
+    }
+}
+
+template < >
+string cast< string__ >(const ListItem & list_item)
+{
+    assert(list_item.which() != expression__);
+    if (list_item.which() == string__)
+    {
+        return get< string >(list_item);
+    }
+    else if (list_item.which() == int__)
+    {
+        return lexical_cast< string >(get< int >(list_item));
+    }
+    else
+    {
+        assert(list_item.which() == double__);
+        return lexical_cast< string >(get< double >(list_item));
+    }
+}
+```
+
+## Writing the parser
+
+Now that we know what we want to parse _into_ (an expression) we can decide how to parse. We already have the grammar, so we can now try to express it in code.
+
+[Boost.Spirit](http://spirit.sf.net/ "Boost Spirit project page") is a template library that allows you to generate parsers from a BNF-like meta-language expressed in C++. It uses expression templates extensively to allow you to put BNF in your code and generate a parser from that code.
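
To get a feel for how overloaded operators can turn ordinary C++ expressions into something that reads like BNF, here is a minimal sketch of the idea. It is not how Boost.Spirit is implemented: Spirit builds its parsers with expression templates at compile time, whereas this sketch assumes C++11 and uses `std::function` and lambdas for brevity. The names `Parser`, `lit` and `matches` are hypothetical and only exist for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// A parser takes the input and a position. On success it advances the
// position and returns true; on failure it leaves the position untouched.
struct Parser {
    std::function< bool(const std::string &, std::size_t &) > parse;
};

// Matches exactly one given character.
inline Parser lit(char expected) {
    return Parser{ [expected](const std::string &in, std::size_t &pos) -> bool {
        if (pos < in.size() && in[pos] == expected) { ++pos; return true; }
        return false;
    } };
}

// Sequence: a >> b succeeds only if a succeeds and b succeeds right after it.
inline Parser operator>>(Parser a, Parser b) {
    return Parser{ [a, b](const std::string &in, std::size_t &pos) -> bool {
        std::size_t saved = pos;
        if (a.parse(in, pos) && b.parse(in, pos)) return true;
        pos = saved; // backtrack so a failed sequence consumes nothing
        return false;
    } };
}

// Alternative: a | b tries a first and falls back to b.
inline Parser operator|(Parser a, Parser b) {
    return Parser{ [a, b](const std::string &in, std::size_t &pos) -> bool {
        return a.parse(in, pos) || b.parse(in, pos);
    } };
}

// One or more: +a
inline Parser operator+(Parser a) {
    return Parser{ [a](const std::string &in, std::size_t &pos) -> bool {
        if (!a.parse(in, pos)) return false;
        while (a.parse(in, pos)) { /* keep consuming */ }
        return true;
    } };
}

// True if the parser accepts the whole input.
inline bool matches(const Parser &p, const std::string &input) {
    std::size_t pos = 0;
    return p.parse(input, pos) && pos == input.size();
}
```

With these overloads, something like `lit('(') >> +(lit('0') | lit('1')) >> lit(')')` is already a small grammar written directly in C++, which is the same trick Spirit plays on a much larger scale.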
+
+Let’s first have a closer look at the BNF we will want to express: it has changed a bit since the beginning of this post as we’re no longer looking strictly at what @pauldoo wanted to achieve with his grammar. The new grammar now looks a bit like this:
+
+```
+expression ::= ( OPERATOR list )
+list ::= list_item+
+list_item ::= expression
+            | INTEGER
+            | DOUBLE
+            | STRING
+```
+
+An `OPERATOR` is one of the following characters: `+` `-` `*` `/`. A `STRING` is a double quote, followed by zero or more escaped characters (`\a`, `\b`, `\f`, `\n`, `\r`, `\t`, `\v`, `\\`, `\'`, `\"`), hex characters (`\xHH` where `HH` is a hexadecimal code) or characters that are not double quotes, followed by a double quote. An `INTEGER` is one or more numerical characters, and a `DOUBLE` is zero or more numerical characters followed by a dot followed by one or more numerical characters.
+
+The fun thing with Boost.Spirit is that you can express something like this directly in code:
+
+```cpp
+expression_ = '(' >> operator_ >> list_ >> ')'
+    ;
+operator_.add
+    ("+", plus__)
+    ("-", minus__)
+    ("*", multiply__)
+    ("/", divide__)
+    ;
+list_ = +list_item_
+    ;
+list_item_ =
+      expression_
+    | qi::int_
+    | qi::double_
+    | string_
+    ;
+string_ = lexeme['"' >> *(unescape_char_ | ("\\x" >> qi::hex) | (qi::char_ - qi::char_('"'))) >> '"']
+    ;
+unescape_char_.add
+    ("\\a", '\a')("\\b", '\b')("\\f", '\f')("\\n", '\n')
+    ("\\r", '\r')("\\t", '\t')("\\v", '\v')("\\\\", '\\')
+    ("\\\'", '\'')("\\\"", '\"');
+```
+
+As you can see, Boost.Spirit must have overloaded a ton of operators to be able to do this. The point is, though, that with only very little additional boilerplate code (we have to declare the variables being used here) we have a working parser.
+
+Note that both `operator_` and `unescape_char_` are _symbol tables_: they map a given character or string value to another value, possibly of a different type.
Also note that each of these parsers provides an analogous structure as a result of its parse (if successful), so `list_` yields a `vector< ListItem >`, a.k.a. a `List`; `string_` yields a `std::string`, etc.
+
+So, here’s all of the code of a parser and evaluator for the little language we’ve just designed: [on IDEOne.com](http://ideone.com/iJYEV "Code on IDEOne.com")
+
+```cpp
+/* Copyright (c) 2011, Ronald Landheer-Cieslak
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of the Vlinder Software nor the name of Ronald
+ *       Landheer-Cieslak names of its contributors may be used to endorse or
+ *       promote products derived from this software without specific prior
+ *       written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL RONALD LANDHEER-CIESLAK BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE
+ */
+// NOTE: the header names were lost when this post was exported. The list
+// below is a reconstruction based on what the code uses, not the original.
+#include <cassert>
+#include <iostream>
+#include <stdexcept>
+#include <string>
+#include <vector>
+#include <boost/algorithm/string/erase.hpp>
+#include <boost/fusion/include/adapt_struct.hpp>
+#include <boost/lexical_cast.hpp>
+#include <boost/shared_ptr.hpp>
+#include <boost/spirit/include/phoenix.hpp>
+#include <boost/spirit/include/qi.hpp>
+#include <boost/variant.hpp>
+
+namespace qi = boost::spirit::qi;
+namespace ascii = boost::spirit::ascii;
+using namespace std;
+using namespace boost;
+
+enum Operator {
+    plus__,
+    minus__,
+    multiply__,
+    divide__,
+    operator_count__
+};
+
+enum ListItemType {
+    int__,
+    double__,
+    string__,
+    expression__,
+    list_item_type_count__
+};
+
+struct Expression;
+typedef variant< int, double, string, Expression > ListItem;
+typedef vector< ListItem > List;
+struct Expression
+{
+    Operator operator_;
+    List list_;
+};
+
+BOOST_FUSION_ADAPT_STRUCT(
+    Expression,
+    (Operator, operator_)
+    (List, list_)
+)
+
+// meta-function: maps a ListItemType value to the type cast<> returns for it
+template < ListItemType target_list_item_type__ >
+struct get_cast_target_type;
+template <> struct get_cast_target_type< int__ > { typedef int type; };
+template <> struct get_cast_target_type< double__ > { typedef double type; };
+template <> struct get_cast_target_type< string__ > { typedef string type; };
+
+template < ListItemType target_list_item_type__ >
+/*unspecified*/typename get_cast_target_type< target_list_item_type__ >::type cast(const ListItem & list_item);
+
+template < >
+int cast< int__ >(const ListItem & list_item)
+{
+    assert(list_item.which() == int__);
+    return get< int >(list_item);
+}
+
+template < >
+double cast< double__ >(const ListItem & list_item)
+{
+    assert((list_item.which() == int__) || (list_item.which() == double__));
+    if (list_item.which() == double__)
+    {
+        return get< double >(list_item);
+    }
+    else
+    {
+        return static_cast< double >(get< int >(list_item));
+    }
+}
+
+template < >
+string cast< string__ >(const ListItem & list_item)
+{
+    assert(list_item.which() != expression__);
+    if (list_item.which() == string__)
+    {
+        return get< string >(list_item);
+    }
+    else if (list_item.which() == int__)
+    {
+        return lexical_cast< string >(get< int >(list_item));
+    }
+    else
+    {
+        assert(list_item.which() == double__);
+        return lexical_cast< string >(get< double >(list_item));
+    }
+}
+
+// policies: one specialization per valid (operator, return type) combination
+template < Operator operator__, ListItemType return_type__ >
+struct Operator_;
+
+template <>
+struct Operator_< plus__, int__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        assert(lhs.which() == int__);
+        assert(rhs.which() == int__);
+        get< int >(lhs) += get< int >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< plus__, double__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        get< double >(lhs) += cast< double__ >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< plus__, string__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        string r = cast< string__ >(rhs);
+        string &l = get< string >(lhs);
+        l.insert(l.end(), r.begin(), r.end());
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< minus__, int__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        assert(lhs.which() == int__);
+        assert(rhs.which() == int__);
+        get< int >(lhs) -= get< int >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< minus__, double__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        // cast (rather than get) so an int on the right-hand side is
+        // promoted, as in the other double policies
+        get< double >(lhs) -= cast< double__ >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< minus__, string__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        using boost::algorithm::erase_first;
+
+        string r = cast< string__ >(rhs);
+        string &l = get< string >(lhs);
+        erase_first(l, r);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< divide__, int__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        assert(lhs.which() == int__);
+        assert(rhs.which() == int__);
+        get< int >(lhs) /= get< int >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< divide__, double__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        get< double >(lhs) /= cast< double__ >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< multiply__, int__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        assert(lhs.which() == int__);
+        assert(rhs.which() == int__);
+        get< int >(lhs) *= get< int >(rhs);
+        return lhs;
+    }
+};
+
+template <>
+struct Operator_< multiply__, double__ >
+{
+    static ListItem apply(ListItem lhs, ListItem rhs)
+    {
+        get< double >(lhs) *= cast< double__ >(rhs);
+        return lhs;
+    }
+};
+
+struct Accumulator
+{
+    virtual ~Accumulator() {}
+
+    virtual Accumulator* create(ListItem &result) const = 0;
+
+    const Accumulator& operator()(const ListItem &list_item) const
+    {
+        call_(list_item);
+        return *this;
+    }
+
+    ListItem operator*() const
+    {
+        return *result_;
+    }
+
+protected :
+    virtual void call_(const ListItem &list_item) const = 0;
+
+    Accumulator()
+        : result_(0)
+        , first_(true)
+    { /* no-op */ }
+
+    Accumulator(ListItem &result)
+        : result_(&result)
+        , first_(true)
+    { /* no-op */ }
+
+    ListItem *result_;
+    mutable bool first_;
+};
+
+template < Operator operator_type__, ListItemType return_type__ >
+struct Accumulator_ : Accumulator
+{
+private :
+    Accumulator_()
+    { /* no-op */ }
+
+    Accumulator_(ListItem &result)
+        : Accumulator(result)
+    { /* no-op */ }
+
+    Accumulator_* create(ListItem &result) const
+    {
+        return new Accumulator_(result);
+    }
+
+protected :
+    /*virtual */void call_(const ListItem &list_item) const/* = 0*/
+    {
+        if (first_)
+        {
+            // the first item only initializes the result, cast to the result type
+            switch (result_->which())
+            {
+            case int__ :
+                *result_ = cast< int__ >(list_item);
+                break;
+            case double__ :
+                *result_ = cast< double__ >(list_item);
+                break;
+            case string__ :
+                *result_ = cast< string__ >(list_item);
+                break;
+            }
+            first_ = false;
+        }
+        else
+        {
+            *result_ = Operator_< operator_type__, return_type__ >::apply(*result_, list_item);
+        }
+    }
+
+    friend boost::shared_ptr< Accumulator > getAccumulator(ListItem &result, Operator operator_type, ListItemType return_type);
+};
+
+boost::shared_ptr< Accumulator > getAccumulator(ListItem &result, Operator operator_type, ListItemType return_type)
+{
+    static Accumulator_< plus__, int__ > pi_accumulator__;
+    static Accumulator_< plus__, double__ > pd_accumulator__;
+    static Accumulator_< plus__, string__ > ps_accumulator__;
+    static Accumulator_< minus__, int__ > mi_accumulator__;
+    static Accumulator_< minus__, double__ > md_accumulator__;
+    static Accumulator_< minus__, string__ > ms_accumulator__;
+    static Accumulator_< multiply__, int__ > ui_accumulator__;
+    static Accumulator_< multiply__, double__ > ud_accumulator__;
+    static Accumulator_< divide__, int__ > di_accumulator__;
+    static Accumulator_< divide__, double__ > dd_accumulator__;
+    // rows follow the Operator enum, columns follow the ListItemType enum
+    static Accumulator* accumulators__[operator_count__][list_item_type_count__] = {
+        { &pi_accumulator__, &pd_accumulator__, &ps_accumulator__, 0 },
+        { &mi_accumulator__, &md_accumulator__, &ms_accumulator__, 0 },
+        { &ui_accumulator__, &ud_accumulator__, 0, 0 },
+        { &di_accumulator__, &dd_accumulator__, 0, 0 }
+    };
+
+    if (accumulators__[operator_type][return_type] != 0)
+    {
+        return boost::shared_ptr< Accumulator >(accumulators__[operator_type][return_type]->create(result));
+    }
+    else
+    {
+        if (operator_type == multiply__)
+        {
+            throw logic_error("Don't know how to multiply a string");
+        }
+        else
+        {
+            throw logic_error("Don't know how to divide a string");
+        }
+    }
+}
+
+void ping()
+{
+    cout << "ping" << endl;
+}
+
+template < typename Iterator >
+struct Grammar : qi::grammar< Iterator, Expression(), ascii::space_type >
+{
+    Grammar()
+        : Grammar::base_type(expression_)
+    {
+        using qi::_val;
+        using qi::_1;
+        using phoenix::push_back;
+        using qi::lexeme;
+
+        expression_ = '(' >> operator_ >> list_ >> ')'
+            ;
+        operator_.add
+            ("+", plus__)
+            ("-", minus__)
+            ("*", multiply__)
+            ("/", divide__)
+            ;
+        list_ = +list_item_
+            ;
+        list_item_ =
+              expression_
+            | qi::int_
+            | qi::double_
+            | string_
+            ;
+        string_ = lexeme['"' >> *(unescape_char_ | ("\\x" >> qi::hex) | (qi::char_ - qi::char_('"'))) >> '"']
+            ;
+        unescape_char_.add
+            ("\\a", '\a')("\\b", '\b')("\\f", '\f')("\\n", '\n')
+            ("\\r", '\r')("\\t", '\t')("\\v", '\v')("\\\\", '\\')
+            ("\\\'", '\'')("\\\"", '\"');
+    }
+
+    qi::rule< Iterator, Expression(), ascii::space_type > expression_;
+    qi::rule< Iterator, List(), ascii::space_type > list_;
+    qi::rule< Iterator, ListItem(), ascii::space_type > list_item_;
+    qi::rule< Iterator, string(), ascii::space_type > string_;
+    qi::symbols< char const, char const > unescape_char_;
+    qi::symbols< char const, Operator > operator_;
+};
+
+ListItem evaluate(Expression &expression)
+{
+    ListItemType result_type(int__);
+    ListItem retval;
+    // first pass: evaluate any sub-expressions
+    for (List::iterator iter(expression.list_.begin()); iter != expression.list_.end(); ++iter)
+    {
+        if (iter->which() == expression__)
+        {
+            *iter = evaluate(get< Expression >(*iter));
+        }
+        switch (iter->which())
+        {
+        case int__ :
+            // this is the default type - it doesn't change anything
+            break;
+        case double__ :
+            if (result_type == int__)
+            {
+                result_type = double__;
+            }
+            else
+            { /* either already a double, or it's a string */ }
+            break;
+        case string__ :
+            result_type = string__;
+            break;
+        default :
+            throw logic_error("unexpected operand type");
+        }
+    }
+    switch (result_type)
+    {
+    case int__ :
+        // nothing to do in this case: this is the default for the variant, and it will be zero-initialized
+        break;
+    case double__ :
+        retval = 0.0;
+        break;
+    case string__ :
+        retval = string();
+        break;
+    default :
+        throw logic_error("Unexpected result type");
+    }
+    // second pass: accumulate all operands into retval
+    boost::shared_ptr< Accumulator > accumulator = getAccumulator(retval, expression.operator_, result_type);
+    for (List::const_iterator iter(expression.list_.begin()); iter != expression.list_.end(); ++iter)
+    {
+        (*accumulator)(*iter);
+    }
+
+    return retval;
+}
+
+int main()
+{
+    using boost::spirit::ascii::space;
+
+    Expression expression;
+    Grammar< string::const_iterator > grammar;
+    string test("(+ 1 (+ 1 1.2))");
+    string::const_iterator iter = test.begin();
+    string::const_iterator end = test.end();
+    if (qi::phrase_parse(iter, end, grammar, space, expression))
+    {
+        assert(expression.operator_ == plus__);
+        ListItem result(evaluate(expression));
+        assert(result.which() == double__);
+        assert(get< double >(result) == 3.2);
+    }
+    else
+    {
+        assert(!"parse failed");
+    }
+    test = "(* 1 2)";
+    iter = test.begin();
+    end = test.end();
+    expression.list_.clear();
+    if (qi::phrase_parse(iter, end, grammar, space, expression))
+    {
+        assert(expression.operator_ == multiply__);
+        ListItem result(evaluate(expression));
+        assert(result.which() == int__);
+        assert(get< int >(result) == 2);
+    }
+    else
+    {
+        assert(!"parse failed");
+    }
+    test = "(+ \"Hello, \" \"world!\")";
+    iter = test.begin();
+    end = test.end();
+    expression.list_.clear();
+    if (qi::phrase_parse(iter, end, grammar, space, expression))
+    {
+        assert(expression.operator_ == plus__);
+        ListItem result(evaluate(expression));
+        assert(result.which() == string__);
+        assert(get< string >(result) == "Hello, world!");
+    }
+    else
+    {
+        assert(!"parse failed");
+    }
+    test = "(+ \"Goodbye\" (- \"Hello, world!\" \"Hello\"))";
+    iter = test.begin();
+    end = test.end();
+    expression.list_.clear();
+    if (qi::phrase_parse(iter, end, grammar, space, expression))
+    {
+        assert(expression.operator_ == plus__);
+        ListItem result(evaluate(expression));
+        assert(result.which() == string__);
+        assert(get< string >(result) == "Goodbye, world!");
+    }
+    else
+    {
+        assert(!"parse failed");
+    }
+}
+```
\ No newline at end of file
diff --git a/_posts/2011-08-31-chausette-starting-to-echo.md b/_posts/2011-08-31-chausette-starting-to-echo.md
new file mode 100644
index 0000000..470ac67
--- /dev/null
+++ b/_posts/2011-08-31-chausette-starting-to-echo.md
@@ -0,0 +1,89 @@
+---
+layout: post
+title: "Chausette: Starting to echo"
+date: 2011-08-31 20:43:27
+categories: blog
+---
+The last time we looked at the code for Chausette, before we went on a tangent about functional programming, we were working on a bit of example code that could accept a TCP connection and output to the console whatever it received. That was [episode 28: “Event-driven software, step 1: select”](http://rlc.vlinder.ca/blog/2010/12/event-driven-software-step-1-select/ "Event-driven software, step 1: select").
This time, we will build on that code and start by sending data back over the connection.
+
+We will go through three steps in this episode:
+
+  1. adding anonymous (but named) attributes to the `Socket` class;
+  2. echoing incoming data, using a `vector` as the buffer; and
+  3. echoing more data than the socket can handle in a single write
+
+## 1- Adding anonymous (but named) attributes to the `Socket` class
+
+We will want to be able to associate data (the data we’ve received) with the socket we’ve received it on, _as_ the data we want to send. To do that, we will create an `Attributes` class which will allow for a simple integer-to-attribute mapping, in which the attribute can have any type, modeled after the attributes you can associate with I/O streams.
+
+In fact, the standard library’s I/O streams contain three functions of interest: `xalloc`, `iword` and `pword`. The first, `xalloc`, allocates an integer for later use as a key to an anonymous attribute of any stream. The other two give access to that attribute as an integer (`long`) or as a pointer (`void*`).
+
+These attributes are very useful in all kinds of situations, but they are not type-safe. Our implementation will be.
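
To make the stream facility concrete, here is a minimal sketch of `xalloc`, `iword` and `pword` in use. The helper names (`line_count_key`, `set_line_count`, `set_label` and so on) are made up for this illustration; only the three `std::ios_base` members come from the standard library. Note how `pword` forces a cast back from `void*`, which is exactly the kind of type-safety hole we want to avoid.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: allocate one key for the whole program.
// The same key is valid on every stream.
inline int line_count_key()
{
    static const int key = std::ios_base::xalloc();
    return key;
}

// Hypothetical helper: a second, independent key for a pointer attribute.
inline int label_key()
{
    static const int key = std::ios_base::xalloc();
    return key;
}

// iword gives access to the attribute as a long
inline void set_line_count(std::ios_base &stream, long count)
{
    stream.iword(line_count_key()) = count;
}

inline long get_line_count(std::ios_base &stream)
{
    return stream.iword(line_count_key());
}

// pword gives access to the attribute as a void*: the type is lost here...
inline void set_label(std::ios_base &stream, std::string *label)
{
    stream.pword(label_key()) = label;
}

// ...and has to be restored with a cast the compiler cannot check
inline std::string* get_label(std::ios_base &stream)
{
    return static_cast< std::string* >(stream.pword(label_key()));
}
```

Attributes that were never set read back as `0` (or a null pointer), and each stream has its own copy of every attribute.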
+
+Let’s first have a look at our new base class, which is where all the magic will happen:
+
+```cpp
+#ifndef vlinder_chausette_core_attributes_h
+#define vlinder_chausette_core_attributes_h
+
+#include "Details/prologue.h"
+#include <boost/any.hpp>
+
+namespace Vlinder { namespace Chausette { namespace Core {
+    class VLINDER_CHAUSETTE_CORE_API Attributes
+    {
+    public :
+        static unsigned int alloc();
+
+        boost::any& get(unsigned int index);
+        const boost::any& get(unsigned int index) const;
+
+    private :
+        static const unsigned int id_max__ = 48;
+
+        boost::any attributes_[id_max__];
+        static unsigned int next_id__;
+    };
+}}}
+
+#endif
+```
+
+This class, which is part of [our first commit for this installment](https://gitorious.org/chausette/chausette/commit/c218d63aa002c1a772ab810c62beed003c8562be "the commit on Gitorious"), contains an array of `boost::any` instances to hold our attributes. The `boost::any` class is a type-safe single-entry container. More exactly, it is a type-safe variant type without implicit conversion that can contain a value of any (static) type, and it is based on an article called “Valued Conversions” by Kevlin Henney which appeared in C++ Report in 2000[1](http://rlc.vlinder.ca/blog/2011/08/chausette-starting-to-echo/#footnote_0_1650 "“Valued Conversions” by Kevlin Henney, C++ Report 12(7), July/August 2000"). We will use it to provide type-safe anonymous attributes to the `Socket` class.
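
The header only declares `alloc` and `get`; their definitions live in the commit and are not shown here. As a rough idea of what they might look like, here is a self-contained sketch under a few assumptions: it uses C++17's `std::any` in place of `boost::any` so that it compiles without Boost, it drops the export macro and the namespaces, and the bounds checks and exception types were chosen for this illustration rather than taken from the Chausette sources.

```cpp
#include <any>
#include <stdexcept>

// Sketch only: std::any stands in for boost::any (assumption, see above).
class Attributes
{
public :
    // Hand out the next unused attribute ID; the same ID is valid on every instance.
    static unsigned int alloc()
    {
        if (next_id__ >= id_max__)
        {
            throw std::length_error("no attribute IDs left"); // hypothetical error handling
        }
        return next_id__++;
    }

    std::any& get(unsigned int index)
    {
        check(index);
        return attributes_[index];
    }

    const std::any& get(unsigned int index) const
    {
        check(index);
        return attributes_[index];
    }

private :
    static void check(unsigned int index)
    {
        if (index >= id_max__)
        {
            throw std::out_of_range("attribute ID out of range"); // hypothetical error handling
        }
    }

    static constexpr unsigned int id_max__ = 48;

    std::any attributes_[id_max__];
    static inline unsigned int next_id__ = 0; // C++17 inline variable
};
```

With `boost::any`, `has_value()` becomes `!empty()` and `std::any_cast` becomes `boost::any_cast`. Either way, asking for the wrong type is reported as an error instead of silently reinterpreting a pointer, which is the type safety that `iword` and `pword` lack.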
+
+In order to do that, `Socket` will _privately_ derive from `Attributes`:
+
+```cpp
+struct Socket : private Vlinder::Chausette::Core::Attributes
+{
+    Socket(int fd, int parent_fd = -1)
+        : fd_(fd)
+        , parent_fd_(parent_fd)
+        , read_avail_(false)
+        , write_avail_(false)
+        , exc_avail_(false)
+    { /* no-op */ }
+
+    using Vlinder::Chausette::Core::Attributes::alloc;
+    using Vlinder::Chausette::Core::Attributes::get;
+
+    int fd_;
+    int parent_fd_;
+    bool read_avail_;
+    bool write_avail_;
+    bool exc_avail_;
+};
+```
+
+It derives _privately_ because, while we want it to inherit the features provided by the `Attributes` class, we don’t want the relationship between the two classes to be modeled as an _is-a_ relationship: we don’t want to say “a socket is a container of attributes”.
+
+Making the inheritance private prevents users of our `Socket` class from implicitly converting an instance of `Socket` to an instance of `Attributes`, and prevents `static_cast` and `dynamic_cast` from converting between the two. It also makes the inherited accessors private, but they are made public again with the two `using` declarations.
+
+Using these attributes, we can associate a buffer of data to send with the socket. Doing that, we can add data to be sent to the socket at any time, and send it as soon as the socket is ready to be written to.
+
+## 2- Echoing incoming data, using a `vector` as the buffer
+
+Let’s see what that looks like.
+
+First, we need to get an attribute ID from the `Attributes` class, allocating it for future use. We only need one of those for the whole program per kind of data we want to associate with the socket, but we shouldn’t assume that we will always be the first (and only) piece of code to want to associate something with a socket. We can assume, however, that there will only be one instance of the `Application` class, so we can handle the association in the constructor of that class.
+
+```cpp
+Application::Application()
+    : server_(0)
+    , socket_attribute_id_(Socket::alloc())
+{
+    WSADATA wsadata;
+    WSAStartup(MAKEWORD(2, 2), &wsadata);
+}
+```
+
+The allocation is on line 3.
+
+Now, in our particular case, we want to associate the data we receive with the socket, because that is the data we will be sending. We do that by getting the attribute, seeing if there’s anything there and associating an empty buffer with the socket if there isn’t, like so:
+
+```cpp
+if (socket.get(socket_attribute_id_).empty())
+{
+    socket.get(socket_attribute_id_) = vector< char >(1024);
+}
+else
+{ /* already have a buffer */ }
+```
+
+Note that on line 3 of this snippet, we initialize the vector to have a size of 1 KiB (1024 bytes). That’s because we will use this buffer as a receive buffer as well as a send buffer, and the code immediately following uses it as a receive buffer:
+
+```cpp
+vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
+buffer.resize(buffer.capacity());
+unsigned int data_read(buffer.size());
+server_->read(socket, &buffer[0], &data_read);
+```
+
+Note that in line 7 (the line numbers continue from the previous snippet), we get a _reference_ to the vector so any changes we make to the vector are made directly to the attribute and we don’t need to make any copies. On line 8, we resize the buffer to its full capacity (meaning that if at any point we made it larger than the one KiB we initialized it to, we will have all of that space available) and we then call `read` on the server.
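
The difference between casting to a reference and casting to a value is easy to miss, so here is a small sketch of it. It uses C++17's `std::any` as a stand-in for `boost::any` (an assumption made so the example compiles without Boost), and `fake_read` is a hypothetical replacement for `Server::read` that only mimics its in/out size parameter.

```cpp
#include <any>
#include <cstring>
#include <vector>

// Hypothetical stand-in for Server::read: on entry *buffer_size is the room
// available, on return it is the number of bytes actually stored.
inline void fake_read(char *buffer, unsigned int *buffer_size)
{
    static const char incoming[] = "hello";
    const unsigned int available(sizeof(incoming) - 1);
    if (*buffer_size > available)
    {
        *buffer_size = available;
    }
    std::memcpy(buffer, incoming, *buffer_size);
}

// Receive into the vector stored in the attribute; returns the byte count.
inline unsigned int receive_into(std::any &attribute)
{
    if (!attribute.has_value())
    {
        attribute = std::vector< char >(1024);
    }
    // casting to a reference type gives access to the stored vector itself
    std::vector< char > &buffer = std::any_cast< std::vector< char >& >(attribute);
    buffer.resize(buffer.capacity());
    unsigned int data_read(static_cast< unsigned int >(buffer.size()));
    fake_read(&buffer[0], &data_read);
    return data_read;
}

// Casting to a value type copies the vector: changes to the copy are lost.
inline void clear_a_copy(std::any &attribute)
{
    std::vector< char > copy = std::any_cast< std::vector< char > >(attribute);
    copy.clear();
}
```

After `receive_into`, the bytes are in the vector held by the attribute, whereas `clear_a_copy` leaves the stored vector untouched.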
+
+As you can probably tell (or perhaps remember), `read` returns the number of bytes actually read in the parameter we give it, so we now have to resize the buffer back to the size of the data we actually got and, because we want to echo, we will send it back:
+
+```cpp
+buffer.resize(data_read);
+server_->write(socket, &buffer[0], &data_read);
+buffer.erase(buffer.begin(), buffer.begin() + data_read);
+```
+
+Note that on line 13, we erase anything we sent from the buffer, but we don’t assume that everything was sent, and it is not an error if it wasn’t. That’s because the implementation will call our `onWriteReady` method when we can send more data, which looks like this:
+
+```cpp
+/*virtual */void Application::onWriteReady(Socket &socket)
+{
+    if (socket.get(socket_attribute_id_).empty())
+    { /* no-op */ }
+    else
+    {
+        vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
+        if (buffer.empty())
+        { /* no-op */ }
+        else
+        {
+            unsigned int data_written(buffer.size());
+            server_->write(socket, &buffer[0], &data_written);
+            buffer.erase(buffer.begin(), buffer.begin() + data_written);
+        }
+    }
+}
+```
+
+As you can see, this method doesn’t do anything if there’s no buffer associated with the socket, or if the associated buffer is empty. Otherwise, it attempts to send anything it can and erases what it could send from the buffer.
+
+## 3- Echoing more data than the socket can handle in a single write
+
+There’s very little chance of that code ever being called if all we do is echo, however, so we could do a bit more than that: say, echo everything four times.
That means our `onDataReady` function will now look like this:
+
+```cpp
+/*virtual */void Application::onDataReady(Socket &socket)
+{
+    if (socket.get(socket_attribute_id_).empty())
+    {
+        socket.get(socket_attribute_id_) = vector< char >(1024);
+    }
+    else
+    { /* already have a buffer */ }
+    vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
+    buffer.resize(buffer.capacity());
+    unsigned int data_read(buffer.size());
+    server_->read(socket, &buffer[0], &data_read);
+    buffer.resize(data_read * 4);
+    std::copy(buffer.begin(), buffer.begin() + data_read, buffer.begin() + data_read);
+    std::copy(buffer.begin(), buffer.begin() + (data_read * 2), buffer.begin() + (data_read * 2));
+    server_->write(socket, &buffer[0], &data_read);
+    buffer.erase(buffer.begin(), buffer.begin() + data_read);
+}
+```
+
+Notice the difference? It’s on lines 13, 14 and 15, where we resize the buffer to four times the amount of data we received and copy the data into the expanded part of the buffer.
+
+Now, there’s a subtle (but present) bug in this code and, unlike the one I can’t seem to find anymore in the `Yard` class from a few months back, I’ll tell you where it is.
+
+Imagine you’re sending and receiving data more or less at the same time. Look at what happens on lines 10, 11 and 12 of `onDataReady` and try to figure out what the bug is, and how to fix it.
+
+What will happen is that the buffer will be overwritten with whatever you happen to receive, meaning some of the data you were supposed to echo will not be echoed. One way to fix that is to use a separate buffer for receiving. Another way to fix that is to automatically append to the buffer you’re receiving in by initially resizing the buffer to, e.g., 1 KiB beyond the data that’s already in there.
The latter solution is slightly less costly w.r.t. run-time (but may ultimately be more costly in space overhead) and looks like this:
+
+```cpp
+/*virtual */void Application::onDataReady(Socket &socket)
+{
+    bool needed_to_initialize(false);
+    std::vector< char >::size_type offset(0);
+    if (socket.get(socket_attribute_id_).empty())
+    {
+        socket.get(socket_attribute_id_) = vector< char >(1024);
+        needed_to_initialize = true;
+    }
+    else
+    { /* already have a buffer */ }
+    vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
+    if (!needed_to_initialize && buffer.empty())
+    {
+        buffer.resize(buffer.capacity());
+    }
+    else if (!needed_to_initialize)
+    {
+        offset = buffer.size();
+        buffer.resize(offset + 1024);
+    }
+    else
+    { /* needed to initialize - so no need to account for data already in the buffer */ }
+    unsigned int data_read(buffer.size() - offset);
+    char *read_ptr(&buffer[0]);
+    read_ptr += offset;
+    server_->read(socket, read_ptr, &data_read);
+    buffer.resize(offset + (data_read * 4));
+    std::copy(buffer.begin() + offset, buffer.begin() + offset + data_read, buffer.begin() + offset + data_read);
+    std::copy(buffer.begin() + offset, buffer.begin() + offset + (data_read * 2), buffer.begin() + offset + (data_read * 2));
+    unsigned int data_written(buffer.size());
+    server_->write(socket, &buffer[0], &data_written);
+    buffer.erase(buffer.begin(), buffer.begin() + data_written);
+}
+```
+
+  1. [“Valued Conversions” by Kevlin Henney, C++ Report 12(7), July/August 2000](http://www.two-sdg.demon.co.uk/curbralan/papers/ValuedConversions.pdf "The PDF")
\ No newline at end of file
diff --git a/_posts/2011-09-25-chausette-starting-to-proxy.md b/_posts/2011-09-25-chausette-starting-to-proxy.md
new file mode 100644
index 0000000..11ce685
--- /dev/null
+++ b/_posts/2011-09-25-chausette-starting-to-proxy.md
@@ -0,0 +1,37 @@
+---
+layout: post
+title: "Chausette: Starting to proxy"
+date: 2011-09-25 17:01:51
+categories: blog
+---
+In [the previous installment](http://rlc.vlinder.ca/blog/2011/08/chausette-starting-to-echo/ "Chausette: Starting to echo"), we started to echo the data we received back to where it came from. That’s all fine and dandy, but it isn’t really all that interesting. In this installment, we will set up a pair of connections and proxy between the two, which is the core of what a proxy server should do.
+
+One of the first things we will need to do is to build upon what we did in the [previous installment](http://rlc.vlinder.ca/blog/2011/08/chausette-starting-to-echo/ "Chausette: Starting to echo") and add another attribute to our sockets.
As you can see in the following snippet, that is really easy to do: first we rename the attribute we already have
+
+```diff
+diff --git a/bin/Episode28/Application.cpp b/bin/Episode28/Application.cpp
+index eefdc44..bc67f24 100644
+--- a/bin/Episode28/Application.cpp
++++ b/bin/Episode28/Application.cpp
+@@ -11,7 +11,7 @@ using namespace boost;
+
+ Application::Application()
+     : server_(0)
+-    , socket_attribute_id_(Socket::alloc())
++    , data_to_send_attribute_id_(Socket::alloc())
+ {
+     WSADATA wsadata;
+     WSAStartup(MAKEWORD(2, 2), &wsadata);
+@@ -66,14 +66,14 @@ void Application::run(const Application::Arguments &arguments)
+ {
+     bool needed_to_initialize(false);
+     std::vector< char >::size_type offset(0);
+-    if (socket.get(socket_attribute_id_).empty())
++    if (socket.get(data_to_send_attribute_id_).empty())
+     {
+-        socket.get(socket_attribute_id_) = vector< char >(1024);
++        socket.get(data_to_send_attribute_id_) = vector< char >(1024);
+         needed_to_initialize = true;
+     }
+     else
+     { /* already have a buffer */ }
+-    vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
++    vector< char > &buffer = any_cast< vector< char >& >(socket.get(data_to_send_attribute_id_));
+     if (!needed_to_initialize && buffer.empty())
+     {
+         buffer.resize(buffer.capacity());
+@@ -99,11 +99,11 @@ void Application::run(const Application::Arguments &arguments)
+
+ /*virtual */void Application::onWriteReady(Socket &socket)
+ {
+-    if (socket.get(socket_attribute_id_).empty())
++    if (socket.get(data_to_send_attribute_id_).empty())
+     { /* no-op */ }
+     else
+     {
+-        vector< char > &buffer = any_cast< vector< char >& >(socket.get(socket_attribute_id_));
++        vector< char > &buffer = any_cast< vector< char >& >(socket.get(data_to_send_attribute_id_));
+         if (buffer.empty())
+         { /* no-op */ }
+         else
+```
+
+then we add the new attribute, which will hold the socket address of the “remote” or “paired” socket.
+
+```diff
+diff --git a/bin/Episode28/Application.cpp b/bin/Episode28/Application.cpp
+index bc67f24..3f3de16 100644
+--- a/bin/Episode28/Application.cpp
++++ b/bin/Episode28/Application.cpp
+@@ -59,7 +59,8 @@ void Application::run(const Application::Arguments &arguments)
+
+ /*virtual */void Application::onNewConnection(Socket &socket)
+ {
+-    server_->accept(socket);
++    Socket &new_socket(server_->accept(socket));
++    remote_address_to_socket_.insert(RemoteAddressToSocket::value_type(new_socket.remote_address_, &new_socket));
+ }
+
+ /*virtual */void Application::onDataReady(Socket &socket)
+@@ -118,3 +119,11 @@ void Application::run(const Application::Arguments &arguments)
+ /*virtual */void Application::onExceptionalDataReady(Socket &socket)
+ {
+ }
++
++/*virtual */void Application::onCloseSocket(Socket &socket)
++{
++    RemoteAddressToSocket::iterator where(remote_address_to_socket_.find(socket.remote_address_));
++    assert(where != remote_address_to_socket_.end());
++    assert(where->second == &socket);
++    remote_address_to_socket_.erase(where);
++}
+diff --git a/bin/Episode28/Application.h b/bin/Episode28/Application.h
+index 30b9400..e151853 100644
+--- a/bin/Episode28/Application.h
++++ b/bin/Episode28/Application.h
+@@ -2,7 +2,9 @@
+ #define chausette_episode28_application_h
+
+ #include
++#include
+ #include
++#include
+ #include "Observer.h"
+
+ class Server;
+@@ -17,6 +19,15 @@ public :
+     void run(const Arguments &arguments);
+
+ private :
++    struct SockAddrStorageCompare
++    {
++        bool operator()(const sockaddr_storage &lhs, const sockaddr_storage &rhs) const
++        {
++            return memcmp(&lhs, &rhs, sizeof(lhs)) < 0;
++        }
++    };
++    typedef std::map< sockaddr_storage, Socket*, SockAddrStorageCompare > RemoteAddressToSocket;
++
+     Application(const Application&);
+     Application& operator=(const Application&);
+
+@@ -24,10 +35,12 @@ private :
+     virtual void onDataReady(Socket &socket);
+     virtual void onWriteReady(Socket &socket);
+     virtual void onExceptionalDataReady(Socket &socket);
++    virtual void onCloseSocket(Socket &socket);
+
+     bool done_;
+     Server *server_;
+-    unsigned int socket_attribute_id_;
++    unsigned int data_to_send_attribute_id_;
++    RemoteAddressToSocket remote_address_to_socket_;
+ };
+
+ #endif
+diff --git a/bin/Episode28/Observer.h b/bin/Episode28/Observer.h
+index a8d6125..f0d4c4a 100644
+--- a/bin/Episode28/Observer.h
++++ b/bin/Episode28/Observer.h
+@@ -12,6 +12,7 @@ public :
+     virtual void onDataReady(Socket &socket) = 0;
+     virtual void onWriteReady(Socket &socket) = 0;
+     virtual void onExceptionalDataReady(Socket &socket) = 0;
++    virtual void onCloseSocket(Socket &socket) = 0;
+
+ private :
+     Observer(const Observer&);
+diff --git a/bin/Episode28/Server.cpp b/bin/Episode28/Server.cpp
+index 7aa52e8..5871b18 100644
+--- a/bin/Episode28/Server.cpp
++++ b/bin/Episode28/Server.cpp
+@@ -174,16 +174,18 @@ void Server::detach(Observer *observer)
+     { /* not found, not a problem */ }
+ }
+
+-void Server::accept(Socket &socket)
++Socket& Server::accept(Socket &socket)
+ {
+     Socket new_socket(accept_(socket));
+     sockets_.push_back(new_socket);
++    return sockets_.back();
+ }
+
+ void Server::reject(Socket &socket)
+ {
+     Socket new_socket(accept_(socket));
+     closesocket(new_socket.fd_);
++    // don't need to notify in this case - the observer has never seen the new socket
+ }
+
+ void Server::read(Socket &socket, char *buffer, unsigned int *buffer_size)
+@@ -194,6 +196,10 @@ void Server::read(Socket &socket, char *buffer, unsigned int *buffer_size)
+         if (recv_result == 0) // EOF
+         {
+             closesocket(socket.fd_);
++            for (Observers::iterator observer(observers_.begin()); observer != observers_.end(); ++observer)
++            {
++                (*observer)->onCloseSocket(socket);
++            }
+             socket.fd_ = -1;
+         }
+         else if (recv_result < 0)
+@@ -235,8 +241,11 @@ void Server::write(Socket &socket, const char *buffer, unsigned int *buffer_size
+
+ Socket Server::accept_(Socket &socket)
+ {
+-    Socket new_socket(::accept(socket.fd_, 0, 0), socket.fd_);
++    sockaddr_storage remote_address;
++    int remote_address_size(sizeof(remote_address));
++    Socket new_socket(::accept(socket.fd_, (sockaddr*)&remote_address, &remote_address_size), socket.fd_);
+     socket.read_avail_ = false;
++    new_socket.remote_address_ = remote_address;
+     return new_socket;
+ }
+
+diff --git a/bin/Episode28/Server.h b/bin/Episode28/Server.h
+index 361a3d3..f8d1131 100644
+--- a/bin/Episode28/Server.h
++++ b/bin/Episode28/Server.h
+@@ -18,7 +18,7 @@ public :
+     void attach(Observer *observer);
+     void detach(Observer *observer);
+
+-    void accept(Socket &socket);
++    Socket& accept(Socket &socket);
+     void reject(Socket &socket);
+
+     void read(Socket &socket, char *buffer, unsigned int *buffer_size);
+diff --git a/bin/Episode28/Socket.h b/bin/Episode28/Socket.h
+index 26082ae..1fe451d 100644
+--- a/bin/Episode28/Socket.h
++++ b/bin/Episode28/Socket.h
+@@ -11,16 +11,19 @@ struct Socket : private Vlinder::Chausette::Core::Attributes
+         , read_avail_(false)
+         , write_avail_(false)
+         , exc_avail_(false)
+-    { /* no-op */ }
++    {
++        memset(&remote_address_, 0, sizeof(remote_address_));
++    }
+
+-    using Vlinder::Chausette::Core::Attributes::alloc;
+-    using Vlinder::Chausette::Core::Attributes::get;
++    using Vlinder::Chausette::Core::Attributes::alloc;
++    using Vlinder::Chausette::Core::Attributes::get;
+
+     int fd_;
+     int parent_fd_;
+     bool read_avail_;
+     bool write_avail_;
+     bool exc_avail_;
++    sockaddr_storage remote_address_;
+ };
+
+ #endif
+```
+
+Note that we now need to know when a socket is closed, so we can do the appropriate clean-up. We also need to know the remote addresses of incoming connections, which we store in the new `remote_address_` member. In addition, `accept` now returns a reference to the accepted socket, so we can keep that reference and associate it with another socket later on. The following snippet of code will finish the deal:
+
+```diff
+diff --git a/bin/Episode28/Application.cpp b/bin/Episode28/Application.cpp
+index 3f3de16..70ba6e4 100644
+--- a/bin/Episode28/Application.cpp
++++ b/bin/Episode28/Application.cpp
+@@ -12,6 +12,8 @@ using namespace boost;
+ Application::Application()
+     : server_(0)
+     , data_to_send_attribute_id_(Socket::alloc())
++    , target_address_attribute_id_(Socket::alloc())
++    , un_paired_socket_(0)
+ {
+     WSADATA wsadata;
+     WSAStartup(MAKEWORD(2, 2), &wsadata);
+@@ -61,20 +63,29 @@ void Application::run(const Application::Arguments &arguments)
+ {
+     Socket &new_socket(server_->accept(socket));
+     remote_address_to_socket_.insert(RemoteAddressToSocket::value_type(new_socket.remote_address_, &new_socket));
++    pairSocket(new_socket);
+ }
+
+ /*virtual */void Application::onDataReady(Socket &socket)
+ {
++    vector< char > temp; // in case the socket is un-paired
+     bool needed_to_initialize(false);
+-    std::vector< char >::size_type offset(0);
+-    if (socket.get(data_to_send_attribute_id_).empty())
++    vector< char >::size_type offset(0);
++    Socket *partner((&socket == un_paired_socket_) ?
++        0 : remote_address_to_socket_[any_cast< sockaddr_storage >(socket.get(target_address_attribute_id_))]);
++    if (partner &&
++        partner->get(data_to_send_attribute_id_).empty())
+     {
+-        socket.get(data_to_send_attribute_id_) = vector< char >(1024);
++        partner->get(data_to_send_attribute_id_) = vector< char >(1024);
+         needed_to_initialize = true;
+     }
+-    else
++    else if (partner)
+     { /* already have a buffer */ }
+-    vector< char > &buffer = any_cast< vector< char >& >(socket.get(data_to_send_attribute_id_));
++    else
++    {
++        temp.resize(1024);
++        needed_to_initialize = true;
++    }
++    vector< char > &buffer = partner ? any_cast< vector< char >& >(partner->get(data_to_send_attribute_id_)) : temp;
+     if (!needed_to_initialize && buffer.empty())
+     {
+         buffer.resize(buffer.capacity());
+@@ -82,20 +93,37 @@ void Application::run(const Application::Arguments &arguments)
+     else if (!needed_to_initialize)
+     {
+         offset = buffer.size();
+-        buffer.resize(offset + 1024);
++        if (buffer.capacity() <= offset + 1024)
++        {
++            buffer.resize(offset + 1024);
++        }
++        else
++        {
++            buffer.resize(buffer.capacity());
++        }
+     }
+     else
+     { /* needed to initialize - so no need to account for data already in the buffer */ }
+     unsigned int data_read(buffer.size() - offset);
+     char *read_ptr(&buffer[0]);
+     read_ptr += offset;
+-    server_->read(socket, read_ptr, &data_read);
+-    buffer.resize(offset + (data_read * 4));
+-    std::copy(buffer.begin() + offset, buffer.begin() + offset + data_read, buffer.begin() + offset + data_read);
+-    std::copy(buffer.begin() + offset, buffer.begin() + offset + (data_read * 2), buffer.begin() + offset + (data_read * 2));
+-    unsigned int data_written(buffer.size());
+-    server_->write(socket, &buffer[0], &data_written);
+-    buffer.erase(buffer.begin(), buffer.begin() + data_written);
++    try
++    {
++        server_->read(socket, read_ptr, &data_read);
++        buffer.resize(offset + data_read);
++        unsigned int data_written(buffer.size());
++        if (partner)
++        {
++            server_->write(*partner, &buffer[0], &data_written);
++        }
++        else
++        { /* no partner to send data to */ }
++        buffer.erase(buffer.begin(), buffer.begin() + data_written);
++    }
++    catch (const Server::NetworkError&)
++    {
++        // ignore this for now: the socket will have been dealt with but this is no reason for us to crash.
++    }
+ }
+
+ /*virtual */void Application::onWriteReady(Socket &socket)
+@@ -126,4 +154,36 @@ void Application::run(const Application::Arguments &arguments)
+     assert(where != remote_address_to_socket_.end());
+     assert(where->second == &socket);
+     remote_address_to_socket_.erase(where);
++    unpairSocket(socket);
++}
++
++void Application::pairSocket(Socket &socket)
++{
++    if (un_paired_socket_)
++    {
++        un_paired_socket_->get(target_address_attribute_id_) = socket.remote_address_;
++        socket.get(target_address_attribute_id_) = un_paired_socket_->remote_address_;
++        un_paired_socket_ = 0;
++    }
++    else
++    {
++        un_paired_socket_ = &socket;
++    }
++}
++
++void Application::unpairSocket(Socket &socket)
++{
++    // find the socket this one was paired to
++    if (un_paired_socket_ == &socket)
++    {
++        un_paired_socket_ = 0;
++    }
++    else
++    {
++        assert(!socket.get(target_address_attribute_id_).empty());
++        sockaddr_storage target_address(any_cast< sockaddr_storage >(socket.get(target_address_attribute_id_)));
++        Socket *other_socket(remote_address_to_socket_[target_address]);
++        other_socket->get(target_address_attribute_id_) = any();
++        pairSocket(*other_socket);
++    }
+ }
+diff --git a/bin/Episode28/Application.h b/bin/Episode28/Application.h
+index e151853..4f478da 100644
+--- a/bin/Episode28/Application.h
++++ b/bin/Episode28/Application.h
+@@ -36,11 +36,15 @@ private :
+     virtual void onWriteReady(Socket &socket);
+     virtual void onExceptionalDataReady(Socket &socket);
+     virtual void onCloseSocket(Socket &socket);
++    void pairSocket(Socket &socket);
++    void unpairSocket(Socket &socket);
+
+     bool done_;
+     Server *server_;
unsigned int data_to_send_attribute_id_; \+ unsigned int target_address_attribute_id_; RemoteAddressToSocket remote_address_to_socket_; \+ Socket *un_paired_socket_; };n n #endifndiff --git a/bin/Episode28/Server.cpp b/bin/Episode28/Server.cppnindex 5871b18..edc60d9 100644 \--- a/bin/Episode28/Server.cpp +++ b/bin/Episode28/Server.cpp @@ -2,8 +2,11 @@ #include n #include n #include +#include #include "Observer.h"n +using namespace std; \+ Server::Server(sockaddr_storage address) : address_(address) , server_fd_(-1) @@ -15,7 +18,7 @@ Server::Server(sockaddr_storage address) } elsen { /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old all is well */ } \- if (bind(server_fd_, (const sockaddr*)&address;_, sizeof(address_)) == -1) \+ if (::bind(server_fd_, (const sockaddr*)&address;_, sizeof(address_)) == -1) { throw "Something more eloquent here";n } @@ -132,23 +135,23 @@ void Server::update(unsigned int timeout/*in ms*/) int highest_fd(server_fd_);n fd_set read_fds;n FD_ZERO(&read;_fds); \- std::for_each(sockets_.begi (), sockets_.end(), Functor(read_fds, &Socket;::read_avail_, highest_fd)); \+ for_each(sockets_.begi (), sockets_.end(), Functor(read_fds, &Socket;::read_avail_, highest_fd)); fd_set write_fds;n FD_ZERO(&write;_fds); \- std::for_each(sockets_.begi (), sockets_.end(), Functor(write_fds, &Socket;::write_avail_, highest_fd)); \+ for_each(sockets_.begi (), sockets_.end(), Functor(write_fds, &Socket;::write_avail_, highest_fd)); fd_set exc_fds;n FD_ZERO(&exc;_fds); \- std::for_each(sockets_.begi (), sockets_.end(), Functor(exc_fds, &Socket;::exc_avail_, highest_fd)); \+ for_each(sockets_.begi (), sockets_.end(), Functor(exc_fds, &Socket;::exc_avail_, highest_fd)); timeval to;n to.tv_sec = timeout / 1000;n to.tv_usec = (timeout % 1000) 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ 
wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_28.txt wp_post_29.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml 1000;n int select_result(select(highest_fd + 1, &read;_fds, &write;_fds, &exc;_fds, &to;));n if (select_result > 0) { \- std::for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate< true >, Observers >(read_fds, &Socket;::read_avail_, observers_, &Observer;::onNewCo ectio )); \- std::for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(read_fds, &Socket;::read_avail_, observers_, &Observer;::onDataReady)); \- std::for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(write_fds, &Socket;::write_avail_, observers_, &Observer;::onWriteReady)); \- std::for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(exc_fds, &Socket;::exc_avail_, observers_, &Observer;::onExceptionalDataReady)); \+ for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate< true >, Observers >(read_fds, &Socket;::read_avail_, observers_, &Observer;::onNewCo ectio )); \+ for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(read_fds, &Socket;::read_avail_, observers_, &Observer;::onDataReady)); \+ for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(write_fds, &Socket;::write_avail_, observers_, &Observer;::onWriteReady)); \+ for_each(sockets_.begi (), sockets_.end(), Notifier< Predicate<>, Observers >(exc_fds, &Socket;::exc_avail_, observers_, &Observer;::onExceptionalDataReady)); } else if (select_result < 0) { @@ -165,7 +168,7 @@ void Server::attach(Observer *observer) n void Server::detach(Observer *observer) { \- 
Observers::iterator where(std::find(observers_.begi (), observers_.end(), observer)); \+ Observers::iterator where(find(observers_.begi (), observers_.end(), observer)); if (where != observers_.end()) { observers_.erase(where); @@ -188,6 +191,17 @@ void Server::reject(Socket &socket;) // don't need to notify in this case - the observer has never seen the new socketn } +void Server::close(Socket &socket;) +{ \+ closesocket(socket.fd_); \+ for (Observers::iterator observer(observers_.begi ()); observer != observers_.end(); ++observer) \+ { \+ (*observer)->onCloseSocket(socket); \+ } \+ socket.fd_ = -1; \+ // it will be cleaned up on the next round of select +} \+ void Server::read(Socket &socket;, char *buffer, unsigned int *buffer_size) { if (socket.read_avail_) @@ -195,16 +209,65 @@ void Server::read(Socket &socket;, char *buffer, unsigned int *buffer_size) int recv_result(::recv(socket.fd_, buffer, *buffer_size, 0));n if (recv_result == 0) // EOFn { \- closesocket(socket.fd_); \- for (Observers::iterator observer(observers_.begi ()); observer != observers_.end(); ++observer) \- { \- (*observer)->onCloseSocket(socket); \- } \- socket.fd_ = -1; \+ close(socket); } else if (recv_result < 0) { \- throw "something more eloquent here"; \+ switch (WSAGetLastError()) \+ { \+ case WSANOTINITIALISED : \+ throw logic_error("A successful WSAStartup call must occur before using this function"); \+ case WSAENETDOWN : \+ close(socket); \+ throw NetworkDow ("The network subsystem has failed"); \+ case WSAEACCES : \+ close(socket); \+ throw WrongAddressType("The requested address is a broadcast address"); \+ case WSAEINTR : \+ // all calls should be non-blocking \+ throw logic_error("A blocking Windows Sockets 1.1 call was canceled through WSACancelBlockingCall"); \+ case WSAEINPROGRESS : \+ // all calls should be non-blocking \+ throw logic_error("A blocking Windows Sockets 1.1 call is in progress, or the service provider is still processing a callback function"); \+ case 
WSAEFAULT : \+ throw logic_error("The buf parameter is not completely contained in a valid part of the user address space"); \+ case WSAENETRESET : \+ close(socket); \+ throw KeepAliveFailed("The connection has been broken due to the keep-alive activity detecting a failure while the operation was in progress"); \+ case WSAENOBUFS : \+ throw bad_alloc("No buffer space is available"); \+ case WSAENOTCONN : \+ close(socket); \+ throw SocketNotConnected("The socket is not connected"); \+ case WSAENOTSOCK : \+ throw logic_error("The descriptor is not a socket"); \+ case WSAEOPNOTSUPP : \+ throw logic_error("MSG_OOB was specified, but the socket is not stream-style such as type SOCK_STREAM, OOB data is not supported in the communication domain associated with this socket, or the socket is unidirectional and supports only receive operations"); \+ case WSAESHUTDOWN : \+ throw logic_error("The socket has been shut down; it is not possible to send on a socket after shutdown has been invoked with how set to SD_SEND or SD_BOTH"); \+ case WSAEWOULDBLOCK : // The socket is marked as nonblocking and the requested operation would block. \+ // not really an error \+ *buffer_size = 0; \+ break; \+ case WSAEMSGSIZE : \+ throw logic_error("The socket is message oriented, and the message is larger than the maximum supported by the underlying transport"); \+ case WSAEHOSTUNREACH : \+ close(socket); \+ throw HostUnreachable("The remote host cannot be reached from this host at this time"); \+ case WSAEINVAL : \+ throw logic_error("The socket has not been bound with bind, or an unknown flag was specified, or MSG_OOB was specified for a socket with SO_OOBINLINE enabled"); \+ case WSAECONNABORTED : \+ close(socket); \+ throw ConnectionAborted("The virtual circuit was terminated due to a time-out or other failure"); \+ case WSAECONNRESET : \+ close(socket); \+ throw ConnectionReset("The virtual circuit was reset by the remote side executing a hard or abortive close. 
For UDP sockets, the remote host was unable to deliver a previously sent UDP datagram and responded with a \"Port Unreachable\" ICMP packet"); \+ case WSAETIMEDOUT : \+ close(socket); \+ throw ConnectionDropped("The connection has been dropped, because of a network failure or because the system on the other end went down without notice"); \+ default : \+ throw logic_error("Unknown error"); \+ } } else { diff --git a/bin/Episode28/Server.h b/bin/Episode28/Server.h index f8d1131..fa616e3 100644 \--- a/bin/Episode28/Server.h +++ b/bin/Episode28/Server.h @@ -5,11 +5,33 @@ #include #include "Socket.h" #include "config.h" +#include "exceptions/Exception.h" class Observer; class Server { public : \+ enum Errors { \+ wrong_address_type__, \+ network_error__, \+ network_down__, \+ keep_alive_failed__, \+ socket_not_connected__, \+ host_unreachable__, \+ connection_aborted__, \+ connection_reset__, \+ connection_dropped__, \+ }; \+ typedef Vlinder::Exceptions::Exception< std::runtime_error, Errors, wrong_address_type__ > WrongAddressType; // user error or logic error, \+ typedef Vlinder::Exceptions::Exception< std::runtime_error, Errors, network_error__ > NetworkError; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, network_down__ > NetworkDown; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, keep_alive_failed__ > KeepAliveFailed; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, socket_not_connected__ > SocketNotConnected; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, host_unreachable__ > HostUnreachable; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, connection_aborted__ > ConnectionAborted; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, connection_reset__ > ConnectionReset; \+ typedef Vlinder::Exceptions::Exception< NetworkError, Errors, connection_reset__ > ConnectionDropped; \+ Server(sockaddr_storage address); ~Server(); @@ -20,6 +42,7 @@ public : Socket& accept(Socket 
&socket); void reject(Socket &socket); \+ void close(Socket &socket); void read(Socket &socket, char *buffer, unsigned int *buffer_size); void write(Socket &socket, const char *buffer, unsigned int *buffer_size); diff --git a/lib/core/Attributes.cpp b/lib/core/Attributes.cpp index dbfafa7..1fee546 100644 \--- a/lib/core/Attributes.cpp +++ b/lib/core/Attributes.cpp @@ -2,44 +2,44 @@ #include #include -namespace Vlinder { namespace Chausette { namespace Core { \- /*static */unsigned int Attributes::next_id__(0); \- \- /*static */unsigned int Attributes::alloc() \- { \- /* note that this is not thread-safe! */ \- if (next_id__ == id_max__) \- { \- throw std::bad_alloc(); \- } \- else \- { \- return next_id__++; \- } \- } \- \- boost::any& Attributes::get(unsigned int index) \- { \- if (index < next_id__) \- { \- return attributes_[index]; \- } \- else \- { \- throw std::logic_error("Trying to access unallocated attribute"); \- } \- } \- \- const boost::any& Attributes::get(unsigned int index) const \- { \- if (index < next_id__) \- { \- return attributes_[index]; \- } \- else \- { \- throw std::logic_error("Trying to access unallocated attribute"); \- } \- } +namespace Vlinder { namespace Chausette { namespace Core { \+ /*static */unsigned int Attributes::next_id__(0); \+ \+ /*static */unsigned int Attributes::alloc() \+ { \+ /* note that this is not thread-safe! 
*/ \+ if (next_id__ == id_max__) \+ { \+ throw std::bad_alloc(); \+ } \+ else \+ { \+ return next_id__++; \+ } \+ } \+ \+ boost::any& Attributes::get(unsigned int index) \+ { \+ if (index < next_id__) \+ { \+ return attributes_[index]; \+ } \+ else \+ { \+ throw std::logic_error("Trying to access unallocated attribute"); \+ } \+ } \+ \+ const boost::any& Attributes::get(unsigned int index) const \+ { \+ if (index < next_id__) \+ { \+ return attributes_[index]; \+ } \+ else \+ { \+ throw std::logic_error("Trying to access unallocated attribute"); \+ } \+ } }}} diff --git a/lib/core/Attributes.h b/lib/core/Attributes.h index 288f0f7..f15ace2 100644 \--- a/lib/core/Attributes.h +++ b/lib/core/Attributes.h @@ -1,24 +1,24 @@ #ifndef vlinder_chausette_core_attributes_h #define vlinder_chausette_core_attributes_h -#include "Details/prologue.h" -#include \- -namespace Vlinder { namespace Chausette { namespace Core { \- class VLINDER_CHAUSETTE_CORE_API Attributes \- { \- public : \- static unsigned int alloc(); \- \- boost::any& get(unsigned int index); \- const boost::any& get(unsigned int index) const; \- \- private : \- static const unsigned int id_max__ = 48; \- \- boost::any attributes_[id_max__]; \- static unsigned int next_id__; \- }; +#include "Details/prologue.h" +#include \+ +namespace Vlinder { namespace Chausette { namespace Core { \+ class VLINDER_CHAUSETTE_CORE_API Attributes \+ { \+ public : \+ static unsigned int alloc(); \+ \+ boost::any& get(unsigned int index); \+ const boost::any& get(unsigned int index) const; \+ \+ private : \+ static const unsigned int id_max__ = 48; \+ \+ boost::any attributes_[id_max__]; \+ static unsigned int next_id__; \+ }; }}} #endif ``` + +Most of the code in `onDataReady` speaks for itself: if a given socket doesn’t have an associated socket, the data from the socket is read and thrown away. 
If it does have an associated socket, it will have the remote address of that socket in its attributes, so we can find it (and call it its “partner” in the code). We then take the associated vector from the partner socket and use it as a buffer, into which we read all of our data. `Server::write` is already nice enough to not try to write on a socket that’s not known to be ready for writing, so we can just call it with the partner socket and its buffer. + +Associating sockets with each other is done in the `pairSocket` method, which simply checks if there’s another un-paired socket. If so, the two are paired. If not, the to-be-paired socket is put on hold for the next to-be-paired socket. + +This means the proxying we do here is more or less random: sockets are paired with new sockets if they lose their current partner and another socket is waiting, if they connect to the server and another socket is waiting to be paired, or if they were waiting for a partner and another socket loses its partner or connects to the server. + +The `unpairSocket` method takes care of leaving sockets and calls `pairSocket` if the leaving socket wasn’t un-paired in the first place. + +Perhaps you’ve also noticed that I’ve added a lot of error handling code, handling all of the possible error codes. That’s because we now need to handle those errors correctly, as we want to keep the server alive if a client disconnects. The code isn’t perfect yet, of course, but it may be worth taking a look at, to get a feel for how error handling will work in this setting. + +## Specializing the map + +The standard map can be specialized to better suit our needs. In this case, I’ve added a comparator to the map, to allow us to compare instances of `sockaddr_storage` and use it as a key in the map. Note that, because this is a template specialization of the map class, the comparator’s type is now a part of the map’s type. 
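As a rough sketch of what such a comparator could look like: the name `SockaddrStorageLess` and the byte-wise ordering below are assumptions for illustration, not necessarily what the Chausette code does, and `Socket` is only forward-declared as a stand-in. The sketch assumes every address is zero-filled with `memset` before it is populated, so that unused bytes compare equal.

```cpp
#include <cstring>
#include <map>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <sys/socket.h>
#endif

// Hypothetical comparator: orders addresses by their raw bytes.
// Only meaningful if every key was zero-filled before being populated.
struct SockaddrStorageLess
{
    bool operator()(const sockaddr_storage &lhs, const sockaddr_storage &rhs) const
    {
        return std::memcmp(&lhs, &rhs, sizeof(sockaddr_storage)) < 0;
    }
};

struct Socket; // stand-in for the real Socket class

// std::map has no default way to order sockaddr_storage keys,
// so the comparator is passed as the third template argument.
typedef std::map< sockaddr_storage, Socket*, SockaddrStorageLess > RemoteAddressToSocket;
```

Here the comparator is the third template argument of `std::map`, which is how it ends up in the map’s type.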
This means it’s also part of any of the nested types of the map, such as its iterator types. + +Without this specialization, it would not have been possible to use `sockaddr_storage` as a key in the map, because there’s no less-than operator for the `sockaddr_storage` type. \ No newline at end of file diff --git a/_posts/2011-12-01-sleep.md b/_posts/2011-12-01-sleep.md new file mode 100644 index 0000000..d71f8f0 --- /dev/null +++ b/_posts/2011-12-01-sleep.md @@ -0,0 +1,9 @@ +--- +layout: post +title: "Sleep(…)" +date: 2011-12-01 15:11:11 +categories: blog +--- +For those of you waiting for the next installment of “C++ for the self-taught”: I’m on parental leave at the moment. The podcast (and the rest of the blog) will be back in a few weeks. \ No newline at end of file diff --git a/_posts/2011-12-29-setting-up-a-new-skeleton-re-factoring.md b/_posts/2011-12-29-setting-up-a-new-skeleton-re-factoring.md new file mode 100644 index 0000000..29843ed --- /dev/null +++ b/_posts/2011-12-29-setting-up-a-new-skeleton-re-factoring.md @@ -0,0 +1,106 @@ +--- +layout: post +title: "Setting up a new skeleton: re-factoring" +date: 2011-12-29 21:36:18 +categories: blog +--- +Before we go much further with our SOCKS server, we should do a bit of cleaning up in the project: we’ll move the `Server` and `Observer` classes to their own library, so we can more easily re-use them, and we’ll copy the `Application` class over to our new project, the one that will become our next step towards a fully functional SOCKS server: _Episode35_. + +Most of the dreary details are clearly visible in the diff of the [main commit](https://gitorious.org/chausette/chausette/commit/209bb84/diffs "the commit diffs") but a few interesting details show up when we compare the two `Application` classes: 
+ +```diff \--- bin/Episode28/Application.cpp 2011-09-26 19:58:52.938883900 -0400 +++ bin/Episode35/Application.cpp 2011-10-19 21:28:29.080693800 -0400 @@ -3,17 +3,51 @@ #include #include #include +#include +#include #include "server/Server.h" #include "config.h" +#include "rfc1928/types.h" using namespace std; using namespace boost; +struct Application::FDGuard +{ \+ FDGuard(int fd) \+ : fd_(fd) \+ , dismissed_(false) \+ { /* no-op */ } \+ \+ ~FDGuard() \+ { \+ if (!dismissed_) \+ { \+ closesocket(fd_); \+ } \+ else \+ { /* dismissed */ } \+ } \+ \+ void dismiss() \+ { \+ dismissed_ = true; \+ } \+ +private : \+ FDGuard(const FDGuard&); \+ FDGuard& operator=(const FDGuard&); \+ \+ int fd_; \+ bool dismissed_; +}; \+ Application::Application() : server_(0) -, data_to_send_attribute_id_(Socket::alloc()) -, target_address_attribute_id_(Socket::alloc()) -, un_paired_socket_(0) +, socket_state_attribute_id_(Socket::alloc()) +, receive_buffer_attribute_id_(Socket::alloc()) +, send_buffer_attribute_id_(Socket::alloc()) +, socks_reply_attribute_id_(Socket::alloc()) { WSADATA wsadata; WSAStartup(MAKEWORD(2, 2), &wsadata); @@ -29,8 +63,8 @@ void Application::run(const Application: // for now, expect our own path in arguments[0], the IP address to // listen on in arguments[1] and the port in arguments[2] assert(arguments.size() >= 1); \- string ip(arguments.size() > 1 ? arguments[1] : CHAUSETTE_EPISODE28_DEFAULT_IP); \- unsigned short port(arguments.size() > 2 ? boost::lexical_cast< unsigned short >(arguments[2]) : CHAUSETTE_EPISODE28_DEFAULT_PORT); \+ string ip(arguments.size() > 1 ? 
arguments[1] : CHAUSETTE_EPISODE35_DEFAULT_IP); \+ unsigned short port(arguments.size() > 2 ? boost::lexical_cast< unsigned short >(arguments[2]) : CHAUSETTE_EPISODE35_DEFAULT_PORT); sockaddr_storage address; memset(&address, 0, sizeof(address)); sockaddr_in &in_address = reinterpret_cast< sockaddr_in& >(address); @@ -62,128 +96,405 @@ void Application::run(const Application: /*virtual */void Application::onNewConnection(Socket &socket) { Socket &new_socket(server_->accept(socket)); \- remote_address_to_socket_.insert(RemoteAddressToSocket::value_type(new_socket.remote_address_, &new_socket)); \- pairSocket(new_socket); \+ new_socket.get(socket_state_attribute_id_) = expect_authentication_method_request__; } /*virtual */void Application::onDataReady(Socket &socket) { \- vector< char > temp; // in case the socket is un-paired \- bool needed_to_initialize(false); \- vector< char >::size_type offset(0); \- Socket *partner((&socket == un_paired_socket_) ? 0 : remote_address_to_socket_[any_cast< sockaddr_storage >(socket.get(target_address_attribute_id_))]); \- if (partner && \- partner->get(data_to_send_attribute_id_).empty()) \+ if (socket.get(receive_buffer_attribute_id_).empty()) { \- partner->get(data_to_send_attribute_id_) = vector< char >(1024); \- needed_to_initialize = true; \+ socket.get(receive_buffer_attribute_id_) = Buffer(default_buffer_size__); } \- else if (partner) \- { /* already have a buffer */ } else \- { \- temp.resize(1024); \- needed_to_initialize = true; \- } \- vector< char > &buffer = partner ? 
any_cast< vector< char >& >(partner->get(data_to_send_attribute_id_)) : temp; \- if (!needed_to_initialize && buffer.empty()) \+ { /* already have a receive buffer */ } \+ Buffer &buffer(any_cast< Buffer& >(socket.get(receive_buffer_attribute_id_))); \+ Buffer::size_type offset(0); \+ if (buffer.empty()) { buffer.resize(buffer.capacity()); } \- else if (!needed_to_initialize) \+ else { offset = buffer.size(); \- if (buffer.capacity() <= offset + 1024) \+ if (buffer.capacity() - offset < minimal_available_buffer_size__) { \- buffer.resize(offset + 1024); \+ buffer.resize(offset + minimal_available_buffer_size__); } else { buffer.resize(buffer.capacity()); } } \+ Buffer::size_type avail(buffer.size() - offset); \+ Buffer::pointer recv_ptr(&(buffer[offset])); \+ server_->read(socket, recv_ptr, &avail); \+ buffer.resize(offset + avail); \+ // here, according to the state of the socket, dispatch the data \+ if (socket.get(socket_state_attribute_id_).empty()) \+ { \+ socket.get(socket_state_attribute_id_) = expect_authentication_method_request__; \+ } else \- { /* needed to initialize - so no need to account for data already in the buffer */ } \- unsigned int data_read(buffer.size() - offset); \- char *read_ptr(&buffer[0]); \- read_ptr += offset; \- try \+ { /* the socket already has a state */ } \+ switch (any_cast< SocketState >(socket.get(socket_state_attribute_id_).empty())) { \- 
server_->read(socket, read_ptr, &data_read); \- buffer.resize(offset + data_read); \- unsigned int data_written(buffer.size()); \- if (partner) \+ case expect_authentication_method_request__ : \+ onAuthenticationMethodRequest(socket); \+ break; \+ case expect_socks_request__ : \+ onSocksRequest(socket); \+ break; \+ } +} \+ +/*virtual */void Application::onWriteReady(Socket &socket) { \- server_->write(*partner, &buffer[0], &data_written); \+ if (!socket.get(send_buffer_attribute_id_).empty()) \+ { \+ Buffer &buffer(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); \+ Buffer::size_type offset(0); \+ if (!buffer.empty()) \+ { \+ Buffer::size_type avail(buffer.size()); \+ Buffer::pointer send_ptr(&(buffer[0])); \+ server_->write(socket, send_ptr, &avail); \+ buffer.erase(buffer.begin(), buffer.begin() + avail); \+ } \+ else \+ { /* nothing to send */ } } else \- { /* no partner to send data to */ } \- buffer.erase(buffer.begin(), buffer.begin() + data_written); \+ { /* nothing to send */ } } \- catch (const Server::NetworkError&) \+ +/*virtual */void Application::onExceptionalDataReady(Socket &socket) { \- // ignore this for now: the socket will have been dealt with but this is no reason for us to crash. 
} \+ +/*virtual */void Application::onCloseSocket(Socket &socket) +{ } -/*virtual */void Application::onWriteReady(Socket &socket) +void Application::onAuthenticationMethodRequest(Socket &socket) const { \- if (socket.get(data_to_send_attribute_id_).empty()) \- { /* no-op */ } \+ using Vlinder::Chausette::RFC1928::VersionIdentifierMethodSelectionMessage; \+ using Vlinder::Chausette::RFC1928::MethodMessage; \+ Buffer &buffer(any_cast< Buffer& >(socket.get(receive_buffer_attribute_id_))); \+ if (buffer.size() < offsetof(VersionIdentifierMethodSelectionMessage, methods_) + 1) \+ { \+ throw InsufficientData("Not enough data for a version identifier/method selection message"); \+ } else \+ { /* all is well so far */ } \+ VersionIdentifierMethodSelectionMessage *message(reinterpret_cast< VersionIdentifierMethodSelectionMessage* >(&buffer[0])); \+ if (message->ver_ != CHAUSETTE_EPISODE35_SOCKS_VERSION) { \- vector< char > &buffer = any_cast< vector< char >& >(socket.get(data_to_send_attribute_id_)); \- if (buffer.empty()) \- { /* no-op */ } \+ throw WrongSocksVersion("Wrong socks version", CHAUSETTE_EPISODE35_SOCKS_VERSION, message->ver_); \+ } \+ else \+ { /* the version is OK */ } \+ if (buffer.size() < 
offsetof(VersionIdentifierMethodSelectionMessage, methods_) + message->nmethods_) \+ { \+ throw InsufficientData("Not enough data for a version identifier/method selection message"); \+ } else \+ { /* all is well so far */ } \+ /* As we haven't implemented any authentication methods yet, we only \+ * support "no authentication" - method 0. If it is not present among \+ * the methods, throw an exception. 
*/ \+ bool authentication_ok(false); \+ for (unsigned char *method = message->methods_; !authentication_ok && ((method - message->methods_) < message->nmethods_); ++method) { \- unsigned int data_written(buffer.size()); \- server_->write(socket, &buffer[0], &data_written); \- buffer.erase(buffer.begin(), buffer.begin() + data_written); \+ authentication_ok = (*method == 0); } \+ if (!authentication_ok) \+ { \+ throw NoSupportedAuthenticationMethod("No supported authentication method"); } \+ else \+ { /* all is well */ } \+ MethodMessage methodMessage(CHAUSETTE_EPISODE35_SOCKS_VERSION, 0/* no authentication - put a constant here later */); \+ unsigned char *ptr(reinterpret_cast< unsigned char* >(&methodMessage)); \+ queueDataToSend(socket, ptr, ptr + sizeof(methodMessage)); } -/*virtual */void Application::onExceptionalDataReady(Socket &socket) +void Application::onSocksRequest(Socket &socket) { \+ using Vlinder::Chausette::RFC1928::SocksRequest; \+ Buffer &buffer(any_cast< Buffer& >(socket.get(receive_buffer_attribute_id_))); \+ if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 1) \+ { \+ throw InsufficientData("Not enough data for a SOCKS request"); } \- -/*virtual */void Application::onCloseSocket(Socket &socket) \+ else \+ { /* all is well so far */ } \+ SocksRequest *message(reinterpret_cast< SocksRequest* >(&buffer[0])); \+ if (message->ver_ != CHAUSETTE_EPISODE35_SOCKS_VERSION) { \- RemoteAddressToSocket::iterator where(remote_address_to_socket_.find(socket.remote_address_)); \- assert(where != remote_address_to_socket_.end()); \- assert(where->second == &socket); \- 
remote_address_to_socket_.erase(where); \- unpairSocket(socket); \+ throw WrongSocksVersion("Wrong SOCKS version", CHAUSETTE_EPISODE35_SOCKS_VERSION, message->ver_); } \- -void Application::pairSocket(Socket &socket) \+ else \+ { /* the version is OK */ } \+ /* Inside the message, the port field is the only one that isn't \+ * necessarily in the same position as in the struct. 
The other \+ * fields are where they should be - and we can get to the port \+ * field by parsing the address type field. 
*/ \+ sockaddr_storage address; \+ memset(&address, 0, sizeof(address)); \+ switch (message->atyp_) { \- if (un_paired_socket_) \+ case 1 : { \- un_paired_socket_->get(target_address_attribute_id_) = socket.remote_address_; \- socket.get(target_address_attribute_id_) = un_paired_socket_->remote_address_; \- un_paired_socket_ = 0; \+ // IP V4 address: X'01' \+ if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 6 /* four for the address, two for the port */) \+ { \+ throw InsufficientData("Not enough data for a SOCKS request"); \+ } \+ else \+ { /* all is well so far */ } \+ address.ss_family = AF_INET; \+ sockaddr_in *a4(reinterpret_cast< sockaddr_in* >(&address)); \+ memcpy(&a4->sin_addr, message->dst_addr_, 4); \+ memcpy(&a4->sin_port, message->dst_addr_ + 4, 2); \+ break; \+ } \+ case 3 : \+ { \+ // DOMAINNAME: X'03' \+ unsigned char hostname_length(message->dst_addr_[0]); \+ if (buffer.size() < offsetof(SocksRequest, dst_addr_) + hostname_length + 2 /* hostname_length for the address, two for the port */) \+ { \+ throw InsufficientData("Not enough data for a SOCKS request"); } else \+ { /* all is well so far */ } \+ char *hostname_begin(reinterpret_cast< char* >(message->dst_addr_ + 1)); \+ char 
*hostname_end = hostname_begin + hostname_length; \+ unsigned short port(*reinterpret_cast< unsigned short* >(hostname_end)); \+ *hostname_end = 0; // cap it off \+ hostent *host_entry(gethostbyname(hostname_begin)); \+ if (!host_entry) { \- un_paired_socket_ = &socket; \+ throw NameResolutionError("Name resolution error", GetLastError()); \+ } \+ else \+ { /* resolved OK */ } \+ switch (host_entry->h_addrtype) \+ { \+ case AF_INET : \+ { \+ sockaddr_in *a4(reinterpret_cast< sockaddr_in* >(&address)); \+ memcpy(&a4->sin_addr, host_entry->h_addr_list[0], 4); \+ a4->sin_port = port; \+ break; \+ } \+ case AF_INET6 : \+ { \+ sockaddr_in6 *a6(reinterpret_cast< sockaddr_in6* >(&address)); \+ memcpy(&a6->sin6_addr, host_entry->h_addr_list[0], 16); \+ a6->sin6_port = port; \+ break; \+ } } \+ break; } \+ case 4 : \+ { \+ // IP V6 address: X'04' \+ if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 16 + 2 /* four for the address, two for the port */) \+ { \+ throw InsufficientData("Not enough data for a SOCKS request"); \+ } \+ else \+ { /* all is well so far */ } \+ address.ss_family = AF_INET6; \+ sockaddr_in6 *a6(reinterpret_cast< sockaddr_in6* >(&address)); \+ memcpy(&a6->sin6_addr, message->dst_addr_, 16); \+ memcpy(&a6->sin6_port, message->dst_addr_ + 16, 2); \+ break; \+ } \+ default : \+ throw UnknownAddressType("Unknown address type", message->atyp_); \+ } \+ switch (message->cmd_) \+ { \+ 
case 1 : \+ // CONNECT X'01' \+ doConnect(socket, address); \+ break; \+ case 2 : \+ // BIND X'02' //TODO \+ case 3 : \+ // UDP ASSOCIATE X'03' //TODO \+ break; \+ } +} \+ +void Application::doConnect(Socket &parent_socket, const sockaddr_storage &address) +{ \+ /* In the reply to a CONNECT, BND.PORT contains the port number that the \+ * server assigned to connect to the target host, while BND.ADDR \+ * contains the associated IP address. 
The supplied BND.ADDR is ofte \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml different from the IP address that the client uses to reach the SOCKS \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml server, since such servers are often multi-homed. 
It is expected \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml that the SOCKS server will use DST.ADDR and DST.PORT, and the \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml client-side source address and port in evaluating the CONNECT \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml request. 
*/ \+ using Vlinder::Chausette::RFC1928::SocksReply; \+ SocksReply reply; \+ memset(&reply;, 0, sizeof(reply)); \+ reply.ver_ = CHAUSETTE_EPISODE35_SOCKS_VERSION; \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old attempt to co ect to the target address. If successful, open a \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml server socket for the client to co ect to, and forward the data \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml between the two. 
*/ \+ int sock_fd(::socket(address.ss_family, SOCK_STREAM, IPPROTO_TCP)); \+ if (sock_fd == INVALID_SOCKET) \+ { \+ reply.rep_ = 1; // X'01' general SOCKS server failure \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ } \+ else \+ { /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old all is well */ } \+ FDGuard fd_guard(sock_fd); \+ u_long arg(1); \+ if (ioctlsocket(sock_fd, FIONBIO, &arg;) != 0) \+ { \+ reply.rep_ = 1; // X'01' general SOCKS server failure \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ throw SocketIOCTLFailed("Failed to set socket to non-blocking", WSAGetLastError()); \+ } \+ else \+ { /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old all is well so far */ } \+ if (co ect(sock_fd, (const sockaddr*)&address;, sizeof(address)) != 0) \+ { \+ unsigned long last_error(WSAGetLastError()); \+ switch (last_error) \+ { \+ // logic errors \+ case WSANOTINITIALISED : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old A successful WSAStartup call must occur before using this function. 
*/ \+ case WSAEADDRINUSE : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The socket's local address is already in use and the socket was not \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml marked to allow address reuse with SO_REUSEADDR. This error usually \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml occurs when executing bind, but could be delayed until the co ect \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml function if the bind was to a wildcard address (INADDR_ANY or \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ 
split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml in6addr_any) for the local IP address. A specific address needs to \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml be implicitly bound by the co ect function. */ \+ case WSAEINTR : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The blocking Windows Socket 1.1 call was canceled through \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml WSACancelBlockingCall. 
*/ \+ case WSAEINPROGRESS : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old A blocking Windows Sockets 1.1 call is in progress, or the service \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml provider is still processing a callback function. */ \+ case WSAEALREADY : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old A nonblocking co ect call is in progress on the specified socket. 
\+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml Note In order to preserve backward compatibility, this error is reported \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml as WSAEINVAL to Windows Sockets 1.1 applications that link to either \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml Winsock.dll or Wsock32.dll. */ \+ case WSAEAFNOSUPPORT : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old Addresses in the specified family ca ot be used with this socket. 
*/ \+ case WSAEINVAL : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The parameter s is a listening socket. */ \+ case WSAEISCONN : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The socket is already co ected (co ection-oriented sockets only). */ \+ case WSAENOTSOCK : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The descriptor specified in the s parameter is not a socket. */ \+ case WSAEACCES : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old An attempt to co ect a datagram socket to broadcast address failed \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml because setsockopt option SO_BROADCAST is not enabled. 
*/ \+ default : \+ reply.rep_ = 1; // X'01' general SOCKS server failure \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ throw Co ectError("Internal error calling co ect", last_error); \+ \+ // run-time errors outside the caller's control \+ case WSAENETDOWN : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The network subsystem has failed. */ \+ reply.rep_ = 1; // X'01' general SOCKS server failure \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ case WSAECONNREFUSED : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The attempt to co ect was forcefully rejected. */ \+ reply.rep_ = 5; // X'05' Co ection refused \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ case WSAENETUNREACH : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The network ca ot be reached from this host at this time. */ \+ reply.rep_ = 3; // X'03' Network unreachable \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ case WSAEHOSTUNREACH : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old A socket operation was attempted to an unreachable host. 
*/ \+ reply.rep_ = 4; // X'04' Host unreachable \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ case WSAENOBUFS : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old Note No buffer space is available. The socket ca ot be co ected. */ \+ reply.rep_ = 1; // X'01' general SOCKS server failure \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ case WSAETIMEDOUT : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old An attempt to co ect timed out without establishing a co ection. */ \+ reply.rep_ = 6; // X'06' TTL expired \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ \+ // run-time errors that point to bugs in the caller/client \+ case WSAEADDRNOTAVAIL : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The remote address is not a valid address (such as INADDR_ANY or in6addr_any) . 
*/ \+ case WSAEFAULT : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The sockaddr structure pointed to by the name contains incorrect address format \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml for the associated address family or the namelen parameter is too small. This error \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml is also returned if the sockaddr structure pointed to by the name parameter with \+ 2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml a length specified in the namelen parameter is not in a valid part of the user \+ 
2017-10-14-new-website.md 2017-10-14-new-website.md~ split.sh split.sh~ wp_post_10.txt wp_post_11.txt wp_post_12.txt wp_post_13.txt wp_post_14.txt wp_post_15.txt wp_post_16.txt wp_post_17.txt wp_post_18.txt wp_post_19.txt wp_post_1.yml wp_post_20.txt wp_post_21.txt wp_post_22.txt wp_post_23.txt wp_post_24.txt wp_post_25.txt wp_post_26.txt wp_post_27.txt wp_post_2.txt wp_post_3.txt wp_post_4.txt wp_post_5.txt wp_post_6.txt wp_post_7.txt wp_post_8.txt wp_post_9.txt wp_posts.yml address space. */ \+ reply.rep_ = 8; // X'08' Address type not supported \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, send_socks_reply__); \+ break; \+ \+ // "normal" errors \+ case WSAEWOULDBLOCK : \+ /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old The socket is marked as nonblocking and the co ection ca ot be completed immediately. 
*/ \+ parent_socket.get(socks_reply_attribute_id_) = reply; \+ setSocketState(parent_socket, wait_socks_reply__); \+ break; \+ } \+ } \+ else \+ { /bin /boot /cdrom /dev /etc /home /initrd.img /initrd.img.old /lib /lib32 /lib64 /libx32 /lost+found /media /mnt /opt /proc /root /run /sbin /snap /srv /sys /tmp /usr /var /vmlinuz /vmlinuz.old all is well */ } +} \+ +void Application::setSocketState(Socket &socket;, Application::SocketState state) +{ \+ SocketState current_state(any_cast< SocketState >(socket.get(socket_state_attribute_id_))); \+ //TODO +} \+ -void Application::unpairSocket(Socket &socket;) +void Application::queueDataToSend(Socket &socket;, unsigned char *begin, unsigned char *end) const { \- // find the socket this one was paired to \- if (un_paired_socket_ == &socket;) \+ Buffer *buffer(0); \+ if (!socket.get(send_buffer_attribute_id_).empty()) { \- un_paired_socket_ = 0; \+ buffer = &(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); } elsen { \- assert(!socket.get(target_address_attribute_id_).empty()); \- sockaddr_storage target_address(any_cast< sockaddr_storage >(socket.get(target_address_attribute_id_))); \- Socket *other_socket(remote_address_to_socket_[target_address]); other_socket->get(target_address_attribute_id_) = any(); \- pairSocket(*other_socket); \+ socket.get(send_buffer_attribute_id_) = Buffer(); \+ buffer = &(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); } \+ copy(begin, end, back_inserter(*buffer)); \+ unsigned int data(buffer->size()); \+ server_->write(socket, &((*buffer)[0]), &data;); \+ buffer->erase(buffer->begi (), buffer->begi () + data); } ``` + +We are going to have to manage different kinds of sockets: sockets for command-and-control (which we will call “control sockets” from here on) and sockets that need to be proxied. At first, we’ll just do TCP proxying but, eventually, we will also proxy UDP. 
+
+The way this is going to work, we will have only one thread to do most of the work, so we’ll have to be relatively smart about multiplexing our work. In this context, multiplexing means that we tell the sockets API (and therefore the underlying TCP/IP stack) what we want it to do, but we don’t wait around for it to perform its tasks: rather, we tell it to notify us whenever a task is finished, and we carry on from there. For that to work we do, of course, need to know what task it was performing. We do that by associating a _state_ with each socket, stored in one of the socket’s attributes, which we will call `socket_state_attribute_id_`. That attribute is allocated in the constructor and used throughout the code. For example, in `onDataReady`, it is used to decide what to do with the incoming data:
+
+```cpp
+// here, according to the state of the socket, dispatch the data
+if (socket.get(socket_state_attribute_id_).empty())
+{
+    socket.get(socket_state_attribute_id_) = expect_authentication_method_request__;
+}
+else
+{ /* the socket already has a state */ }
+// cast the stored attribute itself (not the result of empty()) to get the state
+switch (any_cast< SocketState >(socket.get(socket_state_attribute_id_)))
+{
+case expect_authentication_method_request__ :
+    onAuthenticationMethodRequest(socket);
+    break;
+case expect_socks_request__ :
+    onSocksRequest(socket);
+    break;
+}
+```
+
+This basically turns the socket itself into a [state machine](http://rlc.vlinder.ca/blog/2010/01/error-handling-in-c/ "state machines, and error handling, in C"). The available states would, of course, be different according to the role of the socket (e.g. a data socket would never be expected to send a SOCKS request).
+
+We will look into state machines in the next installment.
+
+Another important/interesting part of the new code is the parsing of SOCKS requests. We will look at the actions they imply later, but we will take a closer look at the parsing now:
+
+The first step is to read the data from the buffer. If there isn’t enough data in the buffer, the request cannot be parsed and should be set aside. Some preliminary checks can be performed on the request immediately, as the following snippet of code will show:
+
+```cpp
+void Application::onAuthenticationMethodRequest(Socket &socket) const
+{
+    using Vlinder::Chausette::RFC1928::VersionIdentifierMethodSelectionMessage;
+    using Vlinder::Chausette::RFC1928::MethodMessage;
+    Buffer &buffer(any_cast< Buffer& >(socket.get(receive_buffer_attribute_id_)));
+    if (buffer.size() < offsetof(VersionIdentifierMethodSelectionMessage, methods_) + 1)
+    {
+        throw InsufficientData("Not enough data for a version identifier/method selection message");
+    }
+    else
+    { /* all is well so far */ }
+```
+
+Now that we know we at least have enough data, we can treat the data as a message. Assuming the data in the buffer is either properly aligned, or we don’t care about the alignment, we can simply cast the buffer to the appropriate message type and do some more preliminary checks:
+
+```cpp
+    VersionIdentifierMethodSelectionMessage *message(reinterpret_cast< VersionIdentifierMethodSelectionMessage* >(&buffer[0]));
+    if (message->ver_ != CHAUSETTE_EPISODE35_SOCKS_VERSION)
+    {
+        throw WrongSocksVersion("Wrong socks version", CHAUSETTE_EPISODE35_SOCKS_VERSION, message->ver_);
+    }
+    else
+    { /* the version is OK */ }
+    unsigned int message_size(offsetof(VersionIdentifierMethodSelectionMessage, methods_) + message->nmethods_);
+    if (buffer.size() < message_size)
+    {
+        throw InsufficientData("Not enough data for a version identifier/method selection message");
+    }
+    else
+    { /* all is well so far */ }
+```
+
+In this specific message, we’re setting up an authentication method. We won’t support authentication right away, so for now only method `0` is supported. According to the protocol, this means that at least one of the proposed authentication methods, of which there can be up to 255, has to be `0`.
+
+```cpp
+    bool authentication_ok(false);
+    for (unsigned char *method = message->methods_; !authentication_ok && ((method - message->methods_) < message->nmethods_); ++method)
+    {
+        authentication_ok = (*method == 0);
+    }
+    if (!authentication_ok)
+    {
+        throw NoSupportedAuthenticationMethod("No supported authentication method");
+    }
+    else
+    { /* all is well */ }
+```
+
+Once we’ve established that we can, indeed, authenticate (or rather: that we don’t need to) we can set up our reply, and send it.
+
+```cpp
+    MethodMessage method_message(CHAUSETTE_EPISODE35_SOCKS_VERSION, 0/* no authentication - put a constant here later */);
+    unsigned char *ptr(reinterpret_cast< unsigned char* >(&method_message));
+    queueDataToSend(socket, ptr, ptr + sizeof(method_message));
+```
+
+Then, now that the initial handshaking is done, we set the state of the socket to one in which we expect SOCKS requests, and consume the data we’ve already handled.
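
Taken together, the handshake steps described above can be sketched as one small, self-contained function. This is only an illustration under stated assumptions, not the Chausette code: `HandshakeState`, `handleMethodRequest`, and the two exception types below are hypothetical names, the buffers are plain `std::vector`s rather than socket attributes, and the bytes are indexed directly instead of being cast to a message struct (which sidesteps the alignment question mentioned earlier).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector< unsigned char > Buffer;

// Hypothetical exception types for this sketch.
struct InsufficientData : std::runtime_error
{
    explicit InsufficientData(const char *what) : std::runtime_error(what) {}
};
struct HandshakeRejected : std::runtime_error
{
    explicit HandshakeRejected(const char *what) : std::runtime_error(what) {}
};

// Hypothetical, simplified socket states.
enum HandshakeState { expect_method_request, expect_socks_request };

// Parses one version identifier/method selection message (VER, NMETHODS,
// METHODS...) from the front of `in`, appends the two-byte reply (VER, METHOD)
// to `out`, consumes the parsed bytes and returns the next state.
HandshakeState handleMethodRequest(Buffer &in, Buffer &out)
{
    const unsigned char socks_version = 5;
    const unsigned char no_authentication = 0;
    // VER and NMETHODS are one byte each and come first
    if (in.size() < 2) throw InsufficientData("need VER and NMETHODS");
    if (in[0] != socks_version) throw HandshakeRejected("wrong SOCKS version");
    const std::size_t method_count = in[1];
    const std::size_t message_size = 2 + method_count;
    if (in.size() < message_size) throw InsufficientData("need all METHODS");
    // at least one of the proposed methods has to be "no authentication"
    bool authentication_ok = false;
    for (std::size_t i = 0; !authentication_ok && i < method_count; ++i)
    {
        authentication_ok = (in[2 + i] == no_authentication);
    }
    if (!authentication_ok) throw HandshakeRejected("no supported authentication method");
    // queue the reply, then consume what we have handled
    out.push_back(socks_version);
    out.push_back(no_authentication);
    in.erase(in.begin(), in.begin() + message_size);
    return expect_socks_request;
}
```

A caller would keep the returned state with the socket and, on `InsufficientData`, simply leave the input buffer untouched until more data arrives.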
+
+```cpp
+    socket.get(socket_state_attribute_id_) = expect_socks_request__;
+    buffer.erase(buffer.begin(), buffer.begin() + message_size);
+```
+
+SOCKS requests are handled in much the same way, but they also contain the address of the host we will be proxying to. How that address is extracted depends on the address type. The request can contain the actual IPv4 address, as in the next bit of code:
+
+```cpp
+    sockaddr_storage address;
+    memset(&address, 0, sizeof(address));
+    switch (message->atyp_)
+    {
+    case 1 :
+        {
+            // IP V4 address: X'01'
+            if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 6 /* four for the address, two for the port */)
+            {
+                throw InsufficientData("Not enough data for a SOCKS request");
+            }
+            else
+            { /* all is well so far */ }
+            address.ss_family = AF_INET;
+            sockaddr_in *a4(reinterpret_cast< sockaddr_in* >(&address));
+            memcpy(&a4->sin_addr, message->dst_addr_, 4);
+            memcpy(&a4->sin_port, message->dst_addr_ + 4, 2);
+            break;
+        }
+```
+
+or it can be a domain name, in which case the address has to be looked up.
+
+Address lookup is done synchronously, using `gethostbyname`, but should eventually be done asynchronously, as calling this function may take some time: time we don’t necessarily want to spend waiting.
+
+```cpp
+    case 3 :
+        {
+            // DOMAINNAME: X'03'
+            unsigned char hostname_length(message->dst_addr_[0]);
+            if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 1 + hostname_length + 2 /* one for the length byte, hostname_length for the name, two for the port */)
+            {
+                throw InsufficientData("Not enough data for a SOCKS request");
+            }
+            else
+            { /* all is well so far */ }
+            char *hostname_begin(reinterpret_cast< char* >(message->dst_addr_ + 1));
+            char *hostname_end = hostname_begin + hostname_length;
+            // read the port before overwriting its first byte with the terminator
+            unsigned short port(*reinterpret_cast< unsigned short* >(hostname_end));
+            *hostname_end = 0; // cap it off
+            hostent *host_entry(gethostbyname(hostname_begin));
+            if (!host_entry)
+            {
+                throw NameResolutionError("Name resolution error", GetLastError());
+            }
+            else
+            { /* resolved OK */ }
+            // record the family, as the other cases do, so later code knows which sockaddr this is
+            address.ss_family = host_entry->h_addrtype;
+            switch (host_entry->h_addrtype)
+            {
+            case AF_INET :
+                {
+                    sockaddr_in *a4(reinterpret_cast< sockaddr_in* >(&address));
+                    memcpy(&a4->sin_addr, host_entry->h_addr_list[0], 4);
+                    a4->sin_port = port;
+                    break;
+                }
+            case AF_INET6 :
+                {
+                    sockaddr_in6 *a6(reinterpret_cast< sockaddr_in6* >(&address));
+                    memcpy(&a6->sin6_addr, host_entry->h_addr_list[0], 16);
+                    a6->sin6_port = port;
+                    break;
+                }
+            }
+            break;
+        }
+```
+
+The address can also be an IPv6 address, in which case, as with an IPv4 address, it is simply copied.
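
To see the three address forms side by side without any sockets involved, here is a minimal sketch that only extracts the destination from the raw bytes. It rests on assumptions: `Destination`, `parseDestination`, `portNumber`, and the exception types are hypothetical names introduced for this illustration, and the input is assumed to start at the ATYP byte of the request. The bytes are copied rather than cast, so alignment is not a concern here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception types for this sketch.
struct InsufficientData : std::runtime_error
{
    explicit InsufficientData(const char *what) : std::runtime_error(what) {}
};
struct UnknownAddressType : std::runtime_error
{
    explicit UnknownAddressType(const char *what) : std::runtime_error(what) {}
};

// Hypothetical result type: what the request says about the target host.
struct Destination
{
    unsigned char atyp;                   // 1 = IPv4, 3 = domain name, 4 = IPv6
    std::vector< unsigned char > address; // 4 or 16 raw bytes; empty for a domain name
    std::string hostname;                 // only filled in when atyp == 3
    unsigned char port[2];                // DST.PORT exactly as it appeared on the wire
};

// `data` points at the ATYP byte; `size` is the number of bytes available from there.
Destination parseDestination(const unsigned char *data, std::size_t size)
{
    if (size < 1) throw InsufficientData("need ATYP");
    Destination result;
    result.atyp = data[0];
    std::size_t address_offset = 1;
    std::size_t address_size = 0;
    switch (result.atyp)
    {
    case 1 : address_size = 4; break;   // IPv4: four bytes
    case 4 : address_size = 16; break;  // IPv6: sixteen bytes
    case 3 :                            // domain name: length byte, then the name
        if (size < 2) throw InsufficientData("need the name length");
        address_size = data[1];
        address_offset = 2;
        break;
    default :
        throw UnknownAddressType("unknown address type");
    }
    // the address is followed by two bytes for the port
    if (size < address_offset + address_size + 2) throw InsufficientData("need address and port");
    const unsigned char *address_begin = data + address_offset;
    if (result.atyp == 3)
    {
        result.hostname.assign(address_begin, address_begin + address_size);
    }
    else
    {
        result.address.assign(address_begin, address_begin + address_size);
    }
    result.port[0] = address_begin[address_size];
    result.port[1] = address_begin[address_size + 1];
    return result;
}

// The port is sent in network byte order: most significant byte first.
unsigned int portNumber(const Destination &destination)
{
    return (static_cast< unsigned int >(destination.port[0]) << 8) | destination.port[1];
}
```

Keeping the port as two raw bytes mirrors what the code in this post does when it copies them straight into `sin_port`; `portNumber` is only there to make the value readable.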
+ +```cpp case 4 : { // IP V6 address: X'04' if (buffer.size() < offsetof(SocksRequest, dst_addr_) + 16 + 2 /* four for the address, two for the port */) { throw InsufficientData("Not enough data for a SOCKS request"); } else { /* all is well so far */ } address.ss_family = AF_INET6; sockaddr_in6 *a6(reinterpret_cast< sockaddr_in6* >(&address;)); memcpy(&a6-;>sin6_addr, message->dst_addr_, 16); memcpy(&a6-;>sin6_port, message->dst_addr_ + 16, 2); break; } default : throw UnknownAddressType("Unknown address type", message->atyp_); } ``` + +There are a few things you should notice: invalid IP addresses and domain names are not necessarily a problem — they won’t work, of course, but there is really no other way to check for them than to try them out. The only thing we really check is that there’s enough data in the request to contain the information being extracted from the request. + +Socks requests are fairly simple and don’t contain anything like checksums or cryptographic authentication (although cryptographic authentication can be part of the protocol, if supported by both sides) so other than these few checks, there is really nothing to be done. + +Queueing data to send looks like this: + +To see the _code_ click here.To hide the _code_ click here. + +```cpp void Applicatio ::queueDataToSend(Socket &socket;, unsigned char *begin, unsigned char *end) const { Buffer *buffer(0); if (!socket.get(send_buffer_attribute_id_).empty()) { buffer = &(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); } else { socket.get(send_buffer_attribute_id_) = Buffer(); buffer = &(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); } copy(begin, end, back_inserter(*buffer)); unsigned int data(buffer->size()); server_->write(socket, &((*buffer)[0]), &data;); buffer->erase(buffer->begi (), buffer->begi () + data); } ``` + +First thing we do is get a buffer to put the data into. 
That buffer is associated with the socket itself so that, if the socket isn’t ready for data being written to it, we can write the data to the socket when the socket _is_ ready, in `onWriteReady`. The `Server` code will know what to do with a call to `write` if the socket isn’t ready, so we can simply call it and remove, from the buffer, any data that we could send. + +The `onWriteReady` method is similar, of course: + +```cpp /*virtual */void Application::onWriteReady(Socket &socket) { if (!socket.get(send_buffer_attribute_id_).empty()) { Buffer &buffer(any_cast< Buffer& >(socket.get(send_buffer_attribute_id_))); Buffer::size_type offset(0); if (!buffer.empty()) { Buffer::size_type avail(buffer.size()); Buffer::pointer send_ptr(&(buffer[0])); server_->write(socket, send_ptr, &avail); buffer.erase(buffer.begin(), buffer.begin() + avail); } else { /* nothing to send */ } } else { /* nothing to send */ } } ``` + +As you can see, it simply retrieves the buffer if there is one and sends as much of the available data as possible. + +In the next few installments, we will look at the following: + + 1. **state machines** : as mentioned above, every socket will be construed as a state machine — but we will look at state machines in other contexts as well. + 2. **threads** : as mentioned above, some actions (such as DNS queries) will need to be performed asynchronously. We will do that by creating a thread to handle those requests. + 3. **the Command pattern**, which we’ll be using to communicate with the thread \ No newline at end of file diff --git a/_posts/2012-03-29-whats-wrong-with-this-code.md b/_posts/2012-03-29-whats-wrong-with-this-code.md new file mode 100644 index 0000000..2585ea8 --- /dev/null +++ b/_posts/2012-03-29-whats-wrong-with-this-code.md @@ -0,0 +1,7 @@ +--- +layout: post +title: "What's wrong with this code?"
+date: 2012-03-29 12:49:39 +categories: blog +--- +Several things are wrong with this code. Can you find them all? ```c void main() { char k[4]; K = "atom"; } ``` \ No newline at end of file diff --git a/_posts/2012-08-15-hidden-complexity-2.md b/_posts/2012-08-15-hidden-complexity-2.md new file mode 100644 index 0000000..535995c --- /dev/null +++ b/_posts/2012-08-15-hidden-complexity-2.md @@ -0,0 +1,17 @@ +--- +layout: post +title: "Hidden complexity" +date: 2012-08-15 21:57:41 +categories: blog +--- +It really surprises me sometimes how much you can have to explain about simple things. + +I’ve written quite a bit of code, most of which is in production today on systems ranging from huge servers tucked away in a bank’s data center somewhere to tiny embedded devices that might just be hanging from your keychain. Most of that code is written in either C or C++, or some combination of the two and some of it contains from-scratch rewrites of things I’d written, often slightly differently, elsewhere. + +So today, I decided to do another one of those rewrites, but for educational purposes. I’m not done yet, but I am amazed at how much I’ll have to _explain_ about the code to make it all clear. The code in question (which is done – the explaining takes a lot longer) uses various techniques that aren’t familiar to many programmers, which makes for even more explaining. + +In all, I think I’ll have several hundreds of words for maybe a hundred lines of code… + +A picture is worth a thousand words – but how much is a class template worth?
\ No newline at end of file diff --git a/_posts/2012-12-04-how-to-design-a-struct-for-storage-or-communicating-2.md b/_posts/2012-12-04-how-to-design-a-struct-for-storage-or-communicating-2.md new file mode 100644 index 0000000..6cea8db --- /dev/null +++ b/_posts/2012-12-04-how-to-design-a-struct-for-storage-or-communicating-2.md @@ -0,0 +1,87 @@ +--- +layout: post +title: "How to design a struct for storage or communicating" +date: 2012-12-04 22:21:06 +categories: blog +--- +One of the most common ways of “persisting” or communicating data in an embedded device is to just dump it into persistent storage or onto the wire: rather than generating XML, JSON or some other format which would later have to be parsed and which takes a lot of resources both ways, both in terms of CPU time to generate and parse and in terms of storage overhead, dumping binary data into storage or onto the wire has only the — inevitable — overhead of accessing storage/the wire itself. There are, however, several caveats to this, some of which I run into on a more-or-less regular basis when trying to decipher some of that data, so instead of just being frustrated with hard-to-decipher data, I choose to describe how it should be done. + +Note that I am by no means advocating anything more than a few simple rules to follow when dumping data. Particularly, I am _not_ going to advocate using XML, JSON or any other intermediary form: each of those has its place, but they neither should be considered to solve the problems faced when trying to access binary data, nor can they replace binary data.
+ +## Necessary parts + +There are two things that any structure that is communicated[1](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/#footnote_0_1966 "and I include writing to persistent storage and reading it back later in “communication” because that’s what it is: the software reading the data may very well be different from the software writing it — be it different versions of the same software, or different software altogether") in binary form should have: + + 1. a **magic number** , preferably one that is exactly four bytes in length and one that is chosen to be human-readable, either when displayed as HEX or when displayed as “deciphered” ASCII +Good examples are `0xdeadbeef`; `0x _N_ badf00d` in which _N_ is replaced by a hexadecimal value that might mean something — you have 16 options, and you can put the N at the end, so you really now have 32 options!!; `'CODE'` (or `0x434f4445` in this case) in which CODE is replaced by something descriptive for the structure’s content. For example, if it contains a config for a potato peeler, `'PCFG'` (or `0x50434647`) would do just fine. The idea is to have some magic number that’s easy to recognize when displayed by a memory debugger or when dumped by a run-of-the-mill binary editor/viewer. + 2. the **version** of the structure. This can be a simple incremental counter — it can even be part of the magic number if you don’t want to “waste” bytes, but it really should be in there. Ideally, it should consist of at least two parts: “current” and “age”, the idea being that you increment both “current” and “age” if you add something, and that you increment “current” and set “age” to 0 if you remove something or change the meaning of some part in a way no longer compatible with previous versions.
That way, anyone who reads the structure can very easily see if they can _understand_ the structure: ```c if (data.magic_ == POTATO_PEELER_CONFIG_MAGIC) { if ((data.version_.current_ - data.version_.age_) == (POTATO_PEELER_CONFIG_CURRENT - POTATO_PEELER_CONFIG_AGE)) { if (data.version_.current_ >= POTATO_PEELER_CONFIG_CURRENT) { /* current or newer version - read it as if it's current. We should be able to ignore anything added since (because the implementor declared us to be forward-compatible, after all) */ } else { /* older version. Assume default values for newer fields, or follow some kind of logic to keep compatibility -- after all, the implementor did declare us to be backward-compatible */ } } else { /* incompatible version - maybe fall back on conversion code..? */ } } else { /* not something we understand - wrong magic number */ } ``` + +With just these two in place on every persisted structure, I would have saved hours of futile staring at memory dumps and binary dumps of files from legacy (and current) systems that I was asked to debug. Basically, every persisted structure should begin like this: + +```c struct PotatoPeelerConfiguration_struct { uint32_t magic_; uint32_t version_; ``` + +or, if we want the code above to compile[2](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/#footnote_1_1966 "There is some religious debate over whether or not to do typedef struct Version_struct Version; in C headers, so I left that out, though I usually would have included it for convenience."): + +```c struct Version_struct { uint16_t current_; uint16_t age_; }; struct PotatoPeelerConfiguration_struct { uint32_t magic_; struct Version_struct version_; ``` + +## The structure’s structure + +What’s wrong with this picture: + +```c struct Blah { uint32_t ulThingy; uint8_t ucThingy; uint16_t usThingy; }; ``` + +Hint: it’s not the Hungarian notation!
+ +There’s an invisible hole in this structure[3](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/#footnote_2_1966 "At least, there is on the vast majority of platforms"). Between `ucThingy` and `usThingy` there is a one-byte hole due to the structure’s members’ alignment. + +The vast majority of compilers will insert a hole into the structure to make sure the `usThingy` member is aligned on a “natural” two-byte boundary. _That is the right thing to do_ , because many hardware platforms will be _very_ picky about mis-aligned data. ARM, for example, will throw a ‘data abort’ at you whereas x86 will simply slow down to a crawl. + +**Please don’t make the mistake of using `#pragma pack` for this**: use `#pragma pack` only if you _know_ it has no effect, and then only if you have a whole bunch of `assert`ions in your unit tests, which are run every night, checking that the `#pragma pack` has no effect on any of the platforms you target[4](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/#footnote_3_1966 "In other words: just don’t use it — it’s useless."). Using `#pragma pack` otherwise can cause mis-alignment of the contents of the structure which on some platforms (like ARM) can cause crashes. + +_Do_ use filler variables to fill the holes, like this: + +```c struct Blah { uint32_t magic; uint32_t version; uint32_t ulThingy; uint8_t ucThingy; uint8_t reserved; uint16_t usThingy; }; ``` + +Note the magic number and version as well, which should _of course_ be at the start of the structure. + +If you’re saving a whole bunch of data to a file, or sending it over a wire, please structure it so we can skip the parts we don’t care about – e.g. by including a header with a `size` field for a section of data objects that we might want to skip over.
Every object in the section should still have its own magic number and version, of course, but at least we’ll know that the next 148 bytes are configuration for orange peelers (we’re interested in potato peelers, so we’ll skip those 148 bytes, thank you very much). + +If you add something to a structure, _unless you have reserved some space in the structure for that purpose_ add it to the end: you are least likely to create compatibility problems that way. + +If you’re adding to a collection of objects (as described above) the same applies: make it a complete structure (magic, version and all) and add it to the end. + +If you’re designing a structure that is going to be part of a collection of structures communicated somewhere, make sure its size is a multiple of the largest primitive normally used — e.g. a multiple of eight bytes, or four bytes if you don’t go larger than 32-bit integers. This allows the structures, once read into appropriately-aligned memory, to be automatically appropriately aligned when accessed in that appropriately-aligned memory. Because most functions that dump data to the medium don’t pad structures (as the compiler does when you create an array of objects that don’t follow this rule) you won’t be able to count on alignment otherwise. + +## Other bits[5](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/#footnote_4_1966 "I was going to call this section “Optional bits” but it’s not really optional any more than the text itself implies — I don’t want you to skip over this section just because I said it was optional") + +If the data you are dumping is somehow variable-sized (i.e. it’s the header of something), _please include the size_ so we know how much data to skip. If need be the size can be used in _lieu_ of a version (as one Redmond-based company often does). + +You may want to include a few reserved fields for future use. 
_Do_ expect them to be 0 by default, or include some kind of flag if 0 is a meaningful value, and _please do_ `memset` structures to 0 before filling them in because you _will_ forget some of the fields some of the time, and you don’t want random values to end up in there. + +Don’t use already-common magic numbers, such as `0xfeeefeee` or `0xcccccccc` etc. While they’re easy to recognize and perfectly fine as magic numbers in their own right, encountering them usually means **bug** and really stands out to the hardened debugger’s eye. + +If your structure contains strings, try to zero-terminate them if at all possible. Many, many programmers forget to check for zero-termination when they output something from the struct, which causes many, many crashes or other random behaviors. + +## Reading and writing + +Writing is the easy part, so let’s start with that. + +If you have a structure in memory, writing it somewhere is simply a case of calling the appropriate `write()` function passing it a pointer to your structure and the size, like this: ``` retval = write(&data, sizeof(data)); ``` + +You don’t have to worry too much about issues such as alignment, because the write function will read the thing byte by byte if need be. + +Reading it into memory is another matter: while in most cases you still don’t need to worry about alignment, you just might have to if the reading function hands you a pointer rather than the other way around: it may not be aligned properly, so you can’t just cast it to the type you want. Instead, use `memcpy` to copy the data into a temporary variable of the right type, so the compiler can align it properly for you. + +Also use the version information you stored to know what the size of the data you received is – it may be different from what you were expecting as the structure may have grown since you wrote your code, or since the code that sent the data was written.
Version information will also tell you something about the meaning of the contents of the structure, which may have changed or may need to be set to some default value if it wasn’t provided before. + +So, reading data — even binary data — from any kind of storage or communications medium is really _parsing_ : you should carefully design how it’s done and keep in mind that things change over time: some intern may change the code one day having neither read this post nor anything else useful to his job. While there is only so much you can do to protect yourself from that particular intern, you can at least try to be graceful about erroring out when you encounter one of his bugs. + +HTH + +rlc + + 1. and I include writing to persistent storage and reading it back later in “communication” because that’s what it is: the software reading the data may very well be different from the software writing it — be it different versions of the same software, or different software altogether + 2. There is some religious debate over whether or not to do `typedef struct Version_struct Version;` in C headers, so I left that out, though I usually would have included it for convenience. + 3. At least, there is on the vast majority of platforms + 4. In other words: just don’t use it — it’s useless. + 5. 
I was going to call this section “Optional bits” but it’s not really optional any more than the text itself implies — I don’t want you to skip over this section just because I said it was optional \ No newline at end of file diff --git a/_posts/2012-12-05-what-happens-if-structures-arent-well-designed.md b/_posts/2012-12-05-what-happens-if-structures-arent-well-designed.md new file mode 100644 index 0000000..4f78e86 --- /dev/null +++ b/_posts/2012-12-05-what-happens-if-structures-arent-well-designed.md @@ -0,0 +1,38 @@ +--- +layout: post +title: "What happens if structures aren’t well-designed" +date: 2012-12-05 18:17:10 +categories: blog +--- +In my [previous post](http://rlc.vlinder.ca/blog/2012/12/how-to-design-a-struct-for-storage-or-networking/ "How to design a struct for storage or networking"), I explained how to design a structure for persisting and communicating. I didn’t say why I explained it — just that things get frustrating if these simple rules aren’t followed. In this post, I will tell you why I wrote the previous one. + +Two or three years ago, I was working on a project that, among several other things, used existing software to communicate between the device I was programming for and another device. The device I was programming for used a binary configuration mechanism that persisted the configuration in binary form directly on disk, in a somewhat structured format. The structures, as persisted, did include a header that told the reader how much to skip to get to the next section, and a magic number (or rather: a GUID) for each section. The structures in question were managed by a tool with a graphical interface and the generated configuration was included in the firmware with the software I was writing. My software was simply to open the file, get a chunk out of it and pass that chunk to the library doing the communicating, so it would know how to connect and what its parameters were to be.
+ +This worked just fine for a very long time, but having moved on to other projects and the software in question not needing any maintenance until very recently, the library code being used for the communications had been allowed to evolve without “my” software being updated with it. I put “my” in quotes here because the software in question is proprietary software that I do not own — I just wrote it. The risk associated with not maintaining the software concurrently with the communications software was known, understood, and managed so there was no real objection to going down this path. + +About three weeks ago, I was asked to help with a massive update of the project’s basic software: the OS, all of the libraries and several other chunks of software I didn’t write had all evolved while the software I had written had been left standing still. Now, a different device to communicate with had to be supported, someone had been working on that for a few months already[1](http://rlc.vlinder.ca/blog/2012/12/what-happens-if-structures-arent-well-designed/#footnote_0_1979 "I knew about that: I’d helped him a few times already") and a delivery date was nearing, but the update of the bulk of the firmware was in trouble: the system didn’t communicate. + +There were two problems that had to be solved: the first was in the OS, the second was in the application-level software. + +In the OS, the boot communicated the IP addresses and similar information to the main OS through a mailbox structure in memory. That mailbox structure had been changed independently in two branches. Both had added the same field, `timestamp`, to the information to be communicated. In one branch, another field had been shortened from 16 to 12 bytes and the `timestamp` had been inserted. In the other branch, the `timestamp` had been added to the end of the structure. + +This is a classic example of why + + 1. you should reserve fields for future use; and + 2. 
you should consistently add new fields to the end of a structure if no reserved fields are available. + +Not following these two simple rules meant I now had to detect which boot was running to know which format of the mailbox was being used, and translate from one format to the other — in a way that was transparent to the system — if I found an incompatible boot. + +Once the OS was fixed and tested, we checked that this fixed the symptoms of the problem as well, which is when we found the second problem — the system still wasn’t communicating (though we could now ping and telnet into it, which was definite progress). The communications library failed to initialize. + +Tracing through the initialization routine the problem was found easily enough: the chunks of data containing the configuration contained invalid values. We couldn’t verify whether the data being read was in any way misaligned or otherwise corrupted because _almost none_ of the rules I set out in my previous post had been followed: + + * there were no magic numbers + * the only version information included applied to the whole group — none for individual chunks + * the structures contained invisible holes, meaning we had to mentally add padding + +Due to this lack of following design principles, I only found out the next morning that another one of my design rules had also not been followed: a structure had been inserted in the sequence somewhere in the middle. Because some of the code I was running was “unaware” of this, the data being read was offset by several hundreds of bytes — something that would easily have been noticed if we had had magic numbers to look for. When I did finally find the problem, the fix took a few minutes. Several hours were lost searching for a cause in several wrong places, however. + +Hence, two blog posts… + + 1. 
I knew about that: I’d helped him a few times already \ No newline at end of file diff --git a/_posts/2013-04-02-serializing-floats.md b/_posts/2013-04-02-serializing-floats.md new file mode 100644 index 0000000..a292b03 --- /dev/null +++ b/_posts/2013-04-02-serializing-floats.md @@ -0,0 +1,7 @@ +--- +layout: post +title: "Serializing floats" +date: 2013-04-02 19:47:14 +categories: blog +--- +Serializing is the act of taking a chunk of data and converting it to something that can be communicated -- i.e. some format or other that someone or something else can parse and understand. You do it all the time when you write, or even when you talk: you serialize your thoughts as words that you then serialize as either characters on paper (virtual or dead tree) or as sound. + +Parsing is the opposite process of serializing -- also called deserializing. + +As with words on paper, there is some inaccuracy in serializing when it's floating point numbers being serialized in human-readable form. That is because a computer uses a binary system for counting while a human uses a decimal system. For integers, that doesn't matter because any integer value that can be represented in, say, 32 bits can also be represented in 10 decimal characters and the conversion is fairly straight-forward. + +The same is not true of floating point values: due to the way floats are implemented, there are some numbers that simply cannot be accurately represented in a `float` (the same goes for `double`s, of course). + +Floating point numbers consist of three parts: the sign (plus or minus), the mantissa and the exponent. The value is [latex]V=S*M*10^E[/latex] in which [latex]S[/latex] is 1 for positive and -1 for negative, [latex]M[/latex] the mantissa and [latex]E[/latex] the exponent. 
For example, in the case of -12.345 [latex]S = -1[/latex], [latex]E = 1[/latex] and [latex]M = 1.2345[/latex]. + +Understanding this, we can implement a function to split a floating-point value into its constituent parts: ```cpp void split_float( int *sign , double *mantissa , int *exponent , double value ) { pre_condition(sign && mantissa && exponent); if (value < 0) { *sign = -1; value = -value; } else { *sign = 1; } *exponent = (int)log10(value); *mantissa = value / pow(10, *exponent); } ``` + +In this code, the first thing we take care of is the sign. Note that we also change the sign of `value` in that case, because `log10` of a negative number doesn't work. + +To get the exponent, we call `log10` on the (now positive) value and round it off. Casting to `int` truncates toward zero, so the logarithm is rounded down for values of 1 and above, and rounded up for values below 1 (whose logarithm is negative). + +To get the mantissa, [latex]M[/latex] we do this: [latex]M=\frac{{\lvert}V{\rvert}}{10^E}[/latex] which is equivalent to hacking off the exponent part of the value, leaving only the mantissa. + +Now, if we want to serialize this into a `char` buffer, we can write a function like this: ```cpp int serialize( char *out , unsigned int out_size , double value ) { pre_condition(out || !out_size); int s, e, dotted = 0, outputting = 0, characters = 0; split_float(&s, &value, &e, value); if (out_size && (-1 == s)) { *out++ = '-'; --out_size; } else { /* not negative */ } while (out_size && value && (characters < 15)) { *out++ = '0' + (int)value; --out_size; value -= (int)value; value *= 10; ++characters; if (value && out_size && !dotted) { *out++ = '.'; --out_size; dotted = 1; } else { /* already dotted or no space for the dot */ } } if (out_size && e) { *out++ = 'e'; --out_size; if ((e < 0) && out_size) { *out++ = '-'; e = -e; } if (out_size && (e / 100)) { *out++ = '0' + (e / 100); --out_size; outputting = 1; } else { /* no more space or nothing to output */ } e = e % 100; if (outputting || (out_size && (e / 10))) { *out++ = '0' + (e / 10); --out_size; outputting = 1; } else { /* no more space or nothing to output */ } e = e % 10; if (outputting || (out_size && e)) { *out++ = '0' + e; --out_size; } else { /* no more space or nothing to output */ } } else { /* no exponent or no more space */ } if (out_size) *out = 0; return (int)--out_size; } ``` + +There are a few interesting details in this piece of code. Look, for example, how it handles the size of the output buffer: it only writes to `*out` one character at a time and only if `out_size` is greater than 0. Note, though, that it decrements `out_size` unconditionally on the last line. 
This has the effect of returning the number of bytes remaining in the output buffer on success (which will be >= 0) or -1 on error. + +Similarly, we know that a `double` can't have an exponent of more than three characters, so instead of taking [latex]E \mod 10[/latex] and repeatedly dividing [latex]E[/latex] by 10, then inverting the generated characters (which is the way integers are usually serialized) we just check hundreds, tens and units like a human would normally do. + +Likewise, we know that by dividing [latex]{\lvert}V{\rvert}[/latex] by [latex]10^E[/latex] we are left with [latex]0 \leq M < 10[/latex], so we have one digit before the dot. We can then output the dot immediately after the first digit, unless there's nothing more to output. + +There's a few other trivial details that new programmers may want to look at -- here's some code to run it with: ```cpp int main(void) { char buffer[100]; assert(serialize(buffer, sizeof(buffer), -1) == 97); assert(strcmp("-1", buffer) == 0); printf("%s ", buffer); #define TEST(a) \ assert(serialize(buffer, sizeof(buffer), a) > 0); \ printf("%s -> %s ", #a, buffer); TEST(0.43141910996070e8); TEST(-0.4948270426510e51); TEST(0.8017058961133e32); TEST(-0.6431647334000e-50); TEST(0.62050295608660e-7); TEST(-0.7980292076396e43); TEST(0.8098012295643e-42); TEST(-0.235788271940e84); TEST(0.5378916319145e93); TEST(-0.83169954137327e-3); return 0; } ``` \ No newline at end of file diff --git a/_posts/2013-09-20-run-time-composed-predicates-and-code-generation.md b/_posts/2013-09-20-run-time-composed-predicates-and-code-generation.md new file mode 100644 index 0000000..ac4559e --- /dev/null +++ b/_posts/2013-09-20-run-time-composed-predicates-and-code-generation.md @@ -0,0 +1,55 @@ +--- +layout: post +title: "Run-time composed predicates and Code generation" +date: 2013-09-20 18:58:08 +categories: blog +--- +While working on Arachnida, preparing version 2.2 due out this fall, one of
the things we’ll be introducing is a hardened OpenSSL transport-layer-security plug-in, to replace the one we’ve had for the last seven or so years. One of the new features in this plug-in (which is part of Arachnida’s “Scorpion” module) is a much more flexible configuration scheme including the subject of today’s post: run-time composed predicates. + +As the name indicates, run-time composed predicates are predicates that are composed at run-time. In this case, we use them for post-connection validations of the SSL/TLS connection: the user can plug their own post-connection validations in and combine them with the ones provided in the library using AND, OR, NOR, NAND, XOR and NOT primitives. Typically, such a composed predicate would look like this: + +```cpp configuration.post_connection_verification_predicate_ = and_( and_( peer_provided_certificate__, fqdn_matches_peer__) , userProvidedPredicate); ``` + +in which `userProvidedPredicate` is a pointer to a user-provided predicate function whereas the other two predicates are included in the library. + +The thing is that each of the following will also work: + +```cpp /* if the peer provided a certificate, assume everything is fine */ configuration.post_connection_verification_predicate_ = peer_provided_certificate__; /* we accept this only if the FQDN in the peer-provided certificate DOES NOT match the peer's FQDN. THIS IS STUPID - DO NOT DO THIS IN YOUR CODE! */ configuration.post_connection_verification_predicate_ = not_(fqdn_matches_peer__); /* apply only the user's predicate */ configuration.post_connection_verification_predicate_ = userProvidedPredicate; ``` + +The trick here is that the predicate type, `PostConnectionVerificationPredicate`, is a pointer-to-function type and the functions `and_`, `or_`, `xor_`, `nand_`, `nor_` and `not_` each return a pointer to a “newly created” function.
+ +Of course, C++ does not allow the creation of functions at run-time and, as the call-back is passed to OpenSSL and OpenSSL is written in C, more to the point, neither does C. + +As Arachnida is designed to run on industrial control systems and industrial embedded devices, we want to avoid run-time memory allocation if at all possible — and when that’s not possible, we want to control it. In this case, we avoid it by creating an array of pointers to functions, another array of “configurations” for those functions and a function for each position in the array. We do this using a Perl script (because we usually use Perl to generate code and it integrates nicely with our build system). + +The following chunk of code is the generation script verbatim — annotated. + +First, the usual pre-amble code: for the Perl part, this is invoking the interpreter; for the C++ code, this is including the necessary headers. + +```perl #! /usr/bin/env perl my $name = $0; my $max_predicate_count = 20; print < #include ``` + +The maximum predicate count is set above, and replicated in the generated C++ source code here. To change it, we currently need to change the script. At some point (probably before version 2.2 of Arachnida is released) this will become a command-line argument to the script. + +```perl #define MAX_PREDICATE_COUNT ${max_predicate_count} namespace Scorpion { namespace OpenSSL { namespace Details { namespace { static unsigned int next_predicate_id__ = 0; ``` + +The following is how predicates are allocated: any call to any of the predicate construction functions (`and_`, `or_`, etc.) will call this once, and throw `bad_alloc` if it fails. + +```perl unsigned int allocatePredicateID() { if (MAX_PREDICATE_COUNT == next_predicate_id__) throw std::bad_alloc(); return next_predicate_id__++; } ``` + +The following structure holds the configuration of the “generated” predicate.
This is all we need to know for any operator: what the left-hand-side of the expression is, what the right-hand-side is and what operator it is. One operator is unary, all the others are binary. The unary one only uses the `lhs_` member of this structure. + +```perl struct PredicateInfo { enum Type { and__ , or__ , xor__ , nand__ , nor__ , not__ }; Type type_; PostConnectionVerificationPredicate lhs_; PostConnectionVerificationPredicate rhs_; }; ``` + +The following is an array of each of these configurations, followed by Perl code to generate each of the functions. I could have used a template to generate these rather than generated code but I find as long as I’m generating code anyway, it makes more sense to just keep generating — especially if there’s no compelling reason to do otherwise. + +```perl PredicateInfo predicate_infos__[MAX_PREDICATE_COUNT]; EOF ; for (my $i = 0; $i < $max_predicate_count; ++$i) { print set< T > foo(T t, F query) { set< T > results; set< T > checked; set< T > to_check; to_check.insert(t); do { for (typename set< T >::const_iterator check(to_check.begin()); check != to_check.end(); ++check) { typename F::result_type query_results(query(*check)); results.insert(query_results.begin(), query_results.end()); } checked.insert(to_check.begin(), to_check.end()); to_check.clear(); set_difference(results.begin(), results.end(), checked.begin(), checked.end(), inserter(to_check, to_check.end())); } while (!to_check.empty()); return results; } ``` + +Insertion into a set is ![O\(\\lg{n}\)](http://s0.wp.com/latex.php?latex=O%28%5Clg%7Bn%7D%29&bg=ffffff&%23038;fg=000&%23038;s=0) so lines 43 and 45 are both ![O\(n\\lg{n}\)](http://s0.wp.com/latex.php?latex=O%28n%5Clg%7Bn%7D%29&bg=ffffff&%23038;fg=000&%23038;s=0). Line 46 should be ![O\(c\)](http://s0.wp.com/latex.php?latex=O%28c%29&bg=ffffff&%23038;fg=000&%23038;s=0) but is probably ![O\(n\)](http://s0.wp.com/latex.php?latex=O%28n%29&bg=ffffff&%23038;fg=000&%23038;s=0).
Line 47 is ![O\(n\)](http://s0.wp.com/latex.php?latex=O%28n%29&bg=ffffff&%23038;fg=000&%23038;s=0) so the whole thing boils down to ![O\(n\\lg{n}\)](http://s0.wp.com/latex.php?latex=O%28n%5Clg%7Bn%7D%29&bg=ffffff&%23038;fg=000&%23038;s=0) complexity. + +In order to play with the code a bit, I put it on GitHub as a Gist, with a test case (Query fails if you call it more than once with the same value): + + 1. so you really do need to call them only once + 2. i.e. for the case at hand, we’re querying a directed acyclic graph, so our first argument will never be seen in any of the `query` results, although any given value may appear more than once in `query` results \ No newline at end of file diff --git a/_posts/2014-09-05-a-different-take-on-the-optimize-by-puzzle-problem.md b/_posts/2014-09-05-a-different-take-on-the-optimize-by-puzzle-problem.md new file mode 100644 index 0000000..9af0e77 --- /dev/null +++ b/_posts/2014-09-05-a-different-take-on-the-optimize-by-puzzle-problem.md @@ -0,0 +1,31 @@ +--- +layout: post +title: "A different take on the “optimize by puzzle” problem" +date: 2014-09-05 11:32:31 +categories: blog +--- +I explained the problem I presented in my [previous post](http://rlc.vlinder.ca/blog/2014/09/optimization-by-puzzle/) to my wife over dinner yesterday. She’s a professor of law and a very intelligent person, but has no notion of set theory, graph theory, or algorithms. I’m sure many of my colleagues run into similar problems, so I thought I’d share the analogies I used to explain the problem, and the solution. I didn’t get to explaining how to arrive at computational complexity, though. + +Say you have a class full of third-grade children. Their instructions are simple: + + 1. They cannot tell you their own names — if you ask, they have permission to kick you in the shins. + 2. Each child has their hands on the shoulders of zero, one or two other children. + 3. All the children are facing in the same direction. + 4.
Only one child has no hands on their shoulder. + 5. You can ask each child the names of the children whose shoulders they have their hands on, but they will only tell you once — ask again, they’ll kick you in the shins — and you have to address them by their names. + +You are told the name of one child. How do you get the names of all the children without getting kicked in the shins and which child do you have to get the name of? + +Obviously, the child whose name you have to know in advance is the one who doesn’t have any hands on their shoulders. From there on, you need to keep track of the kids whose names you know but haven’t asked yet (the `to_check` set) and the kids whose names you know and have addressed (the `checked` set). At the end, you’ll have checked everyone, so your group of kids whose names you know but haven’t asked yet is empty. + +The third set (the `results` set) really only exists to make it easy to get the “right” part of the set. As shown in the Venn chart below, the set of kids remaining to be checked is the difference between the result set and the (entirely overlapping) set of kids we checked with. + +[![Venn chart of the sets](http://rlc.vlinder.ca/wp-content/uploads/2014/09/IMG_1202.png)](http://rlc.vlinder.ca/wp-content/uploads/2014/09/IMG_1202.png) + +Venn chart of the sets + +And that’s exactly what the algorithm does.
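The classroom version of the three-set bookkeeping can be sketched in C++. This is an illustration, not the code from the Gist: `Classroom`, `ask` and `find_all_names` are made-up names, and the "kick in the shins" is modelled as an exception thrown when the same child is asked twice.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classroom: each child maps to the children whose shoulders
// they have their hands on. Asking the same child twice gets you kicked.
class Classroom {
public:
    explicit Classroom(std::map<std::string, std::vector<std::string>> hands)
        : hands_(std::move(hands)) {}

    std::vector<std::string> ask(const std::string &name) {
        if (!asked_.insert(name).second)
            throw std::runtime_error("kicked in the shins by " + name);
        return hands_.at(name);
    }

private:
    std::map<std::string, std::vector<std::string>> hands_;
    std::set<std::string> asked_;
};

// Returns every name reachable from `first`, asking each child exactly once.
std::set<std::string> find_all_names(Classroom &room, const std::string &first) {
    assert(!first.empty());
    std::set<std::string> results;   // every name we have been told
    std::set<std::string> checked;   // children we have already asked
    std::set<std::string> to_check;  // names we know but have not asked yet
    to_check.insert(first);
    do {
        for (const std::string &name : to_check) {
            const std::vector<std::string> answers = room.ask(name);
            results.insert(answers.begin(), answers.end());
        }
        checked.insert(to_check.begin(), to_check.end());
        to_check.clear();
        // Whoever we were told about but have not asked yet is next.
        std::set_difference(results.begin(), results.end(),
                            checked.begin(), checked.end(),
                            std::inserter(to_check, to_check.end()));
    } while (!to_check.empty());
    return results;
}
```

Starting from the child nobody has their hands on, the loop ends without an exception as long as the hands form a directed acyclic graph; the starting child's own name is already known, so it is not part of the returned set.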
+ + \ No newline at end of file diff --git a/_posts/2014-10-18-radical-refactoring-have-the-compiler-to-some-of-the-reviewing.md b/_posts/2014-10-18-radical-refactoring-have-the-compiler-to-some-of-the-reviewing.md new file mode 100644 index 0000000..1798603 --- /dev/null +++ b/_posts/2014-10-18-radical-refactoring-have-the-compiler-to-some-of-the-reviewing.md @@ -0,0 +1,19 @@ +--- +layout: post +title: "Radical Refactoring: Have the compiler do (some of) the reviewing" +date: 2014-10-18 09:35:32 +categories: blog +--- +One of the most common sources of bugs is ambiguity: some too-subtle API change that’s missed in a library update and introduces a subtle bug, that finally only gets found out in the field. My answer to that problem is radical: make changes breaking changes — make sure the code just won’t compile unless fixed: the compiler is generally better at finding things you missed than you are. + +I recently had to review a chunk of code that ported an application from one platform to a different flavor of that platform. The different flavor in question didn’t support a given library, but because all flavors were compiled from the same source tree, the headers of the unsupported library were still available. Regrettably the only way to distinguish between one flavor of the platform and another at compile-time was using an `#ifdef`. + +The code was therefore littered with `#ifdef`s, but the `#include` directive that included the library’s header files was still there — so all the API calls that were no longer supported would still compile (and, in this case, link as well, but do the wrong thing at run-time in oh-so-subtle ways). + +Instead of going through all the calls one by one, I asked the developer to surround the `#include` with an `#ifdef` and let the compiler check that none of them had been forgotten. In this case, none of them had. + +The compiler didn’t find any sites that had been missed, but had there been any, it would have.
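As a rough illustration of guarding the `#include` itself, consider the sketch below. The macro `FLAVOR_HAS_FANCY_LIB`, the header `fancy_lib.h` and the function `fancy_send` are all hypothetical; they only stand in for whatever the real platform and library provide.

```cpp
#include <cassert>
#include <string>

// Hypothetical flavor macro: the build system would define it only on
// flavors that actually support the library. It is left undefined here.
// #define FLAVOR_HAS_FANCY_LIB

#ifdef FLAVOR_HAS_FANCY_LIB
#include "fancy_lib.h"  // hypothetical header declaring fancy_send(const char *)
#endif

std::string send_report(const std::string &report) {
    assert(!report.empty());
#ifdef FLAVOR_HAS_FANCY_LIB
    fancy_send(report.c_str());
    return "sent: " + report;
#else
    // The header is never included on this flavor, so a stray fancy_send()
    // call outside an #ifdef is a compile-time error, not a run-time surprise.
    return "queued: " + report;
#endif
}
```

Moving a `fancy_send` call outside the `#ifdef` while the macro is undefined makes the build fail on an undeclared identifier, which is exactly the review the compiler is being asked to do.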
+ +Of course, a better approach would have been to refactor the code so all those `#ifdef`s would no longer have been necessary. That is what had originally been planned, but sometimes the economic realities of our work catch up to the cleanliness of our code: sometimes refactoring and doing it right _right now_ is simply too expensive. The question then becomes whether the investment into refactoring will return a real added value to the program — and the answer in this case was “no”. + + \ No newline at end of file diff --git a/_posts/2014-10-24-radical-refactoring-breaking-changes.md b/_posts/2014-10-24-radical-refactoring-breaking-changes.md new file mode 100644 index 0000000..86bbaf7 --- /dev/null +++ b/_posts/2014-10-24-radical-refactoring-breaking-changes.md @@ -0,0 +1,28 @@ +--- +layout: post +title: "Radical Refactoring: Breaking Changes" +date: 2014-10-24 16:36:11 +categories: blog +--- +One of the most common sources of bugs is ambiguity: some too-subtle API change that’s missed in a library update and introduces a subtle bug, that finally only gets found out in the field. My answer to that problem is radical: make changes breaking changes — make sure the code just won’t compile unless fixed: the compiler is generally better at finding things you missed than you are. + +Recently, I found a bug in a smart pointer implementation in Acari, Vlinder Software’s toolbox library used for, among other things, the Arachnida HTTP(s) server/client framework. The bug was subtle, not likely to cause problems in most current deployments of Arachnida, but limiting for one of our customers (so it had to be fixed). + +When I started setting up the necessary testing framework, I came to the conclusion that the bug in question was a design flaw, and that not only the code would have to be changed, but the calling code at at least one of the call sites as well. I now had two things to make sure of: + + 1.
the design had to be reviewed to make sure no other flaws were present + 2. the calling sites that needed to be changed had to be spotted unambiguously, and changed in a way clearly specified + +I decided to review the requirements that were at the base of the original design, clarify the requirement that was missed and led to the flaw, set up the necessary test cases for each of the functional requirements, design a new implementation to meet all the requirements, and implement that new design. This decision led to a delay in the release of version 2.3 of Arachnida (which was planned for the end of 2014Q3 but will now come out early-to-mid 2014Q4) — which made it an executive decision. + +Luckily, I’m the sole proprietor for Vlinder Software as well as its chief analyst — it says so on my business cards — so these types of decisions inevitably come down to me. I looked in the mirror and gave myself the go-ahead (without the mirror bit). I also informed the customer personally that, though his use-case wasn’t supported _at the moment_ , it would be in version 2.3 of the server framework. + +I then proceeded to code the new smart pointer in a new library, according to the new design in a test setup designed specifically for it. This required, among other things, changes to the Relacy Race Detector, which I made available [on GitHub](https://github.com/VlinderSoftware/relacy "GitHub repository of Relacy fork"). + +The new smart pointer no longer lives in the `Acari` namespace, but has basically the same API as the previous version did. That means that all the calling sites automatically fail to compile if they use the old version, because the fully-qualified name is no longer the same. The buggy use-case will fail to compile even if you change your `using namespace` directives to include the new namespace, because it will disallow an automatic conversion that was possible in the previous design.
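How a namespace move plus a disallowed conversion turns a silent behaviour change into a compile error can be sketched like this. The sketch does not reproduce the Acari smart pointer: `old_lib`, `new_lib` and the non-owning toy class `Ptr` are hypothetical, and the conversion from a raw pointer is only an assumed example of "an automatic conversion".

```cpp
#include <cassert>
#include <type_traits>

namespace old_lib {  // hypothetical stand-in for the previous design
template <typename T>
class Ptr {
public:
    Ptr(T *p) : p_(p) { assert(p != nullptr); }  // implicit: converts silently
    T *get() const { return p_; }
private:
    T *p_;  // not owned in this toy
};
}  // namespace old_lib

namespace new_lib {  // hypothetical stand-in for the new design
template <typename T>
class Ptr {
public:
    explicit Ptr(T *p) : p_(p) { assert(p != nullptr); }  // must be spelled out
    T *get() const { return p_; }
private:
    T *p_;  // not owned in this toy
};
}  // namespace new_lib

// Code that names old_lib::Ptr stops compiling once old_lib is removed; code
// that merely switches namespaces still trips over the missing conversion.
static_assert(std::is_convertible<int *, old_lib::Ptr<int>>::value,
              "the old design converts implicitly");
static_assert(!std::is_convertible<int *, new_lib::Ptr<int>>::value,
              "the new design refuses the implicit conversion");
```

A call site such as `new_lib::Ptr<int> p = &value;` is rejected by the compiler, while `new_lib::Ptr<int> p(&value);` still builds, so every use of the conversion has to be looked at once.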
+ +Now, this forced me to revise and review all Vlinder Software code that used the old smart pointer from the Acari library, but as we have automated nightly builds that started failing as soon as I committed the “rip out the pointer class” changes in Acari’s master branch, those sites were easy to find and — because the two APIs are the same for the most part, and the only breaking change is abundantly clear — easy to fix. + +Arachnida is now going through the hoops of the release process, with all the (hundreds of) automated test cases running, the security review starting for all modifications made in the months since the previous release, etc. When released, our customers who will upgrade from previous versions of Arachnida to version 2.3 will get a document explaining how to solve the compile-time errors they will probably (almost inevitably) face — a typical upgrade will take no more than 15 minutes to modify the usual call sites where a fully-qualified name for the smart pointer site would be used — and new use-cases will be supported that were not supported before, allowing for more efficient implementations on some devices. + + \ No newline at end of file diff --git a/_posts/2015-11-05-interesting-modifications-to-the-lamport-queue.md b/_posts/2015-11-05-interesting-modifications-to-the-lamport-queue.md new file mode 100644 index 0000000..8eb19bd --- /dev/null +++ b/_posts/2015-11-05-interesting-modifications-to-the-lamport-queue.md @@ -0,0 +1,99 @@ +--- +layout: post +title: "Interesting modifications to the Lamport queue" +date: 2015-11-05 20:28:03 +categories: blog +--- +While researching lock-free queue algorithms, I came across a few articles that made some interesting modifications to the Lamport queue. One made it more efficient by exploiting C11’s new memory model, while another made it more efficient by using cache locality.
As I found the first one to be more interesting, and the refinements more useful for general multi-threaded programming, I thought I’d explain that one in a bit more detail. + + +**The TL;DR:** I + + * provide a brief explanation of lock-free queue categories + * explain an article by Nhat Minh Le _et al._ in programmer-ese + * provide their improvement upon the Lamport queue, [with code](https://github.com/blytkerchan/Lamport) + * show why alternative approaches don’t work + * explain what a data race is, and how the C11 memory model fits in and addresses it + + +The article I will explain part of is “Correct and Efficient Bounded FIFO Queues” by Nhat Minh Le _et al._ , which you can find [here](http://dx.doi.org/10.1109/SBAC-PAD.2013.8). I have set up a [Git repository on GitHub](https://github.com/blytkerchan/Lamport) with the C11 code. In order to build it, you need at least GCC version 4.9. In order to test it properly, you need something like Thread Sanitizer, which is included in GCC and used by the provided Makefile — but the article itself contains ample proof that the code will work. + +Let’s first take a look at the Lamport queue, as originally presented by Lamport in “Proving the Correctness of Multiprocess Programs” (available [here](http://dx.doi.org/10.1109/TSE.1977.229904)) as an example of a lock-free queue. It wasn’t ostensibly designed to be particularly efficient but rather as a nice, simple, easy-to-analyse example of a multi-process program[1](http://rlc.vlinder.ca/blog/2015/11/interesting-modifications-to-the-lamport-queue/#footnote_0_3679 "This is another reason why I prefer the article I’m explaining rather than the other candidate which caught my attention: the tone is much friendlier"). + +The code for Lamport’s queue, translated to C11, looks like this: + +```c struct LamportQueue { atomic_size_t front_; atomic_size_t back_; T data_[SIZE]; }; ``` + +This defines the structure of the queue itself.
The queue is a lock-free single-producer/single-consumer (SPSC) single-in/single-out (SISO) FIFO queue. +This is where you say “What does that mean?”. + +Queues are classified along various categories, according to the guarantees they give you. Among various others (some of which I will discuss below), there is the question of “how many threads can push something into the queue at the same time?”, rephrased as _single-producer_ , or _multi-producer_ because generally, if you can push with two threads at the same time, you can push with three threads at the same time, etc.[2](http://rlc.vlinder.ca/blog/2015/11/interesting-modifications-to-the-lamport-queue/#footnote_1_3679 "Note that this is not always the case!"). + +Analogously, you can ask “with how many threads can I pop stuff from the queue at the same time?”, rephrased as _single-consumer_ or _multi-consumer_. With these two questions answered, we now have four classes of queue algorithms: SPSC, MPSC, SPMC and MPMC. If you go out looking for queue algorithms, you’ll find the SPSC kind is the most ubiquitous. + +A second set of questions you can ask is “how many values can I push into the queue at the same time?”, rephrased as _single-in_ vs. _multi-in_ — and conversely “how many values can I pop from the queue at the same time?”, rephrased as _single-out_ or _multi-out_. Most queues (lock-free or not) are SISO, but there are also SIMO, MISO and MIMO queues. + +A third question is about the _order_ of the things that go in vs. the things that go out. Basically, there are three orders: _first-in-first-out (FIFO)_ , _last-in-first-out (LIFO — also sometimes called first-in-last-out or FILO, this is basically a stack)_ and _undetermined_ which basically means you don’t know but in which case there’s generally a note saying something like “FIFO in the general case” indicating that, while we can’t guarantee a specific order, it will generally look like this… + +Now, I almost glossed over the “lock-free” part.
[Gotsman _et al._](http://dx.doi.org/10.1145/1480881.1480886) provide a nice classification of non-blocking algorithms: + +Wait-freedom: + Every running thread is guaranteed to complete its operation, regardless of the execution speeds of the other threads. Wait-freedom ensures the absence of livelock and starvation. +Lock-freedom: + From any point in a program’s execution, some thread is guaranteed to complete its operation. Lock-freedom ensures the absence of livelock, but not starvation. +Obstruction-freedom: + Every thread is guaranteed to complete its operation provided it eventually executes in isolation. In other words, if at some point in a program’s execution we suspend all threads except one, then this thread’s operation will terminate. + +Wait-freedom is the Holy Grail of non-blocking algorithms: if you can find a non-trivial wait-free algorithm that suits a general need, you will have earned the respect of many a programmer. Lamport’s algorithm is actually wait-free, but it has the caveat of failing when the queue is full/empty (which is OK in many cases, but in some cases, it means the producer has to loop back and wait for there to be space available, so the algorithm really becomes lock-free rather than wait-free)[3](http://rlc.vlinder.ca/blog/2015/11/interesting-modifications-to-the-lamport-queue/#footnote_2_3679 "So close, yet so far away…."). + +Let’s get back to the code. Initializing the structure is straight-forward: + +```c void LamportQueue_init(struct LamportQueue *queue) { atomic_init(&queue->front_, 0); atomic_init(&queue->back_, 0); } ``` + +Pushing into the queue is interesting: as Nhat Minh Le _et al._ describe it, each end (the pushing and pulling end) can consider one of the two indices as _own_ and the other as _foreign_. A process has the right to read and modify its own index, but can only read the foreign one — no modification allowed there.
Keeping that in mind, you just have to decide which end you push onto (we’ll take the tail) and which end you pop off of (the head). Hence, within `push` the tail, or `back_`, is owned while the head, or `front_`, is foreign, and vice-versa for pop. + +So, pushing is a matter of getting the location to store the data to (line 28), checking whether it is available (lines 29 through 35), putting the data there (line 36), and publishing the fact that the data is there by incrementing the appropriate index (line 37). + +```c bool LamportQueue_push(struct LamportQueue *queue, T elem) { size_t b, f; b = atomic_load_explicit(&queue->back_, memory_order_seq_cst); f = atomic_load_explicit(&queue->front_, memory_order_seq_cst); if ((b + 1) % SIZE == f) { return false; } else { /* not full */ } queue->data_[b] = elem; atomic_store_explicit(&queue->back_, (b + 1) % SIZE, memory_order_seq_cst); return true; } ``` + +Popping is, of course, similar to pushing: read the place where the data should be (line 44), check whether the queue isn’t empty (lines 45 through 51), read the data (line 52) and publish the fact that it’s been read by incrementing the owned index (line 53). + +```c bool LamportQueue_pop(struct LamportQueue *queue, T *elem) { size_t b, f; f = atomic_load_explicit(&queue->front_, memory_order_seq_cst); b = atomic_load_explicit(&queue->back_, memory_order_seq_cst); if (b == f) { return false; } else { /* not empty */ } *elem = queue->data_[f]; atomic_store_explicit(&queue->front_, (f + 1) % SIZE, memory_order_seq_cst); return true; } ``` + +Lock-freedom is nice, but you want to avoid _contention_ which is something lock-freedom alone will not do. On modern systems, contention can happen in all kinds of hidden places: loading a shared variable’s value, for example, may require the compiler or processor to emit _memory barriers_ to ensure the value you see is the value you really want to see (or the value the compiler/processor thinks you really want to see).
That might mean having to synchronize with other CPUs or other CPU cores, interrupting the normal workflow of some or all of them; going all the way through to memory, slowing you — and possibly others — down along the way. + +So, how do you get rid of such contention? One way exemplified by Nhat Minh Le _et al._ is to get rid of memory barriers. The approach they take in the article is well-thought-out, but frankly a bit boring — so I thought I’d mix things up a little and just make everything “relaxed” (changing `memory_order_seq_cst` to `memory_order_relaxed` throughout the code) to show that that doesn’t work. + +Running what I’ll call the “hippie” version, compiled with ThreadSanitizer, you get this warning: + +``` ================== WARNING: ThreadSanitizer: data race (pid=7546) Read of size 4 at 0x7ffc720b0b10 by thread T2: #0 LamportQueue_pop /home/rlc/lamport/lamport.c:50 (lamport+0x000000000e33) #1 consumer /home/rlc/lamport/lamport.c:73 (lamport+0x000000000f4b) #2 (libtsan.so.0+0x000000023629) Previous write of size 4 at 0x7ffc720b0b10 by thread T1: [failed to restore the stack] Location is stack of main thread. Thread T2 (tid=7549, running) created by main thread at: #0 pthread_create (libtsan.so.0+0x0000000274f7) #1 main /home/rlc/lamport/lamport.c:89 (lamport+0x000000001015) Thread T1 (tid=7548, running) created by main thread at: #0 pthread_create (libtsan.so.0+0x0000000274f7) #1 main /home/rlc/lamport/lamport.c:88 (lamport+0x000000000fec) SUMMARY: ThreadSanitizer: data race /home/rlc/lamport/lamport.c:50 LamportQueue_pop ================== ThreadSanitizer: reported 1 warnings ``` + +This basically means that there’s no way of knowing which version of `back_` the consumer is reading, w.r.t. the associated data: because of the relaxed memory ordering, the reads and writes in the producer thread aren’t necessarily visible in the same order in the consumer thread, and vice-versa. + +This warrants a bit more explanation.
+ +When you write your code, you might expect the compiler to translate your code into the equivalent instructions in assembly language, which are then translated into opcodes, one by one, which are then faithfully executed, in order, by the computer. However, if that is really what you believe is going on, many a computer scientist or software engineer will ask you what century you think we’re in. In this century, the hardware really only pretends to do something similar to what the software tells it to do, and at the moment it sees the software, it really is only a facsimile of what the programmer originally wrote. In order for the computer to do what you want it to do efficiently, we have to give it an enormous amount of latitude as to what exactly it does, and in what order. + +This is where the C11 memory model comes in: while, in the spirit of the rest of the C language, most of it is undefined, it defines the order of things in terms of _happens-before_ and related notions. Happens-before is the most important of these in that it addresses the notion of a _race condition_ or _data race_ : a data race occurs when a process tries to read a value ![\\hat{x}](http://s0.wp.com/latex.php?latex=%5Chat%7Bx%7D&bg=ffffff&%23038;fg=000000&%23038;s=0) from a variable ![x](http://s0.wp.com/latex.php?latex=x&bg=ffffff&%23038;fg=000000&%23038;s=0) but it is undefined whether the write that produces that value has occurred yet, or is perhaps even still in progress; or when a process tries to write to ![x](http://s0.wp.com/latex.php?latex=x&bg=ffffff&%23038;fg=000000&%23038;s=0) while another process also tries to write to ![x](http://s0.wp.com/latex.php?latex=x&bg=ffffff&%23038;fg=000000&%23038;s=0). If ![x](http://s0.wp.com/latex.php?latex=x&bg=ffffff&%23038;fg=000000&%23038;s=0) is not shared, this cannot happen but if it is, _reads and writes to shared variables may appear out-of-sequence to other processes/threads_.
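To make happens-before a little more concrete, here is a minimal sketch of one thread publishing a plain value to another through an atomic flag. It is written with C++ `std::atomic` rather than C11 `<stdatomic.h>` (the memory orders carry the same meaning), and `Mailbox`, `publish` and `try_receive` are made-up names, not part of the queue code.

```cpp
#include <atomic>
#include <cassert>

struct Mailbox {
    int payload = 0;                 // plain, non-atomic shared data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Producer side: write the data first, then release it. The release store
// keeps the payload write from becoming visible after the flag.
void publish(Mailbox &box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer side: acquire the flag first, then read the data. If the acquire
// load sees `true`, the payload write happens-before this read.
bool try_receive(Mailbox &box, int *out) {
    assert(out != nullptr);
    if (!box.ready.load(std::memory_order_acquire)) return false;
    *out = box.payload;
    return true;
}
```

Called from two different threads, `publish` and `try_receive` never race on `payload`: the consumer either sees the flag unset and leaves the payload alone, or sees it set and is guaranteed to read the published value. Replacing both memory orders with `memory_order_relaxed` removes that guarantee, which is the kind of situation ThreadSanitizer reports as a data race.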
+ +This gets us back to what I said earlier: between the code you write and what the computer executes, there may be a world of difference. The “hippie” version of the code above, with its relaxed reads and writes on atomic shared variables only guarantees that no thread/process will see any intermediate values — values that are neither the previous nor the new value — but it does not guarantee anything of the sort for non-atomic shared variables (such as `data_`) nor does it say anything about the ordering between writes to `data_` and writes to `back_`, as visible from the consumer, nor reads from `data_` and writes to `front_` as visible from the producer. + +Of course, this does not mean that all reads and writes have to use `memory_order_seq_cst`: `memory_order_seq_cst` emits a _memory barrier_ that makes anything that was sequenced-before it visible before it — which is usually overkill. To know what kind of `memory_order_*` you need, you need to ask yourself: what reads/writes may become visible after this point? and who else (what other thread/process) can see this shared state? + +With this in mind, let’s take another look at `LamportQueue_pop`: + +```c bool LamportQueue_pop(struct LamportQueue *queue, T *elem) { size_t b, f; f = atomic_load_explicit(&queue->front_, memory_order_seq_cst); b = atomic_load_explicit(&queue->back_, memory_order_seq_cst); if (b == f) { return false; } else { /* not empty */ } *elem = queue->data_[f]; atomic_store_explicit(&queue->front_, (f + 1) % SIZE, memory_order_seq_cst); return true; } ``` + +On line 44, we load our _own_ member variable, `front_`. We (the context of the thread the code is running in) are the only ones to ever write to this variable, so we know that the only sequencing that can happen — the only order in which we can see changes to this member — is the order we impose on it ourselves.
This means we can breathe easily: there is no way for someone else (another thread) to mess up what we see when we look at this variable — we can _relax_. + +More formally, reads from shared variables that only the reading thread itself writes to can be relaxed because we only need sequenced-before ordering. + +On line 45, we read from a _foreign_ variable, so we will need some kind of barrier to make sure that any reads of our shared state — any reads of the data in our queue — cannot be ordered before this read. In the same vein, on line 53 we write to our _own_ variable with the full knowledge that another thread will read it as a _foreign_ variable, so we need to make sure no stores we do on the shared state are ordered after that write. I.e., we cannot be seen to read from the shared state before reading the foreign variable and thus _acquiring_ the shared state, and we cannot be seen to write to the shared state after writing to our own variable to _release_ the shared state. + +The wording is important here: unless we tell it otherwise, the compiler/CPU is allowed to re-order anything we do in a single thread as long as from the thread itself, everything still _seems_ to have occurred in the same order. The _visible order_ from any other thread may well be different. Memory barriers and atomic operations affect the way things are seen from outside the thread. So when I say that the thread “cannot be seen to read from the shared state before reading the foreign variable” that means that the visible order of those operations, as seen from outside the thread, should be such that the read from the foreign atomic variable _happens-before_ the read from the shared data. + +[_Continued…_](http://rlc.vlinder.ca/blog/2015/11/interesting-modifications-to-the-lamport-queue-part-ii/) + + 1. This is another reason why I prefer the article I’m explaining rather than the other candidate which caught my attention: the tone is much friendlier + 2. Note that this is not always the case!
+ 3. So close, yet so far away…. \ No newline at end of file diff --git a/_posts/2017-07-31-add-ids-for-all-the-native-types.md b/_posts/2017-07-31-add-ids-for-all-the-native-types.md new file mode 100644 index 0000000..7249250 --- /dev/null +++ b/_posts/2017-07-31-add-ids-for-all-the-native-types.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "add IDs for all the native types" +date: 2017-07-31 23:29:12 +categories: blog +--- + + + + + add IDs for all the native types + + Don't know which offers I'll support yet, but at least I won't have to scroll on my phone as much \ No newline at end of file diff --git a/_posts/2017-07-31-first-chunk-of-code.md b/_posts/2017-07-31-first-chunk-of-code.md new file mode 100644 index 0000000..2e2b4b8 --- /dev/null +++ b/_posts/2017-07-31-first-chunk-of-code.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "first chunk of code" +date: 2017-07-31 16:52:53 +categories: blog +--- + + + + + first chunk of code + + Probably won't compile yet (there's only so much I can do coding on an iPad on the side of the road) but this should parse the type and length of a DER-encoded TLV. Value is next (next pit stop, maybe) \ No newline at end of file diff --git a/_posts/2017-07-31-skeleton-for-primitive-value-parsing.md b/_posts/2017-07-31-skeleton-for-primitive-value-parsing.md new file mode 100644 index 0000000..dcf2db4 --- /dev/null +++ b/_posts/2017-07-31-skeleton-for-primitive-value-parsing.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "skeleton for primitive value parsing" +date: 2017-07-31 22:02:01 +categories: blog +--- + + + + + skeleton for primitive value parsing + + Go by class, then type. Now parses end of input and Boolean values.
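Parsing the type and length of a DER-encoded TLV and then dispatching on class and type can be sketched as below. This is not the code from the commits: `Header`, `parse_header` and `decode_boolean` are hypothetical names, only low tag numbers are handled, and the rules applied (definite lengths only, minimal length encoding, `0x00`/`0xFF` for BOOLEAN) are the DER rules from X.690.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct EncodingError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Header {
    unsigned klass;           // 0 universal, 1 application, 2 context, 3 private
    bool constructed;         // bit 6 of the identifier octet
    unsigned tag;             // low tag number (0..30)
    std::size_t length;       // number of contents octets
    std::size_t header_size;  // identifier plus length octets consumed
};

Header parse_header(const std::vector<std::uint8_t> &in) {
    if (in.size() < 2) throw EncodingError("truncated header");
    Header h;
    h.klass = in[0] >> 6;
    h.constructed = (in[0] & 0x20) != 0;
    h.tag = in[0] & 0x1F;
    if (h.tag == 0x1F) throw EncodingError("high tag numbers not handled here");
    std::size_t pos = 1;
    const std::uint8_t first = in[pos++];
    if (first < 0x80) {
        h.length = first;  // short form
    } else {
        const std::size_t count = first & 0x7F;  // long form: count of octets
        if (count == 0) throw EncodingError("indefinite length is not DER");
        if (count > sizeof(std::size_t) || in.size() < pos + count)
            throw EncodingError("bad length octets");
        if (in[pos] == 0) throw EncodingError("length has a leading zero octet");
        h.length = 0;
        for (std::size_t i = 0; i < count; ++i) h.length = (h.length << 8) | in[pos++];
        if (h.length < 0x80) throw EncodingError("length is not minimally encoded");
    }
    h.header_size = pos;
    return h;
}

// Go by class, then type: universal class, tag number 1 is BOOLEAN.
bool decode_boolean(const std::vector<std::uint8_t> &in) {
    const Header h = parse_header(in);
    assert(h.header_size <= in.size());
    if (h.klass != 0 || h.constructed || h.tag != 1)
        throw EncodingError("not a primitive universal BOOLEAN");
    if (h.length != 1 || in.size() < h.header_size + 1)
        throw EncodingError("BOOLEAN needs exactly one contents octet");
    const std::uint8_t value = in[h.header_size];
    if (value == 0x00) return false;
    if (value == 0xFF) return true;
    throw EncodingError("DER BOOLEAN must be 0x00 or 0xFF");
}
```

For example, the octets `01 01 FF` decode to `true`, while `04 80` is rejected because an indefinite length is not allowed in DER.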
\ No newline at end of file diff --git a/_posts/2017-08-01-now-decodes-integers-and-enums.md b/_posts/2017-08-01-now-decodes-integers-and-enums.md new file mode 100644 index 0000000..b3576b4 --- /dev/null +++ b/_posts/2017-08-01-now-decodes-integers-and-enums.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "now decodes integers and enums" +date: 2017-08-01 12:08:41 +categories: blog +--- + + + + + now decodes integers and enums + + Had a few minutes to kill. I decided on a large integer representation here, so for integers I'll just pass the raw bytes up to the next layer. For enums, the representation of choice is int, so that's what I went with. Of course, if someone decides to pass keys as integers (which they quite often would) they'll have to know how to handle those bytes correctly (sign extension being the most likely problem they might run into). \ No newline at end of file diff --git a/_posts/2017-08-02-few-minor-corrections.md b/_posts/2017-08-02-few-minor-corrections.md new file mode 100644 index 0000000..f9909dd --- /dev/null +++ b/_posts/2017-08-02-few-minor-corrections.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "few minor corrections" +date: 2017-08-02 08:16:25 +categories: blog +--- + + + + + few minor corrections + + Just throw EncodingError, don't put the diagnosis in the type: I don't think anyone will try to catch specific types of encoding errors anyway (and it'll save me creating more exception types). Also, the tag is called "end of content", not "end of input".
\ No newline at end of file diff --git a/_posts/2017-08-03-parse-bit-strings.md b/_posts/2017-08-03-parse-bit-strings.md new file mode 100644 index 0000000..ae2482f --- /dev/null +++ b/_posts/2017-08-03-parse-bit-strings.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "parse bit strings" +date: 2017-08-03 06:58:58 +categories: blog +--- + + + + + parse bit strings + + Because this is a DER decoder, which means sizes must always be determinate, we only parse primitive bit strings. We essentially ignore the constructed flag, though, as it's redundant with the indeterminate length on constructed bit strings and I see no way for an attacker to exploit setting the flag but not using an indeterminate length and some encoders might (not unreasonably, but wrongly nonetheless) always set the flag on bit strings. + + Anecdotally, this code was written yesterday, in the car, while my wife was driving :slightly_smiling_face: \ No newline at end of file diff --git a/_posts/2017-08-03-sketch-of-the-start-of-decoding-reals.md b/_posts/2017-08-03-sketch-of-the-start-of-decoding-reals.md new file mode 100644 index 0000000..07d1c8a --- /dev/null +++ b/_posts/2017-08-03-sketch-of-the-start-of-decoding-reals.md @@ -0,0 +1,17 @@ +--- +layout: post +title: "sketch of the start of decoding reals" +date: 2017-08-03 14:47:41 +categories: blog +--- + + + + + sketch of the start of decoding reals + + I now handle empty input buffers further down the line, because some values can legitimately be zero. I won't support base-10 reals because I don't see why I should. + (Frankly, I'd have a hard time explaining to my wife why I'm sitting in the car coding, while two of the children are sleeping in the back seat and she's shopping for sunglasses, but the answer to that question shall remain a mystery). + + As my iPad is running out of juice, actually parsing and reporting the values will have to wait a bit.
\ No newline at end of file diff --git a/_posts/2017-08-04-a-bit-of-documentation.md b/_posts/2017-08-04-a-bit-of-documentation.md new file mode 100644 index 0000000..a4f94d3 --- /dev/null +++ b/_posts/2017-08-04-a-bit-of-documentation.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "A bit of documentation" +date: 2017-08-04 18:22:43 +categories: blog +--- +n + + + A bit of documentatio + + nMy iPad is out of batteries so I can't really advance the code right + + ow. A bit of documentation, then. \ No newline at end of file diff --git a/_posts/2017-08-05-continue-a-bit-on-decoding-reals.md b/_posts/2017-08-05-continue-a-bit-on-decoding-reals.md new file mode 100644 index 0000000..f90402a --- /dev/null +++ b/_posts/2017-08-05-continue-a-bit-on-decoding-reals.md @@ -0,0 +1,18 @@ +--- +layout: post +title: "continue a bit on decoding reals" +date: 2017-08-05 21:59:19 +categories: blog +--- +n + + + continue a bit on decoding reals + + It's a nice night to be coding on one's phone. + + Special real values are now decoded, and the entire value is copiedninto the buffer. Now we need to get the number of octets for thenexponent (which we'll need to check against the octets we actuallynhave) after which we can finish calculating the mantissa. + + I've decided on using double to represent reals. I could make thatnconfigurable, but I don't see much of a reason for doing that. + + The new Integer class is now used to report the integer value. 
\ No newline at end of file diff --git a/_posts/2017-08-05-finish-up-the-range-constructor-for-the-integer-class.md b/_posts/2017-08-05-finish-up-the-range-constructor-for-the-integer-class.md new file mode 100644 index 0000000..e43cd4e --- /dev/null +++ b/_posts/2017-08-05-finish-up-the-range-constructor-for-the-integer-class.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "finish up the range constructor for the Integer class" +date: 2017-08-05 20:19:17 +categories: blog +--- +n + + + finish up the range constructor for the Integer class + + The Integer class will probably need a few accessors to make it morenuseable, but the basics are there to be able to encode and decode, Inthink \ No newline at end of file diff --git a/_posts/2017-08-05-first-sketch-of-an-encoder.md b/_posts/2017-08-05-first-sketch-of-an-encoder.md new file mode 100644 index 0000000..1714212 --- /dev/null +++ b/_posts/2017-08-05-first-sketch-of-an-encoder.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "first sketch of an encoder" +date: 2017-08-05 10:00:50 +categories: blog +--- +n + + + first sketch of an encoder + + The encoder looks like it'll be a collection of functions that outputnthe to-encode values as DER to an output iterator. An octet string isneasy enough to encode using this scheme: it's just matter of encodingnthe type, then the length, them copying everything into the output.nThat does mean, though, that I need more than a single-pass inputniterator (because I need to get the distance between the two iteratorsnto get there length).nI might also add a length-and-iterator or length-and-pointer becausenfor non-random-access iterators, getting the distance may be expensive. 
\ No newline at end of file diff --git a/_posts/2017-08-05-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git-2.md b/_posts/2017-08-05-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git-2.md new file mode 100644 index 0000000..2a0f79e --- /dev/null +++ b/_posts/2017-08-05-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git-2.md @@ -0,0 +1,10 @@ +--- +layout: post +title: "Merge branch 'master' of git@github.com:blytkerchan/-rubicon.git" +date: 2017-08-05 19:45:04 +categories: blog +--- +n + + + Merge branch 'master' of git@github.com:blytkerchan/-rubicon.git \ No newline at end of file diff --git a/_posts/2017-08-05-start-an-integer-class.md b/_posts/2017-08-05-start-an-integer-class.md new file mode 100644 index 0000000..22e2acb --- /dev/null +++ b/_posts/2017-08-05-start-an-integer-class.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "start an integer class" +date: 2017-08-05 19:37:08 +categories: blog +--- +n + + + start an integer class + + I decided earlier that we wouldn't decide on the integer representatio + + client code would need to use. However, without a normalised integernrepresentation it becomes difficult to encode integers. So the integernrepresentation started here is a minimalist wrapper class for integersnof any size. \ No newline at end of file diff --git a/_posts/2017-08-06-pki-layer-cake.md b/_posts/2017-08-06-pki-layer-cake.md new file mode 100644 index 0000000..18c7c04 --- /dev/null +++ b/_posts/2017-08-06-pki-layer-cake.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "PKI Layer Cake" +date: 2017-08-06 14:06:32 +categories: blog +--- +n + + + PKI Layer Cake + + Dan Kaminsky (@dakami) et al wrote this and, in our discussion aboutnthe problems with ASN.1, cited it as evidence against ASN.1.nI haven't analysed the article yet, but at first glance thenASN.1-related issues appear to be implementation issues. 
\ No newline at end of file diff --git a/_posts/2017-08-10-add-documentation.md b/_posts/2017-08-10-add-documentation.md new file mode 100644 index 0000000..82b56c9 --- /dev/null +++ b/_posts/2017-08-10-add-documentation.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "Add documentation" +date: 2017-08-10 13:56:05 +categories: blog +--- +n + + + Add documentatio + + nThe ASN.1 schema language, which defines the enclosed structure i + + human-readable format. + + Note that it's very unlikely that I'll implement all of these: therenis very little use for me to define classes or constraints in ASN.1nand there exists very little in terms of real-world use of thesenparticular parts of ASN.1. Moreover, almost everything usefulnspecified in X.68{1,2,3} can be expressed in X.680 \ No newline at end of file diff --git a/_posts/2017-08-13-finish-up-real-decoding.md b/_posts/2017-08-13-finish-up-real-decoding.md new file mode 100644 index 0000000..9d03391 --- /dev/null +++ b/_posts/2017-08-13-finish-up-real-decoding.md @@ -0,0 +1,10 @@ +--- +layout: post +title: "Finish up real decoding" +date: 2017-08-13 20:03:43 +categories: blog +--- +n + + + Finish up real decoding \ No newline at end of file diff --git a/_posts/2017-08-14-cpp4theselftaught-com-temporarily-down.md b/_posts/2017-08-14-cpp4theselftaught-com-temporarily-down.md new file mode 100644 index 0000000..1c0fb33 --- /dev/null +++ b/_posts/2017-08-14-cpp4theselftaught-com-temporarily-down.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "cpp4theselftaught.com temporarily down" +date: 2017-08-14 13:54:53 +categories: blog +--- +The C++ for the self-taught site is temporarily down for “unscheduled maintenance” (i.e. a bug). + +I haven’t had time to look into fixing it yet: I just found out it was misbehaving about an hour ago, during my routine check of my websites. I’ll try to fix it tonight and update this post when I have news. 
+ +If you want to help out: you could donate to my BitCoin address 1JE9wominCU1mw1JtD7JWu8vfYfcGQ9pKj. + +**Update (21:58 EDT):** +An automatic updates seems to have bugged out and left the site inoperable. According to the logs this happened sometime during my vacation. The site looks OK now — please let me know if you see anything awry. + +n \ No newline at end of file diff --git a/_posts/2017-08-14-finish-encoding-length-octet-string.md b/_posts/2017-08-14-finish-encoding-length-octet-string.md new file mode 100644 index 0000000..143b237 --- /dev/null +++ b/_posts/2017-08-14-finish-encoding-length-octet-string.md @@ -0,0 +1,22 @@ +--- +layout: post +title: "finish encoding length, octet string" +date: 2017-08-14 12:54:04 +categories: blog +--- +n + + + finish encoding length, octet string + + What better way to use a lunch break than to eat a nice salad and codensomething on your phone? + + Integer can now compact itself to its minimal representation bynremoving leading bytes of nine consecutive bits are zero or there arenonly eight bits and they're all zero; or if the integer is signed and + + one consecutive bits are one. The latter ca + + ot happen if the formernhas been true, so I check one after the other. + + Encoding the length is pretty straightforward once integers ca + + compact themselves. \ No newline at end of file diff --git a/_posts/2017-08-14-fix-a-few-formatting-mistakes-in-the-markdown.md b/_posts/2017-08-14-fix-a-few-formatting-mistakes-in-the-markdown.md new file mode 100644 index 0000000..e16bafe --- /dev/null +++ b/_posts/2017-08-14-fix-a-few-formatting-mistakes-in-the-markdown.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "Fix a few formatting mistakes in the markdown" +date: 2017-08-14 12:16:29 +categories: blog +--- +n + + + Fix a few formatting mistakes in the markdow + + nSo, this is the first time I'm actually looking at this stuff on a real computer, and I noticed the formatting for the markdown was broken. 
That's easy enough to fix! + + Code won't compile any better because of this, but I still have some code to commit on my phone, so I'll do that before trying to get anything to compile... \ No newline at end of file diff --git a/_posts/2017-08-14-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git.md b/_posts/2017-08-14-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git.md new file mode 100644 index 0000000..e2604b4 --- /dev/null +++ b/_posts/2017-08-14-merge-branch-master-of-gitgithub-comblytkerchan-rubicon-git.md @@ -0,0 +1,10 @@ +--- +layout: post +title: "Merge branch 'master' of git@github.com:blytkerchan/-rubicon.git" +date: 2017-08-14 12:54:18 +categories: blog +--- +n + + + Merge branch 'master' of git@github.com:blytkerchan/-rubicon.git \ No newline at end of file diff --git a/_posts/2017-08-15-at-a-few-simple-types-to-the-encoder.md b/_posts/2017-08-15-at-a-few-simple-types-to-the-encoder.md new file mode 100644 index 0000000..2f80bd2 --- /dev/null +++ b/_posts/2017-08-15-at-a-few-simple-types-to-the-encoder.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "At a few simple types to the encoder" +date: 2017-08-15 00:08:42 +categories: blog +--- +n + + + At a few simple types to the encoder + + Adding simple types, such as booleans and end-of-contents, is very easynwith this approach: as long as the output iterator is updated bynanything that writes to the output we simply push whatever we needninto the output by writing to it. + + Note that an output iterator should be smart enough to know when itncan't write anymore: as back_inserter will do nicely; a pointer into anfixed-size array is dangerous. + + Caveat emptor, as the Romans used to say... 
\ No newline at end of file diff --git a/_posts/2017-08-17-encode-integers.md b/_posts/2017-08-17-encode-integers.md new file mode 100644 index 0000000..20d00ad --- /dev/null +++ b/_posts/2017-08-17-encode-integers.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "encode integers" +date: 2017-08-17 23:31:08 +categories: blog +--- +n + + + encode integers + + Note we passed the integer by value so we can compact it withoutnaffecting the calling code. \ No newline at end of file diff --git a/_posts/2017-08-20-encode-bit-strings.md b/_posts/2017-08-20-encode-bit-strings.md new file mode 100644 index 0000000..666da81 --- /dev/null +++ b/_posts/2017-08-20-encode-bit-strings.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "encode bit strings" +date: 2017-08-20 16:06:32 +categories: blog +--- +n + + + encode bit strings + + Representing a bit string as a pair of iterators and an integer shouldnbe sufficient in almost any case, and add we require (because this isna DER encoder) for the entire bit string to be available, this'll do. + + Note, again, that the expected input iterator is a multi-pass inputniterator: we need to be able to calculate the distance between thenbegi + + ing and the end of the input range. \ No newline at end of file diff --git a/_posts/2017-08-20-encode-enumerated-values.md b/_posts/2017-08-20-encode-enumerated-values.md new file mode 100644 index 0000000..9d5ff3b --- /dev/null +++ b/_posts/2017-08-20-encode-enumerated-values.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "Encode enumerated values" +date: 2017-08-20 13:21:06 +categories: blog +--- +n + + + Encode enumerated values + + Next step will be to encode bit strings. We'll need to decide on anrepresentation of those first, though, as the decoder doesn't have anrepresentation other than a pair of iterators and a bits-to-ignorencount at the moment.nOf course, were don't actually need anything more than a pair ofniterators and a bits-to-ignore count... 
\ No newline at end of file diff --git a/_posts/2017-08-20-encode-reals.md b/_posts/2017-08-20-encode-reals.md new file mode 100644 index 0000000..f438368 --- /dev/null +++ b/_posts/2017-08-20-encode-reals.md @@ -0,0 +1,16 @@ +--- +layout: post +title: "encode reals" +date: 2017-08-20 19:25:59 +categories: blog +--- +n + + + encode reals + + Real encoding is among the more complex encodings in DER. It doesn'tnfollow IEEE-754 at all, so we have to extract the mantissa and thenexponent, in the case of the mantissa decide whether to represent itnin base 2, 8 or 16 and in either case encode the sign and the value.nSubnormal values resolve to zero in the mantissa and can'tnrepresented, so they will hit am assertion on line 175 of this code. + + I should note that I heartily dislike floating points for variousnreasons, not least of which quirks like subnormal values, epsilon,nwhacky encodings, etc. Add to that that most, if not all, floatingnpoint values "in the wild" are really either a function of somenirrational number (pi, e, ...), a fraction, or an integer measurementnwith scaling (gain and offset) lossy representations like IEEE-754 ornthis one are really u + + ecessary (just store the function, fraction, ornmeasurements in stead). 
\ No newline at end of file diff --git a/_posts/2017-08-22-some-proof-of-concept-code-for-reals-2.md b/_posts/2017-08-22-some-proof-of-concept-code-for-reals-2.md new file mode 100644 index 0000000..26e2677 --- /dev/null +++ b/_posts/2017-08-22-some-proof-of-concept-code-for-reals-2.md @@ -0,0 +1,22 @@ +--- +layout: post +title: "Some proof-of-concept code for reals" +date: 2017-08-22 23:34:28 +categories: blog +--- +n + + + Some proof-of-concept code for reals + + Decoding reals needs some refactoring: while I've been reading up o + + IEEE-754 representation and the standard C++ functions for floatingnpoint types (and confirming that you should avoid touching this stuff ifnyou can) I've come to the conclusion that writing this part of the codenon a tablet or phone is probably not going to lead anywhere, so Indecided to pull out my laptop and get some code ru + + ing. + + The Details::Integer class compiles and runs - at least as far as thisnproof of concept is concerned. The POC consists of a brand newnimplementation of dissecting and building floating point values (from orninto double, resp.). I've tested it against an independentnimplementation (the calculator on my phone) and arrived at thenconclusion that at least for the two values I've tried so far, it isncorrect. That leads me to believe the method itself is correct as well. + + In this commit, we also add the dependencies we'll be working with fromnhere on out: CMake for building (and my cmake submodule for the sharednconfiguration stuff) and the exceptions library with its support for myncontract-theory macros (pre-conditions, invariants, etc.). + + I will use the two new functions in the encoder and decoder, replacingnpart of what's there now. 
\ No newline at end of file diff --git a/_posts/2017-08-23-refactor-encoding-and-decoding-reals.md b/_posts/2017-08-23-refactor-encoding-and-decoding-reals.md new file mode 100644 index 0000000..1056828 --- /dev/null +++ b/_posts/2017-08-23-refactor-encoding-and-decoding-reals.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "Refactor encoding and decoding reals" +date: 2017-08-23 21:26:57 +categories: blog +--- +n + + + Refactor encoding and decoding reals + + Take the proof-of-concept code and refactor it into the encoder andndecoder. \ No newline at end of file diff --git a/_posts/2017-08-24-add-missing-exceptions-hpp-and-new-gitignore.md b/_posts/2017-08-24-add-missing-exceptions-hpp-and-new-gitignore.md new file mode 100644 index 0000000..32b3447 --- /dev/null +++ b/_posts/2017-08-24-add-missing-exceptions-hpp-and-new-gitignore.md @@ -0,0 +1,10 @@ +--- +layout: post +title: "Add missing exceptions.hpp and new .gitignore" +date: 2017-08-24 18:04:27 +categories: blog +--- +n + + + Add missing exceptions.hpp and new .gitignore \ No newline at end of file diff --git a/_posts/2017-08-24-parse-octet-strings-and-nulls.md b/_posts/2017-08-24-parse-octet-strings-and-nulls.md new file mode 100644 index 0000000..13a006a --- /dev/null +++ b/_posts/2017-08-24-parse-octet-strings-and-nulls.md @@ -0,0 +1,12 @@ +--- +layout: post +title: "parse octet strings and nulls" +date: 2017-08-24 21:41:00 +categories: blog +--- +n + + + parse octet strings and nulls + + This remains rather simple in terms of how parsing works. The nextnstep, parsing sequences, will get us to recursive parsing. 
\ No newline at end of file diff --git a/_posts/2017-08-29-encode-and-decode-sequences-and-sets.md b/_posts/2017-08-29-encode-and-decode-sequences-and-sets.md new file mode 100644 index 0000000..070a13e --- /dev/null +++ b/_posts/2017-08-29-encode-and-decode-sequences-and-sets.md @@ -0,0 +1,20 @@ +--- +layout: post +title: "Encode and decode sequences and sets" +date: 2017-08-29 17:28:35 +categories: blog +--- +n + + + Encode and decode sequences and sets + + Decoding sequences and sets is pretty much a question of trying to keepntrack of where you are in the encoded sequence or set (the onlyndifference between the two being that the order in a sequence isnsignificant whereas it isn't in a set). Once you run out of input octetsnfor the sequence or set, you're done with it. At this level, we don't + + eed to know what was expected in terms of the contents of the sequencenor set.nHence, we have a simple pair of stacks: one for the type (sequence ornset) and one for the length of the sequence or set being decoded. Wenallow the stack size to be configured, but the default (16) seemsnreasonable to me. Whenever we decode a sequence or set, we simply pushnthe remaining octets for that sequence or set, and its type, onto thenstacks and wait for them to run out. Because DER doesn't supportnindeterminate lengths, that's all we need to do. If the length of thensequence and the length of the contents don't match up, that willninevitably result in a decoding error. That also addresses then"syntactic poison" Adam (@jadamcrain) pointed out i + + https://twitter.com/jadamcrain/status/891984403706130432 + + For the encoder, things are slightly more painful: because DER doesn'tnallow for indeterminate lengths, we have to know the length of whatevernit is we're encoding, which in the case of a sequence depends on thencontents of that sequence. 
The only way I can deal with that at thenmoment is to pretend whoever is calling the encoder will encode into anvector or buffer of some sort, and be able to pass us the iterators intonthat vector or buffer so we can insert it into whatever it is we'renencoding into. This makes sense w.r.t. the way most of these things arengenerally structured, but it does impose a serious limitation on whoevernwould end up using this code. + + Now for some code generation. \ No newline at end of file diff --git a/_posts/2017-09-01-rudimentary-preprocessor.md b/_posts/2017-09-01-rudimentary-preprocessor.md new file mode 100644 index 0000000..a5f3d2f --- /dev/null +++ b/_posts/2017-09-01-rudimentary-preprocessor.md @@ -0,0 +1,6 @@ +--- +layout: post +title: "Rudimentary preprocessor" +date: 2017-09-01 06:47:09 +categories: blog +--- diff --git a/_posts/2017-09-27-fairly-complete-antlr-grammar-for-asn-1-3.md b/_posts/2017-09-27-fairly-complete-antlr-grammar-for-asn-1-3.md new file mode 100644 index 0000000..b47ffd0 --- /dev/null +++ b/_posts/2017-09-27-fairly-complete-antlr-grammar-for-asn-1-3.md @@ -0,0 +1,19 @@ +--- +layout: post +title: "(Fairly) complete ANTLR grammar for ASN.1" +date: 2017-09-27 20:40:24 +categories: blog +--- +n + + + (Fairly) complete ANTLR grammar for ASN.1 + + The grammar was mostly just copied out of X.680 and ANTLRised. To donthis, some minor simplifications were needed removing intermediatenproductions that were probably originally necessary for whateverncompiler compiler the ASN.1 folks used when testing their definitions + (probably some Yacc variant). + + I also had to change the naming convention used for the productions andnlexical tokens: while I usually don't care that much about namingnconventions, they exist for parsing purposes (our brains are simplynbetter at parsing than most computer-generated recognizers). ANTLR hasncertain rules about what productions and tokens should look like, so we + + eed to follow those rules if we want to use ANTLR. 
+ + I'll also only be implementing X.680, so any production that referencednthigns that are not part of X.680 are (partly) removed from the grammar.nI've further removed stuff that I either won't implement or would resultnin ambiguities or redundancies in the grammar. \ No newline at end of file diff --git a/_posts/2017-09-27-refactor-the-preprocessor-into-a-parser-2.md b/_posts/2017-09-27-refactor-the-preprocessor-into-a-parser-2.md new file mode 100644 index 0000000..cee0681 --- /dev/null +++ b/_posts/2017-09-27-refactor-the-preprocessor-into-a-parser-2.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "Refactor the preprocessor into a parser" +date: 2017-09-27 20:40:39 +categories: blog +--- +n + + + Refactor the preprocessor into a parser + + The state machine has been replaced by an actual parser and a listener,nwhich will output the processed file. + + Also implement the -o optio \ No newline at end of file diff --git a/_posts/2017-09-28-simplify-and-fix-the-grammar.md b/_posts/2017-09-28-simplify-and-fix-the-grammar.md new file mode 100644 index 0000000..01e7b48 --- /dev/null +++ b/_posts/2017-09-28-simplify-and-fix-the-grammar.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "Simplify and fix the grammar" +date: 2017-09-28 17:29:53 +categories: blog +--- +n + + + Simplify and fix the grammar + + The parser generated from the grammar parses everything into a tree, sonI've run it against a real ASN.1 schema (using ANTLR's grun) and fixednit where fixes were needed. + + I've also simplified it a bit to flatten the generated parse tree, whichnwill make our job parsing it quite a bit easier. 
\ No newline at end of file diff --git a/_posts/2017-09-29-started-work-on-the-compiler.md b/_posts/2017-09-29-started-work-on-the-compiler.md new file mode 100644 index 0000000..1332ff1 --- /dev/null +++ b/_posts/2017-09-29-started-work-on-the-compiler.md @@ -0,0 +1,14 @@ +--- +layout: post +title: "Started work on the compiler" +date: 2017-09-29 06:53:14 +categories: blog +--- +n + + + Started work on the compiler + + I've renamed the preprocessor to asn1p to be more consistent with whatnthe compiler will be called (asn1c) and started work on the compiler byncreating a listener to gather information from the parse tree. + + I've also added a dependency for my little tracing library (which I'venput on GitHub for this project, and made open source) as that will helpnwith outputting traces etc. \ No newline at end of file diff --git a/convert_wordpress_to_markdown.py b/convert_wordpress_to_markdown.py new file mode 100644 index 0000000..f6f6557 --- /dev/null +++ b/convert_wordpress_to_markdown.py @@ -0,0 +1,182 @@ +#!/usr/bin/env python3 +""" +Convert WordPress export files to Jekyll markdown format. 
+""" + +import os +import re +import html +from datetime import datetime +from pathlib import Path +import html2text + +def parse_wp_file(filepath): + """Parse a WordPress export file and return a dictionary of fields.""" + data = {} + with open(filepath, 'r', encoding='utf-8') as f: + for line in f: + line = line.strip() + if ':' in line: + # Split on first colon only + key, value = line.split(':', 1) + key = key.strip() + value = value.strip() + # Remove quotes from string values + if value.startswith('"') and value.endswith('"'): + value = value[1:-1] + # WordPress uses 'n' as escaped newline in the export + # We need to convert these to actual newlines + value = value.replace('\\n', '\n') # Handle actual escape sequences first + # Now handle the 'n' character which WordPress uses for newlines + # This is trickier - we need to be careful not to break words + # In WordPress exports, 'n' at specific positions means newline + value = re.sub(r'([>)])n', r'\1\n', value) # After closing tags/parens + value = re.sub(r'n([<(])', r'\n\1', value) # Before opening tags/parens + value = value.replace('nn', '\n\n') # Double-n is always paragraph break + data[key] = value + return data + +def convert_html_to_markdown(html_content): + """Convert HTML content to Markdown.""" + # Initialize html2text converter + h = html2text.HTML2Text() + h.body_width = 0 # Don't wrap lines + h.unicode_snob = True + h.ignore_links = False + h.ignore_images = False + h.ignore_emphasis = False + + # Handle WordPress syntax highlighter plugin output (wp_syntax) + # Extract code from
...
+ def extract_wp_syntax(match): + content = match.group(0) + # Extract language from class="xxx" in
 tag
+        lang_match = re.search(r'
]*>(.*?)
', content, re.DOTALL) + if code_match: + code = code_match.group(1) + # Unescape HTML entities + code = html.unescape(code) + # Remove span tags but keep content + code = re.sub(r']*>', '', code) + code = re.sub(r'', '', code) + # Remove   + code = code.replace('\xa0', ' ') + return f'\n```{lang}\n{code}\n```\n' + return content + + html_content = re.sub(r'
.*?
', extract_wp_syntax, html_content, flags=re.DOTALL) + + # Handle WordPress code blocks with lang attribute + # Convert
...
to ```xxx\n...\n``` + def replace_code_block(match): + lang = match.group(1) if match.group(1) else '' + code = match.group(2) + # Unescape the code content + code = html.unescape(code) + return f'\n```{lang}\n{code}\n```\n' + + html_content = re.sub(r'(.*?)
', replace_code_block, html_content, flags=re.DOTALL) + + # Unescape HTML entities + html_content = html.unescape(html_content) + + # Convert to markdown + markdown = h.handle(html_content) + + # Clean up extra whitespace + markdown = re.sub(r'\n{3,}', '\n\n', markdown) + + return markdown.strip() + +def generate_slug(title): + """Generate a URL-friendly slug from a title.""" + slug = title.lower() + slug = re.sub(r'[^\w\s-]', '', slug) + slug = re.sub(r'[-\s]+', '-', slug) + return slug.strip('-') + +def convert_wp_to_jekyll(wp_file, output_dir): + """Convert a single WordPress file to Jekyll markdown format.""" + data = parse_wp_file(wp_file) + + # Only process published posts (not revisions, attachments, etc.) + if data.get('post_type') != 'post' or data.get('post_status') != 'publish': + return None + + # Extract metadata + title = data.get('post_title', 'Untitled') + post_date = data.get('post_date', '') + content = data.get('post_content', '') + post_name = data.get('post_name', generate_slug(title)) + + # Parse date + try: + date_obj = datetime.strptime(post_date, '%Y-%m-%d %H:%M:%S') + date_str = date_obj.strftime('%Y-%m-%d') + except ValueError: + print(f"Warning: Could not parse date '{post_date}' for post '{title}'") + return None + + # Convert HTML content to Markdown + markdown_content = convert_html_to_markdown(content) + + # Create Jekyll front matter + front_matter = f"""--- +layout: post +title: "{title}" +date: {post_date} +categories: blog +--- +""" + + # Combine front matter and content + full_content = front_matter + markdown_content + + # Generate filename + filename = f"{date_str}-{post_name}.md" + output_path = os.path.join(output_dir, filename) + + # Write to file + with open(output_path, 'w', encoding='utf-8') as f: + f.write(full_content) + + return filename + +def main(): + """Main conversion function.""" + script_dir = Path(__file__).parent + wp_dir = script_dir / '_drafts' / 'will_not_backport' + output_dir = script_dir / '_posts' + + 
# Create output directory if it doesn't exist + output_dir.mkdir(exist_ok=True) + + # Get all WordPress post files + wp_files = sorted(wp_dir.glob('wp_post_*.txt')) + + converted = 0 + skipped = 0 + + print(f"Found {len(wp_files)} WordPress files to process...") + + for wp_file in wp_files: + try: + result = convert_wp_to_jekyll(wp_file, output_dir) + if result: + print(f"✓ Converted: {wp_file.name} -> {result}") + converted += 1 + else: + skipped += 1 + except Exception as e: + print(f"✗ Error converting {wp_file.name}: {e}") + skipped += 1 + + print(f"\nConversion complete!") + print(f" Converted: {converted} posts") + print(f" Skipped: {skipped} files") + +if __name__ == '__main__': + main()
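
Review note on the `'nn'` rule in `parse_wp_file`: the converted posts above contain fragments such as "ru + + ing" and "ca + + ot", which is consistent with the unconditional `value.replace('nn', '\n\n')` splitting ordinary words like "running" and "cannot". The sketch below reproduces that effect on a made-up sample string and compares it with a guarded variant. The function names, the sample text, and the guard itself are assumptions for illustration and are not part of the committed script.

```python
import re


def split_on_double_n(value: str) -> str:
    """The rule as written in parse_wp_file: every 'nn' becomes a paragraph break."""
    return value.replace('nn', '\n\n')


def split_on_double_n_guarded(value: str) -> str:
    """Hypothetical variant: treat 'nn' as a paragraph break only when it
    does not directly follow a letter, so 'running' and 'cannot' survive."""
    return re.sub(r'(?<![A-Za-z])nn', '\n\n', value)


# Made-up sample: one real paragraph break ('.nn') and two ordinary double-n words.
sample = "get some code running.nnThe Integer class cannot fail"

print(repr(split_on_double_n(sample)))
print(repr(split_on_double_n_guarded(sample)))
```

The guard is a trade-off rather than a fix: it leaves a paragraph break unconverted when the break directly follows a letter, and it cannot help with single 'n' newlines between words (as in "I'llnjust"). The export format is lossy, which matches the limitation the README already describes.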