Any proposal to add define_enum functionality to C++26 Reflection?
It would help when we define multiple enums with related keys: the code gets cleaner, with no copy-pasting or boilerplate.
For example:

    enum class ConstantIndex : size_t {
        ProgramNameString,
        VersionString,
        GlobalInitFuncPtr,
        ApplePropTablePtr,
        BananaPropTablePtr,
        WatermelonPropTablePtr,
        // ... XPropTablePtr for each X in enum Fruit
        GetAppleFuncPtr,
        GetBananaFuncPtr,
        GetWatermelonFuncPtr,
        // ... GetXFuncPtr for each X in enum Fruit
        NumEntries,
    };
    enum class Fruit {
        Apple,
        Banana,
        Watermelon,
        // ... The list keeps growing as time goes by.
    };

Currently we keep `ConstantIndex` and `Fruit` consistent by one of the following methods:

- Copy and paste manually: We copy each new `Fruit` item to `ConstantIndex` and add the prefix `Get` and the suffix `FuncPtr` one by one; then we copy again, this time adding the suffix `PropTablePtr` one by one (maybe some advanced editor tooling can help, but I'm not an editor expert :<). It's more troublesome when the related enums are scattered across multiple source files.
- Generate with macros: We create a generator macro `FRUIT_FOR_EACH(F) F(Apple) F(Banana) ...` and generate the `ConstantIndex` items from it. Yet the macro-based method has a crucial drawback: it lacks flexibility. What if we want some specific `Fruit` items not to be added to `ConstantIndex`? Multiple generators are required (`FRUIT_FOR_EACH`, `FRUIT_NO_CONSTANT_INDEX_FOR_EACH`, and more and more...) and code maintenance is still a big problem.

Example of macro-based generation:

    #define MAKE_PROP_TABLE_PTR_ENTRY(FruitName) FruitName##PropTablePtr,
    #define MAKE_GET_FUNC_PTR_ENTRY(FruitName) Get##FruitName##FuncPtr,
    enum class ConstantIndex {
        ProgramNameString,
        VersionString,
        GlobalInitFuncPtr,
        FRUIT_FOR_EACH(MAKE_PROP_TABLE_PTR_ENTRY)
        FRUIT_FOR_EACH(MAKE_GET_FUNC_PTR_ENTRY)
    };
    #undef MAKE_PROP_TABLE_PTR_ENTRY
    #undef MAKE_GET_FUNC_PTR_ENTRY

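To make the macro approach concrete, here is a minimal self-contained sketch that compiles on its own. The three-fruit `FRUIT_FOR_EACH` list, the `MAKE_FRUIT_ENTRY` helper, the `Fruit::NumEntries` sentinel and the `idx` helper are assumptions added for illustration; they are not part of the original code base.

```cpp
#include <cstddef>

// Hypothetical generator macro: the single list that names every fruit.
#define FRUIT_FOR_EACH(F) F(Apple) F(Banana) F(Watermelon)

#define MAKE_FRUIT_ENTRY(FruitName) FruitName,
#define MAKE_PROP_TABLE_PTR_ENTRY(FruitName) FruitName##PropTablePtr,
#define MAKE_GET_FUNC_PTR_ENTRY(FruitName) Get##FruitName##FuncPtr,

enum class Fruit {
    FRUIT_FOR_EACH(MAKE_FRUIT_ENTRY)
    NumEntries, // sentinel added for this sketch only
};

enum class ConstantIndex : std::size_t {
    ProgramNameString,
    VersionString,
    GlobalInitFuncPtr,
    FRUIT_FOR_EACH(MAKE_PROP_TABLE_PTR_ENTRY)
    FRUIT_FOR_EACH(MAKE_GET_FUNC_PTR_ENTRY)
    NumEntries,
};

#undef MAKE_FRUIT_ENTRY
#undef MAKE_PROP_TABLE_PTR_ENTRY
#undef MAKE_GET_FUNC_PTR_ENTRY

// Small helper to read an enumerator's numeric position.
constexpr std::size_t idx(ConstantIndex c) { return static_cast<std::size_t>(c); }

// Three fixed entries, then one PropTablePtr and one GetFuncPtr slot per fruit.
static_assert(idx(ConstantIndex::NumEntries) ==
              3 + 2 * static_cast<std::size_t>(Fruit::NumEntries));
```

Adding a fruit to `FRUIT_FOR_EACH` updates both enums at once, but excluding one fruit from `ConstantIndex` would need a second list, which is the flexibility problem described above.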
The issues above can be solved elegantly with static reflection (details of `DEFINE_ENUM` and its design are omitted for simplicity):

    // An alternative is P3394: Annotations for Reflection
    struct FruitItem {
        std::string_view name;
        bool presentInConstantIndex;
    };
    constexpr auto FRUIT_ITEMS = std::array{
        FruitItem{.name = "Apple", .presentInConstantIndex = true},
        // ...
    };
    enum class Fruit;
    DEFINE_ENUM(^^Fruit,
        FRUIT_ITEMS | std::views::transform(&FruitItem::name));
    enum class ConstantIndex : size_t;
    DEFINE_ENUM(^^ConstantIndex,
        "ProgramNameString",
        "VersionString",
        "GlobalInitFuncPtr",
        FRUIT_ITEMS
            | std::views::filter(&FruitItem::presentInConstantIndex)
            | std::views::transform(&FruitItem::name)
        // NumEntries can be replaced by enumerators_of(^^ConstantIndex).size()
    );

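For comparison, here is a rough sketch of how far a `FruitItem` table gets without any enum generation, using only constexpr functions that already work today (C++17 or later). It does not produce enumerators; it only computes the slot positions that `ConstantIndex` would have. The sample data, the `kFixedEntries` constant and all function names are assumptions made up for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string_view>

struct FruitItem {
    std::string_view name;
    bool presentInConstantIndex;
};

// Hypothetical sample data; Banana is deliberately excluded.
inline constexpr std::array FRUIT_ITEMS{
    FruitItem{"Apple", true},
    FruitItem{"Banana", false},
    FruitItem{"Watermelon", true},
};

// Fixed entries (ProgramNameString, VersionString, GlobalInitFuncPtr).
inline constexpr std::size_t kFixedEntries = 3;

// Number of fruits that take part in the constant table.
constexpr std::size_t presentCount() {
    std::size_t n = 0;
    for (const auto& item : FRUIT_ITEMS)
        if (item.presentInConstantIndex) ++n;
    return n;
}

// Position of `name` among the participating fruits.
// Throws for unknown or excluded names, which is a compile error
// when the call is evaluated in a constant expression.
constexpr std::size_t rankOf(std::string_view name) {
    std::size_t rank = 0;
    for (const auto& item : FRUIT_ITEMS) {
        if (!item.presentInConstantIndex) continue;
        if (item.name == name) return rank;
        ++rank;
    }
    throw std::invalid_argument("fruit is not present in the constant table");
}

constexpr std::size_t propTableIndex(std::string_view name) {
    return kFixedEntries + rankOf(name);
}
constexpr std::size_t getFuncIndex(std::string_view name) {
    return kFixedEntries + presentCount() + rankOf(name);
}
constexpr std::size_t numEntries() {
    return kFixedEntries + 2 * presentCount();
}
```

The filter lives in the data rather than in a second macro list, but the result is a set of index functions keyed by strings, not named enumerators, which is the gap a `DEFINE_ENUM`-style facility would close.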
https://redd.it/1j1o9jv
@r_cpp
gnuplot with C++
I'm trying to write a program in C++ using gnuplot, but I can't. I get the message "'gnuplot' is not recognized as an internal or external command, operable program or batch file". Please help me fix this. Here is the code:

    #include <iostream>
    #include <fstream>
    #include <cmath>
    #include <cstdio> // FILE, popen, pclose, fprintf
    #include <string>
    using namespace std;
    #define GNUPLOT_NAME "gnuplot -persist"
    struct Point
    {
        double x, y;
    };
    class Figure
    {
        int num;
        Point *coord;

    public:
        // Start empty so the destructor never deletes an uninitialized pointer.
        Figure() : num(0), coord(nullptr) {}
        Figure(int _num, Point *_coord)
        {
            num = _num;
            coord = new Point[num + 1];
            for (int i = 0; i < num; i++)
                coord[i] = _coord[i];
            coord[num] = _coord[0];
        }
        ~Figure()
        {
            num = 0;
            delete[] coord;
            coord = nullptr;
        }
        // Writes the points to <nameOfFile>.txt; a == 1 closes the outline.
        void save(string nameOfFile, int a)
        {
            ofstream file;
            file.open(nameOfFile + ".txt");
            {
                for (int i = 0; i < num; i++)
                {
                    if (coord[i].x == 0.0 && coord[i].y == 0.0)
                        file << "\n";
                    else
                        file << coord[i].x << " " << coord[i].y << endl;
                }
                if (a == 1)
                    file << coord[0].x << " " << coord[0].y << endl;
                file.close();
            }
        }
        // Pipes a plot script to gnuplot; a == 1 draws lines and points, a == 2 points only.
        void DrawLines(const string nameOfFile, int a)
        {
            string str;
            if (a == 1)
                str = "set grid\nset xrange[-5:15]\nset yrange[-2:10]\nplot \"" + nameOfFile + ".txt" + "\" lt 7 w lp\n";
            if (a == 2)
                str = "set grid\nset xrange[-5:15]\nset yrange[-2:10]\nplot \"" + nameOfFile + ".txt" + "\" lt 7 w p\n";
            FILE *pipe = popen(GNUPLOT_NAME, "w");
            if (pipe != NULL)
            {
                fprintf(pipe, "%s", str.c_str());
                fflush(pipe);
                pclose(pipe);
            }
        }
    };
    int main(void)
    {
        const int l1 = 3, l2 = 3; // const so the array sizes are constant expressions
        Point p1[l1];
        Point p2[l2];
        p1[0] = {1.0, 0.0};
        p1[1] = {1.0, 6.0};
        p1[2] = {8.0, 3.0};
        // p1[3] = {6.0, 0.0};
        p2[0] = {2.0, 1.0};
        p2[1] = {2.0, 2.0};
        p2[2] = {4.0, 4.0};
        // p2[3] = {3.0, 1.0};
        Figure z1(l1, p1);
        z1.save("z1", 1);
        z1.DrawLines("z1", 1);
        Figure z2(l2, p2);
        z2.save("z2", 1);
        z2.DrawLines("z2", 1);
        return 0;
    }

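That message is printed by the Windows command interpreter when it cannot find an executable with that name, so the likely cause is that the gnuplot install directory is not on `PATH`. One possible workaround, sketched below under that assumption, is to let the caller pass the full path to the executable instead of the bare name. The helper names `gnuplotCommand` and `runGnuplot` and the `POPEN`/`PCLOSE` macros are made up for this sketch, and any concrete install path has to be looked up on the machine in question.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// MSVC spells the pipe functions with a leading underscore.
#ifdef _WIN32
#define POPEN _popen
#define PCLOSE _pclose
#else
#define POPEN popen
#define PCLOSE pclose
#endif

// Builds the command line. `exe` may be a bare name (resolved through PATH)
// or a full path; it is quoted so that paths containing spaces survive.
std::string gnuplotCommand(const std::string& exe) {
    assert(!exe.empty());
    return "\"" + exe + "\" -persist";
}

// Sends `script` to gnuplot. Returns false if the pipe cannot be opened
// or the launched command reports a non-zero exit status.
bool runGnuplot(const std::string& exe, const std::string& script) {
    FILE* pipe = POPEN(gnuplotCommand(exe).c_str(), "w");
    if (pipe == nullptr)
        return false;
    std::fputs(script.c_str(), pipe);
    std::fflush(pipe);
    return PCLOSE(pipe) == 0;
}
```

`DrawLines` could then call `runGnuplot(pathToGnuplot, str)` and print a warning when it returns false. Adding the gnuplot `bin` directory to `PATH` and restarting the terminal or IDE should make the original bare `gnuplot` command work as well.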
https://redd.it/1j4rthn
@r_cpp
[Concepts] Is there a technical reason we cannot use static_assert in a requirement-seq?
I've been pretty happy that simple-requirements are so successfully simple, e.g.

    template <typename F, typename P>
    concept SingleArgument = requires(F f, P p)
    {
        f(p);
    };

I read that very much like "if `f(p);` compiles, `F` and `P` satisfy `SingleArgument`".
But for some reason that doesn't include `static_assert`:

    template <typename F, typename P>
    concept UnaryPredicate = requires(F f, P p)
    {
        f(p);
        // doesn't compile:
        static_assert( std::is_same_v<decltype(f(p)),bool> );
    };

- clang: ` error: expected expression`
- gcc: ` error: expected primary-expression before 'static_assert'`
- msvc: ` error C2760: syntax error: 'static_assert' was unexpected here; expected 'expression'`
I mean I guess? I've never really had to think about what type of thing `static_assert` actually is. Guess it's not an expression.
Now there are ways around it of course, where you stop using simple requirements:
- compound requirement:
  - `{ f(p) } -> std::same_as<bool>;`
  - I understand this *now* but that took some reading. Especially when I looked up `std::same_as` and realized it takes *two* parameters and my required return type is the *second* parameter.
  - Originally I thought I had to fill in both, using `decltype` to get my return type like `std::same_as<decltype(f(p)),bool>`
- home-made compound requirement:
  - `{ f(p) } -> snns::returns<bool>;`
  - it's a bad name in a vacuum but it's pretty obvious what it does when seen in a constraint-expression
- type requirement:
  - `typename std::enable_if_t<std::is_same_v<decltype(f(p)), bool>, int>;`
  - I don't love it. I do not love it.
  - my concept users are going to see that and think "...what?"
  - I'll be honest here, *I* am going to see that and think "...what?"
  - what is that `int` even doing there? It is up to no good I bet.
- Macros!
  - everybody loves macros
  - we definitely need more in the language

    template <typename F, typename P>
    concept UnaryPredicate = requires(F f, P p)
    {
        f(p);
        SNNS_FUNCTION_RETURNS_TYPE( f(p), bool );
    };

where `SNNS_FUNCTION_RETURNS_TYPE` is:

    #define SNNS_FUNCTION_RETURNS_TYPE( FUNCTION, TYPE)\
        typename \
            std::enable_if_t < \
                std::is_same_v < \
                    decltype( FUNCTION ), \
                    TYPE \
            >, int> // here's int again!

though I guess I could have done it with a compound requirement also?

    #define SNNS_FUNCTION_RETURNS_TYPE( FUNCTION, TYPE)\
        { FUNCTION } -> std::same_as<TYPE>

But getting back around, this doesn't compile:

    template <typename F, typename P>
    concept UnaryPredicate = requires(F f, P p)
    {
        f(p);
        static_assert( std::is_same_v<decltype(f(p)),bool> );
    };

So...
> **[Concepts] Is there a technical reason we cannot use static_assert in requirement-seq?**
https://redd.it/1jezdkk
@r_cpp
this way. Users are
obliged to use suppressions if they wish to avoid this noise.
* ==================== FIXED BUGS ====================
The following bugs have been fixed or resolved. Note that "n-i-bz"
stands for "not in bugzilla" -- that is, a bug that was reported to us
but never got a bugzilla entry. We encourage you to file bugs in
bugzilla rather
than mailing the developers (or mailing lists) directly -- bugs that
are not entered into bugzilla tend to get forgotten about or ignored.
290061 pie elf always loaded at 0x108000
396415 Valgrind is not looking up $ORIGIN rpath of shebang programs
420682 io_pgetevents is not supported
468575 Add support for RISC-V
469782 Valgrind does not support zstd-compressed debug sections
487296 --track-fds=yes and --track-fds=all report erroneous information
       when fds 0, 1, or 2 are used as non-std
489913 WARNING: unhandled amd64-linux syscall: 444 (landlock_create_ruleset)
493433 Add --modify-fds=[no|high] option
494246 syscall fsopen not wrapped
494327 Crash when running Helgrind built with #define TRACE_PTH_FNS 1
494337 All threaded applications cause still holding lock errors
495488 Add FreeBSD getrlimitusage syscall wrapper
495816 s390x: Fix disassembler segfault for C[G]RT and CL[G]RT
495817 s390x: Disassembly to match objdump -d output
496370 Illumos: signal handling is broken
496571 False positive for null key passed to bpf_map_get_next_key syscall.
496950 s390x: Fix hardware capabilities and EmFail codes
497130 Recognize new DWARF5 DW_LANG constants
497455 Update drd/scripts/download-and-build-gcc
497723 Enabling Ada demangling breaks callgrind differentiation between
       overloaded functions and procedures
498037 s390x: Add disassembly checker
498143 False positive on EVIOCGRAB ioctl
498317 FdBadUse is not a valid CoreError type in a suppression
       even though it's generated by --gen-suppressions=yes
498421 s390x: support BPP, BPRP and NIAI insns
498422 s390x: Fix VLRL and VSTRL insns
498492 none/tests/amd64/lzcnt64 crashes on FreeBSD compiled with clang
498629 s390x: Fix S[L]HHHR and S[L]HHLR insns
498632 s390x: Fix LNGFR insn
498942 s390x: Rework s390_disasm interface
499183 FreeBSD: differences in avx-vmovq output
499212 mmap() with MAP_ALIGNED() returns unaligned pointer
501119 memcheck/tests/pointer-trace fails when run on NFS filesystem
501194 Fix ML_(check_macho_and_get_rw_loads) so that it is correct for
       any number of segment commands
501348 glibc built with -march=x86-64-v3 does not work due to ld.so memcmp
501479 Illumos DRD pthread_mutex_init wrapper errors
501365 syscall userfaultfd not wrapped
501846 Add x86 Linux shm wrappers
501850 FreeBSD syscall arguments 7 and 8 incorrect.
501893 Missing suppression for __wcscat_avx2 (strcat-strlen-avx2.h.S:68)?
502126 glibc 2.41 extra syscall_cancel frames
502288 s390x: Memcheck false positives with NNPA last tensor dimension
502324 s390x: Memcheck false positives with TMxx and TM/TMY
502679 Use LTP for testing valgrind
502871 Make Helgrind "pthread_cond_{signal,broadcast}: dubious: associated
       lock is not held by any thread" optional
https://redd.it/1kawl4q
@r_cpp
Kokkos vs OpenMP performance on multi-core CPUs?
Does anyone have experience with OpenMP vs Kokkos performance on multicore CPUs? I am seeing papers on Kokkos's GPU performance, but when I tested Kokkos against a basic OpenMP implementation of the 2D Laplace equation with Jacobi iteration, I saw a 3-4x difference (with OpenMP being the faster one). Not sure if I am doing it right, since I generated the Kokkos code using an LLM. I wanted to see if it's worth it before getting into it.
The OpenMP code itself is slower than a pure MPI implementation, but from what I am reading in textbooks that's well expected. I am also seeing no difference in speed between the equivalent Fortran and C++ codes.
The Kokkos code (generated via LLM), which takes about 6-7 seconds on my PC with 16 OMP threads:
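For readers who want the numerical scheme in isolation, here is a minimal serial sketch of what is being timed: a Jacobi sweep over the interior of a 2D grid with two ping-pong buffers. It is neither the OpenMP nor the Kokkos version. The tiny 5x5 grid, the names `Grid`, `jacobiSweep`, `solve` and `SolveResult`, and the use of the largest point-wise change as the stopping measure are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t NX = 5; // tiny grid, for illustration only
constexpr std::size_t NY = 5;
using Grid = std::array<std::array<double, NY>, NX>;

// One Jacobi sweep: each interior point of `next` becomes the average of its
// four neighbours in `cur`. Returns the largest absolute change at any point.
constexpr double jacobiSweep(const Grid& cur, Grid& next) {
    double maxChange = 0.0;
    for (std::size_t i = 1; i + 1 < NX; ++i) {
        for (std::size_t j = 1; j + 1 < NY; ++j) { // j is contiguous in memory
            next[i][j] = 0.25 * (cur[i + 1][j] + cur[i - 1][j] +
                                 cur[i][j + 1] + cur[i][j - 1]);
            double diff = next[i][j] - cur[i][j];
            if (diff < 0.0) diff = -diff;
            if (diff > maxChange) maxChange = diff;
        }
    }
    return maxChange;
}

struct SolveResult {
    Grid grid;
    int iterations;
};

// Repeats sweeps, swapping the roles of the two buffers (ping-pong), until
// the change drops below `tol` or `maxIter` sweeps have been done.
constexpr SolveResult solve(Grid a, double tol, int maxIter) {
    Grid b = a; // the copy carries the boundary values into the second buffer
    Grid* cur = &a;
    Grid* next = &b;
    int iter = 1;
    for (; iter <= maxIter; ++iter) {
        const double change = jacobiSweep(*cur, *next);
        Grid* tmp = cur; // swap pointers, not data
        cur = next;
        next = tmp;
        if (change < tol) break;
    }
    return SolveResult{*cur, iter};
}
```

The inner loop runs over the index that is contiguous in memory; whether a parallel version keeps that property depends on the data layout and on which loop index the framework iterates fastest, which is one thing worth checking when two implementations of the same sweep differ by a constant factor.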
#include <iostream>
#include <cmath> // For std::abs, std::max (Kokkos::abs used in kernel)
#include <cstdlib> // For std::rand, RAND_MAX
#include <ctime> // For std::time
#include <utility> // For std::swap
#include <Kokkos_Core.hpp>
#define NX (128 * 8 + 2) // Grid points in X direction (including boundaries)
#define NY (128 * 6 + 2) // Grid points in Y direction (including boundaries)
#define MAX_ITER 10000 // Maximum number of iterations
#define TOL 1.0e-6 // Tolerance for convergence
int main(int argc, char* argv[]) {
// Initialize Kokkos. This should be done before any Kokkos operations.
Kokkos::initialize(argc, argv);
{
std::cout << "Using Kokkos with execution space: " << Kokkos::DefaultExecutionSpace::name() << std::endl;
Kokkos::View<double**> phi_A("phi_A", NX, NY);
Kokkos::View<double**> phi_B("phi_B", NX, NY);
// Host-side view for initialization.
Kokkos::View<double**, Kokkos::HostSpace> phi_init_host("phi_init_host", NX, NY);
std::srand(static_cast<unsigned int>(std::time(nullptr))); // Seed random number generator
for (int j = 0; j < NY; ++j) {
for (int i = 0; i < NX; ++i) {
phi_init_host(i, j) = static_cast<double>(std::rand()) / RAND_MAX;
}
}
for (int i = 0; i < NX; ++i) { // Iterate over x-direction
phi_init_host(i, 0) = 0.0;
phi_init_host(i, NY - 1) = 0.0;
}
// For columns (left and right boundaries: x=0 and x=NX-1)
for (int j = 0; j < NY; ++j) { // Iterate over y-direction
phi_init_host(0, j) = 0.0;
phi_init_host(NX - 1, j) = 0.0;
}
Kokkos::deep_copy(phi_A, phi_init_host);
Kokkos::deep_copy(phi_B, phi_init_host);
Kokkos::fence("InitialDataCopyToDeviceComplete");
std::cout << "Start solving with Kokkos (optimized with ping-pong buffering)..." << std::endl;
Kokkos::Timer timer; // Kokkos timer for measuring wall clock time
Kokkos::View<double**>* phi_current_ptr = &phi_A;
Kokkos::View<double**>* phi_next_ptr = &phi_B;
int iter;
double maxres = 0.0; // Stores the maximum residual found in an iteration
for (iter = 1; iter <= MAX_ITER; ++iter) {
auto& P_old = *phi_current_ptr; // View to read from (previous iteration's values)
auto& P_new = *phi_next_ptr; // View to write to (current iteration's values)
Kokkos::parallel_for("JacobiUpdate",
Kokkos::MDRangePolicy<Kokkos::Rank<2>>({1, 1}, {NY - 1, NX - 1}),
// KOKKOS_FUNCTION: Marks lambda for device compilation.
[=] KOKKOS_FUNCTION (const int j, const int i) { // j: y-index, i: x-index
P_new(i, j) = 0.25 * (P_old(i + 1, j) + P_old(i - 1, j) +
P_old(i, j + 1) + P_old(i, j - 1));
});
maxres = 0.0; // Reset maxres for the current iteration's reduction (host variable)
Kokkos::parallel_reduce("MaxResidual",
Kokkos::MDRangePolicy<Kokkos::Rank<2>>({1, 1}, {NY - 1, NX - 1}),
[=] KOKKOS_FUNCTION (const int j, const int i, double& local_maxres) { // j: y-index, i: x-index
double point_residual_val = Kokkos::fabs(
0.25 * (P_new(i + 1, j) + P_new(i - 1, j) +
P_new(i, j + 1) + P_new(i, j - 1)) -
P_new(i, j)
);
if (point_residual_val > local_maxres) {
local_maxres = point_residual_val;
}
}, Kokkos::Max<double>(maxres)); // Kokkos::Max reducer updates host variable 'maxres'
Kokkos::fence("ResidualCalculationComplete");
if (iter % 100 == 0) {
std::cout << "Iter: " << iter << " maxres: " << maxres << std::endl;
}
if (maxres < TOL) {
break; // Exit loop if converged
}
std::swap(phi_current_ptr, phi_next_ptr);
}
Kokkos::fence("SolverLoopComplete");
double end_time = timer.seconds();
std::cout << "Time taken (seconds): " << end_time << std::endl;
}
Kokkos::finalize();
return 0;
}
The OpenMP code takes between 1.2 and 2.5 seconds on my PC with 16 OMP threads:
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <ctime>
#include <omp.h>
#define NX (128 * 8 + 2)
#define NY (128 * 6 + 2)
#define MAX_ITER 10000
#define TOL 1.0e-6
#define DX (1.0 / (NX - 1))
#define DY (1.0 / (NY - 1))
int main() {
std::cout << "Start \n";
std::cout << "Nx="<<NX<<", NY="<<NY<<"\n";
double phi_old[NX][NY];
double phi_new[NX][NY];
double residual[NX][NY];
double maxres, diff;
int iter, i, j;
int num_threads = omp_get_max_threads();
std::cout << "Using " << num_threads << " OpenMP threads\n";
std::srand(static_cast<unsigned int>(std::time(nullptr)));
for (j = 0; j < NY; ++j)
for (i = 0; i < NX; ++i)
phi_old[i][j] = static_cast<double>(std::rand()) / RAND_MAX;
for (j = 0; j < NY; ++j)
for (i = 0; i < NX; ++i)
phi_new[i][j] = phi_old[i][j];
for (i = 0; i < NX; ++i) {
phi_old[i][0] = phi_old[i][NY - 1] = 0.0;
phi_new[i][0] = phi_new[i][NY - 1] = 0.0;
}
for (j = 0; j < NY; ++j) {
phi_old[0][j] = phi_old[NX - 1][j] = 0.0;
phi_new[0][j] = phi_new[NX - 1][j] = 0.0;
}
std::cout << "Start solving...\n";
double start_time = omp_get_wtime();
for (iter = 1; iter <= MAX_ITER; ++iter) {
maxres = 0.0;
#pragma omp parallel default(shared) private(i, j)
{
// phi_old=phi_new. Would be more efficient to switch pointers.
#pragma omp for schedule(static)
for (i = 0; i < NX; ++i)
for (j = 0; j < NY; ++j)
phi_old[i][j] = phi_new[i][j];
// Jacobi
#pragma omp for schedule(static)
for (i = 1; i < NX-1; ++i)
for (j = 1; j < NY-1; ++j)
phi_new[i][j] = 0.25 * (
phi_old[i + 1][j] + phi_old[i - 1][j] +
phi_old[i][j + 1] + phi_old[i][j - 1]);
// calculate Linf residue
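The comment in the OpenMP listing notes that switching pointers would be more efficient than copying phi_new into phi_old on every iteration. A minimal sketch of that idea follows, assuming a flat row-major std::vector instead of the stack arrays. The names `jacobi_pingpong` and `JacobiResult` are hypothetical, and the residual used here (the largest change between two successive sweeps) is an assumption, since the residual code in the listing above is cut off. The pragma is simply ignored when compiling without OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical result type: sweeps performed and the last max-norm update.
struct JacobiResult {
    int iterations;
    double maxres;
};

// Hypothetical helper (not from the original post). phi holds an nx-by-ny
// grid in row-major order, so index (i, j) lives at i * ny + j and j is the
// contiguous direction, as in the phi[i][j] loops of the listing.
// Boundary values stay fixed; only interior points are updated.
JacobiResult jacobi_pingpong(std::vector<double>& phi, int nx, int ny,
                             int max_iter, double tol) {
    assert(nx >= 3 && ny >= 3);
    assert(phi.size() == static_cast<std::size_t>(nx) * ny);

    std::vector<double> other(phi);      // second buffer, same boundaries
    std::vector<double>* cur = &phi;     // read from this buffer
    std::vector<double>* nxt = &other;   // write to this buffer
    double maxres = 0.0;
    int iter = 1;
    for (; iter <= max_iter; ++iter) {
        maxres = 0.0;
        const std::vector<double>& old_v = *cur;
        std::vector<double>& new_v = *nxt;
        // Update and residual fused in one sweep; max reduction over threads.
        #pragma omp parallel for reduction(max : maxres) schedule(static)
        for (int i = 1; i < nx - 1; ++i) {
            for (int j = 1; j < ny - 1; ++j) {
                const std::size_t c = static_cast<std::size_t>(i) * ny + j;
                const double v = 0.25 * (old_v[c + ny] + old_v[c - ny] +
                                         old_v[c + 1] + old_v[c - 1]);
                new_v[c] = v;
                maxres = std::max(maxres, std::fabs(v - old_v[c]));
            }
        }
        std::swap(cur, nxt);             // O(1) swap replaces the full copy
        if (maxres < tol) break;
    }
    if (cur != &phi) phi = *cur;         // hand the newest values back
    return JacobiResult{std::min(iter, max_iter), maxres};
}
```

Swapping the two pointers removes one full pass over the grid per iteration, and fusing the residual into the update sweep removes another. Whether that changes the Kokkos comparison would have to be measured.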
UFCS toy
Here's a toy program that tries to give UFCS (Uniform Function Call Syntax)
to a collection of standard C functions. This is either a proof of concept,
or a proof of procrastination, I'm not sure which.
---
#include <cstring>
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cctype>
#define MAKE_UFCS_FUNC_STD(func) template<class... Types> auto func(Types... args) { \
return ufcs<decltype(std::func(value, args...))>(std::func(value, args...)); \
}
// The 'this' argument is at back of arg list.
#define MAKE_UFCS_FUNC_STD_B(func) template<class... Types> auto func(Types... args) { \
return ufcs<decltype(std::func(args..., value))>(std::func(args..., value)); \
}
template<typename T>
class ufcs
{
public:
T value;
ufcs(T aValue):value(aValue){}
operator T(){
return value;
}
MAKE_UFCS_FUNC_STD(acos )
MAKE_UFCS_FUNC_STD(asin )
MAKE_UFCS_FUNC_STD(atan )
MAKE_UFCS_FUNC_STD(atan2 )
MAKE_UFCS_FUNC_STD(cos )
MAKE_UFCS_FUNC_STD(sin )
MAKE_UFCS_FUNC_STD(tan )
MAKE_UFCS_FUNC_STD(acosh )
MAKE_UFCS_FUNC_STD(asinh )
MAKE_UFCS_FUNC_STD(atanh )
MAKE_UFCS_FUNC_STD(cosh )
MAKE_UFCS_FUNC_STD(sinh )
MAKE_UFCS_FUNC_STD(tanh )
MAKE_UFCS_FUNC_STD(exp )
MAKE_UFCS_FUNC_STD(exp2 )
MAKE_UFCS_FUNC_STD(expm1 )
MAKE_UFCS_FUNC_STD(frexp )
MAKE_UFCS_FUNC_STD(ilogb )
MAKE_UFCS_FUNC_STD(ldexp )
MAKE_UFCS_FUNC_STD(log )
MAKE_UFCS_FUNC_STD(log10 )
MAKE_UFCS_FUNC_STD(log1p )
MAKE_UFCS_FUNC_STD(log2 )
MAKE_UFCS_FUNC_STD(logb )
MAKE_UFCS_FUNC_STD(modf )
MAKE_UFCS_FUNC_STD(scalbn )
MAKE_UFCS_FUNC_STD(scalbln )
MAKE_UFCS_FUNC_STD(cbrt )
MAKE_UFCS_FUNC_STD(abs )
MAKE_UFCS_FUNC_STD(fabs )
MAKE_UFCS_FUNC_STD(hypot )
MAKE_UFCS_FUNC_STD(pow )
MAKE_UFCS_FUNC_STD(sqrt )
MAKE_UFCS_FUNC_STD(erf )
MAKE_UFCS_FUNC_STD(erfc )
MAKE_UFCS_FUNC_STD(lgamma )
MAKE_UFCS_FUNC_STD(tgamma )
MAKE_UFCS_FUNC_STD(ceil )
MAKE_UFCS_FUNC_STD(floor )
MAKE_UFCS_FUNC_STD(nearbyint )
MAKE_UFCS_FUNC_STD(rint )
MAKE_UFCS_FUNC_STD(lrint )
MAKE_UFCS_FUNC_STD(llrint )
MAKE_UFCS_FUNC_STD(round )
MAKE_UFCS_FUNC_STD(lround )
MAKE_UFCS_FUNC_STD(llround )
MAKE_UFCS_FUNC_STD(trunc )
MAKE_UFCS_FUNC_STD(fmod )
MAKE_UFCS_FUNC_STD(remainder )
MAKE_UFCS_FUNC_STD(remquo )
MAKE_UFCS_FUNC_STD(copysign )
MAKE_UFCS_FUNC_STD(nan )
MAKE_UFCS_FUNC_STD(nextafter )
MAKE_UFCS_FUNC_STD(nexttoward )
MAKE_UFCS_FUNC_STD(fdim )
MAKE_UFCS_FUNC_STD(fmax )
MAKE_UFCS_FUNC_STD(fmin )
MAKE_UFCS_FUNC_STD(fma )
MAKE_UFCS_FUNC_STD(fpclassify )
MAKE_UFCS_FUNC_STD(isfinite )
MAKE_UFCS_FUNC_STD(isinf )
MAKE_UFCS_FUNC_STD(isnan )
MAKE_UFCS_FUNC_STD(isnormal )
MAKE_UFCS_FUNC_STD(signbit )
MAKE_UFCS_FUNC_STD(isgreater )
MAKE_UFCS_FUNC_STD(isgreaterequal )
MAKE_UFCS_FUNC_STD(isless )
MAKE_UFCS_FUNC_STD(islessequal )
MAKE_UFCS_FUNC_STD(islessgreater )
MAKE_UFCS_FUNC_STD(isunordered )
MAKE_UFCS_FUNC_STD(assoc_laguerre )
MAKE_UFCS_FUNC_STD(assoc_legendre )
MAKE_UFCS_FUNC_STD(beta )
MAKE_UFCS_FUNC_STD(betaf )
MAKE_UFCS_FUNC_STD(strlen )
MAKE_UFCS_FUNC_STD(system )
MAKE_UFCS_FUNC_STD(calloc )
MAKE_UFCS_FUNC_STD(free )
MAKE_UFCS_FUNC_STD(malloc )
MAKE_UFCS_FUNC_STD(realloc )
MAKE_UFCS_FUNC_STD(atof )
MAKE_UFCS_FUNC_STD(atoi )
MAKE_UFCS_FUNC_STD(atol )
MAKE_UFCS_FUNC_STD(atoll )
MAKE_UFCS_FUNC_STD(strtod )
MAKE_UFCS_FUNC_STD(strtof )
MAKE_UFCS_FUNC_STD(strtold )
MAKE_UFCS_FUNC_STD(strtol )
MAKE_UFCS_FUNC_STD(strtoll )
MAKE_UFCS_FUNC_STD(strtoul )
MAKE_UFCS_FUNC_STD(strtoull )
MAKE_UFCS_FUNC_STD(mblen )
MAKE_UFCS_FUNC_STD(mbtowc )
MAKE_UFCS_FUNC_STD(wctomb )
MAKE_UFCS_FUNC_STD(mbstowcs )
MAKE_UFCS_FUNC_STD(wcstombs )
MAKE_UFCS_FUNC_STD(bsearch )
MAKE_UFCS_FUNC_STD(qsort )
MAKE_UFCS_FUNC_STD(srand )
MAKE_UFCS_FUNC_STD(labs )
MAKE_UFCS_FUNC_STD(llabs )
MAKE_UFCS_FUNC_STD(div )
MAKE_UFCS_FUNC_STD(ldiv )
MAKE_UFCS_FUNC_STD(lldiv )
};
#include <iostream>
#include <iomanip>
#define PRINT(a) cout << #a ": " << (a) << endl
int main()
{
using namespace std;
auto a = ufcs(1.0);
PRINT(a);
PRINT(a.sin());
PRINT(a.sin().asin());
a = 2.718;
PRINT(a);
PRINT(a.log());
PRINT(a.log().exp());
auto f = ufcs(fopen("out.txt", "w"));
f.fprintf("This\nis\na\ntest\n");
f.fflush();
f.fclose();
f = ufcs(fopen("out.txt", "r"));
char buffer[80];
auto b = ufcs(buffer);
while(f.fgets(buffer, sizeof(buffer)))
{
cout << b ;
}
f.fclose();
b.strcpy("Hello");
PRINT(b);
PRINT(b.strstr("l"));
PRINT(b.strchr('e'));
PRINT(b.strcat("There"));
auto c = ufcs('x');
PRINT(c);
PRINT(c.isalpha());
PRINT(c.ispunct());
PRINT(c.isdigit());
PRINT(c.toupper());
}
---
Compilation...
g++ -Wall ufcs.cpp -o ufcs
---
Output...
./ufcs
a: 1
a.sin(): 0.841471
a.sin().asin(): 1
a: 2.718
a.log(): 0.999896
a.log().exp(): 2.718
This
is
a
test
b: Hello
b.strstr("l"): llo
b.strchr('e'): ello
b.strcat("There"): HelloThere
c: x
c.isalpha(): 2
c.ispunct(): 0
c.isdigit(): 0
c.toupper(): 88
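To make the macro trick easier to follow, here is a hand-expanded sketch of roughly what `MAKE_UFCS_FUNC_STD` would generate for three functions. The class name `ufcs_demo` is hypothetical and only a few functions are wrapped. Each member forwards the stored value as the first argument and wraps the result again, which is what allows chaining.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical, cut-down version of the ufcs wrapper from the post.
template <typename T>
class ufcs_demo {
public:
    T value;
    ufcs_demo(T v) : value(v) {}
    operator T() const { return value; }

    // Approximately the expansion of MAKE_UFCS_FUNC_STD(sin):
    // the wrapped value is passed first, extra arguments follow.
    template <class... Types>
    auto sin(Types... args) {
        return ufcs_demo<decltype(std::sin(value, args...))>(
            std::sin(value, args...));
    }

    // Same pattern with one extra argument: a.pow(3.0) calls std::pow(a, 3.0).
    template <class... Types>
    auto pow(Types... args) {
        return ufcs_demo<decltype(std::pow(value, args...))>(
            std::pow(value, args...));
    }

    // Only instantiated when called, so it is harmless for T = double.
    template <class... Types>
    auto strlen(Types... args) {
        return ufcs_demo<decltype(std::strlen(value, args...))>(
            std::strlen(value, args...));
    }
};
```

Because the members are templates, a wrapper such as `ufcs_demo<double>` never instantiates `strlen` unless it is actually called, which is why one class can carry wrappers for unrelated argument types.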
https://redd.it/1l1w6l0
@r_cpp
Can I put module declarations in header files?
Issue: https://github.com/Cvelth/vkfw/issues/19
So a while ago, I added module support to the vkfw library. It works fine for my usage with Clang, but recently (not really, it's been a while) GCC 15 released with module support finally stabilized. However, the way that module support is implemented is that in the header file vkfw.hpp, there is something like:
// ...
#ifdef VKFWMODULEIMPLEMENTATION
export module vkfw;
#endif
// ...
so that the vkfw.cpp file can be just:
module;
#define VKFWMODULEIMPLEMENTATION
#include <vkfw/vkfw.hpp>
However, GCC 15+ rejects compilation with
In file included from .../vkfw-src/include/vkfw/vkfw.cppm:3:
.../vkfw-src/include/vkfw/vkfw.hpp:219:8:
error: module control-line cannot be in included file
However, I can't find anything in the spec or on cppreference that disallows this. So is this allowed at all, or is it just a GCC limitation?
https://redd.it/1lw2g0d
@r_cpp